1)
Message boards :
CMS Application :
Upcoming WMAgent/CouchDB update - jobs will drain Sunday night
(Message 46976)
Posted 4 Jul 2022 by ivan Post: That is always on my master log page every time and I look at all of them but they still run Valid. Yes, it is not a problem apart from filling up the log file. One day I might convince someone to fix it... |
2)
Message boards :
CMS Application :
Upcoming WMAgent/CouchDB update - jobs will drain Sunday night
(Message 46975)
Posted 4 Jul 2022 by ivan Post: CERN IT also wants to recreate the VM that we run the agent on, to move it to a new Hypervisor. No, not the hypervisor, the VM image file (.vdi). |
3)
Message boards :
CMS Application :
Upcoming WMAgent/CouchDB update - jobs will drain Sunday night
(Message 46915)
Posted 20 Jun 2022 by ivan Post: CERN IT also wants to recreate the VM that we run the agent on... Noted. I've no idea if that will happen, but thanks for the heads-up. |
4)
Message boards :
CMS Application :
Upcoming WMAgent/CouchDB update - jobs will drain Sunday night
(Message 46904)
Posted 17 Jun 2022 by ivan Post: CERN IT also wants to recreate the VM that we run the agent on, to move it to a new Hypervisor. So the downtime may be a bit longer than I anticipated, but hopefully just a few hours. |
5)
Message boards :
CMS Application :
Upcoming WMAgent/CouchDB update - jobs will drain Sunday night
(Message 46903)
Posted 17 Jun 2022 by ivan Post: CMS wants to upgrade WMAgent to bring in a new version of CouchDB. I've just submitted a new workflow which should start draining around midnight Sunday (European). Please be ready to set NoNewTasks late Sunday to avoid any problems. Hopefully we'll be up again by Monday night. |
6)
Message boards :
CMS Application :
Is there a Native version in the pipeline?
(Message 46899)
Posted 16 Jun 2022 by ivan Post: You mean one of these?: https://www.amd.com/en/products/cpu/amd-epyc-7742 Something bigger than that, I think. I found an article discussing an upcoming AMD CPU that tiles several chips into one CPU package. 256 cores, 512 threads, 600W TDP -- water-cooling obligatory! The headline was the economy of 1.7 W per thread (which actually gets closer to 900 W in total), which is apparently better than most contemporary chips. |
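As a rough check of the arithmetic in the post above, here is a minimal sketch that multiplies the quoted per-thread figure by the quoted thread count and compares the result with the quoted TDP. The function names are made up for illustration, and the only inputs are the numbers quoted in the post (512 threads, 1.7 W per thread, 600 W TDP); nothing here comes from AMD specifications.

```python
# Sanity check of the power figures quoted in the post.
# All inputs are the numbers from the post itself; the helper names are hypothetical.

def total_power_watts(threads: int, watts_per_thread: float) -> float:
    """Total package power implied by a per-thread power figure."""
    return threads * watts_per_thread


def watts_per_thread(tdp_watts: float, threads: int) -> float:
    """Per-thread power implied by a package TDP."""
    return tdp_watts / threads


THREADS = 512               # quoted thread count
HEADLINE_W_PER_THREAD = 1.7  # quoted headline figure
TDP_WATTS = 600.0            # quoted TDP

implied_total = total_power_watts(THREADS, HEADLINE_W_PER_THREAD)
implied_per_thread = watts_per_thread(TDP_WATTS, THREADS)

print(f"1.7 W/thread x 512 threads = {implied_total:.1f} W")
print(f"600 W TDP / 512 threads = {implied_per_thread:.2f} W/thread")
```

The first figure is where "closer to 900 W" comes from; the second is what the 600 W TDP would imply per thread if the TDP were the whole story.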
7)
Message boards :
CMS Application :
Is there a Native version in the pipeline?
(Message 46864)
Posted 10 Jun 2022 by ivan Post: That's it, straight question. Is there a native version in development/under consideration/even possible for CMS? No, not at the moment anyway. We figured long ago that a VM was the easiest way to run on the most popular CPU on the most popular operating systems with the least hassle. Of course the technology has changed a lot since; I don't even know if it's possible to run emulated x64 VirtualBox on the latest Macs. I believe there are versions of CMSSW now for Arm, PP, and GPU machines as well as x64, but I have no idea how easy it would be to set up native versions of these to run on BOINC. At the moment we are still in a development phase, but getting closer to real production running. Until that happens and we get a lot more users, it's not worth spending the time. BTW, someone has got hold of an AMD engineering sample 256-core CPU! Take a look at the job graphs since 09/06. |
8)
Message boards :
CMS Application :
getting 'Error while computing' for CMS tasks
(Message 46697)
Posted 28 Apr 2022 by ivan Post: Hmm, we're having some problems getting x509 certs on LHC@Home-dev, but I'd not seen it in production. I'll take a look. |
9)
Message boards :
CMS Application :
EXIT_NO_SUB_TASKS
(Message 46651)
Posted 19 Apr 2022 by ivan Post: We seem to have jobs running again: https://lhcathome.cern.ch/lhcathome/cms_job.php |
10)
Message boards :
CMS Application :
EXIT_NO_SUB_TASKS
(Message 46650)
Posted 19 Apr 2022 by ivan Post: I've just succeeded in injecting a workflow. Jobs should be available again soon. |
11)
Message boards :
CMS Application :
EXIT_NO_SUB_TASKS
(Message 46649)
Posted 17 Apr 2022 by ivan Post: https://cern.service-now.com/service-portal?id=ticket&table=incident&sys_id=b4c8b86787f24150eb3b33390cbb35a5&view=sp |
12)
Message boards :
CMS Application :
EXIT_NO_SUB_TASKS
(Message 46648)
Posted 17 Apr 2022 by ivan Post: Ah, it looks like it might be a certificate expiry problem:
2022-04-17 15:50:23,368:INFO:inject-test-wfs: TC_SLC7.json request FAILED injection!
|
13)
Message boards :
CMS Application :
EXIT_NO_SUB_TASKS
(Message 46647)
Posted 17 Apr 2022 by ivan Post: I'm draining the job queue in anticipation of a WMAgent upgrade tomorrow. Jobs will probably start to run out around 1200GMT, so set No New Tasks late tonight or early in the morning. Thanks. Unfortunately there is a problem at CERN this (holiday) weekend, and I've not been able to inject a new batch of jobs. Unless something changes soon we will run out of work in about six hours. Sorry 'bout that, but I don't see anything I can do to change it. I'll try to raise a trouble ticket. |
14)
Message boards :
CMS Application :
EXIT_NO_SUB_TASKS
(Message 46626)
Posted 13 Apr 2022 by ivan Post: I'm draining the job queue in anticipation of a WMAgent upgrade tomorrow. Jobs will probably start to run out around 1200GMT, so set No New Tasks late tonight or early in the morning. ...and... we're off again! Thanks everyone. |
15)
Message boards :
CMS Application :
EXIT_NO_SUB_TASKS
(Message 46625)
Posted 13 Apr 2022 by ivan Post: I'm draining the job queue in anticipation of a WMAgent upgrade tomorrow. Jobs will probably start to run out around 1200GMT, so set No New Tasks late tonight or early in the morning. The update is done, and I've injected a new workflow. With any luck new tasks/jobs will be available within the next hour or so. |
16)
Message boards :
CMS Application :
EXIT_NO_SUB_TASKS
(Message 46618)
Posted 12 Apr 2022 by ivan Post: I'm draining the job queue in anticipation of a WMAgent upgrade tomorrow. Jobs will probably start to run out around 1200GMT, so set No New Tasks late tonight or early in the morning. |
17)
Message boards :
Number crunching :
VM environment needed to be cleaned up.
(Message 46527)
Posted 23 Mar 2022 by ivan Post: OK, that seems to have fixed it. Thanks again. |
18)
Message boards :
Number crunching :
VM environment needed to be cleaned up.
(Message 46515)
Posted 22 Mar 2022 by ivan Post: Thanks! I'll try that tomorrow when my current task has finished. |
19)
Message boards :
Number crunching :
VM environment needed to be cleaned up.
(Message 46512)
Posted 22 Mar 2022 by ivan Post: Hi; running two CMS tasks in a Windows 10 box. Unfortunately, it's a "managed" machine and I have very limited admin rights to it. Over the weekend, my organisation installed patches and the machine rebooted. I doubt that BOINC was shut down cleanly. When I logged in on Monday, one VM restarted, got a CMS job, and closed the BOINC task when the job finished. The other got into some sort of state, with messages such as this:
I've tried several times to get the second task to run, including waiting for the running task to finish, closing BOINC and removing the BOINC VMs from the VBox manager before restarting. However, the second task always ends up with the above message and shows as "Powered Off" in the VBox manager. |
20)
Message boards :
Number crunching :
LHC VM tasks no longer working on all of my computers since yesterday
(Message 46469)
Posted 18 Mar 2022 by ivan Post: For me it seems to be a (temporary) network issue to LHC. Yes, we had a lot of CMS job failures between 2000 last night and 1000 this morning. |
©2022 CERN