1) Message boards : CMS Application : Upcoming WMAgent/CouchDB update - jobs will drain Sunday night (Message 46976)
Posted 4 Jul 2022 by ivan
Post:
That is always on my master log page, and I look at all of them, but they still run Valid.

Yes, it is not a problem apart from filling up the log file. One day I might convince someone to fix it...
2) Message boards : CMS Application : Upcoming WMAgent/CouchDB update - jobs will drain Sunday night (Message 46975)
Posted 4 Jul 2022 by ivan
Post:
CERN IT also wants to recreate the VM that we run the agent on, to move it to a new Hypervisor.


I don't understand. Are they abandoning VirtualBox on BOINC for a new hypervisor?

No, not the hypervisor, the VM image file (.vdi).
3) Message boards : CMS Application : Upcoming WMAgent/CouchDB update - jobs will drain Sunday night (Message 46915)
Posted 20 Jun 2022 by ivan
Post:
CERN IT also wants to recreate the VM that we run the agent on...

If the WMAgent server gets a new name and/or a new IP the network tests called by bootstrap-cms need to be revised.
When the old name can't be contacted any more all CMS tasks will fail.

Noted. I've no idea if that will happen, but thanks for the heads-up.
4) Message boards : CMS Application : Upcoming WMAgent/CouchDB update - jobs will drain Sunday night (Message 46904)
Posted 17 Jun 2022 by ivan
Post:
CERN IT also wants to recreate the VM that we run the agent on, to move it to a new Hypervisor. So the downtime may be a bit longer than I anticipated, but hopefully just a few hours.
5) Message boards : CMS Application : Upcoming WMAgent/CouchDB update - jobs will drain Sunday night (Message 46903)
Posted 17 Jun 2022 by ivan
Post:
CMS wants to upgrade WMAgent to bring in a new version of CouchDB. I've just submitted a new workflow which should start draining around midnight Sunday (European). Please be ready to set NoNewTasks late Sunday to avoid any problems. Hopefully we'll be up again by Monday night.
6) Message boards : CMS Application : Is there a Native version in the pipeline? (Message 46899)
Posted 16 Jun 2022 by ivan
Post:
You mean one of these?: https://www.amd.com/en/products/cpu/amd-epyc-7742

Dual CPU configuration 128t for each CPU gives 256 threads.
The RAM alone to run it would cost a fortune!

Something bigger than that, I think. I found an article discussing an upcoming AMD CPU that tiles several chips into one CPU package. 256 cores, 512 threads, 600W TDP -- water-cooling obligatory! The headline was the economy of 1.7 W per thread (which, across 512 threads, actually puts the total closer to 900 W than 600 W), which is apparently better than most contemporary chips.
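
As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic. It uses only the figures quoted above (512 threads, 1.7 W per thread, 600 W TDP); nothing here is measured, and the variable names are just for illustration.

```python
# Figures as quoted in the post above; none of these are measurements.
threads = 512
watts_per_thread = 1.7
tdp_watts = 600

# Total draw implied by the headline per-thread figure.
total_watts = threads * watts_per_thread
print(f"{total_watts:.0f} W total at {watts_per_thread} W per thread")

# Per-thread figure implied by the quoted TDP instead.
print(f"{tdp_watts / threads:.2f} W per thread at a {tdp_watts} W TDP")
```

So the two quoted figures do not quite agree: 1.7 W per thread implies roughly 870 W in total, while a 600 W TDP would work out to about 1.17 W per thread.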
7) Message boards : CMS Application : Is there a Native version in the pipeline? (Message 46864)
Posted 10 Jun 2022 by ivan
Post:
That's it, straight question. Is there a native version in development/under consideration/even possible for CMS?

No, not at the moment anyway. We figured long ago that a VM was the easiest way to run on the most popular CPU on the most popular operating systems with the least hassle. Of course the technology has changed a lot since then; I don't even know if it's possible to run emulated x64 VirtualBox on the latest Macs. I believe there are versions of CMSSW now for Arm, PP, and GPU machines as well as x64, but I have no idea how easy it would be to set up native versions of these to run on BOINC. At the moment we are still in a development phase, but getting closer to real production running. Until that happens and we get a lot more users, it's not worth spending the time.
BTW, someone has got hold of an AMD engineering sample 256-core CPU! Take a look at the job graphs since 09/06.
8) Message boards : CMS Application : getting 'Error while computing' for CMS tasks (Message 46697)
Posted 28 Apr 2022 by ivan
Post:
Hmm, we're having some problems getting x509 certs on LHC@Home-dev, but I'd not seen it in production. I'll take a look.
9) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 46651)
Posted 19 Apr 2022 by ivan
Post:
We seem to have jobs running again: https://lhcathome.cern.ch/lhcathome/cms_job.php
10) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 46650)
Posted 19 Apr 2022 by ivan
Post:
I've just succeeded in injecting a workflow. Jobs should be available again soon.
11) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 46649)
Posted 17 Apr 2022 by ivan
Post:
https://cern.service-now.com/service-portal?id=ticket&table=incident&sys_id=b4c8b86787f24150eb3b33390cbb35a5&view=sp
12) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 46648)
Posted 17 Apr 2022 by ivan
Post:
Ah, it looks like it might be a certificate expiry problem:
    ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618)
    2022-04-17 15:50:23,368:INFO:inject-test-wfs: TC_SLC7.json request FAILED injection!
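
For anyone wanting to check an expiry date themselves, here is a minimal sketch using Python's standard `ssl` module. It assumes you already have the certificate's `notAfter` string in the form that `ssl.SSLSocket.getpeercert()` reports it; the dates below are hypothetical, not taken from the actual certificate. Note that CERTIFICATE_VERIFY_FAILED can have other causes besides expiry, so a positive result here does not rule out a certificate problem.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days from `now` until the certificate's notAfter time (negative if expired)."""
    # cert_time_to_seconds parses strings like "Apr 16 12:00:00 2022 GMT"
    # and returns seconds since the epoch.
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


# Hypothetical example: a certificate that lapsed the day before the failed injection.
now = datetime(2022, 4, 17, 15, 50, tzinfo=timezone.utc)
remaining = days_until_expiry("Apr 16 12:00:00 2022 GMT", now)
print("expired" if remaining < 0 else f"{remaining:.1f} days left")
```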

13) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 46647)
Posted 17 Apr 2022 by ivan
Post:
I'm draining the job queue in anticipation of a WMAgent upgrade tomorrow. Jobs will probably start to run out around 1200GMT, so set No New Tasks late tonight or early in the morning.

The update is done, and I've injected a new workflow. With any luck new tasks/jobs will be available within the next hour or so.

...and... we're off again! Thanks everyone.


We believe in you and look forward to the opportunity to continue cooperation :-)

Thanks. Unfortunately there is a problem at CERN this (holiday) weekend, and I've not been able to inject a new batch of jobs. Unless something changes soon we will run out of work in about six hours. Sorry 'bout that, but I don't see anything I can do to change it. I'll try to raise a trouble ticket.
14) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 46626)
Posted 13 Apr 2022 by ivan
Post:
I'm draining the job queue in anticipation of a WMAgent upgrade tomorrow. Jobs will probably start to run out around 1200GMT, so set No New Tasks late tonight or early in the morning.

The update is done, and I've injected a new workflow. With any luck new tasks/jobs will be available within the next hour or so.

...and... we're off again! Thanks everyone.
15) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 46625)
Posted 13 Apr 2022 by ivan
Post:
I'm draining the job queue in anticipation of a WMAgent upgrade tomorrow. Jobs will probably start to run out around 1200GMT, so set No New Tasks late tonight or early in the morning.

The update is done, and I've injected a new workflow. With any luck new tasks/jobs will be available within the next hour or so.
16) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 46618)
Posted 12 Apr 2022 by ivan
Post:
I'm draining the job queue in anticipation of a WMAgent upgrade tomorrow. Jobs will probably start to run out around 1200GMT, so set No New Tasks late tonight or early in the morning.
17) Message boards : Number crunching : VM environment needed to be cleaned up. (Message 46527)
Posted 23 Mar 2022 by ivan
Post:
OK, that seems to have fixed it. Thanks again.
18) Message boards : Number crunching : VM environment needed to be cleaned up. (Message 46515)
Posted 22 Mar 2022 by ivan
Post:
Thanks! I'll try that tomorrow when my current task has finished.
19) Message boards : Number crunching : VM environment needed to be cleaned up. (Message 46512)
Posted 22 Mar 2022 by ivan
Post:
Hi; running two CMS tasks in a Windows 10 box. Unfortunately, it's a "managed" machine and I have very limited admin rights to it. Over the weekend, my organisation installed patches and the machine rebooted. I doubt that BOINC was shut down cleanly.
When I logged in on Monday, one VM restarted, got a CMS job, and closed the BOINC task when the job finished. The other got into some sort of state, with messages such as this:
    22/03/2022 15:07:35 | LHC@home | Task CMS_2534121_1647957079.423644_0 postponed for 86400 seconds: VM environment needed to be cleaned up.

I've tried several times to get the second task to run, including waiting for the running task to finish, closing BOINC and removing the BOINC VMs from the VBox manager before restarting. However, the second task always ends up with the above message and shows as "Powered Off" in the VBox manager.
Any ideas as to what to try next? I'm considering detaching from LHC@Home and re-attaching after cleaning off the current CMS VM image.
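
In case it helps anyone sifting through a long event log for the same thing, here is a rough sketch that pulls the task name and postponement out of a line like the one quoted above. The regular expression is an assumption based on that single line, not on any documented BOINC log format, and `parse_postponed` is just an illustrative name.

```python
import re

# The event-log line quoted in the post above.
LINE = (
    "22/03/2022 15:07:35 | LHC@home | Task CMS_2534121_1647957079.423644_0 "
    "postponed for 86400 seconds: VM environment needed to be cleaned up."
)

# Assumed pattern, inferred from the one example line only.
PATTERN = re.compile(
    r"Task (?P<task>\S+) postponed for (?P<secs>\d+) seconds: (?P<reason>.+)$"
)


def parse_postponed(line):
    """Return (task, seconds, reason) for a 'postponed' line, else None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m["task"], int(m["secs"]), m["reason"]


task, secs, reason = parse_postponed(LINE)
print(f"{task}: postponed {secs / 3600:.0f} h ({reason})")
```

The 86400 seconds in the message is simply 24 hours, so each retry is pushed back by a full day.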

20) Message boards : Number crunching : LHC VM tasks no longer working on all of my computers since yesterday (Message 46469)
Posted 18 Mar 2022 by ivan
Post:
For me it seems to be a (temporary) network issue to LHC.

Yes, we had a lot of CMS job failures between 2000 last night and 1000 this morning.

