Message boards :
CMS Application :
Problems connecting to servers?
Author | Message |
---|---|
Joined: 9 Feb 08 Posts: 55 Credit: 1,523,653 RAC: 3,260 |
The CMS multithread tasks are using just 1 CPU at the moment. The sysfolk must be sending out test jobs.
<stderr_txt>
2024-11-28 10:14:06 (15635): vboxwrapper version 26207
...
2024-11-28 10:15:35 (15635): Guest Log: [INFO] CMS application starting. Check log files.
2024-11-28 10:42:29 (15635): Guest Log: [INFO] glidein exited with return value 0.
2024-11-28 10:42:30 (15635): Guest Log: [INFO] Shutting Down.
2024-11-28 10:42:30 (15635): VM Completion File Detected.
2024-11-28 10:42:30 (15635): VM Completion Message: glidein exited with return value 0.
...
</stderr_txt>
The jobs are all completing and being verified successfully. But note the timestamps: the jobs take a minute and a half to initialise, then finish about 27 minutes later. |
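The timing remark above can be checked directly from the log. The following is a minimal sketch, assuming the three marker strings shown in the quoted stderr are a reasonable way to delimit the phases; the helper names `find_time` and `phase_durations` are made up for illustration, and the elided `...` lines of the original log are simply left out.

```python
from datetime import datetime

# Lines copied from the stderr excerpt quoted in the post (elided lines omitted).
LOG = """\
2024-11-28 10:14:06 (15635): vboxwrapper version 26207
2024-11-28 10:15:35 (15635): Guest Log: [INFO] CMS application starting. Check log files.
2024-11-28 10:42:29 (15635): Guest Log: [INFO] glidein exited with return value 0.
2024-11-28 10:42:30 (15635): VM Completion File Detected.
"""


def find_time(log, marker):
    """Return the timestamp of the first log line containing `marker`."""
    for line in log.splitlines():
        if marker in line:
            # The timestamp is the first 19 characters: YYYY-MM-DD HH:MM:SS
            return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    raise ValueError(f"marker not found: {marker!r}")


def phase_durations(log):
    """Return (startup, run) as timedeltas.

    startup: wrapper start until the CMS application starts
    run:     CMS application start until glidein exits
    """
    wrapper_start = find_time(log, "vboxwrapper version")
    app_start = find_time(log, "CMS application starting")
    glidein_exit = find_time(log, "glidein exited")
    return app_start - wrapper_start, glidein_exit - app_start


if __name__ == "__main__":
    startup, run = phase_durations(LOG)
    print(f"startup: {startup.total_seconds():.0f} s")
    print(f"run:     {run.total_seconds() / 60:.1f} min")
```

For this excerpt the startup phase is 89 seconds and the run phase is just under 27 minutes.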
Joined: 28 Sep 04 Posts: 744 Credit: 51,964,501 RAC: 31,375 |
This is probably not testing. There just aren't any actual jobs available to run, only BOINC tasks that start the virtual machine, which then cannot get any jobs. You can see this on the site menu Jobs -> CMS jobs -> Running Jobs (you can use the CERN SSO to log in, for example using your Google account). There haven't been any jobs for over a week now. |
Joined: 9 Feb 08 Posts: 55 Credit: 1,523,653 RAC: 3,260 |
Because CMS appears to be using only 1 CPU, I tried adjusting my app_config.xml to use 1 CPU for CMS jobs. They all failed! CMS multithread jobs need 4 CPUs (minimum). It "looked" like it was working... But all have since failed with this logged in stderr:
2024-11-28 11:15:41 (18379): Guest Log: [INFO] CMS application starting. Check log files.
2024-11-28 11:27:55 (18379): Guest Log: [ERROR] VM expects at least 4 CPUs but reports only 1.
Changed it back to 4 CPUs & threads. All OK now! But this is a waste of compute units (CPUs). Three CPUs are doing exactly nothing. |
Joined: 15 Jun 08 Posts: 2605 Credit: 262,148,552 RAC: 133,342 |
Best would be to set CMS to NNT (No New Tasks) until the issues are solved. Never set "<avg_ncpus>" or "--nthreads" lower than 4. That value is checked by the scientific app inside the VM, and VMs configured to use fewer cores are made to fail deliberately. As for the glidein return value of 0: it results in a BOINC success although the scientific output is missing. The reason is that there are countless possible error causes at deeper levels, and most of them are intentionally not forwarded to the BOINC level. |
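The advice above can be turned into a quick sanity check on an app_config.xml before the client reads it. This is only a sketch under stated assumptions: the element layout follows the usual BOINC `app_version` form, but the app name `CMS` and the plan class `vbox64_mt_mcore_cms` in the sample are assumptions that should be compared with what your own client reports, and `check_app_config` is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET

MIN_CPUS = 4  # minimum stated in the thread for CMS multithread tasks

# Sample configuration. App name and plan class are ASSUMPTIONS for illustration.
SAMPLE = """<app_config>
  <app_version>
    <app_name>CMS</app_name>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>"""


def check_app_config(xml_text, app_name="CMS", min_cpus=MIN_CPUS):
    """Return a list of problems found for `app_name` (empty list if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    for version in root.iter("app_version"):
        if version.findtext("app_name", default="").strip() != app_name:
            continue  # only inspect entries for the requested app

        # <avg_ncpus> must not be below the minimum
        ncpus = version.findtext("avg_ncpus")
        if ncpus is not None and float(ncpus) < min_cpus:
            problems.append(f"avg_ncpus={ncpus.strip()} is below {min_cpus}")

        # "--nthreads N" inside <cmdline> must not be below the minimum either
        tokens = version.findtext("cmdline", default="").split()
        for i, token in enumerate(tokens[:-1]):
            if token == "--nthreads" and float(tokens[i + 1]) < min_cpus:
                problems.append(f"--nthreads {tokens[i + 1]} is below {min_cpus}")
    return problems


if __name__ == "__main__":
    for problem in check_app_config(SAMPLE):
        print("WARNING:", problem)
```

Running it against a file would look like `check_app_config(open("app_config.xml").read())`; an empty list means neither value is below 4.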
Joined: 9 Feb 08 Posts: 55 Credit: 1,523,653 RAC: 3,260 |
Yes. Thanks. As noted - that failed utterly! Yikes! |
Joined: 9 Feb 08 Posts: 55 Credit: 1,523,653 RAC: 3,260 |
OK thanks, Harri. These "empty" CMS jobs still use 4 of my CPUs... I'll stop pulling CMS tasks until work is available. |
Joined: 18 Dec 15 Posts: 1841 Credit: 126,292,186 RAC: 124,435 |
I am aware that I am repeating myself, but I keep wondering why CMS tasks are being sent out as long as no jobs are available ... :-( |
Joined: 9 Feb 08 Posts: 55 Credit: 1,523,653 RAC: 3,260 |
They must be dry-running the servers. It would be nice if they kept that local. |
Joined: 27 Apr 24 Posts: 13 Credit: 1,065,859 RAC: 1,258 |
I am getting credit for nearly all of my "failed" work units. https://lhcathome.cern.ch/lhcathome/results.php?userid=1191237 |
Joined: 18 Dec 15 Posts: 1841 Credit: 126,292,186 RAC: 124,435 |
Quote: "I am getting credit for nearly all of my 'failed' work units." Clicking on your link shows "access denied". As mentioned somewhere above, for some (strange) reason credits are granted for this kind of task, but they are of no value to the science. Meanwhile, a pretty high number of these faulty tasks must have been processed and sent back without the recipient noticing that something is wrong. |
Joined: 8 Apr 06 Posts: 7 Credit: 248,210 RAC: 1 |
This is a local VirtualBox issue. Found:
drwx------ 4 boinc_master wheel 128B 27 Nov 09:07 .vbox-boinc_project-ipc
drwx------ 4 geryoei wheel 128B 27 Nov 22:12 .vbox-geryoei-ipc
drwx------ 4 root wheel 128B 27 Nov 09:50 .vbox-root-ipc
sudo rm -Rf .vbox-*
Does the job, thank you for your help! Géry |
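Before deleting anything, it may help to list what such a cleanup would touch. The sketch below only lists directories and removes nothing. The post does not say which directory the listing was taken in, so the base directory is a parameter here (defaulting to the current directory, as in the relative `rm` pattern above); the narrower pattern `.vbox-*-ipc` and the helper name `find_vbox_ipc_dirs` are choices made for this example, not something stated in the thread.

```python
import sys
from pathlib import Path


def find_vbox_ipc_dirs(base):
    """Return the sorted list of `.vbox-<user>-ipc` directories directly under `base`.

    Only directories are returned; plain files with a matching name are ignored.
    Nothing is deleted.
    """
    return sorted(p for p in Path(base).glob(".vbox-*-ipc") if p.is_dir())


if __name__ == "__main__":
    # Base directory is an assumption: pass it explicitly, or use the current directory.
    base_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_vbox_ipc_dirs(base_dir):
        print(path)
```

Reviewing the printed list first makes it easier to confirm that only VirtualBox IPC directories would be removed, and that no VirtualBox process still using them is running.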
Joined: 27 Apr 24 Posts: 13 Credit: 1,065,859 RAC: 1,258 |
Quote: "clicking on your link shows 'access denied'." My computers are not hidden. You can click on my username to reveal them. |
©2025 CERN