CMS computation error
Author | Message |
---|---|
Joined: 26 Mar 16 Posts: 30 Credit: 1,258,609 RAC: 0 | I'm experiencing CMS failures/dropouts on three of my PCs after about 11 minutes of run time. Theory Simulation is doing fine (so far, 4:30 hours of crunching). QUESTION: Which version of VBox should be used? The LHC download page states V5.0.18 as the release to use! I also read (I don't remember where) NOT to use a newer version, whereas the message boards say to use at least V5.1! ANSWER: ?? |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 4 | The CMS job queue was allowed to drain for a server upgrade, so your tasks weren't finding any jobs to run. See the News item from last week. The upgrade is over now, and I've just submitted a new batch of jobs. They should be available soon. As far as I'm aware, the latest version of VirtualBox is now OK to use. There were some problems a while back but I believe that's all been sorted now. |
Joined: 14 Jan 10 Posts: 1178 Credit: 7,524,384 RAC: 3,684 | Your CMS errors occur because the project has drained the well of CMS-jobs (which are something different from BOINC CMS-tasks) for maintenance. See the News section -> https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4124 On CERN's Join Us! web page, the VirtualBox install link points to VirtualBox Downloads, where you'll find the newest version. Berkeley's BOINC package includes an old VirtualBox. |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 4 | "The CMS job queue was allowed to drain for a server upgrade, so your tasks weren't finding any jobs to run. See the News item from last week. The upgrade is over now, and I've just submitted a new batch of jobs. They should be available soon." Hmm, there are jobs in the queue but my VMs aren't downloading any. I've pinged Laurence. |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 | I just picked up one a couple of hours ago, and it is sitting in my buffer. https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=57985030 |
Joined: 14 Jan 10 Posts: 1178 Credit: 7,524,384 RAC: 3,684 | "I just picked up one a couple of hours ago, and it is sitting in my buffer." Getting a BOINC CMS-task is not the same as getting a job into your running CMS-VM. |
Joined: 27 Sep 08 Posts: 777 Credit: 603,932,510 RAC: 327,372 | My feedback would be to drain the BOINC queues if possible. It's less confusing for users. |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 | "Getting a BOINC CMS-task is not the same as getting a job into your running CMS-VM." Yes, I see. I normally don't bother to install the extension pack to check under Linux, but they are ending after 13 minutes, so no go. |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 4 | "Getting a BOINC CMS-task is not the same as getting a job into your running CMS-VM." Yes, sorry, we have 700 jobs in the queue and another 300 created, but they are not being sent out. I don't think there's anything else I can do from here but wait for some response from the CERN crew. |
Joined: 14 Jan 10 Posts: 1178 Credit: 7,524,384 RAC: 3,684 | "... we have 700 jobs in the queue and another 300 created, but they are not being sent out. I don't think there's anything else I can do from here but wait for some response from the CERN crew." Finally got one running: wmagent_ireid_MonteCarlo_eff_IDR_CMS_Home_170220_154632_5171/b8e42212-f825-11e6-b3b7-02163e018309-512_0, but I see that after the first 10 tries were aborted, the other 690 were cancelled. Do you have the green light for the 300? |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 4 | "... we have 700 jobs in the queue and another 300 created, but they are not being sent out. I don't think there's anything else I can do from here but wait for some response from the CERN crew." We had two problems caused by changes to the server. I aborted one batch as it was 'doomed to fail' anyway and we've started on another batch that I'd also submitted yesterday. Some of our monitors are not showing jobs properly yet (Dashboard, of course...) but we have a queue maintained at 700 jobs and are up to over 90 running jobs as far as I can see. I'm running a total of 14 jobs on my machines -- should be more but the scheduler on the 12-core machine isn't asking for more than 6 jobs; 'Not needed'. |
Joined: 2 Sep 04 Posts: 452 Credit: 180,667,069 RAC: 108,073 | "I'm running a total of 14 jobs on my machines -- should be more but the scheduler on the 12-core machine isn't asking for more than 6 jobs; 'Not needed'." Try increasing the BOINC queue. My preferences show up in German: "Speichere mindestens (Tage)" (store at least this many days of work) and "Speichere zusätzlich für weitere (Tage)" (store up to this many additional days of work). Supporting BOINC, a great concept! |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 4 | |
Joined: 2 May 07 Posts: 1840 Credit: 140,316,752 RAC: 111,278 | This CMS task ended with an error after 12 hours: https://lhcathome.cern.ch/lhcathome/result.php?resultid=124403989 I will test it again with VirtualBox 5.1.16. |
Joined: 2 May 07 Posts: 1840 Credit: 140,316,752 RAC: 111,278 | This CMS task shows this message: WU not found, and it has not been deleted from the server. https://lhcathome.cern.ch/lhcathome/result.php?resultid=176726472 Thank you for your help. |
Joined: 2 May 07 Posts: 1840 Credit: 140,316,752 RAC: 111,278 | Nils told us in the News forum that these tasks are finished now. Thank you! |
Joined: 18 Dec 15 Posts: 1604 Credit: 78,433,131 RAC: 79,103 | For the first time, a CMS task errored out, after 6+ hours, with 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED. What does this mean? Here is the stderr: https://lhcathome.cern.ch/lhcathome/result.php?resultid=179062024 |
Joined: 28 Sep 04 Posts: 643 Credit: 40,212,069 RAC: 14,582 | "For the first time, a CMS task errored out, after 6+ hours, with" Usually this means that the project has set the disk limit parameter too low for this task. It can also be that the task itself has a fault. |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 4 | If you haven't done so already, check your disk usage in boincmgr, and adjust parameters in Options -> Computing Preferences... -> Disk and memory if necessary. "For the first time, a CMS task errored out, after 6+ hours, with" "Usually this means that the project has set the disk limit parameter too low for this task. It can also be that the task itself has a fault." |
Joined: 18 Dec 15 Posts: 1604 Credit: 78,433,131 RAC: 79,103 | "If you haven't done so already, check your disk usage in boincmgr, and adjust parameters in Options -> Computing Preferences... -> Disk and memory if necessary." This was the first thing I checked, although I would have been surprised if that were the reason. Disk and memory usage are set to almost the maximum available. So I'll wait and see whether this problem comes up once more. |
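
For anyone who wants to look at disk usage by hand rather than through boincmgr, here is a minimal Python sketch of the kind of check discussed above: add up the size of the files under a directory and compare the total with a limit. It is not BOINC's own code, and it does not read the limit the project set for the task. The function names, the example path and the 8 GB figure are all assumptions made up for illustration; only the exit code value 196 (0x000000C4) comes from the thread.

```python
import os

# Exit code quoted in the thread: 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED.
EXIT_DISK_LIMIT_EXCEEDED = 196


def directory_size_bytes(path):
    """Return the total size in bytes of all regular files under path.

    Symbolic links are skipped so that linked files are not counted.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            total += os.path.getsize(full)
    return total


def exceeds_disk_limit(path, limit_bytes):
    """Return True if the files under path use more than limit_bytes."""
    return directory_size_bytes(path) > limit_bytes


# Hypothetical usage (the path and the limit are placeholders, adjust both):
#   slot = "/var/lib/boinc-client/slots/0"
#   print(directory_size_bytes(slot), exceeds_disk_limit(slot, 8 * 1024**3))
```

Running this against the directory of a running task while it grows may help tell a limit that is simply set too low apart from a task that is writing far more than usual, which are the two causes suggested in the replies above.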
©2023 CERN