Message boards :
ATLAS application :
Bad WUs?
Author | Message |
---|---|
Joined: 14 Jan 10 Posts: 1411 Credit: 9,433,926 RAC: 11,615 |
"https://lhcathome.cern.ch/lhcathome/results.php?userid=75468&offset=0&show_names=0&state=5&appid=" No access to that link for users not logged in as 'maeax'. Maybe you mean these results: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10618519&offset=0&show_names=0&state=5&appid= |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,278,447 RAC: 71,589 |
Tasks error out after about 10 minutes, like this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=331216653 What's the problem? |
Joined: 2 May 07 Posts: 2228 Credit: 173,797,545 RAC: 18,369 |
"https://lhcathome.cern.ch/lhcathome/results.php?userid=75468&offset=0&show_names=0&state=5&appid=" "No access to that link for users not logged in as 'maeax'" Sorry, it is still early in the morning. Yes, Erich56 has the same issue! |
Joined: 14 Jan 10 Posts: 1411 Credit: 9,433,926 RAC: 11,615 |
"Yes, Erich56 has the same issue!" Me too on Windows: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10690380&offset=0&show_names=0&state=5&appid=14 |
Joined: 2 May 07 Posts: 2228 Credit: 173,797,545 RAC: 18,369 |
[2021-10-26 08:27:39] "exeErrorDiag": "CVMFS DBRelease setup file /cvmfs/atlas.cern.ch/repo/sw/database/DBRelease/current/setup.py was not readable", Normally 5k, but now 26058 unsent ATLAS tasks! |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,913,483 RAC: 128,200 |
Meanwhile mine are also affected. Sent a mail to David Cameron. It looks like a link to the directory /cvmfs/atlas.cern.ch/repo/sw/database/DBRelease/current/ is missing on the CVMFS repository. That link should point to the most recent DBRelease directory. <edit> Better: A link pointing from /cvmfs/atlas.cern.ch/repo/sw/database/DBRelease/current/ to the most recent DBRelease directory. </edit> |
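The missing-link diagnosis above can be checked on a machine that has CVMFS mounted. Below is a minimal Python sketch. It assumes only the path quoted in the error message and that `current` is meant to be a symbolic link, as the post suggests. The helper name `describe_link` is made up for illustration and is not part of any ATLAS or CVMFS tooling.

```python
import os

# Path taken from the error message quoted in this thread.
CURRENT = "/cvmfs/atlas.cern.ch/repo/sw/database/DBRelease/current"


def describe_link(path):
    """Report whether `path` is a symlink and whether its target resolves.

    Hypothetical helper, for illustration only.
    """
    is_link = os.path.islink(path)
    return {
        "exists": os.path.lexists(path),   # True even for a dangling symlink
        "is_symlink": is_link,
        "target": os.readlink(path) if is_link else None,
        "resolves": os.path.exists(path),  # False if the link target is gone
    }


if __name__ == "__main__":
    print(describe_link(CURRENT))
    setup_py = os.path.join(CURRENT, "setup.py")
    print("setup.py readable:", os.access(setup_py, os.R_OK))
```

If `exists` is false, the link itself is missing. If `exists` is true but `resolves` is false, the link is there but points to a directory that no longer exists.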
Joined: 2 May 07 Posts: 2228 Credit: 173,797,545 RAC: 18,369 |
Normally 5k, but now 26058 unsent ATLAS tasks! |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,913,483 RAC: 128,200 |
And a correspondingly high number of ATLAS tasks in progress, which may be the reason why the ATLAS download speed dropped to a poor 25% of the usual speed this morning. |
Joined: 2 May 07 Posts: 2228 Credit: 173,797,545 RAC: 18,369 |
We are only poor Tier 3 users. Maybe the traffic is going to T0-T2. There are now 33k ATLAS tasks in progress. |
Joined: 22 Mar 17 Posts: 60 Credit: 13,864,571 RAC: 28,529 |
1.1 GB and 250 MB downloads, just for the tasks to be invalid in 3 min. The download speed is slow because people are constantly downloading huge files. I had a PC with plenty of work last night that is now waiting on downloads, as tasks were completing before more work could be downloaded. |
Joined: 2 May 07 Posts: 2228 Credit: 173,797,545 RAC: 18,369 |
Stopping ATLAS downloads is the best option at the moment, until there is an answer from CERN IT. |
Joined: 2 May 07 Posts: 2228 Credit: 173,797,545 RAC: 18,369 |
2021-10-26 12:09:08 (22368): Guest Log: No HITS file was produced 2021-10-26 12:09:08 (22368): Guest Log: Successfully finished the ATLAS job! https://lhcathome.cern.ch/lhcathome/result.php?resultid=331199612 Completed ATLAS tasks don't generate a HITS file! |
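As the two log lines above show, a task can report success even though no HITS file was produced, so the "Successfully finished" message alone is not a reliable indicator. Here is a small Python sketch that flags such a log. It relies only on the two messages quoted in this post, and the function name is hypothetical.

```python
def finished_without_hits(stderr_text):
    """Return True if the log claims success but also reports a missing HITS file."""
    lines = stderr_text.splitlines()
    no_hits = any("No HITS file was produced" in line for line in lines)
    success = any("Successfully finished the ATLAS job" in line for line in lines)
    return no_hits and success


# The two lines quoted in the post above.
sample = (
    "2021-10-26 12:09:08 (22368): Guest Log: No HITS file was produced\n"
    "2021-10-26 12:09:08 (22368): Guest Log: Successfully finished the ATLAS job!\n"
)
print(finished_without_hits(sample))
```

The stderr text of a result page could be pasted into `sample` to run the same check on other tasks.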
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
There was a clean up this morning of some “legacy” files on cvmfs, and it turns out those were not legacy at all but used by most atlas tasks. This has just been rolled back but it may take a little while to propagate to cvmfs clients. Sorry for this unforeseen mess. |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,913,483 RAC: 128,200 |
At least the missing link is back on CVMFS. Just got an ATLAS task that started fine. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,278,447 RAC: 71,589 |
Maeax wrote this morning: "Normally 5k, but now 26058 unsent ATLAS tasks!" Now, in the afternoon, the project status page shows 45,753 unsent tasks. |
Joined: 2 May 07 Posts: 2228 Credit: 173,797,545 RAC: 18,369 |
We as Tier 3 are too small to do so much work. The scheduler can handle hundreds of thousands of SixTrack tasks, so AgileBoincers, MPI, and other institutes will be doing this in the near future. https://lhcathome.cern.ch/lhcathome/top_users.php |
Joined: 14 Jan 10 Posts: 1411 Credit: 9,433,926 RAC: 11,615 |
"Maeax wrote this morning: 'Normally 5k, but now 26058 unsent ATLAS tasks!' Now, in the afternoon, the project status page shows 45,753 unsent tasks." I suppose most of those unsent tasks are resends because of the initial validate errors. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,278,447 RAC: 71,589 |
In the recent past, I received WUs which may have been misconfigured, like this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=331283437 As can be seen, the CPU was used for only 1.5 minutes, and for the remaining time the WU ran "idle". Unfortunately, I did not notice it immediately, but only after the BOINC manager was showing an unusually long runtime. A check with the Windows task manager showed that there was CPU usage for only 3 instead of 4 WUs (I run 4 WUs with 3 cores each concurrently). Further, the VM console could not be opened; however, the VM_image.vdi was still in the slot directory. Hence, I aborted the WU manually via the BOINC manager. It is too bad that in such a case the WU does not stop automatically (the same problem, BTW, exists with faulty Theory WUs: they would continue running forever if one does not notice in time that something is wrong). |
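The manual check described above (very little CPU time against a long runtime) can be expressed as a simple heuristic. The following Python sketch is not how BOINC or vboxwrapper decide anything. The function name and both thresholds are arbitrary assumptions chosen for illustration, and the 4-hour runtime in the example is made up, since the post does not state the actual runtime.

```python
def looks_idle(cpu_seconds, elapsed_seconds, cores, min_elapsed=3600, min_usage=0.05):
    """Heuristic: flag a task whose CPU time is tiny compared with its wall-clock time.

    min_elapsed and min_usage are illustrative thresholds, not project values.
    """
    if cores <= 0 or elapsed_seconds < min_elapsed:
        return False  # too early to judge; startup phases are legitimately quiet
    usage = cpu_seconds / (elapsed_seconds * cores)
    return usage < min_usage


# 1.5 minutes of CPU time (from the post) on a 3-core task,
# with a hypothetical 4 hours of elapsed time.
print(looks_idle(cpu_seconds=90, elapsed_seconds=4 * 3600, cores=3))
```

The `min_elapsed` guard keeps the check from firing while the VM is still booting and downloading, when low CPU usage is normal.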
Joined: 2 May 07 Posts: 2228 Credit: 173,797,545 RAC: 18,369 |
This was the reason for me to stop ATLAS on Windows in the past. Sometimes the start gets stuck and no one knows why the idle phase isn't stopped. Native-VM doesn't have this problem. Yes, ATLAS needs some watching, but so does Theory (Sherpa, for example). Edit: vboxwrapper for ATLAS: 2021-10-28 05:25:09 (8928): Detected: vboxwrapper 26197; vboxwrapper for CMS: 2021-10-24 20:58:14 (7992): Detected: vboxwrapper 26202. The CMS vboxwrapper was changed by Laurence in the last few weeks. Don't know if that is helpful. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,278,447 RAC: 71,589 |
Within the past few hours, I had several tasks in a row where CPU usage was less than 1 minute, but the task kept running forever. See here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=334449376 If I interpret the stderr correctly, the problem was: 2021-11-25 14:19:01 (27532): Guest Log: 00:00:10.004806 timesync vgsvcTimeSyncWorker: Radical guest time change: -3 587 877 475 000ns (GuestNow=1 637 846 340 764 851 000 ns GuestLast=1 637 849 928 642 326 000 ns fSetTimeLastLoop=true ) Has anyone else had the same experience? |
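To make the numbers in that log line easier to read: the reported change is simply GuestNow minus GuestLast, and it comes to roughly 3588 seconds, just under one hour backwards. Below is a short Python sketch that extracts the three values and checks the arithmetic. The regular expression is written against this one line only and is an assumption about the format, not a documented VirtualBox log grammar.

```python
import re

# Matches the space-grouped nanosecond values in the quoted timesync message.
PATTERN = re.compile(
    r"Radical guest time change:\s*(-?[\d ]+?)\s*ns\s*"
    r"\(GuestNow=([\d ]+?)\s*ns\s+GuestLast=([\d ]+?)\s*ns"
)


def to_int(grouped):
    """Convert a space-grouped number such as '1 637 846' to an int."""
    return int(grouped.replace(" ", ""))


def parse_time_jump(line):
    """Return (reported_ns, guest_now_ns, guest_last_ns), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return tuple(to_int(group) for group in match.groups())


line = (
    "2021-11-25 14:19:01 (27532): Guest Log: 00:00:10.004806 timesync "
    "vgsvcTimeSyncWorker: Radical guest time change: -3 587 877 475 000ns "
    "(GuestNow=1 637 846 340 764 851 000 ns "
    "GuestLast=1 637 849 928 642 326 000 ns fSetTimeLastLoop=true )"
)

reported, guest_now, guest_last = parse_time_jump(line)
print("consistent:", reported == guest_now - guest_last)
print("jump in seconds:", reported / 1e9)
```

Whether this clock jump is the cause of the stalled tasks or only a symptom is not something the log line alone can tell.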
©2024 CERN