Bad WUs?
Author | Message |
---|---|
Joined: 28 Sep 04 Posts: 640 Credit: 40,177,268 RAC: 17,275 |
I have some also. Here's one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=334410656 Probably more to come later. |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
"I have some also. Here's one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=334410656 Probably more to come later." Okay, I see. So these are most probably misconfigured WUs, and there is no reason for me to fear that something is wrong with my system. The bad thing, though, is that such a bad WU can run for hours and hours, overnight, blocking a slot for nothing. |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
Overnight, there were some more of these faulty tasks :-( |
Joined: 2 May 07 Posts: 1830 Credit: 139,973,885 RAC: 126,538 |
One thing you can test: BOINC has an upgrade, for Windows only, from 7.16.11 to 7.16.20; see whether it makes a difference for these faulty ATLAS tasks. I have also upgraded VirtualBox to 6.1.30. I saw some faulty tasks before as well, on Win10 Pro and Win11 Pro. |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
"One thing you can test: BOINC has an upgrade, for Windows only, from 7.16.11 to 7.16.20," Okay, I could try this, thanks for the hint. What I am also seeing now are tasks that are faulty in a different way than before: they do use full CPU power (in contrast to the earlier faulty tasks), but the VM console shows zero events processed for the entire time the task is running. |
Joined: 15 Jun 08 Posts: 2244 Credit: 199,058,553 RAC: 126,572 |
This example shows that the WU fails on Windows as well as on Linux (running native): Erich's task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=334527292 Corresponding WU: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=176712440 I checked a few other WUs, and all of them failed on every computer they were sent to. Hence, it's more likely a faulty batch than a local issue. |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
I keep getting WUs which fail with all kinds of error reasons. This one here now failed after 6 minutes: https://lhcathome.cern.ch/lhcathome/result.php?resultid=334540111 so something seems to go rather wrong with ATLAS presently. |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
I am surprised that, although it has been known since this morning that there is obviously a batch of faulty WUs, no steps have been taken to get them removed or stopped. |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
"Within the past few hours, I had several tasks in a row where CPU usage was less than 1 minute, but the task was running forever." Last night, I had another task with: 2021-12-06 23:23:04 (16356): Guest Log: 00:00:10.018997 timesync vgsvcTimeSyncWorker: Radical guest time change: -3 587 915 035 000ns (GuestNow=1 638 829 384 890 054 000 ns GuestLast=1 638 832 972 805 089 000 ns fSetTimeLastLoop=true ) CPU use was only 54 seconds, yet the task did not stop automatically and ran all night, wasting a slot for nothing. https://lhcathome.cern.ch/lhcathome/result.php?resultid=335188584 What kind of failure is this "Radical guest time change" thing, and what causes it? |
Joined: 2 May 07 Posts: 1830 Credit: 139,973,885 RAC: 126,538 |
"What kind of failure is this 'Radical guest time change' thing, and what causes it?" This is not critical. When one in hundreds of ATLAS tasks has this error and runs for several hours, we can live with it. I also had one last week. For now, we can only keep watching it over time. |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
Unfortunately, the error rate is much higher than just 1 in hundreds. "What kind of failure is this 'Radical guest time change' thing, and what causes it?" I had one last night, and I had the next one just now. |
Joined: 2 May 07 Posts: 1830 Credit: 139,973,885 RAC: 126,538 |
I have one on Win11 Pro at the moment, with 2 cores and 13 hours of runtime. My faulty-task counter is now TWO! Hoping it is the last one for this week :-). |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
I got the next one right now (so the third one today). What I notice is that with these faulty tasks, the vm_image.vdi is only about 2.5GB in size, as opposed to about 3.3GB for the others. Also, when trying to open the VM console, the localhost login does not work. So these are the two characteristics of this kind of faulty task, and I now have no choice but to check every time a new task starts :-( |
Joined: 2 Sep 04 Posts: 452 Credit: 179,963,997 RAC: 72,464 |
|
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
Just now, I got the next one. So this was the fourth one since last night. I am afraid more will follow :-( |
Joined: 2 Sep 04 Posts: 452 Credit: 179,963,997 RAC: 72,464 |
|
Joined: 2 May 07 Posts: 1830 Credit: 139,973,885 RAC: 126,538 |
I have stopped ATLAS for Windows and have sent a PM to David. |
Joined: 2 Sep 04 Posts: 452 Credit: 179,963,997 RAC: 72,464 |
|
Joined: 2 Sep 04 Posts: 452 Credit: 179,963,997 RAC: 72,464 |
|
Joined: 18 Dec 15 Posts: 1599 Credit: 78,073,227 RAC: 72,726 |
"This morning I had to cancel more than 15 WUs hanging around." Indeed, at this point it's a waste of time and resources :-( I am surprised that no one is stopping these faulty tasks. |
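
For anyone who wants to check the numbers in the "Radical guest time change" line quoted earlier in the thread, here is a minimal Python sketch. It recomputes the reported jump from the GuestNow and GuestLast fields, and adds a simple check for the symptom described above (very little CPU time after a long run). The function names and the thresholds (`min_elapsed`, `max_cpu_ratio`) are assumptions made for illustration; they are not part of BOINC, VirtualBox, or the ATLAS application, and the sketch does not explain the cause of the message.

```python
import re

# Log line exactly as quoted in the thread.
LOG_LINE = (
    "2021-12-06 23:23:04 (16356): Guest Log: 00:00:10.018997 timesync "
    "vgsvcTimeSyncWorker: Radical guest time change: -3 587 915 035 000ns "
    "(GuestNow=1 638 829 384 890 054 000 ns GuestLast=1 638 832 972 805 089 000 ns "
    "fSetTimeLastLoop=true )"
)


def field_ns(line, name):
    """Return the nanosecond value after `name=`; the log groups digits with spaces."""
    match = re.search(name + r"=([\d ]+?)\s*ns", line)
    if match is None:
        raise ValueError(f"{name} not found in log line")
    return int(match.group(1).replace(" ", ""))


def guest_time_jump_ns(line):
    """GuestNow minus GuestLast, in nanoseconds (negative means the clock went backwards)."""
    return field_ns(line, "GuestNow") - field_ns(line, "GuestLast")


def looks_stalled(cpu_seconds, elapsed_seconds, min_elapsed=3600.0, max_cpu_ratio=0.05):
    """Hypothetical heuristic: long wall-clock run with almost no CPU time.

    Both thresholds are illustrative assumptions, not values from the project.
    """
    if elapsed_seconds < min_elapsed:
        return False  # too early to judge
    return cpu_seconds / elapsed_seconds < max_cpu_ratio


if __name__ == "__main__":
    jump_ns = guest_time_jump_ns(LOG_LINE)
    print(f"guest clock jump: {jump_ns} ns ({jump_ns / 1e9:.3f} s)")
    # 54 s of CPU time, as reported, against an assumed 8 hours of runtime.
    print("stalled?", looks_stalled(54, 8 * 3600))
```

The recomputed difference matches the value printed in the log, about 3588 seconds, so the guest clock was set back by nearly one hour. A check like `looks_stalled` would have to be fed CPU and elapsed times taken from the BOINC Manager or the task's result page by hand.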
©2023 CERN