61) Message boards : Number crunching : dev-site certificate problem (Message 49187)
Posted 14 Jan 2024 by Erich56
Post:
The server certificate for lhcathomedev.cern.ch expired last night and needs to be replaced by a fresh one.
And I thought it happens only at GPUGRID that licenses/certificates are not renewed in time before they expire :-)
62) Message boards : CMS Application : no new WUs available (Message 49164)
Posted 10 Jan 2024 by Erich56
Post:
Now there are no longer any new tasks available.
63) Message boards : CMS Application : no new WUs available (Message 49079)
Posted 29 Dec 2023 by Erich56
Post:
no tasks available; this time, the automatic stop mechanism for submitting tasks if no subtasks are available seemed to work well.
some time later, new tasks could be downloaded.

Today, again no new tasks ... :-(
64) Message boards : CMS Application : no new WUs available (Message 49074)
Posted 27 Dec 2023 by Erich56
Post:
no tasks available; this time, the automatic stop mechanism for submitting tasks if no subtasks are available seemed to work well.
65) Message boards : CMS Application : Multithreading/Multicore? (Message 49052)
Posted 19 Dec 2023 by Erich56
Post:
I have no problem with suspending CMS units. My desktop is usually up 24/7 and I only occasionally reboot due to some update that requires it.
I assume that by doing this, your tasks are suspended only for a short time.
As computezrmle wrote above, suspending for up to 2 hours should not be a problem anyway.
66) Message boards : CMS Application : no new WUs available (Message 49039)
Posted 14 Dec 2023 by Erich56
Post:
Looks like the backend queue again doesn't send CMS subtasks.
But the project server doesn't notice it and continues generating empty envelope tasks.
There was the same problem last week.
Ivan, could you please look into this, so that once no subtasks are available, the generation of empty envelope tasks is stopped? This worked well some time ago, so the mechanism obviously got broken at some point and has not been repaired so far.
67) Message boards : CMS Application : no new WUs available (Message 48997)
Posted 8 Dec 2023 by Erich56
Post:
I have re-started CMS on some of my machines - everything seems to work fine.
It seems to me that the new series ("CMS_141....") consumes less memory.
68) Message boards : CMS Application : no new WUs available (Message 48989)
Posted 7 Dec 2023 by Erich56
Post:
Sorry, there have been some disruptions that we can't control. At the moment I have several workflows stalled in the Agent for reasons that I have yet to ascertain.
hello Ivan, nice to see you back :-) Hope you are fully okay now, healthwise!
Thanks for your efforts to make CMS run again (CMS is definitely my favorite subproject)!
69) Message boards : CMS Application : no new WUs available (Message 48985)
Posted 6 Dec 2023 by Erich56
Post:
Since this afternoon, no jobs have been provided for the tasks that can still be downloaded. Obviously, the automatic stop of task delivery when no jobs are available does not work :-(
70) Message boards : ATLAS application : queue is empty (Message 48951)
Posted 20 Nov 2023 by Erich56
Post:
We have only 2 new Tasks.
Well, it's 2,228 at this point :-)
71) Message boards : ATLAS application : queue is empty (Message 48857)
Posted 30 Oct 2023 by Erich56
Post:
I am getting some tasks with about 690 MB and some with 1.44 GB.
72) Message boards : CMS Application : some tasks failing after about 20 minutes with heartbeat error (Message 48848)
Posted 29 Oct 2023 by Erich56
Post:
Crystal Pellet wrote:

...
I surely did not set the Windows flag to use UTC on that machine.
Maybe computezrmle can remind me of the registry entry needed.
I, too, would be very interested in this information :-)
73) Message boards : CMS Application : some tasks failing after about 20 minutes with heartbeat error (Message 48844)
Posted 29 Oct 2023 by Erich56
Post:

Both errored out and reported at 29 Oct 2023, 1:10:55 UTC
this was about the time all my tasks failed:
...
reported 29 Oct 2023, 1:04:17 UTC
...
so seemingly I was not the only one with this kind of problem.
74) Message boards : CMS Application : some tasks failing after about 20 minutes with heartbeat error (Message 48841)
Posted 29 Oct 2023 by Erich56
Post:
This morning, when checking my task list, I noticed a lot of failed tasks on all of my computers. In all cases they broke at about the time of the shift back from summer time to "normal" time.
I remember that this happened before with some other BOINC projects, like GPUGRID, but I don't think it ever happened with LHC@home - however, I am not sure.

Did someone else experience the same thing last night?
If yes, I can rule out network issues like the ones I seemed to have last Friday. If not, I might have a problem with my network.
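To illustrate why the shift back from summer time can confuse anything that measures elapsed time from local wall-clock readings, here is a minimal sketch. It assumes a Central European time zone (CEST to CET, with the shift at 01:00 UTC on 29 Oct 2023) and is not taken from the BOINC or VirtualBox code; whether this is what actually made the tasks fail is not established in this thread.

```python
from datetime import datetime, timedelta, timezone

# Assumed zone for illustration only: Central European time.
CEST = timezone(timedelta(hours=2), "CEST")  # summer time, UTC+2
CET = timezone(timedelta(hours=1), "CET")    # standard time, UTC+1

# Two real instants, two minutes apart, straddling the shift at 01:00 UTC.
before = datetime(2023, 10, 29, 0, 59, tzinfo=timezone.utc).astimezone(CEST)
after = datetime(2023, 10, 29, 1, 1, tzinfo=timezone.utc).astimezone(CET)

# Offset-aware arithmetic gives the true elapsed time.
real_elapsed = after - before

# Dropping the offsets mimics comparing plain local clock readings.
naive_elapsed = after.replace(tzinfo=None) - before.replace(tzinfo=None)

print("local before:", before.strftime("%H:%M %Z"))
print("local after: ", after.strftime("%H:%M %Z"))
print("real elapsed: ", real_elapsed)
print("naive elapsed:", naive_elapsed)
```

The local clock reads 02:59 and then 02:01, so a naive comparison sees time running backwards by 58 minutes, while only 2 minutes really passed. Timestamps kept in UTC do not have this problem.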
75) Message boards : CMS Application : some tasks failing after about 20 minutes with heartbeat error (Message 48834)
Posted 28 Oct 2023 by Erich56
Post:
Looks like you have network issues, either inside your LAN or to your ISP.
Thanks, computezrmle, for the thorough analysis of my problem.
Most probably, it was caused by numerous short internet outages on my ISP's side which I didn't catch right away. Since late yesterday afternoon, everything seems to be fine again. So I'll keep my fingers crossed :-)
76) Message boards : CMS Application : some tasks failing after about 20 minutes with heartbeat error (Message 48828)
Posted 27 Oct 2023 by Erich56
Post:
No faulty batch! ...
So what else could be the reason for all the tasks failing on various machines at various times after about 20 minutes?
77) Message boards : CMS Application : some tasks failing after about 20 minutes with heartbeat error (Message 48826)
Posted 27 Oct 2023 by Erich56
Post:
In fact, the title of this posting should read "ALL tasks failing after about 20 minutes ..."
That is, all tasks downloaded by all of my computers since late morning have failed.
Faulty batch? Is no one else having the same experience?
78) Message boards : CMS Application : some tasks failing after about 20 minutes with heartbeat error (Message 48825)
Posted 27 Oct 2023 by Erich56
Post:
Within the past few hours, I have been seeing tasks fail about 20 minutes after start, on different computers.
Stderr says:

2023-10-27 12:49:25 (31400): VM Heartbeat file specified, but missing.
2023-10-27 12:49:25 (31400): VM Heartbeat file specified, but missing file system status. (errno = '2')
2023-10-27 12:49:25 (31400): Powering off VM.

For examples, see here:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=400934683
https://lhcathome.cern.ch/lhcathome/result.php?resultid=400934354

What's going wrong?
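For anyone who wants to check many task logs for this error, here is a minimal sketch that scans stderr text for the heartbeat message quoted above. The line layout (timestamp, number in parentheses, message) is taken from those three lines only; the function name `find_heartbeat_errors` is hypothetical, and other log lines may not follow the same layout.

```python
import re
from datetime import datetime

# Layout assumed from the quoted lines: "YYYY-MM-DD HH:MM:SS (number): message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")
HEARTBEAT_MSG = "VM Heartbeat file specified, but missing"


def find_heartbeat_errors(stderr_text):
    """Return (timestamp, number, message) for each heartbeat error line."""
    hits = []
    for line in stderr_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and HEARTBEAT_MSG in match.group(3):
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            hits.append((stamp, int(match.group(2)), match.group(3)))
    return hits


# The three stderr lines quoted in the post above.
SAMPLE = """\
2023-10-27 12:49:25 (31400): VM Heartbeat file specified, but missing.
2023-10-27 12:49:25 (31400): VM Heartbeat file specified, but missing file system status. (errno = '2')
2023-10-27 12:49:25 (31400): Powering off VM.
"""

for stamp, number, message in find_heartbeat_errors(SAMPLE):
    print(stamp.isoformat(sep=" "), number, message)
```

Run on the sample, it reports the two heartbeat lines and skips the "Powering off VM." line. The timestamps are parsed as written in the log, without any time zone attached.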
79) Message boards : CMS Application : No monitoring with ALT-F2 (Message 48789)
Posted 17 Oct 2023 by Erich56
Post:
Further, I notice that the new series of CMS tasks yields far fewer credit points than before.
80) Message boards : CMS Application : No monitoring with ALT-F2 (Message 48786)
Posted 16 Oct 2023 by Erich56
Post:
I noticed the same thing a few minutes ago.
So the question is whether these tasks are still okay and will yield scientific results, or whether they are faulty.




©2024 CERN