1 (0x00000001) Unknown error code
Author | Message |
---|---|
Joined: 15 Jun 08 Posts: 2150 Credit: 176,143,102 RAC: 111,326 |
Most of my CMS tasks fail with "1 (0x00000001) Unknown error code". Grafana shows a failure rate of more than 90% since this afternoon. Should be investigated. https://monit-grafana.cern.ch/d/000000628/cms-job-monitoring?orgId=11&from=now-6h&to=now-12m&refresh=15m&var-group_by=CMS_JobType&var-Tier=All&var-Site=T3_CH_Volunteer&var-Type=All&var-CMS_JobType=All&var-CMSPrimaryDataTier=All&var-binning=12m&var-measurement=condor_12m&var-retention_policy=sample https://monit-grafana.cern.ch/d/000000628/cms-job-monitoring?orgId=11&from=now-6h&to=now-12m&refresh=15m&var-group_by=CMS_JobType&var-Tier=All&var-Site=T3_CH_Volunteer&var-Type=All&var-CMS_JobType=All&var-CMSPrimaryDataTier=All&var-binning=12m&var-measurement=condor_12m&var-retention_policy=sample&fullscreen&panelId=81 |
Joined: 29 Aug 05 Posts: 929 Credit: 6,108,853 RAC: 1,057 |
Yes, I just saw that. Unfortunately I'm at a meeting several hours north of London this week with only a very old and slow netbook to access our web pages. I didn't get around to checking the queues this morning and they drained sooner than I expected. I've submitted a new batch of jobs which should last until I'm home again. As well, a WMAgent module failed this morning -- something to do with DB polling, so I don't think that affects the job queues per se (although the batch I just sent hasn't shown up in the monitor yet). I've sent an e-mail to the CERN crew. |
Joined: 15 Jun 08 Posts: 2150 Credit: 176,143,102 RAC: 111,326 |
Since 6:48 UTC there's an increasing number of failed tasks. Looks like we are out of jobs. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,108,853 RAC: 1,057 |
"Since 6:48 UTC there's an increasing number of failed tasks." No. What happened is that yesterday we identified why the "new" CentOS7 jobs were failing. This morning Laurence made a change to our condor config to let them run. I had a batch of 1000 jobs with ~750 still queued; the change let them run but most of them immediately failed because they had been queued too long. However, seven did run! It looks like they created ~6 MB of output in 15 minutes of wall-clock time. Both Federica and I had already submitted new workflows before I tracked down the relevant log files. I've now submitted another batch requesting four times the number of events. So, you'll probably see a number of short jobs later in the weekend once these new jobs make it into the queue, followed by some longer jobs (there are still nearly 5,000 "old" jobs pending). I'll monitor as best I can over the weekend and make adjustments as necessary. Not ideal timing, but we have to go with what fate deals us. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,108,853 RAC: 1,057 |
The "old" jobs are taking a bit longer than I expected, so we should be starting the newer batches some time tomorrow. So, I can go home, listen to the last of the cricket, and watch "Antiques Roadshow"! (I bought a little TV last week; amazing the technology you can get for £119 these days -- 120 TV channels and 80 radio stations too! Still no internet, tho'but...) |
Joined: 18 Dec 15 Posts: 1562 Credit: 58,369,887 RAC: 56,662 |
Ivan, have a nice and enjoyable evening :-) |
Joined: 18 Dec 15 Posts: 1562 Credit: 58,369,887 RAC: 56,662 |
Can anyone tell me why this task https://lhcathome.cern.ch/lhcathome/result.php?resultid=245102192 failed after almost 16 hours? Really too bad :-( ... Guest Log: [ERROR] Condor ended after 57868 seconds ... |
Joined: 18 Dec 15 Posts: 1562 Credit: 58,369,887 RAC: 56,662 |
One of my tasks failed after some 6 hours with 1 (0x00000001) Unknown error code ... 2019-12-24 19:57:17 (17992): Guest Log: [ERROR] Condor ended after 20653 seconds https://lhcathome.cern.ch/lhcathome/result.php?resultid=256603067 Any idea what the reason could be? |
Joined: 5 May 10 Posts: 1 Credit: 1,489,795 RAC: 0 |
I recently started contributing to this project again and am having the same issue with the unknown error code. The CMS app is the only app running on my CPU right now and the error seems to happen frequently. |
Joined: 24 Oct 04 Posts: 1032 Credit: 48,541,560 RAC: 1,596 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=263294061 Well, I've had about 10 of these time wasters so far today, with 2 still running on this PC and maybe 5 in total still running on the 24 cores and 56 GB of RAM (and both that I had running over at -dev did the same thing). I guess I won't get up early to do that again in the morning. |
Joined: 12 Jul 11 Posts: 79 Credit: 1,067,251 RAC: 0 |
Hi, since CMS tasks are back I tried to give them a go on my iMac: all the tasks are ending in error. It is the "1 (0x00000001) Unknown error code". |
Joined: 18 Nov 17 Posts: 118 Credit: 45,337,125 RAC: 18,541 |
Thank you for the information. So I'll keep running ATLAS and wait for stable CMS to come back. |
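
Several posts in this thread quote a guest-log line of the form "[ERROR] Condor ended after N seconds" next to a rough run time in hours. For anyone comparing their own task logs, here is a minimal Python sketch that pulls those values out of a log and converts them. The function names `condor_runtimes` and `as_hours_minutes` are made up for illustration and are not part of BOINC or the CMS application; the pattern assumes the log line looks exactly like the ones quoted in this thread.

```python
import re

# Matches the guest-log line as quoted in this thread (assumed format).
CONDOR_END = re.compile(r"\[ERROR\] Condor ended after (\d+) seconds")


def condor_runtimes(log_text):
    """Return the run time in seconds for each 'Condor ended' line found."""
    return [int(m.group(1)) for m in CONDOR_END.finditer(log_text)]


def as_hours_minutes(seconds):
    """Format a number of seconds as 'Hh MMm' (leftover seconds are dropped)."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours}h {remainder // 60:02d}m"


if __name__ == "__main__":
    # The two log lines quoted in the posts above.
    sample = (
        "Guest Log: [ERROR] Condor ended after 57868 seconds\n"
        "2019-12-24 19:57:17 (17992): Guest Log: "
        "[ERROR] Condor ended after 20653 seconds\n"
    )
    for secs in condor_runtimes(sample):
        print(secs, "s =", as_hours_minutes(secs))
```

Run on the two lines quoted in the thread, this gives 16h 04m and 5h 44m, which matches the "almost 16 hours" and "some 6 hours" reported. It only reports how long Condor ran before the task ended; it says nothing about why the task failed.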
©2023 CERN