Message boards : CMS Application : 1 (0x00000001) Unknown error code
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1443
Credit: 76,900,060
RAC: 97,925
Message 39745 - Posted: 29 Aug 2019, 13:50:13 UTC

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,353,951
RAC: 10,309
Message 39746 - Posted: 29 Aug 2019, 14:19:41 UTC - in response to Message 39745.  

Yes, I just saw that. Unfortunately I'm at a meeting several hours north of London this week with only a very old and slow netbook to access our web pages. I didn't get around to checking the queues this morning and they drained sooner than I expected. I've submitted a new batch of jobs which should last until I'm home again.
As well, a WMAgent module has failed this morning -- something to do with DB polling so I don't think that affects the job queues per se (although, the batch I just sent hasn't shown up in the monitor yet). I've sent an e-mail to the CERN crew.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,353,951
RAC: 10,309
Message 39747 - Posted: 29 Aug 2019, 15:04:27 UTC - in response to Message 39746.  

That batch is now being seen -- about 70 are already running.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1443
Credit: 76,900,060
RAC: 97,925
Message 39836 - Posted: 6 Sep 2019, 8:29:54 UTC

Since 6:48 UTC there's an increasing number of failed tasks.
Looks like we are out of jobs.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,353,951
RAC: 10,309
Message 39841 - Posted: 6 Sep 2019, 10:32:53 UTC - in response to Message 39836.  
Last modified: 6 Sep 2019, 10:34:43 UTC

Since 6:48 UTC there's an increasing number of failed tasks.
Looks like we are out of jobs.

No. What happened is that yesterday we identified why the "new" CentOS7 jobs were failing. This morning Laurence made a change to our condor config to let them run. I had a batch of 1000 jobs with ~750 still queued; the change let them run but most of them immediately failed because they had been queued too long.
However, seven did run! It looks like they created ~6 MB of output in 15 minutes of wall-clock time. Both Federica and I had already submitted new workflows before I tracked down the relevant log files. I've now submitted another batch requesting four times the number of events. So, you'll probably see a number of short jobs later in the weekend once these new jobs make it into the queue, followed by some longer jobs (there are still nearly 5,000 "old" jobs pending). I'll monitor as best I can over the weekend and make adjustments as necessary. Not ideal timing, but we have to go with what fate deals us.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,353,951
RAC: 10,309
Message 39865 - Posted: 8 Sep 2019, 15:30:31 UTC - in response to Message 39841.  

The "old" jobs are taking a bit longer than I expected, so we should be starting the newer batches some time tomorrow. So, I can go home, listen to the last of the cricket, and watch "Antiques Roadshow"! (I bought a little TV last week; amazing the technology you can get for £119 these days -- 120 TV channels and 80 radio stations too! Still no internet, tho'but...)
Erich56

Joined: 18 Dec 15
Posts: 1284
Credit: 23,085,766
RAC: 2,480
Message 39866 - Posted: 8 Sep 2019, 16:51:10 UTC - in response to Message 39865.  

Ivan, have a nice and enjoyable evening :-)
Erich56

Joined: 18 Dec 15
Posts: 1284
Credit: 23,085,766
RAC: 2,480
Message 39891 - Posted: 10 Sep 2019, 12:43:47 UTC

Can anyone tell me why this task

https://lhcathome.cern.ch/lhcathome/result.php?resultid=245102192

failed after almost 16 hours? Really too bad :-(

...
Guest Log: [ERROR] Condor ended after 57868 seconds
...
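
For anyone who wants to check how long a task actually ran before Condor gave up, here is a minimal sketch that pulls the figure out of the log line and converts it to hours, minutes, and seconds. It assumes the message always has the exact wording quoted in this thread; the function name `condor_runtime` is made up for illustration and is not part of BOINC or the CMS wrapper.

```python
import re
from datetime import timedelta

# Matches the message exactly as quoted in this thread (assumed wording).
CONDOR_ENDED = re.compile(r"\[ERROR\] Condor ended after (\d+) seconds")


def condor_runtime(log_text):
    """Return the runtime from the first 'Condor ended' line, or None if absent."""
    match = CONDOR_ENDED.search(log_text)
    if match is None:
        return None
    return timedelta(seconds=int(match.group(1)))


if __name__ == "__main__":
    line = "Guest Log: [ERROR] Condor ended after 57868 seconds"
    print(condor_runtime(line))  # prints 16:04:28
```

So 57868 seconds is just over 16 hours of wall-clock time.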
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,353,951
RAC: 10,309
Message 39893 - Posted: 10 Sep 2019, 15:59:32 UTC - in response to Message 39891.  

That appears to be a consequence of the queues draining more quickly than I anticipated last night. The task appears to have run several jobs, but then requested a new one when the queue was (nearly?) empty, and wasn't (properly?) allocated a new job. Eventually it timed out due to no reply.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 672
Credit: 5,353,951
RAC: 10,309
Message 39895 - Posted: 10 Sep 2019, 16:08:41 UTC

Sorry, an earlier message I meant to send to this thread appears to have gone missing. Perhaps I forgot to hit "Post". Longer jobs are on their way; we are tuning the parameters to try to get an acceptable ratio of run time to output file size. Let me know of any problems.
Erich56

Joined: 18 Dec 15
Posts: 1284
Credit: 23,085,766
RAC: 2,480
Message 41067 - Posted: 25 Dec 2019, 6:48:47 UTC

One of my tasks failed after some 6 hours with

1 (0x00000001) Unknown error code
...
2019-12-24 19:57:17 (17992): Guest Log: [ERROR] Condor ended after 20653 seconds

https://lhcathome.cern.ch/lhcathome/result.php?resultid=256603067

Any idea what could be the reason?
Aaron

Joined: 5 May 10
Posts: 1
Credit: 1,117,847
RAC: 0
Message 41423 - Posted: 29 Jan 2020, 20:17:06 UTC

I recently started contributing to this project again and am having the same issue with the unknown error code. The CMS app is the only app running on my CPU right now, and the error seems to happen frequently.
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 942
Credit: 40,272,697
RAC: 14,251
Message 41587 - Posted: 14 Feb 2020, 22:23:04 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=263294061

Well, I got about 10 of these time wasters so far today, with 2 still running on this PC and maybe 5 total still running on the 24 cores and 56 GB RAM
(and both tasks I had running over at -dev did the same thing).

I guess I won't get up early to do that again in the morning.
[AF>Le_Pommier] Jerome_C2005

Joined: 12 Jul 11
Posts: 43
Credit: 640,094
RAC: 0
Message 41698 - Posted: 23 Feb 2020, 11:31:28 UTC
Last modified: 23 Feb 2020, 11:31:43 UTC

Hi

Since CMS tasks are back, I tried to give them a go on my iMac: all the tasks are ending in error.

It is the "1 (0x00000001) Unknown error code".
NOGOOD

Joined: 18 Nov 17
Posts: 86
Credit: 29,298,399
RAC: 18,393
Message 41700 - Posted: 23 Feb 2020, 11:51:14 UTC - in response to Message 41698.  

Thank you for the information. So I'll keep running ATLAS and wait for stable CMS to come back.



©2020 CERN