Message boards : LHCb Application : Job not able to detect successful end

Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,470,586
RAC: 3,147
Message 28020 - Posted: 30 Nov 2016, 15:24:06 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=109039064

The job ran normally, but the task could not detect a successful end, although BOINC did not mark the task as a computation error:

BOINC message: LHC@home 30 Nov 16:06:36 Computation for task LHCb_6564_1480502539.648605_0 finished

Maybe Cinzia can look for a returned status for LHCb job 28329.

Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 28022 - Posted: 30 Nov 2016, 15:55:23 UTC - in response to Message 28020.  

There have been a few issues with the LHCb job submission today. We are working to resolve them.

Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 28024 - Posted: 30 Nov 2016, 22:28:15 UTC - in response to Message 28022.  

LHCb jobs should be working again.

Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,470,586
RAC: 3,147
Message 28038 - Posted: 2 Dec 2016, 7:10:13 UTC - in response to Message 28024.  

LHCb jobs should be working again.

This one was running fine: https://lhcathome.cern.ch/lhcathome/result.php?resultid=109059539

I don't yet fully understand the job processing setup, but at first glance it looks like the Theory simulation.
Running several jobs until 12 hours elapsed time is over, but each job seems to be able to run 10 sub-jobs.
I could not capture the VM logs, because the task was returned before I woke up.

Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 28043 - Posted: 2 Dec 2016, 12:36:38 UTC - in response to Message 28038.  


Running several jobs until 12 hours elapsed time is over, but each job seems to


This is correct. LHCb are using pilot jobs. We still need to make some improvements to the logging.

Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,470,586
RAC: 3,147
Message 28057 - Posted: 5 Dec 2016, 17:13:45 UTC - in response to Message 28043.  


Running several jobs until 12 hours elapsed time is over, but each job seems to

This is correct. LHCb are using pilot jobs. We still need to make some improvements to the logging.

Is there a special reason why an LHCb task is killed only after 36 hours, rather than after 18 hours like Theory?

Viking69

Joined: 24 Jul 05
Posts: 56
Credit: 5,602,899
RAC: 0
Message 28065 - Posted: 5 Dec 2016, 23:05:01 UTC

I was having a different issue.

My PC's screens would go blank, essentially without signal, and the PC would stop responding to any input. I had to hard power cycle the PC.
I did this 3 times before all of these tasks errored out.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=109168643

I was then able to finish the ATLAS units that were constantly waiting for memory when other WUs were in the queue. But that is a topic for the ATLAS pages.
Let's crunch for our future.

Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 28070 - Posted: 6 Dec 2016, 12:08:01 UTC - in response to Message 28057.  

The LHCb jobs can be longer.



©2024 CERN