Message boards : ATLAS application : Atlas task with 2700 jobs

Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,544,262
RAC: 15,572
Message 30337 - Posted: 14 May 2017, 19:41:41 UTC

Fortunately it was automatically aborted. It would have taken quite a long time to finish...

https://lhcathome.cern.ch/lhcathome/result.php?resultid=139705048
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 30347 - Posted: 15 May 2017, 7:47:08 UTC - in response to Message 30337.  

Hi,

I guess you are referring to this line:

No events to process: 2700 (skipEvents) >= 2000 (inputEvents of EVNT)

The 2700 is not a number of events, but an offset. Each EVNT file contains 5000 events, and each WU is told to read a certain sequence of events in the file. This is done by giving an offset and the number of events to process, e.g.:

WU1: offset: 0, nevents: 100
WU2: offset: 100, nevents: 100
...

In this case the offset was 2700, but for some reason the EVNT file only contained 2000 events, so the WU failed. We'll check why; it could be that some step in creating the EVNT file failed, which led to fewer events in the file.
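
The offset scheme described above can be sketched in a few lines of Python. This is only an illustration, not the actual ATLAS code: the function names are hypothetical, and it assumes offsets are zero-based and consecutive, so that a 100-event WU starting at offset 0 is followed by one starting at offset 100. The check mirrors the condition in the log line, where the job has nothing to process once skipEvents >= inputEvents.

```python
def work_unit_ranges(total_events, events_per_wu):
    """Yield (offset, nevents) pairs that cover the file in consecutive chunks.

    Hypothetical helper: assumes zero-based offsets with no gaps or overlaps.
    """
    for offset in range(0, total_events, events_per_wu):
        yield offset, min(events_per_wu, total_events - offset)


def has_events_to_process(skip_events, input_events):
    """Return False when the offset is at or beyond the end of the file."""
    return skip_events < input_events


# An EVNT file with the expected 5000 events, split into WUs of 100 events.
ranges = list(work_unit_ranges(5000, 100))
print(ranges[:2])

# The failing case from the log: offset 2700, but only 2000 events in the file.
print(has_events_to_process(2700, 2000))
```

With a complete 5000-event file, offset 2700 is a valid starting point; it only becomes invalid because the file was shorter than expected.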
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,544,262
RAC: 15,572
Message 30355 - Posted: 15 May 2017, 17:55:19 UTC - in response to Message 30347.  

Thank you for the reply. It's always nice to learn about the inner workings of the system.
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,544,262
RAC: 15,572
Message 30419 - Posted: 19 May 2017, 13:17:23 UTC

I got another one of these, this time with an offset of 3400: https://lhcathome.cern.ch/lhcathome/result.php?resultid=143049622
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,518,113
RAC: 122,917
Message 30420 - Posted: 19 May 2017, 13:45:47 UTC - in response to Message 30419.  

Besides David's explanation in this thread regarding the huge number of events, there are some other lines in your log that may point to causes of errors.

2017-05-19 12:55:54 (7160): Setting Memory Size for VM. (3400MB)
2017-05-19 12:55:54 (7160): Setting CPU Count for VM. (1)


Although a lot of WUs run with a RAM setting that follows David's formula, it is evidently not enough for some of them.
If your host has enough resources, you may increase the RAM setting to 4600-5000 MB.

A second issue may be caused by your VirtualBox version (5.0.12).
As Nils Høimyr mentioned here, you may upgrade to 5.1.x.
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,544,262
RAC: 15,572
Message 30422 - Posted: 19 May 2017, 14:15:52 UTC - in response to Message 30420.  

Thanks for the comments. Unfortunately, memory is a bit short (only 16 GByte, running 2 tasks at a time, 80% used) to give each task what you suggest, but I'll try to increase it to 3600 MB. Too bad that top doesn't work in the ATLAS VM.
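
A rough back-of-the-envelope check of that memory budget, as a sketch only. It assumes the reported 80% usage already includes both running tasks at the 3400 MB setting seen in the log, and that 16 GByte means 16 * 1024 MB; neither is confirmed in the thread, and the function name is hypothetical.

```python
TOTAL_MB = 16 * 1024   # 16 GByte host (assumed to mean 16384 MB)
USED_PERCENT = 80      # reported usage, assumed to include both running tasks
TASKS = 2              # ATLAS tasks running at the same time
CURRENT_VM_MB = 3400   # from the log: "Setting Memory Size for VM. (3400MB)"


def headroom_after(new_vm_mb):
    """Free memory (MB) left if every task's VM is raised to new_vm_mb."""
    free_mb = TOTAL_MB - TOTAL_MB * USED_PERCENT // 100
    extra_mb = TASKS * (new_vm_mb - CURRENT_VM_MB)
    return free_mb - extra_mb


for setting in (3600, 4600, 5000):
    print(setting, headroom_after(setting))
```

Under these assumptions, 3600 MB per task leaves comfortable headroom, while the suggested 4600-5000 MB range would leave less than 1 GByte free for everything else on the host.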

As for updating to a newer VirtualBox version, I don't see the need (I'm running on Windows 7). I don't want to update just because a newer version is available. Besides, isn't my version the same one that still comes with the BOINC download?
