Message boards : ATLAS application : Atlas task with 2700 jobs

Harri Liljeroos
Joined: 28 Sep 04
Posts: 267
Credit: 8,972,323
RAC: 12,280
Message 30337 - Posted: 14 May 2017, 19:41:41 UTC

Fortunately it was automatically aborted. It would have taken quite a long time to finish...

https://lhcathome.cern.ch/lhcathome/result.php?resultid=139705048
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 179
Credit: 4,177,605
RAC: 6,347
Message 30347 - Posted: 15 May 2017, 7:47:08 UTC - in response to Message 30337.  

Hi,

I guess you are referring to this line:

No events to process: 2700 (skipEvents) >= 2000 (inputEvents of EVNT)

The 2700 is not a number of events but an offset. Each EVNT file contains 5000 events, and each WU is told to read a specific range of events in the file. This is done by giving an offset and a number of events to process, e.g.:

WU1: offset: 0, nevents: 100
WU2: offset: 100, nevents: 100
...

In this case the offset was 2700, but for some reason the EVNT file contained only 2000 events, so the WU failed. We'll check why; it could be that some step in creating the EVNT file failed, leaving fewer events in the file.
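A minimal sketch of the offset scheme described above, assuming contiguous, non-overlapping ranges of a fixed size per WU (each offset is the previous offset plus the previous event count). The names `WorkUnit`, `split_into_wus` and `check_range` are hypothetical, and the error text only mimics the log line quoted above; this is not the actual ATLAS code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkUnit:
    skip_events: int  # offset: events to skip at the start of the EVNT file
    n_events: int     # number of events this WU should process


def split_into_wus(total_events: int, events_per_wu: int) -> list[WorkUnit]:
    """Cut a file of total_events into contiguous, non-overlapping ranges."""
    return [
        WorkUnit(skip_events=start, n_events=min(events_per_wu, total_events - start))
        for start in range(0, total_events, events_per_wu)
    ]


def check_range(wu: WorkUnit, input_events: int) -> str | None:
    """Return an error message if the offset lies at or beyond the end of the file."""
    if wu.skip_events >= input_events:
        return (f"No events to process: {wu.skip_events} (skipEvents) "
                f">= {input_events} (inputEvents of EVNT)")
    return None


# WUs are planned assuming the nominal 5000 events per EVNT file.
wus = split_into_wus(total_events=5000, events_per_wu=100)
print(wus[0], wus[1])  # offsets 0 and 100

# If the file actually holds only 2000 events, the WU with offset 2700 fails...
print(check_range(WorkUnit(skip_events=2700, n_events=100), input_events=2000))

# ...and so does every other planned WU whose offset is 2000 or more.
failing = [w for w in wus if check_range(w, input_events=2000)]
print(len(failing))  # 30 of the 50 planned WUs
```

Under these assumptions a short EVNT file does not break a single WU but every WU whose range starts past the real end of the file.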
Harri Liljeroos
Joined: 28 Sep 04
Posts: 267
Credit: 8,972,323
RAC: 12,280
Message 30355 - Posted: 15 May 2017, 17:55:19 UTC - in response to Message 30347.  

Thank you for the reply. It's always nice to learn about the inner workings of the system.
Harri Liljeroos
Joined: 28 Sep 04
Posts: 267
Credit: 8,972,323
RAC: 12,280
Message 30419 - Posted: 19 May 2017, 13:17:23 UTC

I got another one of these: https://lhcathome.cern.ch/lhcathome/result.php?resultid=143049622 (the offset was 3400).
computezrmle
Joined: 15 Jun 08
Posts: 608
Credit: 6,508,373
RAC: 15,362
Message 30420 - Posted: 19 May 2017, 13:45:47 UTC - in response to Message 30419.  

Besides David's explanation in this thread regarding the huge number of events, there are some other lines in your log that point to possible causes of errors.

2017-05-19 12:55:54 (7160): Setting Memory Size for VM. (3400MB)
2017-05-19 12:55:54 (7160): Setting CPU Count for VM. (1)


Although many WUs run fine with a RAM setting that follows David's formula, it is evidently not enough for some of them.
If your host has enough resources, you may increase the RAM setting to 4600-5000 MB.
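A rough way to apply this advice: read the memory size from the vboxwrapper log line quoted above and check whether a larger setting would still fit on the host. The regular expression only matches the line format shown in this log, and the `usable_fraction` parameter (the share of host RAM you are willing to give to the VMs) is a hypothetical assumption, not a BOINC or VirtualBox setting.

```python
from __future__ import annotations

import re

# Matches lines such as "Setting Memory Size for VM. (3400MB)"
MEM_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")


def vm_memory_mb(log_text: str) -> int | None:
    """Return the VM memory size (MB) reported in the log, or None if absent."""
    m = MEM_RE.search(log_text)
    return int(m.group(1)) if m else None


def fits(host_mb: int, tasks: int, per_task_mb: int, usable_fraction: float) -> bool:
    """True if `tasks` VMs of `per_task_mb` each fit in the assumed usable share of host RAM."""
    return tasks * per_task_mb <= host_mb * usable_fraction


log = "2017-05-19 12:55:54 (7160): Setting Memory Size for VM. (3400MB)"
print(vm_memory_mb(log))  # 3400

host_mb = 16 * 1024  # e.g. a 16 GB host running two tasks
for per_task in (3400, 3600, 4600, 5000):
    # Assumption: only about half of the host RAM is available for the VMs.
    print(per_task, fits(host_mb, tasks=2, per_task_mb=per_task, usable_fraction=0.5))
# With these assumptions, 3400 and 3600 MB fit; 4600 and 5000 MB do not.
```

The right `usable_fraction` depends entirely on what else the host is running, so treat the result as a back-of-the-envelope check only.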

A second issue may be caused by your VirtualBox version (5.0.12).
As Nils Høimyr mentioned here, you may want to upgrade to 5.1.x.
Harri Liljeroos
Joined: 28 Sep 04
Posts: 267
Credit: 8,972,323
RAC: 12,280
Message 30422 - Posted: 19 May 2017, 14:15:52 UTC - in response to Message 30420.  

Thanks for the comments. Unfortunately, memory is a bit short (only 16 GB, running 2 tasks at a time, 80% used) to give each task what you suggest; I'll try increasing it to 3600 MB. Too bad that top doesn't work in the ATLAS VM.

As for updating to a newer VirtualBox, I don't see the need (I'm running Windows 7). I don't want to update just because a newer version is available. Besides, isn't my version the same one that BOINC is still bundled with?



©2018 CERN