
Atlas task with 2700 jobs



Message boards : ATLAS application : Atlas task with 2700 jobs

Harri Liljeroos
Joined: 28 Sep 04
Posts: 205
Credit: 6,174,208
RAC: 2,706
Message 30337 - Posted: 14 May 2017, 19:41:41 UTC

Fortunately it was automatically aborted. It would have taken quite a long time to finish...

https://lhcathome.cern.ch/lhcathome/result.php?resultid=139705048

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 139
Credit: 3,159,531
RAC: 6,484
Message 30347 - Posted: 15 May 2017, 7:47:08 UTC - in response to Message 30337.

Hi,

I guess you are referring to this line:

No events to process: 2700 (skipEvents) >= 2000 (inputEvents of EVNT)

The 2700 is not a number of events, but an offset. Each EVNT file contains 5000 events, and each WU is told to read a certain sequence of events in the file. This is done by giving an offset and a number of events to process, e.g.:

WU1: offset: 0, nevents: 100
WU2: offset: 100, nevents: 100
...

In this case the offset was 2700, but for some reason the EVNT file only contained 2000 events, so the WU failed. We'll check why; it could be that some part of creating the EVNT file failed, which led to fewer events in the file.
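To make the offset logic concrete, here is a minimal Python sketch of the bookkeeping described above. It is not the actual ATLAS code: the function names are hypothetical, and it assumes zero-based offsets with consecutive, non-overlapping ranges of 100 events each. The error text simply mirrors the log line quoted earlier.

```python
def work_unit_ranges(events_per_wu, n_work_units):
    """Yield (offset, nevents) pairs that tile a file without gaps or overlap.

    Assumption: offsets are zero-based, so WU i starts at i * events_per_wu.
    """
    for i in range(n_work_units):
        yield i * events_per_wu, events_per_wu


def check_work_unit(skip_events, n_events, input_events):
    """Return the half-open range [start, stop) of events the WU would read.

    Raises ValueError when the offset lies at or beyond the end of the file,
    which is the condition reported in the failed task's log.
    """
    if skip_events >= input_events:
        raise ValueError(
            f"No events to process: {skip_events} (skipEvents) "
            f">= {input_events} (inputEvents of EVNT)"
        )
    # Clamp the end of the range so a short file never yields phantom events.
    return skip_events, min(skip_events + n_events, input_events)
```

With a full 5000-event file, `check_work_unit(2700, 100, 5000)` gives the range 2700 to 2800. With the truncated 2000-event file, the same call raises the error, because the offset already points past the last event.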

Harri Liljeroos
Joined: 28 Sep 04
Posts: 205
Credit: 6,174,208
RAC: 2,706
Message 30355 - Posted: 15 May 2017, 17:55:19 UTC - in response to Message 30347.

Thank you for the reply. It's always nice to learn about the inner workings of the system.

Harri Liljeroos
Joined: 28 Sep 04
Posts: 205
Credit: 6,174,208
RAC: 2,706
Message 30419 - Posted: 19 May 2017, 13:17:23 UTC

I got another one of these: https://lhcathome.cern.ch/lhcathome/result.php?resultid=143049622
This time the offset was 3400.

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,501,271
RAC: 1,830
Message 30420 - Posted: 19 May 2017, 13:45:47 UTC - in response to Message 30419.

Besides what David explained in this thread about the apparently huge number of events, there are some other lines in your log that may point to causes of errors:

2017-05-19 12:55:54 (7160): Setting Memory Size for VM. (3400MB)
2017-05-19 12:55:54 (7160): Setting CPU Count for VM. (1)


Although a lot of WUs run with a RAM setting that follows David's formula, it is obviously not enough for some of them.
If your host has enough resources, you may increase the RAM setting to 4600-5000 MB.

A second issue may be caused by your VirtualBox version (5.0.12).
As Nils Høimyr mentioned here, you may upgrade to 5.1.x.

Harri Liljeroos
Joined: 28 Sep 04
Posts: 205
Credit: 6,174,208
RAC: 2,706
Message 30422 - Posted: 19 May 2017, 14:15:52 UTC - in response to Message 30420.

Thanks for the comments. Unfortunately, memory is a bit too short (only 16 GByte, running 2 tasks at a time, 80% used) to give each task what you suggest. I'll try to increase it to 3600 MB. Too bad that top doesn't work in the ATLAS VM.
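As a rough check of that memory budget, here is a small Python sketch. It rests on several assumptions that are not confirmed in this thread: "16 GByte" is read as 16 x 1024 MB, "80% used" is read as 80% of RAM being in use while both tasks run at the current 3400 MB setting, and each VM is assumed to actually consume the extra memory it is given. The function names are made up for illustration.

```python
TOTAL_MB = 16 * 1024   # assumption: "16 GByte" read as 16 x 1024 MB
USED_PERCENT = 80      # assumption: share of RAM in use with both tasks running
TASKS = 2              # concurrent ATLAS tasks, as stated in the post
CURRENT_VM_MB = 3400   # from the log: "Setting Memory Size for VM. (3400MB)"


def extra_needed_mb(new_vm_mb, current_vm_mb=CURRENT_VM_MB, tasks=TASKS):
    """Additional RAM needed in total when every VM is raised to new_vm_mb."""
    return (new_vm_mb - current_vm_mb) * tasks


def headroom_mb(total_mb=TOTAL_MB, used_percent=USED_PERCENT):
    """RAM currently unused, in whole MB (integer arithmetic, rounded down)."""
    return total_mb * (100 - used_percent) // 100


if __name__ == "__main__":
    for target in (3600, 4600, 5000):
        extra = extra_needed_mb(target)
        left = headroom_mb() - extra
        print(f"{target} MB per VM: needs {extra} MB more, leaves {left} MB free")
```

Under these assumptions, 3600 MB per task fits comfortably, while 4600-5000 MB per task would leave well under 1 GB free for everything else on the host.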

As for updating VirtualBox, I don't see the need (I'm running on Windows 7). I don't want to update just because a newer version is available. Besides, isn't my version the same one that still comes bundled with the BOINC download?