Atlas tasks are failing, server status show also 0 tasks ready to send


Message boards : ATLAS application : Atlas tasks are failing, server status show also 0 tasks ready to send

Harri Liljeroos
Joined: 28 Sep 04
Posts: 189
Credit: 6,003,691
RAC: 4,402
Message 32001 - Posted: 20 Aug 2017, 8:05:02 UTC

My two latest tasks failed and show this error:

WARNING Transform now exiting early with exit code 15 (No events to process: 4400 (skipEvents) >= 4400 (inputEvents of EVNT)

The server status page shows that the ready-to-send queue is empty. The same goes for SixTrack. So it is time to select another subproject.
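
For anyone puzzled by the warning: it appears to say that the job was told to skip as many events as the input file contains, so nothing is left to process. A minimal Python sketch of that check follows. The function name is hypothetical and this is not the actual ATLAS transform code, only an illustration of the comparison quoted in the message.

```python
def events_to_process(input_events: int, skip_events: int) -> int:
    """Return how many events remain after skipping; 0 means nothing to do.

    Hypothetical helper that mirrors the comparison in the warning text.
    """
    return max(0, input_events - skip_events)


# Values taken from the warning quoted above.
remaining = events_to_process(input_events=4400, skip_events=4400)
print(remaining)
```

A result of 0 corresponds to the "No events to process" case, which would point to a problem with the task itself rather than with the host.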
____________

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 124
Credit: 2,875,749
RAC: 10,318
Message 32020 - Posted: 21 Aug 2017, 7:33:41 UTC - in response to Message 32001.

Indeed, we ran out of tasks over the weekend. I have asked for more to be submitted.

rbpeake
Joined: 17 Sep 04
Posts: 55
Credit: 15,620,725
RAC: 1,342
Message 32025 - Posted: 21 Aug 2017, 21:07:29 UTC - in response to Message 32020.

Seems like a lot of validate errors.
____________
Regards,
Bob P.

Harri Liljeroos
Joined: 28 Sep 04
Posts: 189
Credit: 6,003,691
RAC: 4,402
Message 32037 - Posted: 22 Aug 2017, 6:49:02 UTC

Server status page shows new Atlas tasks are available.
____________

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,399,908
RAC: 3,711
Message 32044 - Posted: 22 Aug 2017, 10:26:59 UTC

Rather short for a WU with 198 MB initial download, isn't it?
Usual walltimes on this host are >5 h.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=153867426

maeax
Joined: 2 May 07
Posts: 182
Credit: 11,301,914
RAC: 11,411
Message 32048 - Posted: 22 Aug 2017, 18:57:30 UTC

The server shows ZERO tasks to send.

tullio
Joined: 19 Feb 08
Posts: 419
Credit: 2,049,855
RAC: 336
Message 32080 - Posted: 24 Aug 2017, 14:07:16 UTC
Last modified: 24 Aug 2017, 14:08:14 UTC

After a failed attempt to raise the VirtualMachine memory via the VirtualBox Manager, which resulted in an extremely long computing time and a final failure, I have crunched three Atlas tasks on my Linux laptop with an E-450 AMD CPU. They all finished in time and are validated. They are running alongside a climateprediction.net task with a very extended deadline (one year) and they stop momentarily waiting for memory, then resume crunching. I am glad to be able to run at least some LHC@home tasks, while my SUN WS on Linux and Windows 10 PC are crunching Einstein@home and SETI@home tasks, both CPU and GPU.
Tullio

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,399,908
RAC: 3,711
Message 32082 - Posted: 24 Aug 2017, 15:24:55 UTC - in response to Message 32080.

Unfortunately, your ATLAS WUs don't deliver valid scientific results. They are only rewarded for CPU time, and that's the reason why they are marked as "valid" in the task list.

You may look into your error logs, e.g. https://lhcathome.cern.ch/lhcathome/result.php?resultid=153646199.
There you find:

2017-08-23 23:19:09 (30418): Setting Memory Size for VM. (4200MB)
2017-08-23 23:19:10 (30418): Setting CPU Count for VM. (2)

Setting #CPUs to 2 is not recommended on this host, especially if there is any other project that runs concurrently.
You may run ATLAS on a 1-core setting and suspend all other BOINC tasks while it is executed.

Setting RAM size to only 4200 MB is also not recommended as it leads to the following errors:
2017-08-24 00:07:21 (30418): Guest Log: PyJobTransforms.transform.execute 2017-08-24 00:03:19,485 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"
2017-08-24 00:07:21 (30418): Guest Log: PyJobTransforms.transform.execute 2017-08-24 00:03:24,036 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider")

Even on a 1-core setting, a RAM size of 4200 MB may not be enough for the current batch.

What is also suspicious:
- the very short runtimes of your WUs
- a result file named HITS.* is missing in the log
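
The checks described in this post can be sketched in a few lines of Python. This is only an illustration that assumes the stderr output has been saved as plain text; the function name and dictionary keys are hypothetical, and the patterns are taken from the log lines quoted above.

```python
import re

# Patterns copied from the log lines quoted in this post.
MEMORY_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")
CPU_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")
# A result file name such as HITS.<something>; "EVNTtoHITS" must not match.
HITS_RE = re.compile(r"\bHITS\.\S+")


def summarize_stderr(text: str) -> dict:
    """Collect the VM settings and failure hints from a task's stderr text."""
    memory = MEMORY_RE.search(text)
    cpus = CPU_RE.search(text)
    return {
        "memory_mb": int(memory.group(1)) if memory else None,
        "cpu_count": int(cpus.group(1)) if cpus else None,
        "makepool_failed": "makePool failed" in text,
        "hits_file_mentioned": HITS_RE.search(text) is not None,
    }


sample = """\
2017-08-23 23:19:09 (30418): Setting Memory Size for VM. (4200MB)
2017-08-23 23:19:10 (30418): Setting CPU Count for VM. (2)
2017-08-24 00:07:21 (30418): Guest Log: PyJobTransforms.transform.execute 2017-08-24 00:03:19,485 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"
"""
print(summarize_stderr(sample))
```

A log that reports `makepool_failed` as true and never mentions a HITS.* file would match the failure pattern described above.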

tullio
Joined: 19 Feb 08
Posts: 419
Credit: 2,049,855
RAC: 336
Message 32083 - Posted: 24 Aug 2017, 16:21:43 UTC - in response to Message 32082.

OK, I've limited the number of cores to 1 and I shall watch what happens. But I have only 8 GB RAM on that PC (the mobo allows only that), and I cannot starve other projects.
Tullio

tullio
Joined: 19 Feb 08
Posts: 419
Credit: 2,049,855
RAC: 336
Message 32084 - Posted: 24 Aug 2017, 21:32:46 UTC

One core task started, looks OK, using 3400 MB.
Tullio

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,399,908
RAC: 3,711
Message 32086 - Posted: 25 Aug 2017, 7:08:48 UTC - in response to Message 32084.

@ tullio

Compared to other hosts and the current ATLAS batch I would expect completion times between 12 h and 15 h for your E-450.
Since you wrote your last post it reported a couple of ATLAS WUs with completion times of less than 1 h.
Although all of them are rewarded they still don't deliver what you probably expect.

If you would like to spend the time on a test, you may raise the RAM setting for a 1-core ATLAS VM to 5000 MB and start only this VM.
No other BOINC app should run or even be left in RAM during the test.

The critical phase is shortly after the start, when the VM extracts the EVNT.* file.
You may check the stderr.txt in the slots dir of the running VM for the messages in my previous post.

If this test ends successfully you may repeat it with another project (no vbox project) running concurrently and/or with a slightly reduced RAM setting for your VM until the errors occur again.

If the test doesn't end successfully, your E-450 is too weak to run ATLAS (and probably also CMS and LHCb).
Then you may repeat the test with Theory Simulation at its default RAM setting.

Hope you'll get a success.

tullio
Joined: 19 Feb 08
Posts: 419
Credit: 2,049,855
RAC: 336
Message 32098 - Posted: 26 Aug 2017, 6:55:52 UTC - in response to Message 32086.

In the error log of the last failed task I found a phrase about an extension pack not installed. Mea culpa, mea culpa, mea maxima culpa. I installed it and now I shall watch the next task when the laptop finishes two SETI@home tasks I downloaded to keep it crunching something. Thanks anyway for your suggestion, I get very little help from anyone else.
Tullio

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,287,504
RAC: 112,390
Message 32103 - Posted: 26 Aug 2017, 8:04:00 UTC

tullio, Yeti's checklist is good if you have issues:

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161#29359


This one failed with too much disk usage. You can adjust the disk setting for BOINC to 8 GB in the settings.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=153909057

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,399,908
RAC: 3,711
Message 32105 - Posted: 26 Aug 2017, 8:18:30 UTC - in response to Message 32103.

Toby Broom wrote:
tullio, Yeti's checklist is good if you have issues:

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161#29359

This one failed with too much disk usage. You can adjust the disk setting for BOINC to 8 GB in the settings.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=153909057

But besides that, the processing time and the error log look very good.
It could have been a success.
Nonetheless, I would allocate more RAM than the configured 3400 MB.

tullio
Joined: 19 Feb 08
Posts: 419
Credit: 2,049,855
RAC: 336
Message 32111 - Posted: 26 Aug 2017, 12:41:58 UTC - in response to Message 32103.

Toby, I have 14.30 GB available to BOINC on the HP Laptop with a 1 TB hybrid disk, by Seagate if I remember. But after LHC "consolidation" all LHC tasks fail on all PCs, two Linux and one Windows, save SixTrack. I am testing Atlas@home on the slowest machine, the HP laptop with AMD E-450 CPU while the two other PCs run SETI@home and Einstein@home CPU and GPU tasks with no problem. The Windows 10 PC, updated every month by Microsoft, has 22 GB RAM and 1 TB disk. The SUN WS, my oldest machine, has 1 TB disk, 8 GB RAM and a GTX 750 Ti GPU board. The Windows PC has a GTX 1050 Ti GPU board, with Pascal microprocessor and 640 GPU cores. Maybe CERN should start thinking about GPUs, now that they have taken a role in SKA array computing, according to CERN Courier.
Tullio

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,287,504
RAC: 112,390
Message 32119 - Posted: 26 Aug 2017, 17:22:38 UTC - in response to Message 32105.
Last modified: 26 Aug 2017, 17:26:55 UTC

I think the 3400 MB is what is visible in the BOINC UI? When you created the app_config.xml, it overrides the memory used by the VM to 5000 MB(?). The UI, however, isn't updated, hence the difference.

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,287,504
RAC: 112,390
Message 32120 - Posted: 26 Aug 2017, 17:25:41 UTC - in response to Message 32111.
Last modified: 26 Aug 2017, 17:27:29 UTC

There was some talk of a SixTrack app for GPU, but I think they will target AVX first as it's simpler to do.

I feel like it will be a long time for GPU; it's not so simple to run Fortran on a GPU.

The other projects will never use the GPU, as the VM is there to make it easy for the scientists, not easy for us ;). If it wasn't easy for them, then the projects wouldn't exist.

I just set my disk limit to 250 GB. It never uses that much, and I'm sure I would notice before it actually uses it.

tullio
Joined: 19 Feb 08
Posts: 419
Credit: 2,049,855
RAC: 336
Message 32149 - Posted: 31 Aug 2017, 6:19:16 UTC - in response to Message 32119.

I've written the app_config.xml file as suggested which should bring the memory to 5000 MB. But Atlas tasks still start at 3400 MB. This on the SUN WS, a Linux box.
Tullio

Jim1348
Joined: 15 Nov 14
Posts: 71
Credit: 3,033,536
RAC: 10,549
Message 32151 - Posted: 31 Aug 2017, 6:56:50 UTC - in response to Message 32149.
Last modified: 31 Aug 2017, 6:57:35 UTC

tullio wrote:
I've written the app_config.xml file as suggested which should bring the memory to 5000 MB. But Atlas tasks still start at 3400 MB. This on the SUN WS, a Linux box.
Tullio

That is OK. The app_config.xml just sets the maximum amount of memory that can be used. Setting it to 5000 MB fixed the problem for me, even though the Atlas tasks still show as 3400 MB.
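
For readers who have not seen such a file, here is a rough Python sketch that generates an app_config.xml of the kind discussed here. Treat every value as an assumption to verify on your own host: the app name `ATLAS`, the plan class `vbox64_mt_mcore_atlas` and the vboxwrapper options `--nthreads` and `--memory_size_mb` are not taken from this thread, and the function name is hypothetical. The file would then be saved in the LHC@home project directory of the BOINC data folder.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, ncpus: int, memory_mb: int) -> str:
    """Return app_config.xml text for one app version.

    Hypothetical helper; app_name and plan_class must match what the
    BOINC client reports for the project on your host.
    """
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # Assumption: options passed through to vboxwrapper on the command line.
    ET.SubElement(version, "cmdline").text = (
        f"--nthreads {ncpus} --memory_size_mb {memory_mb}"
    )
    ET.indent(root)  # pretty-print, one tag per line (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


# Assumed example values: 1 core and 5000 MB, as suggested in this thread.
print(build_app_config("ATLAS", "vbox64_mt_mcore_atlas", ncpus=1, memory_mb=5000))
```

After saving the file, the client has to re-read its config files (or be restarted) before new tasks pick up the values.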

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,399,908
RAC: 3,711
Message 32152 - Posted: 31 Aug 2017, 7:01:05 UTC - in response to Message 32149.

tullio wrote:
I've written the app_config.xml file as suggested which should bring the memory to 5000 MB. But Atlas tasks still start at 3400 MB. This on the SUN WS, a Linux box.
Tullio

Do you have a message like the following in your BOINC client's logfile?
Do 31 Aug 2017 08:49:44 CEST | LHC@home | Found app_config.xml

If not, reload your config files (e.g. via client menu -> options) or restart the client/computer.
The changed settings don't affect WUs that are already running, i.e. reside in a "slots" dir.

You may also post your app_config.xml here.
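
If the client log is long, the check for that message can be scripted. The following Python sketch assumes the log has been saved as plain text with fields separated by `|`, as in the line quoted above; the function name is hypothetical.

```python
def found_app_config(log_text: str, project: str = "LHC@home") -> bool:
    """Return True if the log reports 'Found app_config.xml' for the project."""
    for line in log_text.splitlines():
        # Expected layout: "<timestamp> | <project> | <message>"
        parts = [part.strip() for part in line.split("|")]
        if (
            len(parts) >= 3
            and parts[1] == project
            and parts[2].startswith("Found app_config.xml")
        ):
            return True
    return False


sample = "Do 31 Aug 2017 08:49:44 CEST | LHC@home | Found app_config.xml"
print(found_app_config(sample))
```

If this returns False for the current log, the client has not picked up the file yet, which would fit the reload or restart advice above.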
