Message boards : ATLAS application : ATLAS tasks are failing, server status also shows 0 tasks ready to send

Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,150,303
RAC: 15,977
Message 32001 - Posted: 20 Aug 2017, 8:05:02 UTC

My two latest tasks failed and show this error:

WARNING Transform now exiting early with exit code 15 (No events to process: 4400 (skipEvents) >= 4400 (inputEvents of EVNT)

The server status page shows that the ready-to-send queue is empty. The same goes for SixTrack. So it's time to select another subproject.
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32020 - Posted: 21 Aug 2017, 7:33:41 UTC - in response to Message 32001.  

Indeed, we ran out of tasks over the weekend; I have asked for more to be submitted.
rbpeake

Joined: 17 Sep 04
Posts: 99
Credit: 30,618,118
RAC: 3,938
Message 32025 - Posted: 21 Aug 2017, 21:07:29 UTC - in response to Message 32020.  

Seems like a lot of validate errors.
Regards,
Bob P.
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,150,303
RAC: 15,977
Message 32037 - Posted: 22 Aug 2017, 6:49:02 UTC

The server status page shows that new ATLAS tasks are available.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32044 - Posted: 22 Aug 2017, 10:26:59 UTC

Rather short for a WU with 198 MB initial download, isn't it?
Usual walltimes on this host are >5 h.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=153867426
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,087,008
RAC: 104,209
Message 32048 - Posted: 22 Aug 2017, 18:57:30 UTC

The server shows ZERO tasks ready to send.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32080 - Posted: 24 Aug 2017, 14:07:16 UTC
Last modified: 24 Aug 2017, 14:08:14 UTC

After a failed attempt to raise the virtual machine memory via the VirtualBox Manager, which resulted in an extremely long computing time and a final failure, I have crunched three ATLAS tasks on my Linux laptop with an AMD E-450 CPU. They all finished in time and are validated. They run alongside a climateprediction.net task with a very extended deadline (one year); they stop momentarily while waiting for memory, then resume crunching. I am glad to be able to run at least some LHC@home tasks, while my SUN WS on Linux and my Windows 10 PC are crunching Einstein@home and SETI@home tasks, both CPU and GPU.
Tullio
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32082 - Posted: 24 Aug 2017, 15:24:55 UTC - in response to Message 32080.  

Unfortunately your ATLAS WUs don't deliver valid scientific results; they are only rewarded for CPU time, which is why they are marked as "valid" in the task list.

You may look into your error logs, e.g. https://lhcathome.cern.ch/lhcathome/result.php?resultid=153646199.
There you will find:
2017-08-23 23:19:09 (30418): Setting Memory Size for VM. (4200MB)
2017-08-23 23:19:10 (30418): Setting CPU Count for VM. (2)

Setting #CPUs to 2 is not recommended on this host, especially if any other project runs concurrently.
You may run ATLAS with a 1-core setting and suspend all other BOINC tasks while it is executed.

Setting the RAM size to only 4200 MB is also not recommended, as it leads to the following errors:
2017-08-24 00:07:21 (30418): Guest Log: PyJobTransforms.transform.execute 2017-08-24 00:03:19,485 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"
2017-08-24 00:07:21 (30418): Guest Log: PyJobTransforms.transform.execute 2017-08-24 00:03:24,036 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider")

Even on a 1-core setting, a RAM size of 4200 MB may not be enough for the current batch.

What is also suspicious:
- the very short runtimes of your WUs
- a result file named HITS.* is missing from the log
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32083 - Posted: 24 Aug 2017, 16:21:43 UTC - in response to Message 32082.  

OK, I've limited the number of cores to 1 and I shall watch what happens. But I have only 8 GB RAM on that PC, the motherboard allows no more, and I cannot starve other projects.
Tullio
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32084 - Posted: 24 Aug 2017, 21:32:46 UTC

A one-core task started; it looks OK, using 3400 MB.
Tullio
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32086 - Posted: 25 Aug 2017, 7:08:48 UTC - in response to Message 32084.  

@ tullio

Compared to other hosts and the current ATLAS batch I would expect completion times between 12 h and 15 h for your E-450.
Since you wrote your last post, your host has reported a couple of ATLAS WUs with completion times of less than 1 h.
Although all of them were rewarded, they still don't deliver what you probably expect.

If you'd like to spend the time on a test, you may raise the RAM setting for a 1-core ATLAS VM to 5000 MB and start only this VM.
No other BOINC app should run or even be left in RAM during the test.

The critical phase is shortly after the start, when the VM extracts the EVNT.* file.
You may check the stderr.txt in the slots dir of the running VM for the messages in my previous post.

If this test ends successfully, you may repeat it with another project (not a vbox project) running concurrently and/or with a slightly reduced RAM setting for your VM until the errors occur again.

If the test doesn't end successfully, your E-450 is too weak to run ATLAS (and probably also CMS and LHCb).
Then you may repeat the test with Theory Simulation at its default RAM setting.

I hope you'll succeed.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32098 - Posted: 26 Aug 2017, 6:55:52 UTC - in response to Message 32086.  

In the error log of the last failed task I found a phrase about an extension pack not being installed. Mea culpa, mea culpa, mea maxima culpa. I installed it and now I shall watch the next task, once the laptop finishes two SETI@home tasks I downloaded to keep it crunching something. Thanks anyway for your suggestions; I get very little help from anyone else.
Tullio
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,698,620
RAC: 234,880
Message 32103 - Posted: 26 Aug 2017, 8:04:00 UTC

Tullio, Yeti's checklist is good if you have issues:

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161#29359


This one failed with too much disk usage; you can adjust the disk setting for BOINC to 8 GB in the settings.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=153909057
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32105 - Posted: 26 Aug 2017, 8:18:30 UTC - in response to Message 32103.  

Toby Broom wrote:
Tullio, Yeti's checklist is good if you have issues:

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161#29359

This one failed with too much disk usage; you can adjust the disk setting for BOINC to 8 GB in the settings.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=153909057

But besides that, the processing time and the error log look very good.
It could have been a success.
Nonetheless, I would allow more RAM than the configured 3400 MB.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32111 - Posted: 26 Aug 2017, 12:41:58 UTC - in response to Message 32103.  

Toby, I have 14.30 GB available to BOINC on the HP laptop with a 1 TB hybrid disk, by Seagate if I remember correctly. But after the LHC "consolidation" all LHC tasks fail on all my PCs, two Linux and one Windows, save SixTrack. I am testing ATLAS@Home on the slowest machine, the HP laptop with an AMD E-450 CPU, while the two other PCs run SETI@home and Einstein@home CPU and GPU tasks with no problem. The Windows 10 PC, updated every month by Microsoft, has 22 GB RAM and a 1 TB disk. The SUN WS, my oldest machine, has a 1 TB disk, 8 GB RAM and a GTX 750 Ti GPU board. The Windows PC has a GTX 1050 Ti GPU board, with a Pascal GPU and 640 GPU cores. Maybe CERN should start thinking about GPUs, now that they have taken a role in SKA array computing, according to CERN Courier.
Tullio
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,698,620
RAC: 234,880
Message 32119 - Posted: 26 Aug 2017, 17:22:38 UTC - in response to Message 32105.  
Last modified: 26 Aug 2017, 17:26:55 UTC

I think the 3400 MB is what is visible in the BOINC UI. When you made the app_config, it overrides the memory used by the VM to 5000 MB(?). The UI, however, isn't updated, so there is the difference.
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,698,620
RAC: 234,880
Message 32120 - Posted: 26 Aug 2017, 17:25:41 UTC - in response to Message 32111.  
Last modified: 26 Aug 2017, 17:27:29 UTC

There was some talk of a SixTrack app for GPU, but I think they will target AVX first, as it's simpler to do.

I feel like it will be a long time before we see a GPU app; it's not so simple to run Fortran on a GPU.

The other projects will never use GPUs, as the VM is there to make things easy for the scientists, not easy for us ;). If it wasn't easy for them, then they wouldn't exist.

I just set mine to 250 GB; it never uses that much, and I'm sure I would notice before it actually used it.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32149 - Posted: 31 Aug 2017, 6:19:16 UTC - in response to Message 32119.  

I've written the app_config.xml file as suggested, which should bring the memory up to 5000 MB. But ATLAS tasks still start at 3400 MB. This is on the SUN WS, a Linux box.
Tullio
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 32151 - Posted: 31 Aug 2017, 6:56:50 UTC - in response to Message 32149.  
Last modified: 31 Aug 2017, 6:57:35 UTC

tullio wrote:
I've written the app_config.xml file as suggested, which should bring the memory up to 5000 MB. But ATLAS tasks still start at 3400 MB. This is on the SUN WS, a Linux box.
Tullio

That is OK; the app_config.xml just sets the maximum amount of memory that can be used. But setting it to 5000 MB fixed the problem for me, even though the ATLAS tasks still show 3400 MB.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32152 - Posted: 31 Aug 2017, 7:01:05 UTC - in response to Message 32149.  

tullio wrote:
I've written the app_config.xml file as suggested, which should bring the memory up to 5000 MB. But ATLAS tasks still start at 3400 MB. This is on the SUN WS, a Linux box.
Tullio

Do you have a message like the following in your BOINC client's logfile?
Do 31 Aug 2017 08:49:44 CEST | LHC@home | Found app_config.xml

If not, reload your config files (e.g. via client menu -> options) or restart the client/computer.
The changed settings don't affect WUs that are already running, i.e. those that already reside in a "slots" dir.

You may also post your app_config.xml here.
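
For reference, a minimal app_config.xml along the lines discussed in this thread could look like the sketch below. It belongs in the LHC@home project directory (e.g. projects/lhcathome.cern.ch_lhcathome/). The app_name, plan_class and the vboxwrapper command line options shown here are assumptions on my side; check them against the application name and plan class your client reports for ATLAS tasks before relying on them.

<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <!-- assumed plan class; copy the one shown in your task properties -->
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <!-- schedule the task as 1-core in the BOINC client -->
    <avg_ncpus>1</avg_ncpus>
    <!-- assumed vboxwrapper options: 1 CPU inside the VM and 5000 MB RAM -->
    <cmdline>--nthreads 1 --memory_size_mb 5000</cmdline>
  </app_version>
</app_config>

After saving the file, reload the config files (or restart the client) and check the event log for the "Found app_config.xml" message quoted above; the new values only apply to tasks started afterwards.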