Message boards : ATLAS application : ATLAS Validation errors when running other work unit types


Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 29735 - Posted: 31 Mar 2017, 12:03:26 UTC
Last modified: 31 Mar 2017, 12:16:56 UTC

I have found a validation problem with ATLAS that does not seem to have been addressed on this forum, though maybe it was mentioned on the old ATLAS forum.

Whenever I run ATLAS by itself (on any number of cores from 1 to 7), the work units validate properly. But when I run it on more than one core alongside other projects, they are all invalid. On this machine (i7-4770 running Ubuntu 16.10), that other project is Cosmology: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10453313 The valid tasks all ran ATLAS on a single core, and the invalid ones all ran it on two cores.

The curious thing is that Cosmology is multi-core too, so when it runs it occupies all 7 cores, with the eighth core reserved to support a GTX 1060 on Folding. Therefore, Cosmology is not actually running at the same time as ATLAS, except for a brief period of a minute or two when the BOINC scheduler switches between the two. But that seems to be enough to cause the validation errors.

I have also seen the same thing on an i7-4790 machine under Ubuntu 16.10, where the "other" projects are the LHC projects Theory and SixTrack. That is, when ATLAS runs on two or more cores, it validates properly only when it is the only type of work unit running. (It might work properly on a single core, but I have not tried that recently and don't have the data now.)

Has this problem been addressed before?
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,496,817
RAC: 2,374
Message 29736 - Posted: 31 Mar 2017, 12:19:14 UTC - in response to Message 29735.  

The memory set by the project for a dual-core task is too low (4200 MB).
Either choose 3 cores, or use an app_config.xml and give the VM at least 4400 MB of RAM, although others have already reported that even 4400 MB is sometimes too low (I have not noticed that myself yet).
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 29737 - Posted: 31 Mar 2017, 13:22:17 UTC - in response to Message 29736.  
Last modified: 31 Mar 2017, 13:27:01 UTC

OK, I will try three cores, and then do the app_config as necessary.

EDIT: Yes, running on three cores seems to be OK. Whether that solves all my problems remains to be seen, but it is a step in the right direction.

Phillipe had suggested that to me too, but I did not interpret it correctly.

Thanks.
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 29743 - Posted: 1 Apr 2017, 16:26:17 UTC
Last modified: 1 Apr 2017, 16:33:38 UTC

I prefer to run ATLAS on two cores at a time for increased efficiency, so I used the app_config posted here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4166&postid=29453#29453

That allowed me to run three ATLAS tasks, each on two cores, but only two of them ran correctly. The third one started to die on me, with its CPU% (as measured by BoincTasks) dropping from 40% to just below 25%.

To save it, I increased the memory allocation to 4600 MB (up from 4400 MB), and that worked. The CPU% immediately started to increase, and ended up at 100%. (For some reason, all the multicore tasks end up at 100% on BoincTasks if they are running properly.)
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,496,817
RAC: 2,374
Message 29744 - Posted: 1 Apr 2017, 16:58:23 UTC - in response to Message 29743.  

...(For some reason, all the multicore tasks end up at 100% on BoincTasks if they are running properly.)

The reason a multi-core task always shows 100% after the init phase is that the CPU% in BoincTasks is simply calculated from the elapsed time and the CPU time actually used (shown in parentheses). For example:

1.01 ATLAS Simulation (vbox64_mt_mcore_atlas) 04:36:29 (08:29:27)
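To make the arithmetic concrete, here is a small Python sketch using the example line above. It assumes, based on the observed behaviour rather than on BoincTasks documentation, that the displayed value is CPU time divided by elapsed time and capped at 100%; the helper names are illustrative only. With two cores busy, CPU time accumulates almost twice as fast as wall-clock time, so the raw ratio is well above 100%.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_percent(elapsed, cpu_time, cap=True):
    """CPU% as CPU time divided by elapsed (wall-clock) time.

    Assumption: the displayed value is capped at 100%, which would explain
    why a multi-core task that keeps its cores busy shows exactly 100%.
    """
    ratio = 100.0 * hms_to_seconds(cpu_time) / hms_to_seconds(elapsed)
    return min(ratio, 100.0) if cap else ratio


# Values from the ATLAS task line above: elapsed 04:36:29, CPU time 08:29:27
raw = cpu_percent("04:36:29", "08:29:27", cap=False)
shown = cpu_percent("04:36:29", "08:29:27")
print(f"raw {raw:.1f}%, displayed {shown:.0f}%")
# prints "raw 184.3%, displayed 100%"
```

A raw value of about 184% on a 2-core task corresponds to roughly 92% average load per core, so any task that keeps both cores reasonably busy ends up pinned at the 100% ceiling.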
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 29746 - Posted: 1 Apr 2017, 18:58:53 UTC - in response to Message 29744.  

OK, thanks.

Another peculiarity, which I think has been mentioned before although I can't find it at the moment, is that BOINC sometimes uses more cores than you have allotted. I set the BOINC preference to use "at most 90% of the processors" in order to reserve one core of my i7-4790 to support a GTX 1070 on Folding. That usually works, but in a couple of instances when I have been running three ATLAS tasks (two cores each), BOINC also ran two additional tasks (e.g., SixTrack or Theory), for a total of eight cores rather than seven.

In fact, I found that one of the ATLAS tasks was again about to die, apparently because of the extra task, or possibly because it still did not have enough memory. So I added
<project_max_concurrent>7</project_max_concurrent>
to the app_config.xml, and increased the memory amount to 5000 MB just to be safe (I have 32 GB, and might as well use it).

So the app_config.xml now looks like this, and is working thus far:

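<app_config>
  <!-- Limits the number of running tasks (not cores) for the whole project -->
  <project_max_concurrent>7</project_max_concurrent>
  <app>
    <name>ATLAS</name>
    <!-- At most 6 ATLAS tasks at once (a task count, not a core count) -->
    <max_concurrent>6</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <!-- Each ATLAS task is scheduled as using 2 CPUs -->
    <avg_ncpus>2.000000</avg_ncpus>
    <!-- Must match the plan class of the ATLAS multi-core app version -->
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <!-- Passed to vboxwrapper: RAM for each ATLAS VM, in MB -->
    <cmdline>--memory_size_mb 5000</cmdline>
  </app_version>
</app_config>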
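One thing worth keeping in mind when reading this file: according to the BOINC client configuration documentation, max_concurrent and project_max_concurrent limit the number of running tasks, not CPU cores. The Python sketch below reads the posted file and multiplies the task limit by avg_ncpus and the --memory_size_mb value, to show the upper bounds the file by itself allows. The helper names (memory_mb, worst_case) are hypothetical, and the sketch assumes both limits are present, as they are here. The "use at most 90% of the processors" preference still applies on top of these numbers, so they are not what the client will normally run.

```python
import xml.etree.ElementTree as ET

# The app_config.xml posted above, values unchanged.
APP_CONFIG = """
<app_config>
  <project_max_concurrent>7</project_max_concurrent>
  <app>
    <name>ATLAS</name>
    <max_concurrent>6</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <avg_ncpus>2.000000</avg_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <cmdline>--memory_size_mb 5000</cmdline>
  </app_version>
</app_config>
"""


def memory_mb(cmdline):
    """Return the value that follows --memory_size_mb in a cmdline string."""
    tokens = cmdline.split()
    return int(tokens[tokens.index("--memory_size_mb") + 1])


def worst_case(xml_text, app_name):
    """Upper bounds (tasks, cores, VM RAM in MB) implied by app_config alone."""
    root = ET.fromstring(xml_text.strip())
    # project_max_concurrent caps all running tasks of the project.
    tasks = int(root.findtext("project_max_concurrent"))
    for app in root.findall("app"):
        if app.findtext("name") == app_name:
            # max_concurrent caps running tasks of this one app.
            tasks = min(tasks, int(app.findtext("max_concurrent")))
    for version in root.findall("app_version"):
        if version.findtext("app_name") == app_name:
            ncpus = float(version.findtext("avg_ncpus"))
            mem = memory_mb(version.findtext("cmdline"))
            return tasks, tasks * ncpus, tasks * mem
    raise ValueError(f"no <app_version> found for {app_name}")


tasks, cores, ram = worst_case(APP_CONFIG, "ATLAS")
print(f"ATLAS: up to {tasks} tasks, {cores:g} cores, {ram} MB of VM RAM")
# prints "ATLAS: up to 6 tasks, 12 cores, 30000 MB of VM RAM"
```

In other words, a max_concurrent of 6 allows up to six 2-core ATLAS VMs, not six cores; capping ATLAS at three tasks would correspond to a value of 3.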



©2024 CERN