Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC
Author | Message |
---|---|
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
> @ gyllic when you say "one core task" is that an ATLAS one core task you are referring to?
Yes. If you struggle with an insufficient amount of RAM in your PC to use all your CPU cores efficiently, you could try running the native ATLAS app. This app, however, runs only on Linux, but it is more efficient and needs much less RAM than ATLAS vbox tasks. Unfortunately, the only way to force the server to send ATLAS native tasks is to remove vbox from your system entirely (or to "manipulate" the BOINC config), so this system would then be ATLAS native only (and sixtrack). |
Send message Joined: 5 Feb 18 Posts: 4 Credit: 234,192 RAC: 0 |
Hi, just to let you know that LHC@home / BOINC / VB seems to have stopped working since upgrading to the latest BOINC with the latest VM. My laptop was whirring away happily before, even coping with ATLAS with intermittent postponements. Windows 10 has also updated. Maybe some other users are having similar problems? I've tried re-installing using the option from the BOINC page, but all tasks lock up immediately with "computation error". Unfortunately I'm very busy right now, so unable to wade through your fantastic checklist, Yeti. I'm going to have to specify "no new tasks" until less busy, unless there's a really quick fix. I appreciate that Physics is not easy! |
Send message Joined: 24 Jun 10 Posts: 43 Credit: 6,127,823 RAC: 1,431 |
Greetings All, thank you for the checklist, Yeti. I have worked through it and have three PCs successfully crunching ATLAS workunits. On my machines I am running MT with 2 cores per job and only one job at a time, due to RAM limitations on my PCs. I am working on the 4th machine at the moment. I do apologize for aborting/erroring some workunits; I misread one of the steps and also had an incomplete/corrupted download. Cheers |
Send message Joined: 15 Jun 08 Posts: 2531 Credit: 253,722,201 RAC: 41,981 |
Your recent tasks only succeed at the BOINC level. The levels below (VM, scientific app) fail. This is caused by wrong RAM settings. Some of your logs show that 7100 MB RAM are reserved for the VM (1 core), but your computer has only 8 GB. Other logs show 3500 MB RAM which is not enough for some types of ATLAS tasks. You wrote that you would like to run a 2-core setup. Then you may rework your setup and configure 4800 MB via app_config.xml. |
Send message Joined: 24 Jun 10 Posts: 43 Credit: 6,127,823 RAC: 1,431 |
If they are as you say failing why are they not marked invalid? I had a hiccup or two at the start but have set up my project preferences for max CPU 2 and max jobs 2. I have an app config file that is saying use 2 cores per job. Boinc indicates I am using 2 cores for each job. I only run one job at a time. Latest valids are stating in the task file successfully completed? A question for admins: are the workunits I am returning and getting marked valid really valid then, as per previous reply? Regards |
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
> A question for admins: are the workunits I am returning and getting marked valid really valid then, as per previous reply?
Obviously I am not an admin, but yes, they are invalid in terms of scientific results. I.e. if you keep your current setup running as it is, it is a total waste of resources. A good indication of whether a task produced scientific results is the existence of the HITS.xxx file. This is one of your tasks with no HITS file https://lhcathome.cern.ch/lhcathome/result.php?resultid=200156318 and here is a good task from another user https://lhcathome.cern.ch/lhcathome/result.php?resultid=200147295. In this post https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4178#29560, which was written by one of the admins, it says:
> Therefore a truly successful WU must have a valid HITS file produced, however you can still get credit even if no HITS file is present because we don't want people to suffer from problems in ATLAS software or infrastructure.
You should, as computezrmle said, adjust your RAM settings. |
Send message Joined: 15 Jun 08 Posts: 2531 Credit: 253,722,201 RAC: 41,981 |
> If they are as you say failing why are they not marked invalid?
Gyllic already answered this.

> I have an app config file that is saying use 2 cores per job.
This should also be set at the project website: "Use max # of CPUs" => 2
Disadvantage: The server will send a too low RAM setting if you configure 1 or 2 CPUs. You'll have to correct that with your app_config.xml.
Example:
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 4800</cmdline>
  </app_version>
  <project_max_concurrent>1</project_max_concurrent>
</app_config>

> Boinc indicates I am using 2 cores for each job.
Good. Stay with these settings, at least until you see good results for a few days.

> Latest valids are stating in the task file successfully completed?
As already mentioned, this refers to the BOINC level.

Some indicators for failed WUs:
1. Very short total runtime. Compare that with runtimes of other users.
2. Missing HITS file in the log. Already explained by Gyllic.
3. CPU count doesn't fit the RAM size.

Too high. The VM eats up nearly all RAM on your 8GB computer.
2018-07-14 19:11:31 (7548): Setting Memory Size for VM. (7100MB)
2018-07-14 19:11:31 (7548): Setting CPU Count for VM. (2)

Default value, but too low. Typical error: log contains lots of garbage lines.
2018-07-16 02:30:11 (6300): Setting Memory Size for VM. (4400MB)
2018-07-16 02:30:11 (6300): Setting CPU Count for VM. (2)
2018-07-16 03:08:35 (6300): Guest Log: ta.mxvml:' ctaon n`omte tamdoavtea -`smuertl.axdmalt':a .Nxom slu'c ht of i`lme etora ddaitrae-cstourryl
2018-07-16 03:08:35 (6300): Guest Log: xmol'ut:p uNto lsiuscth
2018-07-16 03:08:35 (6300): Guest Log: filleo g.o1r4 5d6i8r7e8c1t._o0r1y8 |
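The two "Setting ..." lines quoted above can be pulled out of a task's stderr output with a few lines of Python. This is only a rough sketch: the function names and the 2048 MB headroom threshold are assumptions made for illustration, not anything defined by BOINC or the project.

```python
import re

# Patterns copied from the log lines quoted in the post above.
MEM_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")
CPU_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")


def vm_settings(stderr_text):
    """Return (memory_mb, cpu_count) found in the log text; None for a missing value."""
    mem = MEM_RE.search(stderr_text)
    cpu = CPU_RE.search(stderr_text)
    return (
        int(mem.group(1)) if mem else None,
        int(cpu.group(1)) if cpu else None,
    )


def leaves_too_little_ram(vm_memory_mb, host_ram_mb, headroom_mb=2048):
    """True if the VM would leave less than headroom_mb for the host.

    headroom_mb is a hypothetical threshold chosen for this sketch.
    """
    return host_ram_mb - vm_memory_mb < headroom_mb


if __name__ == "__main__":
    sample = (
        "2018-07-14 19:11:31 (7548): Setting Memory Size for VM. (7100MB)\n"
        "2018-07-14 19:11:31 (7548): Setting CPU Count for VM. (2)\n"
    )
    memory_mb, cpus = vm_settings(sample)
    print(memory_mb, cpus, leaves_too_little_ram(memory_mb, 8192))
```

Paste the stderr text of one of your own tasks in place of `sample` to see which values the VM was actually started with.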
Send message Joined: 24 Jun 10 Posts: 43 Credit: 6,127,823 RAC: 1,431 |
Greetings All, thank you for the feedback. I changed my app_config file to the one from computezrmle's previous reply. After the changes, I only downloaded two more workunits to see if I am on the right track. Regards |
Send message Joined: 24 Jun 10 Posts: 43 Credit: 6,127,823 RAC: 1,431 |
Hi All, I found some very peculiar lines in the stderr output files for the latest two tasks after making the above changes. When I get access to my machine I will provide an update. Still no HITS file, so the investigation continues. Regards. |
Send message Joined: 24 Jun 10 Posts: 43 Credit: 6,127,823 RAC: 1,431 |
Hi All
Seems as I am not the only one who is completing workunits that have been marked as valid (at boinc level) but no HITS file is present. As per the following workunits
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200171936
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200168456
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200077351
I am going to try a later version of VB and also this is an extract from my last workunit - https://lhcathome.cern.ch/lhcathome/result.php?resultid=200161943
2018-07-16 05:50:28 (7532): Guest Log: Starting ATLAS job. (PandaID=3983550564 taskID=14530897)
2018-07-16 06:08:08 (7532): Guest Log: log_extracts:
2018-07-16 06:08:08 (7532): Guest Log: - Last 10 lines from /home/atlas01/RunAtlas/Panda_Pilot_3444_1531691438/PandaJob/athena_stdout.txt -
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,806 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,808 INFO Now writing wrapper for substep executor EVNTtoHITS
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2018-07-15 23:57:31,808 INFO Valgrind not engaged
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,808 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.execute 2018-07-15 23:57:31,808 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.execute 2018-07-16 00:05:00,192 INFO EVNTtoHITS executor returns 139
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.validate 2018-07-16 00:05:01,628 ERROR Validation of return code failed: EVNTtoHITS got a SIGSEGV signal (exit code 139) (Error code 65)
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.validate 2018-07-16 00:05:01,679 INFO Scanning logfile log.EVNTtoHITS for errors
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.transform.execute 2018-07-16 00:05:01,724 CRITICAL Transform executor raised TransformValidationException: EVNTtoHITS got a SIGSEGV signal (exit code 139); Long ERROR message at line 1783 (see jobReport for further details)
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.transform.execute 2018-07-16 00:05:05,645 WARNING Transform now exiting early with exit code 65 (EVNTtoHITS got a SIGSEGV signal (exit code 139); Long ERROR message at line 1783 (see jobReport for further details))
Regards |
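For anyone who wants to spot this kind of failure without reading the whole extract, here is a small Python sketch that looks for the signal and exit-code messages shown in the log above. The patterns are taken from those lines only; other ATLAS task series may word their errors differently, and the function name is made up for this example.

```python
import re

# Both patterns are based solely on the log lines quoted in the post above.
SIGNAL_RE = re.compile(r"got a (SIG[A-Z]+) signal \(exit code (\d+)\)")
EARLY_EXIT_RE = re.compile(r"Transform now exiting early with exit code (\d+)")


def transform_failure(log_text):
    """Return a dict describing the failure, or None if neither message is found."""
    signal = SIGNAL_RE.search(log_text)
    early_exit = EARLY_EXIT_RE.search(log_text)
    if signal is None and early_exit is None:
        return None
    return {
        "signal": signal.group(1) if signal else None,
        "executor_exit_code": int(signal.group(2)) if signal else None,
        "transform_exit_code": int(early_exit.group(1)) if early_exit else None,
    }


if __name__ == "__main__":
    sample = (
        "ERROR Validation of return code failed: EVNTtoHITS got a SIGSEGV "
        "signal (exit code 139) (Error code 65)\n"
        "WARNING Transform now exiting early with exit code 65 (EVNTtoHITS "
        "got a SIGSEGV signal (exit code 139))\n"
    )
    print(transform_failure(sample))
```

A result of `None` only means these two messages were not found; it does not prove that a HITS file was produced.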
Send message Joined: 2 May 07 Posts: 2242 Credit: 173,902,375 RAC: 2,798 |
Hi tazzduke, please write your messages in the ATLAS forum or in Number crunching. This thread is for information about the checklist only. Thank you. |
Send message Joined: 24 Jun 10 Posts: 43 Credit: 6,127,823 RAC: 1,431 |
Understood Regards |
Send message Joined: 7 Apr 18 Posts: 20 Credit: 137,327 RAC: 0 |
Hi All, first I should say that I don't know anything about virtualization, so my question may seem very stupid. My English isn't impressive either, I'm sorry. I'm not sure how to properly stop VB and turn off my computer. When I just turn off my computer, the virtualisation doesn't stop running automatically. When I turned off the virtual machine first, the simulation broke down. Have I done something wrong? I don't want to waste my computer's work again. |
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,177,793 RAC: 27,438 |
|
Send message Joined: 7 Apr 18 Posts: 20 Credit: 137,327 RAC: 0 |
Thank you very much, it works well now :) This way would never have entered my mind! |
Send message Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
> Check if you have enough RAM for Atlas available. Each SingleCore-Atlas-Task needs 2,1 GB free RAM
Does this still apply, or is a single core setup more like 3,9 GB? Thanks...
Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us |
Send message Joined: 15 Jun 08 Posts: 2531 Credit: 253,722,201 RAC: 41,981 |
The recent standard setup for 1-core is 3900 MB (+ 900 MB per additional core). There are task series that need less RAM but you may crash other series if you set it too low. |
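The rule of thumb above (3900 MB for 1 core, plus 900 MB per additional core) is easy to turn into a small helper. A minimal Python sketch follows; the function names are made up for this example, and the option string simply mirrors the `<cmdline>` format from the app_config.xml example posted earlier in this thread.

```python
def atlas_vm_memory_mb(ncpus, base_mb=3900, per_extra_core_mb=900):
    """RAM for an ATLAS vbox task: 3900 MB for 1 core + 900 MB per additional core."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return base_mb + per_extra_core_mb * (ncpus - 1)


def app_config_cmdline(ncpus):
    """Build the option string used inside <cmdline> in app_config.xml."""
    return f"--nthreads {ncpus} --memory_size_mb {atlas_vm_memory_mb(ncpus)}"


if __name__ == "__main__":
    for cores in (1, 2, 4):
        print(cores, atlas_vm_memory_mb(cores), app_config_cmdline(cores))
```

For 2 cores this gives 4800 MB, the value recommended earlier in the thread. Treat the numbers as the current standard setup only; they may change with future task series.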
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,177,793 RAC: 27,438 |
> Check if you have enough RAM for Atlas available. Each SingleCore-Atlas-Task needs 2,1 GB free RAM
We don't have any SingleCore-Tasks anymore. Nowadays we always run MultiCore-WUs, but they can run with 1 core and then they really need 3,9 GB RAM.
Supporting BOINC, a great concept ! |
Send message Joined: 7 Apr 18 Posts: 20 Credit: 137,327 RAC: 0 |
I can't find client_state.xml in the BOINC folder. I use Windows 10; does it have a different name there? |
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,177,793 RAC: 27,438 |
Check here please: https://boinc.berkeley.edu/wiki/BOINC_Data_directory Supporting BOINC, a great concept ! |
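As a rough illustration of what the linked page describes: client_state.xml lives in the BOINC data directory, not in the program folder. The Python sketch below checks a few common default locations. The paths listed are the usual defaults as far as I know (on Windows the ProgramData folder is hidden by default), but a custom installation may use a different one, so treat the list as an assumption.

```python
from pathlib import Path

# Usual default data directories; a custom install may differ (assumption).
DEFAULT_DATA_DIRS = [
    Path(r"C:\ProgramData\BOINC"),                    # Windows
    Path("/var/lib/boinc-client"),                    # Linux (Debian/Ubuntu packages)
    Path("/var/lib/boinc"),                           # Linux (other distributions)
    Path("/Library/Application Support/BOINC Data"),  # macOS
]


def find_client_state(candidates=DEFAULT_DATA_DIRS):
    """Return the first existing client_state.xml among the candidate directories, else None."""
    for directory in candidates:
        state_file = Path(directory) / "client_state.xml"
        if state_file.is_file():
            return state_file
    return None


if __name__ == "__main__":
    found = find_client_state()
    print(found if found else "client_state.xml not found in the default locations")
```

If nothing is found, the BOINC Manager event log normally prints the data directory at startup, which is the most reliable place to look.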
©2024 CERN