Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC

gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35516 - Posted: 13 Jun 2018, 17:59:10 UTC - in response to Message 35510.  
Last modified: 13 Jun 2018, 18:33:34 UTC

@ gyllic when you say "one core task" is that an ATLAS one core task you are referring to?
Yes.
If you don't have enough RAM in your PC to use all your CPU cores efficiently, you could try running the native ATLAS app. This app only runs on Linux, but it is more efficient and needs much less RAM than the ATLAS vbox tasks. Unfortunately, the only way to force the server to send native ATLAS tasks is to remove vbox from your system entirely (or to "manipulate" the BOINC config), so this system would then run ATLAS native only (and SixTrack).
ID: 35516
Nick5kin

Joined: 5 Feb 18
Posts: 4
Credit: 234,192
RAC: 0
Message 35537 - Posted: 16 Jun 2018, 12:05:41 UTC

Hi, just to let you know that LHC@home / BOINC / VB seems to have stopped working since I upgraded to the latest BOINC with the latest VM. My laptop was whirring away happily before, even coping with ATLAS with intermittent postponements. Windows 10 has also updated. Maybe some other users are having similar problems? I've tried re-installing using the option from the BOINC page, but all tasks lock up immediately with "computation error". Unfortunately I'm very busy right now, so I'm unable to wade through your fantastic checklist, Yeti. I'm going to have to specify "no new tasks" until I'm less busy, unless there's a really quick fix. I appreciate that Physics is not easy!
ID: 35537
tazzduke
Joined: 24 Jun 10
Posts: 34
Credit: 4,962,081
RAC: 10,961
Message 35903 - Posted: 15 Jul 2018, 0:20:12 UTC

Greetings All

Thank you for the checklist, Yeti. I have worked through it and have three PCs successfully crunching ATLAS workunits.

On my machines I am using MT with 2 cores per job and only running one job at a time, due to RAM limitations on my PCs.

I am working on the 4th machine at the moment.

I do apologize for aborting/erroring some workunits; I misread one of the steps and also had an incomplete/corrupted download.

Cheers
ID: 35903
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,661,320
RAC: 143,659
Message 35904 - Posted: 15 Jul 2018, 6:19:10 UTC - in response to Message 35903.  

Your recent tasks only succeed at the BOINC level.
The levels below (VM, scientific app) fail.


This is caused by wrong RAM settings.
Some of your logs show that 7100 MB RAM are reserved for the VM (1 core), but your computer has only 8 GB.
Other logs show 3500 MB RAM which is not enough for some types of ATLAS tasks.

You wrote that you would like to run a 2-core setup.
Then you may rework your setup and configure 4800 MB via app_config.xml.
ID: 35904
tazzduke
Joined: 24 Jun 10
Posts: 34
Credit: 4,962,081
RAC: 10,961
Message 35918 - Posted: 15 Jul 2018, 13:32:10 UTC - in response to Message 35904.  

If they are failing, as you say, why are they not marked invalid?

I had a hiccup or two at the start but have set up my project preferences for max CPUs 2 and max jobs 2.

I have an app config file that is saying use 2 cores per job.

BOINC indicates I am using 2 cores for each job.

I only run one job at a time.

The latest valid tasks state in the task file that they completed successfully?

A question for the admins: are the workunits I am returning, which are getting marked valid, really valid then, given the previous reply?

Regards
ID: 35918
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35920 - Posted: 15 Jul 2018, 15:41:08 UTC - in response to Message 35918.  
Last modified: 15 Jul 2018, 15:47:45 UTC

A question for the admins: are the workunits I am returning, which are getting marked valid, really valid then, given the previous reply?
Obviously I am not an admin, but they are indeed invalid in terms of scientific results. That is, if you keep your current setup running as it is, it is a total waste of resources. A good indication of whether a task produced scientific results is the existence of the HITS.xxx file. This is one of your tasks with no HITS file https://lhcathome.cern.ch/lhcathome/result.php?resultid=200156318 and here is a good task from another user https://lhcathome.cern.ch/lhcathome/result.php?resultid=200147295.

In this post https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4178#29560, which was written by one of the admins, it says:

Therefore a truly successful WU must have a valid HITS file produced, however you can still get credit even if no HITS file is present because we don't want people to suffer from problems in ATLAS software or infrastructure.
You should, as computezrmle said, adjust your RAM settings.
ID: 35920
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,661,320
RAC: 143,659
Message 35921 - Posted: 15 Jul 2018, 19:53:35 UTC - in response to Message 35918.  

If they are failing, as you say, why are they not marked invalid?

Gyllic already answered this.

I have an app config file that is saying use 2 cores per job.

This should also be set on the project website.
"Use max # of CPUs" => 2
Disadvantage: the server will send a RAM setting that is too low if you configure 1 or 2 CPUs.
You'll have to correct that with your app_config.xml.
Example:
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 4800</cmdline>
  </app_version>
  <project_max_concurrent>1</project_max_concurrent>
</app_config>
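
If you need the same file for a different core count, the example above can be generated with a few lines of Python. This is only a minimal sketch: the function name `build_app_config` is made up for illustration, the app name, plan class and command line options are copied from the example above, and you have to pass in the RAM value yourself.

```python
import xml.etree.ElementTree as ET

# Layout copied from the app_config.xml example above.
APP_CONFIG_TEMPLATE = """<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>{ncpus:.1f}</avg_ncpus>
    <cmdline>--nthreads {ncpus} --memory_size_mb {memory_mb}</cmdline>
  </app_version>
  <project_max_concurrent>{max_concurrent}</project_max_concurrent>
</app_config>
"""


def build_app_config(ncpus, memory_mb, max_concurrent=1):
    """Return app_config.xml text for the given core count and VM RAM size."""
    if ncpus < 1 or memory_mb < 1 or max_concurrent < 1:
        raise ValueError("ncpus, memory_mb and max_concurrent must be positive")
    text = APP_CONFIG_TEMPLATE.format(
        ncpus=ncpus, memory_mb=memory_mb, max_concurrent=max_concurrent
    )
    # Parse once so a malformed template fails here and not inside BOINC.
    ET.fromstring(text)
    return text


if __name__ == "__main__":
    # Reproduces the 2-core / 4800 MB example from this post.
    print(build_app_config(ncpus=2, memory_mb=4800))
```

Save the output as app_config.xml in the project's directory and let BOINC re-read its config files (or restart the client) so the change takes effect.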



BOINC indicates I am using 2 cores for each job.
I only run one job at a time.

Good. Stay with these settings, at least until you see good results for a few days.

The latest valid tasks state in the task file that they completed successfully?

As already mentioned, this refers to the BOINC level.


Some indicators of failed WUs

1. Very short total runtime.
Compare it with the runtimes of other users.

2. Missing HITS file in the log.
Already explained by gyllic.

3. RAM size doesn't fit the CPU count.
Too high:
The VM eats up nearly all the RAM on your 8 GB computer.
2018-07-14 19:11:31 (7548): Setting Memory Size for VM. (7100MB)
2018-07-14 19:11:31 (7548): Setting CPU Count for VM. (2)

Default value, but too low:
Typical error: the log contains lots of garbage lines.
2018-07-16 02:30:11 (6300): Setting Memory Size for VM. (4400MB)
2018-07-16 02:30:11 (6300): Setting CPU Count for VM. (2)

2018-07-16 03:08:35 (6300): Guest Log: ta.mxvml:'  ctaon n`omte tamdoavtea -`smuertl.axdmalt':a .Nxom slu'c ht of i`lme etora ddaitrae-cstourryl
2018-07-16 03:08:35 (6300): Guest Log: xmol'ut:p uNto  lsiuscth
2018-07-16 03:08:35 (6300): Guest Log:  filleo g.o1r4 5d6i8r7e8c1t._o0r1y8
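
Indicator 3 can be checked automatically from a task's stderr output. The following Python sketch is only an illustration: it relies on the two "Setting ... for VM." lines looking exactly like the ones quoted above, the function names are made up, and the 2048 MB headroom left for the host is my own assumption, not a project value.

```python
import re

# Patterns taken from the log lines quoted above.
MEMORY_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")
CPU_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")


def vm_settings(stderr_text):
    """Return (memory_mb, cpu_count); a value is None if its line is missing."""
    mem = MEMORY_RE.search(stderr_text)
    cpu = CPU_RE.search(stderr_text)
    return (
        int(mem.group(1)) if mem else None,
        int(cpu.group(1)) if cpu else None,
    )


def classify_vm_memory(stderr_text, recommended_mb, host_ram_mb, headroom_mb=2048):
    """Classify the VM RAM setting as 'too low', 'too high', 'ok' or 'unknown'.

    recommended_mb: the RAM you intend to give the VM (e.g. 4800 for 2 cores).
    headroom_mb:    RAM to leave for the host; 2048 is an assumed default.
    """
    memory_mb, _cpu_count = vm_settings(stderr_text)
    if memory_mb is None:
        return "unknown"
    if memory_mb < recommended_mb:
        return "too low"
    if memory_mb > host_ram_mb - headroom_mb:
        return "too high"
    return "ok"


if __name__ == "__main__":
    log = (
        "2018-07-14 19:11:31 (7548): Setting Memory Size for VM. (7100MB)\n"
        "2018-07-14 19:11:31 (7548): Setting CPU Count for VM. (2)\n"
    )
    print(vm_settings(log))
    print(classify_vm_memory(log, recommended_mb=4800, host_ram_mb=8192))
```

This does not replace looking for the HITS file; it only catches the RAM misconfiguration before a task has run for hours.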
ID: 35921
tazzduke
Joined: 24 Jun 10
Posts: 34
Credit: 4,962,081
RAC: 10,961
Message 35922 - Posted: 15 Jul 2018, 21:03:12 UTC - in response to Message 35921.  

Greetings All

Thank you for the feedback. I changed my app_config file to the one in the previous reply from computezrmle.

After the changes, I only downloaded two more workunits to see if I am on the right track.

Regards
ID: 35922
tazzduke
Joined: 24 Jun 10
Posts: 34
Credit: 4,962,081
RAC: 10,961
Message 35926 - Posted: 16 Jul 2018, 3:04:34 UTC - in response to Message 35922.  

Hi All
I found some very peculiar lines in the stderr output files for the latest two tasks after making the above changes.
When I get access to my machine I will provide an update.
Still no HITS file, so the investigation continues.
Regards.
ID: 35926
tazzduke
Joined: 24 Jun 10
Posts: 34
Credit: 4,962,081
RAC: 10,961
Message 35931 - Posted: 16 Jul 2018, 7:36:38 UTC - in response to Message 35927.  

Hi All

It seems I am not the only one who is completing workunits that have been marked as valid (at the BOINC level) but have no HITS file.

See the following workunits:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=200171936
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200168456
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200077351

I am going to try a later version of VB. Also, this is an extract from my last workunit - https://lhcathome.cern.ch/lhcathome/result.php?resultid=200161943

2018-07-16 05:50:28 (7532): Guest Log: Starting ATLAS job. (PandaID=3983550564 taskID=14530897)
2018-07-16 06:08:08 (7532): Guest Log: log_extracts:
2018-07-16 06:08:08 (7532): Guest Log: - Last 10 lines from /home/atlas01/RunAtlas/Panda_Pilot_3444_1531691438/PandaJob/athena_stdout.txt -
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,806 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,808 INFO Now writing wrapper for substep executor EVNTtoHITS
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2018-07-15 23:57:31,808 INFO Valgrind not engaged
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.preExecute 2018-07-15 23:57:31,808 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.execute 2018-07-15 23:57:31,808 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.execute 2018-07-16 00:05:00,192 INFO EVNTtoHITS executor returns 139
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.validate 2018-07-16 00:05:01,628 ERROR Validation of return code failed: EVNTtoHITS got a SIGSEGV signal (exit code 139) (Error code 65)
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.trfExe.validate 2018-07-16 00:05:01,679 INFO Scanning logfile log.EVNTtoHITS for errors
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.transform.execute 2018-07-16 00:05:01,724 CRITICAL Transform executor raised TransformValidationException: EVNTtoHITS got a SIGSEGV signal (exit code 139); Long ERROR message at line 1783 (see jobReport for further details)
2018-07-16 06:08:08 (7532): Guest Log: PyJobTransforms.transform.execute 2018-07-16 00:05:05,645 WARNING Transform now exiting early with exit code 65 (EVNTtoHITS got a SIGSEGV signal (exit code 139); Long ERROR message at line 1783 (see jobReport for further details))

Regards
ID: 35931
maeax

Joined: 2 May 07
Posts: 2066
Credit: 155,451,247
RAC: 168,173
Message 35932 - Posted: 16 Jul 2018, 8:00:33 UTC

Hi tazzduke,
please post your messages in the ATLAS forum or in a separate Number crunching thread.
This thread is for information about the checklist only.
Thank you.
ID: 35932
tazzduke
Joined: 24 Jun 10
Posts: 34
Credit: 4,962,081
RAC: 10,961
Message 35934 - Posted: 16 Jul 2018, 8:06:47 UTC - in response to Message 35932.  

Understood

Regards
ID: 35934
Ola

Joined: 7 Apr 18
Posts: 20
Credit: 137,327
RAC: 0
Message 36587 - Posted: 30 Aug 2018, 16:52:17 UTC

Hi All,
First I should say that I don't know anything about virtualization, so my question may seem very stupid. My English isn't impressive either, I'm sorry.
I'm not sure how to properly stop VB and turn off my computer. When I just turn off my computer, virtualization doesn't stop running automatically. When I turned off the virtual machine first, the simulation broke down. Have I done something wrong? I don't want to waste my computer's work again.
ID: 36587
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 27,111
Message 36588 - Posted: 30 Aug 2018, 16:54:18 UTC

Open your BOINC client, go to "File" and there choose "Shut down connected client".

This will shut down your virtual machines.


Supporting BOINC, a great concept !
ID: 36588
Ola

Joined: 7 Apr 18
Posts: 20
Credit: 137,327
RAC: 0
Message 36589 - Posted: 30 Aug 2018, 17:08:35 UTC - in response to Message 36588.  

Thank you very much, it works well now :) This way would never have entered my mind!
ID: 36589
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 36754 - Posted: 18 Sep 2018, 13:19:57 UTC - in response to Message 29359.  

Check if you have enough RAM available for Atlas. Each SingleCore-Atlas-Task needs 2,1 GB free RAM

Does this still apply, or does a single-core setup need more like 3,9 GB?

Thanks...
Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us
ID: 36754
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,661,320
RAC: 143,659
Message 36755 - Posted: 18 Sep 2018, 13:47:57 UTC - in response to Message 36754.  

The recent standard setup for 1-core is 3900 MB (+ 900 MB per additional core).

There are task series that need less RAM but you may crash other series if you set it too low.
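
As a quick sanity check of that rule of thumb, here is a minimal Python sketch. The function name is made up for illustration; the 3900 MB and 900 MB figures are the ones given above and may change with future task series.

```python
def atlas_vbox_ram_mb(ncpus):
    """RAM in MB for an ATLAS vbox task: 3900 MB for 1 core, +900 MB per extra core."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return 3900 + 900 * (ncpus - 1)


if __name__ == "__main__":
    for cores in (1, 2, 4):
        print(cores, "core(s):", atlas_vbox_ram_mb(cores), "MB")
```

For 2 cores this gives 4800 MB, which is the value used in the app_config.xml example earlier in this thread.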
ID: 36755
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 27,111
Message 36756 - Posted: 18 Sep 2018, 14:27:37 UTC - in response to Message 36754.  

Check if you have enough RAM available for Atlas. Each SingleCore-Atlas-Task needs 2,1 GB free RAM

Does this still apply, or does a single-core setup need more like 3,9 GB?

We don't have any SingleCore-Tasks anymore. Nowadays we always run MultiCore-WUs, but they can run with 1 core, and then they really need 3,9 GB RAM.


Supporting BOINC, a great concept !
ID: 36756
Ola

Joined: 7 Apr 18
Posts: 20
Credit: 137,327
RAC: 0
Message 37023 - Posted: 14 Oct 2018, 18:53:08 UTC

I can't find client_state.xml in the BOINC folder. I use Windows 10; does it have a different name there?
ID: 37023
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 27,111
Message 37024 - Posted: 14 Oct 2018, 19:02:23 UTC - in response to Message 37023.  

Check here please: https://boinc.berkeley.edu/wiki/BOINC_Data_directory
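
If you prefer to let a script look for the file, something like the following Python sketch may help. The directory list holds the usual default data directories as far as I know them; your installation may use a different location chosen during setup, and the function name is made up for illustration. Note that on Windows the ProgramData folder is hidden by default, which is a common reason for not finding the file in Explorer.

```python
import os
from pathlib import Path

# Usual default BOINC data directories (assumption: a custom install may differ).
DEFAULT_DATA_DIRS = [
    Path(r"C:\ProgramData\BOINC"),                    # Windows
    Path("/var/lib/boinc-client"),                    # Linux, Debian/Ubuntu package
    Path("/var/lib/boinc"),                           # Linux, some other packages
    Path("/Library/Application Support/BOINC Data"),  # macOS
]


def find_client_state(data_dirs=DEFAULT_DATA_DIRS):
    """Return the first existing client_state.xml as a Path, or None."""
    for data_dir in data_dirs:
        candidate = Path(data_dir) / "client_state.xml"
        # os.path.isfile returns False instead of raising on unreadable paths.
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    found = find_client_state()
    print(found if found else "client_state.xml not found in the default locations")
```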


Supporting BOINC, a great concept !
ID: 37024