Task processing slowing down considerably beyond ~85% progress
Author | Message |
---|---|
Joined: 15 Jun 08 Posts: 2534 Credit: 254,141,251 RAC: 54,753 |
Your logfile shows this line: 2019-09-09 16:24:31 (4236): Guest Log: No HITS file was produced This indicates that it was not really a success from the project's perspective. Other lines in the logfile indicate that the host struggled very hard to keep the task running: 2019-09-06 11:46:27 (3720): Successfully stopped VM. . # restarted only about 30 s later! 2019-09-06 11:47:00 (1128): vboxwrapper (7.7.26196): starting . . . 2019-09-06 17:31:10 (1128): Successfully stopped VM. . # again a very short suspend period! 2019-09-06 17:32:03 (8544): vboxwrapper (7.7.26196): starting . . . 2019-09-07 15:11:39 (8544): Successfully stopped VM. . # here too! 2019-09-07 15:12:00 (6552): vboxwrapper (7.7.26196): starting . # This section is critical. It shows that the computer is too busy. 2019-09-09 16:11:16 (7352): Powering off VM. 2019-09-09 16:11:16 (7352): Error in poweroff VM for VM: -108 Command: VBoxManage -q controlvm "boinc_58d2a2a6505fb873" poweroff Output: 2019-09-09 16:11:16 (7352): VM did not power off when requested. 2019-09-09 16:11:16 (7352): VM was NOT successfully terminated. 2019-09-09 16:17:15 (4236): vboxwrapper (7.7.26196): starting . # The next restart might have caused the task to start completely from scratch, hence the long total walltime. Timestamps from other tasks show that a couple of ATLAS tasks are running concurrently. You may check whether the BOINC client is allowed to use enough resources (RAM in this case) to satisfy all tasks. This should avoid repeated suspend/resume cycles. In addition, you may reduce the number of concurrently running tasks until you get results that produce HITS files. Then slowly increase the number of concurrently running tasks again. |
Joined: 9 Aug 05 Posts: 36 Credit: 7,698,293 RAC: 0 |
"Now you can run 4 or more cores instead of 2. :-)" Yes, I have changed my config to run 2 concurrent 4-core tasks. "You may check if the BOINC client allows enough resources - RAM in this case - to be used to satisfy all tasks." What is a HITS file? I will check my future logs to see if the tasks are running properly. RAM settings were 90% x 16 GB = 14.4 GB (I have now increased this BOINC setting to use 100% of RAM). Is 16 GB RAM enough to run 2 concurrent 4-core tasks + 1 SETI GPU task? Or should I upgrade my machine to 32 GB? |
Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,652 |
Filipe, I have the same Ryzen 2700, running two ATLAS tasks on 6 CPUs, but with 32 GB RAM. You are running 4 ATLAS tasks with only 16 GB. That is too low. |
Joined: 15 Jun 08 Posts: 2534 Credit: 254,141,251 RAC: 54,753 |
"What is a HITS file?" It contains the scientific result that will be uploaded to the project server. "RAM settings were 90% x 16 GB = 14.4 GB (I increased this BOINC setting to use 100% of RAM)" A 4-core VM requires 6600 MB RAM (3000 MB + 900 MB per core). 16 GB should be enough to run 2 of them + 1 GPU task. This should be a good configuration to start with, as long as no tasks from other projects are running. |
Joined: 9 Aug 05 Posts: 36 Credit: 7,698,293 RAC: 0 |
What is a HITS file? Should every task produce a HITS file? Or does it depend on what has been calculated/found? |
Joined: 15 Jun 08 Posts: 2534 Credit: 254,141,251 RAC: 54,753 |
Every task downloads an EVNT file that contains 200 events, converts them, and stores the results in the HITS file. As the events are independent of each other, ATLAS can be configured to process them using concurrently running threads (n-core setup). If for whatever reason the task doesn't produce that HITS file, the events have to be rescheduled. |
Joined: 27 Sep 08 Posts: 847 Credit: 692,010,305 RAC: 113,584 |
On one of my computers the tasks are all at 100% but still running? |
Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,652 |
In BOINC Manager, use Show graphics (Your Contribution - Your Job) or show the RDP console (ALT+F2). There are very long runners out there (more than 24 hours on 4 cores). Edit: 1000 sec for every collision, and there are 200 to do. |
Joined: 15 Jun 08 Posts: 2534 Credit: 254,141,251 RAC: 54,753 |
Usually nothing to worry about. See David Cameron's comment: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5135&postid=39837 It's just the BOINC client that gets confused when the tasks have different runtimes. |
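
The RAM rule of thumb quoted in this thread (3000 MB plus 900 MB per core for each ATLAS VM) can be checked against a host's BOINC memory setting with a few lines of Python. This is only a rough sketch: the function names and the `other_tasks_mb` parameter are made up for illustration, 16 GB is assumed to mean 16 * 1024 MB, and the memory a GPU task or the operating system needs is not part of the formula, so it has to be estimated separately.

```python
def atlas_vm_ram_mb(cores: int) -> int:
    """RAM for one ATLAS VM, using the rule of thumb from the thread:
    3000 MB base + 900 MB per core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 3000 + 900 * cores


def check_ram_budget(total_ram_mb: float, boinc_ram_fraction: float,
                     atlas_tasks: int, cores_per_task: int,
                     other_tasks_mb: float = 0.0):
    """Return (fits, needed_mb, allowed_mb).

    other_tasks_mb is a hypothetical allowance for anything else BOINC
    runs (e.g. a GPU task); the thread gives no figure for it.
    """
    needed = atlas_tasks * atlas_vm_ram_mb(cores_per_task) + other_tasks_mb
    allowed = total_ram_mb * boinc_ram_fraction
    return needed <= allowed, needed, allowed


if __name__ == "__main__":
    total_mb = 16 * 1024  # assumption: a 16 GB host reports 16384 MB
    for tasks, cores in [(2, 4), (4, 2)]:
        fits, needed, allowed = check_ram_budget(total_mb, 0.90, tasks, cores)
        print(f"{tasks} x {cores}-core: need {needed:.0f} MB, "
              f"BOINC may use {allowed:.0f} MB, fits: {fits}")
```

Under these assumptions, two 4-core tasks need 13200 MB, which stays below 90% of 16 GB, while four 2-core tasks would need 19200 MB and would not fit.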
©2024 CERN