Message boards : ATLAS application : Task processing slowing down considerably beyond ~85% progress

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Message 39879 - Posted: 9 Sep 2019, 16:36:32 UTC - in response to Message 39876.  

Your logfile shows this line:
2019-09-09 16:24:31 (4236): Guest Log: No HITS file was produced

This indicates that it was not really a success from the project's perspective.

Other lines in the logfile indicate that the host struggled very hard to keep the task running:
2019-09-06 11:46:27 (3720): Successfully stopped VM.
.    # only about 30 s later!
2019-09-06 11:47:00 (1128): vboxwrapper (7.7.26196): starting
.
.
.
2019-09-06 17:31:10 (1128): Successfully stopped VM.
.    # again a very short suspend period!
2019-09-06 17:32:03 (8544): vboxwrapper (7.7.26196): starting
.
.
.
2019-09-07 15:11:39 (8544): Successfully stopped VM.
.    # here too!
2019-09-07 15:12:00 (6552): vboxwrapper (7.7.26196): starting


.   # This section is critical. It shows that the computer is too busy.
2019-09-09 16:11:16 (7352): Powering off VM.
2019-09-09 16:11:16 (7352): Error in poweroff VM for VM: -108
Command:
VBoxManage -q controlvm "boinc_58d2a2a6505fb873" poweroff
Output:

2019-09-09 16:11:16 (7352): VM did not power off when requested.
2019-09-09 16:11:16 (7352): VM was NOT successfully terminated.
2019-09-09 16:17:15 (4236): vboxwrapper (7.7.26196): starting
.   # The next restart might have caused the task to start over completely from scratch, hence the long total walltime.


Timestamps from other tasks show that several ATLAS tasks are running concurrently.
Check whether the BOINC client is allowed to use enough resources (RAM in this case) to satisfy all tasks.
This should avoid the repeated suspend/resume cycles.
In addition, you may reduce the number of concurrently running tasks until you get results that produce HITS files.
Then increase the number of concurrently running tasks again in small steps.
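
The stop/start gaps and the missing HITS file can also be checked with a short script instead of reading the log by eye. This is only a minimal sketch: it assumes the log lines have the `timestamp (pid): message` shape seen in the excerpt above, and the function names `restart_gaps` and `produced_hits` are made up for this example, not part of BOINC or vboxwrapper.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt above:
# "2019-09-06 11:46:27 (3720): Successfully stopped VM."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def restart_gaps(log_text):
    """Seconds between each 'Successfully stopped VM.' and the next wrapper start."""
    gaps = []
    stopped_at = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip elision dots, annotations and command output
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(3)
        if message.startswith("Successfully stopped VM"):
            stopped_at = stamp
        elif message.startswith("vboxwrapper") and message.endswith("starting"):
            if stopped_at is not None:
                gaps.append(int((stamp - stopped_at).total_seconds()))
                stopped_at = None
    return gaps


def produced_hits(log_text):
    """False if the log contains the 'No HITS file was produced' marker."""
    return "No HITS file was produced" not in log_text


# Lines copied from the log excerpt in this post.
SAMPLE = """\
2019-09-06 11:46:27 (3720): Successfully stopped VM.
2019-09-06 11:47:00 (1128): vboxwrapper (7.7.26196): starting
2019-09-06 17:31:10 (1128): Successfully stopped VM.
2019-09-06 17:32:03 (8544): vboxwrapper (7.7.26196): starting
2019-09-07 15:11:39 (8544): Successfully stopped VM.
2019-09-07 15:12:00 (6552): vboxwrapper (7.7.26196): starting
2019-09-09 16:24:31 (4236): Guest Log: No HITS file was produced
"""

if __name__ == "__main__":
    print(restart_gaps(SAMPLE))   # gaps in seconds, one per stop/start pair
    print(produced_hits(SAMPLE))  # False for this excerpt
```

Many short gaps in the output point to the same suspend/resume pattern described above.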
ID: 39879
Filipe

Message 39880 - Posted: 9 Sep 2019, 19:17:18 UTC - in response to Message 39877.  
Last modified: 9 Sep 2019, 19:19:11 UTC

Now you can run 4 or more cores instead of 2. :-)


Yes, I have changed my config to run 2 concurrent 4-core tasks.

Check whether the BOINC client is allowed to use enough resources (RAM in this case) to satisfy all tasks.
This should avoid the repeated suspend/resume cycles.
In addition, you may reduce the number of concurrently running tasks until you get results that produce HITS files.
Then increase the number of concurrently running tasks again in small steps.


What is a HITS file? I will check my future logs to see if they are running properly.
RAM settings were 90% x 16 GB = 14.4 GB (I increased this BOINC setting to use 100% of RAM).
Is 16 GB RAM enough to run 2 concurrent 4-core tasks + 1 SETI GPU task?
Or should I upgrade my machine to 32 GB?
ID: 39880
maeax

Message 39881 - Posted: 9 Sep 2019, 19:41:47 UTC

Filipe,
I have the same Ryzen 2700, running two ATLAS tasks and using 6 CPUs, but with 32 GB of RAM.
You are running 4 ATLAS tasks with only 16 GB. That is too low.
ID: 39881
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Message 39882 - Posted: 9 Sep 2019, 19:54:28 UTC - in response to Message 39880.  

What is a HITS file?

It contains the scientific result that will be uploaded to the project server.


RAM settings were 90% x 16 GB = 14.4 GB (I increased this BOINC setting to use 100% of RAM).
Is 16 GB RAM enough to run 2 concurrent 4-core tasks + 1 SETI GPU task?

A 4-core VM requires 6600 MB RAM (3000 + 900 * #cores).
16 GB should be enough to run 2 of them + 1 GPU task.
This should be a good configuration to start with as long as no tasks from other projects are running.
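
The arithmetic behind that estimate can be written down as a small sketch. The formula (3000 + 900 * #cores, in MB) is the one given above. Everything else is an assumption for illustration: the function names are invented, 16 GB is taken as 16384 MB, and the memory needed by the GPU task and by the operating system is not modelled.

```python
def atlas_vm_ram_mb(cores):
    """RAM for one ATLAS VM in MB, using the formula quoted above."""
    return 3000 + 900 * cores


def total_ram_mb(tasks, cores_per_task):
    """RAM in MB for several identical ATLAS VMs running concurrently."""
    return tasks * atlas_vm_ram_mb(cores_per_task)


def fits(tasks, cores_per_task, host_ram_mb, boinc_ram_fraction):
    """True if the VMs fit into the share of RAM that BOINC may use.

    boinc_ram_fraction is the BOINC memory setting as a fraction,
    for example 0.9 for 90%. GPU task and OS overhead are ignored here.
    """
    return total_ram_mb(tasks, cores_per_task) <= host_ram_mb * boinc_ram_fraction


if __name__ == "__main__":
    host_ram_mb = 16 * 1024  # assumption: 16 GB counted as 16384 MB
    print(atlas_vm_ram_mb(4))             # 6600
    print(total_ram_mb(2, 4))             # 13200
    print(fits(2, 4, host_ram_mb, 0.9))   # True
    print(fits(3, 4, host_ram_mb, 1.0))   # False
```

With two 4-core VMs the remaining headroom is small, so anything else running on the host (the GPU task, the OS, tasks from other projects) has to fit into what is left.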
ID: 39882
Filipe

Message 39883 - Posted: 9 Sep 2019, 20:47:48 UTC - in response to Message 39882.  
Last modified: 9 Sep 2019, 20:48:21 UTC

What is a HITS file?
It contains the scientific result that will be uploaded to the project server.


Should every task produce a HITS file? Or does it depend on what has been calculated/found?
ID: 39883
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Message 39884 - Posted: 9 Sep 2019, 21:02:31 UTC - in response to Message 39883.  

Every task downloads an EVNT file that contains 200 events, converts them and stores the results in the HITS file.
As the events are independent of each other, ATLAS can be configured to process them using concurrently running threads (n-core setup).
If for whatever reason the task doesn't produce that HITS file, the events have to be rescheduled.
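
Why independent events parallelise so easily can be shown with a toy example. This is not ATLAS code: `simulate_event` is a hypothetical stand-in for the real per-event conversion, and a Python thread pool is used only to illustrate the idea of an n-core setup.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_event(event_id):
    """Hypothetical placeholder for converting one event; not the real simulation."""
    return {"event": event_id, "hits": event_id * 2}


def process_events(event_ids, n_workers):
    """Process independent events on n_workers threads, keeping the input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # No event needs the result of another one, so they can be handed
        # to the workers in any order; map() still returns them in input order.
        return list(pool.map(simulate_event, event_ids))


if __name__ == "__main__":
    results = process_events(range(200), n_workers=4)
    print(len(results))
```

Because no event depends on another, the result is the same whether 1 or 4 workers are used; only the walltime changes.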
ID: 39884
Toby Broom
Volunteer moderator

Message 39886 - Posted: 10 Sep 2019, 6:01:25 UTC

On one of my computers the tasks are all at 100% but still running?
ID: 39886
maeax

Message 39887 - Posted: 10 Sep 2019, 6:17:37 UTC - in response to Message 39886.  
Last modified: 10 Sep 2019, 6:23:40 UTC

In BOINC Manager, use "Show graphics" (Your Contribution - Your Job) or "Show RDP Console" (ALT+F2) to see what the task is doing.
There are very long runners at the moment (more than 24 hours for 4 cores).

Edit: each collision takes about 1000 s, and there are 200 to process.
ID: 39887
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Message 39889 - Posted: 10 Sep 2019, 6:21:21 UTC - in response to Message 39886.  

Usually nothing to worry about.
See David Cameron's comment:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5135&postid=39837

It's just the BOINC client that gets confused when the tasks have different runtimes.
ID: 39889