Message boards : Number crunching : Stuck at 100%

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 321
Credit: 44,448,633
RAC: 5,353
Message 32232 - Posted: 4 Sep 2017, 16:20:14 UTC - in response to Message 32230.  

> The server status shows the number of Atlas tasks ready to send is fluctuating, so I assume they're getting sent out to somebody. I tried two of my computers and neither would take Atlas tasks, but they both managed to take various other subproject tasks. Very strange.

Yeah, the scheduler seems to have consumed some drugs. It is not acting rationally at the moment.

On some of my PCs I get Atlas tasks immediately when they ask for them; other machines get none. And these machines have already crunched hundreds of Atlas tasks, so they shouldn't be the source of the problem.


Supporting BOINC, a great concept !
ID: 32232

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32233 - Posted: 4 Sep 2017, 16:40:10 UTC - in response to Message 32232.  

>> The server status shows the number of Atlas tasks ready to send is fluctuating, so I assume they're getting sent out to somebody. I tried two of my computers and neither would take Atlas tasks, but they both managed to take various other subproject tasks. Very strange.

> Yeah, the scheduler seems to have consumed some drugs. It is not acting rationally at the moment.

> On some of my PCs I get Atlas tasks immediately when they ask for them; other machines get none. And these machines have already crunched hundreds of Atlas tasks, so they shouldn't be the source of the problem.


Ok thanks, I'll test the Atlas on my questionable PC in a few days once things have calmed down.
ID: 32233

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32247 - Posted: 5 Sep 2017, 13:23:01 UTC

Update: I've again told all 4 of my computers to run only Atlas tasks (now that the server is willing to hand them out); nothing else from LHC or any other project is allowed to run. None of them are doing any other work (except one, which just has a browser and email open).

Red: running 2 core Atlas on a 4 core i5 3570K (I leave 2 cores free so I can use the machine for other things and run GPU tasks, which are disabled for this test).

Open: running 2 core Atlas on a 2 core Celeron G1620.

Green: running 2 core Atlas on a 2 core Pentium G3420.

Black: running 4 core Atlas on a 4 core Core 2 Quad Q8400.

All are showing the percentage increasing steadily, and showing a CPU time close to the elapsed time.

I'll keep an eye on them and see if they complete successfully.
ID: 32247

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32248 - Posted: 5 Sep 2017, 13:24:14 UTC

Update: one computer finished a task unexpectedly quickly. The percentages were all low (less than 5%) and the next time I looked, this one was being reported (it doesn't say it failed):

Computer: Green
Project LHC@home

Name xUKKDmQZF9qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmHodKDmwBjjXm_0

Application ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas)
Workunit name xUKKDmQZF9qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmHodKDmwBjjXm
State Ready to report
Received Tue 05/09/2017 1:59:26 pm
Report deadline Tue 12/09/2017 1:59:41 pm
Estimated app speed 0.93 GFLOPs/sec
Estimated task size 16,020 GFLOPs
Resources 2 CPUs
CPU time at last checkpoint 00:00:00
CPU time 00:11:48
Elapsed time 00:15:58
Estimated time remaining 00:00:00
Fraction done 100%
Virtual memory size 0.00 MB
Working set size 0.00 MB
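
A rough sanity check on those numbers, as a minimal Python sketch: dividing the estimated task size by the estimated app speed gives the runtime the client was expecting, which can then be compared with the elapsed time actually reported. The function names are made up for illustration, and the client's estimate is only approximate, so treat a large gap as a hint and not as proof that the task ended early.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string, as shown in the task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def expected_runtime_seconds(task_size_gflops: float, app_speed_gflops_per_sec: float) -> float:
    """Runtime implied by the estimated task size and the estimated app speed."""
    return task_size_gflops / app_speed_gflops_per_sec


# Figures copied from the task on Green.
expected = expected_runtime_seconds(16020, 0.93)
elapsed = hms_to_seconds("00:15:58")

print(f"expected about {expected / 3600:.1f} h, finished after {elapsed / 60:.1f} min")
print(f"elapsed / expected = {elapsed / expected:.2f}")
```

For Green this works out to several hours expected against roughly 16 minutes elapsed.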
ID: 32248

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32249 - Posted: 5 Sep 2017, 13:25:14 UTC
Last modified: 5 Sep 2017, 13:27:23 UTC

And the other three finished just as quickly:

Computer: Open
Project LHC@home

Name BP8MDmpxM9qnDDn7oo6G73TpABFKDmABFKDmS7FKDmABFKDm3WyTcm_2

Application ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas)
Workunit name BP8MDmpxM9qnDDn7oo6G73TpABFKDmABFKDmS7FKDmABFKDm3WyTcm
State Ready to report
Received Tue 05/09/2017 1:59:30 pm
Report deadline Tue 12/09/2017 1:59:41 pm
Estimated app speed 0.75 GFLOPs/sec
Estimated task size 16,020 GFLOPs
Resources 2 CPUs
CPU time at last checkpoint 00:00:00
CPU time 00:17:28
Elapsed time 00:20:01
Estimated time remaining 00:00:00
Fraction done 100%
Virtual memory size 0.00 MB
Working set size 0.00 MB

Computer: Red
Project LHC@home

Name 7orNDmX0P9qnDDn7oo6G73TpABFKDmABFKDmiCKKDmABFKDmS0xgNo_1

Application ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas)
Workunit name 7orNDmX0P9qnDDn7oo6G73TpABFKDmABFKDmiCKKDmABFKDmS0xgNo
State Ready to report
Received Tue 05/09/2017 1:59:39 pm
Report deadline Tue 12/09/2017 1:59:41 pm
Estimated app speed 1.05 GFLOPs/sec
Estimated task size 16,020 GFLOPs
Resources 2 CPUs
CPU time at last checkpoint 00:00:00
CPU time 00:04:14
Elapsed time 00:12:38
Estimated time remaining 00:00:00
Fraction done 100%
Virtual memory size 0.00 MB
Working set size 0.00 MB


Computer: Black
Project LHC@home

Name vYSMDmWZF9qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmModKDmoldlNm_0

Application ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas)
Workunit name vYSMDmWZF9qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmModKDmoldlNm
State Ready to report
Received Tue 05/09/2017 1:59:31 pm
Report deadline Tue 12/09/2017 1:59:44 pm
Estimated app speed 1.46 GFLOPs/sec
Estimated task size 16,020 GFLOPs
Resources 4 CPUs
CPU time at last checkpoint 00:00:00
CPU time 00:12:06
Elapsed time 00:10:06
Estimated time remaining 00:00:00
Fraction done 100%
Virtual memory size 0.00 MB
Working set size 0.00 MB
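
One more way to read these figures, sketched in Python under the assumption that the reported CPU time is summed across all cores of the VM: CPU time divided by (elapsed time × number of CPUs) gives a rough utilization figure. The helper names and the use of this ratio as a health check are illustrative assumptions, not an official diagnostic.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_utilization(cpu_time: str, elapsed_time: str, ncpus: int) -> float:
    """Fraction of the available core-seconds that was actually used."""
    return hms_to_seconds(cpu_time) / (hms_to_seconds(elapsed_time) * ncpus)


# (CPU time, elapsed time, CPUs) copied from the four task reports in this thread.
tasks = {
    "Green": ("00:11:48", "00:15:58", 2),
    "Open": ("00:17:28", "00:20:01", 2),
    "Red": ("00:04:14", "00:12:38", 2),
    "Black": ("00:12:06", "00:10:06", 4),
}

utilization = {name: cpu_utilization(*values) for name, values in tasks.items()}

for name, value in utilization.items():
    print(f"{name}: {value:.0%} of available CPU used")
```

All four tasks come out well below full utilization, which would be consistent with tasks that stopped shortly after start-up instead of simulating for hours.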
ID: 32249

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32250 - Posted: 5 Sep 2017, 13:31:00 UTC
Last modified: 5 Sep 2017, 13:31:11 UTC

3 of the 4 show validate error on the server, 1 shows ok. Any ideas? Is something wrong at my end?
ID: 32250

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 321
Credit: 44,448,633
RAC: 5,353
Message 32251 - Posted: 5 Sep 2017, 13:38:44 UTC - in response to Message 32250.  

> 3 of the 4 show validate error on the server, 1 shows ok. Any ideas? Is something wrong at my end?

As far as I can see, you are trying to run 2-core WUs.

At the moment, 2-core WUs are misconfigured "out of the box": they have too little memory configured.

You can fix this in one of the following ways:

1) Increase the number of cores to 3
2) Decrease the number of cores to 1
3) Keep the number of cores at 2, but use an app_config.xml to raise the memory to the required amount.

Perhaps you can find more info in this thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4378


Supporting BOINC, a great concept !
ID: 32251

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32252 - Posted: 5 Sep 2017, 13:45:22 UTC

I would like to increase the memory, not change the cores. Two machines only have two cores. One of the quad cores needs some cores free for other tasks. The other quad core is running 4 core tasks (but the first of those also failed - a different problem?).

Can I just change the memory in the BOINC settings or do I have to use the xml file? If so how and to what value? The computer called Red has 32GB physical RAM, the others all have 8GB.
ID: 32252

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 321
Credit: 44,448,633
RAC: 5,353
Message 32253 - Posted: 5 Sep 2017, 13:53:03 UTC

You have to put the app_config.xml into your project directory and then tell the BOINC client to "Read config files".

<app_config>
  <app>
    <name>ATLAS</name>
    <fraction_done_exact/>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <version_num>100</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>2.000000</avg_ncpus>
    <max_ncpus>2.000000</max_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <api_version>7.7.0</api_version>
    <cmdline>--memory_size_mb 5000</cmdline>
    <dont_throttle/>
    <is_wrapper/>
    <needs_network/>
  </app_version>
</app_config>
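
If you need the same file with a different core count or memory size, a small Python sketch like the following can generate it. It simply mirrors the elements from the file above; `build_app_config` and its parameters are hypothetical names, and nothing here checks which of these elements the BOINC client actually honours.

```python
import xml.etree.ElementTree as ET


def build_app_config(ncpus: int, memory_mb: int) -> str:
    """Return app_config.xml text with the same elements as the file in this post."""
    root = ET.Element("app_config")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "ATLAS"
    ET.SubElement(app, "fraction_done_exact")
    ET.SubElement(app, "max_concurrent").text = "1"

    version = ET.SubElement(root, "app_version")
    fields = [
        ("app_name", "ATLAS"),
        ("version_num", "100"),
        ("platform", "windows_x86_64"),
        ("avg_ncpus", f"{ncpus:.6f}"),
        ("max_ncpus", f"{ncpus:.6f}"),
        ("plan_class", "vbox64_mt_mcore_atlas"),
        ("api_version", "7.7.0"),
        ("cmdline", f"--memory_size_mb {memory_mb}"),
    ]
    for tag, text in fields:
        ET.SubElement(version, tag).text = text
    # Empty flag elements, copied as-is from the original file.
    for flag in ("dont_throttle", "is_wrapper", "needs_network"):
        ET.SubElement(version, flag)

    # ET.indent only exists on Python 3.9+; the output is valid XML either way.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(build_app_config(ncpus=2, memory_mb=5000))
```

Save the output as app_config.xml in the project directory, then have the client re-read its config files as described above.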


Supporting BOINC, a great concept !
ID: 32253

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32255 - Posted: 5 Sep 2017, 14:09:21 UTC - in response to Message 32253.  

I'm doing this for Red (4 core PC with only 2 cores active), and Green & Open (both 2 core PCs). I got the message "found app_config.xml" - does this mean it worked? Do I have to restart the BOINC client?

For Black (4 core PC), what do you suggest? It's currently running 4 core Atlas tasks with 8GB of physical RAM, and they are failing.
ID: 32255

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 321
Credit: 44,448,633
RAC: 5,353
Message 32256 - Posted: 5 Sep 2017, 14:12:53 UTC - in response to Message 32255.  
Last modified: 5 Sep 2017, 14:13:18 UTC

> I'm doing this for Red (4 core PC with only 2 cores active), and Green & Open (both 2 core PCs). I got the message "found app_config.xml" - does this mean it worked?

If no error message follows it, it should be fine.

> Do I have to restart the BOINC client?

Not really, but the change will only affect newly downloaded WUs.

> For Black (4 core PC), what do you suggest? It's currently running 4 core Atlas tasks with 8GB of physical RAM, and they are failing.

Try a 3-core WU.


Supporting BOINC, a great concept !
ID: 32256

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32257 - Posted: 5 Sep 2017, 14:18:01 UTC - in response to Message 32256.  
Last modified: 5 Sep 2017, 14:19:08 UTC

The failures for Black (4 core PC) were before I gave it the extra memory instruction. Should that help for 4 core WUs as well? It's just you said the bug was in 2 core WUs. Is 5GB the correct RAM for a 4 core task?
ID: 32257

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 321
Credit: 44,448,633
RAC: 5,353
Message 32258 - Posted: 5 Sep 2017, 14:26:50 UTC - in response to Message 32257.  

> Is 5GB the correct RAM for a 4 core task?

The memory formula is: 2.6 GB + 0.8 GB * number of cores

By that formula you need 5.8 GB for a 4-core WU. A 3-core WU would be ideal.
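
The formula is easy to tabulate. Here is a minimal Python sketch; it works in whole MB and assumes 1 GB = 1000 MB, which matches how this thread treats the 5000 passed to --memory_size_mb as 5 GB. The function name is hypothetical.

```python
BASE_MB = 2600      # the fixed 2.6 GB part of the formula
PER_CORE_MB = 800   # the 0.8 GB added per core


def atlas_memory_mb(ncores: int) -> int:
    """Memory in MB suggested by the formula quoted above."""
    if ncores < 1:
        raise ValueError("ncores must be at least 1")
    return BASE_MB + PER_CORE_MB * ncores


for ncores in (1, 2, 3, 4):
    print(f"{ncores} core(s): {atlas_memory_mb(ncores)} MB")
```

This gives 4200 MB for 2 cores, 5000 MB for 3 cores and 5800 MB for 4 cores, matching the 4.2 GB and 5.8 GB figures quoted in this thread.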


Supporting BOINC, a great concept !
ID: 32258

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32259 - Posted: 5 Sep 2017, 14:32:56 UTC - in response to Message 32258.  

They were already being automatically allocated 4.2GB (the correct amount from the formula) for the 2 core units, and 5.8GB for the 4 core units.

I thought they needed extra as the config file you gave me was 5GB, not 4.2GB.

Should I try giving the 4 core PC extra, something like 7GB? Or would this screw it up as there's only 8GB physical RAM?
ID: 32259

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 321
Credit: 44,448,633
RAC: 5,353
Message 32261 - Posted: 5 Sep 2017, 14:40:57 UTC - in response to Message 32259.  

> Should I try giving the 4 core PC extra, something like 7GB? Or would this screw it up as there's only 8GB physical RAM?

That will never work; it will only bog down your machine and all its Atlas tasks.


Supporting BOINC, a great concept !
ID: 32261

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32263 - Posted: 5 Sep 2017, 14:45:46 UTC - in response to Message 32261.  
Last modified: 5 Sep 2017, 14:47:11 UTC

Ok, so I have set the 2 core machines to use 5GB as per your file.

I'll set the 4 core machine to do 3 core tasks. What memory would you advise?
ID: 32263

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 321
Credit: 44,448,633
RAC: 5,353
Message 32265 - Posted: 5 Sep 2017, 14:47:53 UTC - in response to Message 32263.  

> Ok, so I have set the 2 core machines to use 5GB as per your file.

> I'll set the 4 core machine to do 3 core tasks. What memory would you advise?

5 GB should be fine


Supporting BOINC, a great concept !
ID: 32265

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32267 - Posted: 5 Sep 2017, 14:53:11 UTC - in response to Message 32265.  

Done, thanks for your help. I'll run Atlas on everything for the rest of the day and report back later....
ID: 32267

thomasroderick
Joined: 22 May 17
Posts: 13
Credit: 476,708
RAC: 663
Message 32272 - Posted: 5 Sep 2017, 15:27:00 UTC - in response to Message 32192.  

I have the exact same issue with a single Atlas task, just posted something in the Atlas page about this. Every other task is running / completing as normal. This one was fine up to about 97% complete, then got to 100% over the next 43 hours. It is sitting at 100% after 45 elapsed hours. Task: 154132448, WU: 73907132. Doing some troubleshooting on it now, the deadline is in 12 hours.
ID: 32272

Peter Hucker
Joined: 12 Aug 06
Posts: 43
Credit: 80,111
RAC: 0
Message 32284 - Posted: 5 Sep 2017, 18:18:41 UTC

The three machines running 2 core WUs are now getting valid results, and taking a few hours instead of 15 minutes.

Waiting for the three core machine to finish one....
ID: 32284

©2018 CERN