Stuck at 100%
Author | Message |
---|---|
Joined: 2 Sep 04 Posts: 444 Credit: 165,139,264 RAC: 212,416 |
Quote: "The server status shows the number of Atlas tasks ready to send is fluctuating, so I assume they're getting sent out to somebody. I tried two of my computers and neither would take Atlas tasks, but they both managed to take various other subproject tasks. Very strange." Yeah, the scheduler seems to have consumed some drugs; it is not acting rationally at the moment. Some of my PCs get Atlas tasks immediately when they ask for them; other machines get none. And these machines have already crunched hundreds of Atlas tasks, so they shouldn't be the origin of the problem. Supporting BOINC, a great concept ! |
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
Quote: "The server status shows the number of Atlas tasks ready to send is fluctuating, so I assume they're getting sent out to somebody. I tried two of my computers and neither would take Atlas tasks, but they both managed to take various other subproject tasks. Very strange." OK, thanks. I'll test Atlas on my questionable PC in a few days, once things have calmed down. |
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
Update: now that the server is willing to hand them out, I've again set all 4 of my computers to run only Atlas tasks; nothing else from LHC or any other project is allowed to run. None of them is working on anything else (except one, which just has a browser and email open). Red: running 2-core Atlas on a 4-core i5 3570K (I leave 2 cores free so I can use the machine for other things and run GPU tasks, which are disabled for this test). Open: running 2-core Atlas on a 2-core Celeron G1620. Green: running 2-core Atlas on a 2-core Pentium G3420. Black: running 4-core Atlas on a 4-core Core 2 Quad Q8400. All of them show the percentage increasing steadily and a CPU time close to the elapsed time. I'll keep an eye on them and see if they complete successfully. |
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
Update: one computer finished a task unexpectedly quickly. The percentages were all low (less than 5%), and the next time I looked, this one was ready to report (it doesn't say it failed): Computer: Green; Project: LHC@home; Name: xUKKDmQZF9qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmHodKDmwBjjXm_0; Application: ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas); Workunit name: xUKKDmQZF9qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmHodKDmwBjjXm; State: Ready to report; Received: Tue 05/09/2017 1:59:26 pm; Report deadline: Tue 12/09/2017 1:59:41 pm; Estimated app speed: 0.93 GFLOPs/sec; Estimated task size: 16,020 GFLOPs; Resources: 2 CPUs; CPU time at last checkpoint: 00:00:00; CPU time: 00:11:48; Elapsed time: 00:15:58; Estimated time remaining: 00:00:00; Fraction done: 100%; Virtual memory size: 0.00 MB; Working set size: 0.00 MB |
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
The other three also finished quickly: Computer: Open; Project: LHC@home; Name: BP8MDmpxM9qnDDn7oo6G73TpABFKDmABFKDmS7FKDmABFKDm3WyTcm_2; Application: ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas); Workunit name: BP8MDmpxM9qnDDn7oo6G73TpABFKDmABFKDmS7FKDmABFKDm3WyTcm; State: Ready to report; Received: Tue 05/09/2017 1:59:30 pm; Report deadline: Tue 12/09/2017 1:59:41 pm; Estimated app speed: 0.75 GFLOPs/sec; Estimated task size: 16,020 GFLOPs; Resources: 2 CPUs; CPU time at last checkpoint: 00:00:00; CPU time: 00:17:28; Elapsed time: 00:20:01; Estimated time remaining: 00:00:00; Fraction done: 100%; Virtual memory size: 0.00 MB; Working set size: 0.00 MB. Computer: Red; Project: LHC@home; Name: 7orNDmX0P9qnDDn7oo6G73TpABFKDmABFKDmiCKKDmABFKDmS0xgNo_1; Application: ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas); Workunit name: 7orNDmX0P9qnDDn7oo6G73TpABFKDmABFKDmiCKKDmABFKDmS0xgNo; State: Ready to report; Received: Tue 05/09/2017 1:59:39 pm; Report deadline: Tue 12/09/2017 1:59:41 pm; Estimated app speed: 1.05 GFLOPs/sec; Estimated task size: 16,020 GFLOPs; Resources: 2 CPUs; CPU time at last checkpoint: 00:00:00; CPU time: 00:04:14; Elapsed time: 00:12:38; Estimated time remaining: 00:00:00; Fraction done: 100%; Virtual memory size: 0.00 MB; Working set size: 0.00 MB. Computer: Black; Project: LHC@home; Name: vYSMDmWZF9qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmModKDmoldlNm_0; Application: ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas); Workunit name: vYSMDmWZF9qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmModKDmoldlNm; State: Ready to report; Received: Tue 05/09/2017 1:59:31 pm; Report deadline: Tue 12/09/2017 1:59:44 pm; Estimated app speed: 1.46 GFLOPs/sec; Estimated task size: 16,020 GFLOPs; Resources: 4 CPUs; CPU time at last checkpoint: 00:00:00; CPU time: 00:12:06; Elapsed time: 00:10:06; Estimated time remaining: 00:00:00; Fraction done: 100%; Virtual memory size: 0.00 MB; Working set size: 0.00 MB |
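A quick way to put these four reports side by side: a task that keeps all of its reserved cores busy would accumulate CPU time close to elapsed time × number of CPUs. The minimal Python sketch below computes that ratio from the figures quoted in the task reports. The ratio (the helper `core_utilisation`) is a hypothetical yardstick for illustration, not a statistic that BOINC reports.

```python
# Minimal sketch: how much of its reserved core-time did each task use?
# core_utilisation() is a hypothetical helper, not a BOINC statistic.


def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string from the BOINC task details to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def core_utilisation(cpu_time: str, elapsed: str, cpus: int) -> float:
    """CPU time divided by (elapsed time x reserved CPUs); 1.0 means every core was busy."""
    return to_seconds(cpu_time) / (to_seconds(elapsed) * cpus)


# (computer, CPU time, elapsed time, CPUs), copied from the task reports above
REPORTS = [
    ("Green", "00:11:48", "00:15:58", 2),
    ("Open", "00:17:28", "00:20:01", 2),
    ("Red", "00:04:14", "00:12:38", 2),
    ("Black", "00:12:06", "00:10:06", 4),
]

if __name__ == "__main__":
    for name, cpu_time, elapsed, cpus in REPORTS:
        share = core_utilisation(cpu_time, elapsed, cpus)
        print(f"{name}: {share:.0%} of {cpus} reserved cores")
```

With these inputs it prints roughly 37% for Green, 44% for Open, 17% for Red and 30% for Black, so none of the four used even half of its reserved core-time before showing 100%.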
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
Three of the four show a validate error on the server; one shows OK. Any ideas? Is something wrong at my end? |
Joined: 2 Sep 04 Posts: 444 Credit: 165,139,264 RAC: 212,416 |
Quote: "3 of the 4 show validate error on the server, 1 shows ok. Any ideas? Is something wrong at my end?" As far as I can see, you are trying to run 2-core WUs. At the moment, 2-core WUs are misconfigured "out of the box": they are configured with too little memory. You can fix this in one of the following ways: 1) increase the number of cores to 3; 2) decrease the number of cores to 1; 3) keep 2 cores, but use an app_config.xml to raise the memory to the required amount. You may find more info in this thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4378 |
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
I would like to increase the memory, not change the number of cores. Two machines only have two cores. One of the quad cores needs some cores free for other tasks. The other quad core is running 4-core tasks (but the first of those also failed; a different problem?). Can I just change the memory in the BOINC settings, or do I have to use the XML file? If so, how, and to what value? The computer called Red has 32 GB of physical RAM; the others all have 8 GB. |
Joined: 2 Sep 04 Posts: 444 Credit: 165,139,264 RAC: 212,416 |
You have to put the app_config.xml into your project directory and then tell the BOINC client to "read config files":

```xml
<app_config>
  <app>
    <name>ATLAS</name>
    <fraction_done_exact/>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <version_num>100</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>2.000000</avg_ncpus>
    <max_ncpus>2.000000</max_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <api_version>7.7.0</api_version>
    <cmdline>--memory_size_mb 5000</cmdline>
    <dont_throttle/>
    <is_wrapper/>
    <needs_network/>
  </app_version>
</app_config>
```
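If several machines need different core counts or memory sizes, a small script can write the file for each of them. The minimal Python sketch below assumes you want the file posted above and only substitutes the two numbers that differ between machines: the CPU count (`avg_ncpus`/`max_ncpus`) and the `--memory_size_mb` value. Everything else, including `version_num` 100 and the `windows_x86_64` platform, is carried over unchanged, so compare those with your own setup. The function name `atlas_app_config` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Template copied from the app_config.xml posted above; only the CPU count
# and the VM memory are filled in. Other values are carried over unchanged.
TEMPLATE = """<app_config>
  <app>
    <name>ATLAS</name>
    <fraction_done_exact/>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <version_num>100</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>{ncpus:.6f}</avg_ncpus>
    <max_ncpus>{ncpus:.6f}</max_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <api_version>7.7.0</api_version>
    <cmdline>--memory_size_mb {memory_mb}</cmdline>
    <dont_throttle/>
    <is_wrapper/>
    <needs_network/>
  </app_version>
</app_config>
"""


def atlas_app_config(ncpus: int, memory_mb: int) -> str:
    """Return app_config.xml text for an ATLAS task with the given cores and VM memory (MB)."""
    if ncpus < 1 or memory_mb <= 0:
        raise ValueError("ncpus must be >= 1 and memory_mb must be positive")
    text = TEMPLATE.format(ncpus=ncpus, memory_mb=memory_mb)
    ET.fromstring(text)  # sanity check: raises ParseError if the XML is malformed
    return text


if __name__ == "__main__":
    # Same values as the posted file: 2 cores, 5000 MB
    print(atlas_app_config(2, 5000))
```

A plain string template is used rather than building the tree with ElementTree and serialising it, because ElementTree writes empty elements as `<tag />`; the template keeps the `<tag/>` form exactly as posted.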
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
I'm doing this for Red (4-core PC with only 2 cores active), and for Green and Open (both 2-core PCs). I got the message "found app_config.xml"; does this mean it worked? Do I have to restart the BOINC client? For Black (4-core PC), what do you suggest? It's currently running 4-core Atlas tasks with 8 GB of physical RAM, and they are failing. |
Joined: 2 Sep 04 Posts: 444 Credit: 165,139,264 RAC: 212,416 |
Quote: "I'm doing this for Red (4 core PC with only 2 cores active), and Green & Open (both 2 core PCs). I got the message 'found app_config.xml' - does this mean it worked?" If there is no error message after it, it should be fine. Quote: "Do I have to restart the BOINC client?" Not really, but it will only affect WUs that have been newly downloaded. Quote: "For Black (4 core PC), what do you suggest? It's currently running 4 core Atlas tasks with 8GB of physical RAM, and they are failing." Try a 3-core WU. |
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
The failures for Black (4-core PC) happened before I gave it the extra memory instruction. Should that help for 4-core WUs as well? It's just that you said the bug was in 2-core WUs. Is 5 GB the correct amount of RAM for a 4-core task? |
Joined: 2 Sep 04 Posts: 444 Credit: 165,139,264 RAC: 212,416 |
|
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
They were already being automatically allocated 4.2 GB (the correct amount from the formula) for the 2-core units, and 5.8 GB for the 4-core units. I thought they needed extra, since the config file you gave me specified 5 GB, not 4.2 GB. Should I try giving the 4-core PC extra, something like 7 GB? Or would that cause problems, since there's only 8 GB of physical RAM? |
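The thread quotes only two values from the memory formula: 4.2 GB for 2-core units and 5.8 GB for 4-core units. Assuming the formula is linear in the core count (an assumption, not something stated here), those two points fix it as 2600 MB plus 800 MB per core. The Python sketch below derives that and also shows the plain arithmetic behind the 8 GB question; treating 8 GB as 8192 MB is a further assumption, and the helper names are hypothetical.

```python
# Minimal sketch, assuming ATLAS VM memory grows linearly with the core count.
# The two data points are the defaults quoted in the post above.
KNOWN_DEFAULTS_MB = {2: 4200, 4: 5800}


def fit_linear(points: dict) -> tuple:
    """Return (base_mb, per_core_mb) for the line through exactly two (cores, MB) points."""
    (c1, m1), (c2, m2) = sorted(points.items())
    per_core = (m2 - m1) / (c2 - c1)
    return m1 - per_core * c1, per_core


BASE_MB, PER_CORE_MB = fit_linear(KNOWN_DEFAULTS_MB)


def atlas_memory_mb(cores: int) -> int:
    """Default VM memory (MB) predicted by the assumed linear formula."""
    return round(BASE_MB + PER_CORE_MB * cores)


def headroom_mb(physical_mb: int, vm_mb: int) -> int:
    """Physical RAM left for the host OS and everything else once the VM has its share."""
    return physical_mb - vm_mb


if __name__ == "__main__":
    print(f"base {BASE_MB:.0f} MB + {PER_CORE_MB:.0f} MB per core")
    for cores in (1, 2, 3, 4):
        print(f"{cores} core(s): {atlas_memory_mb(cores)} MB")
    # The "something like 7GB" question, on an 8 GB (assumed 8192 MB) machine
    print(f"left over with a 7 GB VM: {headroom_mb(8 * 1024, 7 * 1024)} MB")
```

Under that assumption it gives 3400, 4200, 5000 and 5800 MB for 1 to 4 cores, and 1024 MB left over if a 7 GB VM runs on an 8 GB machine. The 5000 MB in the posted app_config.xml is 800 MB above the 4200 MB default for 2 cores.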
Joined: 2 Sep 04 Posts: 444 Credit: 165,139,264 RAC: 212,416 |
|
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
OK, so I have set the 2-core machines to use 5 GB as per your file. I'll set the 4-core machine to run 3-core tasks. What memory would you advise? |
Joined: 2 Sep 04 Posts: 444 Credit: 165,139,264 RAC: 212,416 |
|
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
Done, thanks for your help. I'll run Atlas on everything for the rest of the day and report back later. |
Joined: 22 May 17 Posts: 15 Credit: 1,111,237 RAC: 1,080 |
I have exactly the same issue with a single Atlas task; I've just posted about it on the Atlas page. Every other task is running or completing as normal. This one was fine up to about 97% complete, then took the next 43 hours to creep up to 100%. It is now sitting at 100% after 45 hours elapsed. Task: 154132448, WU: 73907132. I'm doing some troubleshooting on it now; the deadline is in 12 hours.
Joined: 12 Aug 06 Posts: 334 Credit: 3,022,283 RAC: 181 |
The three machines running 2-core WUs are now getting valid results, taking a few hours instead of 15 minutes. Still waiting for the 3-core machine to finish one. |
©2023 CERN