Message boards : ATLAS application : Curious question

greg_be
Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 40202 - Posted: 19 Oct 2019, 0:04:30 UTC

How can the app run for over 2 days, use only 0.8 seconds of CPU time, then get stuck at 99.997% showing 5 seconds left and do absolutely nothing? BoincTasks and BOINC Manager show it running, but BoincTasks says there is 0% CPU usage even though it is "running".

This is very illogical.
ID: 40202
greg_be
Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 40204 - Posted: 19 Oct 2019, 7:18:24 UTC - in response to Message 40202.  

I killed the task.
2 days run time, 99.999% done.
0 CPU time.
It was "running", but to me it was stuck or dead.

Error message showed up | LHC@home | [error] garbage_collect(); still have active task for acked result

Whatever that means.
ID: 40204
Jonathan
Joined: 25 Sep 17
Posts: 93
Credit: 3,068,664
RAC: 1,808
Message 40206 - Posted: 19 Oct 2019, 7:46:27 UTC - in response to Message 40204.  

You seem to have memory-related errors on a bunch of your recent ATLAS tasks.

Are you running low on memory on the host OS, or are you hitting a memory limit in your Preferences?
ID: 40206
NOGOOD
Joined: 18 Nov 17
Posts: 119
Credit: 51,286,855
RAC: 20,720
Message 40208 - Posted: 19 Oct 2019, 14:15:34 UTC - in response to Message 40202.  

How can the app run for over 2 days, use only 0.8 seconds of CPU time, then get stuck at 99.997% showing 5 seconds left and do absolutely nothing? BoincTasks and BOINC Manager show it running, but BoincTasks says there is 0% CPU usage even though it is "running".

This is very illogical.


I have had the same situation on all my PCs running ATLAS since yesterday.
ID: 40208
greg_be
Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 40211 - Posted: 19 Oct 2019, 19:13:58 UTC - in response to Message 40206.  
Last modified: 19 Oct 2019, 19:18:48 UTC

Well I had a restriction on. But I freed that up.
I have BOINC set to 98% memory consumption if needed.
Current system wide usage is only 44% of the total system memory which is 24 GB
I also gave it access to 95% of my drive.

10/19/2019 9:10:59 PM | | max memory usage when active: 24036.31 MB
10/19/2019 9:10:59 PM | | max memory usage when idle: 24526.85 MB
10/19/2019 9:10:59 PM | | max disk usage: 157.82 GB

So how the heck can I not have enough memory when I have 24,000 MB available? And if it needs to write to the disk for virtual memory, there are 157 GB of space!


I updated VirtualBox as well today.
It seems that when the tasks pass the 90% mark they bog down.
What is weird is that BOINC is giving them only 0.01% and 0.02% of the CPU time while taking 8 cores.

BoincTasks shows that it moves 0.01% every 6 seconds, or sometimes every minute.
This task seems dead or stalled really badly.

Computer: DESKTOP-LFM92VN
Project LHC@home

Name U0iNDmM9fdvn9Rq4apoT9bVoABFKDmABFKDmBDFaDmABFKDmoPhqUo_1

Application ATLAS Simulation 2.00 (vbox64_mt_mcore_atlas)
Workunit name U0iNDmM9fdvn9Rq4apoT9bVoABFKDmABFKDmBDFaDmABFKDmoPhqUo
State Running High P.
Received 14/10/2019 3:50:46
Report deadline 21/10/2019 3:50:46
Estimated app speed 2,10 GFLOPs/sec
Estimated task size 43.200 GFLOPs
Resources 8 CPUs
CPU time at last checkpoint 00:00:08
CPU time 00:00:08
Elapsed time 01d,06:06:51
Estimated time remaining 00:09:20
Fraction done 99,486%
Virtual memory size 122,71 MB
Working set size 10.200,00 MB
Directory slots/17
Process ID 12688
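
For scale, the figures in the task properties above can be turned into a single CPU efficiency number. This is only a minimal Python sketch, assuming the times are formatted as shown in the listing ("DDd,HH:MM:SS" or "HH:MM:SS"); the function names are made up for illustration and are not part of BOINC or BoincTasks.

```python
import re


def parse_duration(text: str) -> int:
    """Convert 'DDd,HH:MM:SS' or 'HH:MM:SS' into a number of seconds."""
    match = re.fullmatch(r"(?:(\d+)d,)?(\d+):(\d{2}):(\d{2})", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    # The day group is optional, so a missing group counts as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(cpu_time: str, elapsed: str, cores: int) -> float:
    """Fraction of the reserved core-time that was reported as CPU time."""
    wall = parse_duration(elapsed)
    if wall <= 0 or cores <= 0:
        raise ValueError("elapsed time and core count must be positive")
    return parse_duration(cpu_time) / (wall * cores)


if __name__ == "__main__":
    # Values copied from the task properties listed above.
    eff = cpu_efficiency("00:00:08", "01d,06:06:51", cores=8)
    print(f"reported CPU efficiency: {eff:.6%}")
```

With 8 seconds of CPU time over 1 day 6 hours on 8 reserved cores, this comes out at roughly 0.0009%, which is effectively zero. What the reported CPU time actually includes for a VirtualBox task is not confirmed here, so the number only restates the symptom.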


This one is moving along now:
Computer: DESKTOP-LFM92VN
Project LHC@home

Name idLMDmdzudvn9Rq4apoT9bVoABFKDmABFKDmlr4KDmABFKDm7kSBPn_0

Application ATLAS Simulation 2.00 (vbox64_mt_mcore_atlas)
Workunit name idLMDmdzudvn9Rq4apoT9bVoABFKDmABFKDmlr4KDmABFKDm7kSBPn
State Running High P.
Received 14/10/2019 3:50:46
Report deadline 21/10/2019 3:50:46
Estimated app speed 2,10 GFLOPs/sec
Estimated task size 43.200 GFLOPs
Resources 8 CPUs
CPU time at last checkpoint 00:00:06
CPU time 00:00:06
Elapsed time 10:32:56
Estimated time remaining 01:58:37
Fraction done 84,216%
Virtual memory size 121,66 MB
Working set size 10.200,00 MB
Directory slots/3
Process ID 10960

Maybe I should do a reset on LHC after I complete the current work?
In the 5 minutes it took me to copy and paste and write all this the 99% task moved only .02%.
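
The gap between the displayed estimate and the observed progress can be checked with simple arithmetic. The sketch below is only an illustration in Python, assuming progress continues at a constant rate; `eta_seconds` is a hypothetical helper, not something BOINC provides.

```python
def eta_seconds(fraction_done_pct: float, delta_pct: float, delta_seconds: float) -> float:
    """Estimate the remaining time from an observed progress rate.

    fraction_done_pct: current "Fraction done" in percent
    delta_pct:         how many percentage points were gained ...
    delta_seconds:     ... during this many seconds of observation
    """
    if delta_pct <= 0 or delta_seconds <= 0:
        raise ValueError("need a positive progress step and a positive interval")
    remaining_pct = 100.0 - fraction_done_pct
    rate_pct_per_second = delta_pct / delta_seconds
    return remaining_pct / rate_pct_per_second


if __name__ == "__main__":
    # 99.486% done, and 0.02% gained in 5 minutes (figures from the post above).
    eta = eta_seconds(99.486, 0.02, 5 * 60)
    hours, rest = divmod(round(eta), 3600)
    print(f"about {hours} h {rest // 60} min at the observed rate")
```

At the observed rate the task would need a bit over 2 hours to finish, against the 00:09:20 shown as "Estimated time remaining", so the displayed estimate is not a reliable guide here.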
ID: 40211
opencw
Joined: 9 Aug 11
Posts: 6
Credit: 2,715,972
RAC: 0
Message 40212 - Posted: 19 Oct 2019, 19:41:57 UTC - in response to Message 40211.  
Last modified: 19 Oct 2019, 19:45:31 UTC

Hello greg_be,

How many LHC@home tasks do you run simultaneously on the machine? Your log says there is not enough memory: "Unable to allocate and lock memory. The virtual machine will be paused. Please close applications to free up memory or close the VM"
ID: 40212
Jonathan
Joined: 25 Sep 17
Posts: 93
Credit: 3,068,664
RAC: 1,808
Message 40213 - Posted: 19 Oct 2019, 21:49:43 UTC - in response to Message 40212.  

I am not familiar with how VirtualBox uses memory, but I don't think the host OS can page it out, so it requires real RAM. Have you worked through Yeti's checklist in the Number Crunching forum? I would try to get one VirtualBox-related task going and opt out of the others in this project until you get it sorted out. You do need to leave enough memory for the OS. I have mine set to 75% in use and 90% idle with 16 GB installed. You might be able to run two concurrent eight-core ATLAS tasks, but it doesn't leave much room.
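
A rough way to sanity-check that memory budget, as a Python sketch only. It assumes each ATLAS task needs about the 10,200 MB shown as "Working set size" in the task properties earlier in the thread, and that this memory has to fit entirely in RAM; `max_concurrent_tasks` and the OS reserve figure are hypothetical.

```python
def max_concurrent_tasks(allowed_mb: float, per_task_mb: float, reserve_mb: float = 0.0) -> int:
    """How many tasks fit in the memory BOINC is allowed to use.

    allowed_mb:  memory limit from the BOINC preferences, in MB
    per_task_mb: assumed RAM needed by one task, in MB
    reserve_mb:  extra headroom kept free for the host OS (an assumption)
    """
    if per_task_mb <= 0:
        raise ValueError("per-task memory must be positive")
    usable_mb = allowed_mb - reserve_mb
    return max(0, int(usable_mb // per_task_mb))


if __name__ == "__main__":
    per_task = 10200.0  # "Working set size" from the task properties

    # 24 GB host with the limit from the log: 24036.31 MB when active.
    print(max_concurrent_tasks(24036.31, per_task))

    # 16 GB host (taken as 16384 MB) limited to 75% while in use.
    print(max_concurrent_tasks(16384 * 0.75, per_task))
```

Under these assumptions the 24 GB host fits two tasks with about 3.6 GB to spare, and keeping a few GB back for the OS drops it to one, which matches the "doesn't leave much room" remark.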
ID: 40213
greg_be
Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 40214 - Posted: 19 Oct 2019, 23:12:34 UTC - in response to Message 40212.  

Hello greg_be,

How many LHC@home tasks do you run simultaneously on the machine? Your log says there is not enough memory: "Unable to allocate and lock memory. The virtual machine will be paused. Please close applications to free up memory or close the VM"


Memory is NOT an issue. I think BOINC or the task is having issues. That, or the VM is clogged. I noticed a lot of tasks there that are unreachable.

The memory errors you see on the old tasks are apparently from when I had the memory restriction set too tight. You will see I have now freed up just about 100% of the memory to be used.
ID: 40214
greg_be
Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 40215 - Posted: 19 Oct 2019, 23:23:45 UTC
Last modified: 19 Oct 2019, 23:24:51 UTC

I have aborted the one task that again froze up.
There is one ATLAS task running, and it's chugging along for now at 0.001% every 2 seconds according to BoincTasks.
I also removed a bunch of processes that were inaccessible for whatever reason in VirtualBox.

Will let it run for 7-8 hours while I sleep and see what happens.

But how can the task have only .01% CPU capacity and use 8 cores?
Also, the CPU time is always 6 seconds, despite running for 8, 12, 15 or more hours.
How is that possible?

Set the project to no new tasks and will reset it once the tasks are done.
ID: 40215
greg_be
Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 40216 - Posted: 19 Oct 2019, 23:29:18 UTC - in response to Message 40213.  
Last modified: 19 Oct 2019, 23:30:48 UTC

I am not familiar with how VirtualBox uses memory, but I don't think the host OS can page it out, so it requires real RAM. Have you worked through Yeti's checklist in the Number Crunching forum? I would try to get one VirtualBox-related task going and opt out of the others in this project until you get it sorted out. You do need to leave enough memory for the OS. I have mine set to 75% in use and 90% idle with 16 GB installed. You might be able to run two concurrent eight-core ATLAS tasks, but it doesn't leave much room.


I went through the great Yeti's checklist. That was the first thing I did.
This led me to look for the latest VirtualBox and extension pack.
BOINC does not have any new releases, so nothing to do there.
Memory allocation is wide open now (see earlier post), as is HDD space in case it wants to use virtual memory (same earlier post).

Plus CMS runs fine, and it's a VM process. So what is it about ATLAS?

So I don't know what is going on, other than that CPU allocation is 0.01%, CPU time is frozen at 6 seconds, and it's holding 8 cores hostage and going nowhere.
ID: 40216
Jonathan
Joined: 25 Sep 17
Posts: 93
Credit: 3,068,664
RAC: 1,808
Message 40217 - Posted: 20 Oct 2019, 2:52:03 UTC - in response to Message 40216.  

I guess, just turn off ATLAS for a few days. Maybe something is messed up and it has to wait until it gets looked at early next week. Are the other tasks running okay for you? Is it just ATLAS being a problem?
ID: 40217
greg_be
Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 40220 - Posted: 20 Oct 2019, 8:18:39 UTC - in response to Message 40217.  

I guess, just turn off ATLAS for a few days. Maybe something is messed up and it has to wait until it gets looked at early next week. Are the other tasks running okay for you? Is it just ATLAS being a problem?


Just ATLAS. It bogs down big time in the last 10%.
In 8 hrs it has not finished much more than at the last post.
Now it's a 0.001% advance every 6-12 seconds, 23:3x run time, 6 seconds CPU time, 48% memory usage.
It needs 23 minutes (whatever that is in real time) to complete.

As I said earlier, CMS runs, Theory runs...just ATLAS lately craps out.
ID: 40220
greg_be
Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 40221 - Posted: 20 Oct 2019, 10:11:24 UTC - in response to Message 40220.  

I guess, just turn off ATLAS for a few days. Maybe something is messed up and it has to wait until it gets looked at early next week. Are the other tasks running okay for you? Is it just ATLAS being a problem?


Just ATLAS. It bogs down big time in the last 10%.
In 8 hrs it has not finished much more than at the last post.
Now it's a 0.001% advance every 6-12 seconds, 23:3x run time, 6 seconds CPU time, 48% memory usage.
It needs 23 minutes (whatever that is in real time) to complete.

As I said earlier, CMS runs, Theory runs...just ATLAS lately craps out.



ABORTED again.
In 2 hrs it moved barely .05% and had run for over a day continuously.

No more tasks for now. Waiting on CMS to finish and then reset.
Memory was never an issue in the last 2 days.
It had full access.
CPU dedication was the biggest problem. .01% and a CPU run time of maximum 8 seconds.
That's what needs to be explained.
ID: 40221
NOGOOD
Joined: 18 Nov 17
Posts: 119
Credit: 51,286,855
RAC: 20,720
Message 40222 - Posted: 20 Oct 2019, 11:13:34 UTC - in response to Message 40221.  

greg_be, hello.

I had the same problem several months ago. It cured itself within several days.
Now we have it again. It looks like the problem is not on our side.

Just do not run ATLAS for several days, then try again.
ID: 40222
greg_be
Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 40224 - Posted: 20 Oct 2019, 17:33:05 UTC - in response to Message 40222.  
Last modified: 20 Oct 2019, 17:33:52 UTC

greg_be, hello.

I had the same problem several months ago. It cured itself within several days.
Now we have it again. It looks like the problem is not on our side.

Just do not run ATLAS for several days, then try again.


I just went with no new tasks, need to clear out CMS and do a reset to wipe out any residual data and then try again.
ID: 40224
NOGOOD
Joined: 18 Nov 17
Posts: 119
Credit: 51,286,855
RAC: 20,720
Message 40252 - Posted: 23 Oct 2019, 13:04:34 UTC - in response to Message 40224.  

Hello.

Does anybody know whether ATLAS is running fine again?
ID: 40252
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 40254 - Posted: 23 Oct 2019, 14:56:46 UTC - in response to Message 40252.  

My first ATLAS task used no CPU, so I aborted it. The second one ran fine and produced a HITS file. A third one is running.
Tullio
ID: 40254
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,713,406
RAC: 234,467
Message 40255 - Posted: 23 Oct 2019, 16:23:30 UTC

I've run a few fine, so it's possible.
ID: 40255
Hammy
Joined: 18 Sep 04
Posts: 2
Credit: 684,080
RAC: 0
Message 40540 - Posted: 19 Nov 2019, 21:06:35 UTC

I have been running LHC for years. My preferences are clearly set to NOT accept ATLAS Simulation, as one of them ran on for weeks, got to 99.9999% with one second to run, then would not complete. Despite changing my preferences to accept only SixTrack, I continue to get lots of ATLAS downloads, which I do not want. How can I fix this, bearing in mind I am not a techno whizz?
Only those who risk going too far will ever know how far they can go!
ID: 40540
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40542 - Posted: 19 Nov 2019, 21:30:52 UTC

Maybe you have ticked the box in your preferences: If no work for selected applications is available, accept work from other applications?
ID: 40542