Message boards : Number crunching : Stuck at 100%

Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32192 - Posted: 3 Sep 2017, 10:20:36 UTC
Last modified: 3 Sep 2017, 10:26:12 UTC

I have an old quad core Windows PC which has a 4 core LHC task stuck at 100%. I noticed it yesterday evening at 99.something%, and taking a lot longer for the last bit. Overnight it's got to 100%, but it still says processing in BOINC. But task manager shows no CPU or disk activity, Vbox is using about 40MB memory. Is it jammed? Should I abort? Or will it get there eventually?

Edit: I was viewing it using Boinctasks from another computer, which says 100%, but the actual BOINC Manager on the machine says 99.999%. Still, with no CPU or disk activity it must be stuck, right?
ID: 32192
gyllic
Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 32195 - Posted: 3 Sep 2017, 13:37:38 UTC - in response to Message 32192.  

Which task is it (Theory, ATLAS,...)?
Since it's a 4-core task, I assume it is an ATLAS task, because ATLAS is currently the only app that officially supports multicore tasks.

A good idea is to compare the elapsed time with the actual CPU time, and also the CPU time with the CPU time at the last checkpoint (you can see all three in BOINC Manager).
If there are big differences, it's most probably a failed task and you can/should abort it.
ID: 32195
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32196 - Posted: 3 Sep 2017, 14:06:17 UTC - in response to Message 32195.  

Yes, it's ATLAS.

Enormous differences:
CPU time at last checkpoint: 9m5s
CPU time: 9m5s
Elapsed time: 1d6h34m50s

I've noticed these multicore tasks never use all 4 cores anywhere near fully (probably about 30% overall CPU usage), but those figures look vastly out, so I guess it's not doing anything. However, although I didn't check the above last night, the runtime was about 15 hours, and the CPU time must have been 9 minutes or less, and the percentage was still climbing slowly (around 99.5%).

Computer: Black
Project LHC@home

Name tbdMDm4QV6qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmyecKDmhyp1bn_0

Application ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas)
Workunit name tbdMDm4QV6qnSu7Ccp2YYBZmABFKDmABFKDmXNGKDmyecKDmhyp1bn
State Running High P.
Received Mon 28/08/2017 3:34:35 pm
Report deadline Mon 04/09/2017 3:34:35 pm
Estimated app speed 2.32 GFLOPs/sec
Estimated task size 16,020 GFLOPs
Resources 4 CPUs
CPU time at last checkpoint 00:09:05
CPU time 00:09:05
Elapsed time 01d,06:34:50
Estimated time remaining 00:00:00
Fraction done 100.000%
Virtual memory size 127.18 MB
Working set size 5,800.00 MB
Directory slots/1
Process ID 7104
ID: 32196
gyllic
Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 32197 - Posted: 3 Sep 2017, 19:29:41 UTC - in response to Message 32196.  
Last modified: 3 Sep 2017, 19:34:08 UTC

Enormous differences:
CPU time at last checkpoint: 9m5s
CPU time: 9m5s
Elapsed time: 1d6h34m50s

That task is clearly not working properly, and you should abort it so it doesn't waste your CPU.
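
To put rough numbers on that comparison, here is a minimal Python sketch that converts the durations quoted above into seconds and computes CPU time as a fraction of elapsed time. The helper names (`to_seconds`, `cpu_efficiency`, `looks_stalled`) and the 5% threshold are assumptions made up for illustration; they are not part of BOINC.

```python
import re

# Matches durations written like "1d6h34m50s" or "9m5s" (the format quoted above).
_DURATION = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def to_seconds(text):
    """Convert a duration such as '1d6h34m50s' into a number of seconds."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(cpu_time, elapsed, cores=1):
    """CPU time as a fraction of the core-time available during the elapsed time."""
    return to_seconds(cpu_time) / (to_seconds(elapsed) * cores)


def looks_stalled(cpu_time, elapsed, cores=1, threshold=0.05):
    """Flag a task whose CPU use is far below its elapsed time.

    The 5% threshold is a hypothetical cut-off chosen for this sketch.
    """
    return cpu_efficiency(cpu_time, elapsed, cores) < threshold


# Figures reported for the stuck task in this thread.
print(f"{cpu_efficiency('9m5s', '1d6h34m50s'):.2%}")
print(looks_stalled('9m5s', '1d6h34m50s', cores=4))
```

With the figures from this task, CPU time comes to about 0.5% of the elapsed time even when counted against a single core, which supports the view that the task is idle rather than slow.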

...I've noticed these multicore tasks never use all 4 cores anywhere near fully (probably about 30% overall CPU usage), but those figures look vastly out, so I guess it's not doing anything...

Normally a 4-core ATLAS task uses all 4 cores fully (after some downloading time and so on) if it is configured correctly. Have you already finished one successfully?
If not, Yeti's checklist is a very good starting point/"debugging help":
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359#29359
ID: 32197
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32198 - Posted: 3 Sep 2017, 20:30:34 UTC - in response to Message 32197.  

I've checked my account on the webpages here, and I can see finished tasks, but not an Atlas one from that computer (it only seems to keep a very short history?)

Your checklist is fine, everything is set correctly. My only concerns are:

1) The machine only has 8GB of memory, running Windows 10 with BOINC and nothing else (apart from AVG antivirus) - where would I see the "waiting for memory" message? I monitor the machine remotely using BOINCtasks from Efmer. Would the status column which usually says "running" or "running high priority" change to "waiting for memory"?

2) Ports - do I have to open these somewhere? It has the standard Windows firewall, and there's the internet router. I had to tell the internet router to use UPNP so I could use Emule and utorrent, do I have to manually open ports for LHC? If so how?
ID: 32198
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,150,252
RAC: 15,977
Message 32202 - Posted: 4 Sep 2017, 6:52:22 UTC - in response to Message 32198.  

...
1) The machine only has 8GB of memory, running Windows 10 with BOINC and nothing else (apart from AVG antivirus) - where would I see the "waiting for memory" message? I monitor the machine remotely using BOINCtasks from Efmer. Would the status column which usually says "running" or "running high priority" change to "waiting for memory"?
...

Yes, I think it shows "Waiting to run - waiting for memory" or something like that.
ID: 32202
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32203 - Posted: 4 Sep 2017, 8:43:39 UTC - in response to Message 32202.  

I don't get that message. It looks like it's processing, but there's no CPU usage in the task manager and the progress bar doesn't move.
ID: 32203
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 32204 - Posted: 4 Sep 2017, 8:48:09 UTC - in response to Message 32198.  
Last modified: 4 Sep 2017, 8:48:27 UTC

where would I see the "waiting for memory" message? I monitor the machine remotely using BOINCtasks from Efmer. Would the status column which usually says "running" or "running high priority" change to "waiting for memory"?

You can see "waiting for memory" directly in your client, in the same place where you see "running" or other status messages.

Older versions of BOINCTasks didn't report "waiting for memory" clearly; I don't know whether the latest version does.


Supporting BOINC, a great concept !
ID: 32204
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 32205 - Posted: 4 Sep 2017, 8:56:26 UTC - in response to Message 32198.  

1) The machine only has 8GB of memory, running Windows 10 with BOINC and nothing else (apart from AVG antivirus)

You should check the memory settings in your preferences, "Computer in use" and "Computer not in use" (I don't know the exact English labels, as the web pages always show me the German ones); they should be identical (e.g. 90% and 90%).
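
As a rough illustration of why the two limits matter, the sketch below models the memory preference as a simple percentage of installed RAM and checks whether a task's working set fits under it. This is a simplified assumption, not BOINC's actual scheduling logic. The function names are hypothetical, 8 GB is taken as 8192 MB, and the 5,800 MB figure is the working set size reported for the stuck task earlier in the thread.

```python
def memory_budget_mb(total_ram_mb, limit_percent):
    """RAM available to tasks under a 'use at most N% of memory' style limit."""
    return total_ram_mb * limit_percent / 100


def fits(task_mb, total_ram_mb, limit_percent, other_tasks_mb=0):
    """True if the task (plus anything else running) stays within the budget."""
    return task_mb + other_tasks_mb <= memory_budget_mb(total_ram_mb, limit_percent)


TOTAL_RAM_MB = 8192    # assumption: 8 GB machine, ignoring what the OS itself needs
ATLAS_TASK_MB = 5800   # working set size shown in the task properties above

print(fits(ATLAS_TASK_MB, TOTAL_RAM_MB, 90))   # True: 5800 MB is under 7372.8 MB
print(fits(ATLAS_TASK_MB, TOTAL_RAM_MB, 50))   # False: 5800 MB is over 4096 MB
```

Under this simplified model, if the "in use" limit were much lower than the "not in use" limit (the 50% value here is purely hypothetical), a task that fits while the machine is idle would stop fitting once the machine counts as in use.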


Supporting BOINC, a great concept !
ID: 32205
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32207 - Posted: 4 Sep 2017, 9:30:25 UTC - in response to Message 32204.  

You can see "waiting for memory" direct on your client, where you see "running" or other status-messages

Older Versions of BOINCTasks didn't give "waiting for memory" clearly back to you; don't know if the latest Version does it


I'll check the client on the remote machine next time it happens.
ID: 32207
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32208 - Posted: 4 Sep 2017, 9:31:26 UTC - in response to Message 32205.  

1) The machine only has 8GB of memory, running Windows 10 with BOINC and nothing else (apart from AVG antivirus)

You should check your preferences, memory settings, "Computer in use" and "Computer not in use" (don't know the exact english tokens as the Web-Pages always shows me german tokens), they should be identical (e.g. 90% and 90%)


They're both on 100%. I allow processing all the time.

The computer does nothing but run boinc, and has no keyboard/monitor. I only access it remotely. So nothing should be interrupting it anyway.
ID: 32208
gyllic
Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 32209 - Posted: 4 Sep 2017, 9:42:46 UTC - in response to Message 32198.  
Last modified: 4 Sep 2017, 9:46:54 UTC

1) The machine only has 8GB of memory, running Windows 10 with BOINC and nothing else (apart from AVG antivirus)...

8 GB of RAM is enough for a 4-core ATLAS WU (if you use the machine just for BOINC, as you said).
Have you excluded the BOINC data directory from your antivirus (Checklist Point 8)? You can see that your PC has downloaded a couple of different subprojects. Try to run ATLAS only until it works; then you can try running other projects at the same time (since they seem to work). Also, with other subprojects running, you could run out of memory.

2) Ports - do I have to open these somewhere? It has the standard Windows firewall, and there's the internet router. I had to tell the internet router to use UPNP so I could use Emule and utorrent, do I have to manually open ports for LHC? If so how?

Make sure that your Windows Firewall allows incoming and outgoing traffic for boinc.exe and vboxheadless.exe (Checklist Point 7). This should be the default Windows Firewall configuration (at least in Windows 7). If your router doesn't use any special firewalls/rules, you normally don't have to do anything. If you are using a proxy, make sure that it is set correctly in BOINC. The VM will use these settings.
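
One way to check whether outbound connections leave the network at all is a small connectivity test like the Python sketch below. It is only a rough aid under stated assumptions: `can_connect` is a hypothetical helper, the host and port shown are just the project web server taken from the URLs in this thread, and the real list of hosts and ports the project needs has to come from the project's own documentation. Because the script runs as Python rather than as boinc.exe or vboxheadless.exe, it does not exercise per-program firewall rules; it mainly tests the router and network path.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Placeholder targets: replace with the hosts and ports the project lists.
    targets = [("lhcathome.cern.ch", 443)]
    for host, port in targets:
        status = "reachable" if can_connect(host, port) else "blocked or unreachable"
        print(f"{host}:{port} {status}")
```

Only outbound connections are tested here, which matches the case described above where nothing needs to be opened on a router without special rules.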

According to the log of your failed WU, it looks like the VM starts correctly but can't start the actual computing.
ID: 32209
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32220 - Posted: 4 Sep 2017, 12:50:06 UTC - in response to Message 32209.  

8Gb RAM is enough for a 4 core ATLAS WU (if you use it just for boinc as you said).
Have you excluded the BOINC Data directory from your antivirus (Checklist Point 8)?


No, but the AV has never flagged a detection.

You can see that your PC has downloaded a couple of different subprojects. Try to run ATLAS only until it works, then you can try to run other projects too at the same time (since they seem to work). Also with other subprojects running, you could run out of memory.


I'll give that a go.... Other projects and subprojects disabled....

Make sure that your Windows Firewall allows incoming and outgoing traffic for boinc.exe and vboxheadless.exe (Checklist Point 7). This should be the default windows firewall configuration (at least in Windows 7).


I've not changed any settings on the (Windows 10) firewall. I have 4 machines, and I remember seeing a few messages on some of them asking if I wanted to allow vboxheadless to communicate on the internet, to which I said yes.

If your router doesn't use any special firewalls/rules you normally don't have to do anything.


Why do the ports not have to be opened? Is it because they're always initiated from me and not the server? I had to fiddle with the router to allow p2p programs to work, but I guess they have to accept incoming packets. All I changed on the router was to allow UPNP.

If you are using a proxy make sure that it is set correctly in boinc. The VM will use these settings.


No proxy.

According to the log of your failed WU it looks like the VM starts correctly but can't start the actual computing.


Why would that happen? If it's loaded the VM up, what's stopping it from continuing?
ID: 32220
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 32221 - Posted: 4 Sep 2017, 12:56:43 UTC - in response to Message 32220.  

According to the log of your failed WU it looks like the VM starts correctly but can't start the actual computing.


Why would that happen? If it's loaded the VM up, what's stopping it from continuing?

http://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use


Supporting BOINC, a great concept !
ID: 32221
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32222 - Posted: 4 Sep 2017, 13:00:59 UTC - in response to Message 32221.  

According to the log of your failed WU it looks like the VM starts correctly but can't start the actual computing.


Why would that happen? If it's loaded the VM up, what's stopping it from continuing?

http://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use


But I've received a complaint from the firewall about vboxheadless, and I authorised the firewall to let it through. It hasn't said anything since.
ID: 32222
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 32224 - Posted: 4 Sep 2017, 13:03:33 UTC - in response to Message 32222.  

But I've received a complaint from the firewall about vboxheadless, and I authorised the firewall to let it through. It hasn't said anything since.

What about your router ?


Supporting BOINC, a great concept !
ID: 32224
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32225 - Posted: 4 Sep 2017, 13:08:11 UTC

Ok, that didn't work. I suspended all projects other than LHC, then told LHC to only give me Atlas projects. But I get this message:

"Black LHC@home Mon 04/09/2017 2:03:35 pm No tasks are available for ATLAS Simulation"

Yet server status says there are 764 available!
ID: 32225
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 32226 - Posted: 4 Sep 2017, 13:09:31 UTC - in response to Message 32225.  

Ok, that didn't work. I suspended all projects other than LHC, then told LHC to only give me Atlas projects. But I get this message:

"Black LHC@home Mon 04/09/2017 2:03:35 pm No tasks are available for ATLAS Simulation"

Yet server status says there are 764 available!

At the moment they have a big infrastructure problem at CERN, so it is not you!


Supporting BOINC, a great concept !
ID: 32226
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32227 - Posted: 4 Sep 2017, 13:10:14 UTC - in response to Message 32224.  

But I've received a complaint from the firewall about vboxheadless, and I authorised the firewall to let it through. It hasn't said anything since.

What about your router ?


It's got UPNP to allow p2p programs to work, other than that it's on default settings.
ID: 32227
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 48
Message 32230 - Posted: 4 Sep 2017, 15:56:21 UTC - in response to Message 32226.  

Ok, that didn't work. I suspended all projects other than LHC, then told LHC to only give me Atlas projects. But I get this message:

"Black LHC@home Mon 04/09/2017 2:03:35 pm No tasks are available for ATLAS Simulation"

Yet server status says there are 764 available!

At the moment they have a big infrastructure-problem at CERN, so it is not you !


The server status shows the number of Atlas tasks ready to send is fluctuating, so I assume they're getting sent out to somebody. I tried two of my computers and neither would take Atlas tasks, but they both managed to take various other subproject tasks. Very strange.
ID: 32230

©2024 CERN