Message boards : Number crunching : Tasks reach 100% complete but continue running.
| Author | Message |
|---|---|
adrianxw Joined: 29 Sep 04 Posts: 191 Credit: 707,594 RAC: 44 |
I have a couple of tasks that have reached 100.000% done but continue running. One shows "---" remaining, the other "00:00:00". The "00:00:00" task has 02:48:23 elapsed and the "---" task has 05:04:56, both still increasing. Is this normal? Are they okay, or just wasting time? This is a Windows 10 system. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
|
Joined: 3 Oct 06 Posts: 122 Credit: 9,375,630 RAC: 8,326 |
In reply to adrianxw's message of 10 May 2026: "Is this normal, are they okay or just wasting time? Windows 10 system." This may be normal. For these tasks the BOINC Manager progress bar shows estimated progress, not the real one. If you are using VirtualBox and running "Theory", just click "Show graphics" to access a real-time log. There you will see what the virtual machine is actually doing. If you are using Docker, checking is more complicated: you will have to dig through the slots...
|
adrianxw Joined: 29 Sep 04 Posts: 191 Credit: 707,594 RAC: 44 |
I had suspended the tasks, but I have resumed them and can see they are running. The tasks are "Theory Simulation 302.10 (docker)". "Show graphics" is greyed out, so I can't do that. |
|
Joined: 14 Jan 10 Posts: 1561 Credit: 10,124,544 RAC: 1,285 |
In reply to adrianxw's message of 11 May 2026: "I had suspended the tasks, but have released them again and can see they are running. The tasks are "Theory Simulation 302.10 (docker)". Show graphics is greyed out so I can't do that." You can check the progress of a task with Windows PowerShell. Look up the right BOINC data path and slot number. Example: `Get-Content C:\ProgramData\BOINC\slots\0\shared\runRivet.log -tail 10` |
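The same check can be run over every slot at once. The following is a minimal Python sketch, not part of the original post: it assumes the default Windows data directory `C:\ProgramData\BOINC` and the `slots/<n>/shared/runRivet.log` layout from the example above, and the function names `tail_lines` and `tail_all_slots` are made up for illustration.

```python
from collections import deque
from pathlib import Path

# Assumed default BOINC data directory on Windows; adjust to your installation.
BOINC_DATA_DIR = Path(r"C:\ProgramData\BOINC")


def tail_lines(path, count=10):
    """Return the last `count` lines of a text file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the final `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def tail_all_slots(data_dir=BOINC_DATA_DIR, count=10):
    """Map each slot's runRivet.log path to its last `count` lines."""
    logs = sorted(Path(data_dir).glob("slots/*/shared/runRivet.log"))
    return {str(log): tail_lines(log, count) for log in logs}
```

Calling `tail_all_slots()` returns a dictionary keyed by log path, so slots that are not running a Theory task simply do not appear in the result.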
|
Joined: 15 Jun 08 Posts: 2760 Credit: 307,109,835 RAC: 133,420 |
In reply to Crystal Pellet's message of 11 May 2026: Be aware: some tasks run a scientific payload (e.g. powheg-box) that does not update runRivet.log for hours, sometimes even for days. For those tasks, monitoring runRivet.log cannot tell you whether the task is hung or not. |
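To make that caveat concrete, here is a small Python sketch that reports how long ago runRivet.log was last written and deliberately treats a quiet log as inconclusive rather than as proof of a hang. The 120-minute threshold and the function names are assumptions chosen for illustration; nothing in the thread prescribes them.

```python
import time
from pathlib import Path


def minutes_since_update(log_path, now=None):
    """Minutes elapsed since the file was last modified."""
    now = time.time() if now is None else now
    return (now - Path(log_path).stat().st_mtime) / 60.0


def describe_log_age(log_path, quiet_after_minutes=120, now=None):
    """Return a one-line status; a quiet log is a hint, not a verdict."""
    age = minutes_since_update(log_path, now)
    if age < quiet_after_minutes:
        return f"updated {age:.0f} min ago: payload is alive"
    # Payloads such as powheg-box may stay silent for hours or days.
    return f"quiet for {age:.0f} min: inconclusive, check CPU activity"
```

A recent update confirms the payload is alive; a long silence only tells you to look elsewhere, for example at CPU usage inside the container or VM.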
adrianxw Joined: 29 Sep 04 Posts: 191 Credit: 707,594 RAC: 44 |
One of the tasks has finished now. |
|
Joined: 14 Jan 10 Posts: 1561 Credit: 10,124,544 RAC: 1,285 |
In reply to computezrmle's message of 11 May 2026: You're right. There are those very rare occasions where nothing is written to runRivet.log for hours. To watch the activity of such a process, one could use WSL. On Windows, run `wsl -u root` and then `podman stats --no-stream` and/or `top -n 1`. |
|
Joined: 3 Oct 06 Posts: 122 Credit: 9,375,630 RAC: 8,326 |
In reply to computezrmle's message of 11 May 2026: "Some tasks run a scientific payload (e.g. powheg-box) that does not update runRivet.log for hours, sometimes even for days." Isn't VBox a more convenient tool in this context, especially when processing "silent" tasks? If the log (for example, powheg) freezes, you can at least check what is happening inside the VM. If the Guest is active, things are probably fine (unless the task is stuck in some infinite loop). VBox also seems to be a perfect tool for zombie hunting. For example: if no log is created and the Guest is inactive (or the VM OS failed to boot entirely), the diagnosis is... this VM has turned into a zombie. Just grab a brick and smash that zombie! :)
|
Joined: 28 Dec 08 Posts: 367 Credit: 6,826,225 RAC: 936 |
I find that using htop and filtering for BOINC or pythia lets me see what is moving and what is not. I just use the log to see the progress. I can also check stderr for any other messages. It's a lot of jumping around, but you get the picture doing that. |
|
Joined: 15 Jun 08 Posts: 2760 Credit: 307,109,835 RAC: 133,420 |
Both are fine, vbox as well as docker/podman.
Major rule: don't shut down BOINC or reboot the computer.
Look into slots/n/shared/runRivet.log. As long as the scientific app updates runRivet.log, you can be sure it is alive.
Stay patient and be aware:
- Certain input scenarios (like a reduced number of total events instead of 100000) indicate complex math that can delay these updates by a few hours.
- At the very end of processing the app has to work on the images. This can take a few seconds or more than 1 h, depending on the number of images.

If runRivet.log does not update, this does not always mean the task got stuck. The task activity shown by top (in vbox as well as in docker/podman) usually includes one or more busy generator processes, BUT this does not mean they are making good progress.

If you want to manage those (rare) tasks, check the mcplots details for the 'runspec' you find at the top of runRivet.log. The following download puts huge pressure on the mcplots service, so don't use it without good reason. Look up the rev number (e.g. from your task id) and get the mcplots totals, e.g.: http://mcplots-dev.cern.ch/production.php?view=runs&rev=2922&display=all
At the top of the page, filter for the task's runspec.

Good example (the task is most likely doing something useful; let it run):
| run | events | attempts | success | failure | unknown |
|---|---|---|---|---|---|
| ee zhad 133 - - pythia6 6.428 z1-lep | 5578000 | 62 | 57 | 0 | 5 |

92% success, 8% unknown (may not have been returned yet)

Bad example (the task's math is either too complex to finish in 10 days or it got stuck on all computers so far; cancel it):
| run | events | attempts | success | failure | unknown |
|---|---|---|---|---|---|
| pp jets 13000 100 - sherpa 1.4.0 default | 0 | 9 | 0 | 8 | 1 |

88.9% failed, 11.1% unknown |
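The percentages in the two examples follow directly from the attempts, success, failure and unknown counts. The Python sketch below reproduces that arithmetic and adds a rough triage rule modelled on the two cases above. The rule in `suggest_action` is an assumption made for illustration, not an official mcplots or LHC@home policy, and both function names are hypothetical.

```python
def outcome_shares(attempts, success, failure, unknown):
    """Return success, failure and unknown as percentages of attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if success + failure + unknown != attempts:
        raise ValueError("outcome counts do not add up to attempts")
    return tuple(round(100.0 * n / attempts, 1) for n in (success, failure, unknown))


def suggest_action(attempts, success, failure, unknown):
    """Rough triage (assumed rule): no successes plus failures means cancel."""
    ok_pct, failed_pct, _ = outcome_shares(attempts, success, failure, unknown)
    if success == 0 and failure > 0:
        return "consider cancelling"
    if ok_pct > failed_pct:
        return "let it run"
    return "unclear"


# Good example from the post: 62 attempts, 57 success, 0 failure, 5 unknown.
print(outcome_shares(62, 57, 0, 5), suggest_action(62, 57, 0, 5))
# Bad example from the post: 9 attempts, 0 success, 8 failure, 1 unknown.
print(outcome_shares(9, 0, 8, 1), suggest_action(9, 0, 8, 1))
```

The counts are typed in by hand on purpose: as the post warns, the mcplots page should not be fetched without good reason, so this sketch does no downloading.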
|
Joined: 28 Dec 08 Posts: 367 Credit: 6,826,225 RAC: 936 |
I have done that without any hiccups (knock on silicon). BTW, this one is taking 3 days on a Ryzen 7 3700X. Input parameters: `mode=boinc beam=pp process=jets energy=8000 params=350 specific=- generator=pythia8 version=8.301 tune=CP1-CR1 nevts=100000 seed=888`, and "Total histograms selected: 12". 350 GV makes for a lot of scatter, and then it has to compare against 12 histograms. It's a huge process. At 2.5 days I am at 85,300 events and counting. BOINC says I make 1.44% progress per hour. |
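As a rough cross-check of those numbers, the remaining time can be extrapolated from the event count, assuming the event rate stays constant (which it may not, and which ignores the image processing at the end). This Python sketch uses hypothetical helper names and only the figures quoted in the post.

```python
def remaining_hours(events_done, events_total, elapsed_hours):
    """Linear extrapolation of the time left, assuming a constant event rate."""
    if events_done <= 0 or elapsed_hours <= 0:
        raise ValueError("need some completed events and elapsed time")
    rate_per_hour = events_done / elapsed_hours
    return (events_total - events_done) / rate_per_hour


def total_hours_from_rate(percent_per_hour):
    """Total runtime implied by a steady progress rate in percent per hour."""
    return 100.0 / percent_per_hour


# 85,300 of 100,000 events after 2.5 days (60 hours).
print(round(remaining_hours(85_300, 100_000, 60), 1))   # about 10.3 hours left
# BOINC's reported 1.44% per hour implies this total runtime in hours.
print(round(total_hours_from_rate(1.44), 1))            # about 69.4 hours
```

The two estimates agree reasonably well: roughly 60 hours elapsed plus 10 more from the event count, against about 69 hours implied by BOINC's rate.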
|
Joined: 3 Oct 06 Posts: 122 Credit: 9,375,630 RAC: 8,326 |
In reply to computezrmle's message of 11 May 2026: "If you want to manage those (rare) tasks, check the mcplots details for the 'runspec' you find at the top of runRivet.log." THANK YOU for this tip! I will not abuse the tool, but it is a real eye-opener. :)
|
©2026 CERN