Message boards : Theory Application : Theory simulation takes way too long

Asciimonster

Joined: 26 Sep 20
Posts: 1
Credit: 48,482
RAC: 0
Message 48797 - Posted: 20 Oct 2023, 13:22:34 UTC

I'm running a Theory task that just took 2.5 hours to reach 1%. Although BOINC "helpfully" estimates the time remaining to be 3 and a quarter hours, according to my calculations this will be a week and a half of solid computation. That might be a bit on the long side.

Because of this, other BOINC projects get drowned out. Is there a way to limit the number of these tasks, or is the only option to block the Theory project in my account?
ID: 48797 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2158
Credit: 162,601,709
RAC: 123,276
Message 48798 - Posted: 20 Oct 2023, 17:13:03 UTC - in response to Message 48797.  

Runtime of recent tasks in hours: average, min, max
Theory Simulation 2.52 (0.01 - 178)
ID: 48798 · Report as offensive     Reply Quote
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 0
Message 48799 - Posted: 20 Oct 2023, 18:47:37 UTC - in response to Message 48797.  
Last modified: 20 Oct 2023, 18:55:59 UTC

BOINC doesn't know what is going on within the virtual machine, so the % it displays is elapsed time versus expiry time (10 days), although most tasks run for only a few hours and some for a few days. The 10-day limit is there to catch faulty units that the user hasn't noticed: it stops them so that resources aren't wasted running a bad task forever.
To see the actual progress within the VM, click Show Console then Alt-F2. Almost all are 100,000 events so it is easy to see the actual % complete.
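A minimal sketch of the two percentages described above, assuming the 10-day expiry and the usual 100,000 events mentioned in this post. The function names are made up for illustration and are not part of BOINC, and the 40,000-event console reading is an invented example value.

```python
# Hypothetical helpers; names are illustrative and not part of BOINC.
EXPIRY_HOURS = 10 * 24      # 10-day limit mentioned in the post
TOTAL_EVENTS = 100_000      # typical Theory task size per the post


def boinc_displayed_percent(elapsed_hours: float) -> float:
    """Percentage as described above: elapsed time versus expiry time."""
    return 100.0 * elapsed_hours / EXPIRY_HOURS


def actual_percent(events_done: int, total_events: int = TOTAL_EVENTS) -> float:
    """Real progress, based on the event count read from the VM console (Alt-F2)."""
    return 100.0 * events_done / total_events


if __name__ == "__main__":
    # 2.5 hours elapsed, as in the first post of this thread.
    print(f"BOINC display: {boinc_displayed_percent(2.5):.2f}%")
    # Invented console reading: 40,000 events done.
    print(f"Actual progress: {actual_percent(40_000):.1f}%")
```

Read this way, 1% after 2.5 hours only says that 2.5 of the 240 allowed hours have passed; it does not by itself mean the task needs a week and a half.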
ID: 48799 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1137
Credit: 49,947,598
RAC: 2,649
Message 48822 - Posted: 26 Oct 2023, 8:59:49 UTC

It has been a while since I got one of these but I don't mind since I could check the running log to see it was actually running.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=400728061

Just finished
ID: 48822 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1137
Credit: 49,947,598
RAC: 2,649
Message 48870 - Posted: 1 Nov 2023, 7:00:55 UTC - in response to Message 48822.  

Another example of how waiting is part of running a Valid task when you watch the running log.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=400796623

I admit that this doesn't happen that often, and the last time I got one this long or longer was back at Test4Theory.

Computer ID 10451775
Run time 7 days 17 hours 36 min 8 sec
CPU time 7 days 16 hours 26 min 13 sec
Validate state Valid
Credit 6,451.85
ID: 48870 · Report as offensive     Reply Quote
Darrell

Joined: 8 Jul 08
Posts: 20
Credit: 27,550,154
RAC: 23,165
Message 48890 - Posted: 4 Nov 2023, 3:39:51 UTC - in response to Message 48799.  

Just to expand a little for the sake of NOOBs:
My config: BOINC 7.24.1, VirtualBox 7.0.12, Windows 10

1. In the BOINC Manager window, select the task you want to view.
2. On the left, click Properties. Near the bottom, note the slot number listed alongside Directory.
3. Navigate to that slot number in "[drive letter]:\ProgramData\BOINC\slots" and note the folder name starting with "boinc_".
4. In VirtualBox, find the VM with that same name.
5. Click "SHOW" on the menu bar.
6. Press ALT-F2 to display the running status.
   NOTE WELL! If you think it is running too long, remember that the entries in this window update only infrequently, perhaps a minute or two per line, so be patient.
7. When done, close the window and be sure the top option is selected to CONTINUE RUNNING the Theory application.
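For anyone who prefers not to browse the slot folder by hand, here is a small sketch of the lookup in step 3. It assumes, as described above, that the slot directory contains a folder whose name starts with "boinc_". The function name is made up, and the slot path has to be supplied by you, since the drive letter and slot number differ per machine.

```python
import sys
from pathlib import Path


def find_vm_names(slot_dir: str) -> list[str]:
    """Return names of folders starting with 'boinc_' inside one slot directory."""
    slot = Path(slot_dir)
    return sorted(
        entry.name
        for entry in slot.iterdir()
        if entry.is_dir() and entry.name.startswith("boinc_")
    )


if __name__ == "__main__":
    # Pass the full path of the slot directory noted in step 2 as the only argument.
    if len(sys.argv) > 1:
        for name in find_vm_names(sys.argv[1]):
            print(name)
    else:
        print("usage: pass the path of one BOINC slot directory")
```

The printed name is the one to look for in the VirtualBox list in step 4.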

Thanks to all the many posters here who helped me get going again.
ID: 48890 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2450
Credit: 232,577,907
RAC: 131,531
Message 48892 - Posted: 4 Nov 2023, 7:36:00 UTC - in response to Message 48890.  

7 When done, close the window and be sure the top option is selected to CONTINUE RUNNING
the Theory application.

Closing the window causes the VM to go through a suspend/resume cycle which puts heavy load on the host.
To avoid this select "Machine -> Detach GUI" from the VM window menu.
ID: 48892 · Report as offensive     Reply Quote
Darrell

Joined: 8 Jul 08
Posts: 20
Credit: 27,550,154
RAC: 23,165
Message 48893 - Posted: 4 Nov 2023, 8:44:30 UTC - in response to Message 48892.  

Ahh, thanks for the better way to close the window. Now all the info to check INSIDE the VB to see if progress is being made is in one place. Thanks for the improvement.
ID: 48893 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2158
Credit: 162,601,709
RAC: 123,276
Message 49323 - Posted: 31 Jan 2024, 12:50:46 UTC - in response to Message 48870.  

Another example of how waiting is part of running a Valid task when you watch the running log.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=400796623

I admit that this doesn't happen that often and last time I got one this long or longer was back at Test4Theory

Computer ID 10451775
Run time 7 days 17 hours 36 min 8 sec
CPU time 7 days 16 hours 26 min 13 sec
Validate state Valid
Credit 6,451.85


21:22:17 CET +01:00 2024-01-25: cranky-0.1.4: [INFO] mcplots runspec: boinc pp z1j 13000 75 - pythia8 8.244 CP1-CR1 100000 66
13:20:35 CET +01:00 2024-01-31: cranky-0.1.4: [INFO] Container 'runc' finished with status code 0.

Computer ID 10816264
Run time 5 days 13 hours 41 min 0 sec
CPU time 2 days 18 hours 47 min 18 sec
Validate state Valid
Credit 6,383.10

Yes, a Theory task can take up to the maximum of 10 days.
I don't know what accounts for the difference between CPU time and run time.
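One way to look at the gap between the two figures is the ratio of CPU time to run time, where run time is elapsed wall-clock time and CPU time is the processor time actually spent on the task. A minimal sketch, assuming the "N days N hours N min N sec" format used on the result pages; the helper names are made up for illustration.

```python
import re

# Seconds per unit for the "N days N hours N min N sec" format.
UNIT_SECONDS = {"day": 86400, "hour": 3600, "min": 60, "sec": 1}


def to_seconds(text: str) -> int:
    """Convert a string such as '5 days 13 hours 41 min 0 sec' to seconds."""
    parts = re.findall(r"(\d+)\s*(day|hour|min|sec)", text)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def cpu_to_run_ratio(cpu_time: str, run_time: str) -> float:
    """CPU time divided by run time: the average number of busy cores."""
    return to_seconds(cpu_time) / to_seconds(run_time)


if __name__ == "__main__":
    # The two results quoted in this post.
    first = cpu_to_run_ratio("7 days 16 hours 26 min 13 sec",
                             "7 days 17 hours 36 min 8 sec")
    second = cpu_to_run_ratio("2 days 18 hours 47 min 18 sec",
                              "5 days 13 hours 41 min 0 sec")
    print(f"first task:  {first:.2f}")
    print(f"second task: {second:.2f}")
```

A ratio close to 1 means one core was busy for nearly the whole run. A ratio close to 0.5 suggests the task only had the CPU for about half of the time it was running, for example because the host was busy with other work, although nothing in this thread confirms the cause.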
ID: 49323 · Report as offensive     Reply Quote
NOGOOD

Joined: 18 Nov 17
Posts: 128
Credit: 53,223,608
RAC: 20,963
Message 50021 - Posted: 23 Apr 2024, 11:22:57 UTC
Last modified: 23 Apr 2024, 11:24:17 UTC

The 10-day limit is not enough for this task:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=221471152

It has been running for 2 days already (24/7) and has done only 10900 events (of 100000 total).

The previous attempt on PC 10836791 confirms the problem.
ID: 50021 · Report as offensive     Reply Quote
NOGOOD

Joined: 18 Nov 17
Posts: 128
Credit: 53,223,608
RAC: 20,963
Message 50036 - Posted: 25 Apr 2024, 20:39:29 UTC - in response to Message 50021.  
Last modified: 25 Apr 2024, 20:39:42 UTC

10 days limit is not enough for this task:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=221471152

It is running 2 days already (24/7) and have done only 10900 events (of 100000 total).

Previous attempt on PC 10836791 confirm the problem.


It has now run for about 5 days (no pause) and has done 23400 events (of 100000 total). No chance of success due to the 10-day limit? Should I abort it?
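A rough projection from the numbers above, as a sketch only: it assumes the event rate stays constant for the rest of the run, which may not hold. The function name is made up for illustration.

```python
LIMIT_DAYS = 10          # expiry limit mentioned in this thread
TOTAL_EVENTS = 100_000   # events in this task


def projected_days(days_elapsed: float, events_done: int,
                   total_events: int = TOTAL_EVENTS) -> float:
    """Linear extrapolation of the total runtime from the progress so far."""
    return days_elapsed * total_events / events_done


if __name__ == "__main__":
    # The two progress readings reported for this task.
    for days, events in [(2, 10_900), (5, 23_400)]:
        total = projected_days(days, events)
        verdict = "fits" if total <= LIMIT_DAYS else "exceeds the limit"
        print(f"{events} events in {days} days: about {total:.1f} days in total ({verdict})")
```

Both readings project to well over the 10-day limit, and the second reading is slower than the first.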
ID: 50036 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2158
Credit: 162,601,709
RAC: 123,276
Message 50037 - Posted: 25 Apr 2024, 21:05:56 UTC - in response to Message 50036.  

Feel free to do it.
Our machines are not fast enough for these Theory tasks.
Maybe the task could be split into three runs of 33k events each, with the results put together afterwards.
I think Cern-IT have no interest in doing this.
ID: 50037 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2450
Credit: 232,577,907
RAC: 131,531
Message 50038 - Posted: 26 Apr 2024, 7:10:50 UTC - in response to Message 50036.  

Theory tasks usually start with #events = 100000.
In very rare cases they don't finish within the 10 day limit.

If the long runtime is
- not caused by a local issue and
- mcplots does not get enough valid results for a given set of input parameters
the same task type is reissued with a lower #events.

This reduction may happen repeatedly until enough valid results are returned.
A statement like "Cern-IT have no interest" is simply wrong.
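To illustrate why repeated reduction eventually yields a task that fits, here is a sketch under loud assumptions: the halving step and the function name are invented for this example, and the real rule mcplots uses to pick a lower #events is not described in this thread. The 4680 events per day is simply 23400 events divided by 5 days, taken from the task discussed above.

```python
def first_event_count_that_fits(events_per_day: float,
                                start_events: int = 100_000,
                                limit_days: float = 10.0) -> int:
    """Halve the event count until the projected runtime fits the limit.

    Halving is an assumption made for illustration only; it is not the
    documented behaviour of mcplots.
    """
    events = start_events
    while events > 1 and events / events_per_day > limit_days:
        events //= 2
    return events


if __name__ == "__main__":
    # Rate taken from the task discussed above: 23400 events in 5 days.
    rate = 23_400 / 5
    print(first_event_count_that_fits(rate))
```

Under these assumptions two reductions would be needed before such a task could finish within 10 days on that host.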
ID: 50038 · Report as offensive     Reply Quote
NOGOOD

Joined: 18 Nov 17
Posts: 128
Credit: 53,223,608
RAC: 20,963
Message 50039 - Posted: 26 Apr 2024, 7:28:35 UTC - in response to Message 50038.  

If the long runtime is
- not caused by a local issue and
- mcplots does not get enough valid results for a given set of input parameters
the same task type is reissued with a lower #events.


So, for this reason, should I wait for my task to finish?
ID: 50039 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2450
Credit: 232,577,907
RAC: 131,531
Message 50040 - Posted: 26 Apr 2024, 7:42:58 UTC - in response to Message 50039.  

If I see a long running task on any of my systems that has a small chance to finish, I let it run.
If I see a task like the one in question, I cancel it.

On your system it's your decision.
You already mentioned the relevant numbers.
Why do you ask anybody else?
ID: 50040 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2158
Credit: 162,601,709
RAC: 123,276
Message 50041 - Posted: 26 Apr 2024, 7:50:31 UTC - in response to Message 50040.  
Last modified: 26 Apr 2024, 7:56:09 UTC

Why do you ask anybody else?

?
No Multicore (Atlas or Theory) in Windows atm.
ID: 50041 · Report as offensive     Reply Quote
NOGOOD

Joined: 18 Nov 17
Posts: 128
Credit: 53,223,608
RAC: 20,963
Message 50042 - Posted: 26 Apr 2024, 7:54:49 UTC - in response to Message 50040.  
Last modified: 26 Apr 2024, 8:10:43 UTC

You wrote:
- not caused by a local issue

I wonder: does an abort count as a local issue? Does CERN need confirmation that 10 days was not enough before it reissues the task with a lower #events? If so, I can let it run until it fails.
ID: 50042 · Report as offensive     Reply Quote
NOGOOD

Joined: 18 Nov 17
Posts: 128
Credit: 53,223,608
RAC: 20,963
Message 50045 - Posted: 26 Apr 2024, 16:20:49 UTC - in response to Message 50041.  

Why do you ask anybody else?

?
No Multicore (Atlas or Theory) in Windows atm.

No Multicore.
ID: 50045 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1137
Credit: 49,947,598
RAC: 2,649
Message 50071 - Posted: 29 Apr 2024, 0:16:20 UTC

I don't mind getting the long Theory tasks and have had many over the years, here and at -dev.

But this is the first one like this https://lhcathome.cern.ch/lhcathome/result.php?resultid=410306756

Computer ID 10824117
Run time 3 hours 49 min 47 sec
CPU time 22 hours 35 min 32 sec
Validate state Valid
Credit 119.70

I guess it wanted to be like the multi-core CMS.
ID: 50071 · Report as offensive     Reply Quote
NOGOOD

Joined: 18 Nov 17
Posts: 128
Credit: 53,223,608
RAC: 20,963
Message 50126 - Posted: 6 May 2024, 10:08:40 UTC - in response to Message 50038.  

Theory tasks usually start with #events = 100000.
In very rare cases they don't finish within the 10 day limit.

If the long runtime is
- not caused by a local issue and
- mcplots does not get enough valid results for a given set of input parameters
the same task type is reissued with a lower #events.

This reduction may happen repeatedly until enough valid results are returned.
A statement like "Cern-IT have no interest" is simply wrong.


Not so rare cases.
I've got another one:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=221982999
ID: 50126 · Report as offensive     Reply Quote


©2024 CERN