Theory simulation takes way too long
Author | Message |
---|---|
Send message Joined: 26 Sep 20 Posts: 1 Credit: 48,482 RAC: 0 |
I'm running a Theory task that just took 2.5 hours to reach 1%. Although BOINC "helpfully" estimates the time remaining at three and a quarter hours, by my calculation (2.5 hours per 1%, so about 250 hours in total) this will be a week and a half of solid computation. That might be a bit on the long side. Because of this, other BOINC projects get drowned out. Is there a way to limit the number of these tasks, or is the only option to block the Theory project in my account? |
Send message Joined: 2 May 07 Posts: 2223 Credit: 173,730,166 RAC: 25,122 |
Runtime of recent tasks in hours, average (min - max): Theory Simulation 2.52 (0.01 - 178) |
Send message Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0 |
BOINC doesn't know what is going on inside the virtual machine, so the % it displays is simply time elapsed versus expiry time (10 days). Most tasks run for only a few hours, some for a few days. The 10-day limit is there to catch faulty units that the user hasn't noticed: it stops them so that resources are not wasted running a bad task forever. To see the actual progress within the VM, click Show Console, then press Alt-F2. Almost all tasks have 100,000 events, so it is easy to work out the actual % complete. |
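The two percentages described above can be compared with a few lines of Python. This is only a rough sketch: it assumes the displayed value is a straight ratio of elapsed time to the 10-day limit, and the function names and the 25,000-event figure are made up for illustration.

```python
# Sketch only: assumes BOINC's displayed progress is elapsed time divided by
# the 10-day limit, as described in the post above. Function names are
# hypothetical, not part of BOINC or the Theory application.

DEADLINE_HOURS = 10 * 24  # the 10-day limit, in hours


def boinc_displayed_percent(elapsed_hours, deadline_hours=DEADLINE_HOURS):
    """Percent shown by BOINC if it is just elapsed time over the limit."""
    return 100.0 * elapsed_hours / deadline_hours


def actual_percent(events_done, total_events=100_000):
    """Real progress, from the event count shown in the VM console (Alt-F2)."""
    return 100.0 * events_done / total_events


if __name__ == "__main__":
    # 2.5 hours elapsed is about 1% of 240 hours, whatever the task is doing.
    print(f"BOINC shows: {boinc_displayed_percent(2.5):.2f}%")
    # Hypothetical console reading of 25,000 processed events.
    print(f"Actual:      {actual_percent(25_000):.1f}%")
```

Under that assumption, reaching 1% after 2.5 hours says nothing about how long the task will really take; only the event count in the console does.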
Send message Joined: 24 Oct 04 Posts: 1166 Credit: 53,729,957 RAC: 51,044 |
It has been a while since I got one of these but I don't mind since I could check the running log to see it was actually running. https://lhcathome.cern.ch/lhcathome/result.php?resultid=400728061 Just finished |
Send message Joined: 24 Oct 04 Posts: 1166 Credit: 53,729,957 RAC: 51,044 |
Another example of how waiting is part of running a Valid task when you watch the running log. https://lhcathome.cern.ch/lhcathome/result.php?resultid=400796623 I admit that this doesn't happen that often, and the last time I got one this long or longer was back at Test4Theory. Computer ID: 10451775; Run time: 7 days 17 hours 36 min 8 sec; CPU time: 7 days 16 hours 26 min 13 sec; Validate state: Valid; Credit: 6,451.85 |
Send message Joined: 8 Jul 08 Posts: 20 Credit: 30,758,146 RAC: 13,216 |
Just to expand a little for the sake of NOOBs. My config: BOINC 7.24.1, VirtualBox 7.0.12, Windows 10. 1. In the BOINC Manager window, select the task you want to view. 2. On the left, click Properties. Near the bottom, note the slot number listed alongside Directory. 3. Navigate to that slot number in "[drive letter]:\ProgramData\BOINC\slots" and note the folder name starting with "boinc_". 4. In VirtualBox, find that same name. 5. Click "SHOW" on the menu bar. 6. Press ALT-F2 to display the running status. NOTE WELL! If you think it is running too long: the entries in this window update only infrequently, perhaps a minute or two per line, so be patient. 7. When done, close the window and be sure the top option is selected to CONTINUE RUNNING the Theory application. Thanks to all the many posters here who helped me get going again. |
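Step 3 above (finding the "boinc_" name inside a slot directory) can also be done with a short script. This is a minimal sketch, not an official tool: the function name is made up, and the drive letter C and slot number 3 in the usage line are assumptions you would replace with your own values.

```python
# Sketch only: lists entries in a BOINC slot directory whose names start with
# "boinc_", i.e. the name to look for in VirtualBox. The helper name, drive
# letter and slot number are illustrative assumptions.
from pathlib import Path


def find_vm_names(slot_dir):
    """Return sorted names of entries in slot_dir that start with 'boinc_'."""
    slot = Path(slot_dir)
    if not slot.is_dir():
        return []  # wrong path or slot no longer in use
    return sorted(p.name for p in slot.iterdir() if p.name.startswith("boinc_"))


if __name__ == "__main__":
    # Assumed location; use the drive and slot number from the task Properties.
    print(find_vm_names(r"C:\ProgramData\BOINC\slots\3"))
```

It only reads directory names, so it does not touch the running task.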
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 251,023,922 RAC: 121,999 |
"7 When done, close the window and be sure the top option is selected to CONTINUE RUNNING" Closing the window causes the VM to go through a suspend/resume cycle, which puts heavy load on the host. To avoid this, select "Machine -> Detach GUI" from the VM window menu. |
Send message Joined: 8 Jul 08 Posts: 20 Credit: 30,758,146 RAC: 13,216 |
Ahh, thanks for the better way to close the window. Now all the info to check INSIDE the VB to see if progress is being made is in one place. Thanks for the improvement. |
Send message Joined: 2 May 07 Posts: 2223 Credit: 173,730,166 RAC: 25,122 |
"Another example of how waiting is part of running a Valid task when you watch the running log." 21:22:17 CET +01:00 2024-01-25: cranky-0.1.4: [INFO] mcplots runspec: boinc pp z1j 13000 75 - pythia8 8.244 CP1-CR1 100000 66 13:20:35 CET +01:00 2024-01-31: cranky-0.1.4: [INFO] Container 'runc' finished with status code 0. Computer ID: 10816264; Run time: 5 days 13 hours 41 min 0 sec; CPU time: 2 days 18 hours 47 min 18 sec; Validate state: Valid; Credit: 6,383.10. Yes, waiting up to the maximum of 10 days for Theory tasks is possible. I don't know what explains the difference between CPU time and run time. |
Send message Joined: 18 Nov 17 Posts: 128 Credit: 55,475,876 RAC: 14,858 |
The 10-day limit is not enough for this task: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=221471152 It has been running for 2 days already (24/7) and has done only 10900 events (of 100000 total). A previous attempt on PC 10836791 confirms the problem. |
Send message Joined: 18 Nov 17 Posts: 128 Credit: 55,475,876 RAC: 14,858 |
"10 days limit is not enough for this task:" It has run for about 5 days (no pause) and has done 23400 events (of 100000 total). Is there no chance of success because of the 10-day limit? Should I abort it? |
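The numbers reported in the two posts above can be extrapolated with a quick calculation. This is a rough sketch that assumes a constant event rate, which a real task may not keep (the reported rate actually dropped between day 2 and day 5); the function names are made up for illustration.

```python
# Sketch only: linear extrapolation of a Theory task's total runtime from the
# events processed so far. Assumes a constant event rate; function names are
# hypothetical.


def projected_total_days(events_done, days_elapsed, total_events=100_000):
    """Estimated total runtime in days if the event rate stays constant."""
    if events_done <= 0 or days_elapsed <= 0:
        raise ValueError("events_done and days_elapsed must be positive")
    return days_elapsed * total_events / events_done


def can_finish(events_done, days_elapsed, total_events=100_000, limit_days=10):
    """True if the projected runtime fits within the limit."""
    return projected_total_days(events_done, days_elapsed, total_events) <= limit_days


if __name__ == "__main__":
    # Figures from the posts above: 10900 events after 2 days, 23400 after 5.
    for events, days in [(10_900, 2), (23_400, 5)]:
        total = projected_total_days(events, days)
        print(f"{events} events in {days} days -> about {total:.1f} days total, "
              f"fits in 10 days: {can_finish(events, days)}")
```

On these figures the projection comes out at roughly 18 and 21 days, well beyond the 10-day limit.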
Send message Joined: 2 May 07 Posts: 2223 Credit: 173,730,166 RAC: 25,122 |
Feel free to do it. We don't have machines fast enough for Theory tasks like this. Maybe the task could be split into three runs of 33k events each and the results put together. I think Cern-IT have no interest in doing this. |
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 251,023,922 RAC: 121,999 |
Theory tasks usually start with #events = 100000. In very rare cases they don't finish within the 10 day limit. If the long runtime is not caused by a local issue, and mcplots does not get enough valid results for a given set of input parameters, the same task type is reissued with a lower #events. This reduction may happen repeatedly until enough valid results are returned. A statement like "Cern-IT have no interest" is simply wrong. |
Send message Joined: 18 Nov 17 Posts: 128 Credit: 55,475,876 RAC: 14,858 |
"If the long runtime is" For this reason, should I wait for my task to finish? |
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 251,023,922 RAC: 121,999 |
If I see a long running task on any of my systems that has a small chance to finish, I let it run. If I see a task like the one in question, I cancel it. On your system it's your decision. You already mentioned the relevant numbers. Why do you ask anybody else? |
Send message Joined: 2 May 07 Posts: 2223 Credit: 173,730,166 RAC: 25,122 |
Why do you ask anybody else? ? No Multicore (Atlas or Theory) in Windows atm. |
Send message Joined: 18 Nov 17 Posts: 128 Credit: 55,475,876 RAC: 14,858 |
You wrote: "- not caused by a local issue". I wonder, does an abort count as a local issue? Does CERN need confirmation that 10 days was not enough before it reissues the task with a lower #events? I can let it run until it fails for this reason. |
Send message Joined: 18 Nov 17 Posts: 128 Credit: 55,475,876 RAC: 14,858 |
Why do you ask anybody else? No Multicore. |
Send message Joined: 24 Oct 04 Posts: 1166 Credit: 53,729,957 RAC: 51,044 |
I don't mind getting the long Theory tasks and have had many over the years here and at -dev. But this is the first one like this: https://lhcathome.cern.ch/lhcathome/result.php?resultid=410306756 Computer ID: 10824117; Run time: 3 hours 49 min 47 sec; CPU time: 22 hours 35 min 32 sec; Validate state: Valid; Credit: 119.70. I guess it wanted to be like the multi-core CMS. |
Send message Joined: 18 Nov 17 Posts: 128 Credit: 55,475,876 RAC: 14,858 |
"Theory tasks usually start with #events = 100000." These cases are not so rare. I've got another one: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=221982999 |
©2024 CERN