Message boards : Sixtrack Application : Very long Tasks
Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,652
1.9 PetaFLOPS and a duration time of more than 24 hours. Is this OK? https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=116598728 w-c1_job.B2topenergy.b6offIRon_c1.1707__3__s__62.31_60.32__14.1_16.1__7__79.5_1_sixvf_boinc1174 or workspace1_HEL_Qp_2_MO_m150_2t_3s__14__s__62.31_60.32__4_6__6__35_1_sixvf_boinc4954
Joined: 28 Sep 04 Posts: 728 Credit: 49,144,463 RAC: 29,814
I've got a couple of those as well, with an estimated runtime of about 33 hours. One has been running for 50 minutes at <5% progress, which extrapolates to about 18.8 hours if progress keeps steady.
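(For reference, the kind of extrapolation used above is just a linear projection from elapsed time and the progress fraction reported by BOINC Manager. A minimal sketch; the figures below are illustrative, not taken from any particular task:)

```python
# Linear extrapolation of a task's total runtime from elapsed time and the
# progress fraction shown in BOINC Manager. Figures are illustrative only.
elapsed_minutes = 50.0
progress_fraction = 0.044   # just under 5%

total_minutes = elapsed_minutes / progress_fraction
remaining_minutes = total_minutes - elapsed_minutes

print(f"estimated total runtime: {total_minutes / 60:.1f} h "
      f"(remaining: {remaining_minutes / 60:.1f} h)")
```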
Joined: 28 Sep 04 Posts: 728 Credit: 49,144,463 RAC: 29,814
Looks like these aren't that long in the end. Current progress indicates about 7.5 hours of runtime.
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0
The longest one for me was 15 hrs 29 mins on the 2.6 GHz i5; it ran 8 hrs 19 mins on the wingman's 2.8 GHz i7. I've also had a few 14+ hr tasks. This is all good, as it shows a stable (and therefore more useful) beam configuration rather than short runners which hit the wall prematurely.

I just followed the link in the original post and it points to a WU of only 19 seconds. Is the link wrong? Did the WU eventually finish?
Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,652
The first long-runner ended after 10 hours with error 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED: https://lhcathome.cern.ch/lhcathome/result.php?resultid=232584711

Sat 15 Jun 2019 09:16:02 CEST | | OS: Linux CentOS Linux: CentOS Linux 7 (Core) [3.10.0-693.el7.x86_64|libc 2.17 (GNU libc)]
Sat 15 Jun 2019 09:16:02 CEST | | Memory: 8.17 GB physical, 1.90 GB virtual
Sat 15 Jun 2019 09:16:02 CEST | | Disk: 16.08 GB total, 11.11 GB free
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0
Hi all, thanks for noticing this and giving feedback.

The WU named "w-c1_job.B2topenergy.b6offIRon_c1.1707__7__s__62.31_60.32__8.1_10.1__7__21_1_sixvf_boinc2610.zip" belongs to a study where tracking is performed for 10^7 turns, corresponding to 800 s of beam in the LHC. In general we simulate 10^5 or 10^6 turns; 10^7 is pretty unusual, even though it is a time scale we would like to hit sooner or later.

I think the reason for the EXIT_DISK_LIMIT_EXCEEDED error is that the user requested a dump of the beam coordinates every 50k turns, and the file collecting those data grows until the total disk space we request (~200 MB) is filled up. I fear that we have to kill those tasks and resubmit them with an updated result template file - I am running the same task locally to better estimate the requirement.

For the other task, I cannot spot anything odd at first sight - I have downloaded it and am running it locally to see if there is anything wrong with it.

I will keep you posted. Cheers, A.
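(The disk-limit problem described above can be sized up with simple arithmetic: number of dumps times the data written per dump, compared against the disk bound in the result template. A minimal sketch; the bytes-per-dump figure is an assumed, illustrative value, not one taken from the actual study:)

```python
# Rough estimate of how large the coordinate-dump file of a 10^7-turn study
# could grow, compared with the disk space requested in the result template.
# The bytes-per-dump value is an assumed, illustrative figure.
turns_total = 10**7            # turns tracked in this study
dump_every = 50_000            # beam coordinates dumped every 50k turns
bytes_per_dump = 2 * 1024**2   # assumed size written per dump (illustrative)
disk_bound_bytes = 200 * 10**6 # ~200 MB requested in the result template

dump_count = turns_total // dump_every
dump_bytes = dump_count * bytes_per_dump
print(f"{dump_count} dumps, roughly {dump_bytes / 1e6:.0f} MB in total")
if dump_bytes > disk_bound_bytes:
    print("the dump file alone would exceed the requested disk bound")
```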
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0
Hi Alessio, I hadn't spotted the ... _7_...

I have w-c2_job.B2topenergy.b6onIRon_c2.1707__8__s__62.31_60.32__4.1_6.1__7__84_1_sixvf_boinc2947, which is currently 5% in after 5.5 hrs, with a little over 4 days remaining. My hosts run 24/7, so that's fine. The slot size is 11.9 MB, the same as other "normal" WUs, so no size issue yet. Is there a duration limit as well as a size limit? Is there anything I could edit at this end to increase the limits before they are reached?

[Also, hiding on the other machine: w-c1_job.B2topenergy.b6offIRon_c1.1707__10__s__62.31_60.32__8.1_10.1__7__69_1_sixvf_boinc3881 at 22% after 15 hrs.]

Found what I was looking for, hopefully: init_data. Would it be possible to exit BOINC and simply edit <rsc_disk_bound>200000000.000000</rsc_disk_bound> to place an extra zero in there? Simple answer: NO, it gets reset to the original value, but at least I didn't kill it. I did the same edit while the task was active and it accepted that, but I don't know whether init_data gets read again while the job is active, so it might not have any effect.
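(For reference, the experiment described above amounts to rewriting the <rsc_disk_bound> value in the slot's init_data.xml. A minimal sketch of such an edit; the slot path is an assumption, and, as reported above, the BOINC client may simply reset the file, so this is not a supported way to raise the limit:)

```python
# Sketch: multiply the <rsc_disk_bound> value in a slot's init_data.xml by 10
# ("place an extra zero in there"). The slot path is an assumption, and the
# BOINC client may rewrite this file, so the edit may not stick.
import re
from pathlib import Path

init_data = Path("/var/lib/boinc-client/slots/0/init_data.xml")  # assumed path
text = init_data.read_text()

def bump(match: re.Match) -> str:
    value = float(match.group(1)) * 10
    return f"<rsc_disk_bound>{value:.6f}</rsc_disk_bound>"

init_data.write_text(
    re.sub(r"<rsc_disk_bound>([\d.]+)</rsc_disk_bound>", bump, text)
)
```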
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0
Hi Ray, interesting discovery, but honestly I don't have direct experience with this, so I cannot tell you much. The only meaningful info I have found around is not super-encouraging: https://boinc.mundayweb.com/wiki/index.php?title=Maximum_disk_space_exceeded Maybe the IT guys can give some further insight.

Concerning the max time, we give a week of maximum return time before a task is declared obsolete and a brand new one is issued to another volunteer. If the task lasts less than a day, the limit is fine. If it takes 4 days, it is a bit short, evidently...

It is somewhat strange that your task at 5% takes so little disk space - it should grow by roughly 10 MB every 1% ...
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0
I may have gotten the slot size wrong: it was up to 55 MB when I had to leave, and it shows a peak disk usage of 60 MB now that it has been cancelled, but the other one I cited shows a peak usage of only 11.98 MB after its 25?% and 18 hrs. The other topenergy ones I have had were of higher amplitude and therefore finished much sooner.

I knew there was a return deadline, but I didn't know whether there was also a job-duration deadline (it would appear not), such as is in place for Theory jobs, which have a return deadline set but will self-terminate after 18 hrs of runtime - a limit the user can increase by editing the Theory xml.
Joined: 8 Aug 05 Posts: 1 Credit: 637,785 RAC: 0
Thanks for killing the task w-c2_job.B2topenergy.b6onIRon_c2.1707__5__s__62.31_60.32__8.1_10.1__7__58.5_1_sixvf_boinc1809_0 that ended normally on 18 Jun 2019, 20:04:07 UTC after 18.5 hrs of CPU time.
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0
I am terribly sorry for that - it was not a decision taken easily or instantly, but I wanted to avoid having another 5 volunteers per long WU complaining.
Joined: 6 Sep 13 Posts: 5 Credit: 1,286,288 RAC: 0
Is there still a recurring problem? I just aborted 5 tasks with an estimated compute time of 155 days. All my other tasks have an ETA of roughly 2 hours and 30 minutes. At first I thought my memory settings might have become unstable due to a BCLK modification, but I ran Google's stressapptest for 3 hours to confirm stability, and Prime95 didn't show any errors either.
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0
Could you point me to the tasks concerned? I cannot see them - your computers are 'hidden'.
Joined: 6 Sep 13 Posts: 5 Credit: 1,286,288 RAC: 0
Sorry about that - I PMed you the workunits in question. I keep my PCs hidden for security reasons; I figure that admins can use the backend.
Joined: 15 Jun 08 Posts: 2534 Credit: 254,137,209 RAC: 54,451
"I keep PCs hidden for security reasons"

Of course it's your decision whether to hide your hosts or not, but you shouldn't tell anybody that hiding them increases security. That's just a myth.
Joined: 6 Sep 13 Posts: 5 Credit: 1,286,288 RAC: 0
Going completely off-topic, but please tell me how it is a myth. If someone wants to find the host, they can match credits if you use a single host, but that takes extra steps. I've run this app on at least 5 different machines, so that doesn't apply to me as much. Putting a host up that shows your exact kernel / glibc / OS version is less obscure, especially if you're not running in a VM. And even if you are running in a VM, Intel hardware hasn't been fully patched yet.
Joined: 14 Feb 17 Posts: 1 Credit: 351,918 RAC: 0
I got a very long task yesterday that failed after reaching 200 MB on the local HDD. The name of the WU is w-c4_job.B1topenergy.b6offIRon_c4.1707__5__s__62.31_60.32__12.1_14.1__7__9_1_sixvf_boinc1894_3: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=116651922
Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,652
A long-runner in sixtracktest with 40 hours of CPU time and 1.9 PetaFLOPS finished successfully! https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=117930714
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0
Hello maex and Win10,

I apologise for that - that task belongs to the series of extremely long jobs (10^7 turns) which were submitted with a wrong request of disk space. This task in particular was not caught by the sudden kill I announced on the MB: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5064 The amplitude range covered by your job was outside the range targeted for killing - sorry, I tried to kill as many as possible while trying to minimise the number of upset volunteers...

That is a new batch of extremely long jobs, this time with the correct request of disk space. I got one as well and crunched it correctly: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=117930647 I asked the user to submit a few of these jobs on sixtracktest, just to check that everything was correctly set up before making (again) a big mess. I think he can proceed.

Thanks for the feedback, and keep up the good work! Happy crunching, A.
Joined: 9 Aug 05 Posts: 36 Credit: 7,698,293 RAC: 0
Aren't those long tasks more suited to GPUs, given the long running time on CPUs?