Thread 'Very long tasks in the queue'

Author	Message
Yeti Volunteer moderator Send message Joined: 2 Sep 04 Posts: 468 Credit: 224,508,842 RAC: 124,398	Message 29865 - Posted: 7 Apr 2017, 13:15:17 UTC - in response to Message 29864. Coincidentally I also have one longrunner at the moment at 180 events processed per core, but it's from the longrunner task 10959636. Is it the same for you? Sorry, it is already uploaded ... Supporting BOINC, a great concept ! ID: 29865 · Reply Quote

HerveUAE Send message Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0	Message 29867 - Posted: 7 Apr 2017, 17:05:55 UTC Sorry, it is already uploaded ... I guess this one, taskID=10959636: https://lhcathome.cern.ch/lhcathome/result.php?resultid=132850817 We are the product of random evolution. ID: 29867 · Reply Quote

Yeti Volunteer moderator Send message Joined: 2 Sep 04 Posts: 468 Credit: 224,508,842 RAC: 124,398	Message 29868 - Posted: 7 Apr 2017, 17:30:12 UTC - in response to Message 29867. Sorry, it is already uploaded ... I guess this one, taskID=10959636: https://lhcathome.cern.ch/lhcathome/result.php?resultid=132850817 Nope, I think it was this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=132801696 And yes, it has taskID=10959636 Supporting BOINC, a great concept ! ID: 29868 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1957 Credit: 158,780,193 RAC: 54,433	Message 29869 - Posted: 7 Apr 2017, 19:22:16 UTC Last modified: 7 Apr 2017, 19:22:31 UTC I now also have a longrunner with taskID=10959636 - event no. 169 after 17:20 hours. David, will these longrunners one day be in a separate selection category in the personal settings? ID: 29869 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2285 Credit: 178,823,324 RAC: 773	Message 29910 - Posted: 12 Apr 2017, 4:42:30 UTC fyi this is a longrunner with problems: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=64572687 ID: 29910 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1957 Credit: 158,780,193 RAC: 54,433	Message 29945 - Posted: 16 Apr 2017, 17:08:32 UTC For several days now, the situation with these long tasks has been as follows (at least in my experience): all "Long Runners" (which, when downloaded, show a remaining time of a few days) error out shortly after starting. The other tasks (showing a remaining time of a few hours) are all going well. So what I am doing now is this: as soon as such a "Long Runner" is downloaded, I abort it immediately. I hope that these "Long Runners" can soon be selected as a separate sub-project, so that people can choose not to download them. ID: 29945 · Reply Quote

HerveUAE Send message Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0	Message 29953 - Posted: 17 Apr 2017, 18:14:33 UTC Last modified: 17 Apr 2017, 18:15:07 UTC Maybe a new beta application for long-runners could be started, for example after completing the merger with ATLAS@Home, so that testing those would be on a voluntary basis. We are the product of random evolution. ID: 29953 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1957 Credit: 158,780,193 RAC: 54,433	Message 29960 - Posted: 18 Apr 2017, 15:53:22 UTC - in response to Message 29953. Maybe a new beta application for long-runners could be started, for example after completing the merger with ATLAS@Home, so that testing those would be on a voluntary basis. I strongly endorse this proposal ID: 29960 · Reply Quote

gyllic Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,659,192 RAC: 9	Message 29963 - Posted: 18 Apr 2017, 17:05:10 UTC - in response to Message 29953. Maybe a new beta application for long-runners could be started, for example after completing the merger with ATLAS@Home, so that testing those would be on a voluntary basis. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4157&postid=29301#29301 ID: 29963 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2285 Credit: 178,823,324 RAC: 773	Message 34677 - Posted: 15 Mar 2018, 21:44:06 UTC Is the download file for the new tasks 425 MByte? ID: 34677 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2285 Credit: 178,823,324 RAC: 773	Message 34686 - Posted: 17 Mar 2018, 11:09:49 UTC - in response to Message 29276. The current ATLAS tasks process 100 events, but as an experiment we have sent some tasks with 1000 events. We would like to see if it's possible to run tasks like these on ATLAS@Home because this is the same number of events each task processes on the ATLAS grid. It would make things a lot easier if the same tasks could run on ATLAS@Home as on the rest of the ATLAS grid. These tasks will run 10 times longer than the other tasks and will generate an output file 10 times as large (500MB), so this may be an issue for those of you with low upload bandwidth. The advantage is that the initial download of 200MB is the same. Obviously using more cores will be better for these tasks, so they finish in a reasonable time. To know if you are running one of these tasks and that it's not a regular "longrunner" you can check the stderr.txt in the slots directory - if it shows "Starting ATLAS job. (PandaID=xxx taskID: taskID=10959636)" then you got one. The regular tasks have taskID=10947180. Please let us know your opinion in general about the length and data in/out requirements of ATLAS tasks. They are usually much shorter than the other vbox LHC projects - is this a good thing or would you prefer more consistency among the projects? This is the beginning of this thread from David! What about running 1,000 events in the native app, for Linux only? It has less overhead and runs very stably. A flag in the preferences would be useful for this. Is there a second chance...... ID: 34686 · Reply Quote
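
The stderr.txt check described in the quoted post can be scripted. This is only a minimal sketch, assuming the log line contains the text `taskID=` followed by digits, as in the quoted message. The function names and any slot path passed to `check_slot` are hypothetical, not part of BOINC or ATLAS.

```python
import re
from pathlib import Path

# Task IDs quoted in the post above.
LONG_TASK_ID = "10959636"     # experimental 1000-event tasks
REGULAR_TASK_ID = "10947180"  # regular 100-event tasks


def find_task_ids(log_text):
    """Return every taskID value found in the given stderr.txt text."""
    return re.findall(r"taskID=(\d+)", log_text)


def is_long_task(log_text):
    """True if the log mentions the 1000-event taskID."""
    return LONG_TASK_ID in find_task_ids(log_text)


def check_slot(stderr_path):
    """Read a slot's stderr.txt (the path is supplied by the caller) and classify it."""
    text = Path(stderr_path).read_text(errors="replace")
    return is_long_task(text)
```

Calling `check_slot` on the stderr.txt of a running slot would return True only for a task carrying the 1000-event taskID; the actual slot directory depends on the local BOINC installation.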

rbpeake Send message Joined: 17 Sep 04 Posts: 106 Credit: 36,549,147 RAC: 2	Message 34688 - Posted: 17 Mar 2018, 14:02:15 UTC - in response to Message 34686. The current ATLAS tasks process 100 events, but as an experiment we have sent some tasks with 1000 events. We would like to see if it's possible to run tasks like these on ATLAS@Home because this is the same number of events each task processes on the ATLAS grid. It would make things a lot easier if the same tasks could run on ATLAS@Home as on the rest of the ATLAS grid. These tasks will run 10 times longer than the other tasks and will generate an output file 10 times as large (500MB), so this may be an issue for those of you with low upload bandwidth. The advantage is that the initial download of 200MB is the same. Obviously using more cores will be better for these tasks, so they finish in a reasonable time. To know if you are running one of these tasks and that it's not a regular "longrunner" you can check the stderr.txt in the slots directory - if it shows "Starting ATLAS job. (PandaID=xxx taskID: taskID=10959636)" then you got one. The regular tasks have taskID=10947180. Please let us know your opinion in general about the length and data in/out requirements of ATLAS tasks. They are usually much shorter than the other vbox LHC projects - is this a good thing or would you prefer more consistency among the projects? This is the beginning of this thread from David! What about running 1,000 events in the native app, for Linux only? It has less overhead and runs very stably. A flag in the preferences would be useful for this. Is there a second chance...... And I wish there were a way for ATLAS to identify the Windows machines that could process these successfully, and have an opt-in option for the user. I had no problems with these when they were available. Thanks! Regards, Bob P. ID: 34688 · Reply Quote

Jim1348 Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0	Message 34689 - Posted: 17 Mar 2018, 15:32:10 UTC - in response to Message 34686. What about running 1,000 events in the native app, for Linux only? It has less overhead and runs very stably. A flag in the preferences would be useful for this. Is there a second chance...... Yes, that would be a good use for native ATLAS. I devote a Haswell machine that runs 24/7 (without VirtualBox) to it, and it could use some long ones. ID: 34689 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1957 Credit: 158,780,193 RAC: 54,433	Message 34690 - Posted: 17 Mar 2018, 15:52:48 UTC - in response to Message 34688. And I wish there were a way for ATLAS to identify the Windows machines that could process these successfully, and have an opt-in option for the user. I had no problems with these when they were available. I, too, would be very interested in crunching such tasks on my Windows machine (with a 3-core, 4-core or 8-core setting)! ID: 34690 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2285 Credit: 178,823,324 RAC: 773	Message 34691 - Posted: 17 Mar 2018, 17:30:51 UTC This idea of running 1000 collisions with the Linux native app is NOT against Windows, sorry. There are more computers running the Linux app to do this heavy work. David showed statistics for one week in this thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4396&postid=33036#33036 ID: 34691 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2724 Credit: 299,214,724 RAC: 20,910	Message 34692 - Posted: 17 Mar 2018, 18:34:38 UTC The idea of handing out WUs with 1000 collisions has to be planned very carefully, as there are some disadvantages: 1. ATLAS (native) does not reliably accept a suspend/continue signal from the BOINC client. This must be solved first. 2. Once a host runs 1000-c-WUs it should only run those WUs. A mix of very short and very long WUs would heavily disturb the average GFLOPS calculation for that host and as a consequence would lead to distorted credit calculations and (more importantly) unrealistic runtime estimates. ID: 34692 · Reply Quote
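
The second point can be illustrated with a little arithmetic. This is a deliberately simplified sketch, not BOINC's actual estimation algorithm: it assumes a host keeps a single average runtime per application and uses it as the estimate for every new task. The runtimes of 1 hour and 10 hours are hypothetical and follow only the "10 times longer" ratio mentioned earlier in the thread.

```python
def mean_runtime(runtimes_h):
    """Average runtime in hours over a host's task history."""
    return sum(runtimes_h) / len(runtimes_h)


def estimate_ratio(estimate_h, actual_h):
    """Estimated runtime divided by actual runtime (1.0 is a perfect estimate)."""
    return estimate_h / actual_h


# Hypothetical runtimes: a 1000-event task takes 10x as long as a 100-event task.
SHORT_H = 1.0
LONG_H = 10.0

# A host that has completed five tasks of each kind.
mixed_history = [SHORT_H] * 5 + [LONG_H] * 5
estimate = mean_runtime(mixed_history)

print(f"single estimate for all tasks: {estimate} h")
print(f"short tasks overestimated by a factor of {estimate_ratio(estimate, SHORT_H)}")
print(f"long tasks estimated at {estimate_ratio(estimate, LONG_H):.2f} of their real runtime")
```

Under these assumptions neither kind of task gets a usable estimate, which is why a host running only one kind of WU would behave better.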

rbpeake Send message Joined: 17 Sep 04 Posts: 106 Credit: 36,549,147 RAC: 2	Message 34693 - Posted: 17 Mar 2018, 18:44:36 UTC - in response to Message 34691. This idea of running 1000 collisions with the Linux native app is NOT against Windows, sorry. There are more computers running the Linux app to do this heavy work. David showed statistics for one week in this thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4396&postid=33036#33036 OK, I see your point. Whatever works best for the project is OK with me. Regards, Bob P. ID: 34693 · Reply Quote

Magic Quantum Mechanic Send message Joined: 24 Oct 04 Posts: 1276 Credit: 94,894,450 RAC: 54,471	Message 34699 - Posted: 18 Mar 2018, 20:43:16 UTC - in response to Message 29361. Last modified: 18 Mar 2018, 20:47:12 UTC Overnight 8 Longrunners finished and were successfully validated. Those 4,000+ credit tasks do look nice, Yeti https://lhcathome.cern.ch/lhcathome/results.php?hostid=10359162 So far my best: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126170665 6,665.59 credits I have got several of those over at -dev with those monster-sized ATLAS tasks (over 5000 credits each). The only problem is that the d/l of those tasks is over 426MB per task, so multiply that by 10 and you can imagine how long it takes just to d/l the tasks, and they run to valid results much faster than the d/l time. I used up ALL of my high-speed ISP data d/l for the month in 4 days, so I have the next 26 days running at snail speed. (nothing like a 2 page thread) Volunteer Mad Scientist For Life unbelievable are you trying to promote linux again? ID: 34699 · Reply Quote
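
The "multiply that by 10" remark can be worked through as a small calculation. This is a rough sketch: the 426 MB size and the factor of 10 come from the post above, while the 10 Mbit/s line speed is purely an assumed figure for illustration.

```python
def total_download_mb(size_mb, n_tasks):
    """Total download volume in megabytes for n_tasks tasks of size_mb each."""
    return size_mb * n_tasks


def download_hours(total_mb, mbit_per_s):
    """Hours needed to transfer total_mb megabytes at mbit_per_s megabits per second."""
    seconds = total_mb * 8 / mbit_per_s
    return seconds / 3600


total = total_download_mb(426, 10)   # figures from the post
hours = download_hours(total, 10)    # 10 Mbit/s is an assumed line speed

print(f"{total} MB in total, about {hours:.2f} h at 10 Mbit/s")
```

On a throttled or metered connection the same volume takes proportionally longer, which matches the data-cap problem described in the post.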

maeax Send message Joined: 2 May 07 Posts: 2285 Credit: 178,823,324 RAC: 773	Message 34704 - Posted: 19 Mar 2018, 8:19:34 UTC - in response to Message 34699. The only problem is that the d/l of those tasks is over 426MB per task, so multiply that by 10 and you can imagine how long it takes just to d/l the tasks, and they run to valid results much faster than the d/l time. These 425 MByte are for 200 events at the moment! You can use this only in -dev, and only until Thursday. ID: 34704 · Reply Quote

BelgianEnthousiast Send message Joined: 5 Apr 15 Posts: 18 Credit: 5,910,849 RAC: 0	Message 34784 - Posted: 28 Mar 2018, 19:17:04 UTC - in response to Message 29319. Hi, I think it's a good idea indeed; it's similar to ClimatePrediction, which I also run. However, please don't run it on Ubuntu. I installed it on Windows 10 and it conflicted straight away with LHC and VirtualBox. I had to uninstall Ubuntu again, and my system has since become much more unstable, forcing me to reboot every 3 days or risk crashing it. That is something GPUGrid really doesn't like, and it results in lost WUs (which last around 6-9 hours); it's a shame to lose those when you're at 95%... Thanks for taking that into consideration! ID: 34784 · Reply Quote