Message boards : ATLAS application : Very long tasks in the queue

Profile Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29865 - Posted: 7 Apr 2017, 13:15:17 UTC - in response to Message 29864.  

Coincidentally I also have one longrunner at the moment at 180 events processed per core, but it's from the longrunner task 10959636. Is it the same for you?

Sorry, it is already uploaded ...


Supporting BOINC, a great concept !
ID: 29865
Profile HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29867 - Posted: 7 Apr 2017, 17:05:55 UTC

Sorry, it is already uploaded ...

I guess this one, taskID=10959636: https://lhcathome.cern.ch/lhcathome/result.php?resultid=132850817
We are the product of random evolution.
ID: 29867
Profile Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29868 - Posted: 7 Apr 2017, 17:30:12 UTC - in response to Message 29867.  

Sorry, it is already uploaded ...

I guess this one, taskID=10959636: https://lhcathome.cern.ch/lhcathome/result.php?resultid=132850817

Nope, I think it was this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=132801696
And yes, it has taskID=10959636


Supporting BOINC, a great concept !
ID: 29868
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,410,938
RAC: 102,390
Message 29869 - Posted: 7 Apr 2017, 19:22:16 UTC
Last modified: 7 Apr 2017, 19:22:31 UTC

I now also have a longrunner with taskID=10959636 - event no. 169 after 17:20 hours.

David, will these longrunners one day be in a separate selection category in the personal settings?
ID: 29869
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,158,128
RAC: 105,494
Message 29910 - Posted: 12 Apr 2017, 4:42:30 UTC

ID: 29910
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,410,938
RAC: 102,390
Message 29945 - Posted: 16 Apr 2017, 17:08:32 UTC

For several days now, the situation with these long tasks has been as follows (at least in my experience):

I have noticed that all "Long runners" (which, when downloaded, show a remaining time of a few days) error out shortly after starting. The other tasks (showing a remaining time of a few hours) are all going well.

So what I am doing now is: once such a "Long runner" is downloaded, I abort it immediately.

Hopefully these "Long Runners" will soon be selectable as a separate sub-project, so that people can choose not to download them.
ID: 29945
Profile HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29953 - Posted: 17 Apr 2017, 18:14:33 UTC
Last modified: 17 Apr 2017, 18:15:07 UTC

Maybe a new beta application for long-runners could be started, for example after completing the merger with ATLAS@Home, so that testing those would be on a voluntary basis.
We are the product of random evolution.
ID: 29953
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,410,938
RAC: 102,390
Message 29960 - Posted: 18 Apr 2017, 15:53:22 UTC - in response to Message 29953.  

Maybe a new beta application for long-runners could be started, for example after completing the merger with ATLAS@Home, so that testing those would be on a voluntary basis.

I strongly endorse this proposal
ID: 29960
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 29963 - Posted: 18 Apr 2017, 17:05:10 UTC - in response to Message 29953.  

Maybe a new beta application for long-runners could be started, for example after completing the merger with ATLAS@Home, so that testing those would be on a voluntary basis.


https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4157&postid=29301#29301
ID: 29963
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,158,128
RAC: 105,494
Message 34677 - Posted: 15 Mar 2018, 21:44:06 UTC

Is the download file for the new tasks 425 MByte?
ID: 34677
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,158,128
RAC: 105,494
Message 34686 - Posted: 17 Mar 2018, 11:09:49 UTC - in response to Message 29276.  

The current ATLAS tasks process 100 events, but as an experiment we have sent some tasks with 1000 events. We would like to see if it's possible to run tasks like these on ATLAS@Home because this is the same number of events each task processes on the ATLAS grid. It would make things a lot easier if the same tasks could run on ATLAS@Home as on the rest of the ATLAS grid.

These tasks will run 10 times longer than the other tasks and will generate an output file 10 times as large (500MB), so this may be an issue for those of you with low upload bandwidth. The advantage is that the initial download of 200MB is the same. Obviously using more cores will be better for these tasks, so they finish in a reasonable time.

To know if you are running one of these tasks and that it's not a regular "longrunner" you can check the stderr.txt in the slots directory - if it shows "Starting ATLAS job. (PandaID=xxx taskID: taskID=10959636)" then you got one. The regular tasks have taskID=10947180.

Please let us know your opinion in general about the length and data in/out requirements of ATLAS tasks. They are usually much shorter than the other vbox LHC projects - is this a good thing or would you prefer more consistency among the projects?


This is the beginning of this thread from David!

What about running 1,000 events in the native app, for Linux only?
There is less overhead and it runs very stably. A flag in the preferences would be useful for this.
Is there a second chance...?
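
For anyone who wants to script the stderr.txt check from David's post quoted above, here is a minimal sketch in Python. It assumes the log line looks exactly like the one quoted and that the two taskIDs mentioned in this thread are the only ones of interest. The function name and the constants are my own invention, not part of any ATLAS or BOINC tool.

```python
import re

# taskIDs quoted in this thread; any other value is reported as "unknown".
LONG_TASK_ID = "10959636"     # experimental 1000-event tasks
REGULAR_TASK_ID = "10947180"  # regular 100-event tasks


def classify_atlas_task(stderr_text):
    """Return 'long', 'regular', 'unknown', or None if no taskID is present.

    Hypothetical helper: it only looks for the first 'taskID=<digits>'
    pattern in the text it is given.
    """
    match = re.search(r"taskID=(\d+)", stderr_text)
    if match is None:
        return None
    task_id = match.group(1)
    if task_id == LONG_TASK_ID:
        return "long"
    if task_id == REGULAR_TASK_ID:
        return "regular"
    return "unknown"


# Sample line copied from the quoted post (PandaID left elided as in the post).
sample = "Starting ATLAS job. (PandaID=xxx taskID: taskID=10959636)"
print(classify_atlas_task(sample))
```

To use it on a real task, read the contents of stderr.txt from the slot directory and pass the text to the function; where that directory lives depends on your BOINC installation.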
ID: 34686
Profile rbpeake

Joined: 17 Sep 04
Posts: 99
Credit: 30,619,757
RAC: 3,820
Message 34688 - Posted: 17 Mar 2018, 14:02:15 UTC - in response to Message 34686.  

The current ATLAS tasks process 100 events, but as an experiment we have sent some tasks with 1000 events. We would like to see if it's possible to run tasks like these on ATLAS@Home because this is the same number of events each task processes on the ATLAS grid. It would make things a lot easier if the same tasks could run on ATLAS@Home as on the rest of the ATLAS grid.

These tasks will run 10 times longer than the other tasks and will generate an output file 10 times as large (500MB), so this may be an issue for those of you with low upload bandwidth. The advantage is that the initial download of 200MB is the same. Obviously using more cores will be better for these tasks, so they finish in a reasonable time.

To know if you are running one of these tasks and that it's not a regular "longrunner" you can check the stderr.txt in the slots directory - if it shows "Starting ATLAS job. (PandaID=xxx taskID: taskID=10959636)" then you got one. The regular tasks have taskID=10947180.

Please let us know your opinion in general about the length and data in/out requirements of ATLAS tasks. They are usually much shorter than the other vbox LHC projects - is this a good thing or would you prefer more consistency among the projects?


This is the beginning of this thread from David!

What about running 1,000 events in the native app, for Linux only?
There is less overhead and it runs very stably. A flag in the preferences would be useful for this.
Is there a second chance...?

And I wish there were a way for ATLAS to identify the Windows machines that could process these successfully, and have an opt-in option for the user. I had no problems with these when they were available.
Thanks!
Regards,
Bob P.
ID: 34688
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34689 - Posted: 17 Mar 2018, 15:32:10 UTC - in response to Message 34686.  

What about running 1,000 events in the native app, for Linux only?
There is less overhead and it runs very stably. A flag in the preferences would be useful for this.
Is there a second chance...?

Yes, that would be a good use for native ATLAS. I devote a Haswell machine that runs 24/7 (without VirtualBox) to it, and it could use some long ones.
ID: 34689
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,410,938
RAC: 102,390
Message 34690 - Posted: 17 Mar 2018, 15:52:48 UTC - in response to Message 34688.  

And I wish there were a way for ATLAS to identify the Windows machines that could process these successfully, and have an opt-in option for the user. I had no problems with these when they were available.
I, too, would be very interested in crunching such tasks on my Windows machine (with a 3-core, 4-core or 8-core setting)!
ID: 34690
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,158,128
RAC: 105,494
Message 34691 - Posted: 17 Mar 2018, 17:30:51 UTC

The idea of running 1000 collisions with the Linux native app is NOT meant against Windows, sorry.

There are more computers running the Linux app to do this heavy work.
David showed statistics for one week in this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4396&postid=33036#33036
ID: 34691
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,969,741
RAC: 136,700
Message 34692 - Posted: 17 Mar 2018, 18:34:38 UTC

The idea to hand out WUs with 1000 collisions has to be planned very carefully as there are some disadvantages:

1. ATLAS (native) does not reliably accept a suspend/continue signal from the BOINC client.
This must be solved first.

2. Once a host runs 1000-c-WUs it should only run those WUs.
A mix of very short and very long WUs would heavily distort the average GFLOPS calculation for that host and, as a consequence, lead to unrealistic credit calculations and (more importantly) unrealistic runtime estimates.
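
To illustrate point 2 with numbers, here is a deliberately simplified sketch. It is NOT how the BOINC server actually computes estimates; it just assumes a host whose runtime estimate is the plain average of its past runtimes, with made-up runtimes of 1 hour for short WUs and 10 hours for long ones.

```python
from statistics import mean


def estimate_error(history_hours, actual_hours):
    """Relative error of a mean-based runtime estimate vs. the actual runtime.

    Simplified, hypothetical model: the estimate is just the average of
    all past runtimes, regardless of task type.
    """
    estimate = mean(history_hours)
    return (estimate - actual_hours) / actual_hours


# Hypothetical runtimes in hours; the long WUs run 10 times longer.
short, long_ = 1.0, 10.0
mixed_history = [short] * 5 + [long_] * 5

print(f"estimate from mixed history: {mean(mixed_history):.1f} h")
print(f"error for a short WU: {estimate_error(mixed_history, short):+.0%}")
print(f"error for a long WU:  {estimate_error(mixed_history, long_):+.0%}")
```

With a mixed history the single average fits neither kind of WU, which is the reason for keeping a host on one kind only.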
ID: 34692
Profile rbpeake

Joined: 17 Sep 04
Posts: 99
Credit: 30,619,757
RAC: 3,820
Message 34693 - Posted: 17 Mar 2018, 18:44:36 UTC - in response to Message 34691.  

The idea of running 1000 collisions with the Linux native app is NOT meant against Windows, sorry.

There are more computers running the Linux app to do this heavy work.
David showed statistics for one week in this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4396&postid=33036#33036

OK, I see your point. Whatever works best for the project is OK with me.
Regards,
Bob P.
ID: 34693
Profile Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,137
RAC: 3,956
Message 34699 - Posted: 18 Mar 2018, 20:43:16 UTC - in response to Message 29361.  
Last modified: 18 Mar 2018, 20:47:12 UTC

Overnight, 8 longrunners finished and were successfully validated


Those 4,000+ credit tasks do look nice Yeti

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10359162

So far my best: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126170665

6,665.59 credits *smile*




I have got several of those over at -dev with those monster sized Atlas tasks (over 5000 credits each)
The only problem is that the d/l of those tasks is over 426MB per task, so multiply that by 10 and you can imagine how long it takes just to d/l the tasks; they run to Valid much faster than the d/l time.

I used up ALL of my high-speed ISP data d/l for the month in 4 days so I have the next 26 days running at snail speed.
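
For a rough idea of the numbers, here is a small back-of-the-envelope sketch. The 426 MB per task and the factor of 10 come from the post above; the two line speeds are purely hypothetical examples (I am not stating my actual speeds), and 1 MB is taken as 8 Mbit.

```python
def download_hours(size_mb, speed_mbit_per_s):
    """Hours needed to transfer size_mb megabytes at the given line speed."""
    seconds = size_mb * 8 / speed_mbit_per_s
    return seconds / 3600


TASK_SIZE_MB = 426   # per-task download size quoted in the post
TASKS = 10           # "multiply that by 10"
total_mb = TASK_SIZE_MB * TASKS

# Hypothetical line speeds in Mbit/s: a normal line and a throttled one.
for speed in (50.0, 1.0):
    hours = download_hours(total_mb, speed)
    print(f"{total_mb} MB at {speed} Mbit/s: {hours:.1f} h")
```

Swap in your own line speed to see how the download time compares with the runtime of the tasks.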


(nothing like a 2 page thread)
Volunteer Mad Scientist For Life
ID: 34699
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,158,128
RAC: 105,494
Message 34704 - Posted: 19 Mar 2018, 8:19:34 UTC - in response to Message 34699.  

The only problem is that the d/l of those tasks is over 426MB per task, so multiply that by 10 and you can imagine how long it takes just to d/l the tasks; they run to Valid much faster than the d/l time.

These 425 MByte are for 200 events at the moment!
This is available only in -dev, and only until Thursday.
ID: 34704
BelgianEnthousiast

Joined: 5 Apr 15
Posts: 18
Credit: 5,910,849
RAC: 0
Message 34784 - Posted: 28 Mar 2018, 19:17:04 UTC - in response to Message 29319.  

Hi,

I think it's a good idea indeed, it's similar to ClimatePrediction which I also run.

However, please don't run it on Ubuntu.
I installed it on Windows 10 and it conflicted straight away with LHC and VirtualBox.

I had to uninstall Ubuntu again, and my system has since become much more unstable, prompting me to reboot
my system every 3 days or otherwise risk crashing it. That is something GPUGrid really doesn't like, and it results
in lost WUs (which last around 6-9 hours); it's a shame to lose those when you're at 95%...

Thanks for taking that into consideration!
ID: 34784


©2024 CERN