Message boards : ATLAS application : queue is empty

maeax
Joined: 2 May 07
Posts: 2050
Credit: 151,660,672
RAC: 44,454
Message 48263 - Posted: 28 Jun 2023, 12:52:14 UTC

-dev has had a small number of tasks over the last few days.
So, the Atlas-Team is busy.
ID: 48263
Dark Angel
Joined: 7 Aug 11
Posts: 60
Credit: 20,793,215
RAC: 4,252
Message 48265 - Posted: 29 Jun 2023, 7:58:22 UTC

I tried requesting an access code for -dev, but it seems to have gotten lost in the aether.

Still no Atlas tasks for some time it seems.
ID: 48265
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 694
Message 48299 - Posted: 21 Jul 2023, 19:14:53 UTC - in response to Message 48263.  

maeax wrote:
-dev has had a small number of tasks over the last few days.
So, the Atlas-Team is busy.

3 weeks later, still nothing on mainstream. Does anybody know anything?
ID: 48299
Harri Liljeroos
Joined: 28 Sep 04
Posts: 673
Credit: 42,743,358
RAC: 27,775
Message 48300 - Posted: 21 Jul 2023, 19:45:21 UTC

The job graphs show that the BOINC-TEST system is constantly getting a few jobs. Whatever that system is or does, I don't know. Also, the whole ATLAS grid is getting hundreds of thousands of jobs every day. So work is available, just not for us at Boinc_mcore. https://lhcathome.cern.ch/lhcathome/atlas_job.php
ID: 48300
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 694
Message 48301 - Posted: 21 Jul 2023, 19:59:55 UTC - in response to Message 48300.  
Last modified: 21 Jul 2023, 20:00:43 UTC

Harri Liljeroos wrote:
The job graphs show that the BOINC-TEST system is constantly getting a few jobs. Whatever that system is or does, I don't know. Also, the whole ATLAS grid is getting hundreds of thousands of jobs every day. So work is available, just not for us at Boinc_mcore. https://lhcathome.cern.ch/lhcathome/atlas_job.php

Perhaps somebody (the new guy? someone just left?) has decided to adjust the BOINC system a little and is testing things. So ATLAS as a whole continues, but we're getting an upgrade/change? I shall play on CMS meanwhile (and Theory for when my pathetic 7 Mbit/s upload isn't fast enough).
ID: 48301
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 694
Message 48302 - Posted: 21 Jul 2023, 20:04:04 UTC

I see that page's equivalent for CMS requires a login for some reason.

I also see the Theory one is entirely Boinc?
ID: 48302
maeax
Joined: 2 May 07
Posts: 2050
Credit: 151,660,672
RAC: 44,454
Message 48529 - Posted: 11 Sep 2023, 10:47:33 UTC - in response to Message 48528.  

There are no new tasks.
What you see are tasks that ran into a time limit and were restarted by other volunteers.
ID: 48529
LRZ-LMU
Joined: 4 May 17
Posts: 5
Credit: 118,785,284
RAC: 0
Message 48530 - Posted: 11 Sep 2023, 11:31:06 UTC - in response to Message 48529.  

Not sure whether it was communicated, but the guy who set this up and was looking after it all has left ATLAS. That left a knowledge gap and nobody responsible. A lot was documented, so we should be able to get it going. The immediate problem was a switch of release version, which ideally needs a new image.
I tested submission of a few jobs with the new release, but without a new image. That means the software is pulled via CVMFS, so more I/O.
Also, it is easiest for us to run 2000 events per job, rather than the 200 previously. The software is twice as fast, so this works out to roughly a factor of 5 in job length.
Is either of these a problem for you guys? If not, then I think we can get going immediately, at least as an interim fix.
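
For readers who want to check the arithmetic, here is a minimal sketch of the factor-of-5 estimate. It assumes the cost per event is constant and ignores any fixed start-up or upload overhead; the function name is purely illustrative and not part of any ATLAS or BOINC tooling.

```python
# Back-of-the-envelope check of the job-length estimate quoted above.
OLD_EVENTS_PER_JOB = 200    # previous setting, from the post
NEW_EVENTS_PER_JOB = 2000   # proposed setting, from the post
SOFTWARE_SPEEDUP = 2.0      # "twice as fast", from the post


def job_length_factor(old_events: int, new_events: int, speedup: float) -> float:
    """Return how many times longer a new job runs than an old one.

    Assumes runtime scales linearly with the number of events.
    """
    return (new_events / old_events) / speedup


factor = job_length_factor(OLD_EVENTS_PER_JOB, NEW_EVENTS_PER_JOB, SOFTWARE_SPEEDUP)
print(f"Job length factor: {factor:.1f}x")
```

Ten times the events at twice the speed gives five times the runtime, which matches the figure in the post.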
ID: 48530
maeax
Joined: 2 May 07
Posts: 2050
Credit: 151,660,672
RAC: 44,454
Message 48532 - Posted: 11 Sep 2023, 14:35:17 UTC - in response to Message 48530.  

2,000 events instead of 200 would be useful as a test.
ID: 48532
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2366
Credit: 219,173,949
RAC: 152,402
Message 48533 - Posted: 11 Sep 2023, 14:53:49 UTC - in response to Message 48530.  

Longer runtimes are usually not a problem if the tasks can write checkpoints and resume from there.
At least pause/resume should be supported.

IIRC ATLAS always starts a task from scratch when it resumes from a break.
Is this still the case with the recent version?

If yes, tasks running many more events are likely to produce a much higher failure rate on less powerful computers.
ID: 48533
rbpeake
Joined: 17 Sep 04
Posts: 99
Credit: 30,434,493
RAC: 17,985
Message 48534 - Posted: 11 Sep 2023, 15:11:11 UTC - in response to Message 48530.  

Sounds good to me.

Thanks for picking up this project!
Regards,
Bob P.
ID: 48534
maeax
Joined: 2 May 07
Posts: 2050
Credit: 151,660,672
RAC: 44,454
Message 48535 - Posted: 11 Sep 2023, 15:30:44 UTC - in response to Message 48530.  
Last modified: 11 Sep 2023, 15:33:41 UTC

-dev was using 1,000 events in March 2021:
Run time: 6 hours 7 min 31 sec
CPU time: 2 days 9 hours 7 min 39 sec
ATLAS long simulation v1.01 (long_native_mt)
ID: 48535
hadron
Joined: 4 Sep 22
Posts: 54
Credit: 7,366,651
RAC: 16,618
Message 48538 - Posted: 11 Sep 2023, 20:35:59 UTC - in response to Message 48534.  

rbpeake wrote:
Sounds good to me.

Thanks for picking up this project!

+923875

Let's get going!
ID: 48538
LRZ-LMU
Joined: 4 May 17
Posts: 5
Credit: 118,785,284
RAC: 0
Message 48540 - Posted: 12 Sep 2023, 12:58:30 UTC - in response to Message 48538.  

It is certainly the case that there is no checkpoint. However, a suspended job should survive and restart OK. So as long as the computer is powered on, there should be no problem with long jobs. It should also survive the VM or machine being suspended, just not a reboot, unless BOINC can delay the reboot to suspend the VM.
I expect David had a good reason for using 200 events though. I'm not very familiar with BOINC, so help me out!
ID: 48540
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2366
Credit: 219,173,949
RAC: 152,402
Message 48541 - Posted: 12 Sep 2023, 13:31:51 UTC - in response to Message 48540.  

The reason for the 200 event limit was indeed to keep the total runtime within a limit most volunteers can live with.
As said, computers running 24/7 should not have a problem, especially if they run a setup that uses more cores.
Volunteers with such machines usually vote for more events.

Other volunteers vote for a total runtime of less than a work day (6-8 h) so their computers can be shut down and restarted the next day.
That's what ATLAS does not handle properly if you send out huge tasks.
There may be tasks close to finishing that restart a couple of times every day and finally fail when they reach the maximum runtime limit.

Some volunteers then suggest using a setup with more cores per task, but this is not possible on all kinds of computers either, even nowadays.
ATLAS mostly uses 1 core at the beginning and at the end of a task, which confuses inexperienced volunteers.
They may complain that it doesn't work correctly when the CPU count shown in BOINC is not fully used.

What you should definitely avoid is sending out tasks with a variable number of events.
This would make BOINC's runtime estimation and credit calculation useless and would also have a bad impact on other projects on the same computer.
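
The restart scenario described above (a task without checkpoints on a machine that is shut down every day) can be sketched as a small simulation. This is only an illustration under stated assumptions: every shutdown discards all progress, the accumulated runtime keeps counting towards the limit, and the numbers in the usage lines are hypothetical, not real ATLAS or BOINC settings.

```python
def simulate_task(work_hours, uptime_hours_per_day, max_runtime_hours, max_days=365):
    """Simulate a task without checkpoints on a machine that is shut down daily.

    Assumptions of this sketch: a shutdown discards all progress, while the
    accumulated runtime still counts towards the maximum runtime limit.
    Returns (outcome, day, accumulated_runtime_hours).
    """
    accumulated = 0.0
    for day in range(1, max_days + 1):
        # The task can run until shutdown or until the runtime limit is hit.
        session = min(uptime_hours_per_day, max_runtime_hours - accumulated)
        if work_hours <= session:
            return ("finished", day, accumulated + work_hours)
        accumulated += session
        if accumulated >= max_runtime_hours:
            return ("failed: runtime limit reached", day, accumulated)
    return ("still running", max_days, accumulated)


# Hypothetical numbers: 8 h of uptime per day and a 48 h runtime limit.
print(simulate_task(work_hours=6, uptime_hours_per_day=8, max_runtime_hours=48))
print(simulate_task(work_hours=10, uptime_hours_per_day=8, max_runtime_hours=48))
```

In this model a task that fits into one daily session finishes on the first day, while a task that is even slightly too long never completes and only burns runtime until the limit is reached.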
ID: 48541
rbpeake
Joined: 17 Sep 04
Posts: 99
Credit: 30,434,493
RAC: 17,985
Message 48543 - Posted: 13 Sep 2023, 14:20:55 UTC - in response to Message 48541.  

I recall that with the faster processing time, David was considering a 500-event work unit.
Regards,
Bob P.
ID: 48543
maeax
Joined: 2 May 07
Posts: 2050
Credit: 151,660,672
RAC: 44,454
Message 48544 - Posted: 14 Sep 2023, 10:04:01 UTC

First ATLAS task on Win11 Pro with 400 events finished successfully:
Run time: 1 hour 26 min 42 sec
CPU time: 8 hours 16 min 1 sec
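
As a rough way to read these two numbers, dividing CPU time by run time gives the average number of cores kept busy. The sketch below uses the values from this post; the helper names are illustrative only.

```python
def to_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a duration given in days/hours/minutes/seconds to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def effective_cores(cpu_seconds, wall_seconds):
    """Average number of cores kept busy over the whole task."""
    return cpu_seconds / wall_seconds


wall = to_seconds(hours=1, minutes=26, seconds=42)   # run time from the post
cpu = to_seconds(hours=8, minutes=16, seconds=1)     # CPU time from the post
print(f"Average cores busy: {effective_cores(cpu, wall):.2f}")  # prints 5.72
```

An average below the configured core count is expected if the task uses only one core during parts of its run.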
ID: 48544
hadron
Joined: 4 Sep 22
Posts: 54
Credit: 7,366,651
RAC: 16,618
Message 48546 - Posted: 15 Sep 2023, 0:25:13 UTC - in response to Message 48544.  

maeax wrote:
First ATLAS task on Win11 Pro with 400 events finished successfully:
Run time: 1 hour 26 min 42 sec
CPU time: 8 hours 16 min 1 sec

Lucky you. I have 4 Atlas tasks running right now. They've been running for 2 days 10 hours, and still have 2 hours to go.
ID: 48546
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 694
Message 48547 - Posted: 15 Sep 2023, 0:44:52 UTC - in response to Message 48546.  

hadron wrote:
Lucky you. I have 4 Atlas tasks running right now. They've been running for 2 days 10 hours, and still have 2 hours to go.

17.5 hours total to run here on a Ryzen 9 3900XT. Unfortunately BOINC has decided to run something else and interrupted it. I set them to stay in memory while suspended, which really helps.
ID: 48547
hadron
Joined: 4 Sep 22
Posts: 54
Credit: 7,366,651
RAC: 16,618
Message 48548 - Posted: 15 Sep 2023, 1:55:20 UTC - in response to Message 48547.  

Mr P Hucker wrote:
hadron wrote:
Lucky you. I have 4 Atlas tasks running right now. They've been running for 2 days 10 hours, and still have 2 hours to go.

17.5 hours total to run here on a Ryzen 9 3900XT. Unfortunately BOINC has decided to run something else and interrupted it. I set them to stay in memory while suspended, which really helps.

Are you overclocking your CPU? Or maybe running each task on multiple threads?
I have a Ryzen 9 5900X, which has almost the same base frequency as the 3900XT, so I would expect your times and mine to be roughly equal.
ID: 48548

©2024 CERN