ATLAS application: queue is empty
Joined: 2 May 07 · Posts: 2189 · Credit: 173,308,789 · RAC: 66,579
-dev has had a small number of tasks over the last few days, so the ATLAS team is busy.
Joined: 7 Aug 11 · Posts: 93 · Credit: 23,508,797 · RAC: 21,573
I tried requesting an access code for -dev, but it seems to have gotten lost in the aether. Still no ATLAS tasks for some time, it seems.
Joined: 12 Aug 06 · Posts: 429 · Credit: 10,246,033 · RAC: 16,382
Quote: "-dev has had a small number of tasks over the last few days." Three weeks later, still nothing on the main (non-dev) project. Does anybody know anything?
Joined: 28 Sep 04 · Posts: 709 · Credit: 47,402,633 · RAC: 26,931
The job graphs show that the BOINC-TEST system is constantly getting a few jobs. Whatever that system is or does, I don't know. Also, the ATLAS grid as a whole is getting hundreds of thousands of jobs every day. So work is available, just not for us at Boinc_mcore. https://lhcathome.cern.ch/lhcathome/atlas_job.php
Joined: 12 Aug 06 · Posts: 429 · Credit: 10,246,033 · RAC: 16,382
Quote: "So work is available, just not for us at Boinc_mcore." Perhaps somebody (the new guy? someone who just left?) has decided to adjust the BOINC system a little and is testing things. So ATLAS as a whole continues, but we're getting an upgrade or change? I'll run CMS in the meantime (and Theory for when my pathetic 7 Mbit upload isn't fast enough).
Joined: 12 Aug 06 · Posts: 429 · Credit: 10,246,033 · RAC: 16,382
I see that the equivalent page for CMS requires a login for some reason. I also see that the Theory one is entirely BOINC?
Joined: 2 May 07 · Posts: 2189 · Credit: 173,308,789 · RAC: 66,579
There are no new tasks. What you see are tasks that hit their time limit and were restarted by other volunteers.
Joined: 4 May 17 · Posts: 5 · Credit: 118,785,284 · RAC: 0
Not sure whether it was communicated, but the person who set this up and was looking after it has left ATLAS. That left a knowledge gap and nobody responsible for it. A lot was documented, so we should be able to get it going again. The immediate problem was a switch of release version, which ideally needs a new image. I tested submitting a few jobs with the new release but without a new image. That means the software is pulled via CVMFS, so there is more I/O. Also, it is easiest for us to run 2000 events per job rather than the previous 200. The software is twice as fast, so this is effectively a factor of 5 on the job length. Is either of these a problem for you? If not, I think we can get going immediately, at least as an interim fix.
Joined: 2 May 07 · Posts: 2189 · Credit: 173,308,789 · RAC: 66,579
2,000 events instead of 200 is useful as a test.
Joined: 15 Jun 08 · Posts: 2500 · Credit: 248,490,468 · RAC: 127,618
Longer runtimes are usually not a problem if the tasks can write checkpoints and resume from them. At least pause/resume should be supported. IIRC, ATLAS always starts a task from scratch when it resumes after a break. Is this still the case with the recent version? If so, tasks running many more events are likely to have a much higher failure rate on less powerful computers.
Joined: 17 Sep 04 · Posts: 104 · Credit: 32,678,305 · RAC: 3,431
Sounds good to me. Thanks for picking up this project! Regards, Bob P.
Joined: 2 May 07 · Posts: 2189 · Credit: 173,308,789 · RAC: 66,579
-dev, using 1,000 events, March 2021: run time 6 hours 7 min 31 sec, CPU time 2 days 9 hours 7 min 39 sec, ATLAS long simulation v1.01 (long_native_mt).
Joined: 4 Sep 22 · Posts: 90 · Credit: 14,482,422 · RAC: 17,494
Sounds good to me. +923875 Let's get going!
Joined: 4 May 17 · Posts: 5 · Credit: 118,785,284 · RAC: 0
It is certainly the case that there is no checkpointing. However, a suspended job should survive and restart OK, so as long as the computer stays powered on, there should be no problem with long jobs. It should also survive the VM or machine being suspended, just not a reboot, unless BOINC can delay the reboot to suspend the VM. I expect David had a good reason for using 200 events, though. I'm not very familiar with BOINC, so help me out!
Joined: 15 Jun 08 · Posts: 2500 · Credit: 248,490,468 · RAC: 127,618
The reason for the 200-event limit was indeed to keep the total runtime within a limit most volunteers can live with. As I said, computers running 24/7 should not have a problem, especially if they run a setup that uses more cores; their volunteers usually vote for more events. Other volunteers vote for a total runtime of less than a work day (6-8 h) so their computers can be shut down and restarted the next day. That is what ATLAS does not handle properly if you send out huge tasks: tasks close to finishing may restart a couple of times every day and finally fail when they reach the maximum runtime limit. Some volunteers then recommend a setup with more cores per task, but this is not possible on every kind of computer, even nowadays. ATLAS mostly runs on 1 core at the beginning and at the end of a task, which confuses inexperienced volunteers; they may claim it doesn't work correctly when the CPU count shown in BOINC is not fully used. What you definitely should avoid is sending out tasks with a variable number of events. This would make BOINC's runtime estimation and credit calculation useless and would also have a bad impact on other projects on the same computer.
Joined: 17 Sep 04 · Posts: 104 · Credit: 32,678,305 · RAC: 3,431
I recall that, given the faster processing time, David was considering a 500-event work unit. Regards, Bob P.
Joined: 2 May 07 · Posts: 2189 · Credit: 173,308,789 · RAC: 66,579
First ATLAS task on Win11 Pro with 400 events finished successfully: run time 1 hour 26 min 42 sec, CPU time 8 hours 16 min 1 sec.
Joined: 4 Sep 22 · Posts: 90 · Credit: 14,482,422 · RAC: 17,494
Quote: "First ATLAS task on Win11 Pro with 400 events finished successfully" Lucky you. I have 4 ATLAS tasks running right now. They've been running for 2 days 10 hours and still have 2 hours to go.
Joined: 12 Aug 06 · Posts: 429 · Credit: 10,246,033 · RAC: 16,382
hadron wrote: "Lucky you. I have 4 ATLAS tasks running right now. They've been running for 2 days 10 hours and still have 2 hours to go." Here it is 17.5 hours total run time on a Ryzen 9 3900XT; unfortunately, BOINC decided to run something else and interrupted it. I set tasks to stay in memory while suspended, which really helps.
Joined: 4 Sep 22 · Posts: 90 · Credit: 14,482,422 · RAC: 17,494
Quote: "17.5 hours total run time on a Ryzen 9 3900XT" Are you overclocking your CPU? Or maybe running each task on multiple threads? I have a Ryzen 9 5900X, which has almost the same base frequency as the 3900XT, so I would expect your times and mine to be roughly equal.
©2024 CERN