Atlas Simulation 1.01 (Vbox64) will not finish
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
To all the rest, a general comment: I just got 7 new tasks. Run time is set for 3 hrs between tasks, as I see they are supposed to complete in roughly 3:25. If you got the 3:25 from the Remaining time in BOINC Manager, then it's likely not very accurate. Yeah, I know, it shouldn't be that way, but that's the way it is. The % complete figure is pretty much useless too. I strongly suggest boosting the switch between tasks time to 10 hours or more until you get a better idea of how long the tasks actually take; otherwise you're setting yourself up for more failed tasks. Changed to 4 hrs between switching. That should cover these tasks. I will see what I get after I am done with the 3:25 stuff.
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
UAM, you are talking about stuff I have no idea how to do or where to find. Nope. Show VM takes me to a black screen where I have to log into their system or something like that; it needs a username and password. Graphics takes me to the CERN homepage. Now, when I open the VM via Windows, I can see logs and so forth, but no graphics. Since the task just started, there is nothing unusual to report, just the usual setup tasks and the start of the computing.
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
4 hrs is not long enough. Looks more like 6. (68.874% done) 1:38 still remaining. BOINC has put it into "Waiting to run" status and moved on. 4 hrs from now it will come back. Checkpointing is set for every 60 seconds, by the way.
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0
4hrs not long enough. Looks more like 6. (68.874% done) Ignore the percent done because it is BS. AGAIN... it should not be that way, but it is. Focus on this fact and this fact alone: ATLAS tasks run until they process 200 events. When the task resumes, the event counter goes back to 0 and it attempts to process 200 events again. I can almost guarantee you 6 hours will not be enough. In fact, maybe even 10 hours won't be. So then what's going to happen? Well, it will suspend again, and when it resumes the event counter will reset to 0. Yep, you should have set it to 10 or maybe even more.
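To make the argument above concrete, here is a minimal Python sketch that counts how many BOINC time slices a task needs, assuming (as described in this post) that ATLAS needs 200 events and that the event counter restarts at 0 on every resume. The function name and the 144 seconds per event are hypothetical, picked only so that 200 events take 8 hours of uninterrupted run time; they are not measured ATLAS values.

```python
from typing import Optional


def slices_to_finish(events_needed: int, seconds_per_event: int,
                     slice_seconds: int, resets_on_resume: bool,
                     max_slices: int = 50) -> Optional[int]:
    """Count time slices needed to finish, or None if the task never finishes.

    Hypothetical model: each slice runs for slice_seconds, then BOINC switches away.
    """
    events_per_slice = slice_seconds // seconds_per_event
    counter = 0
    for slice_no in range(1, max_slices + 1):
        if resets_on_resume:
            counter = 0  # event counter starts again from 0 after every resume
        counter += events_per_slice
        if counter >= events_needed:
            return slice_no
    return None  # events_needed never reached within max_slices


HOUR = 3600
# Assumed 144 s per event, so 200 events need 8 hours without interruption.
print(slices_to_finish(200, 144, 4 * HOUR, resets_on_resume=True))
print(slices_to_finish(200, 144, 4 * HOUR, resets_on_resume=False))
print(slices_to_finish(200, 144, 10 * HOUR, resets_on_resume=True))
```

With a 4-hour switch interval the resetting task never finishes (None), a task that resumed where it left off would need 2 slices, and a 10-hour interval lets the resetting task finish in a single slice. That is the reasoning behind setting the switch interval longer than the task's real run time.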
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
It seems that native ATLAS does not have this problem. I run three WUs at a time, with two cores per WU on my i7-4770 (Ubuntu 16.04). As a test, I suspended them for one minute, and then resumed them. They picked up with no problem where they left off. I don't know why VBox has this problem, but it is not inherent to ATLAS (I have LAIM enabled, if that matters).
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0
It seems that native ATLAS does not have this problem. I run three WUs at a time, with two cores per WU on my i7-4770 (Ubuntu 16.04). I've found that if suspended for just 1 minute, native ATLAS tasks will frequently resume from where they left off, but not always. When suspended for more than a few minutes, the event counter resets to 0.
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
4hrs not long enough. Looks more like 6. (68.874% done) Interesting. Well, I will have to take a look at that. Maybe for now I will suspend all non-ATLAS CPU tasks and work through what I have in the queue. BOINC has not let it resume yet. Is there anywhere to see how many events it has done? Note: Now that ATLAS has resumed, the % done is climbing steadily like .002-.003% per second. But remaining time clicks off 1 second every 2-3 seconds.
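Taking the figures in this thread at face value shows how far off the displayed 1:38 is. The sketch below assumes the task was near 68.874% when these rates were observed, and that both rates (the % done rising .002-.003% per second, and the displayed remaining time dropping 1 second every 2-3 real seconds) stay constant. The thread itself says neither figure is reliable, so this is only a rough check. The function names are made up for illustration.

```python
def wall_hours_from_remaining(displayed_remaining_s: float, wall_s_per_tick: float) -> float:
    """Convert the displayed remaining time to wall-clock hours.

    Assumes the countdown drops 1 displayed second every wall_s_per_tick real seconds.
    """
    return displayed_remaining_s * wall_s_per_tick / 3600


def wall_hours_from_percent(pct_done: float, pct_per_s: float) -> float:
    """Wall-clock hours until 100%, assuming % done rises at a constant rate."""
    return (100.0 - pct_done) / pct_per_s / 3600


remaining_s = 1 * 3600 + 38 * 60  # the 1:38 shown in BOINC Manager
for ratio in (2, 3):
    print(f"countdown at 1 s per {ratio} s: {wall_hours_from_remaining(remaining_s, ratio):.1f} h")
for rate in (0.002, 0.003):
    print(f"% done rising {rate}%/s: {wall_hours_from_percent(68.874, rate):.1f} h")
```

Under these assumptions both methods land somewhere between roughly 3 and 5 hours of real time, not 1:38.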
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
Put all other CPU projects in suspend mode. Going to let ATLAS grind through everything on its own, as it wants. In the process of suspending everything, I forgot to tell LHC@home not to send any new work, so I picked up some Theory and CMS stuff. But those should be the last things to process. Right now I have a total of 7 ATLAS tasks: 6 waiting and one processing.
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0
Is there anywhere to see how many events it has done? I don't run ATLAS VBox anymore, just ATLAS native, so I can't say for sure, but other responders in this thread seem to suggest there is a way. Perhaps review their suggestions? Note: Now that ATLAS has resumed, the % done is climbing steadily like .002-.003% per second. But remaining time clicks off 1 second every 2-3 seconds. That's the way they work, and it's because the % done and remaining time are not calculated from the number of events processed. The numbers would be more accurate if they were calculated that way, but it seems BOINC has no facility for doing so and probably never will.
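For comparison, this is roughly what an event-based progress and remaining-time estimate would look like if the event count were available to the estimator. It is a hypothetical sketch: the 200-event total comes from this thread, the function names are invented, neither BOINC nor the ATLAS app is known to provide such a hook, and it assumes each event takes about the same time.

```python
from typing import Optional


def event_progress(events_done: int, events_total: int = 200) -> float:
    """Fraction of the task done, based only on processed events."""
    return min(events_done, events_total) / events_total


def event_remaining_s(events_done: int, elapsed_s: float,
                      events_total: int = 200) -> Optional[float]:
    """Remaining seconds, extrapolated from the average time per event so far.

    Returns None before the first event finishes, since there is no rate yet.
    """
    if events_done <= 0:
        return None
    avg_s_per_event = elapsed_s / events_done
    return max(events_total - events_done, 0) * avg_s_per_event


# Example: 50 of 200 events done after 2 hours of run time.
print(event_progress(50))               # 0.25
print(event_remaining_s(50, 2 * 3600))  # 21600.0 (6 hours)
```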
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0
...so I picked up some theory and CMS stuff. But those should be the last things to process. I believe that's the way it's supposed to work, but don't count on it. The only way to guarantee it is to suspend them, AFAIK. Anyway, step by step you are slowly discovering what others have already learned and mentioned in this thread: running ATLAS tasks alongside other types of LHC tasks and/or tasks from other projects requires micro-managing. You'll get a fairly decent success rate if you micro-manage well, but the optimum configuration is to run ATLAS all by itself. Yeah, I know, then you need at least 2 hosts if you want to participate in other projects, and it ain't supposed to work that way, but... Right now I have a total of 7 ATLAS tasks. 6 in waiting and one processing. Remember that each one of those ATLAS tasks requires a download of ~300 MB to ~400 MB. If things go wrong and your host can't crunch them before the deadline, they get cancelled and you've wasted a good chunk of your monthly download limit (assuming your ISP has such a limit). You might want to adjust your LHC@home prefs so that you cache fewer ATLAS tasks.
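The cost of over-caching is easy to put in numbers. A small sketch using the ~300-400 MB per ATLAS task quoted above, with decimal units (1 GB = 1000 MB) assumed for simplicity; the function name is illustrative.

```python
def download_range_gb(n_tasks: int, low_mb: int = 300, high_mb: int = 400) -> tuple[float, float]:
    """Total download for n_tasks as (low, high) in GB, using 1 GB = 1000 MB."""
    return n_tasks * low_mb / 1000, n_tasks * high_mb / 1000


for n in (7, 5):  # 7 tasks cached now; 5 is the task limit mentioned later in the thread
    low, high = download_range_gb(n)
    print(f"{n} tasks: {low:.1f}-{high:.1f} GB")
```

With 7 tasks cached that is about 2.1-2.8 GB of downloads at risk if they miss the deadline; a limit of 5 brings it down to about 1.5-2.0 GB.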
Joined: 29 Sep 04 Posts: 281 Credit: 11,859,285 RAC: 0
Is there anywhere to see how many events it has done? Show Console, then ALT-F2 will show events and average times per event. I don't know how many times it does each group of events before moving on, but the percentage progress isn't far from the events/200 value.
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
**Bronco** - No download limit. I am in Europe and have unlimited DSL for TV and computer. Only my 4G is limited by the type of contract I use, but there are plenty of free Wi-Fi hotspots that I have access to.
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
Is there anywhere to see how many events it has done? Interesting, how do you scroll back up? And yes, it seems to repeat events. Several numbers keep getting repeated.
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
...so I picked up some theory and CMS stuff. But those should be the last things to process. I believe that's the way it's supposed to work, but don't count on it. The only way to guarantee it is to suspend them, AFAIK. From what I am seeing, the ATLAS time remaining decreases by 1 second every 3 real-time seconds. I reduced the total number of tasks allowed by LHC in general to 5, down from unlimited. And there is no way I can afford the electric bill of two physical hosts, at least not until we see whether we have money for solar panels after the rest of the long-term house renovation is done. Now, if there were some micro solar panels with a battery pack that could power my computer, that would be a big plus. Then maybe I could dedicate a second host to exclusive ATLAS tasks. For now, I just have to learn how it functions and micro-manage it day by day.
Joined: 15 Jun 08 Posts: 2413 Credit: 226,473,792 RAC: 131,954
Interesting, how do you scroll back up? You can't. And yes it seems to repeat events. You may read this explanation: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4965&postid=38135
Joined: 15 Jun 08 Posts: 2413 Credit: 226,473,792 RAC: 131,954
... Then maybe I could dedicate a second host to exclusive ATLAS tasks. Why not just run a second BOINC client on the same host? There are lots of how-tos around. To find them, you may ask your favorite search engine.
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
... Then maybe I could dedicate a second host to exclusive ATLAS tasks. Interesting... but wouldn't that cause a conflict over resources (CPU), with ATLAS needing all the cores I allow BOINC to have and the other projects also wanting to use all the cores I have allocated at the same time?
Joined: 28 Dec 08 Posts: 318 Credit: 4,235,262 RAC: 3,879
Interesting, how do you scroll back up? This also explains why there is no way for BOINC to accurately calculate the time. It looks like the 3:25 is more like 8 to 9 hrs max on my machine. Now at 88%, with 7 hrs 15 mins running and "about" an hour left in remaining time.
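The "8 to max 9 hrs" estimate agrees with a simple linear extrapolation from the numbers in this post. A sketch, assuming the % done grows roughly linearly with elapsed time; the thread warns it is not event-based, so treat this as a rough check only. The function name is illustrative.

```python
def extrapolate_total_h(elapsed_h: float, fraction_done: float) -> float:
    """Total run time implied by linear progress: elapsed time / fraction done."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_h / fraction_done


elapsed = 7 + 15 / 60  # 7 hrs 15 mins of run time
total = extrapolate_total_h(elapsed, 0.88)
print(f"total ~{total:.2f} h, remaining ~{total - elapsed:.2f} h")
```

This gives roughly 8.24 hours in total and about 0.99 hours remaining, in line with the "about an hour" shown and the 8 to 9 hour estimate.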
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
It seems that native ATLAS does not have this problem. I run three WUs at a time, with two cores per WU on my i7-4770 (Ubuntu 16.04). I've found that if suspended for just 1 minute, native ATLAS tasks will frequently resume from where they left off, but not always. When suspended for more than a few minutes, the event counter resets to 0. I was not sure what would happen with a longer pause, since I run that machine 24/7 and never see it suspend a work unit (native ATLAS is the only CPU job running). So I suspended all three that were running. Each had about 2 hours left to go, out of a 6 1/2 hour run. Then I resumed them 5 hours later. Immediately, all three work units started uploading "results", so clearly something was amiss. But when I look at the Stderr output, it shows everything as normal. You can probably interpret what is going on better than I can. https://lhcathome.cern.ch/lhcathome/result.php?resultid=218866963 https://lhcathome.cern.ch/lhcathome/result.php?resultid=218866915 https://lhcathome.cern.ch/lhcathome/result.php?resultid=218873625 Almost as curious is that whoever else tried to run these got invalids after a short time. "Anonymous" needs to find a better use for his machines. https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=108987793 https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=108987602 https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=108989900 EDIT: I suspect what may have happened is that ATLAS continued to run after I had "suspended" it. It looks like it ran for the expected time of about 4 1/2 hours (except one short one). That could explain it.
Joined: 15 Jun 08 Posts: 2413 Credit: 226,473,792 RAC: 131,954
... native ATLAS is the only CPU job running). Independent of BOINC, ATLAS native must not be suspended/resumed, as it will always start from scratch. David Cameron explained somewhere (I can't find it at the moment) that this is by design of the scientific app. If you run ATLAS native inside your own VM, then you may suspend the VM instead.
©2024 CERN