Ton of Atlas Validate Errors
Author | Message |
---|---|
Joined: 27 Sep 04 Posts: 102 Credit: 7,224,140 RAC: 5,492 |
Over the last couple of weeks, about 80% of my Atlas tasks have ended with a validate error, and several have run for more than a day and a half while stuck at 99% (the ones that do complete and validate average 2.5 to 3 hours). I've made no changes on my end at all. Is anyone else seeing strange things with Atlas? |
Joined: 15 Jun 08 Posts: 2401 Credit: 225,319,021 RAC: 123,225 |
keputnam wrote: "Over the last couple of weeks, about 80% of my Atlas tasks have ended with a validate error, and several have run for more than a day and a half while stuck at 99% (2.5 to 3 hour average task length for the ones that do complete and validate)." It looks like something got scrambled on your host, possibly because some resources are overloaded, e.g. RAM or disk I/O. Try the following steps one after the other; once you get a stable setup, skip the rest. 1. Reboot your host. 2. Reduce the number of concurrently running VMs. 3. Reset the project to get a fresh vdi file. |
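For step 2, one way to cap the number of concurrent ATLAS VMs locally is BOINC's app_config.xml, placed in the LHC@home project folder inside the BOINC data directory. This is a minimal sketch, assuming the ATLAS app's short name is "ATLAS"; check the exact name in the `<app><name>` entries of client_state.xml before using it.

```xml
<app_config>
  <app>
    <!-- Assumed short app name; copy the exact value from client_state.xml -->
    <name>ATLAS</name>
    <!-- Run at most one ATLAS VM at a time on this host -->
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
```

The limit applies to this host only; the project's web preferences are the other place where the number of jobs can be restricted.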
Joined: 24 Oct 04 Posts: 1115 Credit: 49,703,010 RAC: 13,795 |
keputnam, looking at the stderr of those tasks, I can almost guarantee that you had an ISP problem. These tasks are known to behave the way yours did when a throttled internet connection can't start the task properly (I have run thousands of them, and that has always been the cause). Maybe your ISP throttled you down to a slower speed, or you just had a bad/slow day with your ISP. When it is an internet problem on your end, you will see something like this in your stderr: TThe hlea slta s1t0 1l0i nleisn eosf otfh et hpei lpoitl olto gl.og ERROR: Missing metadata.xml CCooppyyiinngg iinnppuutt ffiilleess iinnttoo RRuunnAAttllaass.. Guest Log: CCooppyyiinngg iinnppuutt ffiilleess iinnttoo RRuunnAAttllaass.. Guest Log: Copied input files into RunAtlas. Guest Log: Copied input files into RunAtlas. The task gets all messed up, and at times it will actually run to completion and THEN come back Invalid. You also got "ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS", which was not on your end; it was supposed to be fixed with a new version the other day, and I think that new version of Atlas will be here soon (OK, we did find a new problem with it a few hours ago, but I hope that gets fixed too). BUT, as I mentioned, you are having an ISP problem with the Atlas tasks, so for now maybe you should try some during off-peak hours and see if that works, or run different tasks (Theory). I am on PDT, and when I first figured this out a couple of months ago I started the tasks after midnight; they would start up, and once they had run for 30 minutes the internet speed no longer mattered (I even tested them with mine unplugged). Volunteer Mad Scientist For Life |
Joined: 27 Sep 04 Posts: 102 Credit: 7,224,140 RAC: 5,492 |
Thanks, guys. My ISP swears that they haven't throttled me, but I am about to upgrade/expand my system disk; several times I've noticed the access light staying on solidly for quite a while. I also have Atlas set to only 1 job at a time. Any thoughts on the extremely long-running ones? |
Joined: 27 Sep 04 Posts: 102 Credit: 7,224,140 RAC: 5,492 |
Finally convinced ATT to come out and look at my router. When I upgraded service a few months ago, they swapped out the old one, and the tech didn't configure the new one properly. Connect speeds to the router itself are much better, and actual throughput is almost an order of magnitude faster. Almost a whole day with no verify errors! |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
"Almost a whole day with no verify errors!" Good news! |
Joined: 15 Jun 16 Posts: 1 Credit: 34,845 RAC: 0 |
Hi there. As I look through my tasks I have found many with 'Validate error' status. Is there something I should do? :) |
Joined: 15 Jun 08 Posts: 2401 Credit: 225,319,021 RAC: 123,225 |
kraljb wrote: "Hi there. As I look through my tasks I have found many with 'Validate error' status. Is there something I should do? :)" Your VMs need more RAM. The "official" RAM setting for a 1-core VM is 3400 MB, which is also the configuration you get from the project server. Nonetheless, the current ATLAS batch obviously needs more RAM during its initial phase. You may use a local app_config.xml to raise the RAM setting. 5000 MB should work in any case; less (4600-4800 MB) may also be enough, but there is no guarantee. A sample app_config.xml looks like this: <app_config> Reload the local settings (in BOINC Manager: Options → Read config files) and start a new WU, as the new setting only takes effect for freshly started VMs. |
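The sample in the post above was cut off after its opening tag. The following is a minimal sketch of what such a file could look like, not the poster's original: it assumes the app name "ATLAS", the plan class "vbox64_mt_mcore_atlas", and that the current ATLAS app version honors the vboxwrapper options `--nthreads` and `--memory_size_mb`. Copy the actual app name and plan class from the ATLAS `<app_version>` entry in client_state.xml.

```xml
<app_config>
  <app_version>
    <!-- Assumed app name and plan class; use the exact values from client_state.xml -->
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <!-- Number of CPU cores the BOINC client budgets for each ATLAS task -->
    <avg_ncpus>1</avg_ncpus>
    <!-- vboxwrapper options: 1 core per VM and 5000 MB of VM RAM -->
    <cmdline>--nthreads 1 --memory_size_mb 5000</cmdline>
  </app_version>
</app_config>
```

If the file also limits concurrent ATLAS VMs, the `<app>` block and the `<app_version>` block can sit side by side inside the same `<app_config>` root.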
©2024 CERN