Tasks download 1.9 GB EVNT files
Author | Message |
---|---|
Joined: 15 Jun 08 Posts: 2627 Credit: 266,979,682 RAC: 128,586 |
Got a couple of tasks that download 1.9 GB EVNT files (each!). That's a bit large. |
Joined: 18 Dec 15 Posts: 1862 Credit: 130,698,302 RAC: 110,366 |
Same here. Plus, the VDI image can get as large as about 5GB (in contrast to 3.2GB so far). Also, these tasks seem to use more RAM. The upload file, however, is about 80MB, i.e. smaller than before, and the tasks have a shorter runtime than the earlier ones. My problem is that with my 32GB RAMDisk I cannot process 4 tasks with 3 cores each simultaneously (BOINC would not even let me download more than 3 tasks), so I might have to switch to 3 tasks with 4 cores each. No big deal, but somehow interesting. |
Joined: 14 Jan 10 Posts: 1446 Credit: 9,707,740 RAC: 838 |
Suspending such a task with LAIM (leave applications in memory) off may crash the task, because it can exceed BOINC's slot disk limit of 10,000,000,000 bytes. I tested it with 2 tasks: one grew to 10,979,000,000 bytes and the other 'only' to 6,827,000,000. The first task's upload file was 86,400 K. |
Joined: 18 Dec 15 Posts: 1862 Credit: 130,698,302 RAC: 110,366 |
Now, none of these tasks with download size 1.9GB are working any longer. 42 seconds after start, they stop, and in the BOINC manager it says "postponed: VM environment needed to be cleaned up". What kind of problem is this now? |
Joined: 14 Jan 10 Posts: 1446 Credit: 9,707,740 RAC: 838 |
My 2 tries, both with a normal, fast, happy ending: https://lhcathome.cern.ch/lhcathome/result.php?resultid=330582566 https://lhcathome.cern.ch/lhcathome/result.php?resultid=330582616 |
Joined: 18 Dec 15 Posts: 1862 Credit: 130,698,302 RAC: 110,366 |
Quote: "Now, none of these tasks with download size 1.9GB are working any longer." Well, I opened the VirtualBox Manager, and on the left-hand side I noticed quite a number of tasks which had obviously got stuck there, or were not properly deleted after upload (for whatever reason). I removed them all and downloaded new tasks, and they are working well. |
Joined: 15 Jun 08 Posts: 2627 Credit: 266,979,682 RAC: 128,586 |
Reported "peak swap sizes" are very variable. Some examples from different client instances, all using the same setup: https://lhcathome.cern.ch/lhcathome/result.php?resultid=330582305 reports 34.31 GB (!!), while https://lhcathome.cern.ch/lhcathome/result.php?resultid=330581926 reports 2.56 GB. Since CMS is currently not running, neither CPU nor RAM is under heavy load. |
Joined: 18 Dec 15 Posts: 1862 Credit: 130,698,302 RAC: 110,366 |
Quote: "Reported 'peak swap sizes' are very variable." That's interesting, indeed. Maybe it is different on Windows (as in my case). I have now looked up my tasks: in all cases, the value is slightly below 100MB. |
Joined: 27 Sep 08 Posts: 861 Credit: 710,322,089 RAC: 197,431 |
Maybe you get a big peak swap if you quit BOINC, as you have to save the VM image? The 15 or so I looked through were all less than 100MB. |
Joined: 18 Dec 15 Posts: 1862 Credit: 130,698,302 RAC: 110,366 |
With my 32GB RAMDisk, I now cannot even process two 4-core tasks. Only one is working well. Stderr says: 2021-10-18 20:19:07 (532): VM is no longer is a running state. It is in 'lse, errorID=DevATA_DISKFULL message="Host system reported disk full. VM execution is suspended. You can resume after freeing some space" '. 2021-10-18 20:19:07 (532): VM state change detected. (old = 'Running', new = 'lse, errorID=DevATA_DISKFULL message="Host system reported disk full. VM execution is suspended. You can resume after freeing some space" See https://lhcathome.cern.ch/lhcathome/result.php?resultid=330604840 No idea how much disk space this new type of ATLAS task needs. What I also notice: after a failure, the vm_image.vdi is not deleted from the "slots" folder. Hence, no new tasks can be downloaded, due to lack of space. Seemingly, these new tasks are faulty. I will stop crunching ATLAS for the moment. |
Joined: 27 Sep 08 Posts: 861 Credit: 710,322,089 RAC: 197,431 |
I have a few WUs with a 9GB VM image, so these are bigger. Maybe with a checkpoint they can go over 16GB? |
Joined: 15 Jun 08 Posts: 2627 Credit: 266,979,682 RAC: 128,586 |
Quote: "... peak swap if you quit BOINC as you have to save the VM image?" The BOINC clients in question (mine) are running nothing but ATLAS native, usually 24/7 without suspend/resume and without a BOINC client restart. |
Joined: 18 Dec 15 Posts: 1862 Credit: 130,698,302 RAC: 110,366 |
Further, something must be wrong with the credit calculation: before, a CPU time of about 14,000 seconds earned around 370 credits; now the same amount of time earns around 60 :-( |
Joined: 27 Sep 08 Posts: 861 Credit: 710,322,089 RAC: 197,431 |
That was my thought, I run the same so there is no suspend or resume, so could be smaller? |
Joined: 27 Sep 08 Posts: 861 Credit: 710,322,089 RAC: 197,431 |
I assume this is just CreditNew being the way that it is. I get the same sort of numbers: 30k is 280. |
Joined: 2 May 07 Posts: 2262 Credit: 175,581,097 RAC: 397 |
"Atlas Simulation needs 998.93 MB more disk space. You currently have 8537 MB." A cvmfs_config reload in a CentOS VM cleared it, and the download of the other two of the four ATLAS tasks is starting. The 1.9 GByte file is downloaded as well, but there is no new version of the ATLAS application!! Because of the Squid proxy the download speed is 0.7 MBit/s instead of 60 MBit/s: 60 min download time!! Is it a raw file instead of a zip?? |
Joined: 15 Jun 08 Posts: 2627 Credit: 266,979,682 RAC: 128,586 |
Quote: "... because of the squid-Proxy ..." Slow because of Squid? Surely wrong. Those large files are typical one-timers, which means Squid can't serve them from its caches. Instead, each of those files must be downloaded from lhcathome-upload.cern.ch. This can be seen in Squid's logfile (-> TCP_MISS:HIER_DIRECT): xxx 3128 - - [19/Oct/2021:08:15:17 +0200] "GET http://lhcathome-upload.cern.ch/lhcathome/download//225/xxx_EVNT.27082874._000014.pool.root.1 HTTP/1.1" 200 2034165623 "-" "BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_MISS:HIER_DIRECT Based on my router monitoring, I suspect the CERN network can't deliver the files continuously at full speed (it intermittently drops to less than 20 Mbit/s). Nonetheless, a download time of 60 min might point to a local bottleneck. |
Joined: 15 Jun 08 Posts: 2627 Credit: 266,979,682 RAC: 128,586 |
Quote: "That was my thought, I run the same so there is no suspend or resume, so could be smaller?" Unlike ATLAS vbox, ATLAS native doesn't use VirtualBox (hence no snapshot). |
Joined: 2 May 07 Posts: 2262 Credit: 175,581,097 RAC: 397 |
Quote: "Based on my router monitoring I suspect the CERN network can't deliver the files continuously at full speed (it intermittently drops to less than 20 Mbit/s)" WCG ignores Squid and gets normal traffic (60 MBit/s) on all PCs. At the moment a new file of 1.89 GByte is coming in at a max. speed of 21 MBit/s on this VM: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10694634 What about Frontier?? This CentOS8 VM runs at most 8 WCG ARP tasks or 4 ATLAS VMs! NOW 45 min instead of 60 min download. Can this ATLAS version be stopped by CERN IT? |
Joined: 2 May 07 Posts: 2262 Credit: 175,581,097 RAC: 397 |
Quote: "I will stop crunching ATLAS for the moment." +1, as of one hour ago. |
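
A footnote on the Squid log line and the download times discussed in the thread: the rough Python sketch below pulls the cache result and the byte count out of the quoted access-log entry and turns the size into transfer times. The regular expression is written only for the layout of that one line (other Squid `logformat` settings will look different), and the helper names `parse_line`, `transfer_minutes` and `average_mbit_per_s` are made up for this illustration. Rates are assumed constant, with 1 Mbit = 10^6 bit.

```python
import re

# The access-log entry as quoted in the thread (fields masked as "xxx" by the poster).
LOG_LINE = (
    'xxx 3128 - - [19/Oct/2021:08:15:17 +0200] '
    '"GET http://lhcathome-upload.cern.ch/lhcathome/download//225/'
    'xxx_EVNT.27082874._000014.pool.root.1 HTTP/1.1" 200 2034165623 "-" '
    '"BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_MISS:HIER_DIRECT'
)

# Assumption: request in quotes, then status and size, then result:hierarchy at the end.
PATTERN = re.compile(
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'.* (?P<result>[A-Z_]+):(?P<hier>[A-Z_]+)$'
)


def parse_line(line):
    """Return URL, status, size and cache result of one log line (hypothetical helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("line does not match the assumed layout")
    return {
        "url": m.group("url"),
        "status": int(m.group("status")),
        "size_bytes": int(m.group("size")),
        "result": m.group("result"),
        "hierarchy": m.group("hier"),
    }


def transfer_minutes(size_bytes, mbit_per_s):
    """Minutes needed to move size_bytes at a constant rate given in Mbit/s."""
    return size_bytes * 8 / (mbit_per_s * 1_000_000) / 60


def average_mbit_per_s(size_bytes, minutes):
    """Average rate in Mbit/s implied by moving size_bytes in the given minutes."""
    return size_bytes * 8 / (minutes * 60) / 1_000_000


entry = parse_line(LOG_LINE)
print(entry["result"], entry["hierarchy"], entry["size_bytes"])
for rate in (60, 21, 20):
    print(f"{rate} Mbit/s -> {transfer_minutes(entry['size_bytes'], rate):.1f} min")
for minutes in (60, 45):
    rate = average_mbit_per_s(entry["size_bytes"], minutes)
    print(f"{minutes} min -> {rate:.1f} Mbit/s on average")
```

Under these assumptions, a 60 min download of this file works out to an average of roughly 4.5 Mbit/s, far above 0.7 Mbit/s. So the 0.7 figure reported earlier may have been MByte/s rather than MBit/s, but that is only a guess.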
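
A footnote on the slot disk limit mentioned in the post about suspending with LAIM off: a small Python sketch like the one below can show how close a slot directory is to the 10,000,000,000 byte limit quoted there. It simply adds up file sizes under a directory. The function names and the example path in the usage note are hypothetical, and the limit that actually applies to a given task may differ from the quoted value.

```python
import os

# Limit as quoted in the thread; an assumption, not read from BOINC itself.
SLOT_DISK_LIMIT = 10_000_000_000  # bytes


def directory_size(path):
    """Sum the sizes of all regular files below path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # A file may vanish while a task is running; ignore it.
                continue
    return total


def headroom(used_bytes, limit=SLOT_DISK_LIMIT):
    """Bytes left before the limit is reached (negative means over the limit)."""
    return limit - used_bytes


# The two slot sizes reported in the thread.
for used in (10_979_000_000, 6_827_000_000):
    left = headroom(used)
    state = "over the limit" if left < 0 else "within the limit"
    print(f"{used:,} bytes: {state} by {abs(left):,} bytes")
```

To check a live slot, something like `headroom(directory_size("/path/to/BOINC/slots/0"))` could be used, where the path is a placeholder for the local BOINC data directory.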
©2025 CERN