Message boards : ATLAS application : Tasks download 1.9 GB EVNT files

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1825
Credit: 123,761,719
RAC: 86,212
Message 45446 - Posted: 18 Oct 2021, 7:41:52 UTC

Got a couple of tasks that download 1.9 GB EVNT files (each!).
That's a bit large.
ID: 45446
Erich56

Joined: 18 Dec 15
Posts: 1460
Credit: 35,618,040
RAC: 43,679
Message 45447 - Posted: 18 Oct 2021, 7:56:15 UTC - in response to Message 45446.  

Same here.
Plus, the VDI image can grow to about 5 GB (compared with 3.2 GB so far). These tasks also seem to use more RAM.
The upload file, however, is only about 80 MB, i.e. smaller than for previous tasks. The tasks also have a shorter runtime than the earlier ones.

However, my problem is that with my 32 GB RAM disk I cannot run four 3-core tasks simultaneously (BOINC will not even let me download more than 3 tasks), so I might have to switch to three 4-core tasks. No big deal, but interesting nonetheless.
ID: 45447
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1046
Credit: 6,603,873
RAC: 275
Message 45448 - Posted: 18 Oct 2021, 9:59:33 UTC
Last modified: 18 Oct 2021, 10:03:08 UTC

Suspending such a task with LAIM off may crash the task, because it can then exceed BOINC's slot disk limit of 10.000.000.000 bytes.
I tested it with 2 tasks. One task grew to 10.979.000.000 bytes and the other 'only' to 6.827.000.000 bytes.
The first task's upload file was 86.400 K.
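
As a rough illustration of the numbers above, the following sketch compares the two observed slot sizes with the 10.000.000.000-byte limit. The constant and function names are made up for this example and are not part of BOINC.

```python
# Slot disk limit quoted above, in bytes.
SLOT_LIMIT_BYTES = 10_000_000_000

# Peak slot sizes observed for the two test tasks, in bytes.
observed = {"task 1": 10_979_000_000, "task 2": 6_827_000_000}


def exceeds_limit(size_bytes, limit_bytes=SLOT_LIMIT_BYTES):
    """Return True if a slot directory is larger than the limit."""
    return size_bytes > limit_bytes


for name, size in observed.items():
    difference = abs(size - SLOT_LIMIT_BYTES)
    status = "over" if exceeds_limit(size) else "under"
    print(f"{name}: {size:,} bytes, {status} the limit by {difference:,} bytes")
```

Only the first task crosses the limit, which fits the observation that not every suspended task crashes.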
ID: 45448
Erich56

Joined: 18 Dec 15
Posts: 1460
Credit: 35,618,040
RAC: 43,679
Message 45449 - Posted: 18 Oct 2021, 10:34:37 UTC

Now, none of these tasks with download size 1.9GB are working any longer.
42 seconds after start, they stop, and in the BOINC manager it says "postponed: VM environment needed to be cleaned up".
What kind of problem is this now?
ID: 45449
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1046
Credit: 6,603,873
RAC: 275
Message 45450 - Posted: 18 Oct 2021, 11:24:13 UTC - in response to Message 45449.  

ID: 45450
Erich56

Joined: 18 Dec 15
Posts: 1460
Credit: 35,618,040
RAC: 43,679
Message 45451 - Posted: 18 Oct 2021, 11:57:44 UTC - in response to Message 45449.  

Now, none of these tasks with download size 1.9GB are working any longer.
42 seconds after start, they stop, and in the BOINC manager it says "postponed: VM environment needed to be cleaned up".
What kind of problem is this now?
Well, I opened the VirtualBox Manager, and on the left-hand side I noticed quite a number of tasks that had obviously got stuck there or were not properly deleted after upload (for whatever reason).
I removed them all, downloaded new tasks, and they are working well.
ID: 45451
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1825
Credit: 123,761,719
RAC: 86,212
Message 45452 - Posted: 18 Oct 2021, 15:48:57 UTC

Reported "peak swap sizes" are very variable.


Some examples.
Different client instances but all are using the same setup.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=330582305
34.31 GB (!!)

https://lhcathome.cern.ch/lhcathome/result.php?resultid=330581926
2.56 GB


Since CMS is currently not running, neither CPU nor RAM is under heavy load.
ID: 45452
Erich56

Joined: 18 Dec 15
Posts: 1460
Credit: 35,618,040
RAC: 43,679
Message 45453 - Posted: 18 Oct 2021, 16:33:21 UTC - in response to Message 45452.  

Reported "peak swap sizes" are very variable. ...
That's interesting, indeed.
Maybe it is different on Windows (as in my case). I just looked up my tasks: in all cases, the value is slightly below 100 MB.
ID: 45453
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 690
Credit: 434,765,871
RAC: 106,061
Message 45455 - Posted: 18 Oct 2021, 16:47:07 UTC

Maybe you get a big peak swap if you quit BOINC, since the VM image then has to be saved? The 15 or so I looked through were all less than 100 MB.
ID: 45455
Erich56

Joined: 18 Dec 15
Posts: 1460
Credit: 35,618,040
RAC: 43,679
Message 45457 - Posted: 18 Oct 2021, 18:37:26 UTC

With my 32 GB RAM disk, I now cannot even run two 4-core tasks. Only one works well.

Stderr says:
"2021-10-18 20:19:07 (532): VM is no longer is a running state. It is in 'lse, errorID=DevATA_DISKFULL message="Host system reported disk full. VM execution is suspended. You can resume after freeing some space"
'.
2021-10-18 20:19:07 (532): VM state change detected. (old = 'Running', new = 'lse, errorID=DevATA_DISKFULL message="Host system reported disk full. VM execution is suspended. You can resume after freeing some space"

https://lhcathome.cern.ch/lhcathome/result.php?resultid=330604840

No idea how much disk space this new type of ATLAS task now needs.
What I also notice: after failing, the vm_image.vdi is not being deleted from the "slots" folder. Hence, no new tasks can be downloaded, due to lack of space.

Seemingly, these new tasks are faulty. I will stop crunching ATLAS for the moment.
ID: 45457
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 690
Credit: 434,765,871
RAC: 106,061
Message 45458 - Posted: 18 Oct 2021, 18:48:01 UTC - in response to Message 45457.  

I have a few WUs with a 9 GB VM image, so these are bigger. Maybe with a checkpoint they can go over 16 GB?
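
A back-of-the-envelope sketch of why a 32 GB RAM disk may be too small for two such tasks. The 16 GB per-slot figure is only the guess from this post, the 1.9 GB is the EVNT download size reported in this thread, and the sketch assumes the whole BOINC data directory sits on the RAM disk. The function name is hypothetical.

```python
import math

RAMDISK_GB = 32.0        # RAM disk size reported in this thread
EVNT_DOWNLOAD_GB = 1.9   # input file size per task, as reported
SLOT_GUESS_GB = 16.0     # ASSUMPTION: guessed peak slot size (image plus checkpoint)


def tasks_that_fit(disk_gb, per_task_gb):
    """Number of tasks whose full footprint fits on the disk at once."""
    return math.floor(disk_gb / per_task_gb)


per_task = SLOT_GUESS_GB + EVNT_DOWNLOAD_GB
print(f"per-task footprint: {per_task:.1f} GB")
print(f"tasks that fit: {tasks_that_fit(RAMDISK_GB, per_task)}")
```

Under these assumptions only one task fits, but the real peak size per task is not known from this thread.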
ID: 45458
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1825
Credit: 123,761,719
RAC: 86,212
Message 45459 - Posted: 18 Oct 2021, 20:00:25 UTC - in response to Message 45455.  

... peak swap if you quit boinc as you have to save the VM image?

The (my) BOINC clients in question are running nothing but ATLAS native.
Usually 24/7 without suspend/resume and without a BOINC client restart.
ID: 45459
Erich56

Joined: 18 Dec 15
Posts: 1460
Credit: 35,618,040
RAC: 43,679
Message 45460 - Posted: 18 Oct 2021, 20:06:25 UTC

Furthermore, something must be wrong with the credit calculation:

Before, a CPU time of about 14.000 seconds earned a credit of around 370; now the same amount of time earns around 60 :-(
ID: 45460
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 690
Credit: 434,765,871
RAC: 106,061
Message 45461 - Posted: 18 Oct 2021, 22:03:01 UTC - in response to Message 45459.  

That was my thought: I run the same way, with no suspend or resume, so it could be smaller?
ID: 45461
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 690
Credit: 434,765,871
RAC: 106,061
Message 45462 - Posted: 18 Oct 2021, 22:04:46 UTC - in response to Message 45460.  

I assume this is just CreditNew being the way that it is. I get the same sort of numbers: 30k seconds gives 280 credits.
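
To put the figures quoted in this thread on one scale, here is a small sketch that converts them to credit per CPU hour. It assumes "30k" means 30,000 seconds of CPU time; the function name is invented for the example and says nothing about how CreditNew itself computes credit.

```python
def credit_per_cpu_hour(credit, cpu_seconds):
    """Credit earned per hour of CPU time."""
    return credit / cpu_seconds * 3600


# (credit, CPU seconds) pairs as quoted in this thread.
samples = {
    "older tasks (370 credits, 14,000 s)": (370, 14_000),
    "new tasks (60 credits, 14,000 s)": (60, 14_000),
    "this post (280 credits, 30,000 s)": (280, 30_000),  # ASSUMPTION: 30k = seconds
}

for label, (credit, seconds) in samples.items():
    rate = credit_per_cpu_hour(credit, seconds)
    print(f"{label}: {rate:.1f} credits per CPU hour")
```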
ID: 45462
maeax

Joined: 2 May 07
Posts: 1311
Credit: 39,796,455
RAC: 18,095
Message 45464 - Posted: 18 Oct 2021, 23:28:31 UTC
Last modified: 19 Oct 2021, 0:14:27 UTC

Atlas Simulation needs 998,93 MB more disk space.
You currently have 8537 MB.
Running cvmfs_config reload on a CentOS VM cleared it, and the download of the other two of the four ATLAS tasks is starting.
The 1.9 GByte file is also being downloaded, but there is no new version of the ATLAS application!!
Because of the Squid proxy, the download speed is 0.7 MBit/s instead of 60 MBit/s, which means a download time of 60 min!!
Is it a raw file instead of a zip??
ID: 45464
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1825
Credit: 123,761,719
RAC: 86,212
Message 45465 - Posted: 19 Oct 2021, 6:47:32 UTC - in response to Message 45464.  

... because of the squid-Proxy ...

Slow because of Squid?
Surely wrong.
Those large files are typical one-timers.
This means Squid can't serve them from its cache.
Instead, each of those files must be downloaded from lhcathome-upload.cern.ch.
This can be seen in Squid's logfile (-> TCP_MISS:HIER_DIRECT):
xxx 3128 - - [19/Oct/2021:08:15:17 +0200] "GET http://lhcathome-upload.cern.ch/lhcathome/download//225/xxx_EVNT.27082874._000014.pool.root.1 HTTP/1.1" 200 2034165623 "-" "BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_MISS:HIER_DIRECT
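
For anyone who wants to check their own proxy log, here is a minimal sketch that pulls the byte count and the cache result out of a line like the one above. The regular expression is tailored to this sample line only; it is an assumption that other Squid setups use the same log layout, and the function name is made up for the example.

```python
import re

# Sample line copied from the post above.
LINE = ('xxx 3128 - - [19/Oct/2021:08:15:17 +0200] '
        '"GET http://lhcathome-upload.cern.ch/lhcathome/download//225/'
        'xxx_EVNT.27082874._000014.pool.root.1 HTTP/1.1" 200 2034165623 '
        '"-" "BOINC client (x86_64-pc-linux-gnu 7.17.0)" TCP_MISS:HIER_DIRECT')

PATTERN = re.compile(
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '   # quoted request line
    r'(?P<status>\d{3}) (?P<size>\d+) '        # HTTP status and bytes sent
    r'.* (?P<result>\S+):(?P<hierarchy>\S+)$'  # cache result and hierarchy code
)


def parse_line(line):
    """Return selected fields of one log line as a dict, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["size"] = int(fields["size"])
    return fields


entry = parse_line(LINE)
print(entry["result"], entry["hierarchy"])
print(f"{entry['size'] / 2**30:.2f} GiB transferred")
```

A result starting with TCP_MISS means the object was not served from the cache, which is what the line above shows.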



Based on my router monitoring, I suspect the CERN network can't deliver the files continuously at full speed (it intermittently drops to less than 20 Mbit/s).
Nonetheless, a download time of 60 min might point to a local bottleneck.
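
A rough sketch of the arithmetic behind that remark, using the file size from the Squid log line above. It assumes a constant transfer rate and ignores protocol overhead; the function names are invented for the example.

```python
FILE_BYTES = 2_034_165_623  # size of one EVNT file, from the Squid log line above


def download_minutes(size_bytes, rate_mbit_s):
    """Transfer time in minutes at a constant rate given in Mbit/s."""
    return size_bytes * 8 / (rate_mbit_s * 1_000_000) / 60


def average_rate_mbit_s(size_bytes, minutes):
    """Average rate in Mbit/s implied by a transfer time in minutes."""
    return size_bytes * 8 / (minutes * 60) / 1_000_000


for rate in (60, 20):
    print(f"{rate} Mbit/s: {download_minutes(FILE_BYTES, rate):.1f} min")
print(f"60 min implies about {average_rate_mbit_s(FILE_BYTES, 60):.1f} Mbit/s")
```

Even at a steady 20 Mbit/s the file would arrive in under a quarter of an hour, so a 60 min download corresponds to a much lower average rate.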
ID: 45465
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1825
Credit: 123,761,719
RAC: 86,212
Message 45466 - Posted: 19 Oct 2021, 7:14:52 UTC - in response to Message 45461.  

That was my thought, I run the same so there is no suspend or resume, so could be smaller?

Unlike ATLAS vbox, ATLAS native doesn't use VirtualBox (hence no snapshot).
ID: 45466
maeax

Joined: 2 May 07
Posts: 1311
Credit: 39,796,455
RAC: 18,095
Message 45467 - Posted: 19 Oct 2021, 7:26:01 UTC - in response to Message 45465.  
Last modified: 19 Oct 2021, 8:12:00 UTC

Based on my router monitoring I suspect the CERN network can't deliver the files continuously at full speed (it intermittently drops to less than 20 Mbit/s)
Nonetheless a download time of 60 min might point out a local bottleneck.

WCG ignores Squid and gets normal traffic (60 MBit/s) on all PCs.
At the moment a new one with 1.89 GByte is downloading at a max. speed of 21 MBit/s on this VM:
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10694634
What about Frontier??
This CentOS8 VM runs at most 8 WCG ARP tasks or 4 ATLAS VMs!
NOW the download takes 45 min instead of 60 min.
Can this ATLAS version be stopped by CERN IT?
ID: 45467
maeax

Joined: 2 May 07
Posts: 1311
Credit: 39,796,455
RAC: 18,095
Message 45468 - Posted: 19 Oct 2021, 11:17:28 UTC - in response to Message 45457.  

I will stop crunching ATLAS for the moment.

+1, as of one hour ago.
ID: 45468


©2021 CERN