Message boards :
ATLAS application :
New app version 1.01
Joined: 18 Dec 16 · Posts: 123 · Credit: 37,495,365 · RAC: 0

> A 3-core ATLAS did well this morning, but thereafter I wanted to run a single core and that one died early.

This is an interesting observation. I always run ATLAS on 1 core; I will try 2 to see if it makes a difference.

> So it may help if you suspend / de-select other VM sub-projects for a while and test how this works.

And I will try that suggestion as well.

We are the product of random evolution.
Joined: 2 Sep 04 · Posts: 453 · Credit: 193,464,258 · RAC: 4,895

> I think there may still be some other issue. All ATLAS WUs still fail on my machine, and the following 2 WUs also failed on another machine:

The second WU (bold) has been finished by one of my machines and has been validated.

Supporting BOINC, a great concept!
Joined: 13 May 14 · Posts: 387 · Credit: 15,314,184 · RAC: 0

Yeti, you are using an app_config.xml to set the cores and memory for ATLAS, right? I wonder if the memory size is too small for one and two cores, and that's why they don't succeed. I'm running 4-core tasks on my machine and have had 100% success since yesterday afternoon. I'll try running one and two cores to see if I see any problems.

PS: I also run only ATLAS.
Joined: 2 Sep 04 · Posts: 453 · Credit: 193,464,258 · RAC: 4,895

> Yeti, you are using an app_config.xml to set the cores and memory for ATLAS, right? I wonder if the memory size is too small for one and two cores so that's why they don't succeed. I'm running 4-core tasks on my machine and have 100% success since yesterday afternoon. I'll try running one and two cores to see if I see any problems.

Yes, I'm using app_config.xml to set up cores and memory. As my machines have plenty of RAM, I give the ATLAS VM a generous amount:

3-core WU: 5,000 MB
4-core WU: 7,500 MB
5-core WU: 7,500 MB

I'm wondering whether it could have something to do with the processor; you remember that I have one machine that could only crunch single-core WUs. Maybe I should test that one again here at Atlas@LHC.

Supporting BOINC, a great concept!
Joined: 18 Dec 16 · Posts: 123 · Credit: 37,495,365 · RAC: 0

> I wonder if the memory size is too small for one and two cores so that's why they don't succeed.

I am currently running two 2-core ATLAS tasks, one with the default RAM assigned by the server (3400 MB) and one manually forced through app_config.xml to 5000 MB.

We are the product of random evolution.
Joined: 18 Dec 16 · Posts: 123 · Credit: 37,495,365 · RAC: 0

The task with the default memory size of 3400 MB failed with an error that may be related to a lack of memory, "FATAL makePool failed": https://lhcathome.cern.ch/lhcathome/result.php?resultid=126128746

The other task has passed the first 20 minutes, which is a good sign :).

We are the product of random evolution.
Joined: 13 May 14 · Posts: 387 · Credit: 15,314,184 · RAC: 0

After switching to 1 core I got 100% failures :( I was able to log into the VM and catch the log messages; it is indeed a problem of running out of memory. I have increased the memory formula to 1.6 GB + 1 GB * ncores.
Joined: 14 Jan 10 · Posts: 1274 · Credit: 8,480,242 · RAC: 2,028

> After switching to 1 core I got 100% failures :(

All my single-cores failed too: 1.4 GB + 1 GB = 2.4 GB. I'm now running two duals with 1.4 GB + 2 * 1 GB = 3.4 GB, and they should fail too, as HerveUAE wrote . . . wait . . . indeed they failed! My successful task this morning was a 3-core with 4.4 GB. I'll now use David's new formula for the dual-cores. If they seem to succeed, I'll try the single-core again with 2.6 GB.
Joined: 14 Jan 10 · Posts: 1274 · Credit: 8,480,242 · RAC: 2,028

> I'll now use David's new formula for the dual-cores.

3600 MB for a dual-core does not seem to be enough: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126128929
Joined: 18 Dec 16 · Posts: 123 · Credit: 37,495,365 · RAC: 0

And 3800 MB for a dual-core does not seem to be enough either: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126129022

We are the product of random evolution.
Joined: 2 Jan 11 · Posts: 23 · Credit: 5,986,899 · RAC: 0

Only 60 seconds for the checkpoint interval? That should be more, I think . . .
Joined: 18 Dec 16 · Posts: 123 · Credit: 37,495,365 · RAC: 0

If I remember correctly, the previous formula was 1.4 GB + (NumberOfCores) * 0.8 GB, which apparently worked for 3-core tasks (= 3800 MB). Why would 4000 MB now not be sufficient for 2-core tasks? Aren't we running the same 1.01 version on the same data set?

We are the product of random evolution.
Joined: 18 Dec 16 · Posts: 123 · Credit: 37,495,365 · RAC: 0

> My successful task this morning was a 3-core with 4.4GB.

Maybe 4.4 GB is the right value, flat for any number of cores.

We are the product of random evolution.
Joined: 14 Jan 10 · Posts: 1274 · Credit: 8,480,242 · RAC: 2,028

I finally have a dual-core running with 4300 MB of RAM. I showed earlier that a 3-core VM ran with 4400 MB; maybe it would also run with 4300 MB?
Joined: 14 Jan 10 · Posts: 1274 · Credit: 8,480,242 · RAC: 2,028

> I finally have a dual-core running with 4300 MB of RAM.

Both 2-core and 3-core VMs will run, but with 4300 MB RAM the dual-core is only using 1 core and the 3-core is only using 2 cores, whereas the 3-core with 4400 MB RAM was using all 3 cores.
Joined: 13 May 14 · Posts: 387 · Credit: 15,314,184 · RAC: 0

My single-core task with 2.6 GB finished OK: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126128857

I am not sure the makePool error is caused by memory. All the failures I was getting were "EVNTtoHITS got a SIGKILL signal (exit code 137)", where the kernel was killing the ATLAS process for using too much memory, e.g. https://lhcathome.cern.ch/lhcathome/result.php?resultid=126128854

In general I would recommend running more cores (4 to 8), as this is more efficient. But, as others have discovered, credit is based mainly on running time, so using fewer cores guarantees more credit. So it's up to you whether you care about efficiency or credit :)
Joined: 18 Dec 16 · Posts: 123 · Credit: 37,495,365 · RAC: 0

> Both 2-core and 3-core VM's will run, but the dual core is only using 1 core and the 3-core is only using 2 cores both with 4300MB RAM,

I have a couple of 2-core WUs that completed OK with 4400 MB. The tasks were using 2 cores during the "full speed" phase of their execution: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126131160

> In general I would recommend running more cores (4 to 8) as this is more efficient.

My own observation is that 1-core tasks are more efficient than 2 or more cores, assuming you have sufficient RAM on your machine. The reason is simple: ATLAS tasks take very long to start up and reach "full speed", typically a minimum of 20 minutes on my machines, and up to 30 minutes or more. During that period, the cores allocated to the task are not doing useful work. The more cores you use, the more CPU time is wasted proportionally, and your overall productivity, in tasks completed per day, decreases. But maybe this is because of the bandwidth I have from home, or the Internet latency between my UAE ISP and the ATLAS servers.

> credit is based mainly on running time so using fewer cores guarantees more credit.

This is indeed what I observed over a short period of time, but not after a week or two: the credit allocation slowly adjusts itself to the number of cores you use. Well, at least that is what I observed at ATLAS@Home; credit calculation at LHC@Home may be different.

We are the product of random evolution.
Joined: 15 Jun 08 · Posts: 2401 · Credit: 225,517,991 · RAC: 124,188

My standard suggestion in this case: consider using a proxy, e.g. Squid. ATLAS WUs generate between 1000 and 2000 HTTP requests at startup to fill the local CVMFS cache or to get data from the Frontier caches. A proxy with typically 128 MB RAM and 25 GB of disk would serve more than 90% of those requests and 50% of the data volume. As a result, the startup times drop to 5-12 minutes.
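A minimal squid.conf matching that sizing might look like the sketch below. The cache path and the LAN address range are assumptions that vary by system; adjust them, and then point the BOINC client at the proxy via the HTTP proxy settings in BOINC Manager's options dialog.

```
# Sketch of a minimal squid.conf sized per the post above
# (128 MB memory cache, 25 GB disk cache).
# The cache_dir path and the localnet range are assumptions.
http_port 3128
cache_mem 128 MB
cache_dir ufs /var/spool/squid 25000 16 256
maximum_object_size 1024 MB
acl localnet src 192.168.0.0/16
http_access allow localnet
http_access deny all
```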
©2024 CERN