Message boards : ATLAS application : New app version 1.01
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29251 - Posted: 14 Mar 2017, 14:16:37 UTC

A 3-core ATLAS did well this morning, but thereafter I wanted to run a single core and that one died early.

This is an interesting observation. I always run ATLAS on 1 core. I will try on 2 to see if it makes a difference.
So it may help if you suspend or de-select the other VM subprojects for a while and test how this works.

And I will try that suggestion as well.
We are the product of random evolution.
ID: 29251
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,459,954
RAC: 6,278
Message 29252 - Posted: 14 Mar 2017, 14:54:54 UTC - in response to Message 29247.  

I think there may still be some other issue. All ATLAS WUs still fail on my machine, and the following 2 WUs also failed on another machine:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=60483240
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=60460521
Maybe this is the root cause of the problem that led to the servers being overloaded over the weekend.

The second WU (60460521) has since been finished by one of my machines and has been validated.


Supporting BOINC, a great concept!
ID: 29252
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 29254 - Posted: 14 Mar 2017, 15:14:13 UTC - in response to Message 29252.  
Last modified: 14 Mar 2017, 15:15:52 UTC

Yeti, you are using an app_config.xml to set the cores and memory for ATLAS, right? I wonder if the memory size is too small for one and two cores, and that's why they don't succeed. I'm running 4-core tasks on my machine and have had 100% success since yesterday afternoon. I'll try running one and two cores to see if I see any problems.

PS I also run only ATLAS.
ID: 29254
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,459,954
RAC: 6,278
Message 29256 - Posted: 14 Mar 2017, 15:21:53 UTC - in response to Message 29254.  

Yeti, you are using an app_config.xml to set the cores and memory for ATLAS, right? I wonder if the memory size is too small for one and two cores, and that's why they don't succeed. I'm running 4-core tasks on my machine and have had 100% success since yesterday afternoon. I'll try running one and two cores to see if I see any problems.

Yes, I'm using app_config.xml to set up cores and memory.

As my machines have plenty of RAM, I give the ATLAS VMs a generous RAM allocation:

3-core WU: 5000 MB
4-core WU: 7500 MB
5-core WU: 7500 MB
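For reference, a sketch of an app_config.xml along these lines (the plan class name and the --memory_size_mb option reflect common vboxwrapper setups and are assumptions here - verify both against the entries in your client_state.xml before using it):

<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>  <!-- assumption: copy the exact plan class from client_state.xml -->
    <avg_ncpus>3</avg_ncpus>                        <!-- run 3-core VMs -->
    <cmdline>--memory_size_mb 5000</cmdline>        <!-- give the VM 5000 MB, matching the 3-core figure above -->
  </app_version>
</app_config>

The file goes into the project directory (projects/lhcathome.cern.ch_lhcathome); afterwards use "Read config files" in the BOINC Manager.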

---------------------------------------

I'm wondering whether it could have something to do with the processor; you may remember that I have one machine that could only crunch single-core WUs.

Maybe I should test that machine again here at ATLAS@LHC.


Supporting BOINC, a great concept!
ID: 29256
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29257 - Posted: 14 Mar 2017, 15:35:33 UTC
Last modified: 14 Mar 2017, 15:36:06 UTC

I wonder if the memory size is too small for one and two cores, and that's why they don't succeed.

I am currently running two 2-core ATLAS tasks, one with the default RAM assigned by the server (3400 MB) and one manually forced to 5000 MB through app_config.xml.
We are the product of random evolution.
ID: 29257
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29258 - Posted: 14 Mar 2017, 15:46:47 UTC
Last modified: 14 Mar 2017, 15:51:08 UTC

The task with the default memory size (3400 MB) failed with "FATAL makePool failed", an error that may be related to a lack of memory:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=126128746
The other task has passed the first 20 minutes, which is a good sign :).
We are the product of random evolution.
ID: 29258
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 29260 - Posted: 14 Mar 2017, 16:02:18 UTC - in response to Message 29258.  

After switching to 1 core I got 100% failures :(

I was able to log into the VM and catch the log messages; it is indeed a problem of running out of memory.

I have increased the memory formula to 1.6 GB + 1.0 GB * ncores.
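For a quick sanity check, that works out to 2.6 GB for 1 core, 3.6 GB for 2 cores, 4.6 GB for 3 cores and 5.6 GB for 4 cores.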
ID: 29260
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1269
Credit: 8,478,478
RAC: 2,509
Message 29262 - Posted: 14 Mar 2017, 17:14:13 UTC - in response to Message 29260.  

After switching to 1 core I got 100% failures :(

....

I have increased the memory formula to 1.6 GB + 1.0 GB * ncores.

All my single cores failed too: 1.4 GB + 1 GB = 2.4 GB.
Now running two dual-core tasks with 1.4 GB + 2 * 1 GB = 3.4 GB; they should fail too, as HerveUAE wrote . . . wait . . . indeed, they failed!

My successful task this morning was a 3-core with 4.4 GB.

I'll now use David's new formula for dual-cores.
If they seem to succeed, I'll try single-core again with 2.6 GB.
ID: 29262
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1269
Credit: 8,478,478
RAC: 2,509
Message 29263 - Posted: 14 Mar 2017, 17:30:25 UTC - in response to Message 29262.  

I'll now use David's new formula for dual-cores.

3600 MB for a dual-core seems not to be enough: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126128929
ID: 29263
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29266 - Posted: 14 Mar 2017, 18:32:16 UTC

And 3800 MB for a dual-core does not seem to be enough either: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126129022
ID: 29266
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1269
Credit: 8,478,478
RAC: 2,509
Message 29267 - Posted: 14 Mar 2017, 18:53:49 UTC - in response to Message 29266.  

ID: 29267
peterfilla

Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29268 - Posted: 14 Mar 2017, 19:09:22 UTC

Only 60 sec. for the checkpoint interval? That should be longer, I think - perhaps . . .
ID: 29268
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29269 - Posted: 14 Mar 2017, 19:15:31 UTC

If I remember correctly, the previous formula was 1.4 GB + 0.8 GB * ncores, which apparently worked for 3-core tasks (= 3800 MB).
Why would 4000 MB now not be sufficient for 2-core tasks? Aren't we running the same 1.01 version on the same data set?
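Taking the two formulas as quoted in this thread, the values compare as follows:

Cores   Old (1.4 + 0.8 * n)   New (1.6 + 1.0 * n)
1       2.2 GB                2.6 GB
2       3.0 GB                3.6 GB
3       3.8 GB                4.6 GB
4       4.6 GB                5.6 GB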
We are the product of random evolution.
ID: 29269
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29270 - Posted: 14 Mar 2017, 19:20:55 UTC

My successful task this morning was a 3-core with 4.4 GB.

Maybe 4.4 GB is the right value, flat for any number of cores.
We are the product of random evolution.
ID: 29270
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,459,954
RAC: 6,278
Message 29271 - Posted: 14 Mar 2017, 19:25:12 UTC

Perhaps try first with my figures and then go down:

3-core WU: 5000 MB
4-core WU: 7500 MB
5-core WU: 7500 MB

These are well proven.


Supporting BOINC, a great concept!
ID: 29271
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1269
Credit: 8,478,478
RAC: 2,509
Message 29273 - Posted: 14 Mar 2017, 20:20:19 UTC

I finally have a dual-core running with 4300 MB of RAM.
I showed earlier that a 3-core VM ran with 4400 MB; maybe it would also run with 4300 MB?
ID: 29273
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1269
Credit: 8,478,478
RAC: 2,509
Message 29274 - Posted: 14 Mar 2017, 21:14:53 UTC - in response to Message 29273.  

I finally have a dual-core running with 4300 MB of RAM.
I showed earlier that a 3-core VM ran with 4400 MB; maybe it would also run with 4300 MB?

Both 2-core and 3-core VMs will run with 4300 MB RAM, but the dual-core is only using 1 core and the 3-core only 2 cores, whereas the 3-core with 4400 MB RAM was using all 3 cores.
ID: 29274
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 29275 - Posted: 14 Mar 2017, 21:26:02 UTC

My single-core task with 2.6 GB finished OK:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=126128857

I am not sure the makePool error is caused by memory; all the failures I was getting were "EVNTtoHITS got a SIGKILL signal (exit code 137)", where the kernel was killing the ATLAS process for using too much memory, e.g.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=126128854

In general I would recommend running more cores (4 to 8), as this is more efficient. But, as others have discovered, credit is based mainly on running time, so using fewer cores guarantees more credit. So it's up to you whether you care about efficiency or credit :)
ID: 29275
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29280 - Posted: 15 Mar 2017, 1:52:46 UTC

Both 2-core and 3-core VMs will run with 4300 MB RAM, but the dual-core is only using 1 core and the 3-core only 2 cores,

I have a couple of 2-core WUs that completed OK with 4400 MB. The tasks were using 2 cores during the "full speed" phase of execution:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=126131160

In general I would recommend running more cores (4 to 8) as this is more efficient.

My own observation is that 1-core tasks are more efficient than tasks with 2 or more cores, assuming you have sufficient RAM on your machine. The reason is simple: ATLAS tasks take very long to start up and reach "full speed", typically a minimum of 20 minutes on my machines, and up to 30 minutes or more. During that period, the cores allocated to the task are not working. The more cores you use, the more CPU time sits unused proportionally, and your overall productivity, in tasks completed per day, decreases.
But maybe this is because of the bandwidth I have from home, or Internet latency between my UAE ISP and the ATLAS servers.
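To put rough numbers on it (assuming a 25-minute start-up during which all allocated cores sit idle): a 4-core task wastes about 4 * 25 = 100 core-minutes per task on start-up, versus only about 25 core-minutes for a single-core task.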

credit is based mainly on running time, so using fewer cores guarantees more credit.

This is indeed what I observed over a short period of time, but not after a week or two. The credit allocation slowly adjusts itself to the number of cores you use. Well, at least this is what I observed at ATLAS@Home. Credit calculation at LHC@Home may be different.
We are the product of random evolution.
ID: 29280
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,182,185
RAC: 123,768
Message 29287 - Posted: 15 Mar 2017, 9:24:13 UTC - in response to Message 29280.  

My standard suggestion in this case:
Consider using a proxy, e.g. Squid.

ATLAS WUs generate between 1000 and 2000 HTTP requests at startup to fill the local CVMFS cache or fetch data from the Frontier caches.
A proxy with typically 128 MB RAM and 25 GB disk would serve more than 90% of those requests and 50% of the data volume.

As a result, the startup times drop to 5-12 minutes.
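For anyone who wants to try this, a minimal squid.conf matching those numbers could look like the following sketch (the cache path and subnet are examples - adjust them to your system and LAN):

# listen on the standard Squid port
http_port 3128
# about 128 MB of RAM for hot objects
cache_mem 128 MB
# 25 GB (25000 MB) on-disk cache, default directory layout
cache_dir ufs /var/spool/squid 25000 16 256
# accept clients from the local network only (example subnet)
acl localnet src 192.168.0.0/16
http_access allow localnet
http_access deny all

Then enter the proxy host and port (3128) in the BOINC client's HTTP proxy settings so that the VMs' CVMFS and Frontier traffic can go through it.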
ID: 29287