Message boards : ATLAS application : Some Validate errors
Jim1348 | Joined: 15 Nov 14 | Posts: 602 | Credit: 24,371,321 | RAC: 0
Philippe wrote:
Hi Jim, when I look at your task list, I notice that, curiously, some of them finished at the same time:

Philippe, I think that just means that they were uploaded at the same time. They actually finished on my machine a few minutes apart, as you can see from the BoincTasks History log:

Ga8LDmCc3FqnSu7Ccp2YYBZmABFKDmABFKDm3INKDmYMSKDmcEz7Ao_0 04:06:47 (03:47:52) 4/6/2017 6:50:43 AM 4/6/2017 7:36:29 AM
JMhKDmE50FqnSu7Ccp2YYBZmABFKDmABFKDm3INKDmAHSKDmK2i9im_0 04:00:06 (03:41:22) 4/6/2017 6:31:38 AM 4/6/2017 7:36:29 AM

So I don't think there is a problem. Regards
Erich56 | Joined: 18 Dec 15 | Posts: 1571 | Credit: 66,708,381 | RAC: 166,971
since last night, none of the finished and uploaded tasks are being validated - on the task page, the column "points" says "pending". what's the problem? |
Erich56 | Joined: 18 Dec 15 | Posts: 1571 | Credit: 66,708,381 | RAC: 166,971
what's the problem?

Problem obviously solved - all uploaded tasks were validated.
Philippe | Joined: 24 Jul 16 | Posts: 88 | Credit: 239,917 | RAC: 0
I found my mistake: the time in the task list refers to the moment of validation by the server.

The WU is downloaded from the server, with a sending time given in the task list. The WU starts at another time in the BOINC client; the starting time is given in the log. The WU is crunched during the elapsed time. The WU ends at the ending time given in the log. The WU is uploaded to the server. The WU is stored on the server side until the next project update by the BOINC client. The WUs are sorted, grouped by host, and validated at the time given in the task list.

So it seems normal that several WUs can be validated at the same time, if the time lapse between two project updates between the BOINC client and the server is longer than the duration of a WU (for a small host), or of delayed WUs (for a big host with many simultaneous tasks). (Sorry, but I never give up.)

I just wanted to understand whether the fact that several WUs are validated at the same time would have an influence on the credit earned, given that the new credit system's rules are dynamic, depending on the size of the hosts used and the difficulty of particular jobs.
Joined: 2 May 07 | Posts: 1752 | Credit: 136,500,670 | RAC: 31,259
Four PCs have problems with the start of this workunit: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=73498862
Yeti | Joined: 2 Sep 04 | Posts: 450 | Credit: 171,090,119 | RAC: 27,458
Four PCs have problems with the start of this workunit:

One of these machines is my TOP-Cruncher, so I'm sure it is not the machine. It's either a problem with the WU, or with a server backend, or both.

Supporting BOINC, a great concept!
mrchips | Joined: 16 May 14 | Posts: 11 | Credit: 7,304,836 | RAC: 16
ALL my tasks keep failing with Validate error - WHY |
Yeti | Joined: 2 Sep 04 | Posts: 450 | Credit: 171,090,119 | RAC: 27,458
mrchips wrote:
ALL my tasks keep failing with Validate error - WHY

Take a walk through my checklist and especially keep an eye on Point No. 2.

Supporting BOINC, a great concept!
computezrmle | Joined: 15 Jun 08 | Posts: 2181 | Credit: 185,482,545 | RAC: 186,737
mrchips wrote:
ALL my tasks keep failing with Validate error - WHY

The recent batch probably needs more than the automatically configured 4200 MB RAM for a 2-core WU. You may set 5000 MB via app_config.xml; a sketch follows below. Besides that, it is of course good advice to check your system against Yeti's checklist.
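A minimal app_config.xml sketch for that, assuming the app name ATLAS and the plan class vbox64_mt_mcore_atlas (both names are assumptions here; take the exact strings from the app_version entries in your client_state.xml). vboxwrapper's --memory_size_mb option sets the VM memory in MB:

```xml
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>                      <!-- assumed name; verify in client_state.xml -->
    <plan_class>vbox64_mt_mcore_atlas</plan_class>  <!-- assumed plan class; verify likewise -->
    <avg_ncpus>2</avg_ncpus>                        <!-- run the VM on 2 CPU cores -->
    <cmdline>--memory_size_mb 5000</cmdline>        <!-- give the VM 5000 MB instead of 4200 MB -->
  </app_version>
</app_config>
```

Place the file in the project directory (typically projects/lhcathome.cern.ch_lhcathome) and use "Options / Read config files" in the BOINC Manager; tasks already running keep their old VM settings until they restart.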
Erich56 | Joined: 18 Dec 15 | Posts: 1571 | Credit: 66,708,381 | RAC: 166,971
computezrmle wrote:
The recent batch probably needs more than the automatically configured 4200 MB RAM for a 2-core WU.

Hm, in view of the fact that currently even a 1-core ATLAS WU uses almost 5000 MB RAM, I would guess that a 2-core WU needs a value beyond this. I might try 2-core WUs tomorrow, so I'll see.
Jim1348 | Joined: 15 Nov 14 | Posts: 602 | Credit: 24,371,321 | RAC: 0
computezrmle wrote:
The recent batch probably needs more than the automatically configured 4200 MB RAM for a 2-core WU.

I set my RAM to 5000 MB via app_config.xml and also switched to two CPU cores per task, and have had no problems since then (all errors before):
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10477864&offset=0&show_names=0&state=0&appid=14

In BoincTasks, the real memory usage shows up as 4200 MB for the two-core task.
computezrmle | Joined: 15 Jun 08 | Posts: 2181 | Credit: 185,482,545 | RAC: 186,737
As I understand it, there is a RAM usage peak during the initialisation phase of the ATLAS VM. Some of the startup scripts fail if there is not enough RAM, and a watchdog shuts down the VM. During the calculation phase, the RAM requirement seems to be much lower.

VMs that are configured to use more CPU cores have enough RAM to also cover the initialisation phase, but 1-core (sometimes 2-core) VMs need special RAM management, probably during batch generation. But that's only a guess derived from the error logs.

It would be nice if somebody from the project team could post an explanation, e.g. whether there is a checklist for batch generation or some quality checks.
Jim1348 | Joined: 15 Nov 14 | Posts: 602 | Credit: 24,371,321 | RAC: 0
computezrmle wrote:
VMs that are configured to use more CPU cores have enough RAM to also cover the initialisation phase ...

I am trying the single CPU core tasks again (still with 5000 MB memory). The reason is that the two-core tasks do not honor the core reservation that I have set up for my GPU in an app_config.xml for GPU Grid. That is, ATLAS will run four tasks at once, while the GPU also uses a core, so that nine cores are being allocated. That results in slow running on the GPU due to CPU starvation, not a desirable condition.

If I encounter memory problems with this configuration, I can limit the number of ATLAS tasks via an app_config.xml (a sketch follows below), but that may lead to scheduling problems, as you probably know, and I would prefer to avoid it. But unless several ATLAS tasks are in the start-up phase all at the same time, it should not be a problem with 32 GB of memory.
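For reference, a minimal sketch of such a limit using the max_concurrent element of app_config.xml, again assuming the app name ATLAS (verify it in client_state.xml):

```xml
<app_config>
  <app>
    <name>ATLAS</name>                  <!-- assumed name; verify in client_state.xml -->
    <max_concurrent>3</max_concurrent>  <!-- run at most 3 ATLAS tasks at once -->
  </app>
</app_config>
```

The scheduling problems mentioned above stem from the client still fetching work as if every core were usable, so the work cache can overcommit while tasks wait on the limit.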
Erich56 | Joined: 18 Dec 15 | Posts: 1571 | Credit: 66,708,381 | RAC: 166,971
computezrmle wrote:
As I understand it, there is a RAM usage peak during the initialisation phase of the ATLAS VM.

Hm, honestly, I have never made this observation, although I have run 1-core, 2-core and 3-core ATLAS tasks over the course of about 1 1/2 years now. What I see is a rather quick increase in RAM usage right from the beginning; then the increase flattens out and stays at a pretty even level until the task is finished.
Erich56 | Joined: 18 Dec 15 | Posts: 1571 | Credit: 66,708,381 | RAC: 166,971
Jim1348 wrote:
I am trying the single CPU core tasks again (still with 5000 MB memory). The reason is that the two-core tasks do not honor the core reservation that I have set up for my GPU in an app_config.xml for GPU Grid. That is, ATLAS will run four tasks at once, while the GPU also uses a core, so that nine cores are being allocated. That results in slow running on the GPU due to CPU starvation, not a desirable condition.

Hm, this sounds interesting. I am also running LHC tasks and GPUGRID tasks concurrently on 2 of my PCs, but I have never had any problems with CPU core reservation/allocation. Only for LHC do I use various app_config.xml files (depending on what configuration I'd like to run); for GPUGRID I have never had an app_config.xml.

For example, my main PC: 12-core CPU (= 6 cores + 6 HT), 32 GB RAM, 2 GPUs, 1 GPUGRID task running on each (totally automatically, no need for an app_config.xml). Currently, four 2-core ATLAS tasks are also running (app_config.xml for higher RAM). This uses a total of 10 CPU cores out of my 12; the percentage of total CPU usage is shown as ~86%. Total RAM usage (once the ATLAS tasks are running beyond the startup phase): ~22.5 GB.

What surprises me about RAM usage is the following: when I ran 1-core ATLAS tasks (till yesterday), I had to increase RAM availability per task to 5000 MB (via app_config.xml), otherwise the tasks failed after 10-14 minutes. From what I could measure, each task indeed used close to 5000 MB. Now the 2-core ATLAS tasks use no more than slightly above 5000 MB each, whereas I would have expected (based on how much the 1-core tasks needed) some value between 7000 and 7500 MB. This is really strange.
HerveUAE | Joined: 18 Dec 16 | Posts: 123 | Credit: 37,495,365 | RAC: 0
The initialisation and end phases of ATLAS tasks are single core, while the calculation phase is multicore. If the highest RAM requirements are from the initialisation phase, then it is logical to have similar RAM requirements for 1-core and 2-core tasks. We are the product of random evolution. |
computezrmle | Joined: 15 Jun 08 | Posts: 2181 | Credit: 185,482,545 | RAC: 186,737
Erich56 wrote:
What I see is a rather quick increase in RAM usage right from the beginning; then the increase flattens out and stays at a pretty even level until the task is finished.

There are different perspectives:
(1) from outside the VM, if you keep an eye on the host's monitoring apps
(2) from inside the VM, if you look at the top console (which we still don't have in ATLAS)

IIRC, (1) shows the maximum amount of RAM the VM has allocated since startup, as VirtualBox never gives it back to the OS. I have no evidence for my previous guess; it's all speculation, but it's obvious that (much) more RAM given to the VM avoids errors with the recent batch.

HerveUAE wrote:
The initialisation and end phases of ATLAS tasks are single core, while the calculation phase is multicore. If the highest RAM requirements are from the initialisation phase, then it is logical to have similar RAM requirements for 1-core and 2-core tasks.

Good point.
Jim1348 | Joined: 15 Nov 14 | Posts: 602 | Credit: 24,371,321 | RAC: 0
Erich56 wrote:
I am also running LHC tasks and GPUGRID tasks concurrently on 2 of my PCs, but I have never had any problems with CPU core reservation/allocation.

In your case, it doesn't really matter whether you use an app_config.xml for GPU Grid, since you have four extra cores to feed the two cards. However, if LHC (or any other project) were allowed to run on all the cores, then they would use all 12 cores. By default, GPU Grid does not reserve a whole core for itself, so it would have to run on only a partial core, slowing it down. In that case, you normally use an app_config.xml to reserve a whole core for each GPU for best performance.
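A minimal sketch of such a reservation using the gpu_versions element of app_config.xml; the app name acemd is hypothetical here (GPU Grid has used several application names over time), so take the real one from client_state.xml:

```xml
<app_config>
  <app>
    <name>acemd</name>            <!-- hypothetical name; verify in client_state.xml -->
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>  <!-- one task per GPU -->
      <cpu_usage>1.0</cpu_usage>  <!-- budget a full CPU core per GPU task -->
    </gpu_versions>
  </app>
</app_config>
```

With cpu_usage at 1.0, the client counts each GPU task as occupying a whole core, so CPU projects such as ATLAS are only scheduled on the remaining cores.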
Joined: 2 May 07 | Posts: 1752 | Credit: 136,500,670 | RAC: 31,259
This workunit didn't finish on more than four PCs: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=80187546