Message boards :
ATLAS application :
New app version 1.01
Author | Message |
---|---|
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374 |
I tested ATLAS VMs with up to 4 cores. Single-core needs 2600 MB and multicore needs 4400 MB of RAM. Yes, the 4-core task also runs with 4400 MB, using 4 real threads. I don't have "top" to see the memory usage inside the VM and possible swapping :-( I have to wait (slow i7 2600) until the running tasks are finished before testing a higher number of cores with the same base of 4400 MB. Maybe we only need 2 applications: single core and multi core. |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
"Yes, the 4-core task also runs with 4400 MB, using 4 real threads." I have had 8 4-core tasks with 4400 MB running smoothly today, and I am currently testing 8-core tasks with 4400 MB. So far so good. "My standard suggestion in this case:" Thanks for the suggestion. I'll give it a try. We are the product of random evolution. |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
The 8-core task with 4400 MB completed successfully: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126137372 We are the product of random evolution. |
Joined: 2 Sep 04 Posts: 453 Credit: 193,569,815 RAC: 9,173 |
"The 8-core task with 4400 MB completed successfully: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126137372" Did you take a look at the efficiency? Supporting BOINC, a great concept! |
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374 |
"The 8-core task with 4400 MB completed successfully: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126137372" The efficiency of the eight 4-core tasks was 70.4%. The single 8-core task had an efficiency of 59.7%. |
Joined: 17 Sep 04 Posts: 99 Credit: 30,741,655 RAC: 8,070 |
Is the 4-core the most efficient of the various configurations? I have been assuming so for all of my work. Thanks! Regards, Bob P. |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
"Did you take a look at the efficiency?" How do you measure the efficiency? We are the product of random evolution. |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
"How do you measure the efficiency?" I think I understand how it is calculated. As I mentioned earlier in this thread, the most efficient tasks on my computers are 1-core tasks. This is because of the very long idle time of the tasks (due to network bandwidth / latency issues from my home in Dubai). The best I can get is 85% to 90% on long 1-core tasks. So in my particular case, I don't think a low efficiency with 4400 MB on 4-core or 8-core tasks would be conclusive. I would suggest that someone with short idle times test the efficiency with 4400 MB. We are the product of random evolution. |
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374 |
I returned this task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=127457570 At the end of the task, it uploaded 819K from BOINC's project directory, but in the result I don't see a 60 MB HITS*.root file, so I think it was not useful for ATLAS. Towards the end of this 4-core VM, the console showed ready events of 23, 25, 27, and 25, which makes 100 events. |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
The log shows this: 2017-03-22 12:00:11 (10356): Guest Log: PyJobTransforms.transform.execute 2017-03-22 11:51:20,813 CRITICAL Transform executor raised TransformValidationException: HITSMergeAthenaMP0 got a SIGSEGV signal (exit code 139) 2017-03-22 12:00:11 (10356): Guest Log: PyJobTransforms.transform.execute 2017-03-22 11:51:24,222 WARNING Transform now exiting early with exit code 65 (HITSMergeAthenaMP0 got a SIGSEGV signal (exit code 139)) This means there was a segmentation fault in the ATLAS simulation code. This is a bug in the software and has nothing to do with your PC, and it will eventually be reported back to the developers. This is why we still give credit for these WUs even if the result is not useful for us. |
Joined: 18 Dec 15 Posts: 1689 Credit: 103,941,078 RAC: 122,200 |
HerveUAE wrote: "I wonder if the memory size is too small for one and two cores so that's why they don't succeed." On my PC with 8 GB RAM and a quad-core CPU (4 real cores, no HT), I have so far been running 1 ATLAS, 1 Rosetta, 1 WCG and 1 GPUGRID task. Now I decided to try a 2-core ATLAS task and stopped one of the other projects. Since I had learned that 2-core ATLAS tasks require increasing the RAM allocation to 4400 MB via app_config.xml, I did this. In fact, I allocated 4500 MB to begin with. However, this did not work; the console showed something like "Memory error". So I increased the value in app_config.xml to 5000 MB, and now it works. Hence, even the 4400 MB mentioned several times here in the forum may not be enough (and in my case, neither was 4500 MB). |
Joined: 18 Dec 15 Posts: 1689 Credit: 103,941,078 RAC: 122,200 |
"So I increased the value in app_config.xml to 5000 MB, and now it works." Unfortunately, I spoke too soon. After some 13 minutes (during which the console looked good) the task stopped, see here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=128566639 So I tried again, and that didn't work either. The console says, for example: Kernel panic - not syncing: attempted to kill init. Exit code 0x00000007. I aborted the task after 1 hour, since no RAM was used the whole time and the CPU was working on only 1 core instead of 2, see here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=128570490 What's going wrong? BTW, 1-core tasks were previously processed without any problem. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have completed a 2-core native ATLAS task on one of my Linux boxen; it completed with 4100 MB of RAM and produced a HITS file. Another is running. Tullio |
Joined: 18 Dec 15 Posts: 1689 Credit: 103,941,078 RAC: 122,200 |
"After some 13 minutes (during which the console looked good) the task stopped, see here:" Do any of the experts here have any idea why this 2-core task failed? What catches my eye when reading stderr.txt is the entry at log time 17:15:25. At this point the problems seem to begin. FYI, the PC has 8 GB RAM, and in app_config.xml I assigned 5000 MB. |
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374 |
"Do any of the experts here have any idea why this 2-core task failed?" In fact the problem occurred during the initialization phase. So it's an ATLAS problem, not your machine. Guest Log: PyJobTransforms.trfExe.execute 2017-03-25 17:06:39,415 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh']) Guest Log: PyJobTransforms.trfExe.execute 2017-03-25 17:12:29,838 INFO EVNTtoHITS executor returns 137 Guest Log: PyJobTransforms.trfExe.validate 2017-03-25 17:12:30,758 ERROR Validation of return code failed: EVNTtoHITS got a SIGKILL signal (exit code 137) (Error code 65) |
Joined: 18 Dec 15 Posts: 1689 Credit: 103,941,078 RAC: 122,200 |
"In fact the problem occurred during the initialization phase. So it's an ATLAS problem, not your machine." Thanks for your answer. However, the next task I tried also had a problem, as written above: "So I tried again, and that didn't work either. The console says, for example: Kernel panic - not syncing: attempted to kill init. Exit code 0x00000007. I aborted the task after 1 hour, since no RAM was used the whole time and the CPU was working on only 1 core instead of 2, see here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=128570490" Any hint what this could have been? |
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374 |
At least that machine can produce valid work: https://lhcathome.cern.ch/lhcathome/result.php?resultid=128669972 The CPU was introduced 9 years ago and now runs Win10 as its OS. Maybe the processor is overloaded? |
Joined: 18 Dec 15 Posts: 1689 Credit: 103,941,078 RAC: 122,200 |
Even before, with the "old" ATLAS jobs, this machine had a problem crunching 2-core tasks. It simply did not work. Anyway, I was hoping that with the new Simulation version 1.01 it might work, but obviously it does NOT :-( I then tried to run two 1-core jobs, but, as expected, I ran out of RAM. So I must accept that this machine, despite its 8 GB of RAM, can only run 1 ATLAS job at a time. I then tried to run 1 ATLAS and 1 CMS job simultaneously, which used up 93% of RAM and hence is not the best solution either; besides that, I noticed that downloading ATLAS plus CMS tasks did not work the way it was supposed to (at one time I had 2 ATLAS jobs downloaded, then 2 CMS jobs ...). |
Joined: 18 Dec 15 Posts: 1689 Credit: 103,941,078 RAC: 122,200 |
"The CPU was introduced 9 years ago and now runs Win10 as its OS. Maybe the processor is overloaded?" Maybe this CPU (Intel Core2 Quad Q9550), given its technical specs, is not able to run such a task on more than 1 core. I am not a CPU expert; maybe someone can tell. |
Joined: 2 May 07 Posts: 2101 Credit: 159,819,191 RAC: 123,837 |
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=62016639 CPU time is more than 16 hours at the moment. The checkpoint interval is OK. 2 CPUs; normally on this computer these tasks finish after 5 or 6 hours. Thanks for any help. |
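
On the efficiency question raised in the thread: the posts do not spell out the formula, so the sketch below assumes the usual definition, CPU time divided by (run time × number of cores). The function name and the sample numbers are made up for illustration and are not taken from any task in this thread.

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float, n_cores: int) -> float:
    """Fraction of the allocated core time that was spent computing.

    Assumed definition (not confirmed by the thread):
    CPU time / (wall-clock run time * number of cores).
    """
    if run_time_s <= 0 or n_cores <= 0:
        raise ValueError("run_time_s and n_cores must be positive")
    return cpu_time_s / (run_time_s * n_cores)


# Hypothetical 8-core task: 8 h of CPU time over a run time of 1 h 40 min.
example = cpu_efficiency(cpu_time_s=28_800, run_time_s=6_000, n_cores=8)
print(f"{example:.1%}")  # prints 60.0%
```

Under this definition, any time the VM sits idle (for example while waiting on the network, as described for the connection from Dubai) counts against efficiency, and it costs more the more cores are allocated to the task.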
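
The exit codes 139 and 137 in the guest logs quoted in this thread follow the common shell convention of 128 plus the signal number. A small sketch of that decoding, using standard Linux signal numbers (the guest VM runs Linux); the function name and the lookup table are illustrative only:

```python
# Standard Linux numbers for a few common signals (illustrative subset).
LINUX_SIGNALS = {
    1: "SIGHUP",
    2: "SIGINT",
    6: "SIGABRT",
    9: "SIGKILL",
    11: "SIGSEGV",
    15: "SIGTERM",
}


def signal_from_exit_code(exit_code: int):
    """Return the signal name encoded in a shell-style exit code, or None.

    By convention a process terminated by signal N is reported
    with exit code 128 + N.
    """
    if exit_code <= 128:
        return None  # ordinary exit status, not a signal
    return LINUX_SIGNALS.get(exit_code - 128)


for code in (139, 137, 65):
    print(code, signal_from_exit_code(code))
```

Exit code 65 is below 128, so it is not a signal; in the logs it is the transform's own error code. SIGKILL is, among other things, what the Linux out-of-memory killer sends, although the logs quoted here do not show whether that is what happened.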
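
Several posts raise the VM memory through app_config.xml without showing the file. A hedged sketch of generating one follows. The element names (app_config, app_version, app_name, plan_class, avg_ncpus, cmdline) are BOINC client configuration elements, and --nthreads and --memory_size_mb are vboxwrapper options, but the app name and plan class passed in below are assumptions; check client_state.xml on your own host for the actual values. The helper name build_app_config is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str,
                     n_cores: int, memory_mb: int) -> str:
    """Build the text of an app_config.xml for one multicore VM app version."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(n_cores)
    # Options handed to vboxwrapper: thread count and VM memory in MB.
    ET.SubElement(version, "cmdline").text = (
        f"--nthreads {n_cores} --memory_size_mb {memory_mb}"
    )
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed app name and plan class; 2 cores with 5000 MB as tried in the thread.
print(build_app_config("ATLAS", "vbox64_mt_mcore_atlas", n_cores=2, memory_mb=5000))
```

The output would be saved as app_config.xml in the project's directory under the BOINC data directory. As the thread shows, a larger value does not guarantee success, and the allocation still has to fit into the host's physical RAM alongside everything else that is running.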
©2024 CERN