Message boards : ATLAS application : New app version 1.01

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 29289 - Posted: 15 Mar 2017, 9:57:14 UTC

I tested ATLAS VMs with up to 4 cores.

A single-core task needs 2600 MB of RAM and a multicore task needs 4400 MB.
Yes, the 4-core task also runs with 4400 MB while using 4 real threads. I don't have "top" to check the memory usage and possible swapping inside the VM :-(
I have to wait (slow i7 2600) until the running tasks finish before I can test higher core counts with the same base of 4400 MB.
Maybe we only need 2 applications: single core and multicore.
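
To put numbers on this, here is a rough sketch of the RAM budget using only the two figures reported in this post. The helper names are made up for illustration and are not part of BOINC, and the flat multicore size is only what was tested here (up to 4 cores). Memory needed by the host OS is not included.

```python
# Rough sketch: RAM needed by ATLAS VMs, using the figures from this post.
# vm_memory_mb and total_memory_mb are hypothetical helper names.

SINGLE_CORE_MB = 2600  # reported for single-core tasks
MULTI_CORE_MB = 4400   # reported for multicore tasks (tested up to 4 cores)


def vm_memory_mb(ncores):
    """Return the RAM one VM needs, assuming a flat multicore size."""
    if ncores < 1:
        raise ValueError("ncores must be at least 1")
    return SINGLE_CORE_MB if ncores == 1 else MULTI_CORE_MB


def total_memory_mb(ncores, concurrent_tasks):
    """Return the RAM needed to run several identical VMs at once."""
    return vm_memory_mb(ncores) * concurrent_tasks


if __name__ == "__main__":
    print(total_memory_mb(1, 2))  # two single-core VMs
    print(total_memory_mb(4, 1))  # one 4-core VM
```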
ID: 29289
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29305 - Posted: 15 Mar 2017, 16:41:57 UTC

Yes, the 4-core task also runs with 4400 MB while using 4 real threads.

I have had eight 4-core tasks with 4400 MB running smoothly today, and I am currently testing an 8-core task with 4400 MB. So far so good.

My standard suggestion in this case:
Think about using a proxy, e.g. Squid.

Thanks for the suggestion. I'll give it a try.
We are the product of random evolution.
ID: 29305
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29308 - Posted: 15 Mar 2017, 18:10:34 UTC
Last modified: 15 Mar 2017, 18:11:01 UTC

The 8-core task with 4400 MB completed successfully: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126137372
We are the product of random evolution.
ID: 29308
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29309 - Posted: 15 Mar 2017, 18:39:55 UTC - in response to Message 29308.  

The 8-core task with 4400 MB completed successfully: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126137372

Did you take a look at the efficiency?


Supporting BOINC, a great concept !
ID: 29309
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 29310 - Posted: 15 Mar 2017, 18:41:40 UTC - in response to Message 29309.  
Last modified: 15 Mar 2017, 18:42:00 UTC

The 8-core task with 4400 MB completed successfully: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126137372

Did you take a look at the efficiency?

The efficiency of the eight 4-core tasks was 70.4%.

The single 8-core task had an efficiency of 59.7%.
ID: 29310
rbpeake
Joined: 17 Sep 04
Posts: 99
Credit: 30,618,118
RAC: 3,938
Message 29311 - Posted: 15 Mar 2017, 18:52:09 UTC - in response to Message 29310.  

Is the 4-core the most efficient of the various configurations? I have been assuming so for all of my work.

Thanks!
Regards,
Bob P.
ID: 29311
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29316 - Posted: 16 Mar 2017, 3:23:35 UTC

Did you take a look at the efficiency?

How do you measure the efficiency?
We are the product of random evolution.
ID: 29316
HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29317 - Posted: 16 Mar 2017, 3:33:30 UTC
Last modified: 16 Mar 2017, 3:37:32 UTC

How do you measure the efficiency?

I think I understood how it is calculated.
As I mentioned earlier in this thread, the most efficient tasks on my computers are 1-core tasks. This is because of the very long idle time of the task (due to network bandwidth / latency issues from my home in Dubai).
The best I can get is 85% to 90% on long 1-core tasks.

So in my particular case, I don't think a low efficiency with 4400 MB on 4-core or 8-core would be conclusive. I would suggest that someone with short idle time tests the efficiency with 4400 MB.
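
The formula is not spelled out in this thread. A common definition, and the one assumed in this sketch, is the CPU time divided by the elapsed run time multiplied by the number of cores. The function name and the sample numbers are made up for illustration; the result pages list both run time and CPU time, which is where real inputs would come from.

```python
def cpu_efficiency(cpu_time_s, run_time_s, ncores):
    """CPU time used as a fraction of the core time available.

    Assumed definition: cpu_time / (run_time * ncores).
    """
    if run_time_s <= 0 or ncores < 1:
        raise ValueError("run_time_s must be positive and ncores at least 1")
    return cpu_time_s / (run_time_s * ncores)


if __name__ == "__main__":
    # Hypothetical 4-core task: 7200 s elapsed, 20160 s CPU time over all cores.
    print(f"{cpu_efficiency(20160, 7200, 4):.1%}")
    # Hypothetical 1-core task that sat idle for part of its run.
    print(f"{cpu_efficiency(3060, 3600, 1):.1%}")
```

With this definition, any time the VM spends idle (for example waiting on the network) lowers the figure, and the loss is multiplied by the number of cores sitting idle.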
We are the product of random evolution.
ID: 29317
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 29523 - Posted: 22 Mar 2017, 11:16:57 UTC
Last modified: 22 Mar 2017, 11:17:36 UTC

I returned this task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=127457570

At the end of the task, it uploaded 819K from BOINC's project directory, but I don't see a 60 MB HITS*.root file in the result, so I think it was not useful for ATLAS.
Towards the end, the console of this 4-core VM showed 23, 25, 27 and 25 events ready, which makes 100 events in total.
ID: 29523
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 29526 - Posted: 22 Mar 2017, 13:01:42 UTC - in response to Message 29523.  

The log shows this:

2017-03-22 12:00:11 (10356): Guest Log: PyJobTransforms.transform.execute 2017-03-22 11:51:20,813 CRITICAL Transform executor raised TransformValidationException: HITSMergeAthenaMP0 got a SIGSEGV signal (exit code 139)
2017-03-22 12:00:11 (10356): Guest Log: PyJobTransforms.transform.execute 2017-03-22 11:51:24,222 WARNING Transform now exiting early with exit code 65 (HITSMergeAthenaMP0 got a SIGSEGV signal (exit code 139))

This means there was a segmentation fault in the ATLAS simulation code. It is a bug in the software and has nothing to do with your PC, and it will eventually be reported back to the developers. This is why we still give credit for these WUs even if the result is not useful for us.
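
For reference, exit codes above 128 in logs like this usually follow the shell convention of 128 plus the signal number, so 139 is 128 + 11 (SIGSEGV). A minimal sketch of that decoding follows; the helper name is made up, and the small signal table uses Linux numbering and lists only a few common signals.

```python
# Signal numbers as used on Linux; only a few common ones are listed.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code):
    """Explain an exit code using the shell's 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        return f"killed by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error code {code}"


if __name__ == "__main__":
    print(describe_exit_code(139))
    print(describe_exit_code(65))
```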
ID: 29526
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,389,652
RAC: 102,176
Message 29623 - Posted: 25 Mar 2017, 14:39:31 UTC - in response to Message 29257.  

HerveUAE wrote:
I wonder if the memory size is too small for one and two cores so that's why they don't succeed.

I am currently running 2 2-core ATLAS tasks, one with default RAM assigned by server (3400 MB) and one manually forced through app_config at 5000 MB.


On my PC with 8 GB RAM and a quad-core CPU (4 real cores, no HT), I have so far been running 1 ATLAS, 1 Rosetta, 1 WCG and 1 GPUGRID task.
Now I decided to try a 2-core ATLAS task and stopped one of the other projects.

Since I had learned that 2-core ATLAS tasks need the RAM allocation increased to 4400 MB via app_config.xml, I did this. In fact, I allocated 4500 MB to begin with.
However, this did not work; the console showed something like "Memory error".
So I increased the value in the app_config.xml to 5000MB, and now it works.

Hence, even the 4400 MB mentioned several times here in the forum may not be enough (and in my case, 4500 MB was not enough either).
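
The exact file is not shown in this thread. As a rough sketch, assuming the usual BOINC app_config.xml layout with an `<app_version>` block and the vboxwrapper `--memory_size_mb` option passed via `<cmdline>`, such a file could be generated like this. The function name is made up, and the app name and plan class in the example call are placeholders that would have to be checked against the entries in client_state.xml.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, ncpus, memory_mb):
    """Render an app_config.xml string with one <app_version> block."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # Option handed to vboxwrapper to set the VM's RAM size.
    ET.SubElement(version, "cmdline").text = f"--memory_size_mb {memory_mb}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "ATLAS" and "example_plan_class" are placeholders, not verified names.
    print(build_app_config("ATLAS", "example_plan_class", ncpus=2, memory_mb=5000))
```

The BOINC client only picks up a changed app_config.xml after the config files are re-read or the client is restarted.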
ID: 29623
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,389,652
RAC: 102,176
Message 29635 - Posted: 25 Mar 2017, 17:31:13 UTC - in response to Message 29623.  

So I increased the value in the app_config.xml to 5000MB, and now it works.

Unfortunately, my statement above was premature.

After some 13 minutes (during which the console was looking good) the task stopped - see here:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=128566639

So I tried again, and that didn't work either. The console says, for example:
kernel Panic - not syncing: attempted to kill init. Exit code0x00000007.
I aborted the task after 1 hour, since no RAM was used the whole time and the CPU was working on only 1 core instead of 2.
See here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=128570490

What's going wrong?

BTW, before, 1-core tasks were processed without any problem.
ID: 29635
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29642 - Posted: 26 Mar 2017, 9:49:32 UTC - in response to Message 29623.  

I have completed a 2-core native ATLAS task on one of my Linux boxes, and it completed with 4100 MB of RAM. It has produced a HITS file. Another is running.
Tullio
ID: 29642
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,389,652
RAC: 102,176
Message 29644 - Posted: 26 Mar 2017, 11:55:33 UTC - in response to Message 29635.  

After some 13 minutes (during which the console was looking good) the task stopped - see here:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=128566639

Do any of the experts here have an idea why this 2-core task failed?
What catches my eye in stderr.txt is the entry at log time 17:15:25. The problems seem to begin at this point.

FYI, the PC has 8GB RAM, and in the app_config.xml I assigned 5000MB.
ID: 29644
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 29646 - Posted: 26 Mar 2017, 14:12:22 UTC - in response to Message 29644.  

Do any of the experts here have an idea why this 2-core task failed?
What catches my eye in stderr.txt is the entry at log time 17:15:25. The problems seem to begin at this point.

In fact the problem occurred during the initialization phase. So it's an ATLAS problem, not a problem with your machine.

Guest Log: PyJobTransforms.trfExe.execute 2017-03-25 17:06:39,415 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
Guest Log: PyJobTransforms.trfExe.execute 2017-03-25 17:12:29,838 INFO EVNTtoHITS executor returns 137
Guest Log: PyJobTransforms.trfExe.validate 2017-03-25 17:12:30,758 ERROR Validation of return code failed: EVNTtoHITS got a SIGKILL signal (exit code 137) (Error code 65)
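
To spot lines like these in a long stderr.txt, a small filter can help. This is only a rough sketch: the regular expression is based solely on the wording of the log lines quoted in this thread, so other log formats may not match, and the function name is made up.

```python
import re

# Matches e.g. "EVNTtoHITS got a SIGKILL signal (exit code 137)".
SIGNAL_RE = re.compile(r"(\w+) got a (SIG[A-Z]+) signal \(exit code (\d+)\)")


def find_signal_failures(log_text):
    """Return unique (step, signal, exit_code) tuples found in the log text."""
    found = []
    for step, sig, code in SIGNAL_RE.findall(log_text):
        entry = (step, sig, int(code))
        if entry not in found:  # the same failure is often logged twice
            found.append(entry)
    return found


if __name__ == "__main__":
    sample = (
        "Guest Log: PyJobTransforms.trfExe.validate 2017-03-25 17:12:30,758 "
        "ERROR Validation of return code failed: EVNTtoHITS got a SIGKILL "
        "signal (exit code 137) (Error code 65)"
    )
    print(find_signal_failures(sample))
```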
ID: 29646
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,389,652
RAC: 102,176
Message 29648 - Posted: 26 Mar 2017, 14:39:37 UTC - in response to Message 29646.  

In fact the problem occurred during the initialization phase. So it's an ATLAS problem, not a problem with your machine.

Thanks for your answer.
However, the next task I tried also ran into a problem, as described above:

So I tried again, and that didn't work either. The console says, for example:
kernel Panic - not syncing: attempted to kill init. Exit code0x00000007.
I aborted the task after 1 hour, since no RAM was used the whole time and the CPU was working on only 1 core instead of 2.
See here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=128570490

Any hint as to what this could have been?
ID: 29648
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 29649 - Posted: 26 Mar 2017, 15:00:45 UTC

At least that machine can produce valid work: https://lhcathome.cern.ch/lhcathome/result.php?resultid=128669972

The CPU was introduced 9 years ago and now runs Win10 as its OS. Maybe the processor is overloaded?
ID: 29649
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,389,652
RAC: 102,176
Message 29650 - Posted: 26 Mar 2017, 15:17:12 UTC - in response to Message 29649.  

Even before, with the "old" ATLAS jobs, this machine had problems crunching 2-core tasks. It simply did not work.
Anyway, I was hoping that it might work with the new Simulation version 1.01, but obviously it does NOT :-(

I then tried to run two 1-core jobs, but, as expected, I ran out of RAM.
So I must accept that this machine, despite its 8 GB of RAM, can only run 1 ATLAS job at a time.

I then tried to run 1 ATLAS and 1 CMS job simultaneously, which used up 93% of RAM and hence is not the best solution either. Besides that, I noticed that downloading ATLAS plus CMS tasks did not work the way it was supposed to (at one point I had 2 ATLAS jobs downloaded, then 2 CMS jobs ...).
ID: 29650
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,389,652
RAC: 102,176
Message 29658 - Posted: 26 Mar 2017, 19:44:04 UTC - in response to Message 29649.  

The CPU was introduced 9 years ago and now runs Win10 as its OS. Maybe the processor is overloaded?

Maybe this CPU (Intel Core2 Quad Q9550), given its technical specs, is not able to run such a task on more than 1 core.
I am not a CPU expert; maybe someone else can tell.
ID: 29658
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,130,430
RAC: 104,897
Message 29667 - Posted: 27 Mar 2017, 8:40:41 UTC

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=62016639

The CPU time is more than 16 hours at the moment. The checkpoint interval is OK.

The task uses 2 CPUs; normally on this computer these tasks end after 5 or 6 hours.

Thanks for any help.
ID: 29667