Message boards : Theory Application : New version 263.90

AuthorMessage
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,446,182
RAC: 103,178
Message 37549 - Posted: 5 Dec 2018, 17:28:59 UTC

again, we have the "no subtasks" problem: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10544654.

Will it ever be solved?
ID: 37549 · Report as offensive     Reply Quote
Profile Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,283
RAC: 3,836
Message 38475 - Posted: 28 Mar 2019, 1:30:57 UTC

Just got 10 (or maybe more by now) tasks ending with "[ERROR] Condor ended after 1099 seconds."

These are the ones that run for 30 minutes and then end as errors (1 (0x00000001) Unknown error code).
ID: 38475 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,181,589
RAC: 104,964
Message 39743 - Posted: 28 Aug 2019, 12:19:38 UTC

ID: 39743 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,637
RAC: 1,939
Message 39761 - Posted: 30 Aug 2019, 8:33:46 UTC

I requested new Theory Vbox tasks this morning.
In my preferences I had set a maximum of 4 tasks and no limit on the number of usable CPUs (threads), so at most 8 are available.
Until yesterday BOINC reserved 630 MB RAM + 8 x 100 MB, so 1430 MB RAM in total.
This morning, however, it suddenly reserves 6750 MB RAM (rsc_memory_bound = 7077888000) for the new tasks.

Did something change on the server side?
ID: 39761 · Report as offensive     Reply Quote
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,180,005
RAC: 0
Message 39768 - Posted: 30 Aug 2019, 20:26:10 UTC - in response to Message 39761.  

Same problem here.

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4669&postid=39767#39767

I have set 1 task at most, and I hate to see idle cores for no apparent reason.
ID: 39768 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,446,182
RAC: 103,178
Message 39770 - Posted: 31 Aug 2019, 5:03:06 UTC

What I noticed this morning: although normally 7 tasks run concurrently (according to the setting in app_config.xml), now only 3 do. Only when one of these 3 finished did the next one in the queue start. So at no time are more than 3 tasks running.

What's happening here, all of a sudden? My CPU has 6+6 (HT) cores, so I hate to leave most of them idle.
ID: 39770 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,446,182
RAC: 103,178
Message 39772 - Posted: 31 Aug 2019, 7:56:13 UTC

In order to keep more than 3 of my CPU cores busy, I managed to download several CMS tasks (it took me more than half an hour of manual attempts, due to the known problem caused by the millions of Sixtrack tasks on the server blocking everything else).

In app_config.xml, the number of concurrently running CMS tasks is set to 3 (which should mean that, besides the 3 Theory tasks, 3 CMS tasks are also running).
However, only 1 CMS task got started, not 3.

I guess all this has to do with the sudden, much too high RAM allocation for the Theory VM tasks that Crystal Pellet posted about yesterday.
This causes problems not only for Theory tasks, but for all other VM tasks as well :-(

Could someone get this problem of the too high RAM allocation solved?
ID: 39772 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,181,589
RAC: 104,964
Message 39773 - Posted: 31 Aug 2019, 8:03:57 UTC

The next Linux running inside VM is CentOS7, up to now it was ScientificLinux.
David wrote about this:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5115&postid=39621#39621
ID: 39773 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,446,182
RAC: 103,178
Message 39774 - Posted: 31 Aug 2019, 8:14:09 UTC - in response to Message 39773.  

maeax wrote:
The next Linux running inside VM is CentOS7, up to now it was ScientificLinux.

How does this relate to the RAM allocation problem described above?
ID: 39774 · Report as offensive     Reply Quote
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,180,005
RAC: 0
Message 39775 - Posted: 31 Aug 2019, 8:31:21 UTC
Last modified: 31 Aug 2019, 8:35:01 UTC

I crunch ATLAS + VLHC on another 4-core host that has 24GB. Is that not enough?
If all the tasks need 6.5GB, this host would use only 3 cores too.

Erich56 wrote:
Could someone get this problem of the too high RAM allocation solved?

It's not an allocation problem for now because VMs do actually use the same amount of RAM as before.
It's about how BOINC calculates how many tasks to run.

7077888000*3 < 24GB
7077888000*4 > 24GB !!!

I don't know if they will use 6.5GB in the future.
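
The arithmetic above can be sketched in Python. This is only an illustration, assuming the client simply divides usable RAM by the per-task memory bound. The function name `max_concurrent_tasks` is made up for this example, and the real BOINC client also applies memory-usage preferences that are not modelled here.

```python
# Sketch only: assumes the number of runnable tasks is usable RAM divided
# by the per-task memory bound. This is NOT BOINC's actual scheduling code.

RSC_MEMORY_BOUND = 7_077_888_000  # bytes reserved per task (value quoted in this thread)


def max_concurrent_tasks(usable_ram_bytes: int, per_task_bytes: int) -> int:
    """Largest whole number of tasks whose reservations fit in usable RAM."""
    return usable_ram_bytes // per_task_bytes


ram_24_gib = 24 * 1024**3  # "24GB" read as binary gigabytes
ram_24_gb = 24 * 1000**3   # "24GB" read as decimal gigabytes

print(max_concurrent_tasks(ram_24_gib, RSC_MEMORY_BOUND))
print(max_concurrent_tasks(ram_24_gb, RSC_MEMORY_BOUND))
```

Either reading of "24GB" gives 3 tasks, since a fourth reservation would exceed the host's RAM.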
ID: 39775 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,637
RAC: 1,939
Message 39776 - Posted: 31 Aug 2019, 8:48:18 UTC - in response to Message 39775.  

To me it looks like a typo by a Theory administrator.

It looks like someone wanted to decrease the rsc_disk_bound from 8000000000 bytes (7629.39453125 MB) to a lower value rounded in MB, 7077888000 bytes (6750 MB),
but instead of changing the rsc_disk_bound he/she changed the rsc_memory_bound.
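
The conversion between the two values can be checked with a short Python sketch. It assumes 1 MB = 1024 x 1024 bytes, which is what the figures above imply; the helper names are hypothetical.

```python
# Sketch: byte <-> MB conversion, assuming 1 MB = 1024 * 1024 bytes
# (the convention implied by the numbers quoted in this post).

BYTES_PER_MB = 1024 * 1024


def bytes_to_mb(n_bytes: int) -> float:
    """Convert a byte count to (binary) megabytes."""
    return n_bytes / BYTES_PER_MB


def mb_to_bytes(n_mb: int) -> int:
    """Convert whole (binary) megabytes to bytes."""
    return n_mb * BYTES_PER_MB


print(bytes_to_mb(8_000_000_000))  # 7629.39453125
print(mb_to_bytes(6750))           # 7077888000
```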
ID: 39776 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,446,182
RAC: 103,178
Message 39777 - Posted: 31 Aug 2019, 12:17:52 UTC - in response to Message 39775.  

Luigi R. wrote:
It's not an allocation problem for now because VMs do actually use the same amount of RAM as before.

They use the same amount of RAM as before, but due to this (probably unintended) change in the parameters, BOINC thinks the VM needs much more RAM and therefore does not allow more than a few VMs to start at a time.

The bottom line is that this is very bad for the project, because far fewer tasks can be crunched now than before.
ID: 39777 · Report as offensive     Reply Quote
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,162,254
RAC: 15,940
Message 39778 - Posted: 31 Aug 2019, 12:24:11 UTC

Just reduce the number of CPUs on your web preferences and the rsc_memory_bound is reduced accordingly. I have set 4 CPUs on web site and Boinc reserves 3750 MB memory for each Theory task. I have set 730 MB and 1 CPU core with app_config and that is what the app is actually using.
ID: 39778 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,637
RAC: 1,939
Message 39779 - Posted: 31 Aug 2019, 13:04:56 UTC - in response to Message 39778.  
Last modified: 1 Sep 2019, 7:32:27 UTC

Harri Liljeroos wrote:
Just reduce the number of CPUs on your web preferences and the rsc_memory_bound is reduced accordingly. I have set 4 CPUs on web site and Boinc reserves 3750 MB memory for each Theory task. I have set 730 MB and 1 CPU core with app_config and that is what the app is actually using.

Thanks, Harri, for pointing this out. So my theory is false, because a new formula for memory use seems to have been implemented rather than a fixed value.

750MB + 750MB for each thread in Max # CPUs in your preferences.
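
As a rough check of this formula, a minimal Python sketch follows. The function names are hypothetical, the old rule (630 MB + 100 MB per thread) is taken from the earlier post in this thread, and both rules are inferred from observed values, not from the server code.

```python
# Sketch: memory BOINC appears to reserve per Theory task, as inferred in
# this thread from observed values. Not taken from the server configuration.


def reserved_mb_old(max_cpus: int) -> int:
    """Rule observed until 29 Aug 2019: 630 MB + 100 MB per thread."""
    return 630 + 100 * max_cpus


def reserved_mb_new(max_cpus: int) -> int:
    """Rule observed since 30 Aug 2019: 750 MB + 750 MB per thread."""
    return 750 + 750 * max_cpus


for cpus in (1, 4, 8):
    print(cpus, reserved_mb_old(cpus), reserved_mb_new(cpus))
```

With 8 threads this gives 1430 MB under the old rule and 6750 MB under the new one; with 4 threads the new rule gives 3750 MB, matching the values reported above.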
ID: 39779 · Report as offensive     Reply Quote
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,180,005
RAC: 0
Message 39781 - Posted: 31 Aug 2019, 16:42:32 UTC - in response to Message 39777.  

Erich56 wrote:
They use the same amount of RAM as before, but due to this (probably unintended) change in the parameters, BOINC thinks the VM needs much more RAM and therefore does not allow more than a few VMs to start at a time.

Of course, this is what I meant.


Anyway I have set 8-thread VMs and it's working fine compared to 1 year ago (post 36607).

From BOINC Manager it results:
CPU time = 51.1h
elapsed time = 7.5h
working set size = 6.59GB

Average CPU used: 51.1 / 7.5 = 6.8
I hope BOINC credits would not drop when I run 1 task.
ID: 39781 · Report as offensive     Reply Quote
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 39782 - Posted: 31 Aug 2019, 20:13:58 UTC

It would help the project if the admins limited the core count to what the applications can actually scale up to. That would give a better balance for MT tasks, both for the user experience and for the project. I'm happy to see that Theory is moving away from single-core tasks and also making a native application possible, but when I last tested, each task still ran only a few job processes (2-4). I did a test last year and got a 40-core task; the host ended up at 6% CPU usage. That count is far beyond what the task could actually use. This may have changed since then, but based on what runs on my host I would not use more than 4 cores today. Currently I use 3 cores per task, but there will probably be times when only 1 core is needed, once a task hits a long-running job.

Looking at ATLAS, it scales up to as many as 12 cores, really uses that many, and ends sooner once the events are done.

Theory could be limited to a maximum of 4 cores by default, and once the jobs sent to the VM are sorted out, the limit could be raised to the core count it can actually scale up to. This would do away with the high credit for users who use app_config, and the project would get more tasks running and work more effectively.

Replace "unlimited" in the users' LHC@home preferences with the maximum number of cores the application can really use.
ID: 39782 · Report as offensive     Reply Quote
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 39783 - Posted: 1 Sep 2019, 1:48:46 UTC - in response to Message 39782.  
Last modified: 1 Sep 2019, 2:36:14 UTC

Did a test: the load was 2.7 for a Theory task with "Unlimited" set (a 32-core task), and around 3.6 cores in total for the system.
VirtualBox apparently supports up to 16 cores, so I do not know how it would handle a 32-core Theory task.

Other projects such as Cosmology are affected by this limit: they hand out 32-core MT tasks in a Docker container, but these fail at start if the user does not set a limit.
ID: 39783 · Report as offensive     Reply Quote
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,180,005
RAC: 0
Message 39784 - Posted: 1 Sep 2019, 9:14:39 UTC - in response to Message 39778.  

Harri Liljeroos wrote:
Just reduce the number of CPUs on your web preferences and the rsc_memory_bound is reduced accordingly. I have set 4 CPUs on web site and Boinc reserves 3750 MB memory for each Theory task. I have set 730 MB and 1 CPU core with app_config and that is what the app is actually using.

It worked. I have set 1 CPU and got tasks with 1500MB reserved.

Luigi R. wrote:
Anyway I have set 8-thread VMs and it's working fine compared to 1 year ago (post 36607).

From BOINC Manager it results:
CPU time = 51.1h
elapsed time = 7.5h
working set size = 6.59GB

Average CPU used: 51.1 / 7.5 = 6.8
I hope BOINC credits would not drop when I run 1 task.

The ratio got worse.

Now avg = 122.1 / 24.0 = 5.1 threads.

Only 1 thread is running right now, so multithreading is still an unreliable option for me.
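
The ratio used above (CPU time divided by elapsed time) can be sketched as follows. The function names are hypothetical, and dividing by the 8 configured threads to get a utilisation figure is an extra assumption added for illustration.

```python
# Sketch: average number of busy threads for a multithreaded task,
# computed from the CPU time and elapsed time shown in BOINC Manager.


def average_busy_threads(cpu_hours: float, elapsed_hours: float) -> float:
    """CPU time divided by wall-clock time."""
    return cpu_hours / elapsed_hours


def utilisation(cpu_hours: float, elapsed_hours: float, threads: int) -> float:
    """Fraction of the configured threads kept busy on average (assumed metric)."""
    return average_busy_threads(cpu_hours, elapsed_hours) / threads


# The two measurements reported in this thread for an 8-thread VM.
for cpu_h, elapsed_h in ((51.1, 7.5), (122.1, 24.0)):
    avg = average_busy_threads(cpu_h, elapsed_h)
    share = utilisation(cpu_h, elapsed_h, threads=8)
    print(f"{avg:.1f} threads busy on average, {share:.0%} of 8 threads")
```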
ID: 39784 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,446,182
RAC: 103,178
Message 39785 - Posted: 1 Sep 2019, 13:00:43 UTC - in response to Message 39779.  

Crystal Pellet wrote:
... So my theory is false, cause a new formula for memory use seems to be implemented and not a fixed value.

It would be interesting to know why this was done ...
ID: 39785 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,181,589
RAC: 104,964
Message 39790 - Posted: 2 Sep 2019, 0:09:30 UTC - in response to Message 39743.  
Last modified: 2 Sep 2019, 0:16:18 UTC

Some computers have tasks with more than 5k points:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10555784&offset=0&show_names=0&state=4&appid=13
I have also had Theory running for two days and get the normal 500 points:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10548292&offset=0&show_names=0&state=4&appid=13

No comments?
ID: 39790 · Report as offensive     Reply Quote


©2024 CERN