Message boards : Theory Application : Extreme Overload caused by a Theory Task

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2463
Credit: 237,826,934
RAC: 121,980
Message 47487 - Posted: 4 Nov 2022, 12:36:44 UTC - in response to Message 47486.  

As far as I understood, the complaint was that madgraph no longer respects the 2-core limit.
The output shows that it respects the limit.

You may compare your BOINC RAM limits with the RAM usage BOINC reports for the madgraph task.
Those tasks are known to require lots of RAM.
They may also run for a couple of days.
ID: 47487
AndreyOR

Joined: 8 Dec 19
Posts: 37
Credit: 7,587,303
RAC: 8
Message 47497 - Posted: 5 Nov 2022, 6:06:00 UTC - in response to Message 47487.  
Last modified: 5 Nov 2022, 6:09:40 UTC

It's annoying that the task finished before I had a chance to figure out how to make other tasks run alongside it. I was going to try an app_config modification. I had already found that you can manually make other tasks run by suspending and then resuming the MadGraph task, but once those tasks finished, new ones wouldn't start on their own.

My complaint was that other tasks wouldn't run alongside MadGraph. The 2-core limit not being respected was one possible reason I considered before I knew how and what to look for. I have no resource restrictions in my BOINC settings. I did notice that MadGraph takes a lot of RAM, ~7.3 GB for that one; I believe it's the most I've ever seen for a BOINC task in any project I've run. I have 12 GB allocated to WSL, so memory shouldn't have prevented other tasks from starting. I believe BOINC will suspend tasks with a "waiting for memory" status if it detects that there isn't enough.
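
The memory reasoning above can be checked with a quick calculation. This is only a minimal sketch and not BOINC's actual scheduling logic: the function names, the usable-memory fraction, and the 2.0 GB footprint of a second task are assumptions chosen for illustration. Only the 12 GB WSL allocation and the ~7.3 GB MadGraph usage come from the post.

```python
def free_memory_gb(total_gb, usable_fraction, running_gb):
    """Memory left under the allowed limit after subtracting running tasks."""
    return total_gb * usable_fraction - sum(running_gb)


def fits_in_memory(total_gb, usable_fraction, running_gb, candidate_gb):
    """True if a further task of candidate_gb would still fit under the limit."""
    return candidate_gb <= free_memory_gb(total_gb, usable_fraction, running_gb)


WSL_TOTAL_GB = 12.0   # from the post
MADGRAPH_GB = 7.3     # from the post (approximate)
OTHER_TASK_GB = 2.0   # assumption: footprint of one additional task

# Assumption: no memory restriction at all (fraction 1.0).
print(fits_in_memory(WSL_TOTAL_GB, 1.0, [MADGRAPH_GB], OTHER_TASK_GB))

# Assumption: a 50 % memory cap, for comparison.
print(fits_in_memory(WSL_TOTAL_GB, 0.5, [MADGRAPH_GB], OTHER_TASK_GB))
```

With no cap there is headroom left for a task of that size; with a 50 % cap the MadGraph task alone would already exceed the limit. Under these assumptions, memory by itself would only explain the behaviour if some percentage limit were in effect.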

Is it more likely a BOINC issue or a MadGraph issue? Does a 2-process limit necessarily mean a 2-core limit, or could those 2 processes be using all available cores? I think I may have seen evidence for the latter.

I wish this task had run for a couple of days so I'd have had more time to figure things out. I didn't expect it to finish this soon; otherwise I could've restarted it. I wonder if there's a way to get another one?
ID: 47497
AndreyOR

Joined: 8 Dec 19
Posts: 37
Credit: 7,587,303
RAC: 8
Message 47516 - Posted: 10 Nov 2022, 21:35:36 UTC

It turns out MadGraph tasks are more common than I realized. I found a few more in the Valid list. That last one was an odd one in that it prevented anything else from running. Others haven't done that. They do use more than one CPU thread as their CPU time is greater than run time and they're long-runners, taking over 24 hrs. to complete.
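
The inference above (CPU time greater than run time implies more than one thread) can be expressed as a simple ratio. A minimal sketch follows; the function name and the sample timings are hypothetical and are not taken from any actual task report.

```python
def average_busy_cores(cpu_time_s, run_time_s):
    """Average number of cores kept busy: accumulated CPU time / wall-clock time."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


# Hypothetical task: 25 h of run time and 45 h of accumulated CPU time.
ratio = average_busy_cores(45 * 3600, 25 * 3600)
print(f"average busy cores: {ratio:.2f}")
```

A ratio above 1 shows that at least two threads were running concurrently at some point. A ratio at or below 2 is consistent with a 2-core limit but does not prove it, since short bursts on more cores would be averaged out.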
ID: 47516
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2463
Credit: 237,826,934
RAC: 121,980
Message 47517 - Posted: 11 Nov 2022, 6:30:24 UTC - in response to Message 47516.  

> It turns out MadGraph tasks are more common than I realized.

Theory's backend is mcplots.
We are currently processing "runs" from mcplots revision 2390 which has a total of 70981 rows.
192 of them are "madgraph" runs, i.e. 0.27 %.

Here's a snippet from the complete list:

run                                                              events   attempts  success  failure  unknown
pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.4.3.atlas lo      7500000  83        75       3        5
pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.4.3.atlas lo1jet  7000000  84        70       6        8
pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.4.3.atlas lo2jet  6932000  83        70       3        10
pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.5.5.atlas lo      7600000  83        76       3        4
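
As a quick cross-check of the figures above, the madgraph share and the per-run success rates can be recomputed from the posted numbers. This is a minimal sketch: the data are copied from this post, while the shortened row labels and the helper names are my own.

```python
TOTAL_RUNS = 70981     # rows in mcplots revision 2390 (from the post)
MADGRAPH_RUNS = 192    # madgraph rows (from the post)

# (short label, attempts, success, failure, unknown) copied from the snippet
ROWS = [
    ("2.4.3.atlas lo", 83, 75, 3, 5),
    ("2.4.3.atlas lo1jet", 84, 70, 6, 8),
    ("2.4.3.atlas lo2jet", 83, 70, 3, 10),
    ("2.5.5.atlas lo", 83, 76, 3, 4),
]


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


def success_rates(rows):
    """Map each row label to its success rate in percent of attempts."""
    return {label: percent(success, attempts)
            for label, attempts, success, _failure, _unknown in rows}


print(f"madgraph share: {percent(MADGRAPH_RUNS, TOTAL_RUNS):.2f} %")
for label, rate in success_rates(ROWS).items():
    print(f"{label}: {rate:.1f} % success")
```

In each row, success + failure + unknown adds up to the number of attempts, so the three outcome columns account for every attempt.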



> That last one was an odd one in that it prevented anything else from running. Others haven't done that.

No idea why.


> They do use more than one CPU thread...

Yes.
And because of this...

> ... as their CPU time is greater than run time


> and they're long-runners, taking over 24 hrs. to complete.

Not really long.
There are Theory tasks (very few) that run for a couple of days, sometimes a week.
The BOINC server has no influence on the mcplots backend queue.
It has to send out whatever it gets from there: short tasks, long tasks, sherpas, madgraphs ...
ID: 47517

©2024 CERN