MadGraph5
Author | Message |
---|---|
Joined: 2 May 07 Posts: 2090 Credit: 158,856,517 RAC: 126,388 |
I have a MadGraph5 task that has been running for more than 40 hours so far: madgraph5amc 2.6.5.atlas nlo2jet - zinclusive 7000 -,-,50,130, MC Production matrix: 0+4/18. Is there a chance it will finish? runRivet.log is still growing; the last line so far is: INFO: Idle:185, Running: 2, Completed: 413 [ 35h 51 min] https://launchpad.net/mg5amcnlo |
Joined: 13 Jul 05 Posts: 167 Credit: 14,938,551 RAC: 191 |
"Is there a chance of finishing, runRivet.log is still growing, last line so long:" My guess is that the "Idle" number will slowly fall until "Completed" reaches 600, at which point the task either completes or starts a whole new phase... Can you leave it for ~20 hrs and see what happens? As long as the log file is growing, there are some grounds for optimism that it'll finish OK. Stepping back through the log file to the start of the current phase should tell you what it's trying to do in this phase. My experience with madgraph hasn't been good: run natively, it will use 2 cores, forcing other tasks off the machine. It also has significant stretches of not actually using CPU at all. We did have a thread about it some months back. |
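The "leave it for ~20 hrs" guess can be sanity-checked with a rough extrapolation from the status line quoted above. This is only a sketch: it assumes every remaining job takes about as long as the completed ones did on average, which the thread does not establish, and the function name `estimate_remaining_hours` is made up for illustration.

```python
import re

# Status line format as quoted in the thread, e.g.
# "INFO: Idle:185, Running: 2, Completed: 413 [ 35h 51 min]"
STATUS = re.compile(
    r"Idle:\s*(\d+),\s*Running:?\s*(\d+),\s*Completed:\s*(\d+)"
    r"\s*\[\s*(\d+)h\s*(\d+)\s*min\s*\]"
)

def estimate_remaining_hours(line):
    """Linear extrapolation (hypothetical helper): hours left if the
    remaining jobs progress at the average rate seen so far."""
    match = STATUS.search(line)
    if match is None:
        return None
    idle, running, completed, hours, minutes = map(int, match.groups())
    if completed == 0:
        return None  # no rate to extrapolate from yet
    elapsed_hours = hours + minutes / 60
    hours_per_job = elapsed_hours / completed
    return hours_per_job * (idle + running)

line = "INFO: Idle:185, Running: 2, Completed: 413 [ 35h 51 min]"
remaining = estimate_remaining_hours(line)
print(f"roughly {remaining:.0f} more hours for this phase")
```

For the quoted line this comes out at about 16 hours for the current phase, which is in the same range as the ~20 hrs suggested above. It says nothing about any later phase.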
Joined: 2 May 07 Posts: 2090 Credit: 158,856,517 RAC: 126,388 |
I have found the thread you wrote, Extreme overload: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5323#41736 I have native Linux with ONE CPU, but the log contains an entry to use two CPUs (set nb_core 2). How can this second CPU be used? The "Running:" count is 2. Now 548 are Completed and Idle is 50 (it seems 600 is the max). If the Theory task does reach the 10-day limit, it would be nice to get some points for this pain ;-). Edit: There are quite a lot of messages like: Fontconfig error: Cannot load default config file |
Joined: 13 Jul 05 Posts: 167 Credit: 14,938,551 RAC: 191 |
"Have found this thread you wrote - Extreme overload:" The same way it used all 232 cores on computezrmle's machine! :( It'll just chuck processes at the OS and see what happens. Isn't there a rivetvm.exe as well, or is that idle while madgraph does its multiprocessing thing? "The running: is 2. Now 548 Completed and Idle: 50 (seem 600 is the max.)" Looking back at that thread, you might want to run grep subprocess /var/lib/boinc/slots/?/cernvm/shared/runRivet.log to check that 600 is indeed the correct number (edit: just in case "idle" doesn't mean what I think it does). |
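Another way to cross-check the 600 figure, alongside the grep, is to add up Idle + Running + Completed on each status line and see whether the total stays constant. The sketch below assumes the status lines look like the ones quoted in this thread (the colon after "Running" is treated as optional because one quoted line lacks it). `job_totals` is a made-up helper name, and the sample lines are copied from posts in this thread rather than from a real log file.

```python
import re

# Matches the counters only; any trailing "[ ..h .. min]" part is ignored.
STATUS = re.compile(r"Idle:\s*(\d+),\s*Running:?\s*(\d+),\s*Completed:\s*(\d+)")

def job_totals(log_lines):
    """Return the distinct Idle+Running+Completed totals found in the lines."""
    totals = set()
    for line in log_lines:
        match = STATUS.search(line)
        if match:
            totals.add(sum(int(group) for group in match.groups()))
    return totals

# Status lines as quoted in this thread (assumed representative of the log).
sample = [
    "INFO: Idle:185, Running: 2, Completed: 413 [ 35h 51 min]",
    "INFO: Idle:598, Running 2, Completed: 0",
]
print(job_totals(sample))
```

To run it against a real log, pass an open file object instead of `sample`, using the runRivet.log path from the grep command above. A single total in the result would support the idea that "Idle" counts jobs still waiting to run; several different totals would suggest it means something else or that a new phase has started.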
Joined: 13 Jul 05 Posts: 167 Credit: 14,938,551 RAC: 191 |
"It also has significant stretches of not actually using CPU at all." E.g. I recently killed task 281349801 precisely because it was holding two cores but sitting idle: it's reported as using just 50 mins of CPU in 20 hours :( "We did have a thread (which maeax has kindly tracked down) about it some months back." This does remind me that I was going to complain there that even hard-wiring the core count to two isn't really good enough: it should either be one, or else the WUs should be submitted to BOINC with a consistent #cores requirement. |
Joined: 24 Oct 04 Posts: 1117 Credit: 49,723,551 RAC: 13,979 |
I have never had any problems with MadGraph5 event generator tasks, but it would take all day to go through all of the Valids to find out how long any of them ran, and most of mine are probably from the other version of Theory, 300.06. But I do watch ALL of mine start running and check the finished tasks, and I have saved examples of all the different event generator versions (I have some epos, herwig7 and herwig++ running, and a few sherpa, but mostly the pythia versions). I will get on one of my desktops and see what I have saved there later today if I get a chance. |
Joined: 13 Jul 05 Posts: 167 Credit: 14,938,551 RAC: 191 |
True - there's a sampling feature in that I only check in rarely and follow up on tasks that look to be misbehaving. I also wonder if madgraph behaves better within a VM where it can't see any other cores. |
Joined: 2 May 07 Posts: 2090 Credit: 158,856,517 RAC: 126,388 |
The first round ended after 600 events (49h 2m). The second is running at the moment. Its first line is: Computing upper envelope INFO: Idle:598, Running 2, Completed: 0 current time 16h17 (I think this is the time at which the second round will finish). I'll look at the next data point tomorrow morning. Good night. |
Joined: 2 May 07 Posts: 2090 Credit: 158,856,517 RAC: 126,388 |
Magic, you are right. After: run time 3 days 5 hours 48 min 40 sec, CPU time 6 days 2 hours 0 min, the task finished successfully. I don't understand why a second CPU was used in a VM with one CPU defined. |
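The two figures reported above give a quick measure of how many cores the task kept busy on average: CPU time divided by run time. The sketch below is plain arithmetic on the posted numbers; `average_cores` is a made-up helper name, and the result says nothing about how the CPU time was split between individual processes.

```python
from datetime import timedelta

def average_cores(cpu_time, wall_time):
    """Average number of busy cores: total CPU time over elapsed run time."""
    return cpu_time / wall_time  # timedelta / timedelta yields a float

# Values as reported in the post above.
wall = timedelta(days=3, hours=5, minutes=48, seconds=40)
cpu = timedelta(days=6, hours=2)

ratio = average_cores(cpu, wall)
print(f"average cores in use: {ratio:.2f}")
```

A ratio close to 2 is what one would expect if the "set nb_core 2" entry from the log took effect and two worker processes were running for most of the task.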
Joined: 24 Oct 04 Posts: 1117 Credit: 49,723,551 RAC: 13,979 |
Thanks maeax |
©2024 CERN