1) Message boards : ATLAS application : What caused this to run 24hrs and not even complete before aborting? (Message 45558)
Posted 26 Oct 2021 by Ardis
Post:
The maximum number of cores to use has been set to one. We'll try that for a while, then may consider going to two.
2) Message boards : ATLAS application : What caused this to run 24hrs and not even complete before aborting? (Message 45529)
Posted 25 Oct 2021 by Ardis
Post:
Thanks, y'all, for taking a look at this and pointing me in some good directions. Changes since my previous post:


    Updated VirtualBox and the Extension Pack to 6.1.28
    Read Yeti's List 18 Sep 2018
    Marked the PC running ATLAS as visible
    LeoMoon CPU-V shows two green check marks
    Verified p_vm_extensions_disabled = 0



Currently one ATLAS 4-CPU WU is running; all other tasks have been suspended. Before it started, the estimated running time was 5:29:00. Elapsed time just went over six hours, with remaining time at 3:00 and a few seconds. The remaining time drops by one second for roughly every 3-4 seconds of elapsed time, or about 19 seconds per minute.
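
For anyone who wants to redo that arithmetic, here is a rough sketch. It assumes the remaining-time counter keeps dropping at a steady 19 seconds per elapsed minute, which may well not hold (the earlier task stalled near the end). The function name and the sample input are made up for illustration, and the remaining time is a parameter rather than a fixed value.

```python
def wall_clock_seconds(remaining_s: float, drop_per_minute_s: float = 19.0) -> float:
    """Estimate real seconds until the 'remaining' counter reaches zero.

    Assumption: the counter keeps dropping at a constant rate of
    drop_per_minute_s seconds per 60 seconds of elapsed time.
    """
    if drop_per_minute_s <= 0:
        raise ValueError("the counter must be decreasing to reach zero")
    return remaining_s * 60.0 / drop_per_minute_s


# Elapsed seconds needed for the counter to drop by one second.
seconds_per_tick = 60.0 / 19.0

if __name__ == "__main__":
    # Hypothetical input: a counter showing 190 seconds remaining.
    print(f"{seconds_per_tick:.2f} elapsed seconds per counter second")
    print(f"{wall_clock_seconds(190):.0f} real seconds to finish")
```

At 19 seconds per minute, each second on the counter costs a bit over three seconds of real time, which matches the 3-4 second range observed.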

A concern seems to be memory. This "mature" box has four cores and 8 GB physical memory + 32 GB "Virtual Boost". Is that going to be enough to float a 4 CPU WU?

Ardis

3) Message boards : ATLAS application : What caused this to run 24hrs and not even complete before aborting? (Message 45524)
Posted 23 Oct 2021 by Ardis
Post:
A similar issue here. The other day a 4-core ATLAS WU had run for 21 hours and had 42 seconds left to crunch, with the deadline four hours away, but it wasn't running. Its status was "Waiting for memory." So I suspended all other projects, raised the memory and CPU usage limits to 100%, ended all non-critical processes, and crossed my fingers. It ran for another 24 hours with CPU usage in the single digits, got to zero seconds left to compute and 99.999% completion, and then ran another three hours with no change in stats. Of course, by that time it was way past the deadline. I killed it and went back to crunching elsewhere. Seems a shame to waste all that computing time with no results for the project.

At this point my concern is how to proceed with the other four-core WUs. There are currently five more of them in the queue, projected at about 5.5 hours each. What's the chance that they will screw everything up just like the last one? Should I abort them and opt out of ATLAS?

Your insight, please,

Ardis


