ATLAS task at 100% for 10h
Author | Message |
---|---|
Send message Joined: 21 Dec 07 Posts: 3 Credit: 1,092,719 RAC: 0 |
Dear all, my PC is running an ATLAS task on 8 cores. At the beginning it said it would need about 1.5 h, but it has now been running for 34 h. The completion percentage crept closer and closer to 100% and has been sitting at 100% for about 10 h. Will it finish by itself, or is something wrong? In the meantime I also updated BOINC to 7.16.11 (with the corresponding VirtualBox). I am running an Intel i9 and Win10. Somehow my CPU load is currently at nearly 0% even though some tasks (Theory simulation) are running. Does anybody have hints? Der Martinator |
Send message Joined: 15 Jun 08 Posts: 2401 Credit: 225,517,991 RAC: 124,188 |
"... at the beginning it said it needs about 1.5 h and now it is already running 34 h, and the completion percentage crept closer and closer to 100% and has now been at 100% for about 10 h ..." These are fake numbers - as mentioned many times in this forum. Guess it's this computer: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10659409 Since an 8-core setup is very inefficient (also explained many times), the suggestion would be to run a 4-core setup and limit it to 3 concurrently running tasks. ToDos: (1) Set "max #CPUs" to 4 on your preferences page https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project (2) Limit concurrency with an app_config.xml like this: <app_config> <app> <name>ATLAS</name> <max_concurrent>3</max_concurrent> </app> </app_config> You may also consider installing the VirtualBox Extension Pack, as it would allow you to access the ATLAS progress monitoring at ALT-F2. |
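For anyone who would rather generate the file than type it, here is a minimal Python sketch that builds the same app_config.xml shown in the post above. The function name `build_app_config` is made up for illustration; only the element names and values come from the post. `ET.indent` needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str = "ATLAS", max_concurrent: int = 3) -> str:
    """Return the app_config.xml text from the post as a string.

    build_app_config is a hypothetical helper, not part of BOINC.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML so it can be redirected into app_config.xml.
    print(build_app_config())
```

The resulting file belongs in the LHC@home project folder inside the BOINC data directory; the client then needs to re-read its config files (or be restarted) before the limit takes effect.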
Send message Joined: 21 Dec 07 Posts: 3 Credit: 1,092,719 RAC: 0 |
Dear computezrmle, thanks for your info! Yes, you are right, that is the PC we are talking about. My CPU should have 16 cores, of which I am now using 8, and it would be a waste of resources to limit it to only 3 cores. I am currently using BOINC (together with boincstats). Should I then set "max #CPUs" on the LHC site? I would be sorry if my PC could only use 4 cores for LHC. Another question: what should I do with the 8-core tasks that are already assigned to my PC and seem to run indefinitely? Will they finish by themselves, or should I abort them? Thanks! |
Send message Joined: 15 Jun 08 Posts: 2401 Credit: 225,517,991 RAC: 124,188 |
ATLAS is a multicore app. Using a 4-core setup means that each task will use 4 (virtual) cores. 3 ATLAS tasks will then use 12 cores and leave 4 cores available for the OS and other stuff. A 4th 4-core ATLAS task would then allocate all remaining cores. Be aware that each ATLAS task uses 1 core during setup and phase-out and leaves all other cores allocated but idle. In addition it's a question of RAM allocation. 2 8-core tasks will allocate 20400 MB RAM while 3 4-core tasks will allocate 19800 MB RAM. Regarding the running 8-core task it's up to you whether to abort it or not. |
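The core and RAM arithmetic in the post above can be checked with a short Python sketch. The per-task figures (10200 MB for 8 cores, 6600 MB for 4 cores) follow directly from the totals quoted in the post. The linear "base plus per-core" model fitted through those two points is an assumption made here for illustration, and the helper names `fit_ram_model` and `plan` are hypothetical.

```python
def fit_ram_model(cores_a, ram_a, cores_b, ram_b):
    """Fit ram = base + per_core * cores through two data points.

    Assumption: RAM grows linearly with the core count. Only two
    points are given in the post, so this cannot be verified here.
    """
    per_core = (ram_b - ram_a) / (cores_b - cores_a)
    base = ram_a - per_core * cores_a
    return base, per_core


# Per-task RAM derived from the totals quoted in the post.
RAM_8CORE_MB = 20400 / 2  # two 8-core tasks
RAM_4CORE_MB = 19800 / 3  # three 4-core tasks

BASE_MB, PER_CORE_MB = fit_ram_model(4, RAM_4CORE_MB, 8, RAM_8CORE_MB)


def plan(total_cores, cores_per_task, tasks):
    """Summarise core and RAM usage for a given ATLAS setup."""
    cores_used = cores_per_task * tasks
    ram_mb = tasks * (BASE_MB + PER_CORE_MB * cores_per_task)
    return {
        "cores_used": cores_used,
        "cores_free": total_cores - cores_used,
        "ram_mb": ram_mb,
    }


if __name__ == "__main__":
    print(plan(total_cores=16, cores_per_task=4, tasks=3))
    print(plan(total_cores=16, cores_per_task=8, tasks=2))
```

On the 16-core host discussed here, both setups need roughly the same RAM, but the 4-core setup keeps 12 cores busy and leaves 4 free for the OS.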
Send message Joined: 21 Dec 07 Posts: 3 Credit: 1,092,719 RAC: 0 |
Thanks for the explanation! I set up BOINC to use at most 50% of the cores, which results in 8; the other 8 cores should always remain free. There was a problem during the BOINC update where VirtualBox was not installed properly, so I installed it manually, and at least now the other LHC tasks such as Theory simulation are doing fine. I also installed the Extension Pack afterwards, but when I look at the console of this 8-core ATLAS task there is just "CentOS" and a Linux login prompt, nothing more. And my whole CPU is completely idle, only 1-3% for Win10. I presume something is wrong with these tasks. All the others are running well: they load the CPU, give console output and finish in reasonable time. Any idea? |
Send message Joined: 2 May 07 Posts: 2090 Credit: 158,874,648 RAC: 126,110 |
This checklist from Yeti helps with troubleshooting: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4302#30658 I also see neither a country nor "International" set in your country definition. |
Send message Joined: 15 Jun 08 Posts: 2401 Credit: 225,517,991 RAC: 124,188 |
What you see is console 1 (the default). Switching through the consoles can be done using the ALT key (press and hold) plus a function key: ALT-F1 -> console 1; ALT-F2 -> console 2 (ATLAS monitoring if it's an ATLAS task; be patient as it will take a while until the setup completes and data becomes available); ALT-F3 -> console 3 (output of the Linux top command); ... If you interrupt/suspend a VM task, the command starts at your BOINC client and goes to the VM via the vboxwrapper. VirtualBox then writes a snapshot of the complete VM RAM to disk (10200 MB in case of an 8-core ATLAS). Vice versa when the task resumes. In case of a shutdown all tasks must be written within 1 min or the BOINC client treats them as hanging and kills them regardless of their state. A reboot might use shorter timeouts defined by your OS. This might have happened with your longrunner. Your logfile shows typical error messages: 2020-11-02 07:59:22 (11028): Stopping VM. 2020-11-02 07:59:22 (11028): Error in stop VM for VM: -2147221164 Command: VBoxManage -q controlvm "boinc_761f0d6aaa869b4f" savestate Output: VBoxManage.exe: error: Failed to create the VirtualBox object! VBoxManage.exe: error: Code REGDB_E_CLASSNOTREG (0x80040154) - Class not registered (extended info not available) VBoxManage.exe: error: Most likely, the VirtualBox COM server is not running or failed to start. 2020-11-02 07:59:22 (11028): VM did not stop when requested. 2020-11-02 07:59:22 (11028): VM was successfully terminated. Next restart just 4 min later: 2020-11-02 08:03:15 (12936): Detected: vboxwrapper 26197 2020-11-02 08:03:15 (12936): Detected: BOINC client v7.7 2020-11-02 08:03:15 (12936): CreateProcess failed! (2). 2020-11-02 08:03:16 (12936): CreateProcess failed! (2). 2020-11-02 08:03:17 (12936): CreateProcess failed! (2). 2020-11-02 08:03:18 (12936): CreateProcess failed! (2). 2020-11-02 08:03:19 (12936): CreateProcess failed! (2). 2020-11-02 08:03:20 (12936): CreateProcess failed! (2). 2020-11-02 08:03:20 (12936): Error in version check for VM: -108 Same logfile: 2020-11-02 21:30:32 (11524): VM state change detected. (old = 'Running', new = 'Paused') 2020-11-02 21:30:33 (11524): VM state change detected. (old = 'Paused', new = 'Running') I doubt your disk is fast enough to write 10 GB and reread the same amount of data within 1 second. What to do? It depends on what else is running on that computer. General hints: - try to find a setup that allows ATLAS to run with as few suspends as possible - extend the period between task switches (BOINC client) |
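To put numbers on the two timing points in the post above, here is a rough Python sketch. It computes the sustained disk throughput needed to write a snapshot within a given time limit, and it measures how long the VM stayed paused according to "VM state change detected" lines in a vboxwrapper log. The regular expression is written against the two log lines quoted in the post and is an assumption about the general log format. The function names are hypothetical.

```python
import re
from datetime import datetime

# Pattern matched against the state-change lines quoted in the post.
# Assumption: other vboxwrapper logs use the same layout.
STATE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): "
    r"VM state change detected\. \(old = '(\w+)', new = '(\w+)'\)"
)


def required_throughput_mb_s(snapshot_mb, seconds):
    """Sustained write rate needed to save a snapshot in time."""
    return snapshot_mb / seconds


def pause_durations(lines):
    """Return seconds spent in 'Paused' for each pause/resume pair."""
    durations = []
    paused_at = None
    for line in lines:
        match = STATE_RE.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        old_state, new_state = match.group(2), match.group(3)
        if new_state == "Paused":
            paused_at = stamp
        elif old_state == "Paused" and paused_at is not None:
            durations.append((stamp - paused_at).total_seconds())
            paused_at = None
    return durations


if __name__ == "__main__":
    # 10200 MB snapshot, 1 min limit: both figures are from the post.
    print(required_throughput_mb_s(10200, 60), "MB/s")
```

With the figures from the post, an 8-core ATLAS snapshot needs about 170 MB/s sustained just to meet the 1 min limit, and that is for a single task. Feeding the two quoted state-change lines to `pause_durations` gives a pause of one second, which is the interval the post calls too short for a full save and restore.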
©2024 CERN