Message boards :
ATLAS application :
Another crappy task
Author | Message |
---|---|
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=256978465 This is crazy! It burns through a ton of CPU time at the start of the task, and at the very end of the task it craps out and stalls. In 13 hrs it has used only 222 units of CPU time. In the last hour on checkpoints it used 49 units of CPU time! What is going on? There is no more FAH on the CPU to interfere; all other CPU projects are BOINC based. A script restricts ATLAS to 4 cores, which a person said would make it more stable, but here I am again with a bogged-down unit at 99.999% done, with a theoretical 1 second left. I went to bed and it had 4 seconds to go; I got up 8 hrs later and it's stuck on 1 second, and in the last 3 hours it has not moved at all. CPU usage before it crashed, while I was trying to force it to complete by allocating all resources to it, was only 2%. What is going on with these ATLAS tasks? I complete a whole bunch of them OK and then I get one or two that crap out. This is not normal. 2019-12-30 16:36:46 (12132): Detected: vboxwrapper 26197 2019-12-30 16:36:46 (12132): Detected: BOINC client v7.7 2019-12-30 16:36:47 (12132): Detected: VirtualBox VboxManage Interface (Version: 6.1.0) 2019-12-30 16:36:47 (12132): Successfully copied 'init_data.xml' to the shared directory. 2019-12-30 16:36:49 (12132): Create VM. (boinc_956c7a0da2235ae1, slot#15) 2019-12-30 16:36:50 (12132): Setting Memory Size for VM. (2241MB) 2019-12-30 16:36:50 (12132): Setting CPU Count for VM. (4) 2019-12-30 16:36:50 (12132): Setting Chipset Options for VM. 2019-12-30 16:36:51 (12132): Setting Boot Options for VM. 2019-12-30 16:36:51 (12132): Setting Network Configuration for NAT. 2019-12-30 16:36:51 (12132): Enabling VM Network Access. 2019-12-30 16:36:52 (12132): Disabling USB Support for VM. 2019-12-30 16:36:52 (12132): Disabling COM Port Support for VM. 2019-12-30 16:36:52 (12132): Disabling LPT Port Support for VM. 2019-12-30 16:36:53 (12132): Disabling Audio Support for VM. 
2019-12-30 16:36:53 (12132): Disabling Clipboard Support for VM. 2019-12-30 16:36:54 (12132): Disabling Drag and Drop Support for VM. 2019-12-30 16:36:54 (12132): Adding storage controller(s) to VM. 2019-12-30 16:36:54 (12132): Adding virtual disk drive to VM. (vm_image.vdi) 2019-12-30 16:36:55 (12132): Adding VirtualBox Guest Additions to VM. 2019-12-30 16:36:55 (12132): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB) 2019-12-30 16:36:55 (12132): forwarding host port 49906 to guest port 80 2019-12-30 16:36:56 (12132): Enabling remote desktop for VM. 2019-12-30 16:36:56 (12132): Enabling shared directory for VM. 2019-12-30 16:36:57 (12132): Starting VM using VBoxManage interface. (boinc_956c7a0da2235ae1, slot#15) 2019-12-30 16:37:02 (12132): Successfully started VM. (PID = '11112') 2019-12-30 16:37:02 (12132): Reporting VM Process ID to BOINC. 2019-12-30 16:37:02 (12132): Guest Log: BIOS: VirtualBox 6.1.0 2019-12-30 16:37:02 (12132): Guest Log: CPUID EDX: 0x178bfbff 2019-12-30 16:37:02 (12132): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63 2019-12-30 16:37:02 (12132): VM state change detected. (old = 'PoweredOff', new = 'Running') 2019-12-30 16:37:02 (12132): Detected: Web Application Enabled (http://localhost:49906) 2019-12-30 16:37:02 (12132): Detected: Remote Desktop Enabled (localhost:49907) 2019-12-30 16:37:02 (12132): Preference change detected 2019-12-30 16:37:02 (12132): Setting CPU throttle for VM. (100%) 2019-12-30 16:37:02 (12132): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 180 seconds) or (Vbox_job.xml: 900 seconds)) 2019-12-30 16:37:04 (12132): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032 2019-12-30 16:37:04 (12132): Guest Log: BIOS: Booting from Hard Disk... 
2019-12-30 16:37:07 (12132): Guest Log: BIOS: KBD: unsupported int 16h function 03 2019-12-30 16:37:07 (12132): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=81 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=81 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=82 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=82 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=83 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=83 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=84 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=84 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=85 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=85 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=86 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=86 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=87 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=87 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=88 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=88 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=89 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=89 2019-12-30 16:37:07 (12132): Guest Log: 
int13_harddisk_ext: function 41, unmapped device for ELDL=8a 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8a 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8b 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8b 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8c 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8c 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8d 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8d 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8e 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8e 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8f 2019-12-30 16:37:07 (12132): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8f 2019-12-30 16:37:15 (12132): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds 2019-12-30 16:37:15 (12132): Guest Log: vboxguest: misc device minor 58, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000) 2019-12-30 16:37:19 (12132): Guest Log: Checking CVMFS... 
2019-12-30 16:37:27 (12132): Guest Log: CVMFS is ok 2019-12-30 16:37:27 (12132): Guest Log: Mounting shared directory 2019-12-30 16:37:27 (12132): Guest Log: Copying input files 2019-12-30 16:37:30 (12132): Guest Log: VBoxService 5.2.32 r132073 (verbosity: 0) linux.amd64 (Jul 12 2019 10:32:28) release log 2019-12-30 16:37:30 (12132): Guest Log: 00:00:00.001115 main Log opened 2019-12-30T16:37:28.657561000Z 2019-12-30 16:37:30 (12132): Guest Log: 00:00:00.001292 main OS Product: Linux 2019-12-30 16:37:30 (12132): Guest Log: 00:00:00.001332 main OS Release: 3.10.0-957.27.2.el7.x86_64 2019-12-30 16:37:30 (12132): Guest Log: 00:00:00.001368 main OS Version: #1 SMP Mon Jul 29 17:46:05 UTC 2019 2019-12-30 16:37:30 (12132): Guest Log: 00:00:00.001407 main Executable: /opt/VBoxGuestAdditions-5.2.32/sbin/VBoxService 2019-12-30 16:37:30 (12132): Guest Log: 00:00:00.001409 main Process ID: 1730 2019-12-30 16:37:30 (12132): Guest Log: 00:00:00.001409 main Package type: LINUX_64BITS_GENERIC 2019-12-30 16:37:30 (12132): Guest Log: 00:00:00.002121 main 5.2.32 r132073 started. Verbose level = 0 2019-12-30 16:37:32 (12132): Guest Log: Copied input files into RunAtlas. 2019-12-30 16:37:33 (12132): Guest Log: copied the webapp to /var/www 2019-12-30 16:37:34 (12132): Guest Log: This vm does not need to setup an http proxy 2019-12-30 16:37:34 (12132): Guest Log: ATHENA_PROC_NUMBER=4 2019-12-30 16:37:34 (12132): Guest Log: *** Starting ATLAS job. 
(PandaID=4592658421 taskID=20172820) *** 2019-12-30 16:37:40 (12132): Guest Log: 00:00:10.004789 timesync vgsvcTimeSyncWorker: Radical guest time change: -3 588 680 980 000ns (GuestNow=1 577 720 259 979 102 000 ns GuestLast=1 577 723 848 660 082 000 ns fSetTimeLastLoop=true ) 2019-12-30 18:17:51 (12132): Status Report: Elapsed Time: '6000.000000' 2019-12-30 18:17:51 (12132): Status Report: CPU Time: '11427.375000' 2019-12-30 19:58:48 (12132): Status Report: Elapsed Time: '12000.000000' 2019-12-30 19:58:48 (12132): Status Report: CPU Time: '23327.625000' 2019-12-30 21:39:03 (12132): Status Report: Elapsed Time: '18000.000000' 2019-12-30 21:39:03 (12132): Status Report: CPU Time: '26047.875000' 2019-12-30 23:19:11 (12132): Status Report: Elapsed Time: '24000.000000' 2019-12-30 23:19:11 (12132): Status Report: CPU Time: '26094.828125' 2019-12-31 00:59:16 (12132): Status Report: Elapsed Time: '30000.000000' 2019-12-31 00:59:16 (12132): Status Report: CPU Time: '26141.718750' 2019-12-31 02:39:23 (12132): Status Report: Elapsed Time: '36000.000000' 2019-12-31 02:39:23 (12132): Status Report: CPU Time: '26186.859375' 2019-12-31 04:19:32 (12132): Status Report: Elapsed Time: '42000.000000' 2019-12-31 04:19:32 (12132): Status Report: CPU Time: '26202.296875' 2019-12-31 05:59:42 (12132): Status Report: Elapsed Time: '48000.000000' 2019-12-31 05:59:42 (12132): Status Report: CPU Time: '26217.953125' 2019-12-31 07:39:51 (12132): Status Report: Elapsed Time: '54000.000000' 2019-12-31 07:39:51 (12132): Status Report: CPU Time: '26232.343750' 2019-12-31 09:20:00 (12132): Status Report: Elapsed Time: '60000.000000' 2019-12-31 09:20:00 (12132): Status Report: CPU Time: '26246.859375' 2019-12-31 11:00:10 (12132): Status Report: Elapsed Time: '66000.000000' 2019-12-31 11:00:10 (12132): Status Report: CPU Time: '26261.671875' 2019-12-31 12:40:18 (12132): Status Report: Elapsed Time: '72000.000000' 2019-12-31 12:40:18 (12132): Status Report: CPU Time: '26295.250000' 2019-12-31 14:20:24 
(12132): Status Report: Elapsed Time: '78000.000000' 2019-12-31 14:20:24 (12132): Status Report: CPU Time: '26340.484375' 2019-12-31 16:00:30 (12132): Status Report: Elapsed Time: '84000.508299' 2019-12-31 16:00:30 (12132): Status Report: CPU Time: '26386.843750' 2019-12-31 17:40:40 (12132): Status Report: Elapsed Time: '90000.508299' 2019-12-31 17:40:40 (12132): Status Report: CPU Time: '26435.671875' 2019-12-31 19:20:47 (12132): Status Report: Elapsed Time: '96000.508299' 2019-12-31 19:20:47 (12132): Status Report: CPU Time: '26483.359375' 2019-12-31 21:00:54 (12132): Status Report: Elapsed Time: '102000.508299' 2019-12-31 21:00:54 (12132): Status Report: CPU Time: '26522.484375' 2019-12-31 22:41:06 (12132): Status Report: Elapsed Time: '108000.508299' 2019-12-31 22:41:06 (12132): Status Report: CPU Time: '26538.937500' 2020-01-01 00:21:17 (12132): Status Report: Elapsed Time: '114000.508299' 2020-01-01 00:21:17 (12132): Status Report: CPU Time: '26554.515625' 2020-01-01 02:01:28 (12132): Status Report: Elapsed Time: '120000.508299' 2020-01-01 02:01:28 (12132): Status Report: CPU Time: '26571.078125' 2020-01-01 03:41:39 (12132): Status Report: Elapsed Time: '126000.508299' 2020-01-01 03:41:39 (12132): Status Report: CPU Time: '26587.171875' 2020-01-01 05:21:48 (12132): Status Report: Elapsed Time: '132000.508299' 2020-01-01 05:21:48 (12132): Status Report: CPU Time: '26601.343750' 2020-01-01 07:01:54 (12132): Status Report: Elapsed Time: '138000.508299' 2020-01-01 07:01:54 (12132): Status Report: CPU Time: '26638.953125' 2020-01-01 08:41:58 (12132): Status Report: Elapsed Time: '144000.508299' 2020-01-01 08:41:58 (12132): Status Report: CPU Time: '26681.843750' 2020-01-01 10:22:03 (12132): Status Report: Elapsed Time: '150000.508299' 2020-01-01 10:22:03 (12132): Status Report: CPU Time: '26727.750000' 2020-01-01 12:02:11 (12132): Status Report: Elapsed Time: '156000.508299' 2020-01-01 12:02:11 (12132): Status Report: CPU Time: '26776.796875' |
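The stall is visible in the `Status Report` lines of the log above: CPU time grows quickly for the first few reports and then barely moves. The following is a minimal sketch, not part of BOINC or vboxwrapper, that turns those lines into a per-interval CPU utilisation figure. The function names and the 5% stall threshold are assumptions made for illustration; the 4-core count comes from the `Setting CPU Count for VM. (4)` line.

```python
import re

# Matches vboxwrapper "Status Report" entries as they appear in the log above.
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")

# Assumed threshold: below 5% average utilisation we call the interval stalled.
STALL_THRESHOLD = 0.05


def parse_status_reports(log_text):
    """Return (elapsed_seconds, cpu_seconds) pairs in log order."""
    pairs = []
    elapsed = None
    for kind, value in STATUS_RE.findall(log_text):
        if kind == "Elapsed Time":
            elapsed = float(value)
        elif elapsed is not None:
            pairs.append((elapsed, float(value)))
            elapsed = None
    return pairs


def cpu_utilisation(pairs, n_cores):
    """CPU seconds per wall-clock second per core, for each report interval."""
    result = []
    for (e0, c0), (e1, c1) in zip(pairs, pairs[1:]):
        result.append((e1, (c1 - c0) / ((e1 - e0) * n_cores)))
    return result


# First four reports copied from the log above.
sample = """
2019-12-30 18:17:51 (12132): Status Report: Elapsed Time: '6000.000000'
2019-12-30 18:17:51 (12132): Status Report: CPU Time: '11427.375000'
2019-12-30 19:58:48 (12132): Status Report: Elapsed Time: '12000.000000'
2019-12-30 19:58:48 (12132): Status Report: CPU Time: '23327.625000'
2019-12-30 21:39:03 (12132): Status Report: Elapsed Time: '18000.000000'
2019-12-30 21:39:03 (12132): Status Report: CPU Time: '26047.875000'
2019-12-30 23:19:11 (12132): Status Report: Elapsed Time: '24000.000000'
2019-12-30 23:19:11 (12132): Status Report: CPU Time: '26094.828125'
"""

pairs = parse_status_reports(sample)
for elapsed, util in cpu_utilisation(pairs, n_cores=4):
    flag = "  <-- likely stalled" if util < STALL_THRESHOLD else ""
    print(f"up to {elapsed:>7.0f} s: {util:6.1%} of 4 cores{flag}")
```

Because the regular expression does not depend on line breaks, the same function should also work on a log that has been pasted as one long line.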
Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
Your task has most likely stalled out. Avoid other processes like FAH; the load you see is not a correct measurement when several other processes are running. The CPU can easily suffer if the task doesn't get the cores/threads that are reserved for it. This includes I/O on disk and RAM. Starting up a VM needs more resources than what is set in BOINC, because the work done on start/stop/save is not counted against the task and shows up as system load. For this specific task BOINC only got 2241MB, which is far too low: the old ATLAS application requires 2600MB for 1 core and the new application recommends 3000MB for one core. In this case the task struggles to boot and get the script running. You would not see any ATLAS.py process running, as it can't get enough memory to even start. Those few seconds are probably the attempt to start ATLAS before it stalled out. 2019-12-30 16:36:50 (12132): Setting Memory Size for VM. (2241MB) 2019-12-30 16:36:50 (12132): Setting CPU Count for VM. (4) Each added core needs somewhere around 800MB-1000MB. If you would like to use app_config, I suggest not including a RAM setting and letting the application pick what it needs. Changes to RAM only apply to newly downloaded tasks; if you update in BOINC Manager, only the core count changes. VirtualBox has its own issues and errors, and mixed with LHC it can be hard to catch what the problem is, but LHC provides good logs, and the extension can pull out a lot of good info. The suggestion of a 4-core app_config is good; I had a better experience running on 4 cores than on the default 12 on VirtualBox. I got fewer failures (error 195) using app_config.xml. But running on VirtualBox I had to use 8000MB as a minimum for a 4-core task with the new application to keep it somewhat stable. So running the default 12 cores with 11000MB, or whatever the default requires, is probably better for most users, as RAM usage would be lower, with less load on disk, CPU and RAM. 
The task process in BOINC Manager is just a wrapper that fetches info from the VM. Any estimated time is based only on the device FLOPS, i.e. what your CPU could or should be able to do in that time. If the FLOPS calculation is off target, the estimate can be days off. Never trust the estimate on the first batch of tasks your BOINC Manager downloads, or after you make changes to app_config.xml. If you want an estimated time, use the one in the console of each VM task. It gives a much better, though not perfect, view of time, errors and load. |
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
Your task has most likely stalled out. Avoid other processes like FAH; the load you see is not a correct measurement when several other processes are running. The CPU can easily suffer if the task doesn't get the cores/threads that are reserved for it. I think I found the problem. The amount of memory allocated in VBOX is less than what it needs. Even VBOX points this out. Which is weird, as everything is automatic. But I'll boost it higher and see if that helps. I don't know how much OC plays into these problems. I can run BOINC at 40.75 without crashing anything. My temps stay within their max limits. I lowered it down to 40.50 now to see if that helps anything the next time ATLAS comes to my system and starts running. GRRR - Something weird happened. The system went down with an IRQL Not Less or Equal error and I lost one task that was slowing down again in the 95% range and a task that was waiting. I'll have to check the drivers on the system. It is always something with ATLAS. I can rarely complete a task on this project despite having a fully capable system. It is becoming annoying as hell! |
Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
If that task had the old setting of low RAM, it was doomed to fail from the start. Get the settings and app_config right, and only after that allow new tasks. Only new tasks will be able to validate. If you want a stable environment, I suggest running on Linux, where you can run native instead. |
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
If that task had the old setting of low RAM, it was doomed to fail from the start. Get the settings and app_config right, and only after that allow new tasks. Only new tasks will be able to validate. Remember though, I said the VBOX memory allocation was low. Windows memory is more than enough; it could be tweaked to be a little bit higher, but VBOX assigned the Windows memory allocation. When I poked around in VBOX and found the specific settings for the task, there was a message on the screen saying the memory was too low. Again, VBOX, not Windows. So how do you push VBOX to assign more memory automatically? |
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
For now, I will try this <app_config> <project_max_concurrent>2</project_max_concurrent> <app_version> <app_name>ATLAS</app_name> <version_num>100</version_num> <platform>windows_x86_64</platform> <avg_ncpus>4.000000</avg_ncpus> <max_ncpus>4.000000</max_ncpus> <plan_class>vbox64_mt_mcore_atlas</plan_class> <api_version>7.7.0</api_version> <cmdline>--memory_size_mb 7500</cmdline> <dont_throttle/> <is_wrapper/> <needs_network/> </app_version> </app_config> According to the person posting this, it has been tried and tested and is ok. |
Joined: 15 Jun 08 Posts: 2683 Credit: 286,887,455 RAC: 54,539 |
I don't see any post from an experienced user within the last year that suggested an app_config.xml like this: <app_config> <project_max_concurrent>2</project_max_concurrent> <app_version> <app_name>ATLAS</app_name> <version_num>100</version_num> <platform>windows_x86_64</platform> <avg_ncpus>4.000000</avg_ncpus> <max_ncpus>4.000000</max_ncpus> <plan_class>vbox64_mt_mcore_atlas</plan_class> <api_version>7.7.0</api_version> <cmdline>--memory_size_mb 7500</cmdline> <dont_throttle/> <is_wrapper/> <needs_network/> </app_version> </app_config> This file includes lots of tags that will simply be ignored by your BOINC client, e.g. max_ncpus, is_wrapper, etc. I'm curious where you got this from. Could you post a link to the source? Besides that, lots of your other posts mention suggestions from "this guy" or "that guy" that are all "tested", but none of them can be checked, as links to the sources are always missing. This makes it nearly impossible to get an impression of where you got lost. And, yes, you obviously got lost, as some of the recent logs show: https://lhcathome.cern.ch/lhcathome/result.php?resultid=257084193 2020-01-01 16:11:48 (16604): Setting Memory Size for VM. (2241MB) 2020-01-01 16:11:48 (16604): Setting CPU Count for VM. (4) ATLAS vbox requires at least 3900 MB RAM for a 1-core setup. An n-core setup requires RAM according to this formula: 3000 + 900 * n_cores => a 4-core setup requires 6600 MB. VBoxManage.exe: error: AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED) AMD-V (would be VT-x on Intel) must be enabled in your BIOS, as mentioned a couple of times. https://lhcathome.cern.ch/lhcathome/result.php?resultid=257045774 Another VirtualBox management application has locked the session for this VM. BOINC cannot properly monitor this VM and so this job will be aborted. Looks like Hyper-V is running concurrently with VirtualBox. Hyper-V must be disabled if you plan to use VirtualBox. NOTE: VM session lock error encountered. 
BOINC will be notified that it needs to clean up the environment. This might be a temporary problem and so this job will be rescheduled for another time. Looks like some crashes left garbage on the disk. Restart your computer without BOINC and use your VirtualBox GUI to remove inaccessible VMs before you restart BOINC. greg_be wrote: I don't know how much OC plays into these problems. I can run BOINC at 40.75 without crashing anything. My temps stay within their max limits. I lowered it down to 40.50 now to see if that helps anything the next time ATLAS comes to my system and starts running. Try to get a stable system at stock settings. |
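The RAM formula quoted in the post above (3000 + 900 * n_cores) is easy to check against what a log reports. This is only a sketch of that arithmetic; the function name is made up for illustration and is not part of BOINC or VirtualBox.

```python
def atlas_vbox_ram_mb(n_cores):
    """RAM in MB for an n-core ATLAS vbox task, per the formula in the post above."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return 3000 + 900 * n_cores


# Value taken from the "Setting Memory Size for VM. (2241MB)" log line.
configured_mb = 2241

for cores in (1, 4, 8):
    needed = atlas_vbox_ram_mb(cores)
    shortfall = max(0, needed - configured_mb)
    print(f"{cores} core(s): needs {needed} MB, short by {shortfall} MB")
```

For 1, 4 and 8 cores the formula gives 3900, 6600 and 10200 MB, so the 2241 MB in the log is well below even the 1-core requirement.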
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
VBOX takes 10,200 MB for 4 cores at stock settings, plus another 5,5xx in virtual memory (sorry, I don't recall the precise figures for this). AMD-V IS enabled. Not sure how it got disabled. Lots of weird stuff happening lately. Checked Hyper-V settings: not enabled. Again, I don't know how that happened. It's not something I mess with. The app config was from a 2017 post here on LHC: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4137#29016 I found this via a Google search. |
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
I had this given to me by user Saigon (Has been with LHC since 2012) in one of my other threads about slow tasks. So what would I use from the script below to boost memory? <app_config> <app> <name>ATLAS</name> <max_concurrent>3</max_concurrent> <---- I set this to 1 usually or no more than 2 so I don't have them hanging around in the queue. I just want to run and finish 1 or 2 at a time. </app> <app_version> <app_name>ATLAS</app_name> <plan_class>vbox64_mt_mcore_atlas</plan_class> <avg_ncpus>4.0</avg_ncpus> <cmdline>--nthreads 4</cmdline> </app_version> </app_config> |
Joined: 8 Apr 12 Posts: 1 Credit: 333,794 RAC: 0 |
I had the same problem with task: ATLAS Simulation 2.00 (vbox64_mt_mcore_atlas) J0wMDmFPy7vn9Rq4apoT9bVoABFKDmABFKDmvNPXDmABFKDml6WuHm It was an 8-CPU job that ran for 14 hours and seemed to be asymptotically approaching 100%. I aborted it in the end. It also appeared to be under-utilising the processors that it had "reserved", which is unfortunate for all the other tasks. For the time being I've changed my LHC preferences to Max # CPUs = 4. In future I'll keep a closer eye on jobs and abort them if they run 25-50% over their initial estimate. |
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
I had the same problem with task: For me they run fine (but show only 4-6 seconds of actual CPU time) until they reach the 90% range, then they slow down. At around 97% they slow down to .00010% every 2-4 seconds. Sometimes a little higher. I give it a day to a day and a half to overcome that issue, but usually it's useless. So around 98 or 99.xxx% I have to abort the task because it will not finish. If I can get an answer back on where to put that memory boost into the script (or I will try it myself), I can only hope that solves the issue. Seems like VBOX is under-allocating memory for the tasks. I don't touch any of the settings, and you see what I have reported and how that works out. |
Joined: 2 May 07 Posts: 2277 Credit: 178,709,076 RAC: 100,489 |
When you see no CPU time growing for the task after about 10 min (initialisation phase), you can delete the task. Over the last few days I had two or three tasks with no CPU time after 10 min of run time. But it may be that you have another problem too. |
Joined: 15 Jun 08 Posts: 2683 Credit: 286,887,455 RAC: 54,539 |
greg_be wrote: I don't know how much OC plays into these problems. I can run BOINC at 40.75 without crashing anything. My temps stay within their max limits. I lowered it down to 40.50 now to see if that helps anything the next time ATLAS comes to my system and starts running. AMD lists the Ryzen 2700 with a base clock of 3.2 GHz and a max boost clock of 4.1 GHz (single core, bursty single-threaded workload!). https://www.amd.com/en/products/cpu/amd-ryzen-7-2700 A clock setting above 4 GHz seems to be far too high. You may try to get stable results at AMD's base clock (3.2 GHz) before you try OC again. That's what I meant by stock settings. |
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
greg_be wrote: I don't know how much OC plays into these problems. I can run BOINC at 40.75 without crashing anything. My temps stay within their max limits. I lowered it down to 40.50 now to see if that helps anything the next time ATLAS comes to my system and starts running. I've done a little digging and it can handle long-term higher-end OC. Everyone talks gaming of course, so it's hard to compare that to here. But I will trim the frequency down to 4.0. It's interesting that some pages talk about a max of 4.1 (but not for me, I freeze up at 4.1) and some others have run as high as 4.2. I know BOINC tasks can be touchy about what frequency you use. So 4.0 for now and see how that does. |
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
When you see no CPU time growing for the task after about 10 min (initialisation phase), With me, in the first 5 mins it shows 4-6 seconds of CPU time and that's it for the entire task. I think one time I got about 11 seconds for the whole 16+ hrs it ran. But the completion rate keeps ticking up nicely until 70% and then slows down a bit. At 90% it bogs down really good. At 95 and up it's almost dead. At 99.xxx it's dead or crawling at .00010 percent every 4 seconds. So where should I abort it? After the CPU seconds stop? What is it doing when it's not reporting CPU cycles but still shows an increase in percent done? |
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
Here is my idea of how to boost memory and constrain the task to 4 CPUs. This combines the memory boost section of a 2017 post with a message to me from a user that has been on here since 2012. <app_config> <app> <name>ATLAS</name> <max_concurrent></max_concurrent> </app> <app_version> <app_name>ATLAS</app_name> <plan_class>vbox64_mt_mcore_atlas</plan_class> <avg_ncpus>4.0</avg_ncpus> <cmdline>--nthreads 4</cmdline> <cmdline>--memory_size_mb 7500</cmdline> (though I have seen via BOINC Tasks that it wants 10,200MB of memory) </app_version> </app_config> What corrections are needed, or what are your thoughts on this app_config script? Remember though, from what I saw in VBOX, VBOX is automatically only taking 2240 when the machine is engaged. So do I have to go to VBOX and boost the memory manually on each ATLAS machine, or what? What is the interaction of VBOX memory allocation and BOINC allocation? |
Joined: 2 May 07 Posts: 2277 Credit: 178,709,076 RAC: 100,489 |
My board is an ASUS X-370 with a Ryzen 2700. More than 3.4 GHz is not useful. ASUS AI Suite 3 does the clock setting itself. Ryzen Master is also a tool to see what is possible. But higher than 3.4 GHz my Ryzen does not run well. That is from more than 1 year of experience with it. |
Joined: 14 Jan 10 Posts: 1461 Credit: 9,859,193 RAC: 2,531 |
<app_config> Your max_concurrent is empty. <app_config> <project_max_concurrent>8</project_max_concurrent> <app> <name>ATLAS</name> <max_concurrent>1</max_concurrent> </app> <app_version> <app_name>ATLAS</app_name> <plan_class>vbox64_mt_mcore_atlas</plan_class> <avg_ncpus>4.000000</avg_ncpus> <cmdline>--memory_size_mb 6600</cmdline> </app_version> </app_config> After a change of the app_config.xml you have to reread the config files - BoincTasks - Menu Extra - Read config files. The change only affects newly loaded tasks. Remember though, from what I saw in VBOX, VBOX is automatically only taking 2240 when the machine is engaged. So do I have to go to VBOX and boost the memory manually on each ATLAS machine, or what? What is the interaction of VBOX memory allocation and BOINC allocation? BOINC's memory allocation is coming from your preferences. When you saw 10200, that's because you have set Max # CPUs to 8 in your preferences. Set that to 4 too. |
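For readers who want to regenerate the corrected file for a different core count, here is a hedged sketch that builds the same app_config.xml structure as in the post above with Python's standard library. The `build_app_config` helper and its defaults are assumptions for illustration; only the tag names and the 3000 + 900 * n_cores memory formula come from this thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(n_cores, max_concurrent=1, project_max_concurrent=8):
    """Return app_config.xml text mirroring the corrected config in the post above."""
    ram_mb = 3000 + 900 * n_cores  # formula quoted earlier in this thread

    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(project_max_concurrent)

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "ATLAS"
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = "ATLAS"
    ET.SubElement(version, "plan_class").text = "vbox64_mt_mcore_atlas"
    ET.SubElement(version, "avg_ncpus").text = f"{n_cores:.6f}"
    ET.SubElement(version, "cmdline").text = f"--memory_size_mb {ram_mb}"

    # ET.indent exists from Python 3.9; older versions emit one unindented line.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(build_app_config(4))
```

The output would go into app_config.xml in the LHC@home project directory; as the post above says, the client has to reread its config files and the change only applies to newly loaded tasks.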
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
<app_config> Your max_concurrent is empty. Thanks for the corrections. I did miss refilling the max concurrent to 1. I was trying to force just ATLAS into 4 CPUs but leave it wide open to Theory and all the rest, so I figured I'd set the restriction for ATLAS in app_config and leave the memory alone. But... I still can't figure out why VBOX is coming in so low on memory: 2240 total, if I remember correctly. |
Joined: 28 Dec 08 Posts: 346 Credit: 5,415,700 RAC: 6,107 |
Crystal Pellet and the rest, thank you for your help. It looks like with your assistance I can finally run ATLAS with no problems. Overnight three tasks ran and all completed OK. |
©2025 CERN