Message boards : ATLAS application : Aborted task: possible instability due to OC speed?

greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 44635 - Posted: 31 Mar 2021, 19:11:49 UTC

I am not sure what is going on here.
My ATLAS tasks keep crashing.
They run for 2+ days, slow down at the end to at most 0.10% of a CPU according to BoincTasks, and show 99.99999% completion.

A question before I post the log: does overclock (OC) speed affect how ATLAS performs? I am running at 3.975 GHz. Rosetta and BOINC Manager are both stable at this speed, but my ATLAS tasks keep crashing or stalling.

I have now dropped another small notch to 3.950 GHz and am on a new task. VirtualBox is clean and now updated.

Here is the log before I aborted the task:
2021-03-29 00:43:15 (15440): Detected: vboxwrapper 26197
2021-03-29 00:43:15 (15440): Detected: BOINC client v7.7
2021-03-29 00:43:16 (15440): Detected: VirtualBox VboxManage Interface (Version: 6.1.16)
2021-03-29 00:43:16 (15440): Successfully copied 'init_data.xml' to the shared directory.
2021-03-29 00:43:17 (15440): Create VM. (boinc_b82366028fb6350b, slot#28)
2021-03-29 00:43:17 (15440): Setting Memory Size for VM. (6600MB)
2021-03-29 00:43:17 (15440): Setting CPU Count for VM. (4)
2021-03-29 00:43:18 (15440): Setting Chipset Options for VM.
2021-03-29 00:43:18 (15440): Setting Boot Options for VM.
2021-03-29 00:43:18 (15440): Setting Network Configuration for NAT.
2021-03-29 00:43:19 (15440): Enabling VM Network Access.
2021-03-29 00:43:19 (15440): Disabling USB Support for VM.
2021-03-29 00:43:19 (15440): Disabling COM Port Support for VM.
2021-03-29 00:43:20 (15440): Disabling LPT Port Support for VM.
2021-03-29 00:43:20 (15440): Disabling Audio Support for VM.
2021-03-29 00:43:20 (15440): Disabling Clipboard Support for VM.
2021-03-29 00:43:20 (15440): Disabling Drag and Drop Support for VM.
2021-03-29 00:43:21 (15440): Adding storage controller(s) to VM.
2021-03-29 00:43:21 (15440): Adding virtual disk drive to VM. (vm_image.vdi)
2021-03-29 00:43:54 (15440): Error in storage attach (fixed disk) for VM: -2135228409
Command:
VBoxManage -q storageattach "boinc_b82366028fb6350b" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "C:\boinc data\slots\28/vm_image.vdi"
Output:
VBoxManage.exe: error: Medium 'C:\boinc data\slots\28\vm_image.vdi' is not accessible. UUID {5e2342bb-76a4-44ff-81f6-2f3283cde68f} of the medium 'C:\boinc data\slots\28\vm_image.vdi' does not match the value {7911afc1-747b-47db-bde8-c92ba3e03ea5} stored in the media registry ('C:\Users\Greg\.VirtualBox\VirtualBox.xml')
VBoxManage.exe: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "SetIds(fSetNewUuid, bstrNewUuid.raw(), fSetNewParentUuid, bstrNewParentUuid.raw())" at line 694 of file VBoxManageStorageController.cpp
VBoxManage.exe: error: Failed to set the medium/parent medium UUID

Notes:

Another VirtualBox management application has locked the session for
this VM. BOINC cannot properly monitor this VM
and so this job will be aborted.


2021-03-29 00:43:54 (15440): Could not create VM
2021-03-29 00:43:54 (15440): ERROR: VM failed to start
2021-03-29 00:43:59 (15440):
NOTE: VM session lock error encountered.
BOINC will be notified that it needs to clean up the environment.
This might be a temporary problem and so this job will be rescheduled for another time.

2021-03-29 08:35:23 (17348): Detected: vboxwrapper 26197
2021-03-29 08:35:23 (17348): Detected: BOINC client v7.7
2021-03-29 08:35:25 (17348): Detected: VirtualBox VboxManage Interface (Version: 6.1.16)
2021-03-29 08:35:25 (17348): Starting VM using VBoxManage interface. (boinc_b82366028fb6350b, slot#28)
2021-03-29 08:35:31 (17348): Successfully started VM. (PID = '18460')
2021-03-29 08:35:31 (17348): Reporting VM Process ID to BOINC.
2021-03-29 08:35:31 (17348): Guest Log: BIOS: VirtualBox 6.1.16

2021-03-29 08:35:31 (17348): Guest Log: CPUID EDX: 0x178bfbff

2021-03-29 08:35:31 (17348): VM state change detected. (old = 'PoweredOff', new = 'Running')
2021-03-29 08:35:31 (17348): Preference change detected
2021-03-29 08:35:31 (17348): Setting CPU throttle for VM. (100%)
2021-03-29 08:35:31 (17348): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 180 seconds) or (Vbox_job.xml: 900 seconds))
2021-03-29 08:35:33 (17348): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032

2021-03-29 08:35:33 (17348): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=80

2021-03-29 08:35:33 (17348): Guest Log: BIOS: Boot from Hard Disk 0 failed

2021-03-29 08:35:33 (17348): Guest Log: BIOS: Boot : bseqnr=2, bootseq=0003

2021-03-29 08:35:33 (17348): Guest Log: BIOS: CDROM boot failure code : 0002

2021-03-29 08:35:33 (17348): Guest Log: BIOS: Boot from CD-ROM failed

2021-03-29 08:35:33 (17348): Guest Log: Could not read from the boot medium! System halted.

2021-03-29 10:15:55 (17348): Status Report: Elapsed Time: '6000.000000'
2021-03-29 10:15:55 (17348): Status Report: CPU Time: '5.687500'
2021-03-29 11:56:12 (17348): Status Report: Elapsed Time: '12000.000000'
2021-03-29 11:56:12 (17348): Status Report: CPU Time: '8.078125'
2021-03-29 13:36:27 (17348): Status Report: Elapsed Time: '18000.000000'
2021-03-29 13:36:27 (17348): Status Report: CPU Time: '10.390625'
2021-03-29 15:16:39 (17348): Status Report: Elapsed Time: '24000.000000'
2021-03-29 15:16:39 (17348): Status Report: CPU Time: '11.187500'
2021-03-29 16:56:44 (17348): Status Report: Elapsed Time: '30000.000000'
2021-03-29 16:56:44 (17348): Status Report: CPU Time: '11.265625'
2021-03-29 18:36:53 (17348): Status Report: Elapsed Time: '36000.000000'
2021-03-29 18:36:53 (17348): Status Report: CPU Time: '11.281250'
2021-03-29 20:18:12 (17348): Status Report: Elapsed Time: '42000.000000'
2021-03-29 20:18:12 (17348): Status Report: CPU Time: '13.046875'
2021-03-29 21:59:21 (17348): Status Report: Elapsed Time: '48000.000000'
2021-03-29 21:59:21 (17348): Status Report: CPU Time: '14.828125'
2021-03-29 23:40:34 (17348): Status Report: Elapsed Time: '54000.000000'
2021-03-29 23:40:34 (17348): Status Report: CPU Time: '18.343750'
2021-03-30 01:22:01 (17348): Status Report: Elapsed Time: '60000.090588'
2021-03-30 01:22:01 (17348): Status Report: CPU Time: '21.984375'
2021-03-30 03:03:21 (17348): Status Report: Elapsed Time: '66000.090588'
2021-03-30 03:03:21 (17348): Status Report: CPU Time: '26.203125'
2021-03-30 04:44:42 (17348): Status Report: Elapsed Time: '72000.090588'
2021-03-30 04:44:42 (17348): Status Report: CPU Time: '29.390625'
2021-03-30 06:26:12 (17348): Status Report: Elapsed Time: '78000.090588'
2021-03-30 06:26:12 (17348): Status Report: CPU Time: '31.843750'
2021-03-30 08:13:24 (17348): Status Report: Elapsed Time: '84004.171682'
2021-03-30 08:13:24 (17348): Status Report: CPU Time: '35.359375'
2021-03-30 10:00:40 (17348): Status Report: Elapsed Time: '90004.897005'
2021-03-30 10:00:55 (17348): Status Report: CPU Time: '38.546875'
2021-03-30 11:48:40 (17348): Status Report: Elapsed Time: '96005.456440'
2021-03-30 11:48:40 (17348): Status Report: CPU Time: '42.000000'
2021-03-30 13:53:41 (17348): Status Report: Elapsed Time: '102006.129061'
2021-03-30 13:53:41 (17348): Status Report: CPU Time: '49.828125'
2021-03-30 15:53:22 (17348): Status Report: Elapsed Time: '108006.592880'
2021-03-30 15:53:22 (17348): Status Report: CPU Time: '58.703125'
2021-03-30 17:52:11 (17348): Status Report: Elapsed Time: '114006.770287'
2021-03-30 17:52:11 (17348): Status Report: CPU Time: '64.343750'
2021-03-30 19:50:43 (17348): Status Report: Elapsed Time: '120007.376806'
2021-03-30 19:50:43 (17348): Status Report: CPU Time: '68.984375'
2021-03-30 21:34:41 (17348): Status Report: Elapsed Time: '126007.376806'
2021-03-30 21:34:41 (17348): Status Report: CPU Time: '71.921875'
2021-03-30 23:15:55 (17348): Status Report: Elapsed Time: '132007.376806'
2021-03-30 23:15:55 (17348): Status Report: CPU Time: '74.328125'
2021-03-31 00:57:11 (17348): Status Report: Elapsed Time: '138007.376806'
2021-03-31 00:57:11 (17348): Status Report: CPU Time: '76.906250'
2021-03-31 02:38:26 (17348): Status Report: Elapsed Time: '144007.376806'
2021-03-31 02:38:26 (17348): Status Report: CPU Time: '79.546875'
2021-03-31 04:19:44 (17348): Status Report: Elapsed Time: '150007.376806'
2021-03-31 04:19:44 (17348): Status Report: CPU Time: '80.734375'
2021-03-31 06:01:41 (17348): Status Report: Elapsed Time: '156007.376806'
2021-03-31 06:01:41 (17348): Status Report: CPU Time: '81.343750'
2021-03-31 07:43:30 (17348): Status Report: Elapsed Time: '162007.376806'
2021-03-31 07:43:30 (17348): Status Report: CPU Time: '84.828125'
2021-03-31 09:25:18 (17348): Status Report: Elapsed Time: '168007.376806'
2021-03-31 09:25:18 (17348): Status Report: CPU Time: '86.703125'
2021-03-31 11:07:18 (17348): Status Report: Elapsed Time: '174007.859615'
2021-03-31 11:07:18 (17348): Status Report: CPU Time: '90.921875'
2021-03-31 12:49:18 (17348): Status Report: Elapsed Time: '180008.367327'
2021-03-31 12:49:18 (17348): Status Report: CPU Time: '93.312500'
2021-03-31 14:32:52 (17348): Status Report: Elapsed Time: '186008.413286'
2021-03-31 14:32:52 (17348): Status Report: CPU Time: '93.937500'
2021-03-31 16:15:18 (17348): Status Report: Elapsed Time: '192009.459303'
2021-03-31 16:15:18 (17348): Status Report: CPU Time: '95.343750'
2021-03-31 18:13:24 (17348): Status Report: Elapsed Time: '198010.243620'
2021-03-31 18:13:24 (17348): Status Report: CPU Time: '99.656250'
2021-03-31 20:22:42 (17348): Status Report: Elapsed Time: '204010.543724'
2021-03-31 20:22:42 (17348): Status Report: CPU Time: '103.750000'
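
A note on reading the status reports above: each report pairs an elapsed time with a CPU time, and the last pair shows about 104 seconds of CPU time against about 204,010 seconds elapsed. That fits the earlier guest log line "Could not read from the boot medium! System halted.": the VM appears to have sat idle rather than computing. Below is a rough Python sketch for pulling those pairs out of a vboxwrapper log and turning them into a utilization figure. The function names and the regular expression are my own illustration, not part of BOINC or vboxwrapper, and the core count of 4 is taken from the "Setting CPU Count for VM. (4)" line.

```python
import re

# Matches the two kinds of status line seen in the log above.
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def parse_status_reports(log_text):
    """Return (elapsed_seconds, cpu_seconds) pairs in log order.

    Hypothetical helper: assumes each 'Elapsed Time' line is followed
    by its matching 'CPU Time' line, as in the log above.
    """
    pairs = []
    elapsed = None
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if not match:
            continue
        kind, value = match.group(1), float(match.group(2))
        if kind == "Elapsed Time":
            elapsed = value
        elif elapsed is not None:
            pairs.append((elapsed, value))
            elapsed = None
    return pairs


def cpu_utilization(elapsed, cpu, cores=1):
    """Fraction of the available CPU time (elapsed * cores) actually used."""
    return cpu / (elapsed * cores)


# First and last status reports copied from the log above.
SAMPLE = """\
2021-03-29 10:15:55 (17348): Status Report: Elapsed Time: '6000.000000'
2021-03-29 10:15:55 (17348): Status Report: CPU Time: '5.687500'
2021-03-31 20:22:42 (17348): Status Report: Elapsed Time: '204010.543724'
2021-03-31 20:22:42 (17348): Status Report: CPU Time: '103.750000'
"""

for elapsed, cpu in parse_status_reports(SAMPLE):
    share = cpu_utilization(elapsed, cpu, cores=4)
    print(f"elapsed={elapsed:.0f}s cpu={cpu:.2f}s utilization={share:.4%}")
```

A task that is really computing on 4 cores should show a utilization close to 100%. A value far below 1%, as here, suggests the VM never started the job, which points at the disk attach failure rather than at clock speed.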
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 44647 - Posted: 2 Apr 2021, 19:32:39 UTC

1) Got rid of app_config.xml. The same thing can be accomplished via the web page.
2) Updated VirtualBox. Previous updates had caused problems.

It has since run 4 tasks on 4 cores and completed them OK, so now for a challenge:
I am keeping the 4-core restriction in place and raising the task queue to unlimited.
We'll see how that works out time-wise alongside all my other projects.



©2024 CERN