21) Message boards : ATLAS application : Bad WUs? (Message 45835)
Posted 9 Dec 2021 by Jonathan
Post:
stderr.txt log from the stuck work unit:

2021-12-08 18:59:59 (25680): Detected: vboxwrapper 26202
2021-12-08 18:59:59 (25680): Detected: BOINC client v7.16.20
2021-12-08 18:59:59 (25680): Detected: VirtualBox VboxManage Interface (Version: 6.1.30)
2021-12-08 18:59:59 (25680): Successfully copied 'init_data.xml' to the shared directory.
2021-12-08 19:00:00 (25680): Create VM. (boinc_493aaf73f80ced28, slot#1)
2021-12-08 19:00:01 (25680): Setting Memory Size for VM. (6600MB)
2021-12-08 19:00:01 (25680): Setting CPU Count for VM. (4)
2021-12-08 19:00:01 (25680): Setting Chipset Options for VM.
2021-12-08 19:00:02 (25680): Setting Boot Options for VM.
2021-12-08 19:00:02 (25680): Setting Network Configuration for NAT.
2021-12-08 19:00:02 (25680): Enabling VM Network Access.
2021-12-08 19:00:03 (25680): Disabling USB Support for VM.
2021-12-08 19:00:03 (25680): Disabling COM Port Support for VM.
2021-12-08 19:00:03 (25680): Disabling LPT Port Support for VM.
2021-12-08 19:00:03 (25680): Disabling Audio Support for VM.
2021-12-08 19:00:04 (25680): Disabling Clipboard Support for VM.
2021-12-08 19:00:04 (25680): Disabling Drag and Drop Support for VM.
2021-12-08 19:00:04 (25680): Adding storage controller(s) to VM.
2021-12-08 19:00:04 (25680): Adding virtual disk drive to VM. (vm_image.vdi)
2021-12-08 19:00:05 (25680): Adding VirtualBox Guest Additions to VM.
2021-12-08 19:00:05 (25680): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2021-12-08 19:00:05 (25680): forwarding host port 59563 to guest port 80
2021-12-08 19:00:05 (25680): Enabling remote desktop for VM.
2021-12-08 19:00:06 (25680): Enabling shared directory for VM.
2021-12-08 19:00:06 (25680): Starting VM using VBoxManage interface. (boinc_493aaf73f80ced28, slot#1)
2021-12-08 19:00:11 (25680): Successfully started VM. (PID = '1956')
2021-12-08 19:00:11 (25680): Reporting VM Process ID to BOINC.
2021-12-08 19:00:11 (25680): Guest Log: BIOS: VirtualBox 6.1.30
2021-12-08 19:00:11 (25680): Guest Log: CPUID EDX: 0x178bfbff
2021-12-08 19:00:11 (25680): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2021-12-08 19:00:11 (25680): VM state change detected. (old = 'poweredoff', new = 'running')
2021-12-08 19:00:11 (25680): Detected: Web Application Enabled (http://localhost:59563)
2021-12-08 19:00:11 (25680): Detected: Remote Desktop Enabled (localhost:59564)
2021-12-08 19:00:11 (25680): Preference change detected
2021-12-08 19:00:11 (25680): Setting CPU throttle for VM. (100%)
2021-12-08 19:00:11 (25680): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 900 seconds))
2021-12-08 19:00:13 (25680): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2021-12-08 19:00:13 (25680): Guest Log: BIOS: Booting from Hard Disk...
2021-12-08 19:00:15 (25680): Guest Log: BIOS: KBD: unsupported int 16h function 03
2021-12-08 19:00:15 (25680): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=81
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=81
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=82
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=82
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=83
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=83
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=84
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=84
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=85
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=85
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=86
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=86
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=87
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=87
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=88
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=88
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=89
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=89
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8a
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8a
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8b
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8b
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8c
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8c
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8d
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8d
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8e
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8e
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8f
2021-12-08 19:00:15 (25680): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8f
2021-12-08 19:00:19 (25680): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2021-12-08 19:00:19 (25680): Guest Log: vboxguest: misc device minor 58, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2021-12-08 19:00:22 (25680): Guest Log: Checking CVMFS...
2021-12-08 19:00:24 (25680): Guest Log: VBoxService 5.2.32 r132073 (verbosity: 0) linux.amd64 (Jul 12 2019 10:32:28) release log
2021-12-08 19:00:24 (25680): Guest Log: 00:00:00.000155 main     Log opened 2021-12-08T19:00:22.038429000Z
2021-12-08 19:00:24 (25680): Guest Log: 00:00:00.000262 main     OS Product: Linux
2021-12-08 19:00:24 (25680): Guest Log: 00:00:00.000297 main     OS Release: 3.10.0-957.27.2.el7.x86_64
2021-12-08 19:00:24 (25680): Guest Log: 00:00:00.000325 main     OS Version: #1 SMP Mon Jul 29 17:46:05 UTC 2019
2021-12-08 19:00:24 (25680): Guest Log: 00:00:00.000353 main     Executable: /opt/VBoxGuestAdditions-5.2.32/sbin/VBoxService
2021-12-08 19:00:24 (25680): Guest Log: 00:00:00.000354 main     Process ID: 1492
2021-12-08 19:00:24 (25680): Guest Log: 00:00:00.000354 main     Package type: LINUX_64BITS_GENERIC
2021-12-08 19:00:24 (25680): Guest Log: 00:00:00.001280 main     5.2.32 r132073 started. Verbose level = 0
2021-12-08 19:00:34 (25680): Guest Log: 00:00:10.010700 timesync vgsvcTimeSyncWorker: Radical guest time change: 21 611 928 059 000ns (GuestNow=1 639 011 633 968 992 000 ns GuestLast=1 638 990 022 040 933 000 ns fSetTimeLastLoop=true )
2021-12-08 20:40:19 (25680): Status Report: Elapsed Time: '6000.000000'
2021-12-08 20:40:19 (25680): Status Report: CPU Time: '45.843750'


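I compared this with my successful tasks, and it looks like the task hangs at the CVMFS check or the steps right after it: mounting the shared directory and then copying the input files into RunAtlas. In the stuck log above, "Checking CVMFS..." appears but "CVMFS is ok" never does.
Below is the same section from a successful task, covering the steps where I think the stuck one may be failing or hanging. Just a theory.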


2021-12-08 16:23:47 (11704): Guest Log: CVMFS is ok
2021-12-08 16:23:47 (11704): Guest Log: Mounting shared directory
2021-12-08 16:23:47 (11704): Guest Log: Copying input files
2021-12-08 16:23:49 (11704): Guest Log: Copied input files into RunAtlas.
2021-12-08 16:23:50 (11704): Guest Log: copied the webapp to /var/www
2021-12-08 16:23:50 (11704): Guest Log: This VM did not configure a local http proxy via BOINC.
2021-12-08 16:23:50 (11704): Guest Log: Small home clusters do not require a local http proxy but it is suggested if
2021-12-08 16:23:50 (11704): Guest Log: more than 10 cores throughout the same LAN segment are regularly running ATLAS like tasks.
2021-12-08 16:23:50 (11704): Guest Log: Further information can be found at the LHC@home message board.
2021-12-08 16:23:50 (11704): Guest Log: Running cvmfs_config stat atlas.cern.ch
2021-12-08 16:23:50 (11704): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2021-12-08 16:23:50 (11704): Guest Log: 2.6.3.0 1617 0 29828 97456 3 1 1491998 4096001 0 65024 0 102 98.0392 533 455 http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch DIRECT 1
2021-12-08 16:23:50 (11704): Guest Log: ATHENA_PROC_NUMBER=4
2021-12-08 16:23:51 (11704): Guest Log:  *** Starting ATLAS job. (PandaID=5285080426 taskID=27554476) ***
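Comparing the two logs by eye can be automated. The Python sketch below scans a vboxwrapper stderr.txt for the guest-log milestones that appear in the successful log above and reports the last one reached, in log order. The milestone strings are copied from these logs; the function name and the assumption that a healthy task always prints them are mine, not something documented by the project.

```python
from typing import Optional

# Guest-log milestones copied from the successful ATLAS log above.
MILESTONES = (
    "Checking CVMFS...",
    "CVMFS is ok",
    "Mounting shared directory",
    "Copying input files",
    "Copied input files into RunAtlas.",
    "Starting ATLAS job.",
)


def last_milestone(stderr_text: str) -> Optional[str]:
    """Return the last milestone (in log order) seen in the guest log, or None."""
    reached = None
    for line in stderr_text.splitlines():
        # Only messages forwarded from inside the VM are relevant here.
        if "Guest Log:" not in line:
            continue
        message = line.split("Guest Log:", 1)[1]
        for milestone in MILESTONES:
            if milestone in message:
                reached = milestone
    return reached
```

Pass it the text of the slot's stderr.txt. For the stuck task it returns "Checking CVMFS...", while the successful task gets as far as "Starting ATLAS job.".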
22) Message boards : ATLAS application : Bad WUs? (Message 45834)
Posted 9 Dec 2021 by Jonathan
Post:
The next VM started up, but it isn't running anything inside. All three virtual consoles look identical: Atlas event processing monitoring does not start on virtual console 2, and top does not start on virtual console 3.
My guess is that something is broken inside the VM.

The hung work unit's properties are below. I am just going to leave it stuck for now.

boinc_493aaf73f80ced28
Application: ATLAS Simulation 2.00 (vbox64_mt_mcore_atlas)
Name: k5pLDmaoCF0nfZGDcpSWOuwoABFKDmABFKDmQ2DbDmABFKDmy3TEWm
State: Running
Received: 12/8/2021 4:18:10 PM
Report deadline: 12/15/2021 4:18:08 PM
Resources: 4 CPUs
Estimated computation size: 43,200 GFLOPs
CPU time: 00:00:32
CPU time since checkpoint: 00:00:00
Elapsed time: 00:21:07
Estimated time remaining: 01:57:53
Fraction done: 15.195%
Virtual memory size: 80.82 MB
Working set size: 6.45 GB
Directory: slots/1
Process ID: 25680
Progress rate: 43.200% per hour
Executable: vboxwrapper_26198ab7_windows_x86_64.exe
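CPU time versus elapsed time is a quick way to spot a hung VM: a 4-CPU task with all four cores busy accumulates CPU time up to about four times faster than wall-clock time, while this one shows 32 seconds of CPU time in 21 minutes 7 seconds. A small Python sketch of that ratio follows; the 10% threshold in looks_hung() is an arbitrary assumption, not a BOINC or LHC@home rule.

```python
def parse_hms(value: str) -> int:
    """Convert an 'HH:MM:SS' string from BOINC Manager into seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_seconds: float, elapsed_seconds: float, ncpus: int) -> float:
    """Fraction of the allocated CPUs actually used (1.0 means all cores busy)."""
    if elapsed_seconds <= 0 or ncpus <= 0:
        raise ValueError("elapsed_seconds and ncpus must be positive")
    return cpu_seconds / (elapsed_seconds * ncpus)


def looks_hung(cpu_seconds: float, elapsed_seconds: float, ncpus: int,
               threshold: float = 0.10) -> bool:
    """Flag a task whose CPU use is below an (assumed) 10% threshold."""
    return cpu_efficiency(cpu_seconds, elapsed_seconds, ncpus) < threshold


# Values from the stuck task's properties above.
eff = cpu_efficiency(parse_hms("00:00:32"), parse_hms("00:21:07"), 4)
print(f"CPU efficiency: {eff:.2%}")  # CPU efficiency: 0.63%
```

The Status Report in the stuck task's stderr.txt (6000 s elapsed, about 46 s of CPU time) gives an even lower ratio, which fits a VM that booted but never started the ATLAS job.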
23) Message boards : ATLAS application : Bad WUs? (Message 45833)
Posted 9 Dec 2021 by Jonathan
Post:
Task just completed.
I was able to run two 4-core tasks at once, so 8 processor cores were in use. I left SMT on, so 8 of 16 threads were in use. I only have 16 GB of RAM, and it was almost all in use because each task's VM takes 6600 MB.
The second task should finish in about 45 more minutes, and I don't see any problems. I only had LHC / Atlas running, with no other projects or work.

I will just let the machine continue and see if it gets any problem work units.
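The core and memory arithmetic above can be written as a small Python helper that takes the smaller of two limits: physical cores divided by CPUs per task, and usable RAM divided by the VM size. Two assumptions here: that this machine has 8 physical cores (16 threads with SMT), and a 2048 MB reserve for Windows and other programs, which is a hypothetical default rather than a project requirement.

```python
def max_concurrent_tasks(physical_cores: int, ram_mb: int,
                         cpus_per_task: int, vm_memory_mb: int,
                         reserve_mb: int = 2048) -> int:
    """How many VirtualBox tasks fit at once, limited by cores and by RAM.

    reserve_mb is memory left for the host OS and other programs
    (a hypothetical default; adjust for your machine).
    """
    by_cores = physical_cores // cpus_per_task
    by_memory = max(ram_mb - reserve_mb, 0) // vm_memory_mb
    return min(by_cores, by_memory)


# 8 physical cores, 16 GB RAM, 4-CPU Atlas tasks with a 6600 MB VM (from the log).
print(max_concurrent_tasks(8, 16 * 1024, 4, 6600))  # 2
print(f"{2 * 6600 / (16 * 1024):.0%} of RAM used by the two VMs")  # 81% of RAM used by the two VMs
```

Here both limits land on two tasks, which matches what ran fine, and the two VMs alone account for about 81% of the 16 GB.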
24) Message boards : ATLAS application : Bad WUs? (Message 45832)
Posted 8 Dec 2021 by Jonathan
Post:
My 4 core task is behaving normally.
I think I got the wrapper changed, as the log shows "2021-12-08 16:22:04 (11704): Detected: vboxwrapper 26202". It's the new 26203 wrapper misreporting its version number, as usual.

About 25 minutes into the work unit, all 4 athena.py processes are running. Virtual consoles 2 and 3 look normal.
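To check which wrapper, BOINC client, and VirtualBox versions a task reported, you can read the "Detected:" lines vboxwrapper writes at the top of stderr.txt (like the ones in the stuck task's log earlier in this thread). A minimal Python sketch; the regular expressions are written against the lines shown in this thread and may need adjusting for other wrapper versions. As noted above, the wrapper number printed may not match the wrapper actually in use.

```python
import re
from typing import Dict

# Patterns for the "Detected:" lines seen in vboxwrapper stderr.txt logs in this thread.
PATTERNS = {
    "vboxwrapper": re.compile(r"Detected: vboxwrapper (\d+)"),
    "boinc_client": re.compile(r"Detected: BOINC client v([\d.]+)"),
    "virtualbox": re.compile(r"Detected: VirtualBox VboxManage Interface \(Version: ([\d.]+)\)"),
}


def detected_versions(stderr_text: str) -> Dict[str, str]:
    """Return the first version string found for each component (missing ones are omitted)."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(stderr_text)
        if match:
            found[name] = match.group(1)
    return found
```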
25) Questions and Answers : Windows : CMS Simulation work and all of them get stuck on random % and all of them got aborted. (Message 45786)
Posted 5 Dec 2021 by Jonathan
Post:
Use your LHC@home preferences to limit the number of work units.
Set Max # jobs to no more than the number of true (physical) cores your processor has.
VirtualBox jobs run at normal priority, not at the lower priority used by classic BOINC jobs.
26) Questions and Answers : Preferences : errors (Message 45782)
Posted 4 Dec 2021 by Jonathan
Post:
My guess is a permissions problem or another VirtualBox-related error.
Try the latest version:
https://www.virtualbox.org/wiki/Downloads
27) Questions and Answers : Unix/Linux : ATLAS seems to block my BOINC (Message 45777)
Posted 2 Dec 2021 by Jonathan
Post:
That check list is a very good idea.
My guess is that the VM created for Atlas is taking about 10 GB of RAM and has 8 processors configured inside it.

Go to your LHC preferences and set the values below. This will run one Atlas task in a VM with a single processor and the minimum amount of RAM required. After changing your preferences, you will probably have to set No New Tasks for LHC, abort any current Atlas tasks, and exit and restart BOINC.

I looked up your processor, and it has 4 true cores. That is the highest you should set Max # CPUs to.

You might need to try a different subproject rather than Atlas.

Max # jobs 1
Max # CPUs 1
28) Message boards : ATLAS application : 100% CPU Use (Message 45765)
Posted 30 Nov 2021 by Jonathan
Post:
That sounds like normal behavior for VirtualBox projects. You can set Max # CPUs to get a lower CPU count per task and Max # jobs to limit the number of running tasks. You can also control this behavior with an app_config.xml file.

VirtualBox tasks run at normal priority and need a bit of time to shut down and save the running VM state. I just exit BOINC Manager and have it stop the running tasks. Give it a minute or two, then reboot or shut down.
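If you want to confirm that the VMs have actually stopped before rebooting, `VBoxManage list runningvms` prints one line per running VM in the form `"name" {uuid}`. The Python sketch below filters that output for names starting with `boinc_` (the naming seen in the stderr logs in this thread). It assumes VBoxManage is on your PATH, which on Windows is often not the case by default, so you may need the full path to VBoxManage.exe.

```python
import re
import subprocess
from typing import List

# Each line of `VBoxManage list runningvms` looks like: "name" {uuid}
LINE = re.compile(r'^"(?P<name>[^"]+)"\s+\{(?P<uuid>[0-9a-fA-F-]+)\}$')


def boinc_vms(listing: str) -> List[str]:
    """Return names of running VMs that look like BOINC task VMs."""
    names = []
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if match and match.group("name").startswith("boinc_"):
            names.append(match.group("name"))
    return names


def running_boinc_vms() -> List[str]:
    """Ask VirtualBox for running VMs (assumes VBoxManage is on PATH)."""
    result = subprocess.run(["VBoxManage", "list", "runningvms"],
                            capture_output=True, text=True, check=True)
    return boinc_vms(result.stdout)
```

Call running_boinc_vms() after exiting BOINC Manager; an empty list means no BOINC VMs are still running.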
29) Message boards : Theory Application : All errors on my new Ryzen (Message 44352)
Posted 21 Feb 2021 by Jonathan
Post:
Do you have any other projects that use VirtualBox and work on that machine?
The 6.1 series of VirtualBox is the only one still supported, per the VirtualBox website.
You can try uninstalling and reinstalling, or moving to version 6.1. Try opening VirtualBox and watching the Theory task's VM start up; it may also give some clues.
30) Message boards : Theory Application : Running.log only available to download (Message 44044)
Posted 1 Jan 2021 by Jonathan
Post:
Try going into Firefox Settings, General tab, Applications section. Check whether log or .log is listed there.
31) Questions and Answers : Windows : All vBox WU in error (Message 43986)
Posted 22 Dec 2020 by Jonathan
Post:
I am just chiming in that I am successfully running CMS tasks.
Four concurrent CMS tasks at a time, no other projects running, and no app_config.xml for this project.

I have local preferences set to use at most 50% of the CPUs and 100% of CPU time.

Network is unrestricted.

Disk is set to 100 GB for BOINC.
Memory is set to 85% both when the computer is in use and when it is not in use. Page/swap limit is 75%.

BOINC is 7.16.11 and VirtualBox is 6.1.16.

My computer https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10585495

My tasks https://lhcathome.cern.ch/lhcathome/results.php?userid=550738
32) Message boards : ATLAS application : BOINC says there are no ATLAS jobs available (Message 42314)
Posted 27 Apr 2020 by Jonathan
Post:
Check your project preferences. That error message shows you have Theory checked and Atlas unchecked. Something may have borked.
33) Message boards : Number crunching : Not getting any tasks, though many are available (Message 42221)
Posted 18 Apr 2020 by Jonathan
Post:
Go to your LHC@home preferences. At the bottom is "Max # CPUs"; set that to 1, and your Atlas tasks will use one core once you get new ones. You can also set Max # jobs, if necessary.
This can also be controlled with app_config.xml, but in the past I had trouble getting Atlas to use the right number of cores and the right assigned memory. I think you have to specify both the cores and the memory requirement in the config. There should be posts about this in the Atlas forum section. I don't know what the current memory requirement formula is.
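A minimal sketch of such a config, generated with Python so the CPU count and memory stay consistent. Several parts are assumptions: the app name `ATLAS`, the plan class `vbox64_mt_mcore_atlas` (taken from the stuck task's properties earlier in this thread), and the memory formula of 3000 MB plus 900 MB per CPU, which is only a guess chosen because it matches the 6600 MB VM seen for a 4-CPU task. The `--nthreads` and `--memory_size_mb` options are passed to vboxwrapper through `cmdline`. Check the Atlas forum for the current values before using it.

```python
import xml.etree.ElementTree as ET


def atlas_memory_mb(ncpus: int) -> int:
    """ASSUMED memory formula (3000 MB + 900 MB per CPU); verify on the Atlas forum."""
    return 3000 + 900 * ncpus


def build_app_config(ncpus: int, max_concurrent: int,
                     app_name: str = "ATLAS",
                     plan_class: str = "vbox64_mt_mcore_atlas") -> str:
    """Build an app_config.xml string limiting cores, memory and concurrent tasks."""
    root = ET.Element("app_config")

    # Limit how many tasks of this application run at once.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # Per-version settings: CPUs BOINC budgets for, plus vboxwrapper options
    # for the VM's CPU count and memory size in MB.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    ET.SubElement(version, "cmdline").text = (
        f"--nthreads {ncpus} --memory_size_mb {atlas_memory_mb(ncpus)}"
    )
    return ET.tostring(root, encoding="unicode")


# One single-core Atlas task at a time.
print(build_app_config(ncpus=1, max_concurrent=1))
```

Save the output as app_config.xml in the LHC@home folder under BOINC's projects directory, then use Options > Read config files in BOINC Manager.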
34) Message boards : Number crunching : Not getting any tasks, though many are available (Message 42217)
Posted 17 Apr 2020 by Jonathan
Post:
Maybe a tooltip could be added to the preference "Run native if available?" to explain its usage. It really is confusing for anyone not well versed in this project.
Maybe change it to "Run LINUX native if available?" or give it a mouse-over explanation, if possible?
35) Message boards : Number crunching : Not getting any tasks, though many are available (Message 42216)
Posted 17 Apr 2020 by Jonathan
Post:
Native applies only to Linux computers, where the applications run directly instead of inside VirtualBox.
36) Questions and Answers : Windows : no response of computer when runnin boinc (Message 42214)
Posted 17 Apr 2020 by Jonathan
Post:
Did you switch to Atlas tasks at LHC, or are you quitting all LHC tasks?
37) Questions and Answers : Windows : no response of computer when runnin boinc (Message 42204)
Posted 16 Apr 2020 by Jonathan
Post:
Hyper-V is Microsoft's hypervisor for running virtual machines.

I don't think that is the problem here, as the computer is showing "Microsoft Windows 8.1 Core x64 Edition, (06.03.9600.00)", and the Core edition of Windows 8.1 does not include Hyper-V.
38) Questions and Answers : Windows : no response of computer when runnin boinc (Message 42178)
Posted 15 Apr 2020 by Jonathan
Post:
You can try Google Translate to help with the language.
https://translate.google.com

When logged in to the LHC website, click on your name at the upper right. This takes you to Your Account, where LHC@home preferences show which applications you have selected to run here. I have Theory and Atlas selected. I don't think CMS is working right now.

VirtualBox-related tasks run at a higher priority than normal BOINC tasks and can slow your computer to the point where it stops responding.

It looks like you are only getting Theory tasks. Try posting in the forum for those tasks; there may be someone there who knows what your error messages mean.
https://lhcathome.cern.ch/lhcathome/forum_forum.php?id=89
39) Questions and Answers : Windows : no response of computer when runnin boinc (Message 42172)
Posted 14 Apr 2020 by Jonathan
Post:
'app_config.xml' is a file you create in a project's folder that controls how BOINC runs that project's applications. It is a bit complicated.

Did everything run okay before you added the LHC project? I looked at a few of your tasks, and it may be that you're running out of memory for the virtual machine tasks.
Are you only running Theory tasks at LHC?
40) Questions and Answers : Windows : no response of computer when runnin boinc (Message 42165)
Posted 14 Apr 2020 by Jonathan
Post:
What do you have set for your preferences and what projects and applications are you running?

It looks like you have 4 cores (8 threads with hyperthreading) and 8 GB of RAM.

You can try looking in the Theory forum for ideas, but setting your computer preferences to use less than 100% of the processors would be a start. You might need to set up an 'app_config.xml' to control this project and its applications.

Yeti's checklist for VirtualBox may also help sort it out:
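BOINC's "Use at most N% of the CPUs" preference is a percentage of logical processors, so turning a target thread count into a percentage is simple arithmetic. The Python sketch assumes the client truncates the result to a whole number of CPUs with a minimum of one; I believe that is how BOINC behaves, but treat it as an assumption.

```python
import math


def cpus_used(logical_cpus: int, percent: float) -> int:
    """CPUs BOINC would use for a given percentage (ASSUMED: truncate, minimum 1)."""
    return max(1, int(logical_cpus * percent / 100))


def percent_for(target_cpus: int, logical_cpus: int) -> int:
    """Smallest whole percentage that yields target_cpus under the rule above."""
    return math.ceil(100 * target_cpus / logical_cpus)


# 4 cores / 8 threads: keeping BOINC to 4 logical CPUs needs 50%.
print(percent_for(4, 8), cpus_used(8, 50))  # 50 4
```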
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161#29359




©2024 CERN