ATLAS vbox v2.01
Author | Message |
---|---|
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
Hi all, We have just released a new VirtualBox version of the ATLAS app, v2.01. The most significant change is the use of a new version of vboxwrapper which enables multiattach mode. In short, this means there is no need to make a copy of the large vdi image file at the start of each task, so tasks will start quicker. For more technical details see the GitHub issue. As always, let us know if you see any issues. David |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,365,553 RAC: 119,375 |
Mo 27 Jun 2022 10:00:00 CEST | LHC@home | No tasks are available for ATLAS Simulation Did you restart all project server instances to make them aware of the new app version? <edit> Got a task. Could have been a wrong pref or local setting. </edit> |
Joined: 2 May 07 Posts: 2228 Credit: 173,743,357 RAC: 19,676 |
Tasks have downloaded on Win11 Pro, including the 1.07 GByte .vdi (10:30 min. at 70 MBit/s). vboxwrapper_26204_windows_x86_64.exe :-) https://lhcathome.cern.ch/lhcathome/results.php?hostid=10631979 |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,365,553 RAC: 119,375 |
1st task (on Linux) started fine and is now processing events - will take a while. Logfiles don't show unexpected issues. |
Joined: 2 May 07 Posts: 2228 Credit: 173,743,357 RAC: 19,676 |
RDP shows no running collision. 20 sec CPU time, 15 min run time. Cancelling it now! Postponed: VM environment needs to be cleaned up. I have set the prefs to Theory AND ATLAS with unlimited tasks instead of 8. Flooded with Theory tasks, OMG! |
Joined: 2 May 07 Posts: 2228 Credit: 173,743,357 RAC: 19,676 |
The next ATLAS task shows no RDP in BOINC. I have stopped testing ATLAS completely and am running only Theory with the old wrapper. |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,365,553 RAC: 119,375 |
These tasks succeeded: https://lhcathome.cern.ch/lhcathome/result.php?resultid=358663450 https://lhcathome.cern.ch/lhcathome/result.php?resultid=358663452 Your issues are not caused by the new vboxwrapper nor by the new method using differencing images. The logfiles clearly show there are network issues when the VM makes CVMFS requests. 2022-06-27 10:29:22 (2260): Guest Log: Checking CVMFS... 2022-06-27 10:29:24 (2260): Guest Log: Failed to check CVMFS, check output from cvmfs_config probe: 2022-06-27 10:29:24 (2260): Guest Log: Probing /cvmfs/atlas.cern.ch... Failed! 2022-06-27 10:29:24 (2260): Guest Log: Probing /cvmfs/atlas-condb.cern.ch... Failed! 2022-06-27 10:29:24 (2260): Guest Log: Probing /cvmfs/grid.cern.ch... Failed! My guess would be that your router can't deal with the huge number of concurrently open connections and drops new connection requests. |
Joined: 2 May 07 Posts: 2228 Credit: 173,743,357 RAC: 19,676 |
PLEASE! FATAL: Could not read from the boot medium! System halted. Let's keep calm... |
Joined: 18 Dec 15 Posts: 1783 Credit: 116,980,901 RAC: 67,322 |
Here (Windows 10) the BOINC log says: "No tasks available for ATLAS simulation" while Server Status shows 2,920 unsent tasks ... :-( EDIT: receiving new tasks just now :-) including image.vdi v2.01 |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,365,553 RAC: 119,375 |
This needs more details. Is it on 1 computer or on many/all? Describe the details immediately before the error happened. #tasks, status (e.g. starting/running...), task types (LHC or other projects...) |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,365,553 RAC: 119,375 |
1st task (on Linux) started fine and is now processing events - will take a while. The task succeeded. https://lhcathome.cern.ch/lhcathome/result.php?resultid=358663313 2022-06-27 10:17:49 (68925): Detected: vboxwrapper 26204 . . . 2022-06-27 10:17:52 (68925): Adding virtual disk drive to VM. (ATLAS_vbox_2.01_image.vdi) . . . 2022-06-27 16:54:00 (68925): Guest Log: HITS file was successfully produced . . . 2022-06-27 16:54:01 (68925): Guest Log: *** Success! Shutting down the machine. *** |
Joined: 17 Sep 04 Posts: 104 Credit: 32,771,246 RAC: 3,735 |
There seem to be a lot of errors generally. Mine have all errored-out, and typically several people before me have also errored-out. Because the errors occur within the first 20 minutes or so, a lot of the volume of work is not being successfully completed. Regards, Bob P. |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,365,553 RAC: 119,375 |
There is a rogue host in the list that crashes everything (even the older ATLAS tasks). I informed CERN. @rbpeake Nonetheless your computer also crashes CMS which has not been changed. You may run your work buffer dry and check your VirtualBox installation. |
Joined: 14 Jan 10 Posts: 1411 Credit: 9,398,960 RAC: 12,707 |
A lot of tasks fail because they can't connect to CVMFS. I have 4 running OK now, but 8 errors because of no connection. The problem here is that those tasks will run forever, because there is no check to shut down the VM gracefully. I noticed them because they did not use CPU, so I forced a computation error or aborted them. Example: 2022-06-27 18:30:54 (13232): Guest Log: Checking CVMFS... 2022-06-27 18:30:56 (13232): Guest Log: Failed to check CVMFS, check output from cvmfs_config probe: 2022-06-27 18:30:56 (13232): Guest Log: Probing /cvmfs/atlas.cern.ch... Failed! 2022-06-27 18:30:56 (13232): Guest Log: Probing /cvmfs/atlas-condb.cern.ch... Failed! 2022-06-27 18:30:56 (13232): Guest Log: Probing /cvmfs/grid.cern.ch... Failed! |
Joined: 2 May 07 Posts: 2228 Credit: 173,743,357 RAC: 19,676 |
Theory has a lot of short tasks. All of them are connecting to CVMFS and using Squid without problems. |
Joined: 2 May 07 Posts: 2228 Credit: 173,743,357 RAC: 19,676 |
Seeing a lot of SIGUSR1 problems from different servers. Theory AND ATLAS. Win11 Pro. |
Joined: 14 Jan 10 Posts: 1411 Credit: 9,398,960 RAC: 12,707 |
I suppose those connection errors are caused on the server side. LHC's max_connections setting may be exceeded. That's one side of the problem. The other problem is that the tasks with failed connections keep running forever. The administrator should build in some connection retries and, if they still fail, shut the VM down. |
Joined: 2 May 07 Posts: 2228 Credit: 173,743,357 RAC: 19,676 |
I saved a Theory task across an overnight PC shutdown for testing. It has now been waiting for 15 min. with this last line: grid.cern.ch: Waiting for delivery of SIGUSR1... A new Theory task on this PC started one hour ago with a start time of 20 min. ATLAS has the same traffic... https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10408749 2022-06-28 08:11:34 (6312): VM is no longer is a running state. It is in 'lse, errorID=DevATA_DISKFULL message="Host system reported disk full. VM execution is suspended. You can resume after freeing some space" '. |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,365,553 RAC: 119,375 |
These are the logfile lines from your recently failed tasks: 2022-06-28 07:00:34 (4532): Guest Log: Checking CVMFS... 2022-06-28 07:00:35 (4532): Guest Log: Failed to check CVMFS, check output from cvmfs_config probe: 2022-06-28 07:00:35 (4532): Guest Log: Probing /cvmfs/atlas.cern.ch... Failed! 2022-06-28 07:00:35 (4532): Guest Log: Probing /cvmfs/atlas-condb.cern.ch... Failed! 2022-06-28 07:00:36 (4532): Guest Log: Probing /cvmfs/grid.cern.ch... Failed! 2022-06-28 07:05:30 (6484): Guest Log: Checking CVMFS... 2022-06-28 07:05:31 (6484): Guest Log: Failed to check CVMFS, check output from cvmfs_config probe: 2022-06-28 07:05:31 (6484): Guest Log: Probing /cvmfs/atlas.cern.ch... Failed! 2022-06-28 07:05:31 (6484): Guest Log: Probing /cvmfs/atlas-condb.cern.ch... Failed! 2022-06-28 07:05:31 (6484): Guest Log: Probing /cvmfs/grid.cern.ch... Failed! CVMFS (server side) is very robust. Even if 1 server is unavailable, the client tries all other servers from the list before it gives up. This happens independently for each repository. The pattern above points to a major network issue. This can be on your side as well as on the Cloudflare/CERN side, but since many other computers are running fine it's more likely the issue is on your side. If you run a (Linux) CVMFS client inside your LAN, you may manually run "cvmfs_config probe" from that client. |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,365,553 RAC: 119,375 |
Guess you highlighted the wrong line. The real issue is this: Host system reported disk full. |
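
Several posts in this thread come down to picking the right line out of a long vboxwrapper log (CVMFS probe failures, disk full, unreadable boot medium). As a rough aid, here is a minimal Python sketch that scans log lines for the messages quoted in this thread. The label names and the `classify_log` function are hypothetical and exist only for this example; the patterns are copied from the excerpts above and are not an official or complete list of vboxwrapper messages.

```python
import re

# Failure signatures copied from the log excerpts quoted in this thread.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "cvmfs_probe_failed": re.compile(r"Probing (/cvmfs/\S+?)\.\.\. Failed!"),
    "disk_full": re.compile(r"Host system reported disk full"),
    "boot_medium": re.compile(r"Could not read from the boot medium"),
    "success": re.compile(r"\*\*\* Success! Shutting down the machine\. \*\*\*"),
}


def classify_log(lines):
    """Map each matched label to a list of details found in the log lines.

    For CVMFS probe failures the detail is the repository path;
    for every other signature it is the whole (stripped) log line.
    """
    found = {}
    for line in lines:
        for label, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                # Patterns with a capture group report only the captured part.
                detail = match.group(1) if pattern.groups else line.strip()
                found.setdefault(label, []).append(detail)
    return found
```

A possible way to use it is to save the task's stderr output to a text file and call `classify_log(open("stderr.txt"))` (the file name is just an example). If the result contains `disk_full`, freeing space on the host is the first thing to check. If all repositories show up under `cvmfs_probe_failed`, the network path is the more likely suspect, as discussed above.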