Problem of the day ATLAS
Author | Message |
---|---|
Joined: 27 Sep 08 Posts: 857 Credit: 703,152,539 RAC: 137,904 |
|
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
Yes, I saw this too last week, including Guru Meditation, but only in a small number of ATLAS tasks. All we can do is check for ATLAS tasks that error out or run far too long, and delete those tasks. |
Joined: 14 Jan 10 Posts: 1440 Credit: 9,657,640 RAC: 1,126 |
When you see this happen, you could revive the task:
1. Suspend the task in BOINC with "leave in memory" not selected. The VM will be saved to disk.
2. In VirtualBox Manager:
 - discard the saved state
 - start the VM and let it run until the first events are being processed
 - stop the VM, saving its state to disk
3. Resume the task in BOINC. |
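For anyone who prefers a terminal to the VirtualBox Manager GUI, the revive steps can also be written down as command lines. This is only a dry-run sketch: it builds and prints the commands but executes nothing. It assumes `boinccmd` and `VBoxManage` are on the PATH, that "leave non-GPU tasks in memory while suspended" is off, and that you have looked up the BOINC task name and the VM name (BOINC VMs are named `boinc_...`, as in the stderr logs). The helper `revive_plan` and the example names are hypothetical.

```python
"""Dry-run sketch of the manual revive procedure as command lines.

Assumptions (hypothetical, not from the original post): boinccmd and
VBoxManage are on PATH; the task name and VM name are known.
Nothing is executed; the commands are only returned and printed.
"""
from typing import List

PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def revive_plan(task_name: str, vm_name: str,
                project_url: str = PROJECT_URL) -> List[List[str]]:
    """Return the commands for steps 1-3, in order.

    The ["WAIT"] entry marks the manual step: watch the VM console
    until the first events are processing before saving its state.
    """
    return [
        # 1. Suspend the task; with "leave in memory" off, the VM is saved to disk.
        ["boinccmd", "--task", project_url, task_name, "suspend"],
        # 2a. Discard the saved state of the VM.
        ["VBoxManage", "discardstate", vm_name],
        # 2b. Start the VM again (default GUI window, so the console is visible).
        ["VBoxManage", "startvm", vm_name],
        # 2c. Manual check: wait until the first events are being processed.
        ["WAIT"],
        # 2d. Stop the VM, writing its state to disk.
        ["VBoxManage", "controlvm", vm_name, "savestate"],
        # 3. Hand the task back to BOINC.
        ["boinccmd", "--task", project_url, task_name, "resume"],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in revive_plan("example_ATLAS_task_0", "boinc_d20a7b32445566aa"):
        print(" ".join(cmd))
```

Keeping the plan as data rather than running it lets you check each command (and the VM name) before pasting it into a shell.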
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
2022-05-25 16:02:06 (11660): Guest Log: Running cvmfs_config stat atlas.cern.ch
2022-05-25 16:02:06 (11660): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2022-05-25 16:02:06 (11660): Guest Log: 2.6.3.0 1781 307445734561825742 32288 104734 4 1 1492424 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch http://xx.yyy.zzz.aa:3128 1
2022-05-25 16:02:06 (11660): Guest Log: ATHENA_PROC_NUMBER=12
2022-05-25 16:02:06 (11660): Guest Log: *** Starting ATLAS job. (PandaID=5463929576 taskID=29107814) ***
2022-05-25 16:12:56 (11660): VM is no longer is a running state. It is in 'GuruMeditation'.
2022-05-25 16:12:56 (11660): VM state change detected. (old = 'Running', new = 'GuruMeditation')
2022-05-25 16:12:56 (11660): Powering off VM.
2022-05-25 16:12:56 (11660): Deregistering VM. (boinc_d20a7b32445566aa, slot#5)
2022-05-25 16:13:38 (11660): Removing network bandwidth throttle group from VM.
2022-05-25 16:13:39 (11660): Removing VM from VirtualBox.
2022-05-25 16:14:17 (11660): Virtual machine exited.
16:14:27 (11660): called boinc_finish(0) |
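In the log above the wrapper still ends with `boinc_finish(0)`, so the exit code alone may not show that the VM crashed. A small scan of the stderr text can flag these tasks. A minimal sketch, assuming the log has been copied from the task's result page and uses the vboxwrapper wording shown above; the helper names are hypothetical.

```python
"""Sketch: flag VM state changes in a vboxwrapper stderr log.

Assumption: the text matches the format seen in the log above,
e.g. "VM state change detected. (old = 'Running', new = 'GuruMeditation')".
"""
import re
from typing import List, Tuple

STATE_CHANGE = re.compile(
    r"VM state change detected\. \(old = '(\w+)', new = '(\w+)'\)"
)


def state_changes(log_text: str) -> List[Tuple[str, str]]:
    """Return (old, new) pairs for every VM state change in the log."""
    return STATE_CHANGE.findall(log_text)


def hit_guru_meditation(log_text: str) -> bool:
    """True if the VM ever entered the 'GuruMeditation' state."""
    return any(new == "GuruMeditation" for _, new in state_changes(log_text))


if __name__ == "__main__":
    sample = (
        "2022-05-25 16:12:56 (11660): VM state change detected. "
        "(old = 'Running', new = 'GuruMeditation')\n"
        "2022-05-25 16:12:56 (11660): Powering off VM.\n"
    )
    print(state_changes(sample))        # [('Running', 'GuruMeditation')]
    print(hit_guru_meditation(sample))  # True
```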
Joined: 27 Sep 08 Posts: 857 Credit: 703,152,539 RAC: 137,904 |
|
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
2022-06-21 20:26:53 (52460): Guest Log: 2.6.3.0 1852 307445734561825742 32172 105817 3 1 1492435 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch http://xx.xxx.xxx.xx:3128 1
2022-06-21 20:26:53 (52460): Guest Log: ATHENA_PROC_NUMBER=12
2022-06-21 20:26:55 (52460): Guest Log: *** Starting ATLAS job. (PandaID=5497406599 taskID=29339193) ***
2022-06-21 22:02:59 (52460): Status Report: Elapsed Time: '6000.000000'
2022-06-21 22:02:59 (52460): Status Report: CPU Time: '29828.796875'
2022-06-21 23:43:05 (52460): Status Report: Elapsed Time: '12000.000000'
2022-06-21 23:43:05 (52460): Status Report: CPU Time: '66942.609375'
2022-06-22 00:41:18 (52460): Guest Log: *** Job finished ***
Computer ID 10795955 https://lhcathome.cern.ch/lhcathome/result.php?resultid=358429345
Run time 4 hours 18 min 52 sec
CPU time 23 hours 48 min 43 sec
Validation state: Valid
Credit 871.48
12 CPUs: 4 hours x 12 = 48 hours, yet the CPU time is only 23 hours 48 min 43 sec?? |
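The arithmetic behind the question can be made explicit. A minimal sketch using only the figures in the post above (run time, CPU time, and ATHENA_PROC_NUMBER=12): it converts both times to seconds and divides the CPU time by the core-seconds that 12 cores could have delivered. It only quantifies the gap; it does not explain where the remaining core time went.

```python
"""Compare reported CPU time with wall time x cores (figures from the post above)."""


def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert h/min/s to total seconds."""
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_s: float, wall_s: float, cores: int) -> float:
    """Fraction of the available core-seconds that were actually used."""
    return cpu_s / (wall_s * cores)


if __name__ == "__main__":
    wall = to_seconds(4, 18, 52)   # run time: 15532 s
    cpu = to_seconds(23, 48, 43)   # CPU time: 85723 s
    cores = 12                     # ATHENA_PROC_NUMBER=12
    print(wall * cores)            # 186384 core-seconds available (about 51.8 h)
    print(round(cpu_efficiency(cpu, wall, cores), 3))  # 0.46
```

So the task used roughly 46% of the 12 cores over its run time, a bit less than the "48 hours" estimate suggests, since the exact run time gives about 51.8 available core-hours.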
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
Native ATLAS with these timings:
Exit status 0 (0x00000000)
Computer ID 10816264
Run time 7 hours 51 min 44 sec
CPU time 3 hours 47 min 11 sec
Validation state: Valid
CentOS9 native, with all updates, including yesterday's.
[2024-10-16 08:02:12] apptainer version 1.3.4-1.el9
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10816264 |
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
Computer ID 10797673
Run time 1 hour 35 min 14 sec
CPU time 1 min 14 sec
Validation state: Valid
Credit 648.32
First time I've seen an ATLAS task with a CPU time like this! |
Joined: 15 Jun 08 Posts: 2605 Credit: 262,148,552 RAC: 133,342 |
Lots of ATLAS tasks are failing due to a missing file "PDGTABLE.MeV":
[2024-10-25 09:36:41] 2024-10-25 07:36:26,733 | INFO | exeerrordiag: Non-zero return code from EVNTtoHITS (8); Logfile error in log.EVNTtoHITS: "IOError: [Errno 2] No such file or directory: 'PDGTABLE.MeV'"
[2024-10-25 09:36:41] 2024-10-25 07:36:26,733 | INFO | exitcode: 65
[2024-10-25 09:36:41] 2024-10-25 07:36:26,733 | INFO | exitmsg: Non-zero return code from EVNTtoHITS (8); Logfile error in log.EVNTtoHITS: "IOError: [Errno 2] No such file or directory: 'PDGTABLE.MeV'" |
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
On Win11 Pro, no ATLAS task starts correctly today. For example, I found this message at the end of the log file:
<message> upload failure: <file_xfer_error> <file_name>JuRNDmXXXO6n9Rq4apOajLDm4fhM0noT9bVorHsSDmgV5KDmspl9qm_0_r1027194911_ATLAS_result</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message>
Theory also has problems, but CMS is running for the moment. |
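When many tasks fail like this, the file name and error code can be pulled out of the `<message>` block automatically. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the block is copied verbatim from the end of the task log as above; the helper name `upload_failure` is hypothetical. The error text itself says `stat() failed`, which most likely means the client could not find the result file on disk when it tried to upload it.

```python
"""Sketch: extract file name and error code from a BOINC upload-failure block."""
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

SAMPLE = (
    "<message> upload failure: <file_xfer_error> "
    "<file_name>JuRNDmXXXO6n9Rq4apOajLDm4fhM0noT9bVorHsSDmgV5KDmspl9qm"
    "_0_r1027194911_ATLAS_result</file_name> "
    "<error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message>"
)


def upload_failure(xml_text: str) -> Optional[Tuple[str, int, str]]:
    """Return (file_name, error_code, detail), or None if no transfer error."""
    root = ET.fromstring(xml_text)
    err = root.find("file_xfer_error")
    if err is None:
        return None
    name = err.findtext("file_name", default="").strip()
    code_text = err.findtext("error_code", default="").strip()
    # "-240 (stat() failed)" -> "-240" and "(stat() failed)"
    code, _, detail = code_text.partition(" ")
    # Drop the surrounding parentheses from the detail text.
    return name, int(code), detail.strip("()")


if __name__ == "__main__":
    name, code, detail = upload_failure(SAMPLE)
    print(code, detail)  # -240 stat() failed
    print(name)
```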
Joined: 28 Sep 04 Posts: 744 Credit: 51,964,501 RAC: 31,375 |
|
Joined: 6 Sep 08 Posts: 118 Credit: 12,876,808 RAC: 4,053 |
Got one here, too. I remember that files larger than 2 GB have been a problem in the past; I thought that had been fixed... |
Joined: 7 Aug 11 Posts: 105 Credit: 26,099,112 RAC: 1,161 |
Multiple "file size too big" errors here:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=415199482
https://lhcathome.cern.ch/lhcathome/result.php?resultid=415206927
https://lhcathome.cern.ch/lhcathome/result.php?resultid=415209263
https://lhcathome.cern.ch/lhcathome/result.php?resultid=415209312
https://lhcathome.cern.ch/lhcathome/result.php?resultid=415209313 |
Joined: 14 Sep 08 Posts: 52 Credit: 66,850,956 RAC: 31,909 |
Same here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=415205300
I also have results from those 2050-event tasks that succeeded because they were just under 2 GB. Now I'm not sure whether I should just abort the others... |
Joined: 18 Dec 15 Posts: 1841 Credit: 126,292,186 RAC: 124,435 |
So I'm lucky: one of my old notebooks is running a 50-event task right now :-) |
Joined: 11 Jul 06 Posts: 6 Credit: 2,915,386 RAC: 1 |
As I see it, this limit is set in the work generator via max_nbytes (https://boinc.berkeley.edu/trac/wiki/JobTemplates). The source code says the default is 1 GB, so there is probably a 2 GB limit set in the work generator.
<output_template>
<file_info>
<name><OUTFILE_0/></name>
<generated_locally/>
<upload_when_present/>
<max_nbytes>32768</max_nbytes>
<url><UPLOAD_URL/></url>
[ <gzip_when_done/> ]
</file_info>
</output_template>
<max_nbytes>: maximum file size. If the actual size exceeds this, the file will not be uploaded, and the job will be marked as an error.
I also found 2 bad native runs:
2029388744 Oct 24 13:08 shared/HITS.pool.root.1
2067339121 Oct 24 11:39 shared/HITS.pool.root.1
https://lhcathome.cern.ch/lhcathome/result.php?resultid=415199150
https://lhcathome.cern.ch/lhcathome/result.php?resultid=415199151 |
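If the cap really is `max_nbytes`, the two native HITS sizes in the post above hint at which "2 GB" it is. A quick check, assuming (not confirmed in the thread) that the limit is either 2 GB in decimal (2,000,000,000 bytes) or 2 GiB (2,147,483,648 bytes), and ignoring the small extra size of the logs packed into the uploaded result file:

```python
"""Check the two failed native HITS files against candidate max_nbytes values."""

# Sizes in bytes of the two failed native HITS files listed in the post.
HITS_SIZES = [2029388744, 2067339121]

# Candidate values for max_nbytes (assumption: one of these is in use).
LIMITS = {
    "2 GB (2 * 10**9)": 2 * 10**9,
    "2 GiB (2**31)": 2**31,
}


def exceeds(size: int, limit: int) -> bool:
    """A file larger than max_nbytes is not uploaded and the job errors out."""
    return size > limit


if __name__ == "__main__":
    for label, limit in LIMITS.items():
        print(label, [exceeds(s, limit) for s in HITS_SIZES])
    # 2 GB (2 * 10**9) [True, True]
    # 2 GiB (2**31) [False, False]
```

Both files are over the decimal limit but under 2 GiB, which would fit a max_nbytes of 2,000,000,000 bytes. That is an inference from two data points, not something the project has confirmed.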
Joined: 2 May 07 Posts: 2260 Credit: 175,581,097 RAC: 15,522 |
DataCenter Networkswitch down today; I saw this on the CERN support site. I don't know whether this is the reason for the problems here. |
Joined: 18 Dec 15 Posts: 1841 Credit: 126,292,186 RAC: 124,435 |
"DataCenter Networkswitch down today." Hm, all 3 subprojects are running okay on my hosts so far. |
©2025 CERN