Message boards :
Theory Application :
Problem of the day
Author | Message |
---|---|
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
2023-12-08 14:40:44 (26036): Guest Log: 14:40:45 CET +01:00 2023-12-08: cranky: [ERROR] 'cvmfs_config probe sft.cern.ch' failed. |
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 250,938,919 RAC: 127,891 |
This Theory native task requested >70 GB RAM before it failed with status code 1: https://lhcathome.cern.ch/lhcathome/result.php?resultid=403724017 Run time 38 min 30 sec CPU time 25 min 47 sec Validate state Valid Credit 39.48 Device peak FLOPS 7.38 GFLOPS Application version Theory Simulation v300.08 (native_theory) x86_64-pc-linux-gnu Peak working set size 70.16 GB Peak swap size 70.74 GB Peak disk usage 7.74 MB 08:46:58 CET +01:00 2023-12-27: cranky: [INFO] mcplots runspec: boinc pp jets 13000 15 - phojet 1.12a default 100000 65 . . . 09:25:20 CET +01:00 2023-12-27: cranky: [INFO] Container Theory_2673-2291577-65_0 finished with status code 1. . . . 09:25:20 (95067): cranky exited; CPU time 278.290212 |
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 250,938,919 RAC: 127,891 |
This Theory native task requested >38 GB RAM before it failed with status code 1: https://lhcathome.cern.ch/lhcathome/result.php?resultid=404260387 The process consuming nearly all of those 38 GB was "rivetvm.exe". Run time 37 min 19 sec CPU time 29 min 43 sec Application version Theory Simulation v300.08 (native_theory) x86_64-pc-linux-gnu Peak working set size 37.96 GB Peak swap size 38.40 GB Peak disk usage 3.38 MB 07:10:52 CET +01:00 2024-01-09: cranky: [INFO] mcplots runspec: boinc pp jets 13000 900 - phojet 1.12a default 100000 16 . . . 07:48:07 CET +01:00 2024-01-09: cranky: [INFO] Container Theory_2687-2542715-16_2 finished with status code 1. . . . 07:48:07 (112994): cranky exited; CPU time 1781.554838 |
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
There are some phojet 1.12a results in mcPlots, but they all come from the work of a single user. |
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
2024-01-23 05:30:32 (21140): Guest Log: 05:30:38 CET +01:00 2024-01-23: cranky: [INFO] Checking CVMFS. 2024-01-23 05:30:33 (21140): Guest Log: Probing /cvmfs/sft.cern.ch... Failed! 2024-01-23 05:30:33 (21140): Guest Log: 05:30:38 CET +01:00 2024-01-23: cranky: [ERROR] 'cvmfs_config probe sft.cern.ch' failed. |
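For anyone who wants to spot these probe failures across many task logs, here is a minimal Python sketch that scans log lines for the `Probing /cvmfs/<repo>... <status>` pattern quoted above. The name `parse_probe_lines` is made up for illustration, and the sketch assumes a successful probe prints `OK` in the same position where this log prints `Failed!`.

```python
import re

# Matches lines such as "Probing /cvmfs/sft.cern.ch... Failed!"
PROBE_RE = re.compile(r"Probing /cvmfs/(?P<repo>\S+?)\.\.\.\s*(?P<status>\w+)")

def parse_probe_lines(lines):
    """Return {repository: True/False} for every probe line found."""
    results = {}
    for line in lines:
        match = PROBE_RE.search(line)
        if match:
            # Assumption: anything other than "OK" counts as a failed probe.
            results[match.group("repo")] = match.group("status") == "OK"
    return results

log = [
    "2024-01-23 05:30:33 (21140): Guest Log: Probing /cvmfs/sft.cern.ch... Failed!",
]
failed = [repo for repo, ok in parse_probe_lines(log).items() if not ok]
print(failed)
```

This only reads logs that already exist; it does not run `cvmfs_config` itself.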
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=406079579 2024-02-12 14:10:42 (13076): Guest Log: job: htmld=/shared/html/job 2024-02-12 14:10:49 (13076): Guest Log: tar: ./.gitignore: Cannot change ownership to uid 19256, gid 1399: Invalid argument 2024-02-12 14:10:49 (13076): Guest Log: tar: ./alpgen/README: Cannot change ownership to uid 19256, gid 1399: Invalid argument 2024-02-12 14:10:49 (13076): Guest Log: tar: ./alpgen/example_dev_alp/Makefile: Cannot change ownership to uid 19256, gid 1399: Invalid argument 2024-02-12 14:10:49 (13076): Guest Log: tar: ./alpgen/example_dev_alp/Makefile.alpgen: Cannot change ownership to uid 19256, gid 1399: Invalid argument |
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
I have stopped Theory under Win11 Pro. I also see tasks from other volunteers that do not finish successfully. |
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,101,160 RAC: 30,979 |
2024-01-23 05:30:32 (21140): Guest Log: 05:30:38 CET +01:00 2024-01-23: cranky: [INFO] Checking CVMFS. The same is happening here, only I am running everything under VBox. https://lhcathome.cern.ch/lhcathome/result.php?resultid=406100207 |
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
2024-02-12 19:56:10 (15040): Guest Log: 19:56:09 CET +01:00 2024-02-12: cranky: [INFO] Container 'runc' finished with status code 0. 2024-02-12 19:56:10 (15040): Guest Log: 19:56:09 CET +01:00 2024-02-12: cranky: [INFO] Preparing output. 2024-02-12 19:56:10 (15040): Guest Log: 19:56:09 CET +01:00 2024-02-12: cranky: [ERROR] No output found. The task runs normally and there is no problem with CVMFS, but the output file is not transferred to CERN IT. |
Send message Joined: 17 Aug 17 Posts: 81 Credit: 8,410,301 RAC: 4,238 |
All the VBox units seem to be failing here as well. |
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,101,160 RAC: 30,979 |
New error today. Nearly 30 of them, all reported just after 0900 UTC 16 Feb: 2024-02-16 03:19:00 (32167): Adding storage controller(s) to VM. 2024-02-16 03:19:00 (32167): Adding virtual disk drive to VM. (Theory_2023_12_13.vdi) 2024-02-16 03:19:05 (32167): Error in deregister parent vdi for VM: -2135228404 Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi" Output: VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi' because it has 2 child media VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "Close()" at line 1875 of file VBoxManageDisk.cpp 2024-02-16 03:19:05 (32167): Could not create VM 2024-02-16 03:19:05 (32167): ERROR: VM failed to start 2024-02-16 03:19:05 (32167): Powering off VM. 2024-02-16 03:19:05 (32167): Deregistering VM. (boinc_d0135c6cd87fd305, slot#12) 2024-02-16 03:19:05 (32167): Removing network bandwidth throttle group from VM. 2024-02-16 03:19:05 (32167): Removing VM from VirtualBox. 
and then from the VM trace log: 2024-02-16 03:19:00 (32167): Command: VBoxManage -q storageattach "boinc_d0135c6cd87fd305" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi" Exit Code: -2135228409 Output: VBoxManage: error: Cannot attach medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later VBoxManage: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee nsISupports VBoxManage: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 785 of file VBoxManageStorageController.cpp 2024-02-16 03:19:00 (32167): Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi" Exit Code: -2135228404 Output: VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi' because it has 2 child media VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "Close()" at line 1875 of file VBoxManageDisk.cpp Note in particular the line that says "...can only be attached to machines that were created with VirtualBox 4.0 or later", which is very strange because I am running version 7.0.12 |
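The negative exit codes and the hex codes in this log are the same numbers written two ways. A small Python sketch, assuming the exit code is simply the 32-bit result code printed as a signed integer (the helper name `decode_exit_code` is hypothetical, and the lookup table only holds the two names that appear in the output above):

```python
# Names and values copied from the VBoxManage output quoted above.
KNOWN_CODES = {
    0x80BB0007: "VBOX_E_INVALID_OBJECT_STATE",
    0x80BB000C: "VBOX_E_OBJECT_IN_USE",
}

def decode_exit_code(signed_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex value."""
    unsigned = signed_code & 0xFFFFFFFF
    return f"0x{unsigned:08x}", KNOWN_CODES.get(unsigned, "unknown")

for code in (-2135228409, -2135228404):
    print(code, *decode_exit_code(code))
```

So -2135228409 is the storageattach failure (0x80bb0007) and -2135228404 is the closemedium failure (0x80bb000c), which makes it easier to match the short vboxwrapper lines to the detailed VBoxManage output.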
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
I have seen this line often in the past. I think it must be a problem with multiattach. Only CERN IT (Laurence and the team) can find a solution. |
Send message Joined: 28 Dec 08 Posts: 334 Credit: 4,833,247 RAC: 1,897 |
What is this? 2024-02-19 16:20:37 (21232): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds)) 2024-02-19 16:20:45 (21232): Guest Log: 00:57:41.410265 timesync vgsvcTimeSyncWorker: Radical host time change: 2 046 321 000 000ns (HostNow=1 708 356 041 695 000 000 ns HostLast=1 708 353 995 374 000 000 ns) 2024-02-19 16:20:52 (21232): Guest Log: 00:57:51.417034 timesync vgsvcTimeSyncWorker: Radical guest time change: 2 046 347 637 000ns (GuestNow=1 708 356 051 728 222 000 ns GuestLast=1 708 354 005 380 585 000 ns fSetTimeLastLoop=true ) 2024-02-19 17:13:25 (21232): Status Report: Job Duration: '864000.000000' 2024-02-19 17:13:25 (21232): Status Report: Elapsed Time: '6004.483085' 2024-02-19 17:13:25 (21232): Status Report: CPU Time: '6404.296875' 2024-02-19 17:36:43 (21232): Guest Log: job: run exitcode=0 2024-02-19 17:36:43 (21232): Guest Log: job: diskusage=4132 2024-02-19 17:36:43 (21232): Guest Log: job: logsize=72 k 2024-02-19 17:36:43 (21232): Guest Log: job: times= 2024-02-19 17:36:43 (21232): Guest Log: 0m0.008s 0m0.012s 2024-02-19 17:36:43 (21232): Guest Log: 128m8.477s 0m45.881s 2024-02-19 17:36:43 (21232): Guest Log: job: cpuusage=7734 2024-02-19 17:36:43 (21232): Guest Log: 17:36:43 CET +01:00 2024-02-19: cranky: [INFO] Container 'runc' finished with status code 0. 2024-02-19 17:36:43 (21232): Guest Log: 17:36:43 CET +01:00 2024-02-19: cranky: [INFO] Preparing output. 2024-02-19 17:36:43 (21232): Guest Log: 17:36:43 CET +01:00 2024-02-19: cranky: [ERROR] No output found. 2024-02-19 17:36:43 (21232): Guest Log: [ERROR] Job Failed 2024-02-19 17:36:43 (21232): Guest Log: [INFO] Shutting Down. 2024-02-19 17:36:43 (21232): VM Completion File Detected. 2024-02-19 17:36:43 (21232): VM Completion Message: Job Failed Radical host time change??? What burped an hour in to make it fail? https://lhcathome.cern.ch/lhcathome/result.php?resultid=406272110 You will see it stop, power off, and then restart. 
I run other projects too, so this task's time was up for the moment and another project started in its place. Then it restarted, ran for an hour, and died. |
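To put a number on the "radical time change", the timesync line can be decoded with a few lines of Python. This is only a sketch: `parse_time_jump` is a made-up helper, and it assumes the line layout is exactly as quoted in the log above (digits grouped by spaces, values in nanoseconds).

```python
import re

def parse_time_jump(line):
    """Return (reported_ns, now_minus_last_ns) from a 'Radical ... time change' line."""
    def to_int(text):
        # The log groups digits with spaces, e.g. "2 046 321 000 000".
        return int(text.replace(" ", ""))

    reported = to_int(re.search(r"time change:\s*([\d ]+)ns", line).group(1))
    now = to_int(re.search(r"(?:Host|Guest)Now=([\d ]+)ns", line).group(1))
    last = to_int(re.search(r"(?:Host|Guest)Last=([\d ]+)ns", line).group(1))
    return reported, now - last

host_line = (
    "timesync vgsvcTimeSyncWorker: Radical host time change: 2 046 321 000 000ns "
    "(HostNow=1 708 356 041 695 000 000 ns HostLast=1 708 353 995 374 000 000 ns)"
)
reported, computed = parse_time_jump(host_line)
print(reported == computed, computed / 1e9 / 60)  # jump in minutes
```

For the host line above the jump works out to 2046.321 seconds, roughly 34 minutes.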
Send message Joined: 14 Jan 10 Posts: 1409 Credit: 9,325,730 RAC: 9,392 |
"Radical host time change???" That was not the reason for the task to fail. The workunit was created during the period from 12 Feb until about 17 Feb 23:00 CET. All tasks and their resends failed during that period, and resends kept failing afterwards. The radical time change is reported because the Linux VM always uses UTC, while your Windows host uses your local time without reference to UTC. You can prevent this: on 64-bit Windows, open regedit and browse to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation. Create a new QWORD entry called RealTimeIsUniversal, then set its value to 1. Reboot the system. The clock should now be in UTC. |
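If you would rather not click through regedit, the same setting can be written as a .reg file. Below is a minimal Python sketch that only builds the file text; `make_reg_file` is a hypothetical helper, and the key and value name are taken from the post above. It relies on the .reg convention that a QWORD is written as `hex(b):` followed by 8 little-endian bytes. Back up the registry before importing anything, and note that importing into HKEY_LOCAL_MACHINE needs administrator rights.

```python
import struct

KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation"

def make_reg_file(value=1):
    """Build .reg text that sets RealTimeIsUniversal as a QWORD."""
    # Pack the value as an unsigned 64-bit little-endian integer,
    # then print each byte as two hex digits separated by commas.
    data = ",".join(f"{b:02x}" for b in struct.pack("<Q", value))
    return "\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"RealTimeIsUniversal"=hex(b):{data}',
        "",
    ])

print(make_reg_file())
```

The script deliberately does not touch the registry itself, so it can be run on any machine to inspect the text before it is saved and imported on the Windows host.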
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 250,938,919 RAC: 127,891 |
Today I had another phojet task continuously eating up all available RAM. So far >60 GB RAM within less than 30 min runtime. runRivet.log shows >35000 lines like this: 0 events processed and very few lines like this: Rivet.AnalysisHandler: WARN Sub-event weight list has 2000 elements: are the weight numbers correctly set in the input events? |
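A quick way to check whether a running task shows this pattern is to count the two kinds of lines in runRivet.log. The following is a rough Python sketch; `summarize_rivet_log` is a made-up name, and the matching strings are simply the two lines quoted above.

```python
import re
from collections import Counter

# "0 events processed", but not "100 events processed" or "1.0 events processed".
ZERO_EVENTS = re.compile(r"(?<![\d.])0 events processed")

def summarize_rivet_log(lines):
    """Count the two line types mentioned above in runRivet.log output."""
    counts = Counter()
    for line in lines:
        if ZERO_EVENTS.search(line):
            counts["zero_events"] += 1
        elif "Sub-event weight list" in line:
            counts["weight_warnings"] += 1
    return counts

sample = [
    "0 events processed",
    "100 events processed",
    "Rivet.AnalysisHandler: WARN Sub-event weight list has 2000 elements",
    "0 events processed",
]
print(summarize_rivet_log(sample))
```

To use it on a real task, pass an open file object for runRivet.log instead of the sample list. Where that file lives inside the slot directory depends on the setup.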
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 250,938,919 RAC: 127,891 |
Just killed another rogue phojet eating up >60GB RAM. https://lhcathome.cern.ch/lhcathome/result.php?resultid=408012071 Theory_2687-2528715-1157_1 21:49:00 CET +01:00 2024-03-19: cranky: [INFO] mcplots runspec: boinc pp jets 13000 430 - phojet 1.12a default 100000 1157 |
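Since every runaway task reported here has the same generator and version in its runspec line, it can help to pull those fields out of the log automatically. This Python sketch splits the `mcplots runspec:` line into its 11 whitespace-separated fields. The field names are my assumption about what each position means and are not confirmed anywhere in this thread; only the values come from the logs.

```python
from typing import NamedTuple

class RunSpec(NamedTuple):
    # Field names are assumed, not taken from any official description.
    mode: str
    beam: str
    process: str
    energy: str
    params: str
    specific: str
    generator: str
    version: str
    tune: str
    nevts: str
    seed: str

def parse_runspec(log_line):
    """Split the text after 'mcplots runspec:' into named fields."""
    fields = log_line.split("mcplots runspec:", 1)[1].split()
    return RunSpec(*fields)  # raises TypeError if the field count differs

line = ("21:49:00 CET +01:00 2024-03-19: cranky: [INFO] mcplots runspec: "
        "boinc pp jets 13000 430 - phojet 1.12a default 100000 1157")
spec = parse_runspec(line)
print(spec.generator, spec.version, spec.seed)
```

In the three tasks quoted in this thread, the last field (65, 16, 1157) matches the last number in the task name, e.g. Theory_2687-2528715-1157.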
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
Theory_2743-2818781-24 https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=221131540 |
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
cvmfs_config probe sft.cern.ch failed Theory https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=222398066 |
Send message Joined: 27 Apr 24 Posts: 10 Credit: 563,349 RAC: 1,808 |
Previously, my native Theory workunits have been running fine, but now they are failing with 11:18:08 BST +01:00 2024-06-26: cranky-0.1.4: [INFO] Can't find 'runc'. 11:18:08 BST +01:00 2024-06-26: cranky-0.1.4: [ERROR] Major requirements are missing. Can't run this task. 11:18:08 BST +01:00 2024-06-26: cranky-0.1.4: [INFO] Early shutdown initiated due to previous errors. 11:18:08 BST +01:00 2024-06-26: cranky-0.1.4: [INFO] Cleanup will take a few minutes... 11:30:52 (12643): cranky exited; CPU time 0.362101 11:30:52 (12643): app exit status: 0xce 11:30:52 (12643): called boinc_finish(195) I installed runc version 1.1.13, but now my workunits are failing with 14:28:08 BST +01:00 2024-06-26: cranky-0.1.4: [INFO] Found a local runc version 1.1.13. 14:28:08 BST +01:00 2024-06-26: cranky-0.1.4: [ERROR] Major requirements are missing. Can't run this task. 14:28:08 BST +01:00 2024-06-26: cranky-0.1.4: [INFO] Early shutdown initiated due to previous errors. 14:28:08 BST +01:00 2024-06-26: cranky-0.1.4: [INFO] Cleanup will take a few minutes... I've no idea what is wrong here. It's a shame that the stderr.txt doesn't tell you how to fix the problem. |
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,840 RAC: 24,696 |
"I've no idea what is wrong here. It's a shame that the stderr.txt doesn't tell you how to fix the problem." Have you searched here for runc or cgroup? |
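As a starting point for that search, here is a small Python sketch that reports the two things mentioned: whether `runc` is on the PATH and whether a cgroup v2 hierarchy is mounted at the usual location. What cranky actually tests for is not shown in this thread, so treat both checks, and the helper name `check_native_prerequisites`, as guesses and not as the real requirement list.

```python
import os
import shutil

def check_native_prerequisites():
    """Report two things the thread points at: runc and cgroups."""
    return {
        # Full path to runc if it is on PATH, otherwise None.
        "runc_on_path": shutil.which("runc"),
        # This file exists at the root of a cgroup v2 (unified) mount.
        "cgroup_v2_mounted": os.path.exists("/sys/fs/cgroup/cgroup.controllers"),
    }

for name, value in check_native_prerequisites().items():
    print(f"{name}: {value}")
```

Keep in mind that the BOINC client may run under a different user and with a different PATH than your login shell, so run the check as that user if possible.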
©2024 CERN