Repeated computation errors - Missing Files
Author | Message |
---|---|
Joined: 5 Sep 09 Posts: 10 Credit: 1,247,559 RAC: 0 |
Well, my issue is a bit different... The event log shows that a result file is missing, and the task ends with a Computation Error. For example:

12/16/2021 8:40:41 PM | LHC@home | Computation for task EThLDmiK0E0np2BDcpmwOghnABFKDmABFKDmt5BNDmABFKDmSg9dqm_2 finished
12/16/2021 8:40:41 PM | LHC@home | Output file EThLDmiK0E0np2BDcpmwOghnABFKDmABFKDmt5BNDmABFKDmSg9dqm_2_r23588790_ATLAS_result for task EThLDmiK0E0np2BDcpmwOghnABFKDmABFKDmt5BNDmABFKDmSg9dqm_2 absent
12/16/2021 8:40:41 PM | LHC@home | Starting task fUVKDm1i1E0nfZGDcpSWOuwoABFKDmABFKDmTTVVDmABFKDmVgv8mm_2

This is occurring on ALL Atlas work units on this machine. The missing-file error is similar in each work unit's error log entry. This machine is relatively new, with BOINC and Virtualbox downloaded recently too (latest versions). The matching Virtualbox extension pack is also installed.

I have seen some BOINC <--> Virtualbox issues with other, unrelated apps that look like timing issues. BOINC may be having trouble scheduling when an app (Cosmology@Home) is downloaded that wants ALL available CPU cores, which are only rarely all free at the same time. Other LHC@home apps and Rosetta@Home apps are downloading work units and using Virtualbox just fine. I have not been using Virtualbox for other (non-BOINC) stuff yet.

Any thoughts? Thanks, Tim IDENTICAL is only a concept... |
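A minimal sketch for pulling every "absent" output file out of a saved copy of the event log, so the affected tasks can be listed at a glance. It assumes the messages have exactly the wording quoted in the post above; the function name `find_absent_outputs` is hypothetical and not part of BOINC.

```python
import re

# Matches the client message quoted in the post:
# "Output file <file> for task <task> absent"
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def find_absent_outputs(log_text):
    """Return (task, output_file) pairs for every 'absent' message in the log."""
    return [(m.group(2), m.group(1)) for m in ABSENT_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "12/16/2021 8:40:41 PM | LHC@home | Computation for task "
        "EThLDmiK0E0np2BDcpmwOghnABFKDmABFKDmt5BNDmABFKDmSg9dqm_2 finished\n"
        "12/16/2021 8:40:41 PM | LHC@home | Output file "
        "EThLDmiK0E0np2BDcpmwOghnABFKDmABFKDmt5BNDmABFKDmSg9dqm_2_r23588790_ATLAS_result "
        "for task EThLDmiK0E0np2BDcpmwOghnABFKDmABFKDmt5BNDmABFKDmSg9dqm_2 absent\n"
    )
    for task, output_file in find_absent_outputs(sample):
        print(task, "->", output_file)
```

This only tells you which tasks lost their result file; it does not say why, which is what the task's own stderr.txt is for.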
Joined: 18 Dec 15 Posts: 1810 Credit: 118,224,978 RAC: 26,910 |
"... Other LHC@home apps and Rosetta@Home apps are downloading work units and using Virtualbox just fine." Rosetta@home is using Virtualbox now? |
Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 62,755 |
The snippet from the BOINC log doesn't show why the result file is not present. Better information may be present in the task logfiles (stderr.txt) but nobody can see them since your computers are hidden. Hence, you may make your computers visible here: https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project Enable "Should LHC@home show your computers on its web site?". Only if no task has been reported back to the project yet: Locate ".../slots/n/stderr.txt" of a running task and post that here. (with n being the slot number). |
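A minimal sketch of the "locate .../slots/n/stderr.txt" step, assuming you pass in your own BOINC data directory (the leading part of the path is elided in the post, so it is not filled in here). The helper name `find_slot_stderr` is hypothetical.

```python
from pathlib import Path


def find_slot_stderr(boinc_data_dir):
    """Return {slot_number: path} for every slots/<n>/stderr.txt that exists."""
    found = {}
    slots = Path(boinc_data_dir) / "slots"
    if not slots.is_dir():
        return found
    for entry in slots.iterdir():
        candidate = entry / "stderr.txt"
        # Slot directories are named with the slot number n.
        if entry.is_dir() and entry.name.isdigit() and candidate.is_file():
            found[int(entry.name)] = candidate
    return found


if __name__ == "__main__":
    import sys

    # Usage: python find_stderr.py <path to your BOINC data directory>
    for slot, path in sorted(find_slot_stderr(sys.argv[1]).items()):
        print(f"slot {slot}: {path}")
```

The script only lists the files; which slot belongs to the ATLAS task still has to be checked by hand, and the file has to be copied while the task is running.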
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
"... Other LHC@home apps and Rosetta@Home apps are downloading work units and using Virtualbox just fine." "Rosetta@home is using Virtualbox now?" Rosetta python is using VirtualBox with a demand of 9 GB RAM and 19 GB disk. Tullio |
Joined: 25 Sep 17 Posts: 99 Credit: 3,425,566 RAC: 0 |
The current Rosetta VM is created using 6 GB of RAM and an 8 GB HD. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have a PC with 24 GB RAM and 10 GB of disk space available to BOINC: Rosetta sends me a message that it needs 19 GB of disk and does not send me any tasks. Another PC has 47 GB disk but only 12 GB RAM. I get tasks there, but then I cannot use that PC for things like reading mail, and the Rosetta tasks end up waiting for more memory. Tullio |
Joined: 25 Sep 17 Posts: 99 Credit: 3,425,566 RAC: 0 |
Tullio, Please post at the Rosetta forum if you need help. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
"Tullio, Please post at the Rosetta forum if you need help." Thanks, but I know the limits of my hardware: two desktops running Windows, a recent laptop also running Windows, an older laptop running SuSE Linux Leap 15.0, and a Linux virtual machine on the desktop with more RAM, running SuSE Tumbleweed, which is a development version. I try to give each computer the tasks that suit it best. QuChemPedAI@home needs 1.9 GB RAM, Atlas@home needs 4.9 GB, rosetta python 6.733 GB. Tullio |
Joined: 5 Sep 09 Posts: 10 Credit: 1,247,559 RAC: 0 |
I had to wait for some more Atlas work units, so now I have some data to look at... First, the work units seem to start and process but all of them crash after about 6 minutes. This is consistent and they all show computation error. Some of the data I was looking at from the slots suggest that Virtualbox might not be installed properly, so I stopped BOINC, uninstalled Virtualbox, rebooted, downloaded Vbox and the extension pack, and reinstalled Vbox and the extensions again. The results were the same. Here is a copy of the output in the Atlas slot (before it was replaced by another project).: stderr.txt 2021-12-18 16:38:53 (24508): Detected: vboxwrapper 26197 2021-12-18 16:38:53 (24508): Detected: BOINC client v7.7 2021-12-18 16:38:53 (24508): Status Report: Launching vboxsvc.exe. (PID = '8') 2021-12-18 16:40:34 (24508): Error in guest additions for VM: -182 Command: VBoxManage -q list systemproperties Output: 2021-12-18 16:40:34 (24508): Detected: VirtualBox VboxManage Interface (Version: 6.1.30) 2021-12-18 16:40:34 (24508): Detected: Sandbox Configuration Enabled vboxreplay.txt "VBoxSVC.exe" --logrotate 1 VBoxManage -q --version VBoxManage -q list systemproperties VBoxManage -q list systemproperties VBoxManage -q list systemproperties VBoxManage -q list systemproperties VBoxManage -q list systemproperties VBoxManage -q list systemproperties VBoxManage -q list hostinfo VBoxManage -q list hostinfo VBoxManage -q list hostinfo VBoxManage -q list hostinfo init_data.xml <app_init_data> <major_version>7</major_version> <minor_version>16</minor_version> <release>20</release> <app_version>200</app_version> <userid>163767</userid> <teamid>0</teamid> <hostid>10699976</hostid> <app_name>ATLAS</app_name> <project_preferences> <apps_selected> <app_id>1</app_id> <app_id>11</app_id> <app_id>13</app_id> <app_id>14</app_id> </apps_selected> <allow_non_preferred_apps>1</allow_non_preferred_apps> <max_jobs>0</max_jobs> <max_cpus>0</max_cpus> </project_preferences> 
<user_name>tgm</user_name> <project_dir>C:\ProgramData\BOINC/projects/lhcathome.cern.ch_lhcathome</project_dir> <boinc_dir>C:\ProgramData\BOINC</boinc_dir> <authenticator>6cdf82f197597539c1f4d644cbc8e49c</authenticator> <wu_name>13lMDmn74C0n9Rq4apoT9bVoABFKDmABFKDmkUTQDmABFKDmB3IJin</wu_name> <result_name>13lMDmn74C0n9Rq4apoT9bVoABFKDmABFKDmkUTQDmABFKDmB3IJin_3</result_name> <comm_obj_name>boinc_4</comm_obj_name> <slot>4</slot> <client_pid>7464</client_pid> <wu_cpu_time>0.000000</wu_cpu_time> <starting_elapsed_time>0.000000</starting_elapsed_time> <using_sandbox>1</using_sandbox> <vm_extensions_disabled>0</vm_extensions_disabled> <user_total_credit>1242355.377128</user_total_credit> <user_expavg_credit>1.707136</user_expavg_credit> <host_total_credit>46.036917</host_total_credit> <host_expavg_credit>1.177390</host_expavg_credit> <resource_share_fraction>0.526316</resource_share_fraction> <checkpoint_period>360.000000</checkpoint_period> <fraction_done_start>0.000000</fraction_done_start> <fraction_done_end>1.000000</fraction_done_end> <gpu_type></gpu_type> <gpu_device_num>-1</gpu_device_num> <gpu_opencl_dev_index>-1</gpu_opencl_dev_index> <gpu_usage>0.000000</gpu_usage> <ncpus>8.000000</ncpus> <rsc_fpops_est>43200000000000.000000</rsc_fpops_est> <rsc_fpops_bound>6000000000000000000.000000</rsc_fpops_bound> <rsc_memory_bound>10695475200.000000</rsc_memory_bound> <rsc_disk_bound>10000000000.000000</rsc_disk_bound> <computation_deadline>1640467162.000000</computation_deadline> <vbox_window>0</vbox_window> <no_priority_change>0</no_priority_change> <process_priority>-1</process_priority> <process_priority_special>-1</process_priority_special> <host_info> <timezone>-18000</timezone> <domain_name>PC6</domain_name> <ip_addr>192.168.56.1</ip_addr> <host_cpid>2621baeceb17c00bfdc599f9f6163fa8</host_cpid> <p_ncpus>28</p_ncpus> <p_vendor>GenuineIntel</p_vendor> <p_model>Intel(R) Core(TM) i9-10940X CPU @ 3.30GHz [Family 6 Model 85 Stepping 7]</p_model> <p_features>fpu vme de 
pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 smep bmi2</p_features> <p_fpops>4952372366.009323</p_fpops> <p_iops>17043145481.827120</p_iops> <p_membw>71428571.428571</p_membw> <p_calculated>1638181882.457922</p_calculated> <p_vm_extensions_disabled>0</p_vm_extensions_disabled> <m_nbytes>68414869504.000000</m_nbytes> <m_cache>262144.000000</m_cache> <m_swap>78615416832.000000</m_swap> <d_total>1999068200960.000000</d_total> <d_free>1398867492864.000000</d_free> <os_name>Microsoft Windows 10</os_name> <os_version>Professional x64 Edition, (10.00.19044.00)</os_version> <n_usable_coprocs>0</n_usable_coprocs> <wsl_available>0</wsl_available> <virtualbox_version>6.1.30</virtualbox_version> <coprocs> </coprocs> </host_info> <proxy_info> <socks_server_name></socks_server_name> <socks_server_port>80</socks_server_port> <http_server_name></http_server_name> <http_server_port>80</http_server_port> <socks5_user_name></socks5_user_name> <socks5_user_passwd></socks5_user_passwd> <socks5_remote_dns>0</socks5_remote_dns> <http_user_name></http_user_name> <http_user_passwd></http_user_passwd> <no_proxy></no_proxy> <no_autodetect>0</no_autodetect> </proxy_info> <global_preferences> <source_project>http://www.worldcommunitygrid.org/</source_project> <mod_time>1591503487.000000</mod_time> <battery_charge_min_pct>90.000000</battery_charge_min_pct> <battery_max_temperature>40.000000</battery_max_temperature> <run_on_batteries>0</run_on_batteries> <run_if_user_active>1</run_if_user_active> <run_gpu_if_user_active>0</run_gpu_if_user_active> <suspend_if_no_recent_input>0.000000</suspend_if_no_recent_input> <suspend_cpu_usage>25.000000</suspend_cpu_usage> <start_hour>0.000000</start_hour> <end_hour>0.000000</end_hour> <net_start_hour>0.000000</net_start_hour> <net_end_hour>0.000000</net_end_hour> 
<leave_apps_in_memory>1</leave_apps_in_memory> <confirm_before_connecting>0</confirm_before_connecting> <hangup_if_dialed>0</hangup_if_dialed> <dont_verify_images>0</dont_verify_images> <work_buf_min_days>0.010000</work_buf_min_days> <work_buf_additional_days>0.100000</work_buf_additional_days> <max_ncpus_pct>50.000000</max_ncpus_pct> <cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes> <disk_interval>360.000000</disk_interval> <disk_max_used_gb>100.000000</disk_max_used_gb> <disk_max_used_pct>10.000000</disk_max_used_pct> <disk_min_free_gb>0.000000</disk_min_free_gb> <vm_max_used_pct>75.000000</vm_max_used_pct> <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct> <ram_max_used_idle_pct>70.000000</ram_max_used_idle_pct> <idle_time_to_run>3.000000</idle_time_to_run> <max_bytes_sec_up>0.000000</max_bytes_sec_up> <max_bytes_sec_down>0.000000</max_bytes_sec_down> <cpu_usage_limit>5.000000</cpu_usage_limit> <daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb> <daily_xfer_period_days>0</daily_xfer_period_days> <override_file_present>1</override_file_present> <network_wifi_only>1</network_wifi_only> </global_preferences> <app_file>vboxwrapper_26198ab7_windows_x86_64.exe</app_file> <app_file>ATLAS_vbox_2.00_job.xml</app_file> <app_file>ATLAS_vbox_2.00_image.vdi</app_file> </app_init_data> vboxtrace.txt 2021-12-18 16:38:53 (24508): Command: "VBoxSVC.exe" --logrotate 1 Exit Code: 0 Output: 2021-12-18 16:38:54 (24508): Command: VBoxManage -q --version Exit Code: 0 Output: 6.1.30r148432 2021-12-18 16:38:54 (24508): Command: VBoxManage -q list systemproperties Exit Code: -2147024891 Output: VBoxManage.exe: error: Failed to create the VirtualBox object! 
VBoxManage.exe: error: The object is not ready VBoxManage.exe: error: Details: code E_ACCESSDENIED (0x80070005), component VirtualBoxClientWrap, interface IVirtualBoxClient 2021-12-18 16:38:55 (24508): Command: VBoxManage -q list systemproperties Exit Code: -2147024891 Output: VBoxManage.exe: error: Failed to create the VirtualBox object! VBoxManage.exe: error: The object is not ready VBoxManage.exe: error: Details: code E_ACCESSDENIED (0x80070005), component VirtualBoxClientWrap, interface IVirtualBoxClient 2021-12-18 16:38:56 (24508): Command: VBoxManage -q list systemproperties Exit Code: -2147024891 Output: VBoxManage.exe: error: Failed to create the VirtualBox object! VBoxManage.exe: error: The object is not ready VBoxManage.exe: error: Details: code E_ACCESSDENIED (0x80070005), component VirtualBoxClientWrap, interface IVirtualBoxClient 2021-12-18 16:38:58 (24508): Command: VBoxManage -q list systemproperties Exit Code: -2147024891 Output: VBoxManage.exe: error: Failed to create the VirtualBox object! VBoxManage.exe: error: The object is not ready VBoxManage.exe: error: Details: code E_ACCESSDENIED (0x80070005), component VirtualBoxClientWrap, interface IVirtualBoxClient 2021-12-18 16:39:46 (24508): Command: VBoxManage -q list systemproperties Exit Code: -182 Output: 2021-12-18 16:40:34 (24508): Command: VBoxManage -q list systemproperties Exit Code: -182 Output: 2021-12-18 16:41:21 (24508): Command: VBoxManage -q list hostinfo Exit Code: -182 Output: 2021-12-18 16:42:09 (24508): Command: VBoxManage -q list hostinfo Exit Code: -182 Output: 2021-12-18 16:42:57 (24508): Command: VBoxManage -q list hostinfo Exit Code: -182 Output: 2021-12-18 16:43:45 (24508): Command: VBoxManage -q list hostinfo Exit Code: -182 Output: Hopefully this makes sense to somebody... From the info I saw in BOINCmgr, all work units were trying to process with 8 CPU's. 
I have 50% allocated in BOINC preferences (of 14 CPU's w/ 28 threads), so there are plenty of resources available to BOINC. I have some additional logfiles from some other crashes but they seem to be essentially the same. Thanks, Tim IDENTICAL is only a concept... |
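The large negative exit code in vboxtrace.txt is the same error that VBoxManage prints in hex. A minimal sketch of the conversion, assuming the exit code is a signed 32-bit view of a Windows HRESULT; the function names are hypothetical. The field layout used (bit 31 severity, bits 16-26 facility, bits 0-15 code) is the standard HRESULT layout.

```python
def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as its unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def split_hresult(exit_code):
    """Split an exit code into the standard HRESULT fields."""
    value = to_unsigned32(exit_code)
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),       # bit 31: severity (1 = failure)
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: facility
        "code": value & 0xFFFF,             # bits 0-15: error code
    }


if __name__ == "__main__":
    # Exit code reported for "VBoxManage -q list systemproperties" in the trace.
    print(split_hresult(-2147024891))
    # -> {'hex': '0x80070005', 'failure': True, 'facility': 7, 'code': 5}
```

So -2147024891 is 0x80070005, matching the E_ACCESSDENIED text in the trace: facility 7 is the Win32 facility and code 5 is the Win32 "access denied" error. The later -182 values do not fit this pattern and are not decoded here.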
Joined: 2 May 07 Posts: 2240 Credit: 173,894,884 RAC: 3,757 |
Your computer is not visible to us. Otherwise, you can go through Yeti's checklist to find an answer quickly. |
Joined: 25 Sep 17 Posts: 99 Credit: 3,425,566 RAC: 0 |
Have you tried working through the Yeti checklist? You can also get a look inside the VM on different virtual terminals for some more information. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4302 |
Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 62,755 |
Your tasks fail in a very early phase, which is very unusual:
2021-12-18 22:53:31 (23968): Status Report: Launching vboxsvc.exe. (PID = '10976')
2021-12-18 22:55:12 (23968): Error in guest additions for VM: -182
Some possible reasons:
1. The VirtualBox installation is incomplete/damaged.
   Remove VirtualBox completely
   Clean all related keys from the registry
   Reboot
   Install VirtualBox from scratch
   Reboot
2. The vboxwrapper currently used for ATLAS may be incompatible with the most recent VirtualBox.
   This is still under investigation and may result in an updated ATLAS app (my guess: not this year).
   To check whether this causes the issue you may run a CMS task. CMS already uses a compatible vboxwrapper.
   Other solution: downgrade VirtualBox to the last 5.x.
3. A virus scanner or incomplete access rights may block vboxwrapper or vboxsvc.exe.
   Check this and remove the blockade.
You have been asked a couple of times to unhide your computers since this would make debugging easier. There's no reason to ignore that request. |
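To put a number on "a very early phase", here is a minimal sketch that measures the time from the first timestamped line of a vboxwrapper stderr.txt to the first line whose message starts with "Error". It assumes the line layout of the two entries quoted above ("YYYY-MM-DD HH:MM:SS (pid): message"); the function names are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the quoted lines: "2021-12-18 22:53:31 (23968): message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): (.*)$")


def parse_line(line):
    """Return (timestamp, message), or None if the line has another layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return stamp, match.group(2)


def seconds_until_first_error(lines):
    """Seconds from the first timestamped line to the first 'Error...' line."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if start is None:
            start = stamp
        if message.startswith("Error"):
            return (stamp - start).total_seconds()
    return None


if __name__ == "__main__":
    quoted = [
        "2021-12-18 22:53:31 (23968): Status Report: Launching vboxsvc.exe. (PID = '10976')",
        "2021-12-18 22:55:12 (23968): Error in guest additions for VM: -182",
    ]
    print(seconds_until_first_error(quoted))  # -> 101.0
```

For the two quoted lines this gives 101 seconds between launching vboxsvc.exe and the -182 error, well before the VM itself would be doing any ATLAS work.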
Joined: 5 Sep 09 Posts: 10 Credit: 1,247,559 RAC: 0 |
"Your Computer is not visible to us." What is it that you are trying to see that I have not provided? I certainly don't want my machine to be visible outside of my network. Or is it just a data payload sent back to the server? If so, I would like to know a lot more about what specific info is in that payload. |
Joined: 25 Sep 17 Posts: 99 Credit: 3,425,566 RAC: 0 |
Click on one of the users that has replied to you in this forum and take a look at what you see under computers and the tasks of that user. |
Joined: 2 May 07 Posts: 2240 Credit: 173,894,884 RAC: 3,757 |
"Your Computer is not visible to us." It's OK to hide your PCs. Yeti's checklist is a useful tool for seeing what is needed to run a vbox task for ATLAS. On the computer info pages we see only a little more information, nothing beyond that. What you can see of our PCs is exactly what we would see of yours, nothing more. |
Joined: 5 Sep 09 Posts: 10 Credit: 1,247,559 RAC: 0 |
Well, I opened up things so you can view my computers... The issues are with Atlas on PC6. Guess what, I've provided FAR, FAR, FAR more diagnostic info than any of those records show. Another tidbit... I see that PC1 is having the same problem. This too has the current Virtualbox, but it's running under Windows 11 (with far fewer resources). In an attempt to repair Virtualbox, I ended up trashing the mouse drivers and had to go a bit crazy going through a recovery without a mouse. Don't buy into some of the fixes for the -182 error that are out there! I just love Windows!!!!! On Linux, Unix, etc. you can do just about anything from the command line. In any case, my next move is to go back to an earlier version of Virtualbox. Tim BTW... is there a way to shut off only Atlas work units from LHC? IDENTICAL is only a concept... |
Joined: 25 Sep 17 Posts: 99 Credit: 3,425,566 RAC: 0 |
On this website, use the Project drop-down, then Preferences. Select or deselect the apps you want. |
Joined: 2 May 07 Posts: 2240 Credit: 173,894,884 RAC: 3,757 |
You can change your prefs for LHC@home to select only sixtrack. This is the only application that does NOT use Virtualbox. BUT its tasks only come in waves: sometimes nothing, sometimes a lot of work. |
Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 62,755 |
Completely (!) disable all sandbox/Hyper-V components. Then do a full (!) reboot (not just a quick restart). Installing the VirtualBox extensions is not a must to run LHC@home tasks. If you install them their version number must be in sync with the main VirtualBox version. |
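A minimal sketch of the "versions must be in sync" check. It assumes version strings of the form printed by `VBoxManage -q --version` in the trace earlier in this thread (for example `6.1.30r148432`), and it takes the extension pack version as a plain string, because the thread does not show how that value is reported. The function names are hypothetical.

```python
import re

# "6.1.30" or "6.1.30r148432": major.minor.patch with an optional revision.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:r\d+)?$")


def parse_version(text):
    """Return (major, minor, patch), ignoring any trailing r<revision>."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def versions_in_sync(virtualbox_version, extpack_version):
    """True when both strings name the same major.minor.patch release."""
    return parse_version(virtualbox_version) == parse_version(extpack_version)


if __name__ == "__main__":
    print(versions_in_sync("6.1.30r148432", "6.1.30"))
    print(versions_in_sync("6.1.30r148432", "6.1.28"))
```

Comparing only major.minor.patch is a design choice for this sketch; if you also want the revision numbers to match, compare the full strings instead.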
Joined: 5 Sep 09 Posts: 10 Credit: 1,247,559 RAC: 0 |
Well, I downgraded to Virtualbox 5.2.44 and the results changed, but they still have errors. Tasks ran for hours instead of 6 minutes each. It looked like it was working but failed in the end. The install used the defaults, so whatever Virtualbox throws in there for sandbox is what's there. Virtualbox will not even initialize if Hyper-V is installed. I've spent too much time on this, so I'm just disabling ATLAS units for now. I will probably need to run a newer Virtualbox anyway, since the 5.2.x and 6.0 versions have been out of support for more than a year. I have other environments to be concerned with that are the priority. Thanks, Tim IDENTICAL is only a concept... |
©2024 CERN