Problems with Theory and CMS on MacOS
Author | Message |
---|---|
Joined: 11 Apr 11 Posts: 23 Credit: 194,876 RAC: 57 |
Hello, I am still trying to find the cause of the failing Theory and CMS tasks, and since I have not received any feedback from the LHC team, I am writing another post. Apparently LHC does not want to fix the issues. Can you disable the Theory and CMS applications for MacOS? Otherwise my computer keeps running LHC tasks with failing CMS and Theory while waiting for the working ATLAS tasks. Please let me know!! |
Joined: 13 Jan 24 Posts: 11 Credit: 3,554,464 RAC: 17,000 |
Until your problem is resolved, you could go to Project > Preferences, select 'Edit preferences' at the bottom, and change the 'Run only the selected applications' setting. You might first try resetting the project on that computer. |
Joined: 15 Jun 08 Posts: 2634 Credit: 272,025,947 RAC: 79,910 |
The missing heartbeat is not the root cause of your problems. Instead, it is the result of earlier errors that need to be identified.
It may help to understand how the heartbeat method works. It is identical on Linux/MacOS/Windows and has not been changed for years.
1. VirtualBox must provide a shared folder on the host.
2. The VM must call a command to mount the shared folder.
3. The VM must download certain scripts from CVMFS (bootstrap).
4. The bootstrap script adds a cron job inside the VM which touches the heartbeat file once a minute from within the VM.
5. Vboxwrapper periodically checks the modification time (st_mtime) of the heartbeat file on the host side and reacts if st_mtime doesn't update.
Check the logs to see whether these steps succeeded.
As for (1.), this looks good:
2025-02-11 20:32:01 (72295): Enabling shared directory for VM. . . .
2025-02-11 20:32:01 (72295): Command: VBoxManage -q sharedfolder add "boinc_42fdb658fcffee8c" --name "shared" --hostpath "/Library/Application Support/BOINC Data/slots/0/shared"
Exit Code: 0
As for (2.) and (3.), the following is taken from another user's log. These lines are missing from your logs, which points to the VM starting but then hanging.
2025-02-15 17:18:43 (6244): Guest Log: [INFO] Mounting the shared directory
2025-02-15 17:18:43 (6244): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2025-02-15 17:18:43 (6244): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
The overall picture suggests your VM does not contact CVMFS to download the bootstrap scripts. Since the cron job never becomes active, vboxwrapper finally shuts down the VM, as intended.
As it works fine on Linux/Windows, it appears to be a local issue, possibly caused by:
- the sandbox feature used on MacOS
- a firewall that doesn't allow connections to CVMFS
- other reasons |
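To make step 5 more concrete, here is a minimal Python sketch of a host-side staleness check. It only illustrates the idea (compare the file's st_mtime with the current time); it is not vboxwrapper's actual code, and the helper name `heartbeat_is_stale` is hypothetical. The file name 'heartbeat' and the 1200-second interval are taken from a wrapper log quoted later in this thread; the slot path is the one from the shared-folder command above.

```python
import os
import time
from typing import Optional


def heartbeat_is_stale(path: str, interval_s: float, now: Optional[float] = None) -> bool:
    """Return True if the heartbeat file is missing or was not touched within interval_s seconds.

    Hypothetical helper that mimics the idea of step 5; not vboxwrapper's implementation.
    """
    if now is None:
        now = time.time()
    try:
        # Last modification time, which the guest's cron job (step 4) should keep updating.
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # The guest never created the file, e.g. because the bootstrap (step 3) did not run.
        return True
    return (now - mtime) > interval_s


if __name__ == "__main__":
    # Slot 0 path from the log above; adjust the slot number for the task you are checking.
    hb = "/Library/Application Support/BOINC Data/slots/0/shared/heartbeat"
    # 1200 s is the heartbeat interval vboxwrapper reported in a later log in this thread.
    print("stale" if heartbeat_is_stale(hb, 1200.0) else "ok")
```

A "stale" result on a running task would match the symptom described here: the VM is up, but nothing inside it is touching the file.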
Joined: 15 Jun 08 Posts: 2634 Credit: 272,025,947 RAC: 79,910 |
This PR on GitHub describes issues on MacOS: https://github.com/BOINC/boinc/pull/6088 It is primarily related to GPU usage, but network connections are also tied to a user. Since CVMFS and even the local shared folder mounts use network functionality, it might be worth investigating whether there is a relationship. |
Joined: 11 Apr 11 Posts: 23 Credit: 194,876 RAC: 57 |
I found the issue! I copied the VM to another folder so it won't get deleted by BOINC and changed its network configuration from NAT to bridged. Time synchronization via NTP works, sourcing the functions works, the heartbeat file is created and the job is running! After switching back to running it via the wrapper + BOINC, the network connections are made and the task runs, but the heartbeat file is no longer updated. |
Joined: 11 Apr 11 Posts: 23 Credit: 194,876 RAC: 57 |
I added the following line to CMS_2025_01_16_prod.xml: <network_bridged_mode/> Now the heartbeat file is created and updated, and the job is running. (Remote Desktop works, but 'Show graphics' does not :( ) Here are all the files from my first test, which ran successfully after changing the NIC from NAT to bridged: https://drive.google.com/drive/folders/1Ogw7XQcV-cgEePuEDK0GJsxmgQ53qMSn?usp=share_link Link to result: https://lhcathome.cern.ch/lhcathome/result.php?resultid=419673918 |
Joined: 15 Jun 08 Posts: 2634 Credit: 272,025,947 RAC: 79,910 |
This is taken from your log:
Command: VBoxManage -q showvminfo "boinc_7e2d147d962cddcd" --machinereadable
bridgeadapter1="en0: Wi-Fi"
nic1="bridged"
It looks like your host is connected via Wi-Fi. This is known to be problematic, especially on MacOS if the guest is set to bridged mode (see the VirtualBox forum). It is better to connect the host via cable. Then leave the VM network at NAT and find out why that is not working. This has worked for years and, according to your earlier posts, also on your host (ATLAS).
Some settings you should check: Is IPv4 enabled on your host/LAN? If yes, which address range does it use? Ensure it does not conflict with 10.0.2.0/24, which VirtualBox uses by default.
The log from the example you mentioned shows it finally failed. Please verify: since you copied the VM a couple of times to switch the network settings, it is not clear under which user account it finally ran. It could have been "nentech", since this name is mentioned in the Hypervisor System Log, but it should have been "boinc_project". Check/ensure the user account running vboxheadless has write permission to ".../slots/n/shared/". |
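For the address-range question, Python's standard `ipaddress` module can tell whether a host network overlaps VirtualBox's default NAT range 10.0.2.0/24 mentioned above. This is a minimal sketch; the helper name `conflicts_with_vbox_nat` and the example addresses are hypothetical, and you would substitute the address/prefix your Mac actually reports for its active interface.

```python
import ipaddress

# VirtualBox's default NAT network, as mentioned in the post above.
VBOX_NAT_DEFAULT = ipaddress.ip_network("10.0.2.0/24")


def conflicts_with_vbox_nat(host_cidr: str) -> bool:
    """Return True if the host/LAN IPv4 network overlaps VirtualBox's default NAT range.

    Hypothetical helper for illustration only.
    """
    # strict=False accepts an interface address such as "192.168.1.23/24"
    # and converts it to its network (192.168.1.0/24).
    net = ipaddress.ip_network(host_cidr, strict=False)
    if net.version != 4:
        # Only IPv4 ranges can clash with the IPv4 NAT default.
        return False
    return net.overlaps(VBOX_NAT_DEFAULT)


if __name__ == "__main__":
    # Example values only; replace them with your host's real address and prefix length.
    for cidr in ("192.168.1.23/24", "10.0.0.5/16"):
        print(cidr, "conflict" if conflicts_with_vbox_nat(cidr) else "ok")
```

A LAN such as 10.0.0.0/16 would contain the NAT range and could confuse routing inside the guest, while a typical 192.168.x.0/24 home network does not overlap it.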
Joined: 11 Apr 11 Posts: 23 Credit: 194,876 RAC: 57 |
I will check that later. A wired network is hard for me to test on my Mac. Here is the latest result, this time without copying the VM but with the task XML changed to bridged: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=230425509 |
Joined: 11 Apr 11 Posts: 23 Credit: 194,876 RAC: 57 |
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=230425509 https://lhcathome.cern.ch/lhcathome/result.php?resultid=419674064 I put all files from a running task in my Google Drive folder: https://drive.google.com/drive/folders/1Ogw7XQcV-cgEePuEDK0GJsxmgQ53qMSn?usp=sharing The files in 'Working Task' are those of the task that is running now. |
Joined: 15 Jun 08 Posts: 2634 Credit: 272,025,947 RAC: 79,910 |
As already mentioned, it would be better to solve the issues with NAT instead of using bridged mode. Bridged mode will not become the default, for good reasons, and using it during tests does not help find the root cause. I suggest opening an issue on the BOINC GitHub to make the MacOS experts there aware, and maybe also reporting it on the VirtualBox forum or their issue tracker. The latter already has reports of NAT issues on Windows that appear every now and then. |
Joined: 11 Apr 11 Posts: 23 Credit: 194,876 RAC: 57 |
I did some more testing in debug mode and found the following: The networking on my Mac is made via IPv6. DNS is not working inside the VM, but ping via IP address works. When I change IPv6 on my Mac to manual mode, DNS works and the machine starts. I just started my job via the BOINC Manager with IPv6 set to manual, and it is running now. Remote Desktop and the web application are working now :) |
Joined: 15 Jun 08 Posts: 2634 Credit: 272,025,947 RAC: 79,910 |
You have been asked not to switch to bridged mode. If you continue doing so, further tests are pretty much useless.
As for DNS: VirtualBox forwards the host's DNS servers to the VM. This is typically shown in your logs:
00:00:00.081170 dns-monitor HostDnsMonitor: new information
00:00:00.081183 dns-monitor server 1: 2001:b88:1002::10
00:00:00.081197 dns-monitor server 2: 2001:b88:1202::10
00:00:00.081211 dns-monitor server 3: 2001:730:3e42:1000::53
00:00:00.081223 dns-monitor server 4: 89.101.251.228
00:00:00.081234 dns-monitor server 5: 89.101.251.229
These seem to be public DNS servers, which can't resolve IPs inside your LAN.
You wrote: "The networking on my Mac is made via IPv6." Did you disable IPv4? If so, you should enable it, since IPv6 doesn't provide NAT. |
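To see at a glance which DNS servers VirtualBox picked up from the host, and how many of them are IPv4 versus IPv6, the dns-monitor lines quoted above can be grouped with a small Python sketch. The regular expression is written against the exact line format shown in this post and may need adjusting for other VirtualBox versions; the helper name `dns_servers_by_family` is hypothetical. It only lists the servers and does not test whether they actually resolve names.

```python
import ipaddress
import re
from typing import Dict, List

# Matches log entries such as "dns-monitor server 1: 2001:b88:1002::10".
SERVER_RE = re.compile(r"dns-monitor\s+server\s+\d+:\s+(\S+)")


def dns_servers_by_family(log_text: str) -> Dict[int, List[str]]:
    """Group the DNS servers reported by VirtualBox's dns-monitor by IP version (4 or 6)."""
    found: Dict[int, List[str]] = {4: [], 6: []}
    for match in SERVER_RE.finditer(log_text):
        # ip_address raises ValueError if the captured token is not a valid IP.
        addr = ipaddress.ip_address(match.group(1))
        found[addr.version].append(str(addr))
    return found


if __name__ == "__main__":
    # Excerpt from the log quoted above.
    sample = """
00:00:00.081183 dns-monitor server 1: 2001:b88:1002::10
00:00:00.081197 dns-monitor server 2: 2001:b88:1202::10
00:00:00.081211 dns-monitor server 3: 2001:730:3e42:1000::53
00:00:00.081223 dns-monitor server 4: 89.101.251.228
00:00:00.081234 dns-monitor server 5: 89.101.251.229
"""
    servers = dns_servers_by_family(sample)
    print("IPv4:", servers[4])
    print("IPv6:", servers[6])
```

Because the pattern is applied with `finditer` over the whole text, it also works when the log lines have been pasted into a single line, as often happens in forum posts.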
Joined: 11 Apr 11 Posts: 23 Credit: 194,876 RAC: 57 |
I still had my tasks running. Last night I tried again with NAT. These 2 tasks ran with NAT enabled and IPv6 set to manual mode without any address configured: https://lhcathome.cern.ch/lhcathome/result.php?resultid=419720786 https://lhcathome.cern.ch/lhcathome/result.php?resultid=419719592 IPv4 was always enabled. I removed the existing DNS servers and added 4 new ones, 2 IPv4 and 2 IPv6: 2606:4700:4700::64 2606:4700:4700::6400 1.1.1.1 8.8.8.8 Now the CMS task is working normally. |
Joined: 15 Jun 08 Posts: 2634 Credit: 272,025,947 RAC: 79,910 |
@Mastha-Hacker Looks good now, especially these logs:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=419737228
https://lhcathome.cern.ch/lhcathome/result.php?resultid=419742735
https://lhcathome.cern.ch/lhcathome/result.php?resultid=419745434
Just to verify that the original issues are solved:
Did you run the task in a sandbox environment under username=boinc_project? This is a must on MacOS. From the logfile:
2025-02-21 04:34:30 (17875): Detected: Sandbox Configuration Enabled
Please confirm: [Yes|No]
Did you use vboxwrapper's default network configuration, i.e. NAT? From the logfile:
2025-02-21 04:34:31 (17875): Setting Network Configuration for NAT.
Please confirm: [Yes|No]
Did you leave the heartbeat check activated as intended by the project, and does the task regularly update the heartbeat file in '.../slots/n/shared/'? From the logfile:
2025-02-21 04:34:30 (17875): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
Please confirm: [Yes|No] |
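The three checks above can be done by searching the wrapper log for the quoted lines. Below is a minimal Python sketch, assuming you have saved the log text (for example the stderr output copied from a result page) to a file; the marker strings are copied from the log lines in this post and other vboxwrapper versions may word them differently. The function `check_wrapper_log` is hypothetical, and finding a line only shows that vboxwrapper reported it, not that the heartbeat file is actually being updated.

```python
import re
import sys

# Marker strings copied from the vboxwrapper log lines quoted above.
SANDBOX_MARKER = "Detected: Sandbox Configuration Enabled"
NAT_MARKER = "Setting Network Configuration for NAT."
HEARTBEAT_RE = re.compile(
    r"Detected: Heartbeat check \(file: '([^']+)' every ([0-9.]+) seconds\)"
)


def check_wrapper_log(log_text: str) -> dict:
    """Report which of the three configuration lines appear in a vboxwrapper log."""
    hb = HEARTBEAT_RE.search(log_text)
    return {
        "sandbox": SANDBOX_MARKER in log_text,
        "nat": NAT_MARKER in log_text,
        # None means the heartbeat line was not found at all.
        "heartbeat_file": hb.group(1) if hb else None,
        "heartbeat_interval_s": float(hb.group(2)) if hb else None,
    }


if __name__ == "__main__":
    # Usage: python check_log.py saved_wrapper_log.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            print(check_wrapper_log(fh.read()))
```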
Joined: 11 Apr 11 Posts: 23 Credit: 194,876 RAC: 57 |
@Mastha-Hacker Yes.
Yes. I used the latest vboxwrapper (26209).
Yes.
So the cause of the last error was a DNS configuration on MacOS that VirtualBox could not work with. |
Joined: 11 Apr 11 Posts: 23 Credit: 194,876 RAC: 57 |
The ATLAS and Theory tasks are also working! :) |
©2025 CERN