CMS Exits with Error 208
Author | Message |
---|---|
Joined: 29 Nov 18 Posts: 41 Credit: 2,644,024 RAC: 25 |
Is Error 208 still a problem? I read two-month-old posts where people were having this problem.
BOINC sets error 208 and the VM exits with:
2025-03-07 18:19:58 (349658): Guest Log: [INFO] Shutting Down.
2025-03-07 18:20:28 (349658): VM Completion File Detected.
2025-03-07 18:20:28 (349658): VM Completion Message: glidein exited with return value 1.
2025-03-07 18:20:28 (349658): Powering off VM.
The only thing I can find in the log (if it is something important):
2025-03-07 17:55:20 (349658): Guest Log: [DEBUG] Could not download a wpad.dat from lhchomeproxy.{cern.ch|fnal.gov}
I've been running around 260 CMS jobs last week/this week without any issues so this is new to me.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=420025070
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
process exited with code 208 (0xd0, -48)</message>
<stderr_txt>
2025-03-07 17:53:21 (349658): vboxwrapper version 26208
2025-03-07 17:53:21 (349658): BOINC client version: 8.0.2
2025-03-07 17:53:23 (349658): Detected: VirtualBox VboxManage Interface (Version: 7.1.6)
2025-03-07 17:53:23 (349658): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2025-03-07 17:53:23 (349658): Successfully copied 'init_data.xml' to the shared directory.
2025-03-07 17:53:23 (349658): Create VM. (boinc_cc90ad06bd34ae64, slot#7)
2025-03-07 17:53:23 (349658): Setting Memory Size for VM. (4584MB)
2025-03-07 17:53:23 (349658): Setting CPU Count for VM. (4)
2025-03-07 17:53:23 (349658): Setting Chipset Options for VM.
2025-03-07 17:53:23 (349658): Setting Graphics Controller Options for VM.
2025-03-07 17:53:23 (349658): Setting Boot Options for VM.
2025-03-07 17:53:23 (349658): Setting Network Configuration for NAT.
2025-03-07 17:53:23 (349658): Enabling VM Network Access.
2025-03-07 17:53:23 (349658): Disabling USB Support for VM.
2025-03-07 17:53:23 (349658): Disabling COM Port Support for VM.
2025-03-07 17:53:23 (349658): Disabling LPT Port Support for VM.
2025-03-07 17:53:23 (349658): Disabling Audio Support for VM.
2025-03-07 17:53:24 (349658): Disabling Clipboard Support for VM.
2025-03-07 17:53:24 (349658): Disabling Drag and Drop Support for VM.
2025-03-07 17:53:24 (349658): Adding storage controller(s) to VM.
2025-03-07 17:53:24 (349658): Adding virtual disk drive to VM. (CMS_2025_01_16_prod.vdi)
2025-03-07 17:53:25 (349658): Attempts: 2
2025-03-07 17:53:25 (349658): Adding VirtualBox Guest Additions to VM.
2025-03-07 17:53:25 (349658): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2025-03-07 17:53:25 (349658): forwarding host port 43229 to guest port 80
2025-03-07 17:53:25 (349658): Enabling remote desktop for VM.
2025-03-07 17:53:25 (349658): Required extension pack not installed, remote desktop not enabled.
2025-03-07 17:53:25 (349658): Enabling shared directory for VM.
2025-03-07 17:53:26 (349658): Starting VM using VBoxManage interface. (boinc_cc90ad06bd34ae64, slot#7)
2025-03-07 17:53:27 (349658): Successfully started VM. (PID = '353524')
2025-03-07 17:53:27 (349658): Reporting VM Process ID to BOINC.
2025-03-07 17:53:27 (349658): Guest Log: BIOS: VirtualBox 7.1.6
2025-03-07 17:53:27 (349658): Guest Log: CPUID EDX: 0x178bfbff
2025-03-07 17:53:27 (349658): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2025-03-07 17:53:27 (349658): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2025-03-07 17:53:27 (349658): VM state change detected. (old = 'poweredoff', new = 'running')
2025-03-07 17:53:27 (349658): Detected: Web Application Enabled (http://localhost:43229)
2025-03-07 17:53:27 (349658): Preference change detected
2025-03-07 17:53:27 (349658): Setting CPU throttle for VM. (100%)
2025-03-07 17:53:27 (349658): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2025-03-07 17:53:29 (349658): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2025-03-07 17:53:29 (349658): Guest Log: BIOS: Booting from Hard Disk...
2025-03-07 17:53:31 (349658): Guest Log: BIOS: KBD: unsupported int 16h function 03
2025-03-07 17:53:31 (349658): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2025-03-07 17:54:21 (349658): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2025-03-07 17:54:21 (349658): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2025-03-07 17:54:22 (349658): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000142 main Log opened 2025-03-07T16:54:22.063682000Z
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000254 main OS Product: Linux
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000280 main OS Release: 4.14.232-19.cernvm.x86_64
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000299 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000316 main Executable: /usr/sbin/VBoxService
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000316 main Process ID: 2285
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000317 main Package type: LINUX_64BITS_GENERIC
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.003826 main 5.2.6 r120293 started. Verbose level = 0
2025-03-07 17:54:35 (349658): Guest Log: [INFO] Mounting the shared directory
2025-03-07 17:54:35 (349658): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2025-03-07 17:54:36 (349658): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
2025-03-07 17:54:36 (349658): Guest Log: [INFO] Testing connection to cern.ch
2025-03-07 17:54:36 (349658): Guest Log: [INFO] Testing connection to VCCS
2025-03-07 17:54:37 (349658): Guest Log: [INFO] Testing connection to HTCondor
2025-03-07 17:54:37 (349658): Guest Log: [INFO] Testing connection to WMAgent
2025-03-07 17:54:37 (349658): Guest Log: [INFO] Testing connection to EOSCMS
2025-03-07 17:54:38 (349658): Guest Log: [INFO] Testing connection to CMS-Factory
2025-03-07 17:54:38 (349658): Guest Log: [INFO] Testing connection to CMS-Frontier
2025-03-07 17:54:38 (349658): Guest Log: [INFO] Testing connection to Frontier
2025-03-07 17:55:20 (349658): Guest Log: [DEBUG] Could not download a wpad.dat from lhchomeproxy.{cern.ch|fnal.gov}
2025-03-07 17:55:20 (349658): Guest Log: [INFO] Got a proxy from the local BOINC client
2025-03-07 17:55:20 (349658): Guest Log: [INFO] Will use it for CVMFS and Frontier
2025-03-07 17:55:21 (349658): Guest Log: [INFO] Reloading and probing the CVMFS configuration
2025-03-07 17:55:28 (349658): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2025-03-07 17:55:31 (349658): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2025-03-07 17:55:31 (349658): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2025-03-07 17:55:32 (349658): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2025-03-07 17:55:32 (349658): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2025-03-07 17:55:33 (349658): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2025-03-07 17:55:33 (349658): Guest Log: [INFO] 2.7.2.0 http://s1swinburne-cvmfs.openhtc.io:8080 http://192.168.11.5:3128
2025-03-07 17:55:33 (349658): Guest Log: [INFO] Environment HTTP proxy: http://192.168.11.5:3128
2025-03-07 17:55:34 (349658): Guest Log: [INFO] Reading volunteer information
2025-03-07 17:55:43 (349658): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2025-03-07 17:55:46 (349658): Guest Log: [INFO] Requesting an idtoken from LHC@home
2025-03-07 17:55:47 (349658): Guest Log: [INFO] CMS application starting. Check log files.
2025-03-07 18:19:58 (349658): Guest Log: [ERROR] glidein exited with return value 1.
2025-03-07 18:19:58 (349658): Guest Log: [DEBUG] Volunteer: seanr22a (579282)
2025-03-07 18:19:58 (349658): Guest Log: [INFO] Shutting Down.
2025-03-07 18:20:28 (349658): VM Completion File Detected.
2025-03-07 18:20:28 (349658): VM Completion Message: glidein exited with return value 1. .
2025-03-07 18:20:28 (349658): Powering off VM.
2025-03-07 18:20:28 (349658): Successfully stopped VM.
2025-03-07 18:20:28 (349658): Deregistering VM. (boinc_cc90ad06bd34ae64, slot#7)
2025-03-07 18:20:28 (349658): Removing network bandwidth throttle group from VM.
2025-03-07 18:20:28 (349658): Removing VM from VirtualBox.
2025-03-07 18:20:33 (349658): called boinc_finish(208)
</stderr_txt>
]]> |
Joined: 15 Jun 08 Posts: 2683 Credit: 286,886,839 RAC: 54,793 |
2025-03-07 17:55:08 (349622): Guest Log: [DEBUG] Could not download a wpad.dat from lhchomeproxy.{cern.ch|fnal.gov}
This is very unusual since CERN and Fermilab share 4 instances of that service for load balancing and fallback. I just tested them and all are responding.
Did you modify your local network setup? Some routers can be configured to reject downloading wpad files.
Your logs on other computers from earlier this week do not show this debug message. Is this still the case? If so, it's a configuration issue on that computer and there's no real chance to solve this from here.
Anyway, your VMs use a local HTTP proxy. A local proxy always has a higher priority than proxies from lhchomeproxy.{cern.ch|fnal.gov}.
2025-03-07 17:55:08 (349622): Guest Log: [INFO] Got a proxy from the local BOINC client
2025-03-07 17:55:08 (349622): Guest Log: [INFO] Will use it for CVMFS and Frontier
The log does not show any obvious error. In case the network or VirtualBox somehow got stuck, you may want to check whether a reboot helps.
Another hint: Please post only a few lines of the stderr.txt, namely those you think may point to an error. If you add a link to the relevant task we can check the log there (at least for a couple of days). |
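The advice to post only the relevant lines can be approximated with a small filter. This is a minimal Python sketch, not part of BOINC or vboxwrapper; the function name `interesting_lines` and the choice of markers (`[ERROR]`, `[DEBUG]`, the VM completion message and the exit-code line) are assumptions made for illustration.

```python
import re

# Markers assumed to be worth quoting in a post; adjust to taste.
INTERESTING = re.compile(
    r"\[(ERROR|DEBUG)\]|VM Completion Message|process exited with code"
)

def interesting_lines(stderr_text: str) -> list:
    """Return the stderr lines that contain one of the assumed markers."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if INTERESTING.search(line)
    ]

# Four lines copied from the log posted in this thread.
sample = """\
2025-03-07 17:54:38 (349658): Guest Log: [INFO] Testing connection to Frontier
2025-03-07 17:55:20 (349658): Guest Log: [DEBUG] Could not download a wpad.dat from lhchomeproxy.{cern.ch|fnal.gov}
2025-03-07 18:19:58 (349658): Guest Log: [ERROR] glidein exited with return value 1.
2025-03-07 18:20:28 (349658): VM Completion Message: glidein exited with return value 1.
"""

for line in interesting_lines(sample):
    print(line)
```

The `[INFO]` line is dropped and the other three are kept, which is roughly the excerpt that would be useful next to a link to the task.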
Joined: 29 Nov 18 Posts: 41 Credit: 2,644,024 RAC: 25 |
The ATLAS and Theory apps run fine on this server, except yesterday morning when a power outage killed three Theory jobs. This was the first time running CMS on this server.
This server is in Thailand; the other ones I have are in Sweden. Internet speeds to Europe are between 0.5Mb and 70Mb depending on the daily mode, ping to cern.ch 230ms. Maybe CMS is more sensitive to internet speeds/response times than ATLAS and Theory.
Found the possible WPAD issue: the squid proxy used the wrong DNS server. It used my local Pihole, intended for phones/desktops to get rid of all annoying ads, and it blocked WPAD.
Will reboot the server as soon as the running Theory jobs are finished. Then download only one CMS job and see what happens.
Thanks! |
Joined: 29 Nov 18 Posts: 41 Credit: 2,644,024 RAC: 25 |
Problem solved. It was two separate issues.
1. Wrong DNS server for the squid proxy at my Thai site. It used my local Pihole, which blocks wpad. Fix: Changed to the correct DNS in my firewall.
2. The download of wpad.dat triggers a firewall rule at my Thai site that blocks the IP -> 2046211 ET INFO WinHttp AutoProxy Request wpad.dat Possible BadTunnel. Fix: disabled the rule for the BOINC server IP.
--------------
Fixing only the DNS did not solve the problem so I tried to download wpad.dat manually.
Thai site:
root@pm111:~# wget http://lhchomeproxy.cern.ch/wpad.dat
--2025-03-08 07:47:56-- http://lhchomeproxy.cern.ch/wpad.dat
Resolving lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)... 128.142.248.156, 128.142.35.143, 2001:1458:301:47::100:c, ...
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|128.142.248.156|:80... failed: Connection timed out.
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|128.142.35.143|:80... failed: Connection timed out.
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|2001:1458:301:47::100:c|:80... failed: Network is unreachable.
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|2001:1458:301:73::100:9b|:80... failed: Network is unreachable.
Swedish site:
root@pm104:~# wget http://lhchomeproxy.cern.ch/wpad.dat
--2025-03-08 07:47:28-- http://lhchomeproxy.cern.ch/wpad.dat
Resolving lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)... 128.142.248.156, 128.142.35.143, 2001:1458:301:73::100:9b, ...
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|128.142.248.156|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 258 [text/plain]
Saving to: ‘wpad.dat’
wpad.dat 100%[====================================================================================>] 258 --.-KB/s in 0s
2025-03-08 07:47:28 (28.7 MB/s) - ‘wpad.dat’ saved [258/258]
This pointed me to the firewall at the Thai site, which I found had a lot of alerts for the 2046211 rule, and the CERN IPs were blocked. At some point in time I had disabled this rule at the Swedish site but not at the Thai site. So it was, as usual, user error :)
CMS runs fine now at my Thai site!
[Edit] I was happy too early, it crashed again, with error 206 this time. It had triggered another firewall rule: 2031747 ET INFO Observed Interesting Content-Type Inbound (application/x-sh), causing it to block vocms0204.cern.ch
Ok, try again ...
[EDIT2] 10 minutes cpu time now .. fingers crossed |
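The comparison of the two wget runs comes down to reading the per-address outcome of each connection attempt. Here is a minimal Python sketch of that reading, assuming wget's `Connecting to HOST (HOST)|ADDRESS|:PORT... OUTCOME` wording as it appears in the post; the helper name `connection_results` is hypothetical.

```python
import re

# One entry per attempt: "Connecting to HOST (HOST)|ADDRESS|:PORT... OUTCOME"
CONNECT = re.compile(
    r"Connecting to \S+ \(\S+\)\|([^|]+)\|:\d+\.\.\. "
    r"(connected\.|failed: [^.]+\.)"
)

def connection_results(wget_output: str) -> dict:
    """Map each attempted address to the outcome wget reported for it."""
    return {address: outcome for address, outcome in CONNECT.findall(wget_output)}

# Excerpts taken from the two runs quoted in the post.
THAI = (
    "Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|128.142.35.143|:80... "
    "failed: Connection timed out. "
    "Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|2001:1458:301:47::100:c|:80... "
    "failed: Network is unreachable."
)
SWEDISH = (
    "Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|128.142.248.156|:80... "
    "connected."
)

for site, text in (("Thai", THAI), ("Swedish", SWEDISH)):
    for address, outcome in connection_results(text).items():
        print(f"{site:8} {address:26} {outcome}")
```

In the Thai excerpt the IPv4 attempt times out while the IPv6 attempt reports an unreachable network. The latter usually just means the host has no IPv6 route, whereas a timeout is consistent with packets being dropped on the way, which is what the firewall rule turned out to be doing here.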
Joined: 15 Jun 08 Posts: 2683 Credit: 286,886,839 RAC: 54,793 |
Internet speeds to Europe is between 0.5Mb to 70Mb depending on the daily mode, ping to cern.ch 230ms
Can you provide a few more details?
Is this for your server in Thailand?
'0.5Mb to 70Mb': is it the usual megabits per second?
Is it download or upload (both are important)?
The span is rather high. Why?
Is it cable connected or any kind of wireless?
Is it directly connected to the internet or via a VPN?
It used my local Pihole
How much RAM does it have and what kind of 'disk' storage? |
Joined: 29 Nov 18 Posts: 41 Credit: 2,644,024 RAC: 25 |
This is a server I have in Thailand. The ones in Sweden run just fine. I'm moving more and more stuff down here, so the Swedish servers will be moved down here within a couple of years.
Internet within Thailand is fast, but as soon as you go outside, at least to Europe, it slows to a crawl. They probably set the traffic priority for an end user like me to the lowest possible when they route outside Thailand.
I have two fiber lines with 1Gb/1Gb each (two different ISPs). Doing a speed test within Thailand I get around 920Mb up/down on both. All computers are wired 1Gb.
Download from Europe seems to be on the faster side, but it all seems to depend on where the host is in Europe and the time of day. An example: BOINC downloading the big CMS_2025_01_16_prod.vdi when you run the CMS project (I think it was around 1.5GB), download speed 0.5Mbit-1Mbit according to BOINC - it took almost an hour to get the file downloaded. So, the variation in speed is all over the place.
As I commented in my previous post, it seems to be working now. I have had one CMS job running for 4 hours now.
The Pihole is out of the loop now. My servers get DNS from the firewall but I had forgotten to do the same with the squid proxy VM. Now the squid proxy has DNS directly from the firewall. |
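The download figures in the post can be sanity-checked with a little arithmetic. The sketch below is only a consistency check in Python under stated assumptions: the file size is the roughly 1.5GB recalled in the post, decimal units are used (1 GB = 10^9 bytes), and the rates are tried both as megabits and as megabytes per second because the post does not settle which unit was displayed.

```python
def download_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Time to transfer size_bytes at a constant rate given in bits per second."""
    return size_bytes * 8 / rate_bits_per_s

SIZE = 1.5e9  # about 1.5 GB, as recalled in the post (assumed decimal units)

RATES = [
    ("0.5 Mbit/s", 0.5e6),
    ("1 Mbit/s", 1.0e6),
    ("0.5 MByte/s", 0.5e6 * 8),
    ("1 MByte/s", 1.0e6 * 8),
]

for label, rate in RATES:
    minutes = download_seconds(SIZE, rate) / 60
    print(f"{label:>12}: {minutes:4.0f} min")
```

Under these assumptions, 0.5 to 1 Mbit/s would mean 200 to 400 minutes, while 0.5 to 1 MByte/s gives 25 to 50 minutes. The "almost an hour" in the post fits the second reading better, so the displayed rate may have been in bytes rather than bits, or the size or rate was remembered loosely.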
Joined: 15 Jun 08 Posts: 2683 Credit: 286,886,839 RAC: 54,793 |
Your internet connection is fast enough.
squid proxy VM
Looking at the #cores you have already attached and the #cores you plan to add, it is recommended to run Squid on dedicated hardware rather than in a VM. The #cores for that machine is less critical (>=4), but it should have at least 16GB RAM. You could also use that machine as a local DNS proxy.
If CVMFS and Frontier (inside the CMS VM) are correctly configured they connect to Cloudflare's CDN rather than directly to CERN. That way you should get the data from a Cloudflare datacenter in Thailand (or close to it).
You can check this using wget:
wget --timeout=10 -4 -qdO- http://s1cern-cvmfs.openhtc.io/cvmfs/cvmfs-config.cern.ch/.cvmfspublished >/dev/null
Close to the end of the headers you find a line like this:
CF-RAY: 91d180c19bb99212-MUC
MUC stands for the IATA code of the datacenter answering the request. In my example it's Munich, but this may change depending on Cloudflare (I sometimes get others like AMS for Amsterdam or LHR for London). |
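Picking the datacenter code out of the header dump can be scripted. This is a minimal Python sketch, assuming the headers have already been captured into a string (wget writes its debug output to stderr); the function name `cf_datacenter` is hypothetical and the city table holds only the three codes named in the post.

```python
import re
from typing import Optional

# "CF-RAY: <hex id>-<three-letter code>"; header names are case-insensitive.
CF_RAY = re.compile(
    r"^\s*CF-RAY:\s*[0-9a-f]+-([A-Za-z]{3})\s*$",
    re.IGNORECASE | re.MULTILINE,
)

# Only the codes mentioned in the post; anything else is reported as unlisted.
KNOWN = {"MUC": "Munich", "AMS": "Amsterdam", "LHR": "London"}

def cf_datacenter(headers: str) -> Optional[str]:
    """Return the three-letter code from a CF-RAY header, or None if absent."""
    match = CF_RAY.search(headers)
    return match.group(1).upper() if match else None

sample = "HTTP/1.1 200 OK\nCF-RAY: 91d180c19bb99212-MUC\n"
code = cf_datacenter(sample)
print(code, KNOWN.get(code, "not listed here"))
```

A `None` result would suggest the response did not pass through Cloudflare at all, which is worth checking before reading anything into the code itself.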
Joined: 29 Nov 18 Posts: 41 Credit: 2,644,024 RAC: 25 |
I had 2 cores/4GB memory - changed to 4 cores/16GB memory for now at both sites.
Yes, it's using the CDN: CVMFS_USE_CDN=yes
At the Thai site I get CF-RAY: 91d22df67daacde6-SIN. Maybe Singapore? At the Swedish site I get CF-RAY: 91d241bf2fd89294-CPH. Maybe Copenhagen? |
Joined: 15 Jun 08 Posts: 2683 Credit: 286,886,839 RAC: 54,793 |
At the Thai site I get CF-RAY: 91d22df67daacde6-SIN. Maybe Singapore?
Yes.
At the Swedish site I get CF-RAY: 91d241bf2fd89294-CPH. Maybe Copenhagen?
Yes. |
©2025 CERN