Message boards : CMS Application : CMS Exits with Error 208

seanr22a

Joined: 29 Nov 18
Posts: 40
Credit: 2,580,683
RAC: 985
Message 51650 - Posted: 7 Mar 2025, 18:02:44 UTC

Is Error 208 still a problem? I read two-month-old posts where people were having this problem.

BOINC sets error 208 and the VM exits with:
2025-03-07 18:19:58 (349658): Guest Log: [INFO] Shutting Down.
2025-03-07 18:20:28 (349658): VM Completion File Detected.
2025-03-07 18:20:28 (349658): VM Completion Message: glidein exited with return value 1.
2025-03-07 18:20:28 (349658): Powering off VM.

The only thing I can find in the log (in case it is important): 2025-03-07 17:55:20 (349658): Guest Log: [DEBUG] Could not download a wpad.dat from lhchomeproxy.{cern.ch|fnal.gov}

I've run around 260 CMS jobs last week and this week without any issues, so this is new to me.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=420025070

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
process exited with code 208 (0xd0, -48)</message>
<stderr_txt>
2025-03-07 17:53:21 (349658): vboxwrapper version 26208
2025-03-07 17:53:21 (349658): BOINC client version: 8.0.2
2025-03-07 17:53:23 (349658): Detected: VirtualBox VboxManage Interface (Version: 7.1.6)
2025-03-07 17:53:23 (349658): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2025-03-07 17:53:23 (349658): Successfully copied 'init_data.xml' to the shared directory.
2025-03-07 17:53:23 (349658): Create VM. (boinc_cc90ad06bd34ae64, slot#7)
2025-03-07 17:53:23 (349658): Setting Memory Size for VM. (4584MB)
2025-03-07 17:53:23 (349658): Setting CPU Count for VM. (4)
2025-03-07 17:53:23 (349658): Setting Chipset Options for VM.
2025-03-07 17:53:23 (349658): Setting Graphics Controller Options for VM.
2025-03-07 17:53:23 (349658): Setting Boot Options for VM.
2025-03-07 17:53:23 (349658): Setting Network Configuration for NAT.
2025-03-07 17:53:23 (349658): Enabling VM Network Access.
2025-03-07 17:53:23 (349658): Disabling USB Support for VM.
2025-03-07 17:53:23 (349658): Disabling COM Port Support for VM.
2025-03-07 17:53:23 (349658): Disabling LPT Port Support for VM.
2025-03-07 17:53:23 (349658): Disabling Audio Support for VM.
2025-03-07 17:53:24 (349658): Disabling Clipboard Support for VM.
2025-03-07 17:53:24 (349658): Disabling Drag and Drop Support for VM.
2025-03-07 17:53:24 (349658): Adding storage controller(s) to VM.
2025-03-07 17:53:24 (349658): Adding virtual disk drive to VM. (CMS_2025_01_16_prod.vdi)
2025-03-07 17:53:25 (349658): Attempts: 2
2025-03-07 17:53:25 (349658): Adding VirtualBox Guest Additions to VM.
2025-03-07 17:53:25 (349658): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2025-03-07 17:53:25 (349658): forwarding host port 43229 to guest port 80
2025-03-07 17:53:25 (349658): Enabling remote desktop for VM.
2025-03-07 17:53:25 (349658): Required extension pack not installed, remote desktop not enabled.
2025-03-07 17:53:25 (349658): Enabling shared directory for VM.
2025-03-07 17:53:26 (349658): Starting VM using VBoxManage interface. (boinc_cc90ad06bd34ae64, slot#7)
2025-03-07 17:53:27 (349658): Successfully started VM. (PID = '353524')
2025-03-07 17:53:27 (349658): Reporting VM Process ID to BOINC.
2025-03-07 17:53:27 (349658): Guest Log: BIOS: VirtualBox 7.1.6
2025-03-07 17:53:27 (349658): Guest Log: CPUID EDX: 0x178bfbff
2025-03-07 17:53:27 (349658): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2025-03-07 17:53:27 (349658): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2025-03-07 17:53:27 (349658): VM state change detected. (old = 'poweredoff', new = 'running')
2025-03-07 17:53:27 (349658): Detected: Web Application Enabled (http://localhost:43229)
2025-03-07 17:53:27 (349658): Preference change detected
2025-03-07 17:53:27 (349658): Setting CPU throttle for VM. (100%)
2025-03-07 17:53:27 (349658): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2025-03-07 17:53:29 (349658): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2025-03-07 17:53:29 (349658): Guest Log: BIOS: Booting from Hard Disk...
2025-03-07 17:53:31 (349658): Guest Log: BIOS: KBD: unsupported int 16h function 03
2025-03-07 17:53:31 (349658): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2025-03-07 17:54:21 (349658): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2025-03-07 17:54:21 (349658): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2025-03-07 17:54:22 (349658): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000142 main Log opened 2025-03-07T16:54:22.063682000Z
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000254 main OS Product: Linux
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000280 main OS Release: 4.14.232-19.cernvm.x86_64
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000299 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000316 main Executable: /usr/sbin/VBoxService
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000316 main Process ID: 2285
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.000317 main Package type: LINUX_64BITS_GENERIC
2025-03-07 17:54:22 (349658): Guest Log: 00:00:00.003826 main 5.2.6 r120293 started. Verbose level = 0
2025-03-07 17:54:35 (349658): Guest Log: [INFO] Mounting the shared directory
2025-03-07 17:54:35 (349658): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2025-03-07 17:54:36 (349658): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
2025-03-07 17:54:36 (349658): Guest Log: [INFO] Testing connection to cern.ch
2025-03-07 17:54:36 (349658): Guest Log: [INFO] Testing connection to VCCS
2025-03-07 17:54:37 (349658): Guest Log: [INFO] Testing connection to HTCondor
2025-03-07 17:54:37 (349658): Guest Log: [INFO] Testing connection to WMAgent
2025-03-07 17:54:37 (349658): Guest Log: [INFO] Testing connection to EOSCMS
2025-03-07 17:54:38 (349658): Guest Log: [INFO] Testing connection to CMS-Factory
2025-03-07 17:54:38 (349658): Guest Log: [INFO] Testing connection to CMS-Frontier
2025-03-07 17:54:38 (349658): Guest Log: [INFO] Testing connection to Frontier
2025-03-07 17:55:20 (349658): Guest Log: [DEBUG] Could not download a wpad.dat from lhchomeproxy.{cern.ch|fnal.gov}
2025-03-07 17:55:20 (349658): Guest Log: [INFO] Got a proxy from the local BOINC client
2025-03-07 17:55:20 (349658): Guest Log: [INFO] Will use it for CVMFS and Frontier
2025-03-07 17:55:21 (349658): Guest Log: [INFO] Reloading and probing the CVMFS configuration
2025-03-07 17:55:28 (349658): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2025-03-07 17:55:31 (349658): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2025-03-07 17:55:31 (349658): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2025-03-07 17:55:32 (349658): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2025-03-07 17:55:32 (349658): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2025-03-07 17:55:33 (349658): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2025-03-07 17:55:33 (349658): Guest Log: [INFO] 2.7.2.0 http://s1swinburne-cvmfs.openhtc.io:8080 http://192.168.11.5:3128
2025-03-07 17:55:33 (349658): Guest Log: [INFO] Environment HTTP proxy: http://192.168.11.5:3128
2025-03-07 17:55:34 (349658): Guest Log: [INFO] Reading volunteer information
2025-03-07 17:55:43 (349658): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2025-03-07 17:55:46 (349658): Guest Log: [INFO] Requesting an idtoken from LHC@home
2025-03-07 17:55:47 (349658): Guest Log: [INFO] CMS application starting. Check log files.
2025-03-07 18:19:58 (349658): Guest Log: [ERROR] glidein exited with return value 1.
2025-03-07 18:19:58 (349658): Guest Log: [DEBUG] Volunteer: seanr22a (579282)
2025-03-07 18:19:58 (349658): Guest Log: [INFO] Shutting Down.
2025-03-07 18:20:28 (349658): VM Completion File Detected.
2025-03-07 18:20:28 (349658): VM Completion Message: glidein exited with return value 1.
.
2025-03-07 18:20:28 (349658): Powering off VM.
2025-03-07 18:20:28 (349658): Successfully stopped VM.
2025-03-07 18:20:28 (349658): Deregistering VM. (boinc_cc90ad06bd34ae64, slot#7)
2025-03-07 18:20:28 (349658): Removing network bandwidth throttle group from VM.
2025-03-07 18:20:28 (349658): Removing VM from VirtualBox.
2025-03-07 18:20:33 (349658): called boinc_finish(208)

</stderr_txt>
]]>
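For reference, the three numbers in "process exited with code 208 (0xd0, -48)" appear to be one exit status written three ways: decimal, hexadecimal, and reinterpreted as a signed 8-bit value. A minimal bash sketch of that conversion; the function name is hypothetical, and the output layout is copied from the message above, not taken from BOINC's source:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an exit status as decimal, hex and signed 8-bit,
# in the same "208 (0xd0, -48)" layout as the message above.
exit_code_forms() {
  local code=$(( $1 & 0xff ))                 # exit statuses are 8 bits wide
  local signed=$(( (code ^ 0x80) - 0x80 ))    # two's-complement reinterpretation
  printf '%d (0x%x, %d)\n' "$code" "$code" "$signed"
}
```

The number itself only says that the task failed; the cause has to come from the log lines before it, here "glidein exited with return value 1".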
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2636
Credit: 277,148,051
RAC: 144,143
Message 51651 - Posted: 7 Mar 2025, 19:08:34 UTC - in response to Message 51650.  

2025-03-07 17:55:08 (349622): Guest Log: [DEBUG] Could not download a wpad.dat from lhchomeproxy.{cern.ch|fnal.gov}

This is very unusual, since CERN and Fermilab share 4 instances of that service for load balancing and fallback.
I just tested them and all are responding.

Did you modify your local network setup?
Some routers can be configured to reject wpad file downloads.
Your logs from other computers earlier this week do not show this debug message.
Is this still the case?
If so, it's a configuration issue on that computer, and there's no real way to solve it from here.

Anyway, your VMs use a local HTTP proxy.
A local proxy always has a higher priority than proxies from lhchomeproxy.{cern.ch|fnal.gov}.
2025-03-07 17:55:08 (349622): Guest Log: [INFO] Got a proxy from the local BOINC client
2025-03-07 17:55:08 (349622): Guest Log: [INFO] Will use it for CVMFS and Frontier
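The priority rule described here (a proxy reported by the local BOINC client wins over one discovered through wpad.dat) can be illustrated with a small bash function. This is a hypothetical sketch of the rule as stated, not the script the CMS VM actually runs; the function name and the "DIRECT" fallback are assumptions:

```bash
# Hypothetical illustration of the stated priority: local BOINC proxy first,
# then a proxy taken from lhchomeproxy's wpad.dat, otherwise no proxy at all.
pick_proxy() {
  local boinc_proxy=$1 wpad_proxy=$2
  if [ -n "$boinc_proxy" ]; then
    echo "$boinc_proxy"
  elif [ -n "$wpad_proxy" ]; then
    echo "$wpad_proxy"
  else
    echo "DIRECT"
  fi
}
# With the values from this log, pick_proxy http://192.168.11.5:3128 ""
# returns the local Squid, matching "Got a proxy from the local BOINC client".
```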


The log does not show any obvious error.
In case the network or VirtualBox somehow got stuck, you could check whether a reboot helps.


Another hint:
Please post only the few lines of stderr.txt that you think point to an error.
If you add a link to the relevant task, we can check the full log there (it stays available for at least a couple of days).
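One way to pick out the lines worth posting is to filter stderr.txt for the guest's [ERROR] and [DEBUG] tags and for the exit and completion lines. A minimal sketch; the function name is hypothetical and the pattern list is an assumption based on the log in this thread, not an official filter:

```bash
# Hypothetical filter: keep only the lines of a vboxwrapper stderr.txt that
# usually point at a problem (tagged guest messages, exit/completion lines).
interesting_lines() {
  grep -E '\[(ERROR|DEBUG)\]|exited with|boinc_finish|Completion Message' "$1"
}
# Example: interesting_lines stderr.txt
```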
seanr22a

Joined: 29 Nov 18
Posts: 40
Credit: 2,580,683
RAC: 985
Message 51652 - Posted: 8 Mar 2025, 1:46:14 UTC - in response to Message 51651.  

The ATLAS and Theory apps run fine on this server, except yesterday morning, when a power outage killed three Theory jobs.

This was the first time I ran CMS on this server. This server is in Thailand; my other ones are in Sweden. Internet speed to Europe is between 0.5Mb and 70Mb depending on the time of day, and ping to cern.ch is 230 ms. Maybe CMS is more sensitive to internet speed and response times than ATLAS and Theory.

I found the likely WPAD issue: the Squid proxy used the wrong DNS server. It used my local Pihole, which is meant to get rid of annoying ads on phones and desktops, and it blocked WPAD.
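A quick way to see whether a particular DNS server is filtering a name is to query that server directly with dig. A minimal sketch, assuming dig is installed and that a blocking resolver answers with nothing or with 0.0.0.0 (Pi-hole's default blocking reply); the function names and the server address in the comment are hypothetical:

```bash
# Hypothetical check: ask one specific DNS server for a name and report
# whether the answer looks like a block (empty answer, or 0.0.0.0).
is_blocked_answer() {
  # $1: the output of "dig +short" (may be empty or span several lines)
  [ -z "$1" ] || printf '%s\n' "$1" | grep -qxE '0\.0\.0\.0'
}
check_dns() {
  local server=$1 name=$2 answer
  answer=$(dig +short "@$server" "$name" A)
  if is_blocked_answer "$answer"; then echo "BLOCKED"; else echo "OK"; fi
}
# Example (placeholder address): check_dns 192.168.11.1 lhchomeproxy.cern.ch
```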

I will reboot the server as soon as the running Theory jobs are finished, then download only one CMS job and see what happens.

Thanks!
seanr22a

Joined: 29 Nov 18
Posts: 40
Credit: 2,580,683
RAC: 985
Message 51653 - Posted: 8 Mar 2025, 7:31:51 UTC - in response to Message 51652.  
Last modified: 8 Mar 2025, 8:24:03 UTC

Problem solved. There were two separate issues.

1. Wrong DNS server for the Squid proxy at my Thai site. It used my local Pihole, which blocks WPAD. Fix: changed it to the correct DNS server in my firewall.
2. Downloading wpad.dat triggered a firewall rule at my Thai site, which then blocked the IP: 2046211 ET INFO WinHttp AutoProxy Request wpad.dat Possible BadTunnel. Fix: disabled the rule for the BOINC server's IP.
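Squid uses the servers listed in its "dns_nameservers" directive if one is set, and otherwise falls back to /etc/resolv.conf, so both files matter when checking which resolver the proxy really talks to. A minimal sketch; the function name is hypothetical and the default squid.conf path is an assumption (it differs between distributions):

```bash
# Hypothetical check: print the DNS servers a Squid instance will use.
# "dns_nameservers" in squid.conf wins; otherwise the nameserver lines
# from resolv.conf apply. Commented-out directives are ignored.
squid_dns_servers() {
  local conf=${1:-/etc/squid/squid.conf} resolv=${2:-/etc/resolv.conf}
  local servers
  servers=$(awk '$1 == "dns_nameservers" { for (i = 2; i <= NF; i++) print $i }' "$conf")
  if [ -z "$servers" ]; then
    servers=$(awk '$1 == "nameserver" { print $2 }' "$resolv")
  fi
  printf '%s\n' "$servers"
}
```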

--------------

Fixing the DNS alone did not solve the problem, so I tried to download wpad.dat manually:

Thai site:
root@pm111:~# wget http://lhchomeproxy.cern.ch/wpad.dat
--2025-03-08 07:47:56-- http://lhchomeproxy.cern.ch/wpad.dat
Resolving lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)... 128.142.248.156, 128.142.35.143, 2001:1458:301:47::100:c, ...
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|128.142.248.156|:80... failed: Connection timed out.
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|128.142.35.143|:80... failed: Connection timed out.
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|2001:1458:301:47::100:c|:80... failed: Network is unreachable.
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|2001:1458:301:73::100:9b|:80... failed: Network is unreachable.

Swedish site:
root@pm104:~# wget http://lhchomeproxy.cern.ch/wpad.dat
--2025-03-08 07:47:28-- http://lhchomeproxy.cern.ch/wpad.dat
Resolving lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)... 128.142.248.156, 128.142.35.143, 2001:1458:301:73::100:9b, ...
Connecting to lhchomeproxy.cern.ch (lhchomeproxy.cern.ch)|128.142.248.156|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 258 [text/plain]
Saving to: ‘wpad.dat’
wpad.dat 100%[====================================================================================>] 258 --.-KB/s in 0s
2025-03-08 07:47:28 (28.7 MB/s) - ‘wpad.dat’ saved [258/258]
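The manual test above can be repeated for both names that the log abbreviates as "lhchomeproxy.{cern.ch|fnal.gov}"; in bash brace expansion the same pair is written with a comma. A minimal sketch, assuming both hosts serve /wpad.dat the way lhchomeproxy.cern.ch does above; the function names are hypothetical, and the status texts follow wget's documented exit codes (4 = network failure, 8 = server error response):

```bash
# Hypothetical probe: fetch wpad.dat from both lhchomeproxy hosts once,
# with a short timeout, and report wget's exit status in words.
wget_status_text() {
  case $1 in
    0) echo "OK" ;;
    4) echo "network failure (timeout, unreachable, refused)" ;;
    8) echo "server returned an HTTP error" ;;
    *) echo "wget exit status $1" ;;
  esac
}
probe_wpad() {
  local host rc
  for host in lhchomeproxy.{cern.ch,fnal.gov}; do
    wget -q --timeout=10 --tries=1 -O /dev/null "http://$host/wpad.dat"
    rc=$?
    echo "$host: $(wget_status_text "$rc")"
  done
}
# Run probe_wpad on each site; the Thai site above would report a network failure.
```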

This pointed me to the firewall at the Thai site, which had a lot of alerts for rule 2046211, and the CERN IPs were blocked. At some point I had disabled this rule at the Swedish site but not at the Thai site. So, as usual, it was user error :)

CMS runs fine now at my Thai site!

[Edit]
I was happy too early: it crashed again, this time with error 206. It had triggered another firewall rule, 2031747 ET INFO Observed Interesting Content-Type Inbound (application/x-sh), which made the firewall block vocms0204.cern.ch.
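The "ET INFO" alerts come from the Emerging Threats ruleset used with Suricata or Snort. Assuming a Suricata-based firewall (the thread does not say which product is used), one way to exempt a single host from a rule instead of disabling it everywhere is a "suppress" entry in threshold.config. A sketch that prints such entries; the function name and the IP address are placeholders, and the track direction per rule is an assumption (outbound request: by_src; inbound content: by_dst; recent Suricata versions also accept by_either):

```bash
# Hypothetical generator for Suricata threshold.config "suppress" lines.
# gen_id 1 is the generator ID of ordinary text rules.
suppress_line() {
  local sid=$1 track=$2 ip=$3
  printf 'suppress gen_id 1, sig_id %s, track %s, ip %s\n' "$sid" "$track" "$ip"
}
# suppress_line 2046211 by_src 192.0.2.10   # outbound wpad.dat request
# suppress_line 2031747 by_dst 192.0.2.10   # inbound application/x-sh
```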

Ok, try again ...

[EDIT2]
10 minutes of CPU time now... fingers crossed
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2636
Credit: 277,148,051
RAC: 144,143
Message 51654 - Posted: 8 Mar 2025, 7:39:02 UTC - in response to Message 51653.  

Internet speeds to Europe is between 0.5Mb to 70Mb depending on the daily mode, ping to cern.sh 230ms

Can you provide a bit more details?
Is this for your server in Thailand?
'0.5Mb to 70Mb': is it the usual megabits per second?
Is it download or upload (both are important)?
The span is rather high. Why?
Is it cable connected or any kind of wireless?
Is it directly connected to the internet or via a VPN?


It used my local Pihole

How much RAM does it have, and what kind of 'disk' storage?
seanr22a

Joined: 29 Nov 18
Posts: 40
Credit: 2,580,683
RAC: 985
Message 51655 - Posted: 8 Mar 2025, 9:31:10 UTC - in response to Message 51654.  
Last modified: 8 Mar 2025, 9:36:03 UTC


Can you provide a bit more details?
Is this for your server in Thailand?
'0.5Mb to 70Mb': is it the usual megabits per second?
Is it download or upload (both are important)?
The span is rather high. Why?
Is it cable connected or any kind of wireless?
Is it directly connected to the internet or via a VPN?


This is a server I have in Thailand. The ones in Sweden run just fine. I'm moving more and more equipment here, so the Swedish servers will be moved here within a couple of years.

Internet within Thailand is fast, but as soon as traffic leaves the country, at least towards Europe, it slows to a crawl. The ISPs probably give an end user like me the lowest possible traffic priority when routing outside Thailand.

I have two fiber lines, each 1 Gbit/s down and 1 Gbit/s up, from two different ISPs. A speed test within Thailand gives around 920 Mbit/s up and down on both. All computers are wired at 1 Gbit/s.

Downloads from Europe seem to be on the faster side, but it all depends on where the host is in Europe and on the time of day. An example: when BOINC downloaded the big CMS_2025_01_16_prod.vdi for the CMS project (I think it was around 1.5 GB), the download speed was 0.5Mbit-1Mbit according to BOINC, and it took almost an hour to get the file. So the speed varies all over the place.
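A quick unit check on those numbers: taking the file as 1500 MB (decimal units, an assumption based on the "around 1.5 GB" above), "almost an hour" fits a rate of about 0.5 megabytes per second, while 0.5 to 1 megabit per second would need several hours. This only suggests, and does not prove, that the displayed rate was in bytes rather than bits. A minimal sketch of the arithmetic; the function name is hypothetical:

```bash
# Hypothetical helper: minutes needed to move SIZE_MB megabytes at RATE,
# where UNIT is "Mbit" (megabits/s) or "MB" (megabytes/s). Decimal units.
transfer_minutes() {
  local size_mb=$1 rate=$2 unit=$3
  awk -v s="$size_mb" -v r="$rate" -v u="$unit" 'BEGIN {
    bits = (u == "MB") ? r * 8 : r       # convert the rate to Mbit/s
    printf "%.0f\n", s * 8 / bits / 60   # MB -> Mbit, then seconds -> minutes
  }'
}
# transfer_minutes 1500 0.5 MB    -> 50 minutes
# transfer_minutes 1500 1 Mbit    -> 200 minutes
# transfer_minutes 1500 0.5 Mbit  -> 400 minutes
```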

As I mentioned in my previous post, it seems to be working now. One CMS job has been running for 4 hours.


How much RAM does it have and what kind of 'disk' storage.

The Pihole is out of the loop now. My servers get DNS from the firewall, but I had forgotten to configure the Squid proxy VM the same way. Now the Squid proxy also gets DNS directly from the firewall.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2636
Credit: 277,148,051
RAC: 144,143
Message 51656 - Posted: 8 Mar 2025, 10:26:46 UTC - in response to Message 51655.  

Your internet connection is fast enough.

squid proxy VM

Given the number of cores you have already attached and the number you plan to add, it is recommended to run Squid on dedicated hardware rather than in a VM.
The core count for that machine is less critical (>= 4), but it should have at least 16 GB RAM.
You could also use that machine as a local DNS proxy.
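To check a candidate Squid host against this rule of thumb (at least 4 cores, at least 16 GB RAM), nproc and /proc/meminfo are enough on Linux. A minimal sketch; the function names are hypothetical, and 16 GB is taken as 16,000,000 kB (an assumption) so that a 16 GiB machine passes even though the kernel reports slightly less than 16 GiB in MemTotal:

```bash
# Hypothetical check against the recommendation above: >= 4 cores, >= 16 GB RAM.
meets_squid_recommendation() {
  local cores=$1 mem_kb=$2
  [ "$cores" -ge 4 ] && [ "$mem_kb" -ge 16000000 ]
}
check_this_host() {
  local cores mem_kb
  cores=$(nproc)
  mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  if meets_squid_recommendation "$cores" "$mem_kb"; then
    echo "OK: $cores cores, $mem_kb kB"
  else
    echo "below recommendation: $cores cores, $mem_kb kB"
  fi
}
```

A VM with 2 cores and 4 GB, for example, would report "below recommendation".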

If CVMFS and Frontier (inside the CMS VM) are correctly configured they connect to Cloudflare's CDN rather than directly to CERN.
That way you should get the data from a Cloudflare datacenter in Thailand (or close to it).
You can check this using wget:
wget --timeout=10 -4 -qdO- http://s1cern-cvmfs.openhtc.io/cvmfs/cvmfs-config.cern.ch/.cvmfspublished >/dev/null

Near the end of the headers you will find a line like this:
CF-RAY: 91d180c19bb99212-MUC
MUC is the IATA code of the datacenter answering the request.
In my example it's Munich, but this may change depending on Cloudflare (I sometimes get others, like AMS for Amsterdam or LHR for London).
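To avoid scanning the debug output by eye, the datacenter code can be cut out of the CF-RAY line: it is the part after the last "-". A minimal sketch; the function names are hypothetical, the lookup table covers only the codes mentioned in this thread, and the piping in the comment assumes wget writes the headers to stderr, as the command above relies on:

```bash
# Hypothetical helper: extract the IATA code from a CF-RAY header line.
cf_ray_iata() {
  printf '%s\n' "$1" | sed -nE 's/^[[:space:]]*[Cc][Ff]-[Rr][Aa][Yy]:[[:space:]]*[0-9a-f]+-([A-Z]{3}).*$/\1/p'
}
# Names for the codes that appear in this thread only.
iata_city() {
  case $1 in
    MUC) echo "Munich" ;; AMS) echo "Amsterdam" ;; LHR) echo "London" ;;
    SIN) echo "Singapore" ;; CPH) echo "Copenhagen" ;; *) echo "unknown ($1)" ;;
  esac
}
# Possible use (needs network access):
#   wget --timeout=10 -4 -qdO- http://s1cern-cvmfs.openhtc.io/cvmfs/cvmfs-config.cern.ch/.cvmfspublished 2>&1 >/dev/null \
#     | grep -i '^[[:space:]]*cf-ray:'
```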
seanr22a

Joined: 29 Nov 18
Posts: 40
Credit: 2,580,683
RAC: 985
Message 51657 - Posted: 8 Mar 2025, 12:17:06 UTC - in response to Message 51656.  
Last modified: 8 Mar 2025, 12:27:56 UTC


The #cores for that machine is less critical (>=4), but it should have at least 16GB RAM.

I had 2 cores and 4 GB memory; for now I've changed that to 4 cores and 16 GB memory at both sites.


If CVMFS and Frontier (inside the CMS VM) are correctly configured they connect to Cloudflare's CDN rather than directly to CERN.

Yes, it's using the CDN (CVMFS_USE_CDN=yes).


That way you should get the data from a Cloudflare datacenter in Thailand (or close to it).

At the Thai site I get CF-RAY: 91d22df67daacde6-SIN. Maybe Singapore?
At the Swedish site I get CF-RAY: 91d241bf2fd89294-CPH. Maybe Copenhagen?
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2636
Credit: 277,148,051
RAC: 144,143
Message 51659 - Posted: 8 Mar 2025, 14:20:11 UTC - in response to Message 51657.  

At the Thai site I get CF-RAY: 91d22df67daacde6-SIN. Maybe Singapore?

Yes.


At the Swedish site I get CF-RAY: 91d241bf2fd89294-CPH. Maybe Copenhagen?

Yes.



©2025 CERN