Message boards : CMS Application : New network connectivity test

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1084
Credit: 9,230,050
RAC: 6,179
Message 51774 - Posted: 27 Mar 2025, 10:42:34 UTC

You may have noticed a new test in your task logs:
Testing connection to http://cms-frontier.openhtc.io:8080/FrontierProd/

This is an attempt to circumvent an occasional failure to connect to the Frontier ("conditions database") servers, which appears to be related to poorly configured network connections. We already run a rudimentary connectivity test, but it doesn't involve any data transfer. Hosts that pass it can still fail to acquire data from the servers, which leads them to cycle through the four Frontier servers (first over IPv4, then over IPv6) before failing. Typically the same host repeats this many times (and sometimes so do other hosts belonging to the same volunteer!).
This failure is not obvious in the task log, so we have introduced a slightly more sophisticated test. If it fails, you will see the message:
Check your firewall and your network load
in your log.
It will be several days before it becomes obvious to us how effective this test is.
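
Editorial illustration, not the project's actual script: the difference between a connect-only check and one that requires data to flow might be sketched in Python as below. The function names can_connect and can_transfer, the timeouts, and the "at least one byte of body" criterion are all assumptions made for this example; the thread does not say how the real test is implemented.

```python
import http.client
import socket
import urllib.request
from urllib.parse import urlsplit


def can_connect(url, timeout=5.0):
    """Rudimentary check: open a TCP connection to the URL's host and port.

    Nothing is sent or received, so a host can pass this and still be
    unable to fetch data.
    """
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_transfer(url, timeout=5.0):
    """Stricter check: issue an HTTP GET and require some response body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Hypothetical pass criterion: at least one byte of body arrives.
            return len(response.read(1024)) > 0
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS failures, HTTP error
        # statuses (urllib.error.HTTPError is an OSError subclass) and
        # malformed replies.
        return False
```

Under these assumptions, a host for which can_connect("http://cms-frontier.openhtc.io:8080/FrontierProd/") is true but can_transfer on the same URL is false would be the kind of "connects but cannot acquire data" case described above.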

ivan
Message 51801 - Posted: 3 Apr 2025, 12:45:08 UTC - in response to Message 51774.  

OK, that test didn't winkle out the problem connections. We're trying something more complicated.

ivan
Message 51803 - Posted: 3 Apr 2025, 14:22:49 UTC - in response to Message 51801.  

Oops, there were typos and mistakes in the updated script. Hope it works now...

Erich56
Joined: 18 Dec 15
Posts: 1868
Credit: 136,039,509
RAC: 89,765
Message 51804 - Posted: 3 Apr 2025, 14:54:13 UTC - in response to Message 51803.  

There still seems to be a problem: tasks error out after about 2 minutes.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=420976190
"2025-04-03 16:50:34 (8480): VM Completion Message: Could not connect to all required network services"

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2634
Credit: 272,025,947
RAC: 92,648
Message 51805 - Posted: 3 Apr 2025, 15:09:48 UTC - in response to Message 51804.  

Same here.
The test (even the most recent modification) causes all tasks to fail.

ivan
Message 51806 - Posted: 3 Apr 2025, 15:39:50 UTC - in response to Message 51805.  

Yes, a substring in the URL went missing. It might take a while to get the fix into CVMFS, so it's best to set No New Tasks for the time being.

ivan
Message 51807 - Posted: 3 Apr 2025, 16:58:47 UTC - in response to Message 51806.  

OK, the new script is in place and a manual test works. Try it! (I don't have a running BOINC machine at home...)

computezrmle
Message 51808 - Posted: 3 Apr 2025, 17:06:39 UTC - in response to Message 51807.  

Works here.

ivan
Message 51809 - Posted: 3 Apr 2025, 17:32:55 UTC - in response to Message 51808.  

Good! I see running jobs are starting to increase again, but gauging the effect is going to take a few days.


©2025 CERN