Message boards : CMS Application : Problems connecting to servers?

Lem Novantotto
Joined: 24 May 23
Posts: 48
Credit: 4,119,070
RAC: 18,377
Message 50554 - Posted: 15 Aug 2024, 17:53:10 UTC

I have two CMS tasks in this situation at the moment:

2024-08-15 19:39:54,399:ERROR:StageOutImpl:Attempt 1 to stage out failed.
Automatically retrying in 300 secs
 Error details:
<@========== WMException Start ==========@>
Exception Class: StageOutError
Message: Command exited non-zero, ExitCode:112
Output: stdout: Thu Aug 15 19:29:51 CEST 2024
WARNING Could not load the user credentials: impossible to open :  : error:02001002:system library:fopen:No such file or directory
WARNING Could not load the user credentials: impossible to open :  : error:02001002:system library:fopen:No such file or directory
WARNING (SEToken) Could not retrieve any token for https://vc-data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2024/8/15/ireid_TC_Backfill_IDR_CMS_Multi_240811_202049_5978/BPH_RunIISummer20UL18GEN_00262_0/0003/0/31d91523-0926-464c-bf63-fccaf9921303-550-0-logArchive.tar.gz
WARNING Could not load the user credentials: impossible to open :  : error:02001002:system library:fopen:No such file or directory
WARNING Could not load the user credentials: impossible to open :  : error:02001002:system library:fopen:No such file or directory
WARNING (SEToken) Could not retrieve any token for https://vc-data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2024/8/15/ireid_TC_Backfill_IDR_CMS_Multi_240811_202049_5978/BPH_RunIISummer20UL18GEN_00262_0/0003/0/31d91523-0926-464c-bf63-fccaf9921303-550-0-logArchive.tar.gz
Copying 1465769 bytes file:///srv/job/WMTaskSpace/logArch1/logArchive.tar.gz => https://vc-data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2024/8/15/ireid_TC_Backfill_IDR_CMS_Multi_240811_202049_5978/BPH_RunIISummer20UL18GEN_00262_0/0003/0/31d91523-0926-464c-bf63-fccaf9921303-550-0-logArchive.tar.gz
event: [1723743292759] BOTH   GFAL2:CORE:COPY	LIST:ENTER	
event: [1723743292759] BOTH   GFAL2:CORE:COPY	LIST:ITEM	file:///srv/job/WMTaskSpace/logArch1/logArchive.tar.gz => https://vc-data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2024/8/15/ireid_TC_Backfill_IDR_CMS_Multi_240811_202049_5978/BPH_RunIISummer20UL18GEN_00262_0/0003/0/31d91523-0926-464c-bf63-fccaf9921303-550-0-logArchive.tar.gz
event: [1723743292759] BOTH   GFAL2:CORE:COPY	LIST:EXIT	
event: [1723743292759] BOTH   http_plugin	PREPARE:ENTER	file:///srv/job/WMTaskSpace/logArch1/logArchive.tar.gz => https://vc-data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2024/8/15/ireid_TC_Backfill_IDR_CMS_Multi_240811_202049_5978/BPH_RunIISummer20UL18GEN_00262_0/0003/0/31d91523-0926-464c-bf63-fccaf9921303-550-0-logArchive.tar.gz
WARNING Could not load the user credentials: impossible to open :  : error:02001002:system library:fopen:No such file or directory
WARNING Could not load the user credentials: impossible to open :  : error:02001002:system library:fopen:No such file or directory
WARNING (SEToken) Could not retrieve any token for https://vc-data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2024/8/15/ireid_TC_Backfill_IDR_CMS_Multi_240811_202049_5978/BPH_RunIISummer20UL18GEN_00262_0/0003/0/31d91523-0926-464c-bf63-fccaf9921303-550-0-logArchive.tar.gz
gfal-copy exit status: 112
ERROR: gfal-copy exited with 112
Cleaning up failed file:
Thu Aug 15 19:37:23 CEST 2024
https://vc-data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2024/8/15/ireid_TC_Backfill_IDR_CMS_Multi_240811_202049_5978/BPH_RunIISummer20UL18GEN_00262_0/0003/0/31d91523-0926-464c-bf63-fccaf9921303-550-0-logArchive.tar.gz	FAILED

stderr: /srv/startup_environment.sh: line 3: BASHOPTS: readonly variable
/srv/startup_environment.sh: line 10: BASH_VERSINFO: readonly variable
/srv/startup_environment.sh: line 33: EUID: readonly variable
/srv/startup_environment.sh: line 148: PPID: readonly variable
/srv/startup_environment.sh: line 156: SHELLOPTS: readonly variable
/srv/startup_environment.sh: line 173: UID: readonly variable
/srv/startup_environment.sh: line 203: syntax error near unexpected token `('
/srv/startup_environment.sh: line 203: `export probe_cvmfs_repos () '
gfal-copy error: 112 (Host is down) - DESTINATION OVERWRITE   Result Could not connect to server after 1 attempts
/srv/startup_environment.sh: line 3: BASHOPTS: readonly variable
/srv/startup_environment.sh: line 10: BASH_VERSINFO: readonly variable
/srv/startup_environment.sh: line 33: EUID: readonly variable
/srv/startup_environment.sh: line 148: PPID: readonly variable
/srv/startup_environment.sh: line 156: SHELLOPTS: readonly variable
/srv/startup_environment.sh: line 173: UID: readonly variable
/srv/startup_environment.sh: line 203: syntax error near unexpected token `('
/srv/startup_environment.sh: line 203: `export probe_cvmfs_repos () '
gfal-rm error: 112 (Host is down) - Result Could not connect to server after 1 attempts
 
	ClassName : None
	ModuleName : WMCore.Storage.StageOutError
	MethodName : __init__
	ClassInstance : None
	FileName : /srv/job/WMCore.zip/WMCore/Storage/StageOutError.py
	LineNumber : 32
	ErrorNr : 0
	Command : #!/bin/bash
env -i X509_USER_PROXY=$X509_USER_PROXY JOBSTARTDIR=$JOBSTARTDIR bash -c '. $JOBSTARTDIR/startup_environment.sh; date; gfal-copy -t 2400 -T 2400 -p -v --abort-on-failure   file:///srv/job/WMTaskSpace/logArch1/logArchive.tar.gz https://vc-data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2024/8/15/ireid_TC_Backfill_IDR_CMS_Multi_240811_202049_5978/BPH_RunIISummer20UL18GEN_00262_0/0003/0/31d91523-0926-464c-bf63-fccaf9921303-550-0-logArchive.tar.gz'
            EXIT_STATUS=$?
            echo "gfal-copy exit status: $EXIT_STATUS"
            if [[ $EXIT_STATUS != 0 ]]; then
               echo "ERROR: gfal-copy exited with $EXIT_STATUS"
               echo "Cleaning up failed file:"
               env -i X509_USER_PROXY=$X509_USER_PROXY JOBSTARTDIR=$JOBSTARTDIR bash -c '. $JOBSTARTDIR/startup_environment.sh; date; gfal-rm -t 600 https://vc-data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2024/8/15/ireid_TC_Backfill_IDR_CMS_Multi_240811_202049_5978/BPH_RunIISummer20UL18GEN_00262_0/0003/0/31d91523-0926-464c-bf63-fccaf9921303-550-0-logArchive.tar.gz '
            fi
            exit $EXIT_STATUS
            
	ExitCode : 112
	ErrorCode : 60311
	ErrorType : GeneralStageOutFailure

Traceback: 

<@---------- WMException End ----------@>
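For anyone comparing several of these dumps, the key fields of a WMException block can be pulled out with a few lines of Python. This is only a rough sketch: the field names (`ExitCode`, `ErrorCode`, `ErrorType`) are taken from the log above, while the function name `summarize_wmexception` and the shortened sample text are made up for illustration and are not part of WMCore.

```python
import re

# Matches field lines such as "\tExitCode : 112"; the "Message: ... ExitCode:112"
# line is deliberately not matched because it does not start with the field name.
FIELD_RE = re.compile(
    r"^[ \t]*(ExitCode|ErrorCode|ErrorType)[ \t]*:[ \t]*(\S+)[ \t]*$",
    re.MULTILINE,
)


def summarize_wmexception(text: str) -> dict:
    """Return the ExitCode, ErrorCode and ErrorType fields found in a dump."""
    return {name: value for name, value in FIELD_RE.findall(text)}


# Shortened sample built from the lines quoted in the post above.
sample = (
    "Exception Class: StageOutError\n"
    "Message: Command exited non-zero, ExitCode:112\n"
    "\tExitCode : 112\n"
    "\tErrorCode : 60311\n"
    "\tErrorType : GeneralStageOutFailure\n"
)

summary = summarize_wmexception(sample)
print(summary)
```

In the log above, gfal-copy itself reports exit code 112 as "Host is down", so the interesting part is usually the destination host rather than the job.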


Bye.
ID: 50554
Lem Novantotto
Joined: 24 May 23
Posts: 48
Credit: 4,119,070
RAC: 18,377
Message 50555 - Posted: 15 Aug 2024, 18:10:26 UTC - in response to Message 50554.  

It looks like they're running again, now.

Bye.
ID: 50555
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1072
Credit: 8,401,013
RAC: 5,916
Message 50558 - Posted: 20 Aug 2024, 14:05:27 UTC - in response to Message 50555.  

Unfortunately, that's a known problem. Even more unfortunately, there's no known solution as yet... It doesn't affect many jobs -- less than 200 in the last 5,000 -- but we are looking for ways to improve the situation.
ID: 50558
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1193
Credit: 58,946,960
RAC: 63,211
Message 50574 - Posted: 25 Aug 2024, 20:09:32 UTC - in response to Message 50558.  

I see we had a few of these today, but maybe it has been updated (I stopped running them myself).
https://lhcathome.cern.ch/lhcathome/result.php?resultid=413805611

https://lhcathome.cern.ch/lhcathome/result.php?resultid=413485212

VM Completion Message: Could not connect to all required network services
ID: 50574
maeax
Joined: 2 May 07
Posts: 2260
Credit: 175,581,097
RAC: 15,522
Message 50830 - Posted: 17 Oct 2024, 8:03:07 UTC
Last modified: 17 Oct 2024, 8:22:55 UTC

Overnight, 5 jobs inside the CMS task finished.
The 6th finished at 6:28 UTC; after that the task did nothing.
2024-10-17 08:28:54,195:INFO:Report:addOutputFile method fileRef: , whole tree: {}
2024-10-17 08:28:54,195:INFO:LogArchive:Success job! Not saving its logs to CERN EOS recent area.
2024-10-17 08:28:54,196:INFO:LogArchive:Steps.Executors.LogArchive.post called
2024-10-17 08:28:54,197:INFO:ExecuteMaster:StepName: logArch1, StepType: LogArchive, with result: 0
2024-10-17 08:28:54,197:INFO:Watchdog:MonitorThread: JobEnded
2024-10-17 08:28:54,197:INFO:Watchdog:MonitorState: Shutdown called
2024-10-17 08:28:54,197:INFO:Startup:Completing task at directory: /srv/job/WMTaskSpace
2024-10-17 08:28:54,198:INFO:WMTask:Looking for master report at /srv/job/WMTaskSpace/../../Report.0.pkl
2024-10-17 08:28:54,198:INFO:WMTask: found it!
2024-10-17 08:28:54,198:INFO:WMTask:Looking for a taskStep report at /srv/job/WMTaskSpace/cmsRun1/Report.pkl
2024-10-17 08:28:54,198:INFO:WMTask: found it!
2024-10-17 08:28:54,199:INFO:WMTask:Looking for a taskStep report at /srv/job/WMTaskSpace/stageOut1/Report.pkl
2024-10-17 08:28:54,199:INFO:WMTask: found it!
2024-10-17 08:28:54,199:INFO:WMTask:Looking for a taskStep report at /srv/job/WMTaskSpace/logArch1/Report.pkl
2024-10-17 08:28:54,200:INFO:WMTask: found it!
2024-10-17 08:28:54,200:INFO:Startup:Shutting down monitor

Is this task waiting for the 18-hour shutdown?

My ISP had a disconnect at 06:20 UTC. That was the reason, sorry.
ID: 50830
maeax
Joined: 2 May 07
Posts: 2260
Credit: 175,581,097
RAC: 15,522
Message 50831 - Posted: 17 Oct 2024, 9:23:06 UTC - in response to Message 50830.  

Finished at 8:36 UTC.
This evening I will start a new CMS task to run overnight.
ID: 50831
maeax
Joined: 2 May 07
Posts: 2260
Credit: 175,581,097
RAC: 15,522
Message 50837 - Posted: 18 Oct 2024, 7:51:09 UTC - in response to Message 50831.  
Last modified: 18 Oct 2024, 7:52:59 UTC

Finished at 8:36 UTC.
This evening I will start a new CMS task to run overnight.

Eight jobs inside the task finished successfully!
Run time: 14 hours 0 min 0 sec
CPU time: 2 days 0 hours 32 min 26 sec
ID: 50837
Saturn911
Joined: 3 Nov 12
Posts: 68
Credit: 150,046,597
RAC: 124,944
Message 51076 - Posted: 16 Nov 2024, 19:55:28 UTC

For now all WUs fail with
[ERROR] Could not connect to vocms0840.cern.ch on port 9618
ID: 51076
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1193
Credit: 58,946,960
RAC: 63,211
Message 51078 - Posted: 16 Nov 2024, 21:07:22 UTC
Last modified: 16 Nov 2024, 21:20:57 UTC

Same here... If I hadn't looked just now I would have lost hundreds of these CMS tasks.
But I haven't checked them all yet, so I might have.
VM Completion Message: Could not connect to all required network services

(same thing at -dev btw)
ID: 51078
Dark Angel
Joined: 7 Aug 11
Posts: 105
Credit: 26,099,112
RAC: 1,414
Message 51080 - Posted: 17 Nov 2024, 5:03:28 UTC

95 down with this. Here's the first; whatever happened to the server happened near the end of this unit:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=416589013

2024-11-16 20:16:38 (1131410): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
2024-11-16 20:16:38 (1131410): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
2024-11-16 20:16:38 (1131410): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1)
2024-11-16 20:16:38 (1131410): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8
2024-11-16 20:16:38 (1131410): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT for EID 8 [137.138.156.85:9618]
2024-11-16 20:16:38 (1131410): Guest Log: Ncat: Connection timed out.
2024-11-16 20:16:38 (1131410): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618
2024-11-16 20:16:38 (1131410): Guest Log: [INFO] Testing connection to WMAgent
2024-11-16 20:16:39 (1131410): Guest Log: [INFO] Testing connection to EOSCMS
2024-11-16 20:16:39 (1131410): Guest Log: [INFO] Testing connection to CMS-Factory
2024-11-16 20:16:40 (1131410): Guest Log: [INFO] Testing connection to CMS-Frontier
2024-11-16 20:16:40 (1131410): Guest Log: [INFO] Testing connection to Frontier
2024-11-16 20:16:41 (1131410): Guest Log: [DEBUG] Check your firewall and your network load
2024-11-16 20:16:41 (1131410): Guest Log: [ERROR] Could not connect to all required network services
2024-11-16 20:16:41 (1131410): Guest Log: [DEBUG] Volunteer: Dark Angel (268818)
2024-11-16 20:16:41 (1131410): Guest Log: [INFO] Shutting Down.
2024-11-16 20:17:11 (1131410): VM Completion File Detected.
2024-11-16 20:17:11 (1131410): VM Completion Message: Could not connect to all required network services
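To tell a local firewall problem from a server-side one, the same test the VM runs with Ncat can be repeated from the host. The sketch below is a minimal Python stand-in for it, not the script the VM actually uses; the function name `probe` and the 5-second default timeout are my own choices.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Try one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout (the Ncat "Connection timed out" case).
        return "timeout"
    except ConnectionRefusedError:
        # The connection was actively rejected.
        return "refused"
    except OSError as exc:
        # Anything else, for example a DNS lookup failure.
        return f"error: {exc}"
```

Calling `probe("vocms0840.cern.ch", 9618)` from the host and getting the same result as the VM log, from more than one network, would suggest that the problem is not on the volunteer's side.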
ID: 51080
Erich56
Joined: 18 Dec 15
Posts: 1840
Credit: 126,183,857
RAC: 123,286
Message 51081 - Posted: 17 Nov 2024, 6:04:42 UTC - in response to Message 51076.  

For now all WUs fail with
[ERROR] Could not connect to vocms0840.cern.ch on port 9618
Too bad that I didn't notice it until this morning: thousands of failing tasks on my 20 hosts all night long :-(
ID: 51081
maeax
Joined: 2 May 07
Posts: 2260
Credit: 175,581,097
RAC: 15,522
Message 51082 - Posted: 17 Nov 2024, 9:57:37 UTC

2024-11-17 10:53:17 (29336): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 10:53:17 (29336): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
2024-11-17 10:53:17 (29336): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
2024-11-17 10:53:17 (29336): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1)
2024-11-17 10:53:17 (29336): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8
2024-11-17 10:53:17 (29336): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT for EID 8 [137.138.156.85:9618]
ID: 51082
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1193
Credit: 58,946,960
RAC: 63,211
Message 51083 - Posted: 17 Nov 2024, 10:03:48 UTC

ID: 51083
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1072
Credit: 8,401,013
RAC: 5,916
Message 51084 - Posted: 17 Nov 2024, 13:53:13 UTC - in response to Message 51082.  

2024-11-17 10:53:17 (29336): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 10:53:17 (29336): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
2024-11-17 10:53:17 (29336): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
2024-11-17 10:53:17 (29336): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1)
2024-11-17 10:53:17 (29336): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8
2024-11-17 10:53:17 (29336): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT for EID 8 [137.138.156.85:9618]

I'm not sure exactly what the problem is. vocms0840 is to be taken out of service, replaced by a newer AlmaLinux9 VM, but we've been waiting for confirmation that the updated scripts are ready for the changeover. Either the machine has been taken out of service without our being notified, or it's developed a communication problem.
ID: 51084
Guy
Joined: 9 Feb 08
Posts: 55
Credit: 1,521,616
RAC: 3,319
Message 51086 - Posted: 17 Nov 2024, 20:50:02 UTC
Last modified: 17 Nov 2024, 20:51:21 UTC

My computer:
OpenSuSE Tumbleweed [6.11.8-1-default|libc 2.40]
i7-4790k, 32 GB RAM, 2 TB M.2 SSD, nVidia RTX 2060 (driver: 550.99 OpenCL: 3.0).
Virtualbox (7.1.4_SUSEr165100)
BOINC version 8.0.4
Full spec: 10860321

My PC looks like this -

CMS tasks are failing to start.
For example:
Task       Work unit  Computer  Sent                       Time reported or deadline  Status                 Run time  CPU time  Application
416651829  227670244  10860321  17 Nov 2024, 19:45:47 UTC  20:07:00 UTC               Error while computing  167.04    24.57     CMS Simulation v70.30 (vbox64_mt_mcore_cms) x86_64-pc-linux-gnu
416648976  227667391  10860321  17 Nov 2024, 18:29:17 UTC  19:45:47 UTC               Error while computing  161.43    22.69     CMS Simulation v70.30 (vbox64_mt_mcore_cms) x86_64-pc-linux-gnu
416646675  227665093  10860321  17 Nov 2024, 17:19:14 UTC  18:23:04 UTC               Error while computing  156.96    23.88     CMS Simulation v70.30 (vbox64_mt_mcore_cms) x86_64-pc-linux-gnu

stderr for above tasks:
416651829
416648976
416646675

The following error occurs towards the end of each of the above stderr outputs:
...
2024-11-17 20:03:37 (14664): Guest Log: [INFO] Testing connection to HTCondor
2024-11-17 20:03:53 (14664): Guest Log: [DEBUG] Status run 1 of up to 3: 1
2024-11-17 20:04:14 (14664): Guest Log: [DEBUG] Status run 2 of up to 3: 1
2024-11-17 20:04:39 (14664): Guest Log: [DEBUG] Status run 3 of up to 3: 1
2024-11-17 20:04:39 (14664): Guest Log: [DEBUG] run 1
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: run 2
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: run 3
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
2024-11-17 20:04:39 (14664): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1)
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT for EID 8 [137.138.156.85:9618]
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to WMAgent
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to EOSCMS
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to CMS-Factory
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to CMS-Frontier
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to Frontier
2024-11-17 20:04:40 (14664): Guest Log: [DEBUG] Check your firewall and your network load
2024-11-17 20:04:40 (14664): Guest Log: [ERROR] Could not connect to all required network services
...

Any help with this would be welcome. Thanks.
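When many tasks fail like this, it can help to check whether they all name the same endpoint. The following is a small sketch that counts the `[ERROR] Could not connect to <host> on port <port>` lines in a saved stderr text; the function name `failed_endpoints` and the shortened sample are illustrative only, and the message format is assumed to match the lines quoted above.

```python
import re
from collections import Counter

# The generic "Could not connect to all required network services" line has
# no "on port <n>" part, so it is not counted.
ERROR_RE = re.compile(r"\[ERROR\] Could not connect to (\S+) on port (\d+)")


def failed_endpoints(stderr_text: str) -> Counter:
    """Count (host, port) pairs reported as unreachable in a task's stderr."""
    return Counter(
        (host, int(port)) for host, port in ERROR_RE.findall(stderr_text)
    )


# Shortened sample built from the log lines quoted above.
sample = (
    "2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.\n"
    "2024-11-17 20:04:39 (14664): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618\n"
    "2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to WMAgent\n"
    "2024-11-17 20:04:40 (14664): Guest Log: [ERROR] Could not connect to all required network services\n"
)

counts = failed_endpoints(sample)
print(counts)
```

If every failing task reports the same host and port, as in the logs posted in this thread, a single server-side cause is more likely than a problem on the individual machines.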
ID: 51086
Sputnik
Joined: 31 Oct 16
Posts: 2
Credit: 24,836,174
RAC: 13,863
Message 51087 - Posted: 17 Nov 2024, 21:35:54 UTC - in response to Message 51086.  
Last modified: 17 Nov 2024, 21:38:55 UTC

Hej and Hello.

I also got these errors today for all CMS tasks; they run for just a few minutes and then fail:

...
2024-11-17 21:03:46 (476): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
2024-11-17 21:03:46 (476): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
2024-11-17 21:03:46 (476): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1)
2024-11-17 21:03:46 (476): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8
2024-11-17 21:03:46 (476): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT ERROR [Connection refused (111)] for EID 8 [137.138.156.85:9618]
2024-11-17 21:03:46 (476): Guest Log: Ncat: Connection refused.
2024-11-17 21:03:46 (476): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618
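Note that this log shows "Connection refused (111)", while the earlier logs in this thread show "CONNECT TIMEOUT". A quick way to separate the two cases when going through many logs is sketched below; the function name `classify_callback` is hypothetical and the pattern is only assumed to fit the libnsock callback lines quoted in this thread.

```python
import re

CALLBACK_RE = re.compile(
    r"Callback: CONNECT (?:(?P<timeout>TIMEOUT)|ERROR \[(?P<reason>[^\]]+)\])"
    r" for EID \d+ \[(?P<endpoint>[^\]]+)\]"
)


def classify_callback(line: str):
    """Return (kind, endpoint) for a libnsock connect callback line, else None."""
    match = CALLBACK_RE.search(line)
    if match is None:
        return None
    if match.group("timeout"):
        kind = "timeout"
    elif "refused" in match.group("reason").lower():
        kind = "refused"
    else:
        kind = "other"
    return kind, match.group("endpoint")


timeout_line = (
    "libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT "
    "for EID 8 [137.138.156.85:9618]"
)
refused_line = (
    "libnsock nsock_trace_handler_callback(): Callback: CONNECT ERROR "
    "[Connection refused (111)] for EID 8 [137.138.156.85:9618]"
)

print(classify_callback(timeout_line))
print(classify_callback(refused_line))
```

As a general rule of thumb, a timeout usually means packets are being dropped or the host is unreachable, while a refusal means something answered and rejected the connection. What that says about this particular server is a guess on my part.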


THX
Sputnik
ID: 51087
maeax
Joined: 2 May 07
Posts: 2260
Credit: 175,581,097
RAC: 15,522
Message 51091 - Posted: 18 Nov 2024, 11:55:01 UTC

Ivan has the answer, one message before yours.
ID: 51091
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1072
Credit: 8,401,013
RAC: 5,916
Message 51093 - Posted: 18 Nov 2024, 16:18:38 UTC - in response to Message 51091.  

The definitive answer is that the firewall to the HTCondor machine was closed on Saturday, before the Submission Infrastructure team were able to activate its substitute (there seems to have been at least one mix-up with service tickets being misdirected). Sorry about this, it's one of the disadvantages of such a long "supply chain" where people responsible for one part don't necessarily know who is dependent on it downstream <:frowny face:>.
ID: 51093
Guy
Joined: 9 Feb 08
Posts: 55
Credit: 1,521,616
RAC: 3,319
Message 51094 - Posted: 18 Nov 2024, 18:55:01 UTC - in response to Message 51093.  

Thanks Ivan.
I do appreciate you reconfirming this!

The definitive answer is...
ID: 51094
Erich56
Posts: 1840
Credit: 126,183,857
RAC: 123,286
Message 51098 - Posted: 19 Nov 2024, 6:57:29 UTC - in response to Message 51093.  
Last modified: 19 Nov 2024, 6:59:22 UTC

The definitive answer is that the firewall to the HTCondor machine was closed on Saturday, before the Submission Infrastructure team were able to activate its substitute (there seems to have been at least one mix-up with service tickets being misdirected). Sorry about this, it's one of the disadvantages of such a long "supply chain" where people responsible for one part don't necessarily know who is dependent on it downstream <:frowny face:>.
Thanks, Ivan, for the information. Could well be that it will take a while until everything works again.
So: what about stopping the download queue until everything is straightened out?
ID: 51098

©2025 CERN