Message boards : CMS Application : CMS Simulation error
Message board moderation

To post messages, you must log in.

AuthorMessage
user

Send message
Joined: 18 Jun 14
Posts: 17
Credit: 10,539,030
RAC: 0
Message 47287 - Posted: 24 Sep 2022, 13:47:32 UTC

CMS Simulation error

hello. I have a question.

During the LHC project, nothing else is wrong, but only the CMS Simulation gives an operation error.

What are the causes and solutions?
ID: 47287 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 998
Credit: 6,264,307
RAC: 71
Message 47298 - Posted: 26 Sep 2022, 9:25:20 UTC - in response to Message 47287.  

CMS Simulation error

hello. I have a question.

During the LHC project, nothing else is wrong, but only the CMS Simulation gives an operation error.

What are the causes and solutions?

I don't have a definitive answer. Your tasks seem to be failing to acquire a job from the Condor pool, and the glidein fails after a 20-minute time-out:
2022-09-24 18:53:01 (2736): Guest Log: [INFO] CMS application starting. Check log files.
2022-09-24 19:13:21 (2736): Guest Log: [ERROR] glidein exited with return value 1.

I'll ask the glidein specialist what can cause time-outs leading to Error 1.
ID: 47298 · Report as offensive     Reply Quote
user

Send message
Joined: 18 Jun 14
Posts: 17
Credit: 10,539,030
RAC: 0
Message 47299 - Posted: 26 Sep 2022, 9:55:11 UTC - in response to Message 47298.  

Thank you for your answer. It doesn't seem like a common case...

I tried formatting, just in case, but still the same symptom.

Even later, if you have a solution, please share. :)
ID: 47299 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 998
Credit: 6,264,307
RAC: 71
Message 47304 - Posted: 26 Sep 2022, 15:18:37 UTC - in response to Message 47299.  

If you can do some debugging, it might help us.
    Start a CMS task within boincmgr
    Select the task in the Tasks tab, and click on Show VM Console
    You should see the startup console running, wait until it shows "CMS application starting. Check log files."
    Close the console, then click on Show graphics tab -- a page should open in your browser
    Go to the logs page and copy the contents of each log file to a file on your computer
    Send the files as attachments to the address I'll send as a PM

Thanks


ID: 47304 · Report as offensive     Reply Quote
user

Send message
Joined: 18 Jun 14
Posts: 17
Credit: 10,539,030
RAC: 0
Message 47307 - Posted: 27 Sep 2022, 0:57:13 UTC - in response to Message 47304.  

Stderr output value in task information

are you talking about?

ex) <core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
- exit code 208 (0xd0)</message>
<stderr_txt>
2022-09-24 18:42:33 (21388): Detected: vboxwrapper 26202
2022-09-24 18:42:33 (21388): Detected: BOINC client v7.20.2
2022-09-24 18:42:33 (21388): Detected: VirtualBox VboxManage Interface (Version: 6.1.34)
2022-09-24 18:42:33 (21388): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2022-09-24 18:42:33 (21388): Successfully copied 'init_data.xml' to the shared directory.
2022-09-24 18:42:35 (21388): Create VM. (boinc_ff9b89d4415a8683, slot#16)
2022-09-24 18:42:35 (21388): Setting Memory Size for VM. (2048 MB)
ID: 47307 · Report as offensive     Reply Quote
Profile Magic Quantum Mechanic
Avatar

Send message
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,137
RAC: 3,956
Message 47308 - Posted: 27 Sep 2022, 6:55:16 UTC - in response to Message 47307.  

Required extension pack not installed, remote desktop not enabled.

You didn't d/l the Extension Pack with the current version of Oracle VB so you can look at the "Log" page which is not the STDERR and actually will show you the running Logs for each task
THEN you will be able to see the running logs when you go to the Boinc Manager and click on your running task and then click on the "Show Graphics" on the left side of that task.

When you do that another web page will open where you can click on "Logs" and then see what is actually going on and if the task actually started running.

https://www.virtualbox.org/wiki/Downloads
ID: 47308 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 15 Jun 08
Posts: 2386
Credit: 222,991,326
RAC: 136,411
Message 47309 - Posted: 27 Sep 2022, 7:11:02 UTC - in response to Message 47308.  

VirtualBox extensions are not required to view the logs Ivan requested.
Just select a running task and click on "Show Graphics" as already described.
This opens a browser window where you can navigate to the requested logs.
ID: 47309 · Report as offensive     Reply Quote
Profile Magic Quantum Mechanic
Avatar

Send message
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,137
RAC: 3,956
Message 47310 - Posted: 27 Sep 2022, 9:08:51 UTC - in response to Message 47309.  

He already said that and many years ago it was that way but then you were not a member and didn't run any back then but you do seem to have something to say any time I post anything which is why I usually only do it via pm's since many members don't like your style.
ID: 47310 · Report as offensive     Reply Quote
maeax

Send message
Joined: 2 May 07
Posts: 2071
Credit: 156,178,946
RAC: 105,697
Message 47311 - Posted: 27 Sep 2022, 10:32:03 UTC - in response to Message 47310.  

He already said that and many years ago it was that way but then you were not a member and didn't run any back then but you do seem to have something to say any time I post anything which is why I usually only do it via pm's since many members don't like your style.

+1
ID: 47311 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 15 Jun 08
Posts: 2386
Credit: 222,991,326
RAC: 136,411
Message 47312 - Posted: 27 Sep 2022, 10:35:04 UTC - in response to Message 47310.  

Many members?
Might be 1 or 2 who post lots of wrong suggestions again and again.
ID: 47312 · Report as offensive     Reply Quote
maeax

Send message
Joined: 2 May 07
Posts: 2071
Credit: 156,178,946
RAC: 105,697
Message 47329 - Posted: 29 Sep 2022, 10:37:25 UTC - in response to Message 47312.  

Many members?
Might be 1 or 2 who post lots of wrong suggestions again and again.

Say the names!
ID: 47329 · Report as offensive     Reply Quote
maeax

Send message
Joined: 2 May 07
Posts: 2071
Credit: 156,178,946
RAC: 105,697
Message 47409 - Posted: 22 Oct 2022, 23:32:19 UTC - in response to Message 47329.  

2022-10-22 22:21:03 (10892): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2022-10-22 22:21:03 (10892): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
2022-10-22 22:21:03 (10892): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
2022-10-22 22:21:03 (10892): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1)
2022-10-22 22:21:03 (10892): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8
2022-10-22 22:21:03 (10892): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT for EID 8 [137.138.156.85:9618]
2022-10-22 22:21:03 (10892): Guest Log: Ncat: Connection timed out.
2022-10-22 22:21:03 (10892): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618
ID: 47409 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 998
Credit: 6,264,307
RAC: 71
Message 47416 - Posted: 23 Oct 2022, 15:01:20 UTC - in response to Message 47409.  

Whatever the problem with vocms0840, it looks to be fixed now.
ID: 47416 · Report as offensive     Reply Quote
mikey
Avatar

Send message
Joined: 30 Oct 11
Posts: 26
Credit: 4,772,123
RAC: 6
Message 47560 - Posted: 25 Nov 2022, 4:42:15 UTC - in response to Message 47416.  

Whatever the problem with vocms0840, it looks to be fixed now.


It could be back again as all I'm getting right now are "error while computing" for every single task.

The main message seems to be 'glidein exited with return value 127' but here's the Stderr file for one of my tasks:

Stderr output
<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified.
(0xd0) - exit code 208 (0xd0)</message>
<stderr_txt>
2022-11-24 22:43:30 (24408): Detected: vboxwrapper 26206
2022-11-24 22:43:30 (24408): Detected: BOINC client v7.20.2
2022-11-24 22:43:30 (24408): Detected: VirtualBox VboxManage Interface (Version: 7.0.4)
2022-11-24 22:43:31 (24408): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2022-11-24 22:43:31 (24408): Successfully copied 'init_data.xml' to the shared directory.
2022-11-24 22:43:31 (24408): Create VM. (boinc_40e995cbe1124736, slot#5)
2022-11-24 22:43:32 (24408): Setting Memory Size for VM. (2048MB)
2022-11-24 22:43:32 (24408): Setting CPU Count for VM. (1)
2022-11-24 22:43:32 (24408): Setting Chipset Options for VM.
2022-11-24 22:43:32 (24408): Setting Graphics Controller Options for VM.
2022-11-24 22:43:33 (24408): Setting Boot Options for VM.
2022-11-24 22:43:33 (24408): Setting Network Configuration for NAT.
2022-11-24 22:43:33 (24408): Enabling VM Network Access.
2022-11-24 22:43:33 (24408): Disabling USB Support for VM.
2022-11-24 22:43:34 (24408): Disabling COM Port Support for VM.
2022-11-24 22:43:34 (24408): Disabling LPT Port Support for VM.
2022-11-24 22:43:34 (24408): Disabling Audio Support for VM.
2022-11-24 22:43:34 (24408): Disabling Clipboard Support for VM.
2022-11-24 22:43:35 (24408): Disabling Drag and Drop Support for VM.
2022-11-24 22:43:35 (24408): Adding storage controller(s) to VM.
2022-11-24 22:43:35 (24408): Adding virtual disk drive to VM. (CMS_2022_09_07_prod.vdi)
2022-11-24 22:43:36 (24408): Adding VirtualBox Guest Additions to VM.
2022-11-24 22:43:36 (24408): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2022-11-24 22:43:36 (24408): forwarding host port 53158 to guest port 80
2022-11-24 22:43:36 (24408): Enabling remote desktop for VM.
2022-11-24 22:43:37 (24408): Required extension pack not installed, remote desktop not enabled.
2022-11-24 22:43:37 (24408): Enabling shared directory for VM.
2022-11-24 22:43:37 (24408): Starting VM using VBoxManage interface. (boinc_40e995cbe1124736, slot#5)
2022-11-24 22:43:43 (24408): Successfully started VM. (PID = '17604')
2022-11-24 22:43:43 (24408): Reporting VM Process ID to BOINC.
2022-11-24 22:43:43 (24408): Guest Log: BIOS: VirtualBox 7.0.4
2022-11-24 22:43:43 (24408): Guest Log: CPUID EDX: 0x178bfbff
2022-11-24 22:43:43 (24408): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2022-11-24 22:43:43 (24408): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2022-11-24 22:43:43 (24408): VM state change detected. (old = 'poweredoff', new = 'running')
2022-11-24 22:43:43 (24408): Detected: Web Application Enabled (http://localhost:53158)
2022-11-24 22:43:43 (24408): Preference change detected
2022-11-24 22:43:43 (24408): Setting CPU throttle for VM. (90%)
2022-11-24 22:43:43 (24408): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 900 seconds) or (Vbox_job.xml: 600 seconds))
2022-11-24 22:43:45 (24408): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2022-11-24 22:43:45 (24408): Guest Log: BIOS: Booting from Hard Disk...
2022-11-24 22:43:47 (24408): Guest Log: BIOS: KBD: unsupported int 16h function 03
2022-11-24 22:43:47 (24408): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2022-11-24 22:44:12 (24408): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2022-11-24 22:44:12 (24408): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2022-11-24 22:44:13 (24408): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000065 main Log opened 2022-11-25T03:44:13.139615000Z
2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000177 main OS Product: Linux
2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000211 main OS Release: 4.14.232-19.cernvm.x86_64
2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000236 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021
2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000260 main Executable: /usr/sbin/VBoxService
2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000261 main Process ID: 2171
2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000261 main Package type: LINUX_64BITS_GENERIC
2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.002069 main 5.2.6 r120293 started. Verbose level = 0
2022-11-24 22:44:24 (24408): Guest Log: [INFO] Mounting the shared directory
2022-11-24 22:44:24 (24408): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2022-11-24 22:44:24 (24408): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
2022-11-24 22:44:24 (24408): Guest Log: [INFO] Testing connection to cern.ch
2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to VCCS
2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to HTCondor
2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to WMAgent
2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to EOSCMS
2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to CMS-Factory
2022-11-24 22:44:26 (24408): Guest Log: [INFO] Testing connection to CMS-Frontier
2022-11-24 22:44:26 (24408): Guest Log: [INFO] Testing connection to Frontier
2022-11-24 22:44:26 (24408): Guest Log: [INFO] Could not find a local HTTP proxy
2022-11-24 22:44:26 (24408): Guest Log: [INFO] CVMFS and Frontier will have to use DIRECT connections
2022-11-24 22:44:26 (24408): Guest Log: [INFO] This makes the application less efficient
2022-11-24 22:44:27 (24408): Guest Log: [INFO] It also puts higher load on the project servers
2022-11-24 22:44:27 (24408): Guest Log: [INFO] Setting up a local HTTP proxy is highly recommended
2022-11-24 22:44:27 (24408): Guest Log: [INFO] Advice can be found in the project forum
2022-11-24 22:44:27 (24408): Guest Log: [INFO] Reloading and probing the CVMFS configuration
2022-11-24 22:44:32 (24408): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2022-11-24 22:44:33 (24408): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2022-11-24 22:44:33 (24408): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2022-11-24 22:44:34 (24408): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2022-11-24 22:44:38 (24408): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2022-11-24 22:44:40 (24408): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2022-11-24 22:44:40 (24408): Guest Log: [INFO] 2.7.2.0 http://s1ral-cvmfs.openhtc.io DIRECT
2022-11-24 22:44:40 (24408): Guest Log: [INFO] Environment HTTP proxy: not set
2022-11-24 22:44:40 (24408): Guest Log: [INFO] Reading volunteer information
2022-11-24 22:44:47 (24408): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2022-11-24 22:44:50 (24408): Guest Log: [INFO] CMS application starting. Check log files.
2022-11-24 22:44:50 (24408): Guest Log: [INFO] Requesting an idtoken from LHC@home
2022-11-24 22:44:51 (24408): Guest Log: [ERROR] glidein exited with return value 127.
2022-11-24 22:44:51 (24408): Guest Log: [DEBUG] Volunteer: mikey (223772)
2022-11-24 22:44:51 (24408): Guest Log: [INFO] Shutting Down.
2022-11-24 22:45:21 (24408): VM Completion File Detected.
2022-11-24 22:45:21 (24408): VM Completion Message: glidein exited with return value 127.
.
2022-11-24 22:45:21 (24408): Powering off VM.
2022-11-24 22:45:21 (24408): Successfully stopped VM.
2022-11-24 22:45:21 (24408): Deregistering VM. (boinc_40e995cbe1124736, slot#5)
2022-11-24 22:45:21 (24408): Removing network bandwidth throttle group from VM.
2022-11-24 22:45:22 (24408): Removing VM from VirtualBox.
22:45:27 (24408): called boinc_finish(208)

</stderr_txt>
]]>
ID: 47560 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 998
Credit: 6,264,307
RAC: 71
Message 47561 - Posted: 25 Nov 2022, 10:08:58 UTC - in response to Message 47560.  

Yes, there's a problem.
No, I haven't found it yet.
If I try to look at our condor jobs from CERN I get
[lxplus780:~] > condor_q -name vocms0267.cern.ch -pool vocms0840.cern.ch 

-- Failed to fetch ads from: <137.138.52.94:4080?addrs=137.138.52.94-4080+[2001-1458-d00-17--43]-4080&alias=vocms0267.cern.ch&noUDP&sock=schedd_1783_9173> : vocms0267.cern.ch
AUTHENTICATE:1003:Failed to authenticate with any method

and the pool is depleted:
[lxplus780:~] > condor_status -pool vocms0840.cern.ch
Name                                         OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

glidein_4670_387833785@107107-10489077-18060 LINUX      X86_64 Claimed   Busy      2.330 2500  0+01:00:46
glidein_4680_323803200@107107-10489077-31377 LINUX      X86_64 Claimed   Busy      3.070 2500  0+03:49:52
glidein_4648_45869712@107107-10511611-11990  LINUX      X86_64 Claimed   Busy      1.250 2500  0+01:03:09
glidein_4656_822528270@107107-10511611-15555 LINUX      X86_64 Claimed   Busy      1.210 2500  0+02:43:25
glidein_4653_323462880@107107-10511611-16804 LINUX      X86_64 Claimed   Busy      1.430 2500  0+03:35:45
glidein_4646_361662954@107107-10511611-18323 LINUX      X86_64 Claimed   Busy      1.210 2500  0+03:24:27
glidein_4648_563620040@107107-10511611-22062 LINUX      X86_64 Claimed   Busy      2.260 2500  0+02:32:14
glidein_4664_267919542@107107-10511611-30334 LINUX      X86_64 Claimed   Busy      1.260 2500  0+01:59:35
glidein_4656_121852692@107107-10511611-30765 LINUX      X86_64 Claimed   Busy      2.260 2500  0+00:11:12
glidein_4670_72564800@107107-10511679-3097   LINUX      X86_64 Claimed   Busy      2.270 2500  0+00:37:28
glidein_4657_476277228@107107-10511679-15034 LINUX      X86_64 Claimed   Busy      1.690 2500  0+00:40:13
glidein_4656_29749126@107107-10511679-18362  LINUX      X86_64 Claimed   Busy      2.310 2500  0+02:11:44
glidein_4645_96284106@107107-10511679-21818  LINUX      X86_64 Claimed   Busy      1.500 2500  0+01:00:07
glidein_4874_26963200@107107-10511679-23757  LINUX      X86_64 Claimed   Busy      1.420 2500  0+01:33:39
glidein_4656_572271750@107107-10511679-29938 LINUX      X86_64 Claimed   Busy      1.530 2500  0+03:33:02
glidein_4661_72786555@107107-10511679-31672  LINUX      X86_64 Claimed   Busy      0.810 2500  0+00:01:37
glidein_4651_831188082@107107-10574756-27824 LINUX      X86_64 Claimed   Busy      1.850 2500  0+01:08:16
glidein_4646_553968128@107107-10574756-28110 LINUX      X86_64 Claimed   Busy      1.500 2500  0+01:59:27
glidein_4645_266261752@107107-10574756-28244 LINUX      X86_64 Claimed   Busy      2.010 2500  0+01:58:33
glidein_4644_353209032@107107-10574756-29278 LINUX      X86_64 Claimed   Busy      1.740 2500  0+02:51:58
glidein_4649_107372800@107107-10574756-29861 LINUX      X86_64 Claimed   Busy      1.770 2500  0+00:35:45
glidein_4646_432356158@107107-10574756-29943 LINUX      X86_64 Claimed   Busy      1.910 2500  0+01:10:31
glidein_4647_591695930@107107-10574756-31949 LINUX      X86_64 Claimed   Busy      1.740 2500  0+01:24:21
glidein_4650_109275384@107107-10574767-7200  LINUX      X86_64 Claimed   Busy      1.760 2500  0+03:16:18
glidein_4654_146209624@107107-10574767-7326  LINUX      X86_64 Claimed   Busy      2.040 2500  0+02:10:39
glidein_4651_667607907@107107-10574767-11294 LINUX      X86_64 Claimed   Busy      2.230 2500  0+02:30:02
glidein_4643_676121242@107107-10574767-11526 LINUX      X86_64 Claimed   Busy      0.000 2500  0+00:00:00
glidein_4750_71875323@107107-10574767-22926  LINUX      X86_64 Claimed   Busy      1.730 2500  0+02:27:44
glidein_4653_455226972@107107-10574767-24067 LINUX      X86_64 Claimed   Busy      1.410 2500  0+00:22:12
glidein_4652_527975600@107107-10574767-24820 LINUX      X86_64 Claimed   Busy      1.440 2500  0+00:32:53
glidein_4647_415785516@107107-10574816-7681  LINUX      X86_64 Claimed   Busy      1.660 2500  0+01:57:45
glidein_4652_241555920@107107-10574816-12265 LINUX      X86_64 Claimed   Busy      1.970 2500  0+00:36:03
glidein_4650_91299585@107107-10574816-12578  LINUX      X86_64 Claimed   Busy      1.030 2500  0+03:04:33
glidein_4649_824674365@107107-10574816-18524 LINUX      X86_64 Claimed   Busy      0.240 2500  0+00:00:00
glidein_4651_914472658@107107-10574816-23179 LINUX      X86_64 Claimed   Busy      1.770 2500  0+01:32:32
glidein_4667_286539704@107107-10574816-29186 LINUX      X86_64 Claimed   Busy      2.400 2500  0+01:10:41
glidein_4645_450114845@107107-10803618-56    LINUX      X86_64 Claimed   Busy      1.490 2500  0+00:58:22
glidein_4647_188663238@107107-10803618-2988  LINUX      X86_64 Claimed   Busy      1.150 2500  0+01:27:43
glidein_4666_47581620@107107-10803618-6932   LINUX      X86_64 Unclaimed Idle      0.150 2500  0+00:00:18
glidein_4648_530694856@107107-10803618-9013  LINUX      X86_64 Claimed   Busy      1.290 2500  0+02:44:29
glidein_4652_13214578@107107-10803618-13539  LINUX      X86_64 Claimed   Busy      1.330 2500  0+01:48:24
glidein_4649_228988870@107107-10803618-15298 LINUX      X86_64 Claimed   Busy      1.580 2500  0+01:12:38
glidein_4647_21341012@107107-10803618-23311  LINUX      X86_64 Claimed   Busy      1.610 2500  0+03:11:40
glidein_4513_225847896@176180-10679437-10022 LINUX      X86_64 Claimed   Busy      2.750 2500  0+01:23:28
glidein_4636_427199591@792560-10695588-13481 LINUX      X86_64 Claimed   Busy      0.000 2500  0+05:50:14

               Machines Owner Claimed Unclaimed Matched Preempting  Drain

  X86_64/LINUX       45     0      44         1       0          0      0

         Total       45     0      44         1       0          0      0

I presume the error code (127) from the glidein is due to the problem with vocms0267 (our WMAgent) but on the other hand it is showing OK in WMStats.
Investigating...
Hmm, that's strange. Almost all of the VMs in the pool are from the same user -- Ah! I know who it is, I'll expect an email from him soon.
ID: 47561 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 998
Credit: 6,264,307
RAC: 71
Message 47562 - Posted: 25 Nov 2022, 10:31:04 UTC - in response to Message 47561.  

We changed a condor parameter yesterday, altering the time that condor waited for a disconnected machine to resume from one day to two hours. It is possible that this is conflicting with some other requirement, leading glidein to abort. Emails have been sent and I'm searching for glidein code to see what error 127 indicates.
ID: 47562 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 998
Credit: 6,264,307
RAC: 71
Message 47563 - Posted: 25 Nov 2022, 10:42:26 UTC - in response to Message 47562.  

From a responsible:
Exit code 127 usually means that the executable wasn't found which gives a big clue.

It seems we might have picked up some other changes when we modified the hibernation time-out. Investigations continue.
ID: 47563 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 998
Credit: 6,264,307
RAC: 71
Message 47564 - Posted: 25 Nov 2022, 15:27:34 UTC - in response to Message 47563.  

From a responsible:
Exit code 127 usually means that the executable wasn't found which gives a big clue.

It seems we might have picked up some other changes when we modified the hibernation time-out. Investigations continue.

We've reverted the change that garbled our glidein script -- I'm running main and -dev jobs successfully now.
ID: 47564 · Report as offensive     Reply Quote

Message boards : CMS Application : CMS Simulation error


©2024 CERN