Message boards :
CMS Application :
CMS Simulation error
Message board moderation
Author | Message |
---|---|
Send message Joined: 18 Jun 14 Posts: 17 Credit: 10,539,032 RAC: 0 |
CMS Simulation error hello. I have a question. During the LHC project, nothing else is wrong, but only the CMS Simulation gives an operation error. What are the causes and solutions? |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,734,854 RAC: 2,594 |
CMS Simulation error I don't have a definitive answer. Your tasks seem to be failing to acquire a job from the Condor pool, and the glidein fails after a 20-minute time-out: 2022-09-24 18:53:01 (2736): Guest Log: [INFO] CMS application starting. Check log files. 2022-09-24 19:13:21 (2736): Guest Log: [ERROR] glidein exited with return value 1. I'll ask the glidein specialist what can cause time-outs leading to Error 1. |
Send message Joined: 18 Jun 14 Posts: 17 Credit: 10,539,032 RAC: 0 |
Thank you for your answer. It doesn't seem like a common case... I tried formatting, just in case, but still the same symptom. Even later, if you have a solution, please share. :) |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,734,854 RAC: 2,594 |
If you can do some debugging, it might help us.
Select the task in the Tasks tab, and click on Show VM Console You should see the startup console running, wait until it shows "CMS application starting. Check log files." Close the console, then click on Show graphics tab -- a page should open in your browser Go to the logs page and copy the contents of each log file to a file on your computer Send the files as attachments to the address I'll send as a PM Thanks |
Send message Joined: 18 Jun 14 Posts: 17 Credit: 10,539,032 RAC: 0 |
Stderr output value in task information are you talking about? ex) <core_client_version>7.20.2</core_client_version> <![CDATA[ <message> - exit code 208 (0xd0)</message> <stderr_txt> 2022-09-24 18:42:33 (21388): Detected: vboxwrapper 26202 2022-09-24 18:42:33 (21388): Detected: BOINC client v7.20.2 2022-09-24 18:42:33 (21388): Detected: VirtualBox VboxManage Interface (Version: 6.1.34) 2022-09-24 18:42:33 (21388): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds) 2022-09-24 18:42:33 (21388): Successfully copied 'init_data.xml' to the shared directory. 2022-09-24 18:42:35 (21388): Create VM. (boinc_ff9b89d4415a8683, slot#16) 2022-09-24 18:42:35 (21388): Setting Memory Size for VM. (2048 MB) |
Send message Joined: 24 Oct 04 Posts: 1172 Credit: 54,685,889 RAC: 15,649 |
Required extension pack not installed, remote desktop not enabled. You didn't d/l the Extension Pack with the current version of Oracle VB so you can look at the "Log" page which is not the STDERR and actually will show you the running Logs for each task THEN you will be able to see the running logs when you go to the Boinc Manager and click on your running task and then click on the "Show Graphics" on the left side of that task. When you do that another web page will open where you can click on "Logs" and then see what is actually going on and if the task actually started running. https://www.virtualbox.org/wiki/Downloads |
Send message Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 62,755 |
VirtualBox extensions are not required to view the logs Ivan requested. Just select a running task and click on "Show Graphics" as already described. This opens a browser window where you can navigate to the requested logs. |
Send message Joined: 24 Oct 04 Posts: 1172 Credit: 54,685,889 RAC: 15,649 |
He already said that and many years ago it was that way but then you were not a member and didn't run any back then but you do seem to have something to say any time I post anything which is why I usually only do it via pm's since many members don't like your style. |
Send message Joined: 2 May 07 Posts: 2240 Credit: 173,894,884 RAC: 3,757 |
He already said that and many years ago it was that way but then you were not a member and didn't run any back then but you do seem to have something to say any time I post anything which is why I usually only do it via pm's since many members don't like your style. +1 |
Send message Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 62,755 |
Many members? Might be 1 or 2 who post lots of wrong suggestions again and again. |
Send message Joined: 2 May 07 Posts: 2240 Credit: 173,894,884 RAC: 3,757 |
Many members? Say the names! |
Send message Joined: 2 May 07 Posts: 2240 Credit: 173,894,884 RAC: 3,757 |
2022-10-22 22:21:03 (10892): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat ) 2022-10-22 22:21:03 (10892): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt. 2022-10-22 22:21:03 (10892): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory 2022-10-22 22:21:03 (10892): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1) 2022-10-22 22:21:03 (10892): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8 2022-10-22 22:21:03 (10892): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT for EID 8 [137.138.156.85:9618] 2022-10-22 22:21:03 (10892): Guest Log: Ncat: Connection timed out. 2022-10-22 22:21:03 (10892): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618 |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,734,854 RAC: 2,594 |
|
Send message Joined: 30 Oct 11 Posts: 26 Credit: 4,940,164 RAC: 1 |
Whatever the problem with vocms0840, it looks to be fixed now. It could be back again as all I'm getting right now are "error while computing" for every single task. The main message seems to be 'glidein exited with return value 127' but here's the Stderr file for one of my tasks: Stderr output <core_client_version>7.20.2</core_client_version> <![CDATA[ <message> The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified. (0xd0) - exit code 208 (0xd0)</message> <stderr_txt> 2022-11-24 22:43:30 (24408): Detected: vboxwrapper 26206 2022-11-24 22:43:30 (24408): Detected: BOINC client v7.20.2 2022-11-24 22:43:30 (24408): Detected: VirtualBox VboxManage Interface (Version: 7.0.4) 2022-11-24 22:43:31 (24408): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds) 2022-11-24 22:43:31 (24408): Successfully copied 'init_data.xml' to the shared directory. 2022-11-24 22:43:31 (24408): Create VM. (boinc_40e995cbe1124736, slot#5) 2022-11-24 22:43:32 (24408): Setting Memory Size for VM. (2048MB) 2022-11-24 22:43:32 (24408): Setting CPU Count for VM. (1) 2022-11-24 22:43:32 (24408): Setting Chipset Options for VM. 2022-11-24 22:43:32 (24408): Setting Graphics Controller Options for VM. 2022-11-24 22:43:33 (24408): Setting Boot Options for VM. 2022-11-24 22:43:33 (24408): Setting Network Configuration for NAT. 2022-11-24 22:43:33 (24408): Enabling VM Network Access. 2022-11-24 22:43:33 (24408): Disabling USB Support for VM. 2022-11-24 22:43:34 (24408): Disabling COM Port Support for VM. 2022-11-24 22:43:34 (24408): Disabling LPT Port Support for VM. 2022-11-24 22:43:34 (24408): Disabling Audio Support for VM. 2022-11-24 22:43:34 (24408): Disabling Clipboard Support for VM. 2022-11-24 22:43:35 (24408): Disabling Drag and Drop Support for VM. 2022-11-24 22:43:35 (24408): Adding storage controller(s) to VM. 2022-11-24 22:43:35 (24408): Adding virtual disk drive to VM. (CMS_2022_09_07_prod.vdi) 2022-11-24 22:43:36 (24408): Adding VirtualBox Guest Additions to VM. 2022-11-24 22:43:36 (24408): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB) 2022-11-24 22:43:36 (24408): forwarding host port 53158 to guest port 80 2022-11-24 22:43:36 (24408): Enabling remote desktop for VM. 2022-11-24 22:43:37 (24408): Required extension pack not installed, remote desktop not enabled. 2022-11-24 22:43:37 (24408): Enabling shared directory for VM. 2022-11-24 22:43:37 (24408): Starting VM using VBoxManage interface. (boinc_40e995cbe1124736, slot#5) 2022-11-24 22:43:43 (24408): Successfully started VM. (PID = '17604') 2022-11-24 22:43:43 (24408): Reporting VM Process ID to BOINC. 2022-11-24 22:43:43 (24408): Guest Log: BIOS: VirtualBox 7.0.4 2022-11-24 22:43:43 (24408): Guest Log: CPUID EDX: 0x178bfbff 2022-11-24 22:43:43 (24408): Guest Log: BIOS: No PCI IDE controller, not probing IDE 2022-11-24 22:43:43 (24408): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors 2022-11-24 22:43:43 (24408): VM state change detected. (old = 'poweredoff', new = 'running') 2022-11-24 22:43:43 (24408): Detected: Web Application Enabled (http://localhost:53158) 2022-11-24 22:43:43 (24408): Preference change detected 2022-11-24 22:43:43 (24408): Setting CPU throttle for VM. (90%) 2022-11-24 22:43:43 (24408): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 900 seconds) or (Vbox_job.xml: 600 seconds)) 2022-11-24 22:43:45 (24408): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032 2022-11-24 22:43:45 (24408): Guest Log: BIOS: Booting from Hard Disk... 2022-11-24 22:43:47 (24408): Guest Log: BIOS: KBD: unsupported int 16h function 03 2022-11-24 22:43:47 (24408): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 2022-11-24 22:44:12 (24408): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds 2022-11-24 22:44:12 (24408): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000) 2022-11-24 22:44:13 (24408): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log 2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000065 main Log opened 2022-11-25T03:44:13.139615000Z 2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000177 main OS Product: Linux 2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000211 main OS Release: 4.14.232-19.cernvm.x86_64 2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000236 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021 2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000260 main Executable: /usr/sbin/VBoxService 2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000261 main Process ID: 2171 2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.000261 main Package type: LINUX_64BITS_GENERIC 2022-11-24 22:44:13 (24408): Guest Log: 00:00:00.002069 main 5.2.6 r120293 started. Verbose level = 0 2022-11-24 22:44:24 (24408): Guest Log: [INFO] Mounting the shared directory 2022-11-24 22:44:24 (24408): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor 2022-11-24 22:44:24 (24408): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch 2022-11-24 22:44:24 (24408): Guest Log: [INFO] Testing connection to cern.ch 2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to VCCS 2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to HTCondor 2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to WMAgent 2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to EOSCMS 2022-11-24 22:44:25 (24408): Guest Log: [INFO] Testing connection to CMS-Factory 2022-11-24 22:44:26 (24408): Guest Log: [INFO] Testing connection to CMS-Frontier 2022-11-24 22:44:26 (24408): Guest Log: [INFO] Testing connection to Frontier 2022-11-24 22:44:26 (24408): Guest Log: [INFO] Could not find a local HTTP proxy 2022-11-24 22:44:26 (24408): Guest Log: [INFO] CVMFS and Frontier will have to use DIRECT connections 2022-11-24 22:44:26 (24408): Guest Log: [INFO] This makes the application less efficient 2022-11-24 22:44:27 (24408): Guest Log: [INFO] It also puts higher load on the project servers 2022-11-24 22:44:27 (24408): Guest Log: [INFO] Setting up a local HTTP proxy is highly recommended 2022-11-24 22:44:27 (24408): Guest Log: [INFO] Advice can be found in the project forum 2022-11-24 22:44:27 (24408): Guest Log: [INFO] Reloading and probing the CVMFS configuration 2022-11-24 22:44:32 (24408): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK 2022-11-24 22:44:33 (24408): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK 2022-11-24 22:44:33 (24408): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK 2022-11-24 22:44:34 (24408): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK 2022-11-24 22:44:38 (24408): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK 2022-11-24 22:44:40 (24408): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY 2022-11-24 22:44:40 (24408): Guest Log: [INFO] 2.7.2.0 http://s1ral-cvmfs.openhtc.io DIRECT 2022-11-24 22:44:40 (24408): Guest Log: [INFO] Environment HTTP proxy: not set 2022-11-24 22:44:40 (24408): Guest Log: [INFO] Reading volunteer information 2022-11-24 22:44:47 (24408): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2022-11-24 22:44:50 (24408): Guest Log: [INFO] CMS application starting. Check log files. 2022-11-24 22:44:50 (24408): Guest Log: [INFO] Requesting an idtoken from LHC@home 2022-11-24 22:44:51 (24408): Guest Log: [ERROR] glidein exited with return value 127. 2022-11-24 22:44:51 (24408): Guest Log: [DEBUG] Volunteer: mikey (223772) 2022-11-24 22:44:51 (24408): Guest Log: [INFO] Shutting Down. 2022-11-24 22:45:21 (24408): VM Completion File Detected. 2022-11-24 22:45:21 (24408): VM Completion Message: glidein exited with return value 127. . 2022-11-24 22:45:21 (24408): Powering off VM. 2022-11-24 22:45:21 (24408): Successfully stopped VM. 2022-11-24 22:45:21 (24408): Deregistering VM. (boinc_40e995cbe1124736, slot#5) 2022-11-24 22:45:21 (24408): Removing network bandwidth throttle group from VM. 2022-11-24 22:45:22 (24408): Removing VM from VirtualBox. 22:45:27 (24408): called boinc_finish(208) </stderr_txt> ]]> |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,734,854 RAC: 2,594 |
Yes, there's a problem. No, I haven't found it yet. If I try to look at our condor jobs from CERN I get [lxplus780:~] > condor_q -name vocms0267.cern.ch -pool vocms0840.cern.ch -- Failed to fetch ads from: <137.138.52.94:4080?addrs=137.138.52.94-4080+[2001-1458-d00-17--43]-4080&alias=vocms0267.cern.ch&noUDP&sock=schedd_1783_9173> : vocms0267.cern.ch AUTHENTICATE:1003:Failed to authenticate with any method and the pool is depleted: [lxplus780:~] > condor_status -pool vocms0840.cern.ch Name OpSys Arch State Activity LoadAv Mem ActvtyTime glidein_4670_387833785@107107-10489077-18060 LINUX X86_64 Claimed Busy 2.330 2500 0+01:00:46 glidein_4680_323803200@107107-10489077-31377 LINUX X86_64 Claimed Busy 3.070 2500 0+03:49:52 glidein_4648_45869712@107107-10511611-11990 LINUX X86_64 Claimed Busy 1.250 2500 0+01:03:09 glidein_4656_822528270@107107-10511611-15555 LINUX X86_64 Claimed Busy 1.210 2500 0+02:43:25 glidein_4653_323462880@107107-10511611-16804 LINUX X86_64 Claimed Busy 1.430 2500 0+03:35:45 glidein_4646_361662954@107107-10511611-18323 LINUX X86_64 Claimed Busy 1.210 2500 0+03:24:27 glidein_4648_563620040@107107-10511611-22062 LINUX X86_64 Claimed Busy 2.260 2500 0+02:32:14 glidein_4664_267919542@107107-10511611-30334 LINUX X86_64 Claimed Busy 1.260 2500 0+01:59:35 glidein_4656_121852692@107107-10511611-30765 LINUX X86_64 Claimed Busy 2.260 2500 0+00:11:12 glidein_4670_72564800@107107-10511679-3097 LINUX X86_64 Claimed Busy 2.270 2500 0+00:37:28 glidein_4657_476277228@107107-10511679-15034 LINUX X86_64 Claimed Busy 1.690 2500 0+00:40:13 glidein_4656_29749126@107107-10511679-18362 LINUX X86_64 Claimed Busy 2.310 2500 0+02:11:44 glidein_4645_96284106@107107-10511679-21818 LINUX X86_64 Claimed Busy 1.500 2500 0+01:00:07 glidein_4874_26963200@107107-10511679-23757 LINUX X86_64 Claimed Busy 1.420 2500 0+01:33:39 glidein_4656_572271750@107107-10511679-29938 LINUX X86_64 Claimed Busy 1.530 2500 0+03:33:02 glidein_4661_72786555@107107-10511679-31672 LINUX X86_64 Claimed Busy 0.810 2500 0+00:01:37 glidein_4651_831188082@107107-10574756-27824 LINUX X86_64 Claimed Busy 1.850 2500 0+01:08:16 glidein_4646_553968128@107107-10574756-28110 LINUX X86_64 Claimed Busy 1.500 2500 0+01:59:27 glidein_4645_266261752@107107-10574756-28244 LINUX X86_64 Claimed Busy 2.010 2500 0+01:58:33 glidein_4644_353209032@107107-10574756-29278 LINUX X86_64 Claimed Busy 1.740 2500 0+02:51:58 glidein_4649_107372800@107107-10574756-29861 LINUX X86_64 Claimed Busy 1.770 2500 0+00:35:45 glidein_4646_432356158@107107-10574756-29943 LINUX X86_64 Claimed Busy 1.910 2500 0+01:10:31 glidein_4647_591695930@107107-10574756-31949 LINUX X86_64 Claimed Busy 1.740 2500 0+01:24:21 glidein_4650_109275384@107107-10574767-7200 LINUX X86_64 Claimed Busy 1.760 2500 0+03:16:18 glidein_4654_146209624@107107-10574767-7326 LINUX X86_64 Claimed Busy 2.040 2500 0+02:10:39 glidein_4651_667607907@107107-10574767-11294 LINUX X86_64 Claimed Busy 2.230 2500 0+02:30:02 glidein_4643_676121242@107107-10574767-11526 LINUX X86_64 Claimed Busy 0.000 2500 0+00:00:00 glidein_4750_71875323@107107-10574767-22926 LINUX X86_64 Claimed Busy 1.730 2500 0+02:27:44 glidein_4653_455226972@107107-10574767-24067 LINUX X86_64 Claimed Busy 1.410 2500 0+00:22:12 glidein_4652_527975600@107107-10574767-24820 LINUX X86_64 Claimed Busy 1.440 2500 0+00:32:53 glidein_4647_415785516@107107-10574816-7681 LINUX X86_64 Claimed Busy 1.660 2500 0+01:57:45 glidein_4652_241555920@107107-10574816-12265 LINUX X86_64 Claimed Busy 1.970 2500 0+00:36:03 glidein_4650_91299585@107107-10574816-12578 LINUX X86_64 Claimed Busy 1.030 2500 0+03:04:33 glidein_4649_824674365@107107-10574816-18524 LINUX X86_64 Claimed Busy 0.240 2500 0+00:00:00 glidein_4651_914472658@107107-10574816-23179 LINUX X86_64 Claimed Busy 1.770 2500 0+01:32:32 glidein_4667_286539704@107107-10574816-29186 LINUX X86_64 Claimed Busy 2.400 2500 0+01:10:41 glidein_4645_450114845@107107-10803618-56 LINUX X86_64 Claimed Busy 1.490 2500 0+00:58:22 glidein_4647_188663238@107107-10803618-2988 LINUX X86_64 Claimed Busy 1.150 2500 0+01:27:43 glidein_4666_47581620@107107-10803618-6932 LINUX X86_64 Unclaimed Idle 0.150 2500 0+00:00:18 glidein_4648_530694856@107107-10803618-9013 LINUX X86_64 Claimed Busy 1.290 2500 0+02:44:29 glidein_4652_13214578@107107-10803618-13539 LINUX X86_64 Claimed Busy 1.330 2500 0+01:48:24 glidein_4649_228988870@107107-10803618-15298 LINUX X86_64 Claimed Busy 1.580 2500 0+01:12:38 glidein_4647_21341012@107107-10803618-23311 LINUX X86_64 Claimed Busy 1.610 2500 0+03:11:40 glidein_4513_225847896@176180-10679437-10022 LINUX X86_64 Claimed Busy 2.750 2500 0+01:23:28 glidein_4636_427199591@792560-10695588-13481 LINUX X86_64 Claimed Busy 0.000 2500 0+05:50:14 Machines Owner Claimed Unclaimed Matched Preempting Drain X86_64/LINUX 45 0 44 1 0 0 0 Total 45 0 44 1 0 0 0 I presume the error code (127) from the glidein is due to the problem with vocms0267 (our WMAgent) but on the other hand it is showing OK in WMStats. Investigating... Hmm, that's strange. Almost all of the VMs in the pool are from the same user -- Ah! I know who it is, I'll expect an email from him soon. |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,734,854 RAC: 2,594 |
We changed a condor parameter yesterday, altering the time that condor waited for a disconnected machine to resume from one day to two hours. It is possible that this is conflicting with some other requirement, leading glidein to abort. Emails have been sent and I'm searching for glidein code to see what error 127 indicates. |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,734,854 RAC: 2,594 |
|
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,734,854 RAC: 2,594 |
From a responsible:Exit code 127 usually means that the executable wasn't found which gives a big clue. We've reverted the change that garbled our glidein script -- I'm running main and -dev jobs successfully now. |
©2024 CERN