1)
Message boards :
CMS Application :
CMS Tasks Failing
(Message 42814)
Posted 8 Jun 2020 by PaoloNasca Post: "CMS tasks are designed to run 12h." Do you mean that CMS tasks need a minimum amount of CPU power? Let me explain: the first CMS task that finished successfully took nearly 17 hours. I think my PC isn't suitable for helping the CMS@home project. Your reply would be really appreciated. Thanks
2)
(Message 42813)
Posted 8 Jun 2020 by PaoloNasca Post: "Did you have a look into the Console (ALT-F2) or Graphics (logfile) to see how many events were processed before the task was killed?" I don't know how to check what you are asking about. Please give me more information and I'll be very happy and proud to help the development team. Thanks for your time.
3)
(Message 42810)
Posted 8 Jun 2020 by PaoloNasca Post: Only one CMS WU ended successfully. I'd like to share a thought with Ivan and the development team. The WU will end successfully if the Elapsed Time is less than the Job Duration (64800 sec = 18 hours). My conclusion is that the VM has to stay in the “Running” state continuously, that is, without any pause or suspension. What do you think about building a VM with 2 or more CPUs? The Elapsed Time is inversely proportional to the CPU power. Below is an extract from https://lhcathome.cern.ch/lhcathome/result.php?resultid=276449837

….omitted….
2020-06-07 01:03:35 (12068): VM state change detected. (old = 'Running', new = 'Paused')
2020-06-07 01:08:35 (12068): VM state change detected. (old = 'Paused', new = 'Running')
2020-06-07 01:20:09 (12068): VM state change detected. (old = 'Running', new = 'Paused')
2020-06-07 01:29:16 (12068): VM state change detected. (old = 'Paused', new = 'Running')
2020-06-07 02:01:05 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 02:01:05 (12068): Status Report: Elapsed Time: '6000.992973'
2020-06-07 02:01:05 (12068): Status Report: CPU Time: '5208.796875'
2020-06-07 03:41:48 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 03:41:48 (12068): Status Report: Elapsed Time: '12000.992973'
2020-06-07 03:41:48 (12068): Status Report: CPU Time: '10897.078125'
2020-06-07 05:22:14 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 05:22:14 (12068): Status Report: Elapsed Time: '18000.992973'
2020-06-07 05:22:14 (12068): Status Report: CPU Time: '16861.296875'
2020-06-07 07:02:59 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 07:02:59 (12068): Status Report: Elapsed Time: '24000.992973'
2020-06-07 07:02:59 (12068): Status Report: CPU Time: '22545.656250'
2020-06-07 08:43:26 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 08:43:26 (12068): Status Report: Elapsed Time: '30001.494833'
2020-06-07 08:43:26 (12068): Status Report: CPU Time: '28544.906250'
2020-06-07 10:09:12 (12068): VM state change detected. (old = 'Running', new = 'Paused')
2020-06-07 10:16:02 (12068): VM state change detected. (old = 'Paused', new = 'Running')
2020-06-07 10:30:53 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 10:30:53 (12068): Status Report: Elapsed Time: '36001.494833'
2020-06-07 10:30:53 (12068): Status Report: CPU Time: '34246.296875'
2020-06-07 12:11:26 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 12:11:26 (12068): Status Report: Elapsed Time: '42001.494833'
2020-06-07 12:11:26 (12068): Status Report: CPU Time: '39958.921875'
2020-06-07 13:52:11 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 13:52:11 (12068): Status Report: Elapsed Time: '48001.494833'
2020-06-07 13:52:11 (12068): Status Report: CPU Time: '45994.109375'
2020-06-07 15:32:39 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 15:32:39 (12068): Status Report: Elapsed Time: '54001.494833'
2020-06-07 15:32:39 (12068): Status Report: CPU Time: '48189.859375'
2020-06-07 17:12:49 (12068): Status Report: Job Duration: '64800.000000'
2020-06-07 17:12:49 (12068): Status Report: Elapsed Time: '60001.494833'
2020-06-07 17:12:49 (12068): Status Report: CPU Time: '48252.187500'
2020-06-07 18:32:55 (12068): Powering off VM.
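One way to read the Status Report lines in the extract is to compare how much CPU time was gained between two consecutive reports with how much elapsed time passed. The following is only a minimal Python sketch: the function names are made up for illustration, and the sample lines are the last three reports copied from the log. It suggests that the CPU share dropped sharply in the final two intervals, although the log alone does not say why.

```python
import re

# Last three status reports, copied from the log extract in the post.
LOG = """\
2020-06-07 13:52:11 (12068): Status Report: Elapsed Time: '48001.494833'
2020-06-07 13:52:11 (12068): Status Report: CPU Time: '45994.109375'
2020-06-07 15:32:39 (12068): Status Report: Elapsed Time: '54001.494833'
2020-06-07 15:32:39 (12068): Status Report: CPU Time: '48189.859375'
2020-06-07 17:12:49 (12068): Status Report: Elapsed Time: '60001.494833'
2020-06-07 17:12:49 (12068): Status Report: CPU Time: '48252.187500'
"""

PATTERN = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([\d.]+)'")


def status_reports(log_text):
    """Return (elapsed_seconds, cpu_seconds) pairs in log order."""
    pairs = []
    elapsed = None
    for kind, value in PATTERN.findall(log_text):
        if kind == "Elapsed Time":
            elapsed = float(value)
        elif elapsed is not None:
            # A CPU Time line closes the report opened by Elapsed Time.
            pairs.append((elapsed, float(value)))
            elapsed = None
    return pairs


def interval_cpu_share(pairs):
    """CPU seconds gained per elapsed second between consecutive reports."""
    return [
        (c1 - c0) / (e1 - e0)
        for (e0, c0), (e1, c1) in zip(pairs, pairs[1:])
    ]


if __name__ == "__main__":
    for share in interval_cpu_share(status_reports(LOG)):
        print(f"{share:.2f}")  # prints 0.37, then 0.01
```

Pasting the full extract into `LOG` gives one value per pair of consecutive reports, which makes it easier to spot the point where the task stopped making progress.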
4)
(Message 42809)
Posted 7 Jun 2020 by PaoloNasca Post: Just to share and comment on what happened to two failed CMS WUs.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=276498495
https://lhcathome.cern.ch/lhcathome/result.php?resultid=276571790
The VM was in the "Running" state for 12 minutes (from 00:14:57 to 00:26:51). Then the VM was in the "Paused" state for about 18 hours (from 00:26:51 to 18:38:07, which matches the host time jump of roughly 65,480 seconds reported in the log). After that, the VM was in the "Running" state again for 12 minutes. At 18:50:23, the log shows the error "Condor ended after 66684 seconds." I'm really surprised: 18 hours in the "Paused" state makes no sense. Let me explain: after 18 hours of inactivity, some procedures could run into timeouts.

<core_client_version>7.16.7</core_client_version>
<![CDATA[
<message> Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
2020-06-07 00:14:39 (12788): Detected: vboxwrapper 26197
2020-06-07 00:14:39 (12788): Detected: BOINC client v7.7
2020-06-07 00:14:40 (12788): Detected: VirtualBox VboxManage Interface (Version: 5.2.8)
...omitted....
2020-06-07 00:14:57 (12788): VM state change detected. (old = 'PoweredOff', new = 'Running')
...omitted....
2020-06-07 00:19:02 (12788): Guest Log: [DEBUG] HTCondor ping
2020-06-07 00:19:03 (12788): Guest Log: [DEBUG] 0
2020-06-07 00:26:51 (12788): VM state change detected. (old = 'Running', new = 'Paused')
2020-06-07 18:38:07 (12788): VM state change detected. (old = 'Paused', new = 'Running')
2020-06-07 18:38:08 (12788): Guest Log: 00:10:40.055696 timesync vgsvcTimeSyncWorker: Radical host time change: 65 481 854 000 000ns (HostNow=1 591 547 868 663 000 000 ns HostLast=1 591 482 386 809 000 000 ns)
2020-06-07 18:38:18 (12788): Guest Log: 00:10:50.056476 timesync vgsvcTimeSyncWorker: Radical guest time change: 65 470 267 403 000ns (GuestNow=1 591 547 878 663 789 000 ns GuestLast=1 591 482 408 396 386 000 ns fSetTimeLastLoop=true )
2020-06-07 18:50:23 (12788): Guest Log: [ERROR] Condor ended after 66684 seconds.
2020-06-07 18:50:23 (12788): Guest Log: [INFO] Shutting Down.
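The time spent in each VM state can be checked directly from the timestamps in the stderr output. This is a small Python sketch, not part of any BOINC tooling: the `state_durations` helper is a hypothetical name, and the event list is typed in by hand from the log. The pause comes out at 65476 seconds, while the three intervals together add up to 66926 seconds, about four minutes more than the 66684 seconds in the Condor error. That may mean the Condor figure is wall-clock time since Condor started, pause included, but the log does not confirm this.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (timestamp, state entered at that moment), copied from the stderr output.
EVENTS = [
    ("2020-06-07 00:14:57", "Running"),
    ("2020-06-07 00:26:51", "Paused"),
    ("2020-06-07 18:38:07", "Running"),
    ("2020-06-07 18:50:23", "Shutting Down"),
]


def state_durations(events):
    """Return (state, seconds) for each interval between consecutive events."""
    durations = []
    for (t0, state), (t1, _next_state) in zip(events, events[1:]):
        delta = datetime.strptime(t1, FMT) - datetime.strptime(t0, FMT)
        durations.append((state, int(delta.total_seconds())))
    return durations


if __name__ == "__main__":
    for state, seconds in state_durations(EVENTS):
        print(f"{state}: {seconds} s")
    # Running: 714 s
    # Paused: 65476 s
    # Running: 736 s
```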
5)
(Message 42398)
Posted 10 May 2020 by PaoloNasca Post: Today the error is about Condor:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=272679370
Guest Log: [INFO] CMS application starting. Check log files.
Guest Log: [DEBUG] HTCondor ping
Guest Log: [DEBUG] 0
Guest Log: [ERROR] Condor ended after 1324 seconds.
Guest Log: [INFO] Shutting Down.
6)
(Message 42360)
Posted 1 May 2020 by PaoloNasca Post: I'm facing the same issue. All WUs from the "CMS Simulation v50.00 (vbox64) windows_x86_64" application fail.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=272165131
ERROR: Couldn't read proxy from: /tmp/x509up_u0
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line
Guest Log: Use -debug for further information.
[ERROR] Could not get an x509 credential
[ERROR] The x509 proxy creation failed.
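For readers unfamiliar with the OpenSSL message: "no start line" generally means the data handed to the PEM reader contained no "-----BEGIN ...-----" header, for example because the file was empty or held something other than a PEM credential. The Python sketch below only illustrates that check. The helper name is hypothetical, and the proxy file in the log lives inside the VM, so this is not a way to inspect it from the host.

```python
def has_pem_start_line(text):
    """True if some line looks like a PEM header, e.g. '-----BEGIN CERTIFICATE-----'."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-----BEGIN ") and stripped.endswith("-----"):
            return True
    return False


if __name__ == "__main__":
    good = "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n"
    print(has_pem_start_line(good))  # True
    print(has_pem_start_line(""))    # False: an empty file has no start line
```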