CMS jobs failing at the LogArchive stage
Author | Message |
---|---|
Joined: 29 Aug 05 Posts: 929 Credit: 6,101,197 RAC: 892 |
Some of you may have noticed that since Friday night the CMS@Home job graphs have been showing a 100% failure rate. This seems to be a failure in sending the job logs to storage at the CERN DataBridge: <@========== WMException Start ==========@> Exception Class: StageOutError Message: Command exited non-zero, ExitCode:112 Output: stdout: Sat Jul 10 13:41:48 UTC 2021 Copying 327892 bytes file:///srv/job/WMTaskSpace/logArch1/logArchive.tar.gz => https://data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2021/7/10/ireid_TC_SLC7_IDR_CMS_Home_210708_103444_8611/SinglePiE50HCAL_pythia8_2018_GenSimFull/0002/2/14aed0d0-6a1b-4993-a85a-cac211d51285-112-2-logArchive.tar.gz gfal-copy exit status: 112 ERROR: gfal-copy exited with 112 I've been unable to contact my colleagues at CERN (it's high holiday season in that part of Europe...), and I cannot connect to the DataBridge with my browser, so I've opened a problem ticket with CERN IT support. |
Joined: 28 Sep 04 Posts: 604 Credit: 37,009,942 RAC: 17,003 |
I have noticed long pauses in CPU activity during crunching. They happen when one set of jobs has finished and the task should download a new set. The result is a difference of 2 to 3 hours between runtime and CPU time. This gives the CPU a breather while the summer temperatures are higher, so all in all it is not a bad thing. All my CMS tasks are getting the usual credits though, so there are no errors in BOINC. |
Joined: 29 Aug 05 Posts: 929 Credit: 6,101,197 RAC: 892 |
Yes, we don't pass through all HTCondor exit codes to BOINC, so even if condor/WMAgent thinks a job has failed, BOINC will still give you credit. The current failures occur when archiving the .log.gz files. I'm not sure if the .root result files are getting stored -- I cannot log in to the DataBridge to check. Still no response from CERN IT. |
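
For anyone who wants to pull the key fields out of a stage-out error like the one quoted in the first post, here is a minimal Python sketch. The function name `summarize_stageout_error` and the regular expressions are illustrative assumptions based only on the log text shown in this thread; they are not part of WMAgent or gfal2, and other log layouts may need different patterns.

```python
import re

# Log text copied from the error quoted in the first post of this thread.
SAMPLE_LOG = (
    "Exception Class: StageOutError "
    "Message: Command exited non-zero, ExitCode:112 "
    "Output: stdout: Sat Jul 10 13:41:48 UTC 2021 "
    "Copying 327892 bytes "
    "file:///srv/job/WMTaskSpace/logArch1/logArchive.tar.gz => "
    "https://data-bridge.cern.ch/myfed/cms-output/store/unmerged/logs/prod/2021/7/10/"
    "ireid_TC_SLC7_IDR_CMS_Home_210708_103444_8611/"
    "SinglePiE50HCAL_pythia8_2018_GenSimFull/0002/2/"
    "14aed0d0-6a1b-4993-a85a-cac211d51285-112-2-logArchive.tar.gz "
    "gfal-copy exit status: 112 ERROR: gfal-copy exited with 112"
)


def summarize_stageout_error(text):
    """Extract exception class, exit code and copy details (hypothetical helper).

    Any field whose pattern is not found in the text is returned as None.
    """
    exc = re.search(r"Exception Class:\s*(\w+)", text)
    code = re.search(r"ExitCode:\s*(\d+)", text)
    # Matches "Copying <n> bytes <source> => <destination>" as seen in the log.
    copy = re.search(r"Copying\s+(\d+)\s+bytes\s+(\S+)\s+=>\s+(\S+)", text)
    return {
        "exception_class": exc.group(1) if exc else None,
        "exit_code": int(code.group(1)) if code else None,
        "size_bytes": int(copy.group(1)) if copy else None,
        "source": copy.group(2) if copy else None,
        "destination": copy.group(3) if copy else None,
    }


if __name__ == "__main__":
    for key, value in summarize_stageout_error(SAMPLE_LOG).items():
        print(f"{key}: {value}")
```

Run against the quoted log, this reports the exception class, the exit code, the size of the archive and the two endpoints of the copy, which is enough to see that the failing step is the transfer of logArchive.tar.gz to the DataBridge.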
©2023 CERN