Message boards : CMS Application : Subtask Results don't upload

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1691
Credit: 103,248,283
RAC: 103,118
Message 43253 - Posted: 22 Aug 2020, 11:37:29 UTC

My CMS tasks are running fine, but every attempt to upload the subtask results to data-bridge.cern.ch using gfal-copy fails, and this happens on several machines.
There are no firewall complaints, so I suspect a project-side issue.

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 751
Credit: 5,688,626
RAC: 0
Message 43259 - Posted: 23 Aug 2020, 11:44:51 UTC - in response to Message 43253.  

We are having an inordinate number of stage-out failures, both for result files and log files, it seems. I've messaged Laurence.

ivan
Message 43260 - Posted: 23 Aug 2020, 20:07:23 UTC - in response to Message 43259.  
Last modified: 23 Aug 2020, 20:08:14 UTC

We are definitely having problems at the moment; they seem to have started around 21:00 UTC on Friday night. All stage-outs (uploads of data and logs) are failing, although this is not reflected in your user credits. I've been trying to find out what's wrong, but given that it's a Sunday in August in Europe, there has yet to be any response. If you can, switch to another project while we track this down, to minimise our failed traffic load.

ivan
Message 43276 - Posted: 25 Aug 2020, 15:07:18 UTC - in response to Message 43259.  

I have now opened a ticket on this with CERN IT.

ivan
Message 43279 - Posted: 26 Aug 2020, 10:59:49 UTC - in response to Message 43276.  

There was a problem with the DataBridge which now seems to have been resolved. The failure rate has been rather more sensible since about 0730 GMT.

computezrmle
Message 43281 - Posted: 26 Aug 2020, 16:26:24 UTC

Since 11:15 UTC I have had 6 successful subtasks on different computers, and all of them uploaded their results and logs correctly.

Thanks Ivan.

computezrmle
Message 43282 - Posted: 26 Aug 2020, 20:46:01 UTC

The failure rate has been back at 100% since 19:24 UTC.
Sorry Ivan.
This requires more investigation.
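
For anyone who wants to put a number on this for their own hosts, here is a minimal sketch of how a stage-out failure rate could be tallied over a time window. It is not project tooling: the `UploadAttempt` record, the `failure_rate` function and the `utc` helper are hypothetical names made up for illustration, the sample data is invented, and nothing here parses the real CMS or BOINC logs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class UploadAttempt:
    """One stage-out attempt (hypothetical record, not a real log format)."""
    when: datetime  # time of the attempt, UTC
    ok: bool        # True if the upload succeeded


def failure_rate(attempts: Iterable[UploadAttempt],
                 since: Optional[datetime] = None) -> Optional[float]:
    """Return the fraction of failed attempts at or after `since`.

    Returns None when the window holds no attempts, so that
    "no data" is not mistaken for "0% failures".
    """
    window = [a for a in attempts if since is None or a.when >= since]
    if not window:
        return None
    failed = sum(1 for a in window if not a.ok)
    return failed / len(window)


def utc(day: int, hour: int, minute: int) -> datetime:
    """Helper: a timestamp in August 2020, UTC."""
    return datetime(2020, 8, day, hour, minute, tzinfo=timezone.utc)


# Invented sample data, loosely following the times mentioned in this thread.
sample = [
    UploadAttempt(utc(26, 11, 30), True),
    UploadAttempt(utc(26, 14, 0), True),
    UploadAttempt(utc(26, 19, 30), False),
    UploadAttempt(utc(26, 20, 15), False),
]
```

Calling `failure_rate(sample, since=utc(26, 19, 24))` restricts the tally to attempts from 19:24 UTC onwards, which is how a "100% since 19:24" figure would be distinguished from the rate over the whole day.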

ivan
Message 43283 - Posted: 26 Aug 2020, 21:40:53 UTC - in response to Message 43282.  
Last modified: 26 Aug 2020, 21:41:55 UTC

Yes, I've just noticed that. I've updated the incident ticket. Bummer... I haven't heard yet exactly what caused the initial problem.

ivan
Message 43285 - Posted: 27 Aug 2020, 9:33:00 UTC - in response to Message 43283.  

Apparently an automatic update broke things again. It's working now, and updates have been disabled...