Message boards : News : Server outage - uploads failing

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 216
Credit: 4,649,252
RAC: 7,665
Message 41331 - Posted: 23 Jan 2020, 7:31:25 UTC

Due to a network problem in the CERN computer centre early Thursday morning, our BOINC servers have lost access to a storage cluster. As a result, uploads are failing, and so is access to the web pages. Hopefully this will be fixed soon.
ID: 41331
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1345
Credit: 67,211,050
RAC: 95,935
Message 41332 - Posted: 23 Jan 2020, 9:25:00 UTC - in response to Message 41331.  

Thanks Nils.

I got fresh work for CMS and Theory, but not yet for ATLAS.
This might be due to a huge number of work requests.
ID: 41332
maeax
Joined: 2 May 07
Posts: 875
Credit: 32,381,809
RAC: 46,055
Message 41333 - Posted: 23 Jan 2020, 9:42:57 UTC - in response to Message 41332.  
Last modified: 23 Jan 2020, 9:44:27 UTC

Thank you, Nils and your team.
Yes, ATLAS native reports that no tasks are available.
The server status page shows 0 tasks for ATLAS.
ID: 41333
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1345
Credit: 67,211,050
RAC: 95,935
Message 41335 - Posted: 23 Jan 2020, 9:59:52 UTC

Now I got an ATLAS task, but it failed with "-186 (0xFFFFFF46) ERR_RESULT_DOWNLOAD":
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4387&postid=41334
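BOINC task pages show each exit status both as a signed decimal number and as a 32-bit hexadecimal value, which is why -186 appears as 0xFFFFFF46 (its two's-complement form). A small sketch of that conversion for the codes mentioned in this thread; the helper name `to_boinc_hex` is hypothetical and is not part of BOINC:

```python
def to_boinc_hex(code: int) -> str:
    """Format a signed exit/error code as 32-bit two's-complement hex (hypothetical helper)."""
    # Masking with 0xFFFFFFFF maps negative values onto their unsigned 32-bit equivalent.
    return f"0x{code & 0xFFFFFFFF:08X}"


# Codes reported in this thread: ERR_RESULT_DOWNLOAD, EXIT_NO_SUB_TASKS, EXIT_ABORTED_BY_CLIENT.
for code in (-186, 207, 194):
    print(code, to_boinc_hex(code))
```

Positive exit codes such as 207 and 194 simply get zero-padded, while negative client error codes wrap around to the top of the 32-bit range.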
ID: 41335
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1345
Credit: 67,211,050
RAC: 95,935
Message 41336 - Posted: 23 Jan 2020, 12:25:20 UTC

CMS tasks start fine, and all of them get a subtask immediately after the initial setup, but once the first subtask has finished they have problems getting a new one.
ID: 41336
maeax
Joined: 2 May 07
Posts: 875
Credit: 32,381,809
RAC: 46,055
Message 41341 - Posted: 24 Jan 2020, 6:12:52 UTC

Since yesterday evening (20:00 UTC), some uploads and downloads have been interrupted.
They have to be retried manually from the Transfers tab in BOINC.
ID: 41341
Gunde
Joined: 9 Jan 15
Posts: 65
Credit: 318,629,602
RAC: 193,399
Message 41342 - Posted: 24 Jan 2020, 8:14:37 UTC

ATLAS:

WU download error: couldn't get input files:
<file_xfer_error>
<file_name>VlLMDmUZ6EwnsSi4apGgGQJmABFKDmABFKDmvjyKDmABFKDmCNuF5m_EVNT.19652175._000455.pool.root.1</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
ID: 41342
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1345
Credit: 67,211,050
RAC: 95,935
Message 41344 - Posted: 24 Jan 2020, 8:35:56 UTC - in response to Message 41342.  

A link to the affected task or WU would have been helpful.
I guess it's this one:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=259798525

If you take a look at the WU overview:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=131025881
you may notice that all of your wingmen get the same error.
The reason is that an input file is missing on the download server.
Since max # errors is set to 3, this WU will be sent out 3 times before the server automatically removes it from the queue.
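The retry arithmetic above can be illustrated with a deliberately simplified model. It assumes that each copy of the WU is sent to one host at a time and that every copy fails while the input file is missing; BOINC's real scheduler and transitioner logic (governed by per-WU limits such as `max_error_results`) is more involved, and the function below is hypothetical:

```python
def copies_sent(max_errors: int, input_file_present: bool) -> int:
    """Simplified model: how many copies of a WU go out before it succeeds or is given up."""
    errors = 0
    sent = 0
    while errors < max_errors:
        sent += 1
        if input_file_present:
            return sent  # this copy downloads its input and can run, so no retry is needed
        errors += 1  # download fails with a permanent HTTP error
    return sent  # error limit reached: the WU is dropped from the queue


print(copies_sent(3, input_file_present=False))
print(copies_sent(3, input_file_present=True))
```

With a missing input file, every copy burns one of the allowed errors, so a limit of 3 means exactly 3 hosts receive the broken WU.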
ID: 41344
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 216
Credit: 4,649,252
RAC: 7,665
Message 41345 - Posted: 24 Jan 2020, 8:53:54 UTC - in response to Message 41342.  

One of our 3 file servers still had issues, it should be fixed now.
ID: 41345
Gunde
Joined: 9 Jan 15
Posts: 65
Credit: 318,629,602
RAC: 193,399
Message 41346 - Posted: 24 Jan 2020, 16:07:31 UTC - in response to Message 41344.  

Yes, that is the WU affected by this error. I was on a break at work and short of time, so I didn't dig much into the history of other WUs or of this one. My conclusion was that the server was in bad shape to send data, since it returned an HTTP error.
It was just an example: if one failed, there would be several more, and this WU would not be the only one.

I opted out of ATLAS as soon as I saw it, turned it back on when I got home, and my hosts download just fine now.
ID: 41346
Jim1348
Joined: 15 Nov 14
Posts: 394
Credit: 11,773,203
RAC: 6,517
Message 41347 - Posted: 24 Jan 2020, 16:36:29 UTC

At just that time, I picked up one of the usual errors "207 (0x000000CF) EXIT_NO_SUB_TASKS".
https://lhcathome.cern.ch/lhcathome/result.php?resultid=259641851

However, the next eight gave "194 (0x000000C2) EXIT_ABORTED_BY_CLIENT".
https://lhcathome.cern.ch/lhcathome/result.php?resultid=259690076

I had posted on this error message before, and I think this explains it.
ID: 41347
maeax
Joined: 2 May 07
Posts: 875
Credit: 32,381,809
RAC: 46,055
Message 41353 - Posted: 25 Jan 2020, 6:56:45 UTC

At the moment, some download errors are shown for ATLAS and ATLAS native.
Sorry, but it is the weekend.
ID: 41353

©2020 CERN