Message boards : ATLAS application : hits file upload fails immediately

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2534
Credit: 254,022,666
RAC: 46,571
Message 49700 - Posted: 5 Mar 2024, 7:17:13 UTC - in response to Message 49698.  

Let them finish and upload the smaller logfile.
When the huge logfile upload gets stuck, cancel that upload (only the upload, not the task!).
That way you may get credits for the task (worked for 2 of them from my hosts that got stuck recently).

The scientific work gets lost but may be rescheduled by the backend systems.
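Cancelling only the upload is normally done from BOINC Manager's Transfers tab. For anyone who prefers the command line, here is a minimal sketch. It assumes `boinccmd` is installed and can reach the local client (depending on your setup you may need `--passwd` or need to run it from the BOINC data directory). The assumed command is `boinccmd --file_transfer <project URL> <file name> abort`, which aborts one pending transfer and leaves the task itself alone. The helper name, the project URL and the file name (copied from a log later in this thread) are illustrative assumptions.

```python
import shlex

# Hypothetical helper: builds the boinccmd argument list that aborts a single
# pending file transfer (only the upload, not the task).
def abort_upload_command(project_url: str, filename: str) -> list[str]:
    """Return argv for `boinccmd --file_transfer <url> <file> abort`."""
    return ["boinccmd", "--file_transfer", project_url, filename, "abort"]

if __name__ == "__main__":
    # Assumed project URL and an example file name; substitute your own.
    cmd = abort_upload_command(
        "https://lhcathome.cern.ch/lhcathome/",
        "ID3NDmgTuz4np2BDcpmwOghnABFKDmABFKDm73LSDmv4hKDmP85t6n_1_r717556734_ATLAS_hits",
    )
    # Print a shell-quoted command line; pass `cmd` to subprocess.run() to execute it.
    print(shlex.join(cmd))
```

Building the argument list instead of a single shell string avoids quoting problems with long file names.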
ID: 49700
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 49701 - Posted: 5 Mar 2024, 7:43:04 UTC - in response to Message 49699.  

Are those upload files compressed before they are uploaded?
ID: 49701
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 248
Credit: 5,974,599
RAC: 0
Message 49703 - Posted: 5 Mar 2024, 9:08:30 UTC - in response to Message 49701.  

We have added swap on most of the file servers, so hopefully they will finish uploads without crashing.
ID: 49703
Ken_g6

Joined: 4 Jul 06
Posts: 7
Credit: 339,475
RAC: 0
Message 49718 - Posted: 6 Mar 2024, 9:37:19 UTC - in response to Message 49703.  

Still not working for my 1.37 GB file. If it still doesn't work tomorrow, I'll probably try cancelling the upload as suggested.
ID: 49718
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 49719 - Posted: 6 Mar 2024, 10:09:28 UTC - in response to Message 49718.  

"Error: work unit cancelled" (German: "Fehler Paket abgebrochen")
This is shown in the task details.
Over the last few days I have seen, as a wingman, some tasks that were cancelled by the server.
So I think your task is one of them too.
ID: 49719
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 248
Credit: 5,974,599
RAC: 0
Message 49720 - Posted: 7 Mar 2024, 9:08:06 UTC - in response to Message 49719.  

Sorry about this. We have added more file servers, so the uploads for those tasks should hopefully work now if you retry.
ID: 49720
wujj123456

Joined: 14 Sep 08
Posts: 52
Credit: 63,618,302
RAC: 29,267
Message 49726 - Posted: 7 Mar 2024, 17:00:23 UTC - in response to Message 49720.  

Still failing the same way, and still only for the 1.4 GB uploads; the smaller ones upload just fine.

I saw that some WUs were aborted server-side two days ago. Example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=407100028. Does that mean those WUs were generated by mistake and have no scientific value anyway? If that's the case for all the other big uploads too, I feel we might as well just abort them. Personally I don't care much about credits if the results aren't meaningful anyway. Losing them is better than crashing the upload server all the time.
ID: 49726
m

Joined: 6 Sep 08
Posts: 118
Credit: 12,573,537
RAC: 1,238
Message 49737 - Posted: 8 Mar 2024, 13:28:02 UTC - in response to Message 49726.  

Failing here, too. I'm using a local proxy. This is an extract from the BOINC log. Judging from the "Payload Too Large" error, does it look as if a definite limit is being exceeded, rather than something running out of memory?

[http] [ID#12] Sent header to server: Content-Type: application/x-www-form-urlencoded
[http] [ID#12] Sent header to server: Accept-Language: en_GB
[http] [ID#12] Sent header to server: Content-Length: 1486489603
[http] [ID#12] Sent header to server: Expect: 100-continue
[http] [ID#12] Sent header to server:
[http] [ID#12] Received header from server: HTTP/1.1 413 Payload Too Large
[http] [ID#12] Received header from server: Date: Fri, 08 Mar 2024 00:21:32 GMT
[http] [ID#12] Received header from server: Server: Apache
[http] [ID#12] Received header from server: Content-Type: text/html; charset=iso-8859-1
[http] [ID#12] Received header from server: X-Cache: MISS from Teec00
[http] [ID#12] Received header from server: X-Cache-Lookup: MISS from Teec00:3128
[http] [ID#12] Received header from server: Transfer-Encoding: chunked
[http] [ID#12] Received header from server: Via: 1.1 Teec00 (squid/3.5.12)
[http] [ID#12] Received header from server: Connection: keep-alive
[http] [ID#12] Info: HTTP error before end of send, stop sending
[http] [ID#12] Received header from server:
[http_xfer] [ID#12] HTTP: wrote 316 bytes
[http] [ID#12] Info: Closing connection 3
[file_xfer] http op done; retval -224 (permanent HTTP error)
[file_xfer] file transfer status -224 (permanent HTTP error)
Backing off 05:58:26 on upload of ID3NDmgTuz4np2BDcpmwOghnABFKDmABFKDm73LSDmv4hKDmP85t6n_1_r717556734_ATLAS_hits
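To pull the key numbers out of an extract like this, here is a minimal Python sketch. The regular expressions assume the exact `[http]` debug line format quoted above; the function name and the trimmed sample lines are illustrative.

```python
import re

# Patterns for the request size sent by the client and any HTTP status line received.
CONTENT_LENGTH_RE = re.compile(r"Sent header to server: Content-Length: (\d+)")
STATUS_RE = re.compile(r"Received header from server: HTTP/1\.1 (\d{3})")

def summarize_upload(log_lines):
    """Return (content_length, list_of_status_codes) found in BOINC [http] lines."""
    content_length = None
    statuses = []
    for line in log_lines:
        m = CONTENT_LENGTH_RE.search(line)
        if m:
            content_length = int(m.group(1))
        m = STATUS_RE.search(line)
        if m:
            statuses.append(int(m.group(1)))
    return content_length, statuses

sample = [
    "[http] [ID#12] Sent header to server: Content-Length: 1486489603",
    "[http] [ID#12] Sent header to server: Expect: 100-continue",
    "[http] [ID#12] Received header from server: HTTP/1.1 413 Payload Too Large",
]
length, codes = summarize_upload(sample)
print(length, codes)               # 1486489603 [413]
print(f"{length / 2**30:.2f} GiB")  # 1.38 GiB
```

A body of about 1.38 GiB followed by a 413 from the server is consistent with a fixed size cap being hit rather than a crash.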
ID: 49737
m

Joined: 6 Sep 08
Posts: 118
Credit: 12,573,537
RAC: 1,238
Message 49738 - Posted: 8 Mar 2024, 14:57:06 UTC - in response to Message 49737.  
Last modified: 8 Mar 2024, 15:00:57 UTC

I ran out of editing time...

I have changed the "client_request_buffer_max_size" setting in squid.conf to 1500 MB (it was previously set to 10240 KB).
See here; it applies to much later squid versions than the one in use here, but may be relevant.

A log extract is:-

Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Sent header to server: Accept-Language: en_GB
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Sent header to server: Content-Length: 1483171612
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Sent header to server: Expect: 100-continue
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Sent header to server:
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Received header from server: HTTP/1.1 100 Continue
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Received header from server: Connection: keep-alive
Fri 08 Mar 2024 13:58:28 GMT | LHC@home | [http] [ID#17] Info: Recv failure: Connection reset by peer
Fri 08 Mar 2024 13:58:28 GMT | LHC@home | [http] [ID#17] Info: Closing connection 7
Fri 08 Mar 2024 13:58:28 GMT | LHC@home | [http] HTTP error: Failure when receiving data from the peer
Fri 08 Mar 2024 13:58:29 GMT | | Project communication failed: attempting access to reference site
Fri 08 Mar 2024 13:58:29 GMT | | [http] HTTP_OP::init_get(): http://www.google.com/
Fri 08 Mar 2024 13:58:29 GMT | LHC@home | [file_xfer] http op done; retval -184 (transient HTTP error)
Fri 08 Mar 2024 13:58:29 GMT | LHC@home | [file_xfer] file transfer status -184 (transient HTTP error)
Fri 08 Mar 2024 13:58:29 GMT | LHC@home | Temporarily failed upload of ID3NDmgTuz4np2BDcpmwOghnABFKDmABFKDm73LSDmv4hKDmP85t6n_1_r717556734_ATLAS_hits: transient HTTP error
Fri 08 Mar 2024 13:58:29 GMT | LHC@home | Backing off 04:35:21 on upload of ID3NDmgTuz4np2BDcpmwOghnABFKDmABFKDm73LSDmv4hKDmP85t6n_1_r717556734_ATLAS_hits

This looks completely different, although the upload still fails.
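The two logs fail at different points of an `Expect: 100-continue` exchange (defined in RFC 9110). The client sends only the headers and waits. In the first log the server answered with a final 413 error, so BOINC stopped before sending the body. In the second log a `100 Continue` arrived, the body started streaming, and the connection was then reset. The following is an illustrative Python sketch of that classification only. The function name and labels are hypothetical and do not reflect BOINC's actual code.

```python
from typing import Optional

def classify_upload(first_status: Optional[int], reset_during_body: bool) -> str:
    """Classify an Expect: 100-continue upload attempt (illustrative only)."""
    if first_status is not None and first_status >= 400:
        # Final error before the body was sent: the client stops sending
        # (compare "HTTP error before end of send, stop sending").
        return "rejected before body"
    if first_status == 100 and reset_during_body:
        # Headers accepted, but something on the path dropped the
        # connection while the body was streaming.
        return "failed during body"
    if first_status == 100:
        return "body sent"
    # No interim response: clients may send the body anyway after a timeout.
    return "no interim response"

print(classify_upload(413, False))  # first log
print(classify_upload(100, True))   # second log
```

By this reading, the squid change moved the failure point, but something later in the chain still rejected the upload.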
ID: 49738
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 248
Credit: 5,974,599
RAC: 0
Message 49739 - Posted: 8 Mar 2024, 15:26:22 UTC - in response to Message 49671.  

More recent web servers may have a default limit of 1 GB, whereas in the past there was no limit. We have increased the limit to 2 GB on our latest file servers, so uploads should normally succeed on the next attempt. However, if the squid proxies are limited too, you might be blocked earlier.
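Here is a quick check of why raising the cap helps, written as a sketch with stated assumptions. The first assumption is that the server-side cap behaves like Apache httpd's `LimitRequestBody`, whose default became 1 GiB in httpd 2.4.54 (previously 0, meaning unlimited); the first log above does show `Server: Apache`. The second assumption is that "1 GB" and "2 GB" mean binary gibibytes, because the exact configured value is not stated in the thread. The sizes are the two `Content-Length` values from the logs above.

```python
GIB = 1024 ** 3

# Assumed limits: old default ~1 GiB, new setting ~2 GiB (exact value not stated).
OLD_LIMIT = 1 * GIB
NEW_LIMIT = 2 * GIB

# Content-Length values copied from the two BOINC logs in this thread.
uploads = [1486489603, 1483171612]

def fits(size: int, limit: int) -> bool:
    """True if a request body of `size` bytes is within `limit` (0 means no limit)."""
    return limit == 0 or size <= limit

for size in uploads:
    # Prints: size, fits the old limit, fits the new limit
    print(size, fits(size, OLD_LIMIT), fits(size, NEW_LIMIT))
# 1486489603 False True
# 1483171612 False True
```

Both HITS uploads are roughly 1.38 GiB. That exceeds a 1 GiB cap but stays under 2 GiB, which matches smaller uploads succeeding all along.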
ID: 49739
wujj123456

Joined: 14 Sep 08
Posts: 52
Credit: 63,618,302
RAC: 29,267
Message 49742 - Posted: 8 Mar 2024, 17:29:49 UTC - in response to Message 49739.  

Thank you for fixing this. My pending uploads started draining a few hours ago. Cheers.
ID: 49742



©2024 CERN