Message boards : News : File upload issues

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 158
Credit: 1,613,374
RAC: 1,638
Message 34038 - Posted: 22 Jan 2018, 15:39:12 UTC
Last modified: 22 Jan 2018, 15:45:59 UTC

As part of our cleanup campaign, the BOINC antique_file_deleter caused our NFS server to hit its limit on the maximum number of open files. The NFS server should now accept connections again.

We are trying to debug this intermittent file upload issue. While debugging, we will stop uploads for short periods.

Files will eventually upload. Please remain patient, and sorry for the inconvenience.

We will also have more server reboots over the next few days, as Ivan mentions.
ID: 34038
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 158
Credit: 1,613,374
RAC: 1,638
Message 34045 - Posted: 23 Jan 2018, 9:25:12 UTC

The underlying cause of the NFS server saturation is that files are left open when the BOINC file upload handler script times out. When a number of BOINC clients retry failed uploads frequently, the effect on our file servers is similar to a denial of service attack. It seems that our move to a load-balanced cluster some time back to increase capacity simply moved the bottleneck to the NFS storage layer. We will need to change our system architecture to get a permanent fix.
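
To illustrate the failure mode described above, here is a minimal Python sketch. It is not the BOINC file_upload_handler (whose code is not shown in this thread); the names `leaky_handler`, `safe_handler`, `UploadTimeout` and the timing values are hypothetical and chosen only for the demonstration. It assumes the handler opens the result file, then aborts on a timeout: without cleanup the handle stays open after each failed attempt, so repeated client retries accumulate open files.

```python
import os
import tempfile
import time


class UploadTimeout(Exception):
    """Raised when a (simulated) upload exceeds its deadline."""


def receive_chunks(out, chunks, deadline):
    """Write chunks to `out`, raising UploadTimeout once the deadline has passed."""
    for chunk in chunks:
        if time.monotonic() > deadline:
            raise UploadTimeout("upload took too long")
        out.write(chunk)


def leaky_handler(path, chunks, timeout_s, opened):
    """Hypothetical handler that leaves the file open when it times out."""
    out = open(path, "wb")
    opened.append(out)  # keep a reference so the leak can be observed
    receive_chunks(out, chunks, time.monotonic() + timeout_s)
    out.close()  # never reached if receive_chunks raises


def safe_handler(path, chunks, timeout_s, opened):
    """Same handler, but the with-block closes the file on every exit path."""
    with open(path, "wb") as out:
        opened.append(out)
        receive_chunks(out, chunks, time.monotonic() + timeout_s)


def slow_chunks(n, delay_s):
    """Yield n small chunks, sleeping before each to simulate a slow client."""
    for _ in range(n):
        time.sleep(delay_s)
        yield b"x" * 16


def count_leaked(handler, attempts=5):
    """Run `handler` on uploads that time out; return how many files stay open."""
    opened = []
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(attempts):
            path = os.path.join(tmp, f"result_{i}")
            try:
                handler(path, slow_chunks(3, 0.05), 0.01, opened)
            except UploadTimeout:
                pass  # a real client would retry later
        leaked = sum(1 for f in opened if not f.closed)
        for f in opened:
            f.close()  # clean up so the temporary directory can be removed
    return leaked


if __name__ == "__main__":
    print("leaky handler, files left open:", count_leaked(leaky_handler))
    print("safe handler, files left open:", count_leaked(safe_handler))
```

In this sketch each timed-out attempt with `leaky_handler` leaves one handle open, while `safe_handler` leaves none. Whether the real handler can be fixed this way depends on how it is terminated; a process killed from outside cannot run its own cleanup.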
ID: 34045
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 315
Credit: 43,707,464
RAC: 34,923
Message 34090 - Posted: 26 Jan 2018, 8:29:16 UTC

Are you aware that the file server is currently reporting this again:

LHC@home 26-01-2018 09:25 [error] Error reported by file upload server: [Lm9LDm3i3yrnDDn7oo6G73TpABFKDmABFKDmpdKKDmABFKDm0Izqbn_0_r1695784418_ATLAS_result] locked by file_upload_handler PID=-1


Supporting BOINC, a great concept !
ID: 34090
dduggan47
Joined: 1 Sep 04
Posts: 9
Credit: 1,195,231
RAC: 1,555
Message 34102 - Posted: 26 Jan 2018, 16:09:16 UTC - in response to Message 34090.  

Downloads now hung up as well.
ID: 34102
dduggan47
Joined: 1 Sep 04
Posts: 9
Credit: 1,195,231
RAC: 1,555
Message 34104 - Posted: 26 Jan 2018, 16:46:11 UTC - in response to Message 34102.  

Downloads now hung up as well.


Never mind!
ID: 34104
Speedy
Joined: 28 Jul 05
Posts: 31
Credit: 157,796
RAC: 265
Message 34110 - Posted: 27 Jan 2018, 8:32:26 UTC - in response to Message 34090.  

Are you aware that the file server is currently reporting this again:

LHC@home 26-01-2018 09:25 [error] Error reported by file upload server: [Lm9LDm3i3yrnDDn7oo6G73TpABFKDmABFKDmpdKKDmABFKDm0Izqbn_0_r1695784418_ATLAS_result] locked by file_upload_handler PID=-1

I get the following messages:
27-Jan-18 9:05:18 PM | LHC@home | Temporarily failed upload of dGfLDmQvTzrnDDn7oo6G73TpABFKDmABFKDmy2KKDmABFKDmoDT3rm_0_r1755418522_ATLAS_result: transient HTTP error
27-Jan-18 9:05:18 PM | LHC@home | Backing off 00:27:35 on upload of dGfLDmQvTzrnDDn7oo6G73TpABFKDmABFKDmy2KKDmABFKDmoDT3rm_0_r1755418522_ATLAS_result
27-Jan-18 9:05:20 PM | | Internet access OK - project servers may be temporarily down.
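
The "Backing off 00:27:35" line above shows the client delaying its next retry. As a rough illustration of why such delays help the server, here is a small Python sketch of randomized exponential backoff. This is an assumption for illustration only, not the BOINC client's actual algorithm; `backoff_delay`, `format_hms` and the constants `base_s` and `cap_s` are hypothetical.

```python
import random


def backoff_delay(failures, base_s=60.0, cap_s=4 * 3600.0, rng=random):
    """Return a randomized delay in seconds after `failures` consecutive failures.

    The ceiling doubles with each failure up to `cap_s`; picking a random value
    in the upper half of that range spreads retries from many clients apart.
    """
    ceiling = min(cap_s, base_s * 2 ** max(0, failures - 1))
    return rng.uniform(ceiling / 2, ceiling)


def format_hms(seconds):
    """Format a number of seconds as HH:MM:SS, like the client log above."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"failure {n}: back off {format_hms(backoff_delay(n))}")
```

The random component matters as much as the doubling: if every client retried on the same fixed schedule, the retries would arrive at the server in synchronized bursts.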

Have A Crunching Good day
ID: 34110
Speedy
Joined: 28 Jul 05
Posts: 31
Credit: 157,796
RAC: 265
Message 34134 - Posted: 28 Jan 2018, 20:54:07 UTC - in response to Message 34110.  

Are you aware that the file server is currently reporting this again:

LHC@home 26-01-2018 09:25 [error] Error reported by file upload server: [Lm9LDm3i3yrnDDn7oo6G73TpABFKDmABFKDmpdKKDmABFKDm0Izqbn_0_r1695784418_ATLAS_result] locked by file_upload_handler PID=-1

I get the following messages:
27-Jan-18 9:05:18 PM | LHC@home | Temporarily failed upload of dGfLDmQvTzrnDDn7oo6G73TpABFKDmABFKDmy2KKDmABFKDmoDT3rm_0_r1755418522_ATLAS_result: transient HTTP error
27-Jan-18 9:05:18 PM | LHC@home | Backing off 00:27:35 on upload of dGfLDmQvTzrnDDn7oo6G73TpABFKDmABFKDmy2KKDmABFKDmoDT3rm_0_r1755418522_ATLAS_result
27-Jan-18 9:05:20 PM | | Internet access OK - project servers may be temporarily down.

My upload has cleared

Have A Crunching Good day
ID: 34134
Harri Liljeroos
Joined: 28 Sep 04
Posts: 251
Credit: 7,342,289
RAC: 10,601
Message 34135 - Posted: 28 Jan 2018, 21:09:13 UTC

My uploads have cleared too. Maybe it is because the number of ATLAS tasks out in the field has dropped to 5900?
ID: 34135
Speedy
Joined: 28 Jul 05
Posts: 31
Credit: 157,796
RAC: 265
Message 34136 - Posted: 29 Jan 2018, 2:59:16 UTC - in response to Message 34135.  
Last modified: 29 Jan 2018, 3:00:45 UTC

My uploads have cleared too. Maybe it is because the number of ATLAS tasks out in the field has dropped to 5900?

Maybe results out in the field have dropped to 5390. Currently there are none of these tasks to send out. Perhaps they are wanting to clear some space on the discs? I certainly know upload speeds have increased from about 86 kilobytes a second last night (New Zealand time) to over 400 this afternoon. A much nicer speed.

Have A Crunching Good day
ID: 34136


©2018 CERN