Error reported by file upload server: Server is out of disk space!?
Send message Joined: 30 Dec 05 Posts: 57 Credit: 835,284 RAC: 0 |
hello dear experts, since this morning I have been getting the following error messages and I couldn't upload my results:

29.01.2015 09:09:19 | LHC@home 1.0 | Started upload of w1_jobhllhc10_sflathv_000_w1__26__s__62.31_60.32__24_26__5__37.5_1_sixvf_boinc10400_1_0
29.01.2015 09:09:19 | LHC@home 1.0 | Started upload of w1_jobhllhc10_sflathv_000_w1__27__s__62.31_60.32__4_6__5__67.5_1_sixvf_boinc10415_0_0
29.01.2015 09:09:21 | LHC@home 1.0 | [error] Error reported by file upload server: Server is out of disk space
29.01.2015 09:09:21 | LHC@home 1.0 | Temporarily failed upload of w1_jobhllhc10_sflathv_000_w1__26__s__62.31_60.32__24_26__5__37.5_1_sixvf_boinc10400_1_0: transient upload error
29.01.2015 09:09:21 | LHC@home 1.0 | Backing off 00:11:50 on upload of w1_jobhllhc10_sflathv_000_w1__26__s__62.31_60.32__24_26__5__37.5_1_sixvf_boinc10400_1_0
29.01.2015 09:09:22 | LHC@home 1.0 | Started upload of w1_jobhllhc10_sflathv_000_w1__27__s__62.31_60.32__6_8__5__45_1_sixvf_boinc10423_0_0
29.01.2015 09:09:23 | LHC@home 1.0 | [error] Error reported by file upload server: can't write file /data/boinc/project/sixtrack/upload/8c/w1_jobhllhc10_sflathv_000_w1__27__s__62.31_60.32__4_6__5__67.5_1_sixvf_boinc10415_0_0: No space left on server
29.01.2015 09:09:23 | LHC@home 1.0 | [error] Error reported by file upload server: Server is out of disk space
29.01.2015 09:09:23 | LHC@home 1.0 | Temporarily failed upload of w1_jobhllhc10_sflathv_000_w1__27__s__62.31_60.32__4_6__5__67.5_1_sixvf_boinc10415_0_0: transient upload error
29.01.2015 09:09:23 | LHC@home 1.0 | Backing off 00:09:30 on upload of w1_jobhllhc10_sflathv_000_w1__27__s__62.31_60.32__4_6__5__67.5_1_sixvf_boinc10415_0_0
29.01.2015 09:09:23 | LHC@home 1.0 | Temporarily failed upload of w1_jobhllhc10_sflathv_000_w1__27__s__62.31_60.32__6_8__5__45_1_sixvf_boinc10423_0_0: transient upload error
29.01.2015 09:09:23 | LHC@home 1.0 | Backing off 00:13:44 on upload of w1_jobhllhc10_sflathv_000_w1__27__s__62.31_60.32__6_8__5__45_1_sixvf_boinc10423_0_0

On the server status page there is no problem marked; everything is green. What happened, and how long will it take to fix? Thanks for the help, Dr.Mabuse |
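The "Backing off" lines in logs like the one above come from the BOINC client spacing out its retries after a transient upload error. As a rough illustration, the delay grows with the number of consecutive failures and is randomized so that thousands of clients don't all retry at the same instant. The sketch below is a generic randomized exponential backoff; the constants and function name are illustrative, not BOINC's actual values.

```python
import random

# Hypothetical sketch of randomized exponential backoff, similar in
# spirit to how a BOINC client spaces out upload retries after a
# "transient upload error". Constants are illustrative only.
def backoff_seconds(n_failures, base=60, cap=4 * 3600):
    """Return a randomized delay that grows with the failure count."""
    exp = min(base * 2 ** n_failures, cap)   # exponential growth, capped
    return random.uniform(exp / 2, exp)      # jitter so clients don't sync up

for n in range(4):
    print(f"failure {n}: retry in about {backoff_seconds(n):.0f} s")
```

The jitter is why the three files in the log above got three different backoff intervals (00:09:30, 00:11:50, 00:13:44) even though they failed within seconds of each other.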
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
Getting two variants of the same thing:

29/01/2015 08:48:47 | LHC@home 1.0 | [error] Error reported by file upload server: Server is out of disk space
29/01/2015 08:50:56 | LHC@home 1.0 | [error] Error reported by file upload server: can't write file /data/boinc/project/sixtrack/upload/ea/w200_HLLHC_RFcav_scanb3_50000.BOINC__3__s__62.31_60.32__16_18__5__28.5_1_sixvf_boinc668_1_0: No space left on server

Just a side effect of the large volume of work we've been doing recently. Some uploads are being accepted, presumably as older tasks are processed and files are deleted. |
Send message Joined: 17 Oct 07 Posts: 5 Credit: 177,594 RAC: 0 |
Uploads are stalling - possibly due to more smaller files being returned and the server can't keep up with processing? The Einstein project had a problem with uploads - I think that was due to the number of files. |
Send message Joined: 30 Dec 05 Posts: 57 Credit: 835,284 RAC: 0 |
It does the uploading now, about 1 file per hour. |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
"Uploads are stalling - possibly due to more smaller files being returned and the server can't keep up with processing?" Unfortunately, the files seem to all be the same size, whether the task runs for seconds or for several hours. We just fill up the space much more quickly if a lot of short-running tasks are passing through the system. It probably helps (marginally) if we report any tasks which have successfully made it through the uploading stage, so that the next stage of processing can take place and the space can be freed up.

"The Einstein project had a problem with uploads - I think that was due to the number of files." The problem at Einstein was that the server's disk file system became incredibly slow - it was taking up to 8 seconds to locate a free 'inode' so that uploaded files could be stored and indexed. Given the thousands of files that a server needs to process, that slowed everything down to a crawl. |
Send message Joined: 17 Feb 07 Posts: 86 Credit: 968,855 RAC: 0 |
Early this morning I was able to upload by manual intervention. Now, at 11:31 UTC, that is no longer working. So perhaps someone needs to look at the process. Greetings from, TJ |
Send message Joined: 17 Oct 07 Posts: 5 Credit: 177,594 RAC: 0 |
Yeah, same problem - capacity. I suggest that no new tasks be sent out, to clear the backlog; that would also give time to extend the filesystem one way or another. |
Send message Joined: 24 Jul 05 Posts: 56 Credit: 5,602,899 RAC: 0 |
Me too! 1/29/2015 4:29:31 AM | LHC@home 1.0 | [error] Error reported by file upload server: Server is out of disk space Let's crunch for our future. |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
I fear that we may already have reached a catch-22... Most of the work I've succeeded in uploading this morning has gone into 'pending', and is waiting for a wingmate to upload their work so it can validate and move on. And I've got 55 completed tasks backed up, waiting to upload so they can validate somebody else's work. But how can we ever get the two queues to meet each other? |
Send message Joined: 17 Oct 07 Posts: 5 Credit: 177,594 RAC: 0 |
I've emailed these guys to see who can help or at least contact the correct person to help:
• Eric McIntosh (eric.mcintosh@cern.ch)
• Harry Renshall (harry.renshall@cern.ch) - CERN BE-ABP-LCU - SixTrack expert
• Frank Schmidt (frank.schmidt@cern.ch) - CERN BE-ABP-ICE - SixTrack author and co-author of the tracking environment
• Igor Zacharov (igor.zacharov@gmail.com) - EPFL - BOINC system expert
• Massimo Giovannozzi (massimo.giovannozzi@cern.ch) - CERN BE-ABP-LCU - responsible for the LHC Commissioning and Upgrade Section |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
The transitioner, file deleter, database purger, the assimilators, and the test work unit validator have gone down as of this writing. |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
I've had a PM reply from Eric McIntosh: "I have reported to CERN BOINC support." |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
Could the problem be that the server is out of inodes? (In case you are wondering, an inode is a data structure that holds a file's metadata: its attributes, plus pointers to the file's data blocks, or to further pointers that in turn point to data blocks or other pointers, as needed depending on the file's size.) Since we are still able to post messages to the message board (which is on the same computer as the upload server, according to the server status page), I don't think that the disks are out of free data blocks.

As for Einstein@home, its biggest problem was that its table of inodes was completely used up, so its server had to search the inode table to find a free inode for each file upload, with each search taking up to 8 seconds. Einstein@home's long-term solution will be to reformat the crippled server with a newer version of XFS that tracks free inodes with a B-tree. While this adds overhead when creating or deleting a file, because the B-tree needs to be maintained, searching a B-tree for a free inode when almost none are left is much cheaper than scanning the inode table row by row. |
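The distinction drawn above (out of data blocks vs. out of inodes) can be checked directly on a POSIX system, since `statvfs` reports both counters separately. A minimal sketch, with the path `/` chosen purely as an example; note that the kernel raises the same ENOSPC ("No space left on device") error in both cases, so the error message alone cannot tell the two apart.

```python
import os

# Report free data blocks and free inodes for a filesystem, to
# distinguish "disk full" from "inode table full". Path is illustrative.
def fs_usage(path="/"):
    st = os.statvfs(path)
    return {
        "blocks_free": st.f_bavail,   # data blocks available to non-root users
        "blocks_total": st.f_blocks,  # total data blocks on the filesystem
        "inodes_free": st.f_favail,   # inodes available to non-root users
        "inodes_total": st.f_files,   # total inodes on the filesystem
    }

print(fs_usage("/"))
```

The same numbers are what `df` (blocks) and `df -i` (inodes) print, which is the quickest way for an admin to see which resource is actually exhausted.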
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
My uploads just went through. |
Send message Joined: 1 Jan 09 Posts: 32 Credit: 1,106,567 RAC: 0 |
Mine too. |
Send message Joined: 9 Jan 08 Posts: 66 Credit: 727,923 RAC: 0 |
Same here :) |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
And the ones I got back in return all seem to be short-running, so I'm already generating new uploads to fill up the new disk (or cluster filesystem quota, as I suspect it may be). I hope the CERN admins are in a position to keep an eye on it overnight. |
Send message Joined: 9 Jan 08 Posts: 66 Credit: 727,923 RAC: 0 |
I got a lot of long ones that are resends, so that will at least clear some up. |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
Reporting work units won't help clear out the old files, because the transitioner is down as of this writing. Therefore the validator will not know that it needs to validate any results, the assimilator cannot copy good results into the database (not knowing it needs to do its job), and the file deleter cannot delete any files, not knowing that there are files that need removal. |
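The dependency chain described above can be sketched as a simple ordered pipeline, where each downstream daemon only acts on work the previous stage has handed on. This is a deliberately simplified model (stage names taken from the post, not BOINC's actual implementation), just to show why a dead transitioner blocks everything behind it.

```python
# Simplified model of the server-side pipeline described above:
# transitioner -> validator -> assimilator -> file deleter.
# Each stage only processes items the previous stage has passed along,
# so if the transitioner is down, nothing downstream ever runs.
def run_pipeline(uploaded, transitioner_up=True):
    stages = ["validated", "assimilated", "deleted"]
    state = {item: "uploaded" for item in uploaded}
    if not transitioner_up:
        return state  # downstream daemons never learn about the uploads
    for item in state:
        for stage in stages:
            state[item] = stage  # hand the result down the chain
    return state

print(run_pipeline(["wu_1", "wu_2"], transitioner_up=False))
# with the transitioner down, every result stays "uploaded" and the
# upload directory never gets freed
```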
©2024 CERN