Message boards : Theory Application : file_xfer_error

Ryan Munro

Joined: 17 Aug 17
Posts: 77
Credit: 6,120,681
RAC: 17,196
Message 49694 - Posted: 4 Mar 2024, 21:51:06 UTC

After about 10 days of work, a few of these seemed to fail on completion. I saw a few marked as failed, so I watched this one tick over to check, and sure enough it failed at 100%.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=406318361

It's giving a file_xfer_error?

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>Theory_2687-2527341-808_1_r686114138_result</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
</message>
]]>
ID: 49694
Harri Liljeroos

Joined: 28 Sep 04
Posts: 677
Credit: 43,756,866
RAC: 14,855
Message 49695 - Posted: 4 Mar 2024, 22:07:56 UTC
Last modified: 4 Mar 2024, 22:08:28 UTC

The default maximum run time for a Theory task is 10 days (the same as the deadline). After that, the task is aborted automatically.
ID: 49695
hadron

Joined: 4 Sep 22
Posts: 58
Credit: 8,543,170
RAC: 14,152
Message 49696 - Posted: 5 Mar 2024, 1:56:27 UTC - in response to Message 49694.  

> After about 10 days of work, a few of these seemed to fail on completion. I saw a few marked as failed, so I watched this one tick over to check, and sure enough it failed at 100%.
> https://lhcathome.cern.ch/lhcathome/result.php?resultid=406318361
>
> It's giving a file_xfer_error?
>
> </stderr_txt>
> <message>
> upload failure: <file_xfer_error>
> <file_name>Theory_2687-2527341-808_1_r686114138_result</file_name>
> <error_code>-161 (not found)</error_code>
> </file_xfer_error>
> </message>
> ]]>

If you look up a bit, you will find this:
2024-03-04 19:29:47 (2245072): Status Report: Job Duration: '864000.000000'
2024-03-04 19:29:47 (2245072): Status Report: Elapsed Time: '860427.000000'
2024-03-04 19:29:47 (2245072): Status Report: CPU Time: '4690.400000'


Note the CPU time. In 10 days, the CPU has been in use for barely one hour and 18 minutes.
Ordinarily, I think the CPU time should always lag elapsed time by less than one hour. After an hour, if CPU time doesn't increase very nearly as fast as elapsed time, personally I believe there is no point in keeping the task running -- just abort it and get another one.
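
The comparison above can be sketched in a few lines of Python. This is only an illustration: the field names come from the three Status Report lines quoted above, while the function names and the one-hour threshold parameter are made up for the example and are not part of BOINC or vboxwrapper.

```python
import re

# Matches the "Status Report" lines quoted above; the three field names
# are taken from that excerpt.
STATUS_RE = re.compile(
    r"Status Report: (Job Duration|Elapsed Time|CPU Time): '([0-9.]+)'"
)

def parse_status_report(log_text):
    """Return the last reported value, in seconds, for each status field."""
    values = {}
    for name, seconds in STATUS_RE.findall(log_text):
        values[name] = float(seconds)
    return values

def cpu_utilisation(values):
    """Fraction of elapsed wall-clock time that was actually spent on the CPU."""
    return values["CPU Time"] / values["Elapsed Time"]

def looks_stalled(values, max_lag_seconds=3600.0):
    """Rule of thumb from the post (hypothetical helper): flag the task
    when CPU time lags elapsed time by more than max_lag_seconds."""
    return values["Elapsed Time"] - values["CPU Time"] > max_lag_seconds

# The three lines quoted from the task's stderr output.
log = """\
2024-03-04 19:29:47 (2245072): Status Report: Job Duration: '864000.000000'
2024-03-04 19:29:47 (2245072): Status Report: Elapsed Time: '860427.000000'
2024-03-04 19:29:47 (2245072): Status Report: CPU Time: '4690.400000'
"""

values = parse_status_report(log)
print(f"CPU utilisation: {cpu_utilisation(values):.2%}")
print("Looks stalled:", looks_stalled(values))
```

For the values quoted above, the utilisation comes out well under 1%, so the task is flagged as stalled.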
ID: 49696
hadron

Joined: 4 Sep 22
Posts: 58
Credit: 8,543,170
RAC: 14,152
Message 49697 - Posted: 5 Mar 2024, 2:02:48 UTC - in response to Message 49695.  

> The default maximum run time for a Theory task is 10 days (the same as the deadline). After that, the task is aborted automatically.

I have never had any Theory task fail because it ran into the maximum run time. I have had numerous tasks run for 9 days and around 22 or 23 hours, then fail for no apparent reason. In all instances, I do not recall total CPU time ever being more than one hour behind elapsed time; moreover, the tasks have always run to about 99.95% completion, only to fail with a "computation error".
Memory on this next bit is a little foggy, but I do believe the most common reason for failure has been "too many results".
ID: 49697
Ryan Munro

Joined: 17 Aug 17
Posts: 77
Credit: 6,120,681
RAC: 17,196
Message 49702 - Posted: 5 Mar 2024, 8:55:26 UTC - in response to Message 49696.  

BOINC hasn't been paused much in that time, and the chip is a 3950X. Any idea why it has seemingly been idle while reporting that it was working?
ID: 49702
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1286
Credit: 8,515,710
RAC: 2,852
Message 49704 - Posted: 5 Mar 2024, 10:51:45 UTC - in response to Message 49702.  

> BOINC hasn't been paused much in that time, and the chip is a 3950X. Any idea why it has seemingly been idle while reporting that it was working?

This is the reason:
2024-02-23 14:23:35 (2189670): Guest Log: Probing /cvmfs/sft.cern.ch... Failed!
That was 2 minutes and 10 seconds after the start.

At that moment your system could not connect to CERN.

Unfortunately, the software is not written to abort the task automatically after ... retries.
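
Since the task is not aborted automatically, one way to spot this situation early is to search the task's stderr output for the failed probe. The following is a minimal sketch, assuming the log line format shown above; the function name is invented for the example and the sample log is assembled only from lines quoted in this thread.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2024-02-23 14:23:35 (2189670): Guest Log: Probing /cvmfs/sft.cern.ch... Failed!
PROBE_FAIL_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): "
    r"Guest Log: Probing (\S+?)\.\.\. Failed!",
    re.MULTILINE,
)

def find_probe_failures(log_text):
    """Return a list of (timestamp, repository) pairs for every failed probe."""
    failures = []
    for stamp, repo in PROBE_FAIL_RE.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        failures.append((when, repo))
    return failures

# Sample built from lines quoted in this thread.
sample_log = """\
2024-02-23 14:23:35 (2189670): Guest Log: Probing /cvmfs/sft.cern.ch... Failed!
2024-03-04 19:29:47 (2245072): Status Report: CPU Time: '4690.400000'
"""

failures = find_probe_failures(sample_log)
for when, repo in failures:
    print(f"{when.isoformat()} could not reach {repo}")
```

A task whose log contains such a line shortly after start is a candidate for the manual abort suggested elsewhere in this thread.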
ID: 49704
Ryan Munro

Joined: 17 Aug 17
Posts: 77
Credit: 6,120,681
RAC: 17,196
Message 49708 - Posted: 5 Mar 2024, 11:45:52 UTC

But it still carried on for another 10 days before failing?
ID: 49708
maeax

Joined: 2 May 07
Posts: 2120
Credit: 159,924,350
RAC: 80,112
Message 49709 - Posted: 5 Mar 2024, 12:01:47 UTC - in response to Message 49708.  

Under Jobs - Theory in this website's header,
you can check the results for this Theory task.
ID: 49709
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1286
Credit: 8,515,710
RAC: 2,852
Message 49712 - Posted: 5 Mar 2024, 13:04:37 UTC - in response to Message 49708.  

> But it still carried on for another 10 days before failing?

It did not really start, so it cannot fail.
The Virtual Machine for this task got a shutdown signal from vboxwrapper. The default is after 864,000 seconds (10 days),
but if you see a Theory task running without using CPU, it's better to kill such a task.
If you like hanky-panky, such a task could be saved as follows:
- Suspend the task without the "leave in memory" option set. The task will be saved to disk.
- Remove the saved state with VirtualBox Manager.
- Start the task with VirtualBox Manager.
- After the task is processing its first events, stop the task with VirtualBox Manager (save to disk).
- Start the task again with BOINC Manager.
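
The steps above use the graphical managers. As a rough command-line equivalent, the sketch below only builds and prints the corresponding `boinccmd` and `VBoxManage` command lines; it does not run anything. Using the command-line tools instead of the GUIs is an assumption of this example, and the VM name, project URL, and task name passed in are placeholders that have to be replaced with the real values. It also assumes that `VBoxManage` is run as the same user that owns the BOINC virtual machine.

```python
import shlex

def recovery_commands(vm_name, project_url, task_name):
    """Return the command lines (as argv lists) for the recovery steps, in order."""
    return [
        # 1. Suspend the task in BOINC so the VM state is saved to disk
        #    (requires the "leave in memory" option to be off).
        ["boinccmd", "--task", project_url, task_name, "suspend"],
        # 2. Remove the saved state of the VM.
        ["VBoxManage", "discardstate", vm_name],
        # 3. Start the VM directly with VirtualBox, without a window.
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
        # 4. Once the task is processing its first events (check this by hand),
        #    stop the VM and save its state to disk.
        ["VBoxManage", "controlvm", vm_name, "savestate"],
        # 5. Hand the task back to BOINC.
        ["boinccmd", "--task", project_url, task_name, "resume"],
    ]

# Placeholder values, purely for illustration.
commands = recovery_commands("VM_NAME", "PROJECT_URL", "TASK_NAME")
for argv in commands:
    print(shlex.join(argv))
```

Printing the commands rather than executing them is deliberate: step 4 depends on watching the VM until it has started processing events, which this sketch does not attempt to detect.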
ID: 49712
Ryan Munro

Joined: 17 Aug 17
Posts: 77
Credit: 6,120,681
RAC: 17,196
Message 49715 - Posted: 5 Mar 2024, 15:07:20 UTC

Cheers. I think I might just stop doing Theory altogether; the project seems a mess at the moment, and even the native apps are failing almost instantly.
ID: 49715



©2024 CERN