Message boards : Number crunching : Task resends are not working properly
S. Dagorath

Joined: 7 Feb 13
Posts: 19
Credit: 1,478
RAC: 0
Message 25458 - Posted: 17 Feb 2013, 7:19:51 UTC

Some months ago it was decided to configure the server to issue task resends to fast, reliable hosts and to decrease the deadline on resends. That seems to work in some cases but not in others. Below I give a case where the deadline was shortened, followed by a case where it was not.

worked properly

The evidence is work unit 6486652 which began with the usual 2 replications on Feb. 4: 14113520 and 14113521. From the difference between the sent and expired datetime stamps we see the two tasks were issued with the standard 8 day deadline.

One of the resends was 14298947, which also timed out. From the difference between its two datetime stamps we see the deadline was ~4 days 8 hrs, which shows the deadline was reduced from the standard 8 days, as it should be.

did not work properly

The evidence is work unit 6568028 which began with the standard 2 replications, one of which erred. The resend is task 14476672, and from its issued datetime stamp of 15 Feb 2013, 21:27:20 UTC and deadline datetime stamp of 23 Feb 2013, 12:59:34 UTC we see its deadline is roughly the standard 8 days (about 7 days 15.5 hrs), which is not proper since it is a resend.
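For anyone who wants to repeat the check on other tasks, here is a minimal Python sketch of the arithmetic. It assumes the timestamps are copied from the task page in the format quoted above. The helper names (`deadline_window`, `looks_shortened`) and the one-day tolerance are my own illustration, not anything taken from the server code or its configuration.

```python
from datetime import datetime, timedelta

# Timestamp format as quoted from the task pages (all times are UTC).
FMT = "%d %b %Y, %H:%M:%S"

# Standard deadline mentioned in this post.
STANDARD_DEADLINE = timedelta(days=8)


def deadline_window(sent: str, deadline: str) -> timedelta:
    """Return the time allowed between the sent and deadline timestamps."""
    return datetime.strptime(deadline, FMT) - datetime.strptime(sent, FMT)


def looks_shortened(window: timedelta,
                    tolerance: timedelta = timedelta(days=1)) -> bool:
    """Hypothetical rule of thumb: treat the deadline as shortened if it is
    more than `tolerance` below the standard 8 days."""
    return window < STANDARD_DEADLINE - tolerance


# Task 14476672, the resend issued after the compute error.
window = deadline_window("15 Feb 2013, 21:27:20", "23 Feb 2013, 12:59:34")
print(window)                   # 7 days, 15:32:14
print(looks_shortened(window))  # False
```

Feeding the same check the ~4 days 8 hrs window of resend 14298947 gives `True`, which matches the two cases described here.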

From the above two cases it seems the deadline is reduced for resends due to a "timed out-no response" result but not for resends due to a "compute error" result. Is that intended behavior, a bug or a misconfiguration?




©2024 CERN