Message boards : Theory Application : task 411433609 10 days/24h ......error
Emmanuel Mar
Joined: 9 Feb 09
Posts: 25
Credit: 2,450,915
RAC: 1,821
Message 50367 - Posted: 9 Jun 2024, 23:36:56 UTC

The task ended with an error after running 24 hours a day for 10 days, which is what it showed me. I have another task that has been running for 3 days, and its duration is also estimated at 10 days.

Should I cancel it or let it run?
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 573
Message 50371 - Posted: 10 Jun 2024, 6:52:19 UTC - in response to Message 50367.  

It depends on what the virtual machine is doing inside. You can check with 'Show VM Console' in BOINC Manager.
You need the VirtualBox Extension Pack installed to use it; otherwise the button is greyed out.
Once you have a remote desktop connection to the console, press ALT-F3 to see the VM's CPU usage (~100% while event processing is busy), ALT-F2 to see the event-processing progress, and ALT-F1 to see the job info, such as the generator and the number of events to be processed. Most of the time that is 100,000 events.
Example of a job description, from your failed task: cranky: [INFO] ===> [runRivet] Thu May 30 22:42:58 UTC 2024 [boinc pp z1j 8000 - - herwig++ 2.5.2 LHC-UE-EE-2-2760 100000 222]
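If you want to pull the generator and the event count out of a job line like the one above without reading it by eye, here is a minimal Python sketch. The field positions (generator in slot 6, event count in slot 9 of the last bracketed group) are an assumption inferred from this single example line, not a documented format, and `parse_job_line` is a hypothetical helper name.

```python
import re

# Job line copied verbatim from the post above.
JOB_LINE = (
    "cranky: [INFO] ===> [runRivet] Thu May 30 22:42:58 UTC 2024 "
    "[boinc pp z1j 8000 - - herwig++ 2.5.2 LHC-UE-EE-2-2760 100000 222]"
)


def parse_job_line(line):
    """Return the whitespace-separated fields of the last [...] group."""
    groups = re.findall(r"\[([^\]]*)\]", line)
    if not groups:
        raise ValueError("no bracketed job description found")
    return groups[-1].split()


fields = parse_job_line(JOB_LINE)

# ASSUMPTION: these positions are guessed from the one example line only.
generator = fields[6]
events = int(fields[9])

print(f"generator: {generator}, events to process: {events}")
```

If a job line from another task has a different number of fields, the indices above will need adjusting.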
hadron

Joined: 4 Sep 22
Posts: 92
Credit: 16,008,656
RAC: 5,452
Message 50385 - Posted: 10 Jun 2024, 21:00:04 UTC - in response to Message 50367.  

"The task ended with an error after running 24 hours a day for 10 days, which is what it showed me. I have another task that has been running for 3 days, and its duration is also estimated at 10 days.

Should I cancel it or let it run?"

The uploaded details for that failed task show a file transfer error:
2024-06-10 00:58:21 (14460): Status Report: Job Duration: '864000.000000'
2024-06-10 00:58:21 (14460): Status Report: Elapsed Time: '864000.554790'
2024-06-10 00:58:21 (14460): Status Report: CPU Time: '711710.531250'
2024-06-10 00:58:21 (14460): Powering off VM.
2024-06-10 00:58:22 (14460): Successfully stopped VM.
2024-06-10 00:58:22 (14460): Deregistering VM. (boinc_70006df3a91772f6, slot#0)
2024-06-10 00:58:22 (14460): Removing network bandwidth throttle group from VM.
2024-06-10 00:58:22 (14460): Removing VM from VirtualBox.
2024-06-10 00:58:27 (14460): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>Theory_2743-2857705-222_0_r1232496514_result</file_name>
  <error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>

From this it appears that the task completed successfully, but there was an unrecoverable error in transmitting the results file to the LHC server.
However, that is not certain. The task continued right up until the job limit of 864000 seconds (run time) and then stopped. There is no indication in the above that the task actually finished processing all the events.
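To make the numbers in that log easier to read, here is a small Python sketch that pulls the three "Status Report" values out of the lines quoted above and converts them. The function name `status_report` is hypothetical, and the regular expression assumes the lines always look like the three shown here.

```python
import re

# The three Status Report lines copied from the task's stderr output above.
LOG = """\
2024-06-10 00:58:21 (14460): Status Report: Job Duration: '864000.000000'
2024-06-10 00:58:21 (14460): Status Report: Elapsed Time: '864000.554790'
2024-06-10 00:58:21 (14460): Status Report: CPU Time: '711710.531250'
"""


def status_report(log):
    """Map each Status Report label to its value in seconds."""
    pattern = re.compile(r"Status Report: ([^:]+): '([0-9.]+)'")
    return {name: float(value) for name, value in pattern.findall(log)}


report = status_report(LOG)

limit_days = report["Job Duration"] / 86400  # 86400 seconds per day
cpu_fraction = report["CPU Time"] / report["Elapsed Time"]
gap_hours = (report["Elapsed Time"] - report["CPU Time"]) / 3600

print(f"job limit: {limit_days:.0f} days")
print(f"CPU time / elapsed time: {cpu_fraction:.1%}")
print(f"elapsed minus CPU: {gap_hours:.1f} hours")
```

For this task the limit works out to 10 days, and the CPU time is roughly 82% of the elapsed time, a gap of about 42 hours.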
You may be able to find further details in the stdoutdae.txt file in the boinc directory.
You say the other task has been running for 3 days (maybe 4 now, since you posted this yesterday), which means it would have been sent to you probably on June 8. All the in-progress Theory tasks listed for you were sent to you after 0600 UTC today, so I do not know which task you are talking about.
From experience, I can tell you that 3 days is far too early in this kind of situation to decide whether or not to abort the task, with one exception. Check the properties for this task and compare the CPU time and the elapsed time. These should be reasonably close to each other. For a task running several days, "reasonably close" might mean one or two hours. If this is the case, let the task keep running. If not, inspect the task properties twice, exactly 5 minutes apart. Each time, write down both times, CPU and elapsed.
Now note the change in the CPU time between the two readings. It should be within seconds of 5 minutes. If it is not, then the process has essentially stopped doing meaningful work towards completing the task, and you may abort it.
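The two checks above can be sketched in Python as follows. This is only an illustration of the arithmetic: the function names are hypothetical, the 10-second tolerance is my own assumption for "within seconds", and the sample readings are made-up numbers, not taken from any real task.

```python
def cpu_gap_hours(cpu_s, elapsed_s):
    """Hours by which elapsed time exceeds CPU time (both in seconds)."""
    return (elapsed_s - cpu_s) / 3600.0


def still_working(cpu_first_s, cpu_second_s, interval_s=300.0, tolerance_s=10.0):
    """True if CPU time advanced by about the same amount as the wall clock.

    interval_s is the time between the two readings (5 minutes here).
    tolerance_s is an ASSUMED allowance for "within seconds of 5 minutes".
    """
    cpu_delta = cpu_second_s - cpu_first_s
    return abs(cpu_delta - interval_s) <= tolerance_s


# Hypothetical readings from the task properties, in seconds.
gap = cpu_gap_hours(cpu_s=250_000, elapsed_s=259_200)
print(f"CPU time lags elapsed time by {gap:.1f} hours")

# Two CPU-time readings taken exactly 5 minutes apart.
print(still_working(250_000, 250_298))  # CPU advanced 298 s: still working
print(still_working(250_000, 250_020))  # CPU advanced 20 s: effectively stalled
```

If the gap from the first check is only an hour or two, the second check is unnecessary and the task can simply be left running.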
Emmanuel Mar
Joined: 9 Feb 09
Posts: 25
Credit: 2,450,915
RAC: 1,821
Message 50406 - Posted: 14 Jun 2024, 10:27:31 UTC - in response to Message 50385.  

Work unit 223295691

I am not going to search for the file. In my opinion, the task needed more time than the limit allows, so it never produced a finished result to send.

The task has failed on all 3 of the 3 computers it was sent to.



©2025 CERN