Thread 'task 411433609 10 days/24h ......error'

Author	Message
Emmanuel Mar Joined: 9 Feb 09 Posts: 54 Credit: 11,269,420 RAC: 11,881	Message 50367 - Posted: 9 Jun 2024, 23:36:56 UTC The task showed "10 days/24 hours" and then ended with an error. I have another task that has taken 3 days so far, and its duration is also estimated at 10 days. Should I cancel it or let it run? ID: 50367

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1556 Credit: 10,100,748 RAC: 1,717	Message 50371 - Posted: 10 Jun 2024, 6:52:19 UTC - in response to Message 50367.
It depends on what the Virtual Machine is doing inside. You can use 'Show VM Console' from BOINC Manager. The VirtualBox Extension Pack must be installed to use it; otherwise the button is greyed out. Once you have a remote desktop to the console, press ALT-F3 for the CPU usage of the VM (~100% while event processing is busy), ALT-F2 for the event-processing progress, and ALT-F1 for the job info, such as the generator and the number of events to be processed. Most of the time that is 100,000 events.
Example of a job description, from your error task:
cranky: [INFO] ===> [runRivet] Thu May 30 22:42:58 UTC 2024 [boinc pp z1j 8000 - - herwig++ 2.5.2 LHC-UE-EE-2-2760 100000 222]
ID: 50371
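For anyone who wants to read the generator and event count out of a job description line like the one quoted in the post above, here is a rough Python sketch. The field positions are an assumption inferred from this single example (generator as the 7th field of the last bracketed group, event count as the second-to-last field); the function name `parse_job_line` is made up for illustration and is not part of BOINC or runRivet.

```python
import re

# Job description line as quoted in the post above.
JOB_LINE = (
    "cranky: [INFO] ===> [runRivet] Thu May 30 22:42:58 UTC 2024 "
    "[boinc pp z1j 8000 - - herwig++ 2.5.2 LHC-UE-EE-2-2760 100000 222]"
)


def parse_job_line(line: str) -> dict:
    """Extract generator and event count from a runRivet job line.

    ASSUMPTION: field positions are guessed from one example line,
    not taken from any runRivet documentation.
    """
    # Collect every [...] group and keep the last one (the job parameters).
    groups = re.findall(r"\[([^\]]*)\]", line)
    if not groups:
        raise ValueError("no bracketed job description found")
    fields = groups[-1].split()
    return {
        "generator": fields[6],    # assumed position of the generator name
        "events": int(fields[-2]),  # assumed position of the event count
    }


if __name__ == "__main__":
    print(parse_job_line(JOB_LINE))
```

If the layout of the line differs for other jobs, the indices above will need adjusting.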

hadron Joined: 4 Sep 22 Posts: 101 Credit: 18,845,362 RAC: 10,341	Message 50385 - Posted: 10 Jun 2024, 21:00:04 UTC - in response to Message 50367.
"After 10 days/24 hours is what it showed me,(error) I have another task that takes 3 days and the period is also estimated to be 10 days. Should I cancel it or let it happen?"
The uploaded details for that failed task show a file transfer error:
2024-06-10 00:58:21 (14460): Status Report: Job Duration: '864000.000000'
2024-06-10 00:58:21 (14460): Status Report: Elapsed Time: '864000.554790'
2024-06-10 00:58:21 (14460): Status Report: CPU Time: '711710.531250'
2024-06-10 00:58:21 (14460): Powering off VM.
2024-06-10 00:58:22 (14460): Successfully stopped VM.
2024-06-10 00:58:22 (14460): Deregistering VM. (boinc_70006df3a91772f6, slot#0)
2024-06-10 00:58:22 (14460): Removing network bandwidth throttle group from VM.
2024-06-10 00:58:22 (14460): Removing VM from VirtualBox.
2024-06-10 00:58:27 (14460): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>Theory_2743-2857705-222_0_r1232496514_result</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
From this it appears that the task completed successfully, but there was an unrecoverable error in transmitting the results file to the LHC server. However, that is not certain. The task continued right up until the job limit of 864000 seconds (run time) and then stopped. There is no indication in the above that the task actually finished processing all the events. You may be able to find further details in the stdoutdae.txt file in the boinc directory.
You say the other task has been running 3 days (maybe 4 now, since you posted this yesterday), which means it would have been sent to you probably on June 8. All the in progress Theory tasks listed for you were sent to you after 0600 UTC today, so I do not know what task you might be talking about.
From experience, I can tell you that 3 days is far too early in this kind of situation to decide whether or not to abort the task -- with one exception. Check the properties for this task and compare the CPU time and the elapsed time. These should be reasonably close to each other. For a task running several days, "reasonably close" might mean within one or two hours. If this is the case, let the task keep running. If not, inspect the task properties twice, exactly 5 minutes apart. Each time, write down both times, CPU and elapsed. Now note the change in the CPU time. It should be within seconds of 5 minutes. If it is not, then the process has essentially stopped doing meaningful work towards completing the task, and you may abort it. ID: 50385
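The check described in the post above (two readings of the task properties, 5 minutes apart) boils down to a little arithmetic, sketched here in Python. The names `Sample`, `cpu_fraction` and `is_making_progress`, and the 10-second tolerance, are assumptions chosen for illustration; the post only says "within seconds". The sketch also assumes a single-core task, where CPU time cannot advance faster than elapsed time.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a task's properties, both values in seconds."""
    cpu_time: float
    elapsed_time: float


def cpu_fraction(sample: Sample) -> float:
    """Share of the elapsed (wall-clock) time that was spent on the CPU."""
    return sample.cpu_time / sample.elapsed_time


def is_making_progress(first: Sample, second: Sample,
                       tolerance_s: float = 10.0) -> bool:
    """True if CPU time advanced about as much as elapsed time did.

    tolerance_s is an assumed threshold, not a value from the post.
    """
    wall_delta = second.elapsed_time - first.elapsed_time
    cpu_delta = second.cpu_time - first.cpu_time
    return abs(wall_delta - cpu_delta) <= tolerance_s


if __name__ == "__main__":
    # Final figures from the failed task's status report quoted above.
    failed = Sample(cpu_time=711710.531250, elapsed_time=864000.554790)
    gap_hours = (failed.elapsed_time - failed.cpu_time) / 3600
    print(f"CPU fraction: {cpu_fraction(failed):.3f}, gap: {gap_hours:.1f} h")
```

For the failed task, the gap between elapsed and CPU time works out to roughly 42 hours over the 10-day run, well beyond the "one or two hours" the post calls reasonably close.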

Emmanuel Mar Joined: 9 Feb 09 Posts: 54 Credit: 11,269,420 RAC: 11,881	Message 50406 - Posted: 14 Jun 2024, 10:27:31 UTC - in response to Message 50385. Work unit 223295691. I am not going to search for the file. In my opinion the task needed more time than the limit allows, so it never produced a finished result to send. The task has failed on 3 of the 3 computers it was sent to. ID: 50406