New Version v300.05
Author | Message |
---|---|
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
This version again displays the job description in the output. |
Send message Joined: 9 Feb 16 Posts: 48 Credit: 537,111 RAC: 0 |
I've just had two of these, and (once again) both tasks reported they required significantly more CPU time than was available before the deadline. I've aborted them. Is this problem going to be addressed (please)? |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 15,673 |
As always, this is a miscalculation caused by the BOINC client/server, not by any particular project. Don't abort the tasks; just let them run and finish*). After a few days the BOINC client/server will have enough data to adjust the values used for runtime estimation. *) If you get a longrunner that doesn't finish within a week (!), feel free to abort it. |
Send message Joined: 9 Feb 16 Posts: 48 Credit: 537,111 RAC: 0 |
The problem is that they typically don't finish by the deadline, and when they are then passed to another host, that host may complete the task quickly. Leaving them to run is therefore a waste of host time and electricity. The tasks that say they will complete in a reasonable time do complete in a reasonable time. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 15,673 |
Version 300.05 was introduced this morning with a deadline of 2020-01-28. That leaves enough time to find out whether the BOINC ETA is fake or not. |
Send message Joined: 9 Feb 16 Posts: 48 Credit: 537,111 RAC: 0 |
No, it's not enough time. One Theory task I have at present says it will take over four days of CPU time. The host it is running on is powered up during work hours, i.e. around eight hours a day, five days a week (using spare CPU cycles is the intention behind BOINC). So four days of CPU time will take 12 working days plus two weekends, i.e. 16 days. The deadline is in ten days. Ten is quite a bit less than 16, and I haven't even taken into account doing CPU-intensive tasks as part of my work. I don't mind if a task genuinely takes four days of CPU time (like CPDN tasks typically do), but the deadline needs to be suitably distant in the future. I did once leave one of these tasks to run beyond the deadline, but even once past the deadline it was still a couple of days away from completing, so I aborted it. Others on this forum have let these long-runner tasks run and run, only to have them fail. That suggests the solution is to fix the bug, rather than to extend the deadline. |
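The arithmetic in this post can be checked with a short sketch. It assumes the task starts on the first working day of a week and gets the CPU for the whole time the host is powered on; the function name and its parameters are illustrative only and are not part of BOINC.

```python
import math

def calendar_days_needed(cpu_hours, hours_per_day=8, workdays_per_week=5):
    """Calendar days needed to accumulate cpu_hours on a host that only
    runs during working hours (hypothetical helper, not a BOINC API)."""
    working_days = math.ceil(cpu_hours / hours_per_day)
    if working_days == 0:
        return 0
    full_weeks, extra_days = divmod(working_days, workdays_per_week)
    if extra_days == 0:
        # The task ends on the last working day of a week, so the
        # trailing weekend is not counted.
        return full_weeks * 7 - (7 - workdays_per_week)
    return full_weeks * 7 + extra_days

days = calendar_days_needed(4 * 24)  # four days of CPU time = 96 hours
print(days, "calendar days needed, deadline in 10 days")
```

With the figures from the post (96 hours, 8 hours a day, 5 days a week) this gives 12 working days spread over 16 calendar days, which is past a ten-day deadline.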
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 15,673 |
Those estimations are always "fake" when a new app version is introduced (like in this case). And they are not "CPU time". Hence nobody can guarantee whether a longrunner will finish before the deadline or not. I agree that this behavior is not nice but it's also not an issue that can be solved at LHC@home. Instead it would have to be changed in the BOINC code: https://github.com/BOINC/boinc/issues |
Send message Joined: 9 Feb 16 Posts: 48 Credit: 537,111 RAC: 0 |
Well you can call it what you like, but the fact remains that those tasks that say they will complete in a reasonable time (a few hours), and stay that way, will complete within a few hours of computer run time. Those that suddenly jump from displaying a few hours to four days will run and run. Some people have let them run and run, only to find they fail, but I just abort them because I don't want to waste time and electricity (at my expense) on them. Typically when resent the receiving host completes them in a fraction of the time my host spent on them, and I'm not the only person seeing that behaviour. Since this wasn't previously a problem, it strongly suggests a bug has been introduced in the latest Theory tasks. |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 573 |
@Brummig: As I have already told you several times: don't watch the Remaining time in BOINC Manager. It is really useless. The Theory tasks all have different run times, so the 4 days you see is just a placeholder that jumps in after the task has run for a while. If you don't trust the run times, monitor the consoles, where you can see how many events have to be done in total (mostly 100,000) and how many events have been done so far. Have you monitored a task in the available consoles with the Extension Pack installed? Have you seen the average run time in MC Production? http://mcplots-dev.cern.ch/production.php?view=revision&rev=2363 Long runners are rare. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have one which has been running for two days. It should complete in another two days. CPU usage is nominal. Tullio |
Send message Joined: 9 Feb 16 Posts: 48 Credit: 537,111 RAC: 0 |
@Crystal Pellet: Yes, I monitored one for some time. There was no evidence of any progress, and after switching back and forth between displaying different information, it settled on saying that it had processed zero of zero events. I aborted it, and it went to another host that completed it in a fraction of the time my host had been chewing on it. Curiously, whilst that task ran frantically doing nothing, two tasks, after being aborted, reported the run time and CPU time as zero. For example, task 259168280 has a start timestamp of 2020-01-13 15:31:52. I aborted it at 16 Jan 2020, 8:32:03 UTC because it jumped to an extreme estimated completion time, but apparently it did absolutely nothing during the time it was supposedly running (a couple of hours). Task 259230427 was sent 14 Jan 2020, 12:55:32 UTC, and aborted 15 Jan 2020, 8:59:17 UTC. That second task has just this in the stderr output: <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> aborted by user</message> ]]> |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 573 |
Your task 259168280 shows 2020-01-13 17:54:33 (10536): VM state change detected. (old = 'Running', new = 'Paused') 2020-01-13 17:54:33 (10536): Stopping VM. 2020-01-14 08:47:15 (10536): VM did not stop when requested. 2020-01-14 08:47:15 (10536): VM was successfully terminated. 2020-01-14 08:47:56 (20284): Detected: vboxwrapper 26197 2020-01-14 08:47:56 (20284): Detected: BOINC client v7.7 and looks like a fast shutdown without time for the VM to save its contents to disk. After the restart the job probably started from scratch until you aborted it :( It had over 5 hours of CPU time. |
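As a rough illustration of how the quoted log lines can be read, the sketch below parses the timestamps and measures the gap between the stop request and the forced termination. The line layout is taken only from the excerpt above, and treating the number in parentheses as a process ID is an assumption.

```python
from datetime import datetime

# Log lines copied from the post above, one entry per line.
LOG = """\
2020-01-13 17:54:33 (10536): VM state change detected. (old = 'Running', new = 'Paused')
2020-01-13 17:54:33 (10536): Stopping VM.
2020-01-14 08:47:15 (10536): VM did not stop when requested.
2020-01-14 08:47:15 (10536): VM was successfully terminated.
2020-01-14 08:47:56 (20284): Detected: vboxwrapper 26197
2020-01-14 08:47:56 (20284): Detected: BOINC client v7.7
"""

def parse_line(line):
    """Split a line into (timestamp, id in parentheses, message)."""
    stamp, rest = line[:19], line[20:]
    ident, _, message = rest.partition("): ")
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return when, int(ident.lstrip("(")), message

entries = [parse_line(line) for line in LOG.splitlines()]
stop_requested = next(t for t, _, m in entries if m == "Stopping VM.")
stop_failed = next(t for t, _, m in entries if m.startswith("VM did not stop"))
gap = stop_failed - stop_requested

# A change in the id (assumed to be a process ID) marks a new wrapper process.
ids = list(dict.fromkeys(ident for _, ident, _ in entries))

print("time between stop request and termination:", gap)
print("ids seen:", ids)
```

For this excerpt the next entry after the stop request comes almost 15 hours later, and the id changes right after the termination, which fits the reading that the host went down before the VM could be saved and the wrapper was started anew the next morning.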
Send message Joined: 7 Jan 07 Posts: 41 Credit: 16,102,983 RAC: 3 |
Looking into runRivet.log, I got these last two lines: 59400 events processed Event 59500 ( 1d 7h 44m 17s elapsed / 21h 36m 12s left ) -> ETA: Tue Jan 21 17:43 Seems to progress flawlessly. |
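The "left" figure in that log line can be cross-checked with a simple linear extrapolation. This is only a sketch: the total of 100,000 events is the typical value mentioned earlier in this thread and is assumed here, and whether the job computes its ETA this way is also an assumption.

```python
import re

# Progress line copied from the post above.
LINE = "Event 59500 ( 1d 7h 44m 17s elapsed / 21h 36m 12s left ) -> ETA: Tue Jan 21 17:43"
TOTAL_EVENTS = 100_000  # assumption: the usual total mentioned in this thread

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def to_seconds(text):
    """Convert a duration such as '1d 7h 44m 17s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", text))

match = re.search(r"Event (\d+) \( (.+?) elapsed / (.+?) left \)", LINE)
done = int(match.group(1))
elapsed = to_seconds(match.group(2))
reported_left = to_seconds(match.group(3))

# Remaining time if the remaining events run at the average rate so far.
estimated_left = elapsed * (TOTAL_EVENTS - done) / done

print(f"{done / TOTAL_EVENTS:.1%} done")
print("reported left (s): ", reported_left)
print("estimated left (s):", round(estimated_left))
```

Under the 100,000-event assumption the extrapolated value agrees with the logged "left" time to within a second, so the figure in runRivet.log tracks real progress in a way the BOINC Manager estimate does not.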
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Failed after 4 days 4 hours 24 min 4 seconds with xfer_error Tullio |
Send message Joined: 7 Jan 07 Posts: 41 Credit: 16,102,983 RAC: 3 |
Eventually, task 259592915 ended with success after 54 hours. |
Send message Joined: 18 Dec 15 Posts: 1823 Credit: 119,031,799 RAC: 16,871 |
"Failed after 4 days 4 hours 24 min 4 seconds with xfer_error" This morning, same thing here, after 4 days 4 hours 37 minutes!!! Such faulty tasks are rather annoying :-( |
Send message Joined: 18 Dec 15 Posts: 1823 Credit: 119,031,799 RAC: 16,871 |
the next one: </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>Theory_2363-930665-16_0_r255393757_result</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> failing after 4 days and 4 hours:-( https://lhcathome.cern.ch/lhcathome/result.php?resultid=259616872 what's going on over there? |
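For anyone collecting several of these failures, the fields of the quoted error block can be pulled out with the standard library. This is a minimal sketch that only handles a snippet shaped like the one above; it does not interpret what the error code means.

```python
import xml.etree.ElementTree as ET

# The <file_xfer_error> element copied from the stderr excerpt above.
SNIPPET = """<file_xfer_error>
  <file_name>Theory_2363-930665-16_0_r255393757_result</file_name>
  <error_code>-240 (stat() failed)</error_code>
</file_xfer_error>"""

root = ET.fromstring(SNIPPET)
file_name = root.findtext("file_name")
code_text = root.findtext("error_code")

# The numeric code is the first token; the rest is the textual reason.
code_str, _, reason = code_text.partition(" ")
error_code = int(code_str)

print(file_name, error_code, reason)
```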
Send message Joined: 18 Dec 15 Posts: 1823 Credit: 119,031,799 RAC: 16,871 |
here the next one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=259667718 after 4 days 4 hours !!! Quite some waste of CPU time :-( What the hell is this caused by? |
Send message Joined: 18 Dec 15 Posts: 1823 Credit: 119,031,799 RAC: 16,871 |
Since most recently, I am noticing a strange behaviour on all my computers when downloading Theory tasks (all VM): although I have set "max # of tasks" to 8 in the web preferences, each of my hosts downloads only 2 tasks. Why is that? Does anyone else experience the same thing? If not, what's going wrong here? |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 573 |
"since most recently, I am noticing a strange behaviour on all my computers when downloading Theory tasks (all VM) :" First an answer to this one, then to your previous post. It looks like you get twice the value set in your preferences for Max # CPUs, so I suppose you have set it to 1. It's the same old, odd behaviour of that preference setting. Try 'No limit' when you are only running VBox Theory and no ATLAS. You will get 16 tasks then. |