Very long job
Author | Message |
---|---|
Send message Joined: 15 Aug 05 Posts: 1 Credit: 718,585 RAC: 0 |
I have a WU that has been running for nearly 8 days, taking 4 cores all the while and is beyond its deadline. Others have finished much more quickly. Is this one broken? Will it finish or should I kill it and will I get any credit for it if I do? Many thanks. |
Send message Joined: 15 Jun 08 Posts: 2571 Credit: 258,777,947 RAC: 119,347 |
BOINC does not reward results that are returned after deadline + grace period. This task is most likely lost. If you want others to check the logfile you should make your computers visible and post a link to the faulty task. |
Send message Joined: 15 Jul 05 Posts: 26 Credit: 2,419,992 RAC: 412 |
It looks like I have a very long runner too: https://lhcathome.cern.ch/lhcathome/result.php?resultid=283384624 So far I see a CPU time of 7d 18h at 74.6%, with an estimated 02d 09h still to go and a deadline of 15.11.2020 05:58 CET shown in the client. Progress is growing slowly and CPU usage for the task is at 100%. Is it possible to extend the deadline for the result, which is now 16.11.2020, 4:58:17 UTC? Or should I cancel the result once the server deadline has passed? Matthias |
Send message Joined: 14 Jan 10 Posts: 1439 Credit: 9,624,852 RAC: 2,528 |
"Or should I cancel the result when the server deadline is over" The task will stop after 10 days of run time. |
Send message Joined: 10 Aug 11 Posts: 5 Credit: 1,279,291 RAC: 0 |
I have a task that has been running for 4 days and 13 hours. It started out reporting 13 hours to complete and now it says 1:39 minutes to complete. Each second of remaining time takes something like 5 minutes to tick down. I'm getting a little frustrated. Will it ever complete the task? Do I understand correctly that I won't get any credit for it? The deadline passed yesterday. But is my computer power wasted? I hope you don't use the same kind of timers in the real project. Are you really sure at LHC that you can handle this project? |
Send message Joined: 10 Aug 11 Posts: 5 Credit: 1,279,291 RAC: 0 |
"I have a task running for 4 days and 13 hours. It started reporting 13 hours to complete and now it says 1:39 minutes to complete. Taking something like 5 minutes per second to complete." The task is: https://lhcathome.cern.ch/lhcathome/result.php?resultid=288895684 |
Send message Joined: 15 Jun 08 Posts: 2571 Credit: 258,777,947 RAC: 119,347 |
VBox Additions should be installed. Then you would be able to check the output from the VM consoles. In the case of ATLAS, console 2 has a more accurate timer based on the logs from the scientific app. Timers are influenced by the CPU throttle. Your CPU throttle is set to 50%. For VBox apps it should be set to 100%. If you need to limit CPU usage on your computer, use other methods. BOINC provides a couple of them. |
Send message Joined: 10 Aug 11 Posts: 5 Credit: 1,279,291 RAC: 0 |
"VBox Additions should be installed." All right, I just thought the simulations would use the CPU performance measurement and the computation configuration when forecasting the remaining time. I deleted the very long task I had and configured 100 % CPU use, but set BOINC to use 50 % of the available CPUs. I hope that's more correct. And now I have an Atlas simulation that started with 13 hours remaining. Now, after 1 day of computing, it says 4 hours remaining, and each second of remaining time takes 5 seconds. Let's see how things develop ... The task is: https://lhcathome.cern.ch/lhcathome/result.php?resultid=289197986 And I have installed VBox Additions. How do I use it? |
Send message Joined: 15 Jun 08 Posts: 2571 Credit: 258,777,947 RAC: 119,347 |
BOINC has no interface to look into the VM. Hence, it doesn't know anything about the progress of the scientific app and presents unreliable fake values. Now, if VBox Additions are installed, you can select a task in your BOINC Manager and then click on "show VM console". This opens a window showing console 1 from inside the VM. Switch between the VM consoles using ALT-F1..ALT-Fn. ATLAS has progress monitoring on ALT-F2 (based on statistical values from the scientific logs, but better than the BOINC monitoring) and a TOP output on ALT-F3. Other apps show log output on ALT-F2. |
Send message Joined: 10 Aug 11 Posts: 5 Credit: 1,279,291 RAC: 0 |
I opened the Atlas simulation in VBox, but nothing happens with Alt+F1, Alt+F2 etc. I've even tried the virtual keyboard. No reaction. It is the same for the other LHC simulations I have running in BOINC, which are Theory simulations. One clue might be that on the Atlas simulation the virtual screen says "This kernel requires an x86-64 CPU, but only detected an i686 CPU." The processor is unsupported in CentOS 7. My CPU is an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz |
Send message Joined: 15 Jun 08 Posts: 2571 Credit: 258,777,947 RAC: 119,347 |
The i7-3770 is a 64-bit CPU and has VT-x/VT-d. See: https://ark.intel.com/content/www/us/en/ark/products/65719/intel-core-i7-3770-processor-8m-cache-up-to-3-90-ghz.html Your logfile states: "Processor supports HW virtualization: no". This may point to a wrong or missing BIOS setting. In addition, you may check whether all software packages are installed as 64-bit versions, especially the OS itself, BOINC and VirtualBox. VirtualBox's kernel drivers may need to be recompiled. |
Send message Joined: 10 Aug 11 Posts: 5 Credit: 1,279,291 RAC: 0 |
I've checked. Everything is 64-bit and freshly rebuilt. In fact, I got the latest kernel and VirtualBox just today from the Mageia project. [hansmicheelsen@localhost ~]$ uname -a Linux localhost.localdomain 5.9.10-desktop-1.mga8 #1 SMP Sun Nov 22 13:48:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux [hansmicheelsen@localhost ~]$ rpm -qa | grep virtualbox virtualbox-guest-additions-6.1.16-1.mga8 dkms-virtualbox-6.1.16-1.mga8 virtualbox-kernel-5.9.10-desktop-1.mga8-6.1.16-10.mga8 virtualbox-6.1.16-1.mga8 virtualbox-kernel-desktop-latest-6.1.16-10.mga8 [hansmicheelsen@localhost ~]$ rpm -qa | grep boinc boinc-client-7.16.12-1.mga8 boinc-manager-7.16.12-1.mga8 Status: After 2 days of work, the time remaining is 1 hour 3 minutes, and each second of remaining time takes 15 seconds to run. Gee, there are 3 days left before the deadline. With 1 hour left to go, I'm afraid I won't make it. I'll switch off Atlas projects for my BOINC computing. It's a waste of good CPU. |
Send message Joined: 12 Jul 11 Posts: 95 Credit: 1,129,876 RAC: 0 |
"Or should I cancel the result when the server deadline is over" "The task will stop after 10 days run time." Is that what happened to this task? 10 days of running (on an Intel i9-10910 @ 3.60GHz), a miserable ending... and no credit... |
Send message Joined: 14 Jan 10 Posts: 1439 Credit: 9,624,852 RAC: 2,528 |
...That sometimes happens. If you are not watching the VM's console to see whether the job is making useful progress, and aborting it when it is not, the system stops the job after 10 days to avoid wasting more time. In your case it's a pity that the job somehow restarted after 8 days, so we will not know whether it could have finished within the full 10 days even on your fast host: 2021-01-05 16:12:35 (78303): Guest Log: 16:12:38 CET +01:00 2021-01-05: cranky: [INFO] ===> [runRivet] Tue Jan 5 15:12:37 UTC 2021 [boinc pp jets 8000 250,-,4160 - sherpa 1.4.1 default 100000 170] 2021-01-13 22:02:24 (832): Guest Log: 16:12:38 CET +01:00 2021-01-05: cranky: [INFO] ===> [runRivet] Tue Jan 5 15:12:37 UTC 2021 [boinc pp jets 8000 250,-,4160 - sherpa 1.4.1 default 100000 170] Sherpa jobs are known to have more issues than jobs using other generators such as pythia6 and pythia8. |
Send message Joined: 2 May 07 Posts: 2257 Credit: 174,393,697 RAC: 21,438 |
"[boinc pp jets 8000 250,-,4160 - sherpa 1.4.1 default 100000 170]" Sherpas are very difficult. Sometimes they finish correctly; otherwise the 10-day time limit puts an end to them. We have to live with this problem; see the other threads about Sherpas. Sorry, CP, you were faster ;-)) |
Send message Joined: 12 Jul 11 Posts: 95 Credit: 1,129,876 RAC: 0 |
Unfortunately I am on macOS, where console access for a task used to go through an external application, "cord" (if you click the console button in the BOINC Manager, it asks for cord to be installed). This app is discontinued (not maintained and cannot be installed anymore). The website still exists and recommends using "freerdp" instead. I tried to install it (with Homebrew), but I have no idea how to tell BOINC to use that one instead... I think I could use the "graphic" button and then try to locate the log in the webpage of the task, but there are tons of them... and the logs are so complicated to interpret... it's basically a pain in the [biiip]. But if it's all the fault of the sherpas, is there a way to ignore sherpa tasks (however unloved these poor guys may be) and run the pythia tasks only? Using some app_config, maybe? |
Send message Joined: 15 Jun 08 Posts: 2571 Credit: 258,777,947 RAC: 119,347 |
"Unfortunately I am under macOS and the console access from a task was done through an external application 'cord' (if you click the console button in the boinc manager it is asking for cord to be installed) and this app is discontinued (not maintained / cannot be installed anymore), the website still exists and they recommend to use 'freerdp' instead, I tried to install it (with homebrew) but I have no idea how to tell boinc to use that one instead..." I'm not 100% sure, but this might be hardwired in the BOINC client, hence it should be asked here: https://github.com/BOINC/boinc "...is there a way to ignore sherpa tasks..." Task input is taken from mcplots. The data set currently in progress has 70981 different combinations of input parameters and event generators. 2141 (3 %) of them are sherpas (sherpa is an event generator), and not all sherpas are long-runners. Each set is sent out multiple times; the example from your post is #170: [boinc pp jets 8000 250,-,4160 - sherpa 1.4.1 default 100000 170] There's no function to ask for a specific parameter set or event generator. Each computer gets whatever is at the top of the task queue. |
Send message Joined: 14 Jan 10 Posts: 1439 Credit: 9,624,852 RAC: 2,528 |
"... and they recommend to use 'freerdp' instead, I tried to install it (with homebrew) but I have no idea how to tell boinc to use that one instead..." I don't know Darwin, but I suppose that when you start freerdp it wants to know which computer and port you want to connect to. You can enter localhost:portnr there. The portnr can be found in several places. The easiest is the details view of the virtual machine in VirtualBox Manager, under Remote Desktop Server, where the port number is listed. |
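Before pointing an RDP client at localhost:portnr, it may help to confirm that something is actually listening on that port. The following is a minimal Python sketch, assuming the VM's Remote Desktop Server accepts plain TCP connections on the local machine. The function names and the default timeout are illustrative only and are not part of BOINC, VirtualBox, or FreeRDP.

```python
import socket


def rdp_port_is_open(port, host="localhost", timeout=2.0):
    """Return True if a TCP connection to host:port can be established.

    This only shows that something is listening on the port; it does not
    verify that the listener is really the VM's remote desktop server.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def rdp_target(port, host="localhost"):
    """Build the host:port string to enter in the RDP client."""
    return f"{host}:{port}"
```

Calling `rdp_port_is_open(portnr)` with the port number shown by VirtualBox Manager should return `True` while the task's VM is running. If it returns `False`, the remote desktop server is probably not enabled for that VM, or the port number is wrong.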
©2025 CERN