Message boards : Theory Application : On old PC's all WU runs almost exactly 18 hours: is it strange or normal? On modern PC's runtime is very different...

NOGOOD

Joined: 18 Nov 17
Posts: 119
Credit: 51,287,231
RAC: 20,565
Message 37854 - Posted: 29 Jan 2019, 21:06:21 UTC

On old PCs, every WU runs for almost exactly 18 hours. Is that strange or normal? On modern PCs the runtimes vary a lot...
ID: 37854
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,501,728
RAC: 4,157
Message 37855 - Posted: 29 Jan 2019, 21:36:22 UTC - in response to Message 37854.  

Normal

The main thing is we prefer them to be Valid tasks.
ID: 37855
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,928,861
RAC: 137,663
Message 37857 - Posted: 30 Jan 2019, 6:48:58 UTC - in response to Message 37854.  

Theory tasks are designed for a runtime of 12 h.
After those 12 h, subtasks that are still running are given time to finish, but no new subtask is started.
18 h is a hardwired limit checked by a watchdog to prevent crashed VMs from running forever.
The watchdog shuts down the VM even if a subtask is not yet finished.

This forced shutdown mostly affects sherpa jobs as they tend to have by far the longest runtimes.
As MAGIC already wrote, the tasks are marked as valid (from the BOINC perspective) because the forced VM shutdown is not the volunteer's fault.
ID: 37857
Guiri-One[Andalucia]

Joined: 1 Feb 06
Posts: 66
Credit: 9,723
RAC: 0
Message 37860 - Posted: 30 Jan 2019, 9:11:49 UTC

Lucky you that some results were returned as valid...
https://lhcathome.cern.ch/lhcathome/result.php?resultid=214740242

After hours, by some kind of miracle, the job was cancelled and sent to "heaven" with 0 credits and no results.

Why? Nobody knows.

As soon as I get Theory tasks, I cancel them. ATLAS works more or less fine...
ID: 37860
NOGOOD

Joined: 18 Nov 17
Posts: 119
Credit: 51,287,231
RAC: 20,565
Message 37861 - Posted: 30 Jan 2019, 9:29:59 UTC - in response to Message 37857.  
Last modified: 30 Jan 2019, 9:37:20 UTC

MAGIC, computezrmle, thank you for your answers.

I wonder: does an old PC return any useful results to the Theory project if the VM does not finish normally?

And is it true that I waste about 6 hours of processor time for nothing with each WU on an old PC? Can I do something to make the work more efficient?
ID: 37861
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,928,861
RAC: 137,663
Message 37862 - Posted: 30 Jan 2019, 12:24:47 UTC - in response to Message 37861.  

Good subtasks (= jobs) show the following pattern in your logfiles:
2019-01-28 19:53:45 (4064): Guest Log: [INFO] New Job Starting in slotx
2019-01-28 19:53:45 (4064): Guest Log: [INFO] Condor JobID:  487756.33 in slotx
2019-01-28 20:01:36 (4064): Guest Log: [INFO] Job finished in slotx with 0.

If the watchdog shuts down the VM, only the last unfinished subtask is lost, and it will be rescheduled.
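As a rough way to check a log for the pattern above, here is a minimal Python sketch. It assumes the log lines look exactly like the three quoted ones; the function name `summarize_jobs` is made up for illustration, and treating a non-zero number after "with" as a failed job is an assumption, not something confirmed in this thread.

```python
import re

# Patterns modelled on the three log lines quoted above.
START = re.compile(r"\[INFO\] New Job Starting in (\S+)")
FINISH = re.compile(r"\[INFO\] Job finished in (\S+) with (-?\d+)")


def summarize_jobs(log_text: str) -> dict:
    """Count started, cleanly finished, and unfinished jobs in a task log."""
    started = finished_ok = finished_nonzero = 0
    for line in log_text.splitlines():
        if START.search(line):
            started += 1
            continue
        match = FINISH.search(line)
        if match:
            # Assumption: "with 0" means success, anything else a failure.
            if int(match.group(2)) == 0:
                finished_ok += 1
            else:
                finished_nonzero += 1
    return {
        "started": started,
        "finished_ok": finished_ok,
        "finished_nonzero": finished_nonzero,
        # Jobs that started but never logged a finish line, for example
        # the one cut off by the watchdog.
        "unfinished": started - finished_ok - finished_nonzero,
    }
```

A task that hit the 18 h watchdog would be expected to show one more "started" than "finished" in this summary.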

To get better average performance (Theory Simulation), you can run several single-core tasks in parallel instead of one multicore task.
This avoids idle cores between 12 h and 18 h of runtime.

Unfortunately, ATLAS interprets the web settings differently (wrongly, in my eyes), so reducing the number of cores also reduces the number of tasks your client can download. This especially affects modern computers with lots of cores. You can work around the negative impact by using an app_config.xml or by running extra BOINC clients.
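For the app_config.xml route, a small Python sketch that writes out one possible file is shown here. The element names (`app`, `max_concurrent`, `app_version`, `avg_ncpus`, `plan_class`) are standard BOINC app_config.xml elements, but the application name, the plan class and the numbers below are placeholders and need to be replaced with whatever your own client reports. The helper `build_app_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str,
                     cores_per_task: int, max_concurrent: int) -> str:
    """Return app_config.xml text that sets cores per task and a task limit."""
    root = ET.Element("app_config")

    # Limit how many tasks of this application run at the same time.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # Set how many cores the client budgets for each task of this version.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(cores_per_task)

    return ET.tostring(root, encoding="unicode")


# Placeholder values only: look up the real app name and plan class first.
print(build_app_config("ATLAS", "example_plan_class", 2, 12))
```

The resulting text would go into a file named app_config.xml in the project's folder inside the BOINC data directory. Whether these particular settings are enough for a given machine is not covered in this thread.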
ID: 37862
NOGOOD

Joined: 18 Nov 17
Posts: 119
Credit: 51,287,231
RAC: 20,565
Message 37863 - Posted: 30 Jan 2019, 13:03:34 UTC - in response to Message 37862.  

computezrmle, thank you.

I'll try it this way.

Thanks again :-)
ID: 37863
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 37864 - Posted: 30 Jan 2019, 13:34:02 UTC - in response to Message 37862.  

computezrmle wrote:
...
You can work around the negative impact if you use an app_config.xml or if you run extra BOINC clients.

... or use a separate venue for your slower computers.

As computezrmle said: for Theory it's best to use only single-core tasks (if you have enough memory).
ID: 37864
NOGOOD

Joined: 18 Nov 17
Posts: 119
Credit: 51,287,231
RAC: 20,565
Message 37871 - Posted: 31 Jan 2019, 8:49:33 UTC - in response to Message 37864.  

Crystal Pellet, thank you.

And… now I see it is true: single-core tasks are one and a half times more effective than multicore tasks :-)
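One way to see where a factor of roughly 1.5 could come from, using the 12 h and 18 h figures quoted earlier in this thread, is the simplified worst-case model below. It assumes a multicore task keeps all cores busy for 12 h and then only one core until the 18 h watchdog, while single-core tasks keep every core busy for the full 18 h. This is an illustration under those assumptions, not a measurement, and busy core-hours are not the same as useful results.

```python
DESIGN_HOURS = 12    # from the thread: no new subtasks after this point
WATCHDOG_HOURS = 18  # from the thread: the VM is shut down at this point


def busy_core_hours(cores: int, multicore: bool) -> int:
    """Busy core-hours over one 18 h window in the simplified worst case."""
    if multicore:
        # All cores busy for 12 h, then one straggler subtask until 18 h.
        return cores * DESIGN_HOURS + (WATCHDOG_HOURS - DESIGN_HOURS)
    # One single-core task per core, each busy for the whole window.
    return cores * WATCHDOG_HOURS


def single_core_advantage(cores: int) -> float:
    """Ratio of busy core-hours: single-core setup versus one multicore task."""
    return busy_core_hours(cores, False) / busy_core_hours(cores, True)


for n in (2, 4, 8):
    print(n, "cores:", round(single_core_advantage(n), 2))
```

In this model the ratio grows with the core count and approaches 18 / 12 = 1.5 from below, which is in the same range as the factor reported here.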
ID: 37871
NOGOOD

Joined: 18 Nov 17
Posts: 119
Credit: 51,287,231
RAC: 20,565
Message 37872 - Posted: 31 Jan 2019, 9:00:13 UTC - in response to Message 37871.  

And now it becomes very interesting: is the same true for ATLAS?

I run ATLAS on modern PCs and I want to do it the best way...
ID: 37872
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37874 - Posted: 31 Jan 2019, 9:50:29 UTC - in response to Message 37872.  

Yes, it is true for ATLAS too, but ATLAS tasks take much more RAM than Theory tasks, so if the PC doesn't have enough RAM you might not be able to run single-core ATLAS tasks on all cores concurrently. In such cases you might want to run 2-core ATLAS tasks.

For ATLAS VBox, RAM = 3000 MB + 900 MB * ncpus
For Theory VBox, RAM = 630 MB + 100 MB * ncpus

ncpus = number of cpus assigned to the task
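
The two formulas above can be turned into a quick calculator. This is a minimal Python sketch: the constants come straight from the formulas, while the function names and the 8-core, 16,000 MB example machine are made up for illustration.

```python
# (base MB, MB per CPU) taken from the two formulas above.
RAM_FORMULA_MB = {"ATLAS": (3000, 900), "Theory": (630, 100)}


def vbox_ram_mb(app: str, ncpus: int) -> int:
    """RAM in MB for one VBox task of the given app with ncpus cores."""
    base, per_cpu = RAM_FORMULA_MB[app]
    return base + per_cpu * ncpus


def max_concurrent_tasks(app: str, ncpus: int,
                         total_cores: int, free_ram_mb: int) -> int:
    """How many tasks fit at once, limited by both cores and RAM."""
    by_cores = total_cores // ncpus
    by_ram = free_ram_mb // vbox_ram_mb(app, ncpus)
    return min(by_cores, by_ram)


# Hypothetical machine: 8 cores, 16,000 MB of RAM available to BOINC.
for n in (1, 2, 4):
    tasks = max_concurrent_tasks("ATLAS", n, 8, 16000)
    print(f"{n}-core ATLAS: {vbox_ram_mb('ATLAS', n)} MB each, "
          f"{tasks} tasks, {tasks * n} cores used")
```

On that example machine RAM, not the core count, limits single-core ATLAS to 4 tasks, so half the cores stay idle, while 2-core tasks keep 6 cores busy.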
ID: 37874
NOGOOD

Joined: 18 Nov 17
Posts: 119
Credit: 51,287,231
RAC: 20,565
Message 37875 - Posted: 31 Jan 2019, 10:02:23 UTC - in response to Message 37874.  

bronco, thank you :-)

I'll experiment and share my experience here.
ID: 37875
NOGOOD

Joined: 18 Nov 17
Posts: 119
Credit: 51,287,231
RAC: 20,565
Message 39010 - Posted: 31 May 2019, 17:06:24 UTC - in response to Message 37874.  

bronco wrote:
Yes, it is true for ATLAS too but ATLAS tasks take much more RAM than Theory so if the PC doesn't have enough RAM then you might not be able to run a single core ATLAS task on all cores concurrently. In such cases you might want to run 2-core ATLAS tasks.

For ATLAS VBox, RAM = 3000 MB + 900 MB * ncpus
For Theory VBox, RAM = 630 MB + 100 MB * ncpus

ncpus = number of cpus assigned to the task


I discovered a very strange situation... If I run 4-core ATLAS tasks, I can get only 4 tasks. If I run 3-core ATLAS tasks, I can get only 3 tasks. If I run 2-core ATLAS tasks, I can get only 2 tasks. Why is that? How can I run, for example, twelve 2-core ATLAS tasks on a 24-core PC?
ID: 39010
