21) Message boards : CMS Application : Larger jobs in the pipeline (Message 46149)
Posted 2 Feb 2022 by Profile rbpeake
Post:
Is the goal to have these longer-running jobs become the standard, so that jobs are processed more efficiently (start-up time amortized over a larger number of processed jobs)?
Thank you.
22) Message boards : CMS Application : Why Such Varied Runtimes? (Message 45488)
Posted 20 Oct 2021 by Profile rbpeake
Post:
Thank you for the detailed reply. It’s interesting to know that jobs will keep coming until the VM reaches a 12-hour age.
Those recent CMS failures were due to a reboot. I’ll be more careful next time!👍
23) Message boards : CMS Application : Why Such Varied Runtimes? (Message 45477)
Posted 20 Oct 2021 by Profile rbpeake
Post:
I'm just curious: why do work unit runtimes vary so much for CMS (on a single, consistent machine)?
Thanks.
24) Message boards : ATLAS application : Native version 2.80 - no longer requiring python (Message 41311)
Posted 20 Jan 2020 by Profile rbpeake
Post:
I am a Windows user, but I will run Linux within a VM if the Linux software contains all the necessary code to run ATLAS right out of the box, and if that is more efficient.

Thanks.
25) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 40345)
Posted 31 Oct 2019 by Profile rbpeake
Post:
Tasks do nothing and end with an error after 15-20 minutes,
which is typical for "no sub-tasks available".

Yes, sorry about that. We seem to have a problem with queued jobs: there are two queues for each submitted batch, "queued" and "pending", both of which I believe have a 2,000-job limit. When I submit a batch, recently of 10,000 jobs each, jobs are created as necessary, up to that number, to fill the "queued" queue; at the same time, jobs are moved from "queued" to "pending" until it, too, is full. In recent weeks, apparently since an interruption to the DNS service at CERN, there seems to have been a disruption in taking jobs from the "pending" queue and allocating them to worker machines -- only a small fraction get sent.
What happened last night was that the current batch's "queued" queue drained, and job allocation from the "pending" jobs dropped off (there are currently 1200 pending and 13 running; in the previous batch 232 are still pending and 36 running!). At CMS IT's suggestion, I've been playing around with batch priority, but that's had no perceptible effect. I'll just have to make sure that I submit new batches before the "queued" queue drains -- there's a new batch on its way, so things should pick up again soon.
I'll contact CERN again and suggest they restart the Condor scheduler.
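The two-queue flow described above can be sketched as a toy model. The queue names, the 2,000-job limit, and the 10,000-job batch size come from the description; the function and everything else here is illustrative, not the actual HTCondor/CMS internals:

```python
# Toy model of the batch flow: jobs are created into a bounded "queued"
# queue, promoted to a bounded "pending" queue, and finally allocated
# to free worker hosts. Limits per the description above; the rest is
# an illustrative simplification.
QUEUE_LIMIT = 2000

def step(batch_remaining, queued, pending, workers_free):
    """Advance one scheduling cycle; return updated counts."""
    # Fill "queued" from the submitted batch, up to its limit.
    created = min(batch_remaining, QUEUE_LIMIT - queued)
    batch_remaining -= created
    queued += created
    # Promote from "queued" to "pending", up to its limit.
    promoted = min(queued, QUEUE_LIMIT - pending)
    queued -= promoted
    pending += promoted
    # Allocate pending jobs to free workers; if this last stage stalls
    # (as after the DNS interruption), "pending" piles up while few
    # jobs are actually sent out.
    allocated = min(pending, workers_free)
    pending -= allocated
    return batch_remaining, queued, pending, allocated

print(step(10_000, 0, 0, 50))  # → (8000, 0, 1950, 50)
```

With allocation stalled (few free workers being fed), a cycle like `step(0, 0, 1200, 13)` leaves 1187 jobs stuck in "pending" with only 13 running, which matches the symptom described.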

At what point will CERN be able to generate CMS jobs themselves, so you would not be required to submit batches?
26) Message boards : CMS Application : CMS Tasks Failing (Message 38483)
Posted 28 Mar 2019 by Profile rbpeake
Post:
Thanks!
27) Message boards : CMS Application : CMS Tasks Failing (Message 38478)
Posted 28 Mar 2019 by Profile rbpeake
Post:
Do the CMS units fail if BOINC is exited and started again? I had to reboot my computer today and yesterday.
Thanks.
https://lhcathome.cern.ch/lhcathome/results.php?userid=1080&offset=0&show_names=0&state=6&appid=11
28) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) Version 2 (Message 38071)
Posted 27 Feb 2019 by Profile rbpeake
Post:
Presumably it does not make sense to run the native ATLAS app if I would need to do it on a Windows machine using Ubuntu in VirtualBox?

Thanks!
29) Message boards : ATLAS application : ATLAS issues (Message 38068)
Posted 26 Feb 2019 by Profile rbpeake
Post:
...Remember ATLAS was not initially developed for BOINC. It was developed for a vast network of many thousands of hosts on CERN's own non-BOINC distributed computing network. BOINC is just an afterthought, a mere add-on attached to their DC network with bits of glue and twine. BOINC delivers only a few hundred hosts that together return a minuscule amount of work compared to CERN's own DC network. While they certainly appreciate our contributions, there is a limit to how much effort they should invest in keeping less capable BOINC hosts happy.

Although, judging from this table, https://lhcathome.cern.ch/lhcathome/atlas_job.php BOINC is not an insignificant contributor to the overall ATLAS effort. :)
30) Message boards : Number crunching : Boinc memory estimate and LHC Settings (Message 35025)
Posted 16 Apr 2018 by Profile rbpeake
Post:
The options to choose from should go beyond 8 for both the maximum number of CPUs and the maximum number of work units.

I can run 7 instances of 3-core ATLAS tasks on 21 cores, but I cannot run more than eight 2-core tasks on the same machine.
31) Message boards : ATLAS application : ATLAS Queue is empty (Message 35015)
Posted 14 Apr 2018 by Profile rbpeake
Post:
Seems like this is happening again.
32) Message boards : ATLAS application : new series of ATLAS tasks - runtime 493 secs ? (Message 34958)
Posted 11 Apr 2018 by Profile rbpeake
Post:
... Is it this statement that indicates success?
2018-04-11 01:37:31 (7056): Guest Log: Successfully finished the ATLAS job!

It depends on the perspective.
"Guest Log: Successfully finished the ATLAS job!" indicates a successful end of the BOINC task and your host will most likely get credits.

From the science perspective a file like "HITS.*" contains the results and should be included in the upload.
This name normally appears in the directory listing that is included in the stderr.txt and can be checked there.

Thanks! Found it noted in 4 places, including twice here:
2018-04-11 01:37:31 (7056): Guest Log: HITS.13684178._016629.pool.root.1 srm://srm.ndgf.org:8443;autodir=no;spacetoken=ATLASDATADISK/srm/managerv2?SFN=/atlas/disk/atlasdatadisk/rucio/mc16_13TeV/f3/ae/HITS.13684178._016629.pool.root.1:checksumtype=adler32:checksumvalue=b9f476d
33) Message boards : ATLAS application : new series of ATLAS tasks - runtime 493 secs ? (Message 34955)
Posted 11 Apr 2018 by Profile rbpeake
Post:
Again I had an ATLAS task with a runtime of about 10 minutes:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186994334

Very strange.
Even your valid tasks show incomplete logs:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186915983
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900508
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900791

You may consider stopping your BOINC client, cleaning the VirtualBox environment, and doing a project reset before starting a fresh task.

Is it this statement that indicates success?
2018-04-11 01:37:31 (7056): Guest Log: Successfully finished the ATLAS job!
34) Message boards : CMS Application : Result stage-out failures (Message 34825)
Posted 31 Mar 2018 by Profile rbpeake
Post:

We're in a transitional stage at the moment, unfortunately with Easter looming, but the good bit of news I can impart is that as of a day or two ago we are writing merged result files into the T2_CH_CERN storage -- which means they are now available world-wide to anyone in CMS who wants to use them. *NOW* the challenge is to get workflows that suit our capabilities and limitations and produce results that end up in papers we can all point to with pride!

How were the results used before this?
Thanks!
35) Message boards : ATLAS application : Draining ATLAS tasks ahead of storage migration (Message 34824)
Posted 30 Mar 2018 by Profile rbpeake
Post:
Have you removed the max concurrent running tasks cap, or is it still in place?

Maybe this is the source of my problem?
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4139&postid=34820#34820
36) Message boards : ATLAS application : No App config task limts (Message 34820)
Posted 30 Mar 2018 by Profile rbpeake
Post:
Thanks!
The problem I have is that 11 ATLAS 2-core instances download, but then only 7 of them run. With 22 cores allocated to LHC, the 4 ATLAS 2-core tasks that are present but not running prevent other LHC work units from running. So I am stuck with just 14 of my 22 cores in use.
37) Message boards : ATLAS application : No App config task limts (Message 34814)
Posted 30 Mar 2018 by Profile rbpeake
Post:
[quote]
You could fix it for your machines with a small app_config.xml in the project directory of LHC@home:
<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>12</max_concurrent>
  </app>
</app_config>
[/quote]

How do I combine this (or do I?) with the setting that limits ATLAS to 2 cores per work unit?
Thanks!
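For what it's worth, here is a sketch of how the two settings might be combined in one app_config.xml, assuming the per-task core count is set with <avg_ncpus> in an <app_version> block. The plan class name used below (vbox64_mt_mcore_atlas) is an assumption; use the plan class your client actually reports in its event log:

```xml
<app_config>
  <app>
    <name>ATLAS</name>
    <!-- run at most 12 ATLAS tasks at once -->
    <max_concurrent>12</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <!-- plan class name is an assumption; check your event log -->
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <!-- run each ATLAS task on 2 cores -->
    <avg_ncpus>2</avg_ncpus>
  </app_version>
</app_config>
```

After saving the file, BOINC Manager's Options / "Read config files" applies it without restarting the client.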
38) Message boards : ATLAS application : Very long tasks in the queue (Message 34693)
Posted 17 Mar 2018 by Profile rbpeake
Post:
The idea of letting 1,000-event tasks run with the Linux-native app is NOT against Windows, sorry.

There are more computers running the Linux app that could do this heavy work.
David showed statistics for one week in this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4396&postid=33036#33036

OK, I see your point. Whatever works best for the project is OK with me.
39) Message boards : ATLAS application : Very long tasks in the queue (Message 34688)
Posted 17 Mar 2018 by Profile rbpeake
Post:
The current ATLAS tasks process 100 events, but as an experiment we have sent some tasks with 1000 events. We would like to see if it's possible to run tasks like these on ATLAS@Home because this is the same number of events each task processes on the ATLAS grid. It would make things a lot easier if the same tasks could run on ATLAS@Home as on the rest of the ATLAS grid.

These tasks will run 10 times longer than the other tasks and will generate an output file 10 times as large (500MB), so this may be an issue for those of you with low upload bandwidth. The advantage is that the initial download of 200MB is the same. Obviously using more cores will be better for these tasks, so they finish in a reasonable time.

To know if you are running one of these tasks and that it's not a regular "longrunner" you can check the stderr.txt in the slots directory - if it shows "Starting ATLAS job. (PandaID=xxx taskID: taskID=10959636)" then you got one. The regular tasks have taskID=10947180.

Please let us know your opinion in general about the length and data in/out requirements of ATLAS tasks. They are usually much shorter than the other vbox LHC projects - is this a good thing or would you prefer more consistency among the projects?


This is the beginning of this thread from David!

What about 1,000 events running in the native app for Linux only?
There is not so much overhead, and it runs very stably. A flag in the preferences would be useful for this.
Is there a second chance......

And I wish there were a way for ATLAS to identify the Windows machines that could process these successfully, and have an opt-in option for the user. I had no problems with these when they were available.
Thanks!
40) Message boards : ATLAS application : Performance 4-core VM versus 3-core VM (Message 34655)
Posted 15 Mar 2018 by Profile rbpeake
Post:
Is there a limit on the number of instances of ATLAS tasks running on any particular machine?

I have 20 cores and was running five instances of 4-core ATLAS tasks. In the interest of increasing efficiency, I switched to 2-core ATLAS tasks. But instead of ten 2-core tasks running, I am seeing only seven 2-core ATLAS tasks running. What happened?

Thanks!




©2024 CERN