1) Message boards : Number crunching : Boinc memory estimate and LHC Settings (Message 35025)
Posted 16 Apr 2018 by Profile rbpeake
Post:
They should offer choices beyond 8 for both the maximum number of CPUs and the maximum number of work units in the project preferences.

I can run 7 instances of 3-core ATLAS tasks on 21 cores, but I cannot run more than 8 of the 2-core tasks on the same machine.
2) Message boards : ATLAS application : ATLAS Queue is empty (Message 35015)
Posted 14 Apr 2018 by Profile rbpeake
Post:
Seems like this is happening again.
3) Message boards : ATLAS application : new series of ATLAS tasks - runtime 493 secs ? (Message 34958)
Posted 11 Apr 2018 by Profile rbpeake
Post:
... Is it this statement that indicates success?
2018-04-11 01:37:31 (7056): Guest Log: Successfully finished the ATLAS job!

It depends on the perspective.
"Guest Log: Successfully finished the ATLAS job!" indicates a successful end of the BOINC task and your host will most likely get credits.

From the science perspective, a file like "HITS.*" contains the results and should be included in the upload.
This name normally appears in the directory listing that is included in the stderr.txt and can be checked there.

Thanks! Found it noted in 4 places, including twice here:
2018-04-11 01:37:31 (7056): Guest Log: HITS.13684178._016629.pool.root.1 srm://srm.ndgf.org:8443;autodir=no;spacetoken=ATLASDATADISK/srm/managerv2?SFN=/atlas/disk/atlasdatadisk/rucio/mc16_13TeV/f3/ae/HITS.13684178._016629.pool.root.1:checksumtype=adler32:checksumvalue=b9f476d
4) Message boards : ATLAS application : new series of ATLAS tasks - runtime 493 secs ? (Message 34955)
Posted 11 Apr 2018 by Profile rbpeake
Post:
Again, I had an ATLAS task with a runtime of about 10 minutes:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186994334

Very strange.
Even your valid tasks show incomplete logs:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186915983
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900508
https://lhcathome.cern.ch/lhcathome/result.php?resultid=186900791

You may consider stopping your BOINC client, cleaning the VirtualBox environment, and doing a project reset before you start a fresh task.

Is it this statement that indicates success?
2018-04-11 01:37:31 (7056): Guest Log: Successfully finished the ATLAS job!
5) Message boards : CMS Application : Result stage-out failures (Message 34825)
Posted 31 Mar 2018 by Profile rbpeake
Post:

We're in a transitional stage at the moment, unfortunately with Easter looming, but the good bit of news I can impart is that as of a day or two ago we are writing merged result files into the T2_CH_CERN storage -- which means they are now available world-wide to anyone in CMS who wants to use them. *NOW* the challenge is to get workflows that suit our capabilities and limitations and produce results that end up in papers we can all point to with pride!

How were the results used before this?
Thanks!
6) Message boards : ATLAS application : Draining ATLAS tasks ahead of storage migration (Message 34824)
Posted 30 Mar 2018 by Profile rbpeake
Post:
Have you removed the max concurrent running tasks cap, or is it still in place?

Maybe this is the source of my problem?
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4139&postid=34820#34820
7) Message boards : ATLAS application : No App config task limts (Message 34820)
Posted 30 Mar 2018 by Profile rbpeake
Post:
Thanks!
The problem I have is that 11 ATLAS 2-core tasks download, but then only 7 of them run. With 22 cores allocated to LHC, the 4 ATLAS 2-core tasks that are present but not running prevent other LHC work units from running, so I am stuck with just 14 of my 22 cores in use.
8) Message boards : ATLAS application : No App config task limts (Message 34814)
Posted 30 Mar 2018 by Profile rbpeake
Post:
You could fix it for your machines with a small app_config.xml in the LHC@home project directory:

<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>12</max_concurrent>
  </app>
</app_config>

How do I combine this (or do I?) with the command that limits ATLAS to 2 cores per work unit?
Thanks!
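
For anyone searching later, here is a minimal sketch of how the two settings might be combined in a single app_config.xml. The plan_class value below is an assumption for the multi-core vbox ATLAS application and should be checked against the app_version entries in your client_state.xml; the "Max # CPUs" project web preference remains the supported way to set cores per task.

<app_config>
  <app>
    <name>ATLAS</name>
    <!-- cap the number of ATLAS tasks running at the same time -->
    <max_concurrent>12</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <!-- assumed plan class name; verify the exact string in client_state.xml -->
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <!-- request 2 cores per task -->
    <avg_ncpus>2</avg_ncpus>
  </app_version>
</app_config>

After saving the file, select Options -> Read config files in the BOINC Manager (or restart the client). This only changes how tasks are scheduled locally; it does not raise the server-side limit on tasks in progress.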
9) Message boards : ATLAS application : Very long tasks in the queue (Message 34693)
Posted 17 Mar 2018 by Profile rbpeake
Post:
The idea of letting 1000-collision tasks run with the Linux native app is NOT directed against Windows, sorry.

There are more computers running the Linux app that can do this heavy work.
David showed statistics for one week in this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4396&postid=33036#33036

OK, I see your point. Whatever works best for the project is OK with me.
10) Message boards : ATLAS application : Very long tasks in the queue (Message 34688)
Posted 17 Mar 2018 by Profile rbpeake
Post:
The current ATLAS tasks process 100 events, but as an experiment we have sent some tasks with 1000 events. We would like to see if it's possible to run tasks like these on ATLAS@Home because this is the same number of events each task processes on the ATLAS grid. It would make things a lot easier if the same tasks could run on ATLAS@Home as on the rest of the ATLAS grid.

These tasks will run 10 times longer than the other tasks and will generate an output file 10 times as large (500MB), so this may be an issue for those of you with low upload bandwidth. The advantage is that the initial download of 200MB is the same. Obviously using more cores will be better for these tasks, so they finish in a reasonable time.

To know whether you are running one of these tasks rather than a regular "longrunner", you can check the stderr.txt in the slots directory - if it shows "Starting ATLAS job. (PandaID=xxx taskID=10959636)" then you got one. The regular tasks have taskID=10947180.

Please let us know your opinion in general about the length and data in/out requirements of ATLAS tasks. They are usually much shorter than the other vbox LHC projects - is this a good thing or would you prefer more consistency among the projects?


That is David's post from the beginning of this thread!

What about running 1,000 events in the native app for Linux only?
There is not so much overhead and it runs very stably. A flag for this in the preferences would be useful.
Is there a second chance......

And I wish there were a way for ATLAS to identify the Windows machines that could process these successfully, and have an opt-in option for the user. I had no problems with these when they were available.
Thanks!
11) Message boards : ATLAS application : Performance 4-core VM versus 3-core VM (Message 34655)
Posted 15 Mar 2018 by Profile rbpeake
Post:
Is there a limit on the number of instances of Atlas tasks running on any particular machine?

I have 20 cores and was running five instances of 4-core Atlas tasks. In the interest of increasing efficiency, I switched to 2-core Atlas tasks. But instead of ten 2-core tasks running, I am seeing only seven 2-core Atlas tasks running. What happened?

Thanks!
12) Questions and Answers : Windows : vm_image.vdi Not Being Deleted by VirtualBox When Task is Completed (Message 34577)
Posted 11 Mar 2018 by Profile rbpeake
Post:
On one of my machines, I have to manually delete the vm_image.vdi files of virtual machines that have completed, reported their work units, and shut down; they remain listed with a yellow exclamation point (!) in the virtual hard disk list. My other two machines do not have this problem and only show active vm_image.vdi files.

Is there some setting I am missing?

Thanks!
13) Message boards : ATLAS application : Unable to upload an Atlas task (Message 34555)
Posted 6 Mar 2018 by Profile rbpeake
Post:
I have 10 waiting, some from as long as 2 days ago.

Here is the BOINC log:

3/6/2018 6:41:17 PM | LHC@home | Started upload of qzJODmxEjCsnSu7Ccp2YYBZmABFKDmABFKDmyQMKDmABFKDmUGlfOn_2_r1113909317_ATLAS_result
3/6/2018 6:41:17 PM | LHC@home | Started upload of vUaKDmlmrCsnSu7Ccp2YYBZmABFKDmABFKDmfhGKDmABFKDmEH2fzm_0_r963091061_ATLAS_result
3/6/2018 6:41:28 PM | LHC@home | [error] Error reported by file upload server: [vUaKDmlmrCsnSu7Ccp2YYBZmABFKDmABFKDmfhGKDmABFKDmEH2fzm_0_r963091061_ATLAS_result] locked by file_upload_handler PID=-1
3/6/2018 6:41:28 PM | LHC@home | Temporarily failed upload of vUaKDmlmrCsnSu7Ccp2YYBZmABFKDmABFKDmfhGKDmABFKDmEH2fzm_0_r963091061_ATLAS_result: transient upload error
3/6/2018 6:41:28 PM | LHC@home | Backing off 03:18:54 on upload of vUaKDmlmrCsnSu7Ccp2YYBZmABFKDmABFKDmfhGKDmABFKDmEH2fzm_0_r963091061_ATLAS_result
3/6/2018 6:41:29 PM | LHC@home | [error] Error reported by file upload server: [qzJODmxEjCsnSu7Ccp2YYBZmABFKDmABFKDmyQMKDmABFKDmUGlfOn_2_r1113909317_ATLAS_result] locked by file_upload_handler PID=-1
3/6/2018 6:41:29 PM | LHC@home | Temporarily failed upload of qzJODmxEjCsnSu7Ccp2YYBZmABFKDmABFKDmyQMKDmABFKDmUGlfOn_2_r1113909317_ATLAS_result: transient upload error
3/6/2018 6:41:29 PM | LHC@home | Backing off 05:35:19 on upload of qzJODmxEjCsnSu7Ccp2YYBZmABFKDmABFKDmyQMKDmABFKDmUGlfOn_2_r1113909317_ATLAS_result
14) Message boards : ATLAS application : Vboxwrapper Lost Communication with VirtualBox (Message 33474)
Posted 22 Dec 2017 by Profile rbpeake
Post:
Not sure how to fix this issue.

2017-12-22 10:00:19 (5180): Guest Log: ATHENA_PROC_NUMBER=4
2017-12-22 10:00:19 (5180): Guest Log: Starting ATLAS job. (PandaID=3761028309 taskID=12866455)
2017-12-22 10:57:25 (5180): ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.
2017-12-22 10:57:25 (5180): Powering off VM.
2017-12-22 10:58:06 (5180): Successfully stopped VM.

Thanks!
15) Message boards : ATLAS application : Atlas tasks are failing, server status show also 0 tasks ready to send (Message 32025)
Posted 21 Aug 2017 by Profile rbpeake
Post:
Seems like a lot of validate errors.
16) Message boards : News : Deadline change for ATLAS jobs (Message 31974)
Posted 16 Aug 2017 by Profile rbpeake
Post:
There was discussion at some point of having a BOINC option to run longer work units that were being issued to the other computer centers. I guess there were some issues uncovered and it was never pursued further, but it worked well for me. It seemed like an efficient way to run.
17) Message boards : CMS Application : CMS Tasks Failing (Message 31290)
Posted 4 Jul 2017 by Profile rbpeake
Post:
This said Condor exited after 11117 seconds without running a job.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=150399989

The HTCondor update was around 1300 UTC, so it looks like you got caught across it. I'm not expert enough to speculate on exactly what happened; I've been assured that any jobs caught up like that will be resubmitted, but it's a pity you didn't get credit for your CPU time.

No problem, thanks for the explanation!
18) Message boards : CMS Application : CMS Tasks Failing (Message 31288)
Posted 4 Jul 2017 by Profile rbpeake
Post:
This said Condor exited after 11117 seconds without running a job.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=150399989
19) Message boards : CMS Application : Possible disruption in the next several hours (Message 31266)
Posted 3 Jul 2017 by Profile rbpeake
Post:
Just curious, what happens if the CMS scientists have no tasks for us? Does the well run dry?
Thanks!
20) Message boards : ATLAS application : ATLAS Queue is empty (Message 31263)
Posted 3 Jul 2017 by Profile rbpeake
Post:
The Project Status page shows "0" for ATLAS (and some other projects as well).

What does this mean?

I see plenty of Atlas tasks now.

