Message boards : ATLAS application : Tasks of batch 12577096 have 200 Events

Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 515
Credit: 3,552,085
RAC: 2,333
Message 33066 - Posted: 15 Nov 2017, 12:06:50 UTC

Tasks of batch mc16_13TeV 140_CVetoBVeto.simul (12577096) have 200 Events.
ID: 33066
maeax

Joined: 2 May 07
Posts: 508
Credit: 16,122,715
RAC: 19,984
Message 33068 - Posted: 15 Nov 2017, 20:11:33 UTC

Allowing more than 25 events per task via the preferences is an old discussion.
The native app has no problem handling more than 25 events.
I have had tasks with 100 events (also on Windows).
ID: 33068
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 200
Credit: 5,197,639
RAC: 9,022
Message 33069 - Posted: 16 Nov 2017, 8:25:44 UTC - in response to Message 33066.  

Tasks of batch mc16_13TeV 140_CVetoBVeto.simul (12577096) have 200 Events.


The previous task 12515739 had 50 events and the WUs were finishing very quickly, so the efficiency was not so good. We therefore asked for more events in the new tasks in order to have longer WUs. This means the overall data to download is lower, but you have to upload 200 MB at the end of each WU.
ID: 33069
maeax

Joined: 2 May 07
Posts: 508
Credit: 16,122,715
RAC: 19,984
Message 33074 - Posted: 17 Nov 2017, 15:08:19 UTC

Task mc16_13TeV 140_CVetoBVeto.simul (12577096): 11007/37300

David,

is simulation 12577096 going well for most volunteers?

On my computers (always the same configuration), the Cobblestones are increasing now.
ID: 33074
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 200
Credit: 5,197,639
RAC: 9,022
Message 33139 - Posted: 26 Nov 2017, 11:54:22 UTC
Last modified: 26 Nov 2017, 11:54:44 UTC

I've added info on the number of events per WU to the task info on http://atlasathome.cern.ch/
ID: 33139
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 515
Credit: 3,552,085
RAC: 2,333
Message 33140 - Posted: 26 Nov 2017, 13:30:40 UTC - in response to Message 33139.  

I've added info on the number of events per WU to the task info on http://atlasathome.cern.ch/

Thanks David!
ID: 33140
AuxRx

Joined: 16 Sep 17
Posts: 86
Credit: 936,144
RAC: 7,297
Message 33247 - Posted: 8 Dec 2017, 14:43:06 UTC - in response to Message 33139.  

I noticed yesterday that I would miss the progress bars. Since the website is being overhauled, could this information be integrated into atlas_job.php so that all information is available on one page?

At the very least, we need a sticky with all project-related links so new users can find this "hidden" information.
ID: 33247
csbyseti

Joined: 6 Jul 17
Posts: 14
Credit: 4,005,374
RAC: 0
Message 33297 - Posted: 13 Dec 2017, 8:53:42 UTC

It would be fine for me if the ATLAS tasks were as big as possible.
The reason is that ATLAS (or LHC) is an SSD killer.
On this machine, a Ryzen 1700 with 32 GB RAM and an 850 Evo 250 GB used only for BOINC, I have 0.7 TBW on the system disk and 20.5 TBW on the BOINC disk.
It was built in the first week of August 2017 and has not been running LHC work units all the time.
So the warranty value of 80 TBW will be reached in 24 months or earlier.
The small work units run for only 1 hour (3 cores) and often produce a big 200 MB download.
With 5 tasks running at the same time, more than 1 GB of downloads is written in 1 hour.

Bigger WUs will greatly reduce the amount of download data written.
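
The TBW projection above can be checked with a short calculation. This is only a rough sketch: it assumes the drive went into service on 1 August 2017 (the post says only "first week of August"), that the 20.5 TBW reading was taken on the posting date, and that the write rate stays constant. The function name is made up for illustration.

```python
from datetime import date


def days_until_tbw_limit(written_tb, rated_tb, first_use, observed):
    """Linearly extrapolate how many more days until the rated TBW is reached."""
    elapsed_days = (observed - first_use).days
    rate_tb_per_day = written_tb / elapsed_days
    return (rated_tb - written_tb) / rate_tb_per_day


# Assumed dates: build on 2017-08-01, reading taken on 2017-12-13.
remaining = days_until_tbw_limit(20.5, 80.0, date(2017, 8, 1), date(2017, 12, 13))
print(f"about {remaining:.0f} more days at the observed write rate")
```

Under these assumptions the 80 TBW figure would be reached well inside the 24 months mentioned in the post, which fits the "or earlier" qualifier.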
ID: 33297
AuxRx

Joined: 16 Sep 17
Posts: 86
Credit: 936,144
RAC: 7,297
Message 33308 - Posted: 13 Dec 2017, 12:00:06 UTC - in response to Message 33297.  

I am concerned that longer WUs (and some 200-event WUs already fall into this 8h+ category) will increase the risk of failing WUs. As long as WUs cannot be stopped and resumed reliably and disconnects kill a WU (I am disconnected every 24h), I would ask not to increase run time. In its current form, failing a WU can cost a third of my system's daily run time.

To mitigate the TBW issue, I would recommend using HDD space where available and increasing the checkpoint interval. With each event taking around 4 minutes to compute, save points can be spread out even further. AFAIK the underlying cause is not the 200 MB downloads but the compute data exceeding 4 GB per VM. Maybe increasing the core count per task could further lighten the load on storage.

Please correct me if I am misinterpreting the numbers.
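
The per-event figure in this post can be turned into a rough run-time estimate. The sketch below takes the roughly 4 minutes per event quoted above and assumes, purely for illustration, that run time scales linearly with the number of cores, which real tasks will not do exactly. The function name is hypothetical.

```python
def estimated_wall_hours(events, minutes_per_event=4.0, cores=1):
    """Wall-clock hours for one task, assuming ideal linear scaling over cores."""
    return events * minutes_per_event / cores / 60.0


for events in (50, 200):
    for cores in (1, 3):
        hours = estimated_wall_hours(events, cores=cores)
        print(f"{events} events on {cores} core(s): ~{hours:.1f} h")
```

For 50 events on 3 cores this gives a little over an hour, in line with the 1-hour runs reported earlier in the thread; the 200-event estimates should be read as order-of-magnitude only.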
ID: 33308
Erich56

Joined: 18 Dec 15
Posts: 870
Credit: 6,479,875
RAC: 10,574
Message 33310 - Posted: 13 Dec 2017, 12:53:38 UTC - in response to Message 33308.  

The reason is that Atlas (or LHC) is a SSD killer.

In mitigation of the TBW issue I would recommend using HDD space where available

That's why all the PCs on which I run various LHC projects have an HDD (in addition to the system SSD), in one case even an external USB 3 HDD, to avoid this problem.

The other day, for example, I noticed very late that, due to a server problem, CMS tasks ran for only about 10-12 minutes and then finished unsuccessfully, each time building a VDI image of about 3 GB. If you run several such tasks concurrently, and this goes on for 20 or 30 hours, you can imagine what this means in terms of TBW.
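
To put a number on that scenario, here is a back-of-the-envelope sketch. It uses midpoints of the ranges above (11 minutes per failed task, 25 hours) and assumes that every failed run writes a fresh 3 GB image and that 4 tasks run concurrently; the concurrency value is a placeholder, not a figure from the post, and the function name is hypothetical.

```python
def failed_task_writes_gb(concurrent_tasks, hours, minutes_per_task, gb_per_task):
    """Total GB written when short failing tasks restart back to back."""
    runs_per_slot = hours * 60.0 / minutes_per_task
    return concurrent_tasks * runs_per_slot * gb_per_task


# Placeholder concurrency of 4; other values are midpoints of the quoted ranges.
total_gb = failed_task_writes_gb(
    concurrent_tasks=4, hours=25, minutes_per_task=11, gb_per_task=3.0
)
print(f"roughly {total_gb / 1000:.1f} TB written")
```

Even with these modest inputs the estimate lands above 1 TB for a single day of failing tasks, which is why catching such failures early matters for SSD wear.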
ID: 33310



©2018 CERN