Message boards : ATLAS application : Huge input file!
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 410
Credit: 3,087,527
RAC: 1,056
Message 32637 - Posted: 5 Oct 2017, 9:53:07 UTC
Last modified: 5 Oct 2017, 17:23:25 UTC

For a task from the batch mc16_13TeV AZNLOCTEQ6L1_Ztautau.simul (12236583), I got a huge input file: 757,812,338 bytes.
The job needs normal run time for a dual-core VM.

That is no problem for my bandwidth:
LHC@home 05 Oct 11:11:17 Started download of jf_f6dcceffbec93fe20e79c5f21f22f9ee
LHC@home 05 Oct 11:12:37 Finished download of jf_f6dcceffbec93fe20e79c5f21f22f9ee

I also have no data limit, but it could be an issue for others who have to download such files for many ATLAS tasks.
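
For a rough sense of what such a download means on slower links, here is a small sketch. It takes the file size and the two log timestamps above (80 seconds apart), estimates the average throughput, and then the download time at a few other link speeds. The function names and the example link speeds are assumptions for illustration only, and protocol overhead is ignored.

```python
from datetime import datetime

FILE_SIZE_BYTES = 757_812_338  # input file size reported in the post


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def throughput_mbit_per_s(size_bytes: int, seconds: float) -> float:
    """Average throughput in megabits per second (1 Mbit = 10**6 bits)."""
    return size_bytes * 8 / seconds / 1e6


def download_minutes(size_bytes: int, link_mbit_per_s: float) -> float:
    """Idealised download time in minutes at a given link speed."""
    return size_bytes * 8 / (link_mbit_per_s * 1e6) / 60


seconds = elapsed_seconds("11:11:17", "11:12:37")
print(f"observed: {throughput_mbit_per_s(FILE_SIZE_BYTES, seconds):.1f} Mbit/s")

# Hypothetical link speeds, chosen only to illustrate the spread.
for speed in (2, 10, 50):
    minutes = download_minutes(FILE_SIZE_BYTES, speed)
    print(f"{speed:>3} Mbit/s -> {minutes:.1f} min per file")
```

Multiply the per-file time by the number of ATLAS tasks fetched per day to see how quickly this adds up on a slow or metered connection.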

Edit: The run time will be longer, because the job has more than 50 events, probably 100 events.
Edit2: >100 events; guessing how many . . .
Edit3: >200 events; guessing how many . . .
ID: 32637
gyllic

Joined: 9 Dec 14
Posts: 104
Credit: 1,277,854
RAC: 2,894
Message 32639 - Posted: 5 Oct 2017, 19:57:41 UTC - in response to Message 32637.  

ID: 32639
HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 7,130,926
RAC: 4,145
Message 32641 - Posted: 6 Oct 2017, 5:41:00 UTC

I got 2 of those:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=158597383
https://lhcathome.cern.ch/lhcathome/result.php?resultid=158522054
They have been running for more than 2 days now, and each has processed more than 600 events so far.
We are the product of random evolution.
ID: 32641
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 410
Credit: 3,087,527
RAC: 1,056
Message 32644 - Posted: 6 Oct 2017, 8:09:13 UTC - in response to Message 32641.  

They have been running for more than 2 days now, and each has processed more than 600 events so far.

From the links gyllic mentioned (thanks for those links), I conclude that there are 1000 events in one job.

Used 6000 / finished 6 = 1000 events

How many cores does your ATLAS VM have?
How do you know yours are over 600 after 2 days of running?

In my console I only see the last events from yesterday. Today's events are listed before those and can't be shown.
At midnight my total event count was 256 (both cores together), but I don't know how far along they are now.
ID: 32644
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 410
Credit: 3,087,527
RAC: 1,056
Message 32646 - Posted: 6 Oct 2017, 9:23:57 UTC - in response to Message 32644.  
Last modified: 6 Oct 2017, 9:35:51 UTC

How do you know yours are over 600 after 2 days of running?

I found a way myself:

Lock the screen output with the Lock key for about 1 minute, then release and lock again in quick succession.
If you're lucky, you'll see the most recent events.

327 of 1000 events done; athena's run time is 818 minutes.
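
A back-of-the-envelope projection from these figures, as a sketch only. It assumes the event rate stays constant and that the 818 minutes cover all 327 events; both are assumptions, and the function name is purely illustrative.

```python
def estimate_total_minutes(events_done: int, events_total: int,
                           minutes_so_far: float) -> float:
    """Linear extrapolation of total run time from progress so far."""
    if events_done <= 0:
        raise ValueError("need at least one finished event to extrapolate")
    return minutes_so_far * events_total / events_done


total = estimate_total_minutes(327, 1000, 818)
remaining = total - 818
print(f"estimated total:     {total:.0f} min ({total / 60:.1f} h)")
print(f"estimated remaining: {remaining:.0f} min ({remaining / 60:.1f} h)")
```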
ID: 32646
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 410
Credit: 3,087,527
RAC: 1,056
Message 32649 - Posted: 6 Oct 2017, 12:28:23 UTC - in response to Message 32646.  
Last modified: 6 Oct 2017, 12:35:39 UTC

In the end it was all for nothing. BOINC aborted the task because too little disk space was reserved.

LHC@home 06 Oct 14:08:45 Aborting task h8qMDmrtLJrnSu7Ccp2YYBZmABFKDmABFKDmxSLKDmABFKDmN0WMJn_2: exceeded disk limit: 5726.07MB > 5722.05MB

The rsc_disk_bound of 6000000000 bytes is too low for these types of tasks,
especially when a user sometimes has to suspend the task, which writes a snapshot into the slot directory.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=158726637
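
The limit in the log line and the rsc_disk_bound value are the same number in different units, which a short calculation shows. This sketch assumes that "MB" in the BOINC message means 2^20 bytes; that is inferred from the figures matching, not stated in the log.

```python
BYTES_PER_MIB = 1024 ** 2

RSC_DISK_BOUND_BYTES = 6_000_000_000  # rsc_disk_bound from the workunit
USED_MIB = 5726.07                    # usage reported in the abort message

# Convert the byte limit to the unit used in the log line.
limit_mib = RSC_DISK_BOUND_BYTES / BYTES_PER_MIB
overshoot_mib = USED_MIB - limit_mib

print(f"disk limit: {limit_mib:.2f} MiB")
print(f"overshoot:  {overshoot_mib:.2f} MiB")
```

So the task was aborted for exceeding the bound by only about 4 MiB.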
ID: 32649
Erich56

Joined: 18 Dec 15
Posts: 535
Credit: 4,243,328
RAC: 6,306
Message 32650 - Posted: 6 Oct 2017, 12:49:40 UTC - in response to Message 32649.  

In the end it was all for nothing. BOINC aborted the task because too little disk space was reserved.

this is really annoying :-(

The rsc_disk_bound of 6000000000 bytes is too low for these types of tasks

just out of curiosity: where did you see this figure? I checked your STDERR text, either I overlooked it, or it's not written there and you know that from somewhere else.
ID: 32650
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 410
Credit: 3,087,527
RAC: 1,056
Message 32651 - Posted: 6 Oct 2017, 13:23:45 UTC - in response to Message 32650.  

The rsc_disk_bound of 6000000000 bytes is too low for these types of tasks

just out of curiosity: where did you see this figure? I checked your STDERR text, either I overlooked it, or it's not written there and you know that from somewhere else.

You can find that in the workunit info part of client_state.xml or in sched_reply_lhcathome.cern.ch_lhcathome.xml created when a requested task is sent to you.
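
A minimal sketch of reading that value with Python's standard library. It assumes each <workunit> element has <name> and <rsc_disk_bound> children. The sample XML is a trimmed, made-up fragment for illustration, not a copy of a real client_state.xml, and the workunit name in it is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; a real client_state.xml has many more fields.
SAMPLE = """<client_state>
  <workunit>
    <name>example_atlas_wu</name>
    <rsc_disk_bound>6000000000.000000</rsc_disk_bound>
  </workunit>
</client_state>"""


def disk_bounds(xml_text: str) -> dict:
    """Map each workunit name to its rsc_disk_bound in bytes."""
    root = ET.fromstring(xml_text)
    bounds = {}
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_disk_bound")
        if name is not None and bound is not None:
            bounds[name] = float(bound)
    return bounds


for name, bound in disk_bounds(SAMPLE).items():
    print(f"{name}: {bound:.0f} bytes")
```

To try it on a real client, read client_state.xml from the BOINC data directory (its location depends on the installation) and pass the file contents to disk_bounds.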
ID: 32651
Harri Liljeroos
Joined: 28 Sep 04
Posts: 238
Credit: 6,943,185
RAC: 12,504
Message 32654 - Posted: 6 Oct 2017, 14:05:07 UTC

What do these tasks show as 'Task size' if you right-click the task in BOINC and view Properties? My normal tasks show 43200 GFLOPs.
ID: 32654
Erich56

Joined: 18 Dec 15
Posts: 535
Credit: 4,243,328
RAC: 6,306
Message 32656 - Posted: 6 Oct 2017, 16:32:05 UTC - in response to Message 32651.  

You can find that in the workunit info part of client_state.xml or in sched_reply_lhcathome.cern.ch_lhcathome.xml created when a requested task is sent to you.

Thanks for the information.
So this is a fixed value set by the server.
ID: 32656
HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 7,130,926
RAC: 4,145
Message 32662 - Posted: 7 Oct 2017, 4:37:28 UTC - in response to Message 32644.  
Last modified: 7 Oct 2017, 4:41:33 UTC

How many cores does your ATLAS VM have?
How do you know yours are over 600 after 2 days of running?

The lines showing the progress of the events appear to be sorted, so after one day one can only see the events that were calculated shortly before midnight.
In the window I saw an event number above 300, and since I have 2 cores on the task, I concluded that at least 600 events had been calculated.

Both tasks are still running, more than 3 days now, athena.py CPU time at 4000 minutes.
ID: 32662
HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 7,130,926
RAC: 4,145
Message 32743 - Posted: 10 Oct 2017, 2:12:22 UTC - in response to Message 32662.  

Both tasks are still running, more than 3 days now, athena.py CPU time at 4000 minutes.

Both tasks failed due to lack of disk space as well, but the failure occurred exactly when the task was resumed. If you do not suspend it, it seems that the task will continue forever without failing.
ID: 32743
