Message boards : ATLAS application : Task processing slowing down considerably beyond ~85% progress
Erich56

Joined: 18 Dec 15
Posts: 1815
Credit: 118,666,994
RAC: 39,476
Message 39821 - Posted: 4 Sep 2019, 11:49:45 UTC
Last modified: 4 Sep 2019, 11:52:45 UTC

On my i7-4930K (6 cores + 6 hyper-threads) @ 3.6 GHz, I am running four 2-core ATLAS tasks. When they started yesterday, the predicted runtime was about 13 hours.
This morning I noticed that more than 29 hours had passed and the progress was at about 86%; the remaining time was shown as about 3 1/2 hours.
Now, some 5 hours later, the progress is at about 90%, and the remaining time is shown as about 3 hours!
Watching the progress percentage, I see that processing has obviously become awfully slow. The third digit after the decimal point (i.e. the 1/1000th percent digit) takes about 6-7 seconds to advance.
So I guess the remaining time will not be 3 hours, but probably a multiple of that.
Is this normal behaviour?
I have crunched many ATLAS tasks before, but this is new to me.
ID: 39821
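A rough way to check the suspicion in the post above is to extrapolate linearly from its two progress readings (86% and, about 5 hours later, 90%). This is only a sketch: the function name is made up for illustration and is not part of BOINC, and it assumes the progress rate stays constant, which is not guaranteed.

```python
def hours_remaining(p_start, p_end, hours_between, target=100.0):
    """Linearly extrapolate the time needed to reach `target` percent.

    Hypothetical helper, not part of BOINC. Assumes a constant progress rate.
    """
    rate = (p_end - p_start) / hours_between  # percent per hour
    if rate <= 0:
        raise ValueError("progress did not advance between the two readings")
    return (target - p_end) / rate


# Readings from the post above: 86% -> 90% in roughly 5 hours.
estimate = hours_remaining(86.0, 90.0, 5.0)
print(f"about {estimate:.1f} hours remaining")
```

With these readings the estimate comes out at roughly 12.5 hours, about four times the 3 hours shown by the client.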
Erich56

Joined: 18 Dec 15
Posts: 1815
Credit: 118,666,994
RAC: 39,476
Message 39823 - Posted: 4 Sep 2019, 14:11:18 UTC

A close look at the tasks this afternoon shows that progress is currently about 1% per hour.
From all my past experience with ATLAS tasks, I can say that this is totally unusual. Something must be wrong with these tasks. I am not even sure whether I should abort them, as I suspect that, if they become slower and slower, they will not finish in time (Sept. 6th).

Is anyone else having the same experience?
ID: 39823
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2534
Credit: 254,129,145
RAC: 53,939
Message 39824 - Posted: 4 Sep 2019, 15:25:56 UTC - in response to Message 39823.  

You may check the output at console 2.
It shows how many events are already processed and how many seconds (average) your computer needs to process a single event.

The progress bar of your BOINC client is not reliable, especially as ATLAS sends out different types of jobs.
ID: 39824
Harri Liljeroos
Joined: 28 Sep 04
Posts: 728
Credit: 49,141,815
RAC: 29,746
Message 39825 - Posted: 4 Sep 2019, 15:26:44 UTC - in response to Message 39823.  
Last modified: 4 Sep 2019, 15:27:09 UTC

ID: 39825
Erich56

Joined: 18 Dec 15
Posts: 1815
Credit: 118,666,994
RAC: 39,476
Message 39827 - Posted: 4 Sep 2019, 15:50:39 UTC - in response to Message 39824.  

You may check the output at console 2.
It shows how many events are already processed and how many seconds (average) your computer needs to process a single event.
The tasks show about 52 processed events. Processing time is about 2000 secs (which is awfully high compared to what it was in former days).
As I remember from before, there were tasks with 100 events and tasks with 200 events. Any idea how many events the current tasks have? If they have 200, in my case the deadline (Sept. 6th) would definitely be exceeded.
ID: 39827
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1419
Credit: 9,474,701
RAC: 2,980
Message 39828 - Posted: 4 Sep 2019, 17:11:27 UTC - in response to Message 39827.  

You may check the output at console 2.
It shows how many events are already processed and how many seconds (average) your computer needs to process a single event.
The tasks show about 52 processed events. Processing time is about 2000 secs (which is awfully high compared to what it was in former days).
As I remember from before, there were tasks with 100 events and tasks with 200 events. Any idea how many events the current tasks have? If they have 200, in my case the deadline (Sept. 6th) would definitely be exceeded.
You said you are running 2-core tasks, so you have to add the last 2 event numbers together.
The total is 200 events at the moment, so you are a little past the midpoint.
ID: 39828
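The advice above can be turned into a rough estimate. The sketch below assumes that console 2 shows one event count per worker, that the reported average seconds per event applies to each worker, and that the workers run in parallel. The function and parameter names are hypothetical, and the inputs are the approximate figures quoted in this thread (about 52 events per worker, about 2000 s per event, 200 events in total).

```python
def estimate_remaining_hours(events_per_worker, total_events, secs_per_event):
    """Rough remaining wall-clock time for a multi-core ATLAS task.

    Hypothetical helper. Assumes every worker processes events in parallel
    at the same average speed (secs_per_event per worker).
    """
    done = sum(events_per_worker)            # add the per-worker counts
    remaining = max(total_events - done, 0)  # events still to process
    workers = len(events_per_worker)
    return done, remaining * secs_per_event / workers / 3600.0


done, hours = estimate_remaining_hours([52, 52], 200, 2000)
print(f"{done} of 200 events done, roughly {hours:.1f} hours to go")
```

Treat the result as an order-of-magnitude figure only: the average time per event can change while a task runs, so the real remaining time may differ considerably.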
Erich56

Joined: 18 Dec 15
Posts: 1815
Credit: 118,666,994
RAC: 39,476
Message 39831 - Posted: 5 Sep 2019, 5:31:48 UTC - in response to Message 39828.  

Okay, the first one just finished after 47 hours and successfully produced a HITS file :-)
The other three tasks are at around 75% progress (as seen in console 2).
ID: 39831
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 39837 - Posted: 6 Sep 2019, 8:45:37 UTC - in response to Message 39831.  

The current tasks still process 200 events but they are rather heavy on CPU due to the more complex physics involved. The average CPU time per event is roughly 3 times higher than the tasks we had a few weeks ago. So don't give up on them if they are still crunching!
ID: 39837
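As a back-of-the-envelope check, and assuming that runtime scales linearly with CPU time per event (an assumption, not something stated by the project), the 13-hour prediction mentioned at the start of this thread can be scaled by the factor of roughly 3. The helper name is hypothetical.

```python
def scaled_runtime(old_runtime_hours, cpu_time_factor):
    """Scale an old runtime estimate by the change in CPU time per event.

    Hypothetical helper; assumes runtime is proportional to CPU time per event.
    """
    return old_runtime_hours * cpu_time_factor


print(scaled_runtime(13, 3), "hours")
```

This gives 39 hours, the same order of magnitude as the 47 hours reported above for the first finished task.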
Erich56

Joined: 18 Dec 15
Posts: 1815
Credit: 118,666,994
RAC: 39,476
Message 39842 - Posted: 6 Sep 2019, 11:01:04 UTC - in response to Message 39837.  

So don't give up on them if they are still crunching!
No, I didn't give up :-)
All 4 have already finished okay; right now, besides 2 Theory tasks, another ATLAS task is running.
ID: 39842
Filipe

Joined: 9 Aug 05
Posts: 36
Credit: 7,698,293
RAC: 0
Message 39844 - Posted: 7 Sep 2019, 9:02:12 UTC

I have one WU sitting at 100% complete but still running... 32 hours now.
ID: 39844
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 39845 - Posted: 7 Sep 2019, 9:42:20 UTC - in response to Message 39844.  

Filipe,
when you click on your ATLAS task in BOINC Manager, choose Show VM Console and open the RDP session, you can press F2 to see how many collisions your task has processed so far. 200 is the max.
At the moment there are tasks that run for more than one day when you use 4 CPUs.
ID: 39845
Filipe

Joined: 9 Aug 05
Posts: 36
Credit: 7,698,293
RAC: 0
Message 39846 - Posted: 7 Sep 2019, 10:15:50 UTC
Last modified: 7 Sep 2019, 10:18:49 UTC

I am running 2-core tasks.

The VM Console doesn't open anymore. I get an error message when I try to open it. But maybe that is because it shows 100% complete?

It has been at 100% for more than 12 hours now. Total elapsed time is now 34 hours. Still running.
ID: 39846
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 39847 - Posted: 7 Sep 2019, 11:20:40 UTC - in response to Message 39846.  

If the vboxheadless.exe process of this task shows no CPU usage in Task Manager, stop BOINC Manager (saving the task state if possible) and start it once more.
To understand how ATLAS works, see Yeti's checklist in the ATLAS forum.
ID: 39847
Filipe

Joined: 9 Aug 05
Posts: 36
Credit: 7,698,293
RAC: 0
Message 39849 - Posted: 7 Sep 2019, 12:48:49 UTC

@maeax: It was thanks to Yeti's checklist that I managed to get ATLAS VMs running on my computer.

I saw your tasks run for +/- 40 CPU hours. How many CPU cores are you using for each task? Is it 4?

I'm running 2-core tasks.
ID: 39849
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 39850 - Posted: 7 Sep 2019, 14:00:37 UTC - in response to Message 39849.  

I am using 4, 5 and 6 CPUs for ATLAS.
ID: 39850
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 39867 - Posted: 8 Sep 2019, 16:56:37 UTC

The server status page shows 22.48 hours for the duration of the last 100 ATLAS tasks.
ID: 39867
Filipe

Joined: 9 Aug 05
Posts: 36
Credit: 7,698,293
RAC: 0
Message 39873 - Posted: 8 Sep 2019, 20:34:57 UTC

Mine is still running after 68 hours...

Is there a wall-clock time limit to worry about?
ID: 39873
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 847
Credit: 692,005,655
RAC: 114,004
Message 39874 - Posted: 9 Sep 2019, 6:13:29 UTC

I abort mine if they run past the deadline.
ID: 39874
Filipe

Joined: 9 Aug 05
Posts: 36
Credit: 7,698,293
RAC: 0
Message 39876 - Posted: 9 Sep 2019, 15:42:30 UTC

Finished and validated after running for 84 hours!
ID: 39876
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 39877 - Posted: 9 Sep 2019, 15:56:07 UTC - in response to Message 39876.  

Now you can run 4 or more cores instead of 2. :-)
ID: 39877
©2024 CERN