Message boards : ATLAS application : never ending tasks here

tullio

Joined: 19 Feb 08
Posts: 607
Credit: 3,777,023
RAC: 820
Message 29122 - Posted: 9 Mar 2017, 9:45:06 UTC

On my Linux box I have a task at 99.996% that is still running after 96 hours. CPU usage is 170% or more. I'm keeping it running because the two climateprediction.net tasks running alongside it have very long deadlines.
Tullio
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 959
Credit: 6,336,723
RAC: 1,351
Message 29167 - Posted: 10 Mar 2017, 11:21:54 UTC - in response to Message 29121.  

On 9 Mar 2017 @ 8:56:10 UTC David wrote:
Short summary: the problem has been fixed but will take a few hours to propagate. If you keep the jobs running they will exit and you will get the credit.

I don't want to wait any longer. I killed 2 tasks that had been running for almost a day.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=124413276
https://lhcathome.cern.ch/lhcathome/result.php?resultid=124431714
Saenger

Joined: 13 Jul 05
Posts: 64
Credit: 447,413
RAC: 45
Message 29819 - Posted: 4 Apr 2017, 15:12:03 UTC
Last modified: 4 Apr 2017, 15:12:34 UTC

My current task has been running for ~112 h on my penguin, on two of its four cores, and it has been at 100% since some time last night (CEST). The last stretch, from 99.95% to 99.999%, was getting slower and s.l.o..w..e...r and s...l....o....w......e........r :)

The data from the task's Properties dialog are:
Deadline: Tue 04 Apr 2017 18:57:50 CEST
Resources: 2 CPUs
Estimated computation size: 203580 GFLOPs
CPU time at last checkpoint: 204:35:11
CPU time: 204:35:30
Elapsed time: 111:46:50
Estimated time remaining: ---
Progress: 100.000%
Required memory: 2.00 GB
Work unit size: 3.52 GB


The WU is due in about 2 h according to my BOINC client, but in a day according to the WU page linked above. I'll probably keep it running for a while, but could you give me any estimate? And what will happen after 16:57 UTC, either today or tomorrow?
Greetings from Sänger
Yeti
Volunteer moderator

Joined: 2 Sep 04
Posts: 406
Credit: 96,116,916
RAC: 1
Message 29820 - Posted: 4 Apr 2017, 15:14:41 UTC

Check No. 16e from my checklist V3.


Supporting BOINC, a great concept!
Saenger

Joined: 13 Jul 05
Posts: 64
Credit: 447,413
RAC: 45
Message 29822 - Posted: 4 Apr 2017, 16:21:32 UTC - in response to Message 29820.  

I checked (with a bit of trouble; it had been quite some time since I last did anything like that), and now it just finished and gave me 6,130.31 credits. I can live with that ;)
Greetings from Sänger
Erich56

Joined: 18 Dec 15
Posts: 1284
Credit: 23,108,244
RAC: 2,668
Message 29823 - Posted: 4 Apr 2017, 16:45:49 UTC - in response to Message 29822.  

... it just finished and gave me 6,130.31 credits. I can live with that ;)

This might even be some kind of record, I guess :-)
Saenger

Joined: 13 Jul 05
Posts: 64
Credit: 447,413
RAC: 45
Message 30353 - Posted: 15 May 2017, 15:14:16 UTC - in response to Message 29819.  

My current task has been running for ~112 h on my penguin, on two of its four cores, and it has been at 100% since some time last night (CEST). The last stretch, from 99.95% to 99.999%, was getting slower and s.l.o..w..e...r and s...l....o....w......e........r :)

Same again, but this time it looks as if it used only 1 core instead of 2, as the data from the task's Properties dialog show:
Deadline: Sun 21 Apr 2017 02:06:02 CEST
Resources: 2 CPUs
Estimated computation size: 43200 GFLOPs
CPU time at last checkpoint: 98:51:20
CPU time: 98:51:48
Elapsed time: 97:39:22
Estimated time remaining: ---
Progress: 100.000%
Required memory: 1.620 GB
Work unit size: 4.10 GB


Here are interim readings (progress vs. elapsed time):

Progress (%)   Elapsed time
80.306         07:30:08
86.452         09:15:51
98.761         20:36:38
99.263         23:04:06
99.757         28:20:04
99.868         31:16:01
99.932         34:23:25
99.946         35:27:00
99.956         36:30:29
99.986         41:48:39
99.990         43:54:30
99.996         48:08:12
99.998         53:08:50
99.999         56:30:06
100.000        59:27:53
100.000        64:38:31
100.000        69:36:15
100.000        73:39:01
100.000        77:44:33
100.000        82:58:11
100.000        89:42:34
100.000        98:41:13


I hope it ends like the other one, successfully and with plenty of credits; I just wanted you to know.
Greetings from Sänger
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,120,806
RAC: 28
Message 30361 - Posted: 16 May 2017, 2:53:20 UTC - in response to Message 30353.  

this time it looks as if it just used 1 core instead of 2

I had 2 similar tasks this morning:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=140469886
https://lhcathome.cern.ch/lhcathome/result.php?resultid=140056494
I went to check using Alt-F2 in the VM console and saw that the tasks were not processing any events, so I aborted both of them.
I suggest you try the same.
We are the product of random evolution.
Terrible T

Joined: 1 Nov 05
Posts: 8
Credit: 596,413
RAC: 0
Message 30363 - Posted: 16 May 2017, 7:05:09 UTC

New record?

WU 67789777 ran for nearly 24 hours; I was about to abort it when it suddenly decided it was finished.
Nice score, though: 7,214.07 points.
There are no "hits" in the log file, so I wonder whether this one has any scientific value.
Erich56

Joined: 18 Dec 15
Posts: 1284
Credit: 23,108,244
RAC: 2,668
Message 30368 - Posted: 16 May 2017, 10:39:25 UTC - in response to Message 30363.  

There are no "hits" in the log file, so I wonder whether this one has any scientific value.

Good question.
Saenger

Joined: 13 Jul 05
Posts: 64
Credit: 447,413
RAC: 45
Message 30370 - Posted: 16 May 2017, 14:55:33 UTC - in response to Message 29820.  

Check No. 16e from my checklist V3.


I started the VM manager as an admin and saw a stopped VM belonging to the current task. I started that VM, and it evidently only began running then; I'm now at #17 of your checklist (after quite a lot of [ OK ] lines before that).
I'll leave it open until it's done.
Greetings from Sänger
Yeti
Volunteer moderator

Joined: 2 Sep 04
Posts: 406
Credit: 96,116,916
RAC: 1
Message 30371 - Posted: 16 May 2017, 16:03:32 UTC - in response to Message 30370.  

Check No. 16e from my checklist V3.


I started the VM manager as an admin and saw a stopped VM belonging to the current task. I started that VM, and it evidently only began running then; I'm now at #17 of your checklist (after quite a lot of [ OK ] lines before that).
I'll leave it open until it's done.

Click into the console window and try Alt+F2.

It should bring up a screen that shows the progress of the running tasks.


Supporting BOINC, a great concept!


©2020 CERN