
SixTrack Tasks NOT being distributed



Message boards : Sixtrack Application : SixTrack Tasks NOT being distributed

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 836
Credit: 1,420,242
RAC: 1,081
Message 30902 - Posted: 20 Jun 2017, 12:37:47 UTC

This thread replaces "Will not WorkUnits", "no new WUs",
"All tasks are pending" and "260.000 WUs to send, but no handed out".
It is for SixTrack. This is, as a minimum, to let you know we
are taking this issue rather seriously and working on it to the
best of our abilities.

I have managed to reproduce this problem on my own home Windows 10
computer. So far I have been unable to identify the precise
problem from the available log information. However while awaiting
some expert help, I found that:

A project reset does not help as already tried and reported by several volunteers.

In my BOINC Manager I removed the project.
NOTA BENE: this DESTROYS any unsent results or active tasks.
=============================================================
It should really only be done when the Client is idle.
I then re-installed the client/BOINC Manager
from the web installer and/or desktop icon, choosing the
"repair" rather than the delete option.
To my surprise the project was still there!!!
AND I got a bunch of new tasks immediately. :-)

I also tried simply removing the project and then adding it again, BUT I
then got password problems! I even see a password problem
when I re-install. I have opened a ticket for that.
Eric
____________

MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 494
Credit: 14,290,993
RAC: 12,147
Message 30908 - Posted: 20 Jun 2017, 14:52:04 UTC

Hi Eric,

Well I have 16 in progress right now and 11 still pending and 3 longer ones are *Validation inconclusive*

https://lhcathome.cern.ch/lhcathome/results.php?userid=5472&offset=0&show_names=0&state=3&appid=1
____________
Volunteer Mad Scientist For Life

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 836
Credit: 1,420,242
RAC: 1,081
Message 30914 - Posted: 21 Jun 2017, 7:59:16 UTC

I now have more than 50 tasks at home (4 active on my 4-thread system).
I normally get just 4. We shall see later whether I get new work, but for now
this is much better, given that we have over 1.6 million queued. Eric.
____________

Orange Kid
Joined: 9 May 17
Posts: 1
Credit: 54,767
RAC: 11
Message 30921 - Posted: 22 Jun 2017, 1:24:38 UTC

This project is a joke 1.8 million tasks available and I can't get any.

Wake up.

Get your shit together.

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 836
Credit: 1,420,242
RAC: 1,081
Message 30922 - Posted: 22 Jun 2017, 3:51:53 UTC - in response to Message 30921.

Thank you for your feedback (even if I find your language somewhat
forthright). I see you have 4 Windows 10 systems, of which 3 are indeed
idle, and not getting SixTrack tasks, while the 4th system has 24! in progress,
although it has just 4 processors! These 24 seem to have been sent/received
at 21:55 UTC yesterday.

While I consider this a very strange situation, I see a very marginal improvement.
The details of this situation are valuable. Your (edited) message will be passed
to my colleagues. Eric.
____________

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,397,624
RAC: 3,695
Message 30924 - Posted: 22 Jun 2017, 4:44:52 UTC

On a host that successfully runs vbox WUs from CERN via separate client instances I attached a new instance to sixtrack yesterday evening.
Unfortunately I ran into the same problem as many others, as the project server reports:

No tasks are available for SixTrack


Is there still a problem on the server or is it a client misconfiguration?

Host:
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10486310

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,397,624
RAC: 3,695
Message 30925 - Posted: 22 Jun 2017, 6:19:36 UTC - in response to Message 30924.

Finally I got 2 WUs that were finished after 8 s and 12 s.
And with the next request a few minutes later:

Server error: feeder not running

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 103
Credit: 788,939
RAC: 4,818
Message 30926 - Posted: 22 Jun 2017, 6:39:36 UTC - in response to Message 30925.

Thanks for the update.

We just restarted the daemons with increased weight for the Sixtrack application.

The feeder needs some time to fill the shared memory buffer used by the scheduler with tasks. Hence, for 5-10 minutes after a BOINC service restart, you will get the message "feeder not running" from the scheduler, in spite of the fact that the feeder is indeed running.

On Tuesday we increased the memory buffer size and Sixtrack weight to avoid exhausting the buffer of Sixtrack tasks, but in spite of this, the very short tasks are immediately sucked from the queue.

We have just tried to further increase the number of Sixtrack tasks in the buffer, but either the tasks are going out in seconds, or we have another problem.

The bottom line is that you might get a few more of these messages "feeder not running" while we're working on this.
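The feeder/scheduler interaction Nils describes can be sketched as a producer topping up a bounded buffer that the scheduler drains. This is a toy illustration only; the class and method names are hypothetical, not the actual BOINC server internals.

```python
import collections
import threading

BUFFER_SIZE = 2000  # assumed slot count, matching the figure quoted later in the thread


class SharedBuffer:
    """Toy stand-in for the scheduler's shared-memory task buffer."""

    def __init__(self, size):
        self.size = size
        self.tasks = collections.deque()
        self.lock = threading.Lock()

    def refill(self, db_tasks):
        # Feeder side: a DB query result is used to top the buffer up to its
        # fixed size; returns how many tasks were actually added.
        with self.lock:
            free = self.size - len(self.tasks)
            added = db_tasks[:free]
            self.tasks.extend(added)
            return len(added)

    def take(self, n):
        # Scheduler side: hand out up to n tasks; an empty buffer is what a
        # client sees as "No tasks are available".
        with self.lock:
            return [self.tasks.popleft()
                    for _ in range(min(n, len(self.tasks)))]


buf = SharedBuffer(BUFFER_SIZE)
buf.refill([f"task-{i}" for i in range(3000)])  # feeder tops it up to 2000
print(len(buf.take(4)))                         # a 4-core host gets 4 tasks; prints 4
```

With very short tasks, `take` calls arrive faster than `refill` cycles, and requests that hit an empty buffer get nothing even though the database queue is huge.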

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,397,624
RAC: 3,695
Message 30927 - Posted: 22 Jun 2017, 6:44:18 UTC - in response to Message 30926.

OK.
I got some WUs with the most recent request and they seem to run longer than a few seconds.
Thank you.

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 836
Credit: 1,420,242
RAC: 1,081
Message 30928 - Posted: 22 Jun 2017, 6:54:49 UTC - in response to Message 30925.

Thanks for the input. Can you give the IDs of the tasks, to save time?
I shall find them in the database shortly though. Eric.
____________

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 836
Credit: 1,420,242
RAC: 1,081
Message 30929 - Posted: 22 Jun 2017, 7:10:58 UTC - in response to Message 30928.

OK, I found your Linux computer, ID 10486310, and a "short task".
It is WU 71441414 and Task 147737902. The "sidekick" task
in progress is 147737901.
While waiting for the Task copy to finish and be validated
or not, I am trying to run it myself.

This is a big help. Thanks a lot and more news soonest. Eric.
____________

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,397,624
RAC: 3,695
Message 30930 - Posted: 22 Jun 2017, 7:22:01 UTC - in response to Message 30928.
Last modified: 22 Jun 2017, 7:23:40 UTC

<edit> Sorry. I was too slow. :-) </edit>

I'm not sure which ID you refer to.
Therefore both lists.

The very short WUs from the first request:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=147737882
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147737902

w-c6_n4_lhc2016_40_MD-120-16-476-2.5-1.1282__43__s__64.31_59.32__14_15__6__46.5_1_sixvf_boinc38206_1
w-c6_n4_lhc2016_40_MD-120-16-476-2.5-1.1282__43__s__64.31_59.32__14_15__6__61.5_1_sixvf_boinc38216_1

App:
SixTrack v451.07 (sse2) i686-pc-linux-gnu



Next ones, some of them ran only a few minutes:


https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744261
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744263
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744265
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744281
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744283
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744285
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744287
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744342
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744344
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744346
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744349
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744351


w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__70.5_1_sixvf_boinc25351_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__72_1_sixvf_boinc25352_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__73.5_1_sixvf_boinc25353_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__85.5_1_sixvf_boinc25361_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__87_1_sixvf_boinc25362_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__88.5_1_sixvf_boinc25363_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__1.5_1_sixvf_boinc25364_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__43.5_1_sixvf_boinc25392_0
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__45_1_sixvf_boinc25393_0
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__46.5_1_sixvf_boinc25394_0
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__48_1_sixvf_boinc25395_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__49.5_1_sixvf_boinc25396_1

App:
SixTrack v451.07 (sse2) x86_64-pc-linux-gnu

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 328
Credit: 2,768,172
RAC: 2,981
Message 30931 - Posted: 22 Jun 2017, 8:13:38 UTC
Last modified: 22 Jun 2017, 8:14:45 UTC

With only SixTrack selected in prefs, no VirtualBox installed, and over a million tasks in the server queue, a request on machine 10362384:

132091 LHC@home 22 Jun 10:08:00 Requesting new tasks for CPU
132092 LHC@home 22 Jun 10:08:01 Scheduler request completed: got 0 new tasks
132093 LHC@home 22 Jun 10:08:01 No tasks sent
132094 LHC@home 22 Jun 10:08:01 No tasks are available for SixTrack
132095 LHC@home 22 Jun 10:08:01 No tasks are available for sixtracktest
132096 LHC@home 22 Jun 10:08:01 Message from server: VirtualBox is not installed

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,397,624
RAC: 3,695
Message 30933 - Posted: 22 Jun 2017, 8:25:01 UTC

A similar result to the one already described when I attached my second host (also via an extra client instance).

The first 2 WUs had walltimes of 9 s and 15 s.

hostID: 10486393


https://lhcathome.cern.ch/lhcathome/result.php?resultid=147758603
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147758604

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,397,624
RAC: 3,695
Message 30934 - Posted: 22 Jun 2017, 8:31:21 UTC

And again the feeder ...

I know, I know, you're trying your hardest to get that solved.

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 328
Credit: 2,768,172
RAC: 2,981
Message 30935 - Posted: 22 Jun 2017, 8:42:20 UTC

@10:39:37 CEST:

132943 LHC@home 22 Jun 10:39:37 Scheduler request completed: got 30 new tasks

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 836
Credit: 1,420,242
RAC: 1,081
Message 30940 - Posted: 22 Jun 2017, 10:18:08 UTC - in response to Message 30933.

Many thanks; I have managed to run the WU

w-c6_n4_lhc2016_40_MD-120-16-476-2.5-1.1282__43__s__64.31_59.32__14_15__6__61.5_1_sixvf_boinc38216.zip

which was still in the download database table.
It appears to be a genuine result, with apparently
very unstable particles which are lost after fewer than
500 turns. Unlucky.

I have looked at another 60,000 or so cases in the same study,
and I found 20,000 with > 1000 sec of CPU, 30,000 with > 100 sec,
but 441 with less than 1 sec of CPU tracking.
This last number, 441, is a bit strange and requires further investigation, but
otherwise the results seem consistent with a typical beam
physics application. Eric.
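The tally Eric describes amounts to counting tasks by CPU-time band. A minimal sketch, using his thresholds but invented durations (the real data lives in the project database):

```python
# Count tasks in a study by CPU-time band. The thresholds (1000 s, 100 s,
# 1 s) are the ones quoted above; the sample durations are made up.
def tally(cpu_seconds):
    return {
        "> 1000 s": sum(1 for t in cpu_seconds if t > 1000),
        "> 100 s": sum(1 for t in cpu_seconds if t > 100),
        "< 1 s": sum(1 for t in cpu_seconds if t < 1),
    }


sample = [2500.0, 1800.0, 450.0, 120.0, 0.4, 0.7, 3600.0]
print(tally(sample))  # {'> 1000 s': 3, '> 100 s': 5, '< 1 s': 2}
```

Note that the bands overlap by design, mirroring Eric's counts: every task over 1000 s is also counted in the over-100 s figure.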
____________

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 328
Credit: 2,768,172
RAC: 2,981
Message 30945 - Posted: 22 Jun 2017, 14:02:23 UTC - in response to Message 30935.

132943 LHC@home 22 Jun 10:39:37 Scheduler request completed: got 30 new tasks

That was a one-time experience, with 1,747,102 SixTrack tasks unsent. Again:

141806 LHC@home 22 Jun 15:58:46 Sending scheduler request: To fetch work.
141807 LHC@home 22 Jun 15:58:46 Requesting new tasks for CPU
141808 LHC@home 22 Jun 15:58:47 update requested by user
141809 LHC@home 22 Jun 15:58:49 Scheduler request completed: got 0 new tasks
141810 LHC@home 22 Jun 15:58:49 No tasks sent
141811 LHC@home 22 Jun 15:58:49 No tasks are available for SixTrack
141812 LHC@home 22 Jun 15:58:49 No tasks are available for sixtracktest
141813 LHC@home 22 Jun 15:58:49 Message from server: VirtualBox is not installed

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 103
Credit: 788,939
RAC: 4,818
Message 30946 - Posted: 22 Jun 2017, 14:10:46 UTC - in response to Message 30945.

I am afraid that you will get this message occasionally, even now that we have a scheduler buffer of 2000 tasks.

The problem frequently occurs when there are batches of tasks with very short duration of a few seconds. Then the time for the feeder to query the DB and fill the buffer is simply too long.

During normal operations with average task length, the buffer should now be large enough to avoid running out of Sixtrack tasks.

We're looking into both how to optimize the BOINC server code and Sixtrack pre-processing, but both tasks are more long term than short term.
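Why a 2000-slot buffer is not enough for very short tasks comes down to simple drain-rate arithmetic. A back-of-the-envelope sketch, where every number is an illustrative assumption rather than a measured value:

```python
# How long a full scheduler buffer lasts if the feeder never tops it up.
# All inputs below are illustrative assumptions, not measured project values.
def seconds_until_empty(buffer_size, requests_per_sec, tasks_per_request):
    drain_per_sec = requests_per_sec * tasks_per_request
    return buffer_size / drain_per_sec


# e.g. a 2000-task buffer, 50 host requests per second, 4 tasks per request:
print(seconds_until_empty(2000, 50, 4))  # prints 10.0
```

If tasks finish in seconds, hosts come back for more almost immediately, so the request rate stays high and the buffer can empty faster than a DB refill cycle completes.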

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,397,624
RAC: 3,695
Message 30947 - Posted: 22 Jun 2017, 14:20:36 UTC - in response to Message 30946.

Sounds good.

Can you give some advice on which task durations should be reported on the message board to get the admin team's special attention?
There are long runners as well as short runners.
