Message boards : Sixtrack Application : SixTrack Tasks NOT being distributed
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0
This thread replaces "Will not WorkUnits", "no new WUs", "All tasks are pending" and "260.000 WUs to send, but no handed out". It is for SixTrack. This is, as a minimum, to let you know that we are taking this issue rather seriously and working on it to the best of our abilities.

I have managed to reproduce this problem on my own home Windows 10 computer. So far I have been unable to identify the precise cause from the available log information. However, while awaiting some expert help, I found the following:

A project reset does not help, as already tried and reported by several volunteers.

In my BOINC Manager I removed the project. NOTA BENE: this DESTROYS any unsent results or active tasks, and it should really only be done when the client is idle. I then re-installed the client/BOINC Manager from the web and/or desktop icon, choosing the "repair" rather than the delete option. To my surprise the project was still there, AND I got a bunch of new tasks immediately. :-)

I also tried simply removing the project and then adding it again, BUT I then got password problems. I even see a password problem when I re-install; I have opened a ticket for that.

Eric
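For anyone who prefers the command line, the same reset or detach/re-attach steps can probably be scripted with boinccmd instead of the BOINC Manager GUI. The sketch below is illustrative only: the account key is a placeholder, boinccmd is assumed to be on the PATH and talking to the local client, and detaching still destroys unsent results and active tasks, exactly as warned above.

```python
# Sketch of a command-line equivalent of the detach/re-attach steps above,
# driving boinccmd from Python's subprocess module. PROJECT_URL is the real
# project URL; ACCOUNT_KEY is a placeholder for your own account key.
# Detaching destroys any unsent results or tasks in progress.
import subprocess

PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"
ACCOUNT_KEY = "your_account_key_here"  # placeholder, not a real key

def boinccmd(*args):
    """Run one boinccmd invocation and print its output (assumes boinccmd is on PATH)."""
    result = subprocess.run(["boinccmd", *args], capture_output=True, text=True)
    print(result.stdout or result.stderr)

# A plain reset (keeps the attachment, re-downloads project files):
boinccmd("--project", PROJECT_URL, "reset")

# Or the more drastic detach followed by re-attach:
boinccmd("--project", PROJECT_URL, "detach")
boinccmd("--project_attach", PROJECT_URL, ACCOUNT_KEY)

# Ask the scheduler for work afterwards:
boinccmd("--project", PROJECT_URL, "update")
```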
Joined: 24 Oct 04 Posts: 1169 Credit: 54,227,367 RAC: 56,847
Hi Eric,

Well, I have 16 in progress right now, 11 still pending, and 3 longer ones are *Validation inconclusive*:
https://lhcathome.cern.ch/lhcathome/results.php?userid=5472&offset=0&show_names=0&state=3&appid=1

Volunteer Mad Scientist For Life
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0
I now have more than 50 tasks at home (4 active on my 4-thread system). I normally get just the 4. We shall see later if I get new work, but for now this is much better, given we have over 1.6 million queued. Eric.
Joined: 9 May 17 Posts: 1 Credit: 1,828,810 RAC: 89
This project is a joke. 1.8 million tasks available and I can't get any. Wake up. Get your shit together.
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0
Thank you for your feedback (even if I find your language somewhat forthright). I see you have 4 Windows 10 systems, of which 3 are indeed idle and not getting SixTrack tasks, while the 4th system has 24 (!) in progress, although it has just 4 processors. These 24 seem to have been sent/received at 21:55 UTC yesterday. While I consider this a very strange situation, I see a very marginal improvement. The details of this situation are valuable; your (edited) message will be passed to my colleagues. Eric.
Joined: 15 Jun 08 Posts: 2520 Credit: 252,200,705 RAC: 133,489
On a host that successfully runs vbox WUs from CERN via separate client instances, I attached a new instance to SixTrack yesterday evening. Unfortunately I ran into the same problem as many others, as the project server reports:

No tasks are available for SixTrack

Is there still a problem on the server, or is it a client misconfiguration?

Host: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10486310
Joined: 15 Jun 08 Posts: 2520 Credit: 252,200,705 RAC: 133,489
Finally I got 2 WUs, which finished after 8 s and 12 s. And with the next request a few minutes later:

Server error: feeder not running
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0
Thanks for the update. We just restarted the daemons with increased weight for the SixTrack application. The feeder needs some time to fill the shared memory buffer used by the scheduler with tasks. Hence, during the 5-10 minutes after a BOINC service restart, you will get the message "feeder not running" from the scheduler in spite of the fact that the feeder is indeed running. On Tuesday we increased the memory buffer size and the SixTrack weight to avoid exhausting the buffer of SixTrack tasks, but in spite of this, the very short tasks are immediately sucked from the queue. We have now tried to further increase the number of SixTrack tasks in the buffer, but either the tasks are going out in seconds, or we have another problem. The bottom line is that you might get a few more of these "feeder not running" messages while we're working on this.
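The feeder/buffer behaviour described above can be pictured as a simple producer/consumer loop. The toy simulation below is not BOINC server code and all of its rates are invented for illustration; it only shows why batches of seconds-long tasks can drain a fixed-size buffer faster than a periodic feeder pass can refill it.

```python
# Toy illustration (not actual BOINC server code) of the feeder/scheduler
# interaction: the feeder periodically tops up a fixed-size shared-memory
# buffer from the database, while scheduler requests drain it.
# All rates below are assumptions chosen only for illustration.

BUFFER_SIZE = 2000        # slots in the shared-memory buffer (value quoted later in the thread)
FEEDER_PERIOD = 5         # seconds between feeder passes (assumption)
FEEDER_BATCH = 500        # tasks the feeder can load per pass (assumption)

def simulate(request_rate_per_s, duration_s=120):
    """Return how many seconds the buffer spent empty at a given drain rate."""
    buffered, empty_seconds = BUFFER_SIZE, 0
    for t in range(duration_s):
        if t % FEEDER_PERIOD == 0:
            buffered = min(BUFFER_SIZE, buffered + FEEDER_BATCH)  # feeder refill
        buffered = max(0, buffered - request_rate_per_s)          # hosts draining tasks
        if buffered == 0:
            empty_seconds += 1   # clients asking now see "no tasks available"
    return empty_seconds

# Normal-length tasks: hosts come back rarely, the buffer never empties.
print(simulate(request_rate_per_s=20))    # -> 0
# Seconds-long tasks: every host returns almost immediately for more work.
print(simulate(request_rate_per_s=400))   # -> empty most of the time
```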
Joined: 15 Jun 08 Posts: 2520 Credit: 252,200,705 RAC: 133,489
OK. I got some WUs with the most recent request and they seem to run longer than a few seconds. Thank you.
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0
Thanks for the input. Can you give the IDs of the tasks, to save time? I shall find them in the database shortly though. Eric.
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0
OK, I found your Linux computer, ID 10486310, and a "short task". It is WU 71441414, Task 147737902. The "sidekick" task still in progress is 147737901. While waiting for that copy of the task to finish and be validated (or not), I am trying to run it myself. This is a big help. Thanks a lot and more news soonest. Eric.
Joined: 15 Jun 08 Posts: 2520 Credit: 252,200,705 RAC: 133,489
<edit> Sorry. I was too slow. :-) </edit>

I'm not sure which ID you refer to. Therefore both lists.

The very short WUs from the first request:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147737882
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147737902
w-c6_n4_lhc2016_40_MD-120-16-476-2.5-1.1282__43__s__64.31_59.32__14_15__6__46.5_1_sixvf_boinc38206_1
w-c6_n4_lhc2016_40_MD-120-16-476-2.5-1.1282__43__s__64.31_59.32__14_15__6__61.5_1_sixvf_boinc38216_1
App: SixTrack v451.07 (sse2) i686-pc-linux-gnu

Next ones, some of them ran only a few minutes:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744261
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744263
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744265
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744281
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744283
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744285
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744287
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744342
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744344
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744346
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744349
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147744351
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__70.5_1_sixvf_boinc25351_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__72_1_sixvf_boinc25352_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__73.5_1_sixvf_boinc25353_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__85.5_1_sixvf_boinc25361_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__87_1_sixvf_boinc25362_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__88.5_1_sixvf_boinc25363_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__1.5_1_sixvf_boinc25364_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__43.5_1_sixvf_boinc25392_0
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__45_1_sixvf_boinc25393_0
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__46.5_1_sixvf_boinc25394_0
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__48_1_sixvf_boinc25395_1
w-c2_n4_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__8_9__6__49.5_1_sixvf_boinc25396_1
App: SixTrack v451.07 (sse2) x86_64-pc-linux-gnu
Joined: 14 Jan 10 Posts: 1413 Credit: 9,434,983 RAC: 9,630
With only SixTrack selected in prefs, no VirtualBox installed, and over a million in the server queue, a work request on machine 10362384 gives:

132091 LHC@home 22 Jun 10:08:00 Requesting new tasks for CPU
132092 LHC@home 22 Jun 10:08:01 Scheduler request completed: got 0 new tasks
132093 LHC@home 22 Jun 10:08:01 No tasks sent
132094 LHC@home 22 Jun 10:08:01 No tasks are available for SixTrack
132095 LHC@home 22 Jun 10:08:01 No tasks are available for sixtracktest
132096 LHC@home 22 Jun 10:08:01 Message from server: VirtualBox is not installed
Joined: 15 Jun 08 Posts: 2520 Credit: 252,200,705 RAC: 133,489
Similar result to the one described above when I attached my second host (also via an extra client instance). The first 2 WUs had walltimes of 9 s and 15 s.

hostID: 10486393
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147758603
https://lhcathome.cern.ch/lhcathome/result.php?resultid=147758604
Joined: 15 Jun 08 Posts: 2520 Credit: 252,200,705 RAC: 133,489
And again the feeder ... I know, I know, you're trying your hardest to get that solved.
Joined: 14 Jan 10 Posts: 1413 Credit: 9,434,983 RAC: 9,630
@10:39:37 CEST:

132943 LHC@home 22 Jun 10:39:37 Scheduler request completed: got 30 new tasks
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0
Many thanks; I have managed to run the WU w-c6_n4_lhc2016_40_MD-120-16-476-2.5-1.1282__43__s__64.31_59.32__14_15__6__61.5_1_sixvf_boinc38216.zip, which was still in the download database table. It appears to be a genuine result, with apparently very unstable particles which are lost after fewer than 500 turns. Unlucky. I have looked at another 60,000 or so cases in the same study and found 20,000 with > 1000 sec CPU, 30,000 with > 100 sec, but 441 with less than 1 sec of CPU tracking. This last number, 441, is a bit strange and requires further investigation, but otherwise the results seem consistent with a typical beam physics application. Eric.
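For what it is worth, the kind of CPU-time binning described above could be sketched as follows. The input file and its format are hypothetical (one CPU-seconds value per line, however they were extracted from the study), so this only illustrates the counting, not the actual database query behind Eric's numbers.

```python
# Rough sketch of the binning described above: count how many results in a
# study fall into coarse CPU-time bands. Assumes the CPU seconds of each
# completed case have been dumped one value per line into a text file
# (the file name and the extraction step are hypothetical, not part of SixTrack).

def bin_cpu_times(path="study_cpu_seconds.txt"):
    bands = {"> 1000 s": 0, "100-1000 s": 0, "1-100 s": 0, "< 1 s": 0}
    with open(path) as f:
        for line in f:
            cpu = float(line.strip())
            if cpu > 1000:
                bands["> 1000 s"] += 1
            elif cpu > 100:
                bands["100-1000 s"] += 1
            elif cpu >= 1:
                bands["1-100 s"] += 1
            else:
                bands["< 1 s"] += 1   # the suspicious sub-second cases
    return bands

if __name__ == "__main__":
    for band, count in bin_cpu_times().items():
        print(f"{band:>10}: {count}")
```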
Joined: 14 Jan 10 Posts: 1413 Credit: 9,434,983 RAC: 9,630
132943 LHC@home 22 Jun 10:39:37 Scheduler request completed: got 30 new tasks

That was a one-time experience. Unsent SixTrack WUs: 1,747,102. Again:

141806 LHC@home 22 Jun 15:58:46 Sending scheduler request: To fetch work.
141807 LHC@home 22 Jun 15:58:46 Requesting new tasks for CPU
141808 LHC@home 22 Jun 15:58:47 update requested by user
141809 LHC@home 22 Jun 15:58:49 Scheduler request completed: got 0 new tasks
141810 LHC@home 22 Jun 15:58:49 No tasks sent
141811 LHC@home 22 Jun 15:58:49 No tasks are available for SixTrack
141812 LHC@home 22 Jun 15:58:49 No tasks are available for sixtracktest
141813 LHC@home 22 Jun 15:58:49 Message from server: VirtualBox is not installed
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0
I am afraid that you will get this message occasionally, even now that we have a scheduler buffer of 2000 tasks. The problem frequently occurs when there are batches of tasks with very short durations of a few seconds; then the time the feeder needs to query the DB and refill the buffer is simply too long. During normal operations with average task lengths, the buffer should now be large enough to avoid running out of SixTrack tasks. We're looking into both optimizing the BOINC server code and the SixTrack pre-processing, but both are more long-term than short-term tasks.
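As a rough back-of-envelope check of the scale involved (all per-host figures below are assumptions for illustration, not measured project numbers), one can estimate how quickly a 2000-slot buffer disappears once a batch of seconds-long tasks is in circulation:

```python
# Back-of-envelope estimate of how quickly a 2000-slot buffer empties when a
# batch of seconds-long tasks goes out. All per-host figures are assumptions
# chosen only to illustrate the scale of the problem described above.

buffer_slots = 2000          # scheduler buffer size mentioned above
active_hosts = 5000          # assumption: hosts asking for SixTrack work
task_runtime_s = 10          # assumption: runtime of a "short runner"
tasks_per_request = 2        # assumption: tasks handed out per scheduler request

# Each host finishes its short tasks and is back for more roughly every
# task_runtime_s seconds, so the aggregate drain rate is approximately:
drain_rate = active_hosts * tasks_per_request / task_runtime_s   # tasks per second
seconds_until_empty = buffer_slots / drain_rate

print(f"drain rate ~ {drain_rate:.0f} tasks/s")
print(f"buffer empty after ~ {seconds_until_empty:.0f} s")
# With these numbers the buffer is gone in a couple of seconds, far less than
# the time a feeder pass needs to query the database and refill it.
```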
Joined: 15 Jun 08 Posts: 2520 Credit: 252,200,705 RAC: 133,489
Sounds good. Can you give some advice on what task durations should be reported on the message board to get the special attention of the admin team? There are long runners as well as short runners.