Message boards :
Number crunching :
Validation Pending since 02.JUN.2019
Joined: 14 Jul 05 · Posts: 3 · Credit: 4,932,597 · RAC: 0
Hello. I have about 380 tasks "Validation Pending" since 02-JUN-2019. The validator server says it has no jobs to process. Is this normal? Greetings
Joined: 15 Jun 08 · Posts: 2425 · Credit: 227,475,238 · RAC: 129,792
Since your computers are hidden, nobody can look into the logs to give you specific help. You may make them visible to other volunteers on this page: https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project
Joined: 29 Feb 16 · Posts: 157 · Credit: 2,659,975 · RAC: 0
Hello jokerdm, without access to the list of tasks crunched by your hosts, I cannot tell much. Please keep in mind that, with such a long backlog, if two tasks from the same WU cannot be validated, it will take quite some time before the third one is sent out and crunched. This might be the origin of your not-yet-validated tasks. Hope it helps, Cheers, A.
Joined: 14 Jan 10 · Posts: 1286 · Credit: 8,515,990 · RAC: 2,442
> Hope it helps,

What would surely help is that when a 'resend' (3rd, 4th wingman) is needed, the specially created task is placed at the front of the queue and not at the end. This is normal BOINC practice, but not at LHC.

Example workunit: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=116328856

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
---|---|---|---|---|---|---|---|---
232058584 | 10409137 | 12 Jun 2019, 9:39:27 UTC | 14 Jun 2019, 2:06:26 UTC | Completed, waiting for validation | 7,291.11 | 7,283.58 | pending | SixTrack v502.05 (sse2) x86_64-pc-linux-gnu
232058585 | 10589358 | 12 Jun 2019, 9:31:41 UTC | 23 Jun 2019, 1:41:11 UTC | Not started by deadline - canceled | 0.00 | 0.00 | --- | SixTrack v502.05 (sse2) windows_x86_64
233682174 | 10452031 | 26 Jun 2019, 11:09:54 UTC | 3 Jul 2019, 6:49:49 UTC | Not started by deadline - canceled | 0.00 | 0.00 | --- | SixTrack v502.05 (avx) windows_intelx86
236591721 | --- | --- | --- | Unsent | --- | --- | --- | ---

That last Unsent task was created directly after the second 'No reply' on 3 Jul 2019, 6:49:54 UTC, but it is not sent out as soon as a client requests new work.
Joined: 14 Jul 05 · Posts: 3 · Credit: 4,932,597 · RAC: 0
Hi, thanks for the feedback. I changed it; my computers are now visible.
Joined: 14 Jul 05 · Posts: 3 · Credit: 4,932,597 · RAC: 0
Hello, mystery solved. Here is what happens to my "Validation pending" WUs. The oldest one shows:

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
---|---|---|---|---|---|---|---|---
230325739 | 10529993 | 1 Jun 2019, 7:45:14 UTC | 2 Jun 2019, 8:24:22 UTC | Completed, waiting for validation | 53,767.04 | 53,673.03 | pending | SixTrack v502.05 (sse2) x86_64-pc-linux-gnu
230325740 | 10589829 | 1 Jun 2019, 7:55:56 UTC | 7 Jun 2019, 23:29:05 UTC | Not started by deadline - canceled | 0.00 | 0.00 | --- | SixTrack v502.05 (avx) windows_intelx86
231788770 | 10586430 | 10 Jun 2019, 16:47:58 UTC | 18 Jun 2019, 8:20:12 UTC | Timed out - no response | 0.00 | 0.00 | --- | SixTrack v502.05 (sse2) windows_intelx86
233525413 | 10555237 | 25 Jun 2019, 8:00:52 UTC | 2 Jul 2019, 23:33:06 UTC | Timed out - no response | 0.00 | 0.00 | --- | SixTrack v502.05 (avx) windows_intelx86
236581119 | --- | --- | --- | Unsent | --- | --- | --- | ---

So I have just had bad luck with the other users who process the same WUs... Thank you very much! Greetings!
Joined: 15 Jun 08 · Posts: 2425 · Credit: 227,475,238 · RAC: 129,792
Hi, OK. Now your computers are visible. This gives the following picture:

Regarding SixTrack: There's nothing to complain about. You attached lots of cores and got lots of tasks. Your error rate is close to 0. As each task needs a 2nd valid result to confirm yours, just lean back and wait until another computer reports that 2nd result.

Regarding ATLAS/Theory native: Both require CVMFS (ATLAS also Singularity) to be installed locally on your computers. Otherwise you will get nothing but errors. See these threads for help:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971
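As a quick sanity check for the native apps mentioned above, a minimal sketch like this reports which of the two tools is missing without aborting (it only checks that the commands are on the PATH; the linked threads cover the actual installation and configuration):

```shell
# Report whether the tools needed by native ATLAS/Theory tasks are on the PATH.
# Prints a hint instead of failing when a tool is missing.
for tool in cvmfs_config singularity; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: NOT installed - see the setup threads linked above"
    fi
done
```

Once both are installed, `cvmfs_config probe` is the usual follow-up to confirm the CVMFS repositories are actually reachable.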
Joined: 27 Jan 16 · Posts: 2 · Credit: 1,007,223 · RAC: 0
Hello. I have a similar issue: about 40 tasks "Validation Pending" since 22-JUL-2019, and the validator server says the same: no jobs to validate. I'm going to stop crunching for LHC@Home until these 40 tasks are validated and I can check whether they are OK or errored. Regards
Joined: 15 Jun 08 · Posts: 2425 · Credit: 227,475,238 · RAC: 129,792
Your results are valid. What they need is a 2nd valid result from another computer. Once that computer successfully reports the same task, your result will change its status to "valid".
Joined: 29 Feb 16 · Posts: 157 · Credit: 2,659,975 · RAC: 0
Hi, thanks to computezrmle for the correct replies. Concerning the comment by Crystal Pellet:
This would greatly simplify life for SixTrack users - let's see what the IT experts say. Cheers, A.
Joined: 29 Sep 04 · Posts: 281 · Credit: 11,859,285 · RAC: 0
The "prioritising of resends" question has been with us forever. It can really only be addressed with a change to the BOINC server code, but since BOINC itself is administered by volunteers, there seems little enthusiasm to fiddle with it. The only option currently available is to resend to "reliable" hosts, under the Accelerating Resends section.

BOINC in its current configuration will always put resends at the back of the "current" queue, so another option would be to release work in smaller batches and let the queue "almost" run dry before releasing the next batch. Resends would then go to the end of that first batch and should have higher priority than subsequent batches. Obviously that would mean more manual intervention by the staff, who are probably busy doing other stuff.

Thinking out loud: rather than releasing 500,000 WUs all together, could a script be set up to release, say, 100,000, monitor the queue until it gets down to, say, 100, then release a further 100,000, and so on?
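The thinking-out-loud idea above can be sketched in a few lines. Everything here is hypothetical: `unsent_count()` and `release_batch()` stand in for whatever the project staff would actually use (queries against the server database, calls to the work generator) and are not real BOINC server APIs; the toy simulation at the bottom just demonstrates the refill logic.

```python
BATCH_SIZE = 100_000      # WUs released per batch (the "say, 100,000" above)
LOW_WATERMARK = 100       # refill once the unsent queue drops this low

def drip_feed(total_wus, unsent_count, release_batch):
    """Release total_wus in batches, waiting for the unsent queue to nearly
    drain before each refill, so resends created inside a batch stay near
    the front of the overall queue."""
    remaining = total_wus
    while remaining > 0:
        if unsent_count() <= LOW_WATERMARK:
            batch = min(BATCH_SIZE, remaining)
            release_batch(batch)
            remaining -= batch
        # a real script would time.sleep(...) between polls

# Toy simulation standing in for the real server:
# clients drain 40,000 WUs between polls.
queue = {"unsent": 0}
released = []

def unsent_count():
    queue["unsent"] = max(0, queue["unsent"] - 40_000)
    return queue["unsent"]

def release_batch(n):
    released.append(n)
    queue["unsent"] += n

drip_feed(500_000, unsent_count, release_batch)
# The 500,000 WUs go out as five batches of 100,000.
```

The watermark and batch size are the knobs to tune against how fast clients actually fetch work; the point is only that each batch, including any resends it spawns, drains ahead of the next one.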
Joined: 13 Jul 05 · Posts: 167 · Credit: 14,945,019 · RAC: 209
> The "prioritising of resends" question has been with us forever.

It's been less of an issue here, though, as until recently SixTrack work came in discrete batches lasting barely a fortnight, giving the system a chance to catch up. I'm running less SixTrack ATM, so I'm not sure what the overall proportion of inconclusives is, but I still have a bunch of WUs over a week old waiting for resends after others' tasks got lost. This matters more now, as chaining million-turn jobs together to make a 10^7 (or more) turn calculation will get badly bogged down if you start getting six-week pauses between steps!
©2024 CERN