Message boards :
Number crunching :
Validation Pending since 02.JUN.2019
Joined: 14 Jul 05 · Posts: 3 · Credit: 4,932,597 · RAC: 0
Hello. I have about 380 tasks "Validation Pending" since 02-JUN-2019. The validator server says it has no jobs to process. Is this normal? Greetings
Joined: 15 Jun 08 · Posts: 2425 · Credit: 227,475,238 · RAC: 129,792
Since your computers are hidden, nobody can look into the logs to give you specific help. You may make them visible to other volunteers on this page: https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project
Joined: 29 Feb 16 · Posts: 157 · Credit: 2,659,975 · RAC: 0
Hello jokerdm, without access to the list of tasks crunched by your hosts, I cannot tell much. Please keep in mind that, with such a long backlog, if two tasks from the same WU cannot be validated, it will take quite some time before the third one is sent out and crunched. This might be the origin of your not-yet-validated tasks. Hope it helps, Cheers, A.
Joined: 14 Jan 10 · Posts: 1286 · Credit: 8,515,990 · RAC: 2,442
> Hope it helps,

What would surely help is that when a 'resend' (3rd, 4th wingman) is needed, the specially created task is placed at the front of the queue and not at the end. This is normal BOINC practice, but not at LHC.

Example workunit: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=116328856

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
---|---|---|---|---|---|---|---|---
232058584 | 10409137 | 12 Jun 2019, 9:39:27 UTC | 14 Jun 2019, 2:06:26 UTC | Completed, waiting for validation | 7,291.11 | 7,283.58 | pending | SixTrack v502.05 (sse2) x86_64-pc-linux-gnu
232058585 | 10589358 | 12 Jun 2019, 9:31:41 UTC | 23 Jun 2019, 1:41:11 UTC | Not started by deadline - canceled | 0.00 | 0.00 | --- | SixTrack v502.05 (sse2) windows_x86_64
233682174 | 10452031 | 26 Jun 2019, 11:09:54 UTC | 3 Jul 2019, 6:49:49 UTC | Not started by deadline - canceled | 0.00 | 0.00 | --- | SixTrack v502.05 (avx) windows_intelx86
236591721 | --- | --- | --- | Unsent | --- | --- | --- | ---

That last Unsent task was created directly after the second 'No reply' on 3 Jul 2019, 6:49:54 UTC, but it is not sent out as soon as a client requests new work.
Joined: 14 Jul 05 · Posts: 3 · Credit: 4,932,597 · RAC: 0
Hi, thanks for the feedback. I changed it; my computers are now visible.
Joined: 14 Jul 05 · Posts: 3 · Credit: 4,932,597 · RAC: 0
Hello, mystery solved. Here is what happens to my "Validation pending" WUs. The oldest one shows:

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
---|---|---|---|---|---|---|---|---
230325739 | 10529993 | 1 Jun 2019, 7:45:14 UTC | 2 Jun 2019, 8:24:22 UTC | Completed, waiting for validation | 53,767.04 | 53,673.03 | pending | SixTrack v502.05 (sse2) x86_64-pc-linux-gnu
230325740 | 10589829 | 1 Jun 2019, 7:55:56 UTC | 7 Jun 2019, 23:29:05 UTC | Not started by deadline - canceled | 0.00 | 0.00 | --- | SixTrack v502.05 (avx) windows_intelx86
231788770 | 10586430 | 10 Jun 2019, 16:47:58 UTC | 18 Jun 2019, 8:20:12 UTC | Timed out - no response | 0.00 | 0.00 | --- | SixTrack v502.05 (sse2) windows_intelx86
233525413 | 10555237 | 25 Jun 2019, 8:00:52 UTC | 2 Jul 2019, 23:33:06 UTC | Timed out - no response | 0.00 | 0.00 | --- | SixTrack v502.05 (avx) windows_intelx86
236581119 | --- | --- | --- | Unsent | --- | --- | --- | ---

So I have just had bad luck with the other users who process the same WUs... Thank you very much! Greetings!
Joined: 15 Jun 08 · Posts: 2425 · Credit: 227,475,238 · RAC: 129,792
Hi, OK. Now your computers are visible. This gives the following picture:

Regarding SixTrack: There's nothing to complain about. You attached lots of cores and got lots of tasks. Your error rate is close to 0. As each task needs a 2nd valid result to confirm yours, just lean back and wait until another computer reports that 2nd result.

Regarding ATLAS/Theory native: Both require CVMFS (ATLAS also Singularity) to be installed locally on your computers. Otherwise you will get nothing but errors. See these threads for help:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971
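As a quick sanity check for the native apps mentioned above, a minimal sketch like this reports which of the two tools is missing without aborting (it only checks that the commands are on the PATH; the linked threads cover the actual installation and configuration):

```shell
# Report whether the tools needed by native ATLAS/Theory tasks are on the PATH.
# Prints a hint instead of failing when a tool is missing.
for tool in cvmfs_config singularity; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: NOT installed - see the setup threads linked above"
    fi
done
```

Once both are installed, `cvmfs_config probe` is the usual follow-up to confirm the CVMFS repositories are actually reachable.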
Joined: 27 Jan 16 · Posts: 2 · Credit: 1,007,223 · RAC: 0
Hello. I have a similar issue: about 40 tasks "Validation Pending" since 22-JUL-2019, and the validator server says the same: no jobs to validate. I'm going to stop crunching for LHC@Home until these 40 tasks are validated and I can check whether they are OK or errored. Regards
Joined: 15 Jun 08 · Posts: 2425 · Credit: 227,475,238 · RAC: 129,792
Your results are valid. What they need is a 2nd valid result from another computer. Once that computer successfully reports the same task, your result will change its status to "valid".
Joined: 29 Feb 16 · Posts: 157 · Credit: 2,659,975 · RAC: 0
Hi, thanks to computezrmle for the correct replies. Concerning the comment by Crystal Pellet:
This would greatly simplify life for SixTrack users - let's see what the IT experts say. Cheers, A.
Joined: 29 Sep 04 · Posts: 281 · Credit: 11,859,285 · RAC: 0
The "prioritising of resends" question has been with us forever. It can really only be addressed with a change to the BOINC server code, but since BOINC itself is administered by volunteers, there seems little enthusiasm to fiddle with it. The only option currently available is to resend to "reliable" hosts, under the Accelerating Resends section.

BOINC in its current configuration will always put resends at the back of the "current" queue, so another option would be to release work in smaller batches and let the queue "almost" run dry before releasing the next batch. Resends would then go to the end of that first batch and should have higher priority than subsequent batches. Obviously that would mean more manual intervention by the staff, who are probably busy doing other stuff.

Thinking out loud: rather than releasing 500,000 WUs all together, could a script be set up to release, say, 100,000, monitor the queue until it gets down to, say, 100, then release a further 100,000, and so on?
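The thinking-out-loud idea above can be sketched in a few lines. Everything here is hypothetical: `unsent_count()` and `release_batch()` stand in for whatever the project staff would actually use (queries against the server database, calls to the work generator) and are not real BOINC server APIs; the toy simulation at the bottom just demonstrates the refill logic.

```python
BATCH_SIZE = 100_000      # WUs released per batch (the "say, 100,000" above)
LOW_WATERMARK = 100       # refill once the unsent queue drops this low

def drip_feed(total_wus, unsent_count, release_batch):
    """Release total_wus in batches, waiting for the unsent queue to nearly
    drain before each refill, so resends created inside a batch stay near
    the front of the overall queue."""
    remaining = total_wus
    while remaining > 0:
        if unsent_count() <= LOW_WATERMARK:
            batch = min(BATCH_SIZE, remaining)
            release_batch(batch)
            remaining -= batch
        # a real script would time.sleep(...) between polls

# Toy simulation standing in for the real server:
# clients drain 40,000 WUs between polls.
queue = {"unsent": 0}
released = []

def unsent_count():
    queue["unsent"] = max(0, queue["unsent"] - 40_000)
    return queue["unsent"]

def release_batch(n):
    released.append(n)
    queue["unsent"] += n

drip_feed(500_000, unsent_count, release_batch)
# The 500,000 WUs go out as five batches of 100,000.
```

The watermark and batch size are the knobs to tune against how fast clients actually fetch work; the point is only that each batch, including any resends it spawns, drains ahead of the next one.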
Joined: 13 Jul 05 · Posts: 167 · Credit: 14,945,019 · RAC: 209
> The "prioritising of resends" question has been with us forever.

It's been less of an issue here, though, as until recently SixTrack work came in discrete batches lasting barely a fortnight, giving the system a chance to catch up. I'm running less SixTrack ATM, so I'm not sure what the overall proportion of inconclusives is, but I still have a bunch of WUs over a week old waiting for resends after others' tasks got lost. This matters more now, as chaining million-turn jobs together to make a 10^7 (or more) turn calculation will get badly bogged down if you start getting six-week pauses between steps!
©2024 CERN