41)
Message boards :
Number crunching :
Faulty Computers or Modified BOINC ?? Huge Credits
(Message 23837)
Posted 12 Jan 2012 by Igor Zacharov Post: I have been correcting the database by hand, subtracting values from total_credit in the host and user tables. I believe all values should be fair now. If anybody thinks more should be done, please let me know. Thanks.
42)
Message boards :
Number crunching :
Faulty Computers or Modified BOINC ?? Huge Credits
(Message 23836)
Posted 12 Jan 2012 by Igor Zacharov Post:
Exactly so.
43)
Message boards :
Number crunching :
Faulty Computers or Modified BOINC ?? Huge Credits
(Message 23830)
Posted 10 Jan 2012 by Igor Zacharov Post: The new credit system is in place. I will monitor how it goes. I have corrected the host and user tables in the sixtrack database. This is not a simple action, as it may affect the consistency of the database. I have recalculated the value of total_credit in these two tables. If you know of other places where values should be corrected, let me know. The correction has been done for both cheating episodes, last November and in January. However, for November I have reset only the user who generated the huge claimed credit values. The reason for not updating the other host/user entries affected by the huge credit claims is that it is not obvious how to recalculate the contributions. This is also why we did not do it immediately at the time. This time, however, I felt that it must be done even at the risk of some inconsistencies. I hope this will resolve the issue. Thank you.
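For readers wondering what "recalculating total_credit" could look like, here is a minimal sketch. It is not the procedure actually used on the server: it assumes simplified host, user and result tables with only the columns shown, runs against an in-memory SQLite database rather than the project's real database, and assumes every result row is still present. If old results have been purged, summing them cannot reproduce the totals, which is one reason the recalculation is not obvious.

```python
import sqlite3


def recompute_totals(conn: sqlite3.Connection) -> None:
    """Rebuild host and user totals from per-result granted credit."""
    with conn:  # single transaction: both tables change together or not at all
        # Host total = sum of the credit granted to that host's results.
        conn.execute(
            "UPDATE host SET total_credit = ("
            " SELECT COALESCE(SUM(result.granted_credit), 0)"
            " FROM result WHERE result.hostid = host.id)"
        )
        # User total = sum of the (already corrected) host totals.
        conn.execute(
            'UPDATE "user" SET total_credit = ('
            " SELECT COALESCE(SUM(host.total_credit), 0)"
            ' FROM host WHERE host.userid = "user".id)'
        )


def demo() -> dict:
    """Hypothetical data: one user, two hosts, inflated totals to be fixed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE "user" (id INTEGER PRIMARY KEY, total_credit REAL);
        CREATE TABLE host (id INTEGER PRIMARY KEY, userid INTEGER, total_credit REAL);
        CREATE TABLE result (id INTEGER PRIMARY KEY, hostid INTEGER, granted_credit REAL);
        INSERT INTO "user" VALUES (1, 1000000.0);
        INSERT INTO host VALUES (10, 1, 999000.0), (11, 1, 1000.0);
        INSERT INTO result VALUES (100, 10, 12.5), (101, 10, 7.5), (102, 11, 30.0);
        """
    )
    recompute_totals(conn)
    hosts = dict(conn.execute("SELECT id, total_credit FROM host"))
    users = dict(conn.execute('SELECT id, total_credit FROM "user"'))
    conn.close()
    return {"hosts": hosts, "users": users}
```

Running both updates in one transaction addresses the consistency worry mentioned above: the host and user tables are never left disagreeing if something fails halfway.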
44)
Message boards :
Number crunching :
Faulty Computers or Modified BOINC ?? Huge Credits
(Message 23825)
Posted 10 Jan 2012 by Igor Zacharov Post: We do have a new credit system ready which depends only on the computational characteristics of the sixtrack program. This will defeat any cheating with the clock or any other way of distorting the statistics, whether intentional or unintentional. It was ready for introduction after the cheating episode last year. What prevented its introduction was the wait for the update to the latest BOINC server version, which in turn was waiting for the verification of the executable, etc. I will decouple the different dependencies and make the introduction of the new credit system the highest priority. Thank you for making your opinion known. I appreciate the shake-up.
45)
Message boards :
Number crunching :
Tasks v530.09 crashing
(Message 23431)
Posted 9 Oct 2011 by Igor Zacharov Post: After consultation with Eric McIntosh, we decided to retract the 530.9 version completely. I have reinstalled the 530.8 version, now called 530.10. We will come back to it after a better investigation. I have to admit a mistake.
46)
Message boards :
Number crunching :
Long delays in jobs
(Message 23417)
Posted 9 Oct 2011 by Igor Zacharov Post: With Keith's help, I have now implemented the reliable/trusted host settings correctly. I believe the execution turn-around will improve. I will monitor and see. Thank you very much!
47)
Message boards :
Number crunching :
Tasks v530.09 crashing
(Message 23415)
Posted 9 Oct 2011 by Igor Zacharov Post: We don't have many architectural choices when specifying which app version to run. I have now retracted (deleted) 530.9 for all generic x86 Windows and Linux platforms, leaving 530.9 only for platforms which report AMD_x86_64 and Intel EM64T processors back to the server. Please check whether that works for you.
48)
Message boards :
Number crunching :
Tasks v530.09 crashing
(Message 23391)
Posted 7 Oct 2011 by Igor Zacharov Post: Yes, 530.9 and 530.8 deliver identical results (within the model, where we look for last-bit differences). 530.9 can be a factor of 2 faster, but not always. It does not optimize away calculations; it (the compiler, in fact) just organizes them better by using pipelining and special instructions. Yes, it is no problem for us to keep multiple versions of the same program. I just need to find out what to call the architecture to which the older CPUs belong. This will allow for automatic selection of the executable. Shortening the deadlines is only a discussion item at this time. I still need to assess what will have the largest impact on the efficiency of the calculations.
49)
Message boards :
Number crunching :
Tasks v530.09 crashing
(Message 23387)
Posted 6 Oct 2011 by Igor Zacharov Post: Apparently, we should have kept version 530.8 for older processors. That is still possible; I have not removed it. What would be the architecture designation that distinguishes the old from the new?
50)
Message boards :
Number crunching :
Can't set resource to zero
(Message 23375)
Posted 6 Oct 2011 by Igor Zacharov Post:
So it works then, and RS can be set to 0 with this trick. You don't have to explain why you need it. We appreciate your contribution. This brings me to another thought, however. Physicists complain about the long outliers of a study. The bulk of the first jobs comes back quickly, then the last jobs take very long, since they are sitting and waiting somewhere. We may need to tune the system to squeeze the deadlines tighter. I would like to collect opinions about this first.
51)
Message boards :
Number crunching :
no more work?
(Message 23374)
Posted 6 Oct 2011 by Igor Zacharov Post: Two new studies will go in tomorrow. I have been holding them back to see Sorry, I was accidentally posting from a different account where I was testing. The message itself is correct.
52)
Message boards :
Number crunching :
Can't set resource to zero
(Message 23358)
Posted 5 Oct 2011 by Igor Zacharov Post: Does setting the resource share to 0.001 work for you? Just to say, we are not on the very latest BOINC server version, for administrative reasons. It is nothing serious and will be sorted out. (If you wonder: the standard CERN update program for software dependencies doesn't include the latest versions needed for the BOINC server installation.) I like to use some projects as back-up projects, by using the "0" resource share method. But when I set it to zero for this project, it keeps getting changed back to 100. Please fix.
53)
Message boards :
Number crunching :
Linux vs. Windows app
(Message 23345)
Posted 4 Oct 2011 by Igor Zacharov Post: Why the Linux version of sixtrack in particular might be slower than the Windows version needs to be investigated. Before we do that, I have loaded an inherently faster version of sixtrack (version 530.9), which uses SSE3, into the system. This has to be monitored for a while, since we have also changed the validator to accommodate results from different sixtrack versions. The system should be stable. With version 530.9 we are also interested in a comparison between Linux and Windows. Before, numerical stability was the major concern. Sixtrack is very sensitive to erroneous results. Particles circulating in the accelerator structure are very close to chaotic behaviour. Single-bit differences somewhere lead to exponential deviations over 1 million turns. We have found machines out there which produce wrong results 50-100 times more frequently than the average. We would like to offer an explanation for this artifact and will suggest a way to investigate this further.
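To illustrate why a single-bit difference matters, here is a small sketch. It does not use SixTrack or any accelerator model; the logistic map with r = 4 is swapped in as a textbook chaotic system, and the starting value 0.3 and the turn count are arbitrary assumptions. Two orbits that start one last bit apart are iterated side by side and their separation is recorded.

```python
import math


def separations(x0: float, turns: int, r: float = 4.0) -> list:
    """Track |a - b| for two logistic-map orbits starting one ulp apart.

    The logistic map is only a stand-in for a chaotic system; it is not
    the particle-tracking map used by SixTrack.
    """
    a = x0
    b = math.nextafter(x0, 1.0)  # differs from x0 in the last bit only
    seps = [abs(a - b)]
    for _ in range(turns):
        a = r * a * (1.0 - a)
        b = r * b * (1.0 - b)
        seps.append(abs(a - b))
    return seps


if __name__ == "__main__":
    seps = separations(0.3, 200)
    for turn in (0, 10, 20, 30, 40, 50, 60):
        print(f"turn {turn:3d}: separation {seps[turn]:.3e}")
```

The separation roughly doubles per turn until it saturates at the size of the interval itself, which is why a bitwise comparison of results is a meaningful check in a system like this. `math.nextafter` needs Python 3.9 or later.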
54)
Message boards :
Number crunching :
Too low credits granted in LHC
(Message 23249)
Posted 25 Sep 2011 by Igor Zacharov Post: We are on BOINC version 6.11.0, with the validator for sixtrack rewritten from the sample_bitwise_validator. The credit is calculated by returning the result of the function stddev_credit from compute_granted_credit in the validator. I believe we don't calculate any credit in sixtrack itself (I will verify with Eric McIntosh). Therefore, the average is based on the credit claimed by the client software.
55)
Message boards :
Number crunching :
Too low credits granted in LHC
(Message 23236)
Posted 24 Sep 2011 by Igor Zacharov Post: It is kind of important. Who regulates what a second of compute should be worth? I have changed the assignment algorithm to take the average of all the valid contributions. For the rest, the default BOINC assignments are kept. The actual valuation could also depend on the BOINC version. I don't mind multiplying the calculated figure by a factor. If admins on the other projects do the same, we will end up in an escalating situation which makes no sense. Proper advice is needed.
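A minimal sketch of what "take the average of all the valid contributions" means, assuming each valid result of a work unit carries a credit value claimed by the client. The function name and the sample numbers are hypothetical, and this is not the validator code running on the server.

```python
from statistics import mean


def granted_credit(claimed: list[float]) -> float:
    """Grant every valid result the plain average of the claimed credits."""
    if not claimed:
        raise ValueError("no valid results to average")
    return mean(claimed)


if __name__ == "__main__":
    # Three honest hosts claiming similar amounts for the same work unit.
    print(granted_credit([10.0, 12.0, 14.0]))
    # One wildly inflated claim pulls the average up for everybody.
    print(granted_credit([10.0, 12.0, 1_000_000.0]))
```

The second call shows the weak point of a plain average: a single huge claim dominates the figure granted to all hosts in the quorum.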
56)
Message boards :
LHC@home Science :
screensaver/graphics bug
(Message 23234)
Posted 24 Sep 2011 by Igor Zacharov Post: SixTrack is a computational program... Part of our problems in the past was related to bugs in the graphics part, which hampered the progress of the computation. Therefore, we have decided to cut the graphics off completely. I do agree that visual attraction is important. We plan to resurrect the screensaver, but it should not be coupled to the program any more. The screensaver could be fed the cumulative effect of the study to make it informative. Having said that, the priority of the graphics project is low, I must admit. If there are volunteers to help with the screensaver, it would be great.
57)
Message boards :
Number crunching :
Daily quota
(Message 23205)
Posted 23 Sep 2011 by Igor Zacharov Post: Right now the limits are as follows:
<daily_result_quota> 80 </daily_result_quota>
<one_result_per_user_per_wu> 1 </one_result_per_user_per_wu>
<max_wus_to_send> 2 </max_wus_to_send>
<max_wus_in_progress> 1 </max_wus_in_progress>
Please monitor and see how this works.
58)
Message boards :
Number crunching :
Credit awarded calculation
(Message 23177)
Posted 21 Sep 2011 by Igor Zacharov Post: Based on this discussion, we have changed the algorithm for awarding credits. It is now based on the average. Please monitor and tell us if the change is visible and if it is indeed the right method.
59)
Message boards :
Number crunching :
error -177 resource limit exceeded
(Message 23132)
Posted 19 Sep 2011 by Igor Zacharov Post: I have put the value of 90MB into the database for all work units and restarted the BOINC server. Also, I have changed the distribution rate:
<max_wus_to_send> 5 </max_wus_to_send>
<max_wus_in_progress> 3 </max_wus_in_progress>
(was 10 before), so that not too many are sitting on a single machine. The daily quota is still at 40, so nobody should be short of work. I suggest that if you abort the jobs which are waiting or have consumed little time, you will get new jobs with the corrected disk size. Please report any other problems you see. Thank you.
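As a rough illustration of why raising the disk value stops the -177 errors: the client aborts a task once its disk usage goes above the bound attached to the work unit. The sketch below assumes "90MB" means 90 * 1024 * 1024 bytes and uses a hypothetical helper name; it models only the comparison, not the real client code.

```python
ERR_RSC_LIMIT_EXCEEDED = -177  # the error code quoted in the thread title

# Assumption: "90MB" is read as 90 * 1024 * 1024 bytes.
DISK_BOUND_BYTES = 90 * 1024 * 1024


def check_disk_usage(used_bytes: int, bound_bytes: int = DISK_BOUND_BYTES) -> int:
    """Return 0 while the task stays within its disk bound, else -177."""
    if used_bytes > bound_bytes:
        return ERR_RSC_LIMIT_EXCEEDED
    return 0


if __name__ == "__main__":
    print(check_disk_usage(60 * 1024 * 1024))   # within the bound
    print(check_disk_usage(120 * 1024 * 1024))  # over the bound
```

Tasks already downloaded keep the bound they were issued with, which is why aborting waiting jobs and fetching new ones picks up the corrected value.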
60)
Message boards :
Number crunching :
error -177 resource limit exceeded
(Message 23127)
Posted 19 Sep 2011 by Igor Zacharov Post: The beam-beam interaction jobs which we are running now are new studies. They simulate the influence of beam particles on each other and need a large number of simulated turns to reveal the effects. Of course, we tested on our computers before submitting, but we may not have corrected everything. The problem with resource limits is serious and I will discuss it with Eric McIntosh when he comes to CERN in the morning. We will review the settings asap.