Does LHC@home support task checkpointing?
Author | Message |
---|---|
Joined: 5 Mar 12 Posts: 2 Credit: 597,129 RAC: 0 |
Hello, I have a second desktop that is dedicated to BOINC. It's a Celeron at 1.7 GHz (socket 478B), so it can only run tasks that support checkpointing. Does LHC@home support this? |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Yes indeed; otherwise our longer runs might never finish! Regards. Eric. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,869,905 RAC: 58 |
Sorry, Eric. Something else for you to take a look at when you get back. To save thrashing my hard disk with continual writes, I have "Tasks checkpoint to disk at most every [ ] seconds" set to 120 seconds. SixTrack is ignoring this setting and checkpoints every 10 seconds or less. I have only noticed this with the large amount of work we have been getting recently (which is welcome), as I have been getting quite a few hda {Busy} multiwrite errors within the VMs of my sister project, Test4Theory, while SixTrack has been occupying all my other cores across both my machines. Both machines do little other than T4T and SixTrack. Presumably this has been going on since the release of 446.03, but I have only just spotted the huge increase in checkpoint frequency. I now suspect it as the cause of the intermittent T4T problem, as the hardware passes disk checks and the problem stops when T4T is running on its own or with CPDN, which has a much longer checkpoint interval. In previous versions, tasks would checkpoint at the percentage progress steps, but 446.03 seems to checkpoint whenever it wants. From another thread: "I am also adding two new parameters NUMLMAX and NUMLCP. NUMLCP will simply specify the number of turns between checkpoints." Could that interval be increased, or could the application be configured, as in previous versions, to checkpoint at 1% intervals again and to respect users' settings within BOINC? |
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Hi Ray; no need to apologise. I took out the call to boinc_time_to_checkpoint :-( in SixTrack Version 4451 (boinc 445.01) and later. Mea culpa. I just didn't know enough about this feature. However, the new options NUMLMAX and NUMLCP are only in Version 4506 (450.3, I think). So, since the checkpoint frequency is based on the user's writebin parameter, it could be that someone has specified a small number. I think I can just reinstate the boinc_time_to_checkpoint call to let the volunteer preference override, even with the new NUMLs. (Can't build new executables right now!) Will fix soonest. Big thanks as usual. Eric. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,869,905 RAC: 58 |
After further investigation, it seems that the longer 10^6 tracking jobs checkpoint as normal at 1% intervals, and it is only the 10^5 injection jobs that exhibit the short checkpointing behaviour, so it may indeed have been something that those guys have fiddled with. [Edit] A batch of 10^5 tracking jobs has just come in, checkpointing at c. 5% intervals, around 6 mins apart. Much more disk-friendly. |
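
The behaviour discussed in this thread can be illustrated with a small simulation. This is only a sketch of the gating idea, not SixTrack's code: the real BOINC API is C/C++, where an application calls boinc_time_to_checkpoint() before writing and boinc_checkpoint_completed() afterwards. The names `CheckpointGate`, `run_turns`, `turns_per_checkpoint` and `seconds_per_turn` are hypothetical, and the figures (one turn per second, a checkpoint boundary every 10 turns, a 120-second preference) are assumptions chosen to mirror the numbers reported above.

```python
from dataclasses import dataclass


@dataclass
class CheckpointGate:
    """Simulated stand-in for the 'checkpoint at most every N seconds' preference."""
    min_interval_s: float           # volunteer preference, e.g. 120 seconds
    last_checkpoint_s: float = 0.0  # simulated time of the last write

    def time_to_checkpoint(self, now_s: float) -> bool:
        # True only once the preferred minimum interval has elapsed.
        return now_s - self.last_checkpoint_s >= self.min_interval_s

    def checkpoint_completed(self, now_s: float) -> None:
        # Record the write so the next interval is measured from here.
        self.last_checkpoint_s = now_s


def run_turns(total_turns, turns_per_checkpoint, seconds_per_turn, gate=None):
    """Return the simulated times (in seconds) at which a checkpoint is written."""
    written = []
    for turn in range(1, total_turns + 1):
        now = turn * seconds_per_turn
        if turn % turns_per_checkpoint != 0:
            continue  # not at an application checkpoint boundary
        if gate is not None and not gate.time_to_checkpoint(now):
            continue  # boundary reached, but the preference says it is too soon
        written.append(now)
        if gate is not None:
            gate.checkpoint_completed(now)
    return written


# One simulated hour: a boundary every 10 turns at 1 second per turn.
ungated = run_turns(3600, 10, 1.0)
gated = run_turns(3600, 10, 1.0, CheckpointGate(min_interval_s=120.0))
print(len(ungated), len(gated))
```

Under these assumptions the ungated run writes a checkpoint every 10 seconds, while the gated run skips boundaries until 120 seconds have passed since the previous write, however small the turn count between boundaries is set.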
©2025 CERN