Message boards : Number crunching : LHC@home has task checkpoint ?

Pop Horea-Vasile
Joined: 5 Mar 12
Posts: 2
Credit: 597,129
RAC: 0
Message 25706 - Posted: 30 Aug 2013, 7:25:25 UTC

Hello,
I have a second desktop dedicated to BOINC. It's a Celeron at 1.7 GHz (Socket 478B), so it can only run tasks that support checkpointing. Does LHC@home support it?

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25708 - Posted: 30 Aug 2013, 7:59:22 UTC - in response to Message 25706.  

Yes indeed; otherwise our longer runs might never finish!
Regards. Eric.
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,869,905
RAC: 58
Message 26254 - Posted: 9 Mar 2014, 23:35:43 UTC
Last modified: 9 Mar 2014, 23:57:37 UTC

Sorry, Eric. Something else for you to take a look at when you get back.

To avoid thrashing my hard disk with continual writes, I have "Tasks checkpoint to disk at most every [ ] seconds" set to 120 seconds. SixTrack is ignoring this setting and checkpoints every 10 seconds or less. I only noticed this because of the large amount of work we have been getting recently (which is welcome): I have been getting quite a few hda {Busy} multiwrite errors in my sister-project Test4Theory VMs while SixTrack has been occupying all my other cores across both my machines. Both machines do little other than T4T and SixTrack. Presumably this has been going on since the release of 446.03, but I have only just spotted the huge increase in checkpoint frequency. I now suspect it as the cause of the intermittent T4T problem: the hardware passes disk checks, and the problem stops when T4T runs on its own or alongside CPDN, which has a much longer checkpoint interval.
In previous versions, tasks checkpointed at each percentage step of progress, but 446.03 seems to checkpoint whenever it likes.
From another thread "I am also adding two new parameters NUMLMAX and NUMLCP. NUMLCP will simply specify the number of turns between checkpoints."
Could that interval be increased, or could the application be configured as in previous versions to checkpoint at 1% intervals once again and respect users' settings within BOINC?
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 26255 - Posted: 10 Mar 2014, 3:34:25 UTC - in response to Message 26254.  

Hi Ray; no need to apologise. I took out the call to boinc_time_to_checkpoint :-( in SixTrack Version 4451 (BOINC 445.01) and later. Mea culpa; I just didn't know enough about this feature. However, the new options NUMLMAX and NUMLCP are only in Version 4506 (450.3, I think). So, since the checkpoint frequency is based on the user's writebin parameter, it could be that someone has specified a small number.

I think I can just reinstate the boinc_time_to_checkpoint call to let the volunteer preference override, even with the new NUMLs. (I can't build new executables right now!) Will fix soonest. Big thanks as usual. Eric.
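A minimal sketch of the fix described above, under stated assumptions: a checkpoint is only *considered* every NUMLCP turns, and it is actually written only once the volunteer's minimum interval has elapsed. In a real BOINC application that gate is boinc_time_to_checkpoint(), followed by boinc_checkpoint_completed() after the write; here the clock is simulated instead of linking BOINC. The function name `simulate_checkpoints` and the 0.1 s-per-turn timing are hypothetical, chosen only so the ungated cadence comes out at 10 s.

```python
def simulate_checkpoints(total_turns, numlcp, min_interval_ms, ms_per_turn):
    """Return the turns at which a checkpoint would actually be written.

    Hypothetical model: a checkpoint is considered every `numlcp` turns
    (like NUMLCP), but is only written if at least `min_interval_ms` of
    simulated wall-clock time has passed since the last one. This stands
    in for the volunteer's "checkpoint at most every N seconds" preference,
    which boinc_time_to_checkpoint() enforces in a real BOINC app.
    Integer milliseconds avoid floating-point drift in the simulated clock.
    """
    last_checkpoint_ms = 0
    written = []
    for turn in range(1, total_turns + 1):
        now_ms = turn * ms_per_turn          # simulated elapsed time
        if turn % numlcp != 0:
            continue                         # not a NUMLCP boundary
        if now_ms - last_checkpoint_ms < min_interval_ms:
            continue                         # too soon: respect the preference
        written.append(turn)                 # real app: write the file, then
        last_checkpoint_ms = now_ms          # call boinc_checkpoint_completed()
    return written


if __name__ == "__main__":
    # 10,000 turns at 0.1 s each, a candidate checkpoint every 100 turns (10 s).
    ungated = simulate_checkpoints(10_000, 100, 0, 100)
    gated = simulate_checkpoints(10_000, 100, 120_000, 100)
    print(len(ungated), "checkpoints without the time check")
    print(len(gated), "checkpoints with a 120 s minimum:", gated)
```

With the time check in place, the 10 s cadence drops to one write every 120 s, while NUMLCP still decides at which turns a checkpoint is possible.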
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,869,905
RAC: 58
Message 26258 - Posted: 10 Mar 2014, 19:17:02 UTC - in response to Message 26255.  
Last modified: 10 Mar 2014, 19:44:43 UTC

After further investigation, it seems that the longer 10^6 tracking jobs checkpoint normally, at 1% intervals; only the 10^5 injection jobs show the short checkpointing behaviour, so it may indeed be something those guys have fiddled with.

[Edit]
A batch of 10^5 tracking jobs just came in, checkpointing at c. 5% intervals, roughly every 6 minutes. Much more disk-friendly.


©2025 CERN