Message boards : Theory Application : Made a small script to keep an eye on Theory jobs with correct % done
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51639 - Posted: 4 Mar 2025, 14:15:15 UTC
Last modified: 4 Mar 2025, 14:18:21 UTC

I'm using this to see how far Theory jobs have progressed in their calculations. Maybe someone else will find it useful.

#!/bin/sh

BASEDIR=/var/lib/boinc-client
STDERR=stderr.txt # not used yet, this file is in $BASEDIR/slots/$SLOT/stderr.txt for each job
RUNRIVET=cernvm/shared/runRivet.log
INPUT=input
JOBNAME=""
TOTALEVENT=0
PROCESSEDEVENT=0
EVENTTOGO=0
PERCENT=0

# Find all slots
SLOTLIST=$(ls $BASEDIR/slots)

echo " "
echo "---LHC Theory ------------------------------------------------------------------------------"
printf "%*s%*s%*s%*s%*s\n" 25 "Job id" 14 "Total events" 18 "Processed events" 18 "Remaining events" 13 "Completed %"
echo "--------------------------------------------------------------------------------------------"

for SLOT in $SLOTLIST
do
    # Check if the slot is a Theory job. I still have to verify that no other LHC project uses a cernvm folder:
    # CMS doesn't; ATLAS and SixTrack still have to be checked as soon as I get work
    if [ -d $BASEDIR/slots/$SLOT/cernvm ]; then
        # Get job name (note: this is one line if you copy/paste)
        JOBNAME="Theory_"$(grep -a "revision" $BASEDIR/slots/$SLOT/$INPUT | tr --delete '"' | awk -F'=' '{print $2}')"-"$(grep -a "runid" $BASEDIR/slots/$SLOT/$INPUT | tr --delete '"' | awk -F'=' '{print $2}')"-"$(grep -a "seed" $BASEDIR/slots/$SLOT/$INPUT | tr --delete '"' | awk -F'=' '{print $2}') # Must be possible to do this line in a smarter way :(

        TOTALEVENT=$(grep "\[runRivet\]" $BASEDIR/slots/$SLOT/$RUNRIVET | awk '{print$18}')

        # Need a real error handler - if the script reads the file while it's being updated, TOTALEVENT gets screwed up.
        # For now: retry once if TOTALEVENT is empty or not a number.
        if ! { [ -n "$TOTALEVENT" ] && [ "$TOTALEVENT" -eq "$TOTALEVENT" ]; } 2>/dev/null; then
            sleep 1 # and try one more time
            TOTALEVENT=$(grep "\[runRivet\]" $BASEDIR/slots/$SLOT/$RUNRIVET | awk '{print$18}')
        fi

        PROCESSEDEVENT=$(tac $BASEDIR/slots/$SLOT/$RUNRIVET | awk '/events processed/{print $1;exit}')
        if [ -z "${PROCESSEDEVENT}" ]; then
            PROCESSEDEVENT=0 # if it gets here there is no job progress
        fi

        EVENTTOGO=$(expr $TOTALEVENT - $PROCESSEDEVENT)
        PERCENT=$(echo "scale = 2; $PROCESSEDEVENT/$TOTALEVENT *100" | bc)

        printf "%*s%*s%*s%*s%*s\n" 25 "$JOBNAME" 14 "$TOTALEVENT" 18 "$PROCESSEDEVENT" 18 "$EVENTTOGO" 13 "$PERCENT"

    fi
done

The output looks like this in a terminal:

---LHC Theory ------------------------------------------------------------------------------
                   Job id  Total events  Processed events  Remaining events  Completed %
--------------------------------------------------------------------------------------------
    Theory_2843-4105746-2        100000             10900             89100        10.00
  Theory_2843-4105745-260         14000               700             13300         5.00
  Theory_2843-4105744-242         31000             25300              5700        81.00
  Theory_2843-4105740-250         47000             45900              1100        97.00
  Theory_2843-4105750-290         45000             20900             24100        46.00
  Theory_2843-4105752-290         22000             20400              1600        92.00
  Theory_2843-4105744-250         22000             21800               200        99.00
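The comment in the script asks for a real error handler for reading TOTALEVENT. One possible shape, as a minimal sketch in bash (which the later versions of the script switch to): a numeric check plus a few retries. It assumes the same field-18 layout of the [runRivet] line that the script already relies on; is_int and read_total_events are hypothetical helper names, not part of the original script.

```bash
# Succeeds only if $1 is a non-empty string of decimal digits.
is_int() {
    [[ $1 =~ ^[0-9]+$ ]]
}

# Print field 18 of the [runRivet] line in the given log file.
# Retries (default 3 times, 1 second apart) while the value does not look
# like a number yet; returns 1 if it never does.
read_total_events() {
    local log=$1 tries=${2:-3} value
    while (( tries-- > 0 )); do
        value=$(grep "\[runRivet\]" "$log" | awk '{print $18}')
        if is_int "$value"; then
            printf '%s\n' "$value"
            return 0
        fi
        sleep 1
    done
    return 1
}

# Usage inside the slot loop (sketch):
# TOTALEVENT=$(read_total_events "$BASEDIR/slots/$SLOT/$RUNRIVET") || TOTALEVENT=0
```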
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51640 - Posted: 4 Mar 2025, 15:41:13 UTC
Last modified: 4 Mar 2025, 15:43:03 UTC

Replace: PERCENT=$(echo "scale = 2; $PROCESSEDEVENT/$TOTALEVENT *100" | bc)

With: PERCENT=$(awk -v a="$PROCESSEDEVENT" -v b="$TOTALEVENT" 'BEGIN {printf("%.1f\n",100*a/b)}')

It gives a more accurate percentage:

---LHC Theory ------------------------------------------------------------------------------
                   Job id  Total events  Processed events  Remaining events  Completed %
--------------------------------------------------------------------------------------------
    Theory_2843-4105746-2        100000             12000             88000         12.0
  Theory_2843-4105745-260         14000              2200             11800         15.7
  Theory_2843-4105742-304         29000              1300             27700          4.5
  Theory_2843-4105743-304         19000              1000             18000          5.3
  Theory_2843-4105749-304         33000              3600             29400         10.9
  Theory_2843-4105740-304         45000              2200             42800          4.9
  Theory_2843-4105748-304         36000              3600             32400         10.0
  Theory_2843-4105750-304         39000              3600             35400          9.2
  Theory_2843-4105747-304         41000              3700             37300          9.0
  Theory_2843-4105744-242         31000             26700              4300         86.1
  Theory_2843-4105751-304         42000              3500             38500          8.3
  Theory_2843-4105750-290         45000             25900             19100         57.6
   Theory_2814-4013432-44        100000             22600             77400         22.6
   Theory_2814-3953152-43        100000             61700             38300         61.7
   Theory_2814-3941809-44        100000             43500             56500         43.5
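Why the bc version only ever printed whole percentages (10.00, 5.00, 81.00, ...): in bc, scale also truncates the intermediate division, so the fraction is cut to two decimals before it is multiplied by 100. A small comparison of the three variants, assuming bc and awk are installed; the numbers are from the first row of the first output table:

```bash
# a = processed events, b = total events
a=10900
b=100000

# Original bc line: the division is truncated to two decimals (0.109 -> .10)
# before multiplying, so only whole percents survive.
echo "scale = 2; $a/$b *100" | bc                              # prints 10.00

# Multiplying first keeps the two decimals.
echo "scale = 2; 100*$a/$b" | bc                               # prints 10.90

# The awk replacement from this post, rounded to one decimal.
awk -v a="$a" -v b="$b" 'BEGIN {printf("%.1f\n",100*a/b)}'   # prints 10.9
```

Either the reordered bc expression or awk fixes the rounding; awk has the advantage of not needing bc at all.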
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2629
Credit: 268,527,686
RAC: 133,033
Message 51641 - Posted: 4 Mar 2025, 18:21:20 UTC - in response to Message 51640.  

@seanr22a

As for your stderr.txt logfiles, this one looks good:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=419992235


Some hints regarding your script (you don't have to follow them):

Avoid variable names like STDERR.
It is too close to stderr, which has a special meaning.

Consider grepping init_data.xml for result_name and refining the output, like:
JOBNAME="$(grep -Pom1 '<result_name>Theory_\K[^<]+' init_data.xml)"
[[ $JOBNAME != "" ]] && JOBNAME="Theory_$JOBNAME"
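A small sketch of what these two lines do, run against a made-up init_data.xml fragment (the real file contains many more tags). It assumes GNU grep built with PCRE support, since -P is not available everywhere:

```bash
tmp=$(mktemp)
printf '%s\n' '<app_init_data>' \
    '<result_name>Theory_2843-4274082-511_0</result_name>' \
    '</app_init_data>' > "$tmp"

# -P: Perl-compatible regex, -o: print only the matched part, -m1: stop after the first matching line.
# \K drops "<result_name>Theory_" from the reported match; [^<]+ stops at the closing tag.
JOBNAME="$(grep -Pom1 '<result_name>Theory_\K[^<]+' "$tmp")"
[[ $JOBNAME != "" ]] && JOBNAME="Theory_$JOBNAME"
echo "$JOBNAME"   # prints Theory_2843-4274082-511_0

rm -f "$tmp"
```

The second line only adds the prefix back when something was found, so a slot without a result_name gives an empty job name instead of a bare "Theory_".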

Avoid 'expr', as in:
EVENTTOGO=$(expr $TOTALEVENT - $PROCESSEDEVENT)

Instead, in bash use '$(( ))' for integer(!) calculations, like:
EVENTTOGO=$(( $TOTALEVENT - $PROCESSEDEVENT ))
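The "(!)" is worth taking literally: $(( )) only does integer arithmetic, so it suits EVENTTOGO but would truncate a percentage. A quick illustration in bash, using numbers from the second output table:

```bash
TOTALEVENT=14000
PROCESSEDEVENT=2200

# Exact, because both operands are whole numbers.
EVENTTOGO=$(( TOTALEVENT - PROCESSEDEVENT ))
echo "$EVENTTOGO"                               # prints 11800

# Integer division truncates toward zero, so the fraction is lost.
echo $(( 100 * PROCESSEDEVENT / TOTALEVENT ))   # prints 15, not 15.7

# Inside $(( )) the leading $ on variable names is optional.
echo $(( $TOTALEVENT - $PROCESSEDEVENT ))       # prints 11800
```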

Consider using 'watch' to run your script in a separate console every n seconds, like:
watch -n 60 my_script
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51642 - Posted: 5 Mar 2025, 6:06:24 UTC - in response to Message 51641.  
Last modified: 5 Mar 2025, 6:06:56 UTC

@computezrmle

I followed your suggestions (almost) ;) made some small improvements, and switched from sh to bash. The output is the same, but the script is cleaner. Thanks!

I use while :; do clear; theory.sh; sleep 60; done, but watch does the same thing and is shorter to write :)


The updated version:

#!/bin/bash

DATE=$(date +"%Y-%m-%d %H:%m:%S")
BASEDIR=/var/lib/boinc-client
ERRLOG=stderr.txt # not used yet in this script, this file is in $BASEDIR/slots/$SLOT/stderr.txt for each job - check if you have problems
RUNRIVET=cernvm/shared/runRivet.log
JOBINPUT=init_data.xml
JOBNAME=""
TOTALEVENT=0
PROCESSEDEVENT=0
EVENTTOGO=0
PERCENT=0
CERNVMCOUNTER=0

# Find all slots
SLOTLIST=$(ls $BASEDIR/slots)

echo " "
echo "--- LHC Theory ----- $DATE ---------------------------------------------------"
printf "%*s%*s%*s%*s%*s\n" 27 "Job id" 14 "Total events" 18 "Processed events" 18 "Remaining events" 13 "Completed %"
echo "--------------------------------------------------------------------------------------------"

for SLOT in $SLOTLIST
do
    # Check if the slot is a Theory job. I still have to verify that no other LHC project uses a cernvm folder:
    # CMS doesn't; ATLAS and SixTrack still have to be checked as soon as I get work
    if [ -d $BASEDIR/slots/$SLOT/cernvm ]; then
        ((CERNVMCOUNTER++))

        JOBNAME="Theory_"$(grep -Pom1 '<result_name>Theory_\K[^<]+' $BASEDIR/slots/$SLOT/$JOBINPUT)

        TOTALEVENT=$(grep "\[runRivet\]" $BASEDIR/slots/$SLOT/$RUNRIVET | awk '{print$18}')

        # If the script reads the file while it's being updated, TOTALEVENT gets screwed up,
        # so a simple error handler: retry once if TOTALEVENT is empty or not a number
        if ! { [ -n "$TOTALEVENT" ] && [ "$TOTALEVENT" -eq "$TOTALEVENT" ]; } 2>/dev/null; then
            sleep 1 # and try one more time
            TOTALEVENT=$(grep "\[runRivet\]" $BASEDIR/slots/$SLOT/$RUNRIVET | awk '{print$18}')
        fi

        PROCESSEDEVENT=$(tac $BASEDIR/slots/$SLOT/$RUNRIVET | awk '/events processed/{print $1;exit}')
        if [ -z "${PROCESSEDEVENT}" ]; then
            PROCESSEDEVENT=0
        fi

        EVENTTOGO=$(( $TOTALEVENT - $PROCESSEDEVENT ))
        PERCENT=$(awk -v a="$PROCESSEDEVENT" -v b="$TOTALEVENT" 'BEGIN {printf("%.1f\n",100*a/b)}')

        printf "%*s%*s%*s%*s%*s\n" 27 "$JOBNAME" 14 "$TOTALEVENT" 18 "$PROCESSEDEVENT" 18 "$EVENTTOGO" 13 "$PERCENT"
    fi
done

if (( $CERNVMCOUNTER == 0 )); then
    echo "No Theory job running"
fi

echo "--------------------------------------------------------------------------------------------"


Looks like this: [screenshot not preserved]

Anne Havinga

Joined: 4 Mar 20
Posts: 13
Credit: 5,760,632
RAC: 6,163
Message 51643 - Posted: 5 Mar 2025, 13:08:21 UTC - in response to Message 51642.  
Last modified: 5 Mar 2025, 13:12:56 UTC

@seanr22a

Thank you very much for the script.
I just want to point out a small typo: the minutes part of the date should use a capital M (minutes) instead of m (month).

DATE=$(date +"%Y-%m-%d %H:%M:%S")
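A quick way to see the difference, pinning the time zone and using the timestamp of this post so the output is reproducible (GNU date assumed, for -d):

```bash
# %m is the month (01-12), %M is the minute (00-59).
TZ=UTC date -d "2025-03-05 13:08:21" +"%Y-%m-%d %H:%m:%S"   # prints 2025-03-05 13:03:21 (month in the minutes field)
TZ=UTC date -d "2025-03-05 13:08:21" +"%Y-%m-%d %H:%M:%S"   # prints 2025-03-05 13:08:21
```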
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51644 - Posted: 5 Mar 2025, 14:22:49 UTC - in response to Message 51643.  
Last modified: 5 Mar 2025, 14:23:27 UTC

Anne Havinga wrote:
The minutes part of the date should use a capital M (minutes) instead of m (month).


I didn't see that one. Thanks!

I can't go back and edit my previous post, so I hope those who are interested see your post; otherwise their minutes will only update once a month :lol:
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51660 - Posted: 9 Mar 2025, 8:30:44 UTC - in response to Message 51644.  

There is a small bug in the Theory logging. I don't expect anyone to fix it, but I can wish :)

This is from runRivet.log for Theory_2843-4274082-511_0. When the Theory app logs anomalies, it doesn't end the log line with a newline (\n). When the app then logs how many events have been processed, the two messages get mixed up and it's not possible to extract the correct number of events. Example from the log: '-15 [3163800 events processed' should have been '63800 events processed' on its own line. The logging works as intended when there are no anomalies.

The 'events processed' messages get mixed up with the other log output in many different ways. Here are a few lines out of thousands:

The decay tau- -> nu_tau e- nu_ebar 156.664 500 is too inefficient for the particle 30 tau- 63400 events processed
The decay tau- -> nu_tau mu- nu_mub63500 events processed
63600 events processed
The decay tau- -> nu_tau e- nu_ebar 156.664 500 is too inefficient for the particle 38 63700 events processed
The decay tau+ -> nu_taubar pi+ pi0 544.147 500 is too inefficient for the particle 32 tau+ -15 [3163800 events processed
The decay tau+ ->63900 events processed

So my wish is that a newline is added to the end of each log message in the app running Theory jobs.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2629
Credit: 268,527,686
RAC: 133,033
Message 51661 - Posted: 9 Mar 2025, 9:13:31 UTC - in response to Message 51660.  

So my wish is that a newline is added ...

These logs are not intended to be used by BOINC users.
If you do, accept them 'as is'.

OTOH, you could easily make your script skip weird lines and only match well-formatted ones.
Hint: use regex patterns.
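Following that hint, one possible approach (a sketch, not tested against live Theory jobs) is to accept only lines that consist of nothing but a number followed by "events processed". It assumes log lines end in a plain newline; last_processed is a hypothetical helper name, and the demo input is the log excerpt quoted in message 51660:

```bash
# Print the count from the last line that is exactly "<number> events processed".
last_processed() {
    grep -E '^[0-9]+ events processed$' "$1" | tail -n 1 | awk '{print $1}'
}

log=$(mktemp)
cat > "$log" <<'EOF'
The decay tau- -> nu_tau e- nu_ebar 156.664 500 is too inefficient for the particle 30 tau- 63400 events processed
The decay tau- -> nu_tau mu- nu_mub63500 events processed
63600 events processed
The decay tau- -> nu_tau e- nu_ebar 156.664 500 is too inefficient for the particle 38 63700 events processed
The decay tau+ -> nu_taubar pi+ pi0 544.147 500 is too inefficient for the particle 32 tau+ -15 [3163800 events processed
The decay tau+ ->63900 events processed
EOF

last_processed "$log"   # prints 63600
rm -f "$log"
```

With this excerpt the result is 63600 rather than 63900: the count can lag behind until the next clean line is logged, but it never returns a fragment like "The", which the original awk '/events processed/{print $1;exit}' would pick up from the garbled lines.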
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51666 - Posted: 11 Mar 2025, 7:45:03 UTC - in response to Message 51660.  
Last modified: 11 Mar 2025, 8:13:59 UTC

A lot of new Theory jobs have been released that behave a little differently from the ones I wrote the script for, so I made some updates.

1. Added a Slot column to make it easier to find the stderr.txt file for the job.
2. Added an Elapsed time column. This does not show the CPU time; it shows the total time from when BOINC downloaded the job and created the slot until now.
3. Added an Err column. This column gets a marker if the script can't extract the processed events info from runRivet.log. It does NOT mean that something is wrong with the job.
4. Improved error handling.

-------------

#!/bin/bash

DATE=$(date +"%Y-%m-%d %H:%M:%S")
HOST=$(hostname)
BASEDIR=/var/lib/boinc-client
ERRLOG=stderr.txt # not used yet in this script, this file is in $BASEDIR/slots/$SLOT/stderr.txt for each job - check if you have problems
RUNRIVET=cernvm/shared/runRivet.log
TMPRUNRIVET=/tmp/runRivet.log
JOBINPUT=init_data.xml
JOBNAME=""
JOBTIME=""
JOBSTART=""
JOBCURRENT=""
TOTALEVENT=0
PROCESSEDEVENT=0
EVENTTOGO=0
PERCENT=0
CERNVMCOUNTER=0
ERR=""


# Find all Boinc slots
SLOTLIST=$(ls $BASEDIR/slots)

echo -e "\n"
echo "--- LHC Theory - $HOST ---- $DATE -------------------------------------------------------------------------"
printf "%*s%*s%*s%*s%*s%*s%*s%*s\n" 6 "Slot" 27 "Job id" 14 "Total events" 18 "Processed events" 18 "Remaining events" 18 "Elapsed time" 13 "Completed %" 5 "Err"
echo "-------------------------------------------------------------------------------------------------------------------------"

for SLOT in $SLOTLIST
do
    # Check if the slot is a Theory job. Only Theory jobs have the cernvm folder; also check that runRivet.log exists.
    if [ -d $BASEDIR/slots/"$SLOT"/cernvm ] && [ -f $BASEDIR/slots/"$SLOT"/$RUNRIVET ]; then
        # Work with a copy of the runRivet.log file to avoid errors if the file is modified while the script runs.
        cp $BASEDIR/slots/"$SLOT"/$RUNRIVET $TMPRUNRIVET

        # Keep track of how many Theory jobs there are
        ((CERNVMCOUNTER++))

        # Calculate runtime. This is not CPU time, it's the total time from when the slot was created in BOINC until now.
        JOBSTART=$(stat --format %w $BASEDIR/slots/"$SLOT"/boinc_lockfile | awk -F'.' '{print $1}')
        JOBCURRENT=$(date +"%Y-%m-%d %H:%m:%S")
        diff=$(($(date -d "$JOBCURRENT" +'%s') - $(date -d "$JOBSTART" +'%s')))
        days=$(($(date -d @$diff +'%-j')-1))
        JOBTIME=$(date -d @"$diff" +"$days"' day(s) %H:%M')

        JOBNAME="Theory_"$(grep -Pom1 '<result_name>Theory_\K[^<]+' $BASEDIR/slots/"$SLOT"/$JOBINPUT)

        TOTALEVENT=$(grep "\[runRivet\]" $TMPRUNRIVET | awk '{print$18}')

        if [ -z "${TOTALEVENT}" ]; then
            TOTALEVENT=0
        fi

        PROCESSEDEVENT=$(grep "events processed" $TMPRUNRIVET | tac $TMPRUNRIVET | awk '/events processed/{print $1;exit}')

        # Check that PROCESSEDEVENT is a number. Because of a logging bug in the Theory app, the event count
        # can't be extracted when job anomalies have been logged.
        # TOTALEVENT must also be above 0, otherwise awk would divide by zero.
        if [ -n "$PROCESSEDEVENT" ] && [ "$PROCESSEDEVENT" -eq "$PROCESSEDEVENT" ] 2>/dev/null; then
            if [ "$TOTALEVENT" -gt 0 ] && [ "$TOTALEVENT" -ge "$PROCESSEDEVENT" ]; then
                EVENTTOGO=$(( TOTALEVENT - PROCESSEDEVENT ))
                PERCENT=$(awk -v a="$PROCESSEDEVENT" -v b="$TOTALEVENT" 'BEGIN {printf("%.1f\n",100*a/b)}')
                ERR=""
            else
                EVENTTOGO=0
                PERCENT=0
                ERR="*"
            fi
        else
            PROCESSEDEVENT=0
            EVENTTOGO=0
            PERCENT=0
            ERR="*"
        fi

        printf "%*s%*s%*s%*s%*s%*s%*s%*s\n" 6 "$SLOT" 27 "$JOBNAME" 14 "$TOTALEVENT" 18 "$PROCESSEDEVENT" 18 "$EVENTTOGO" 18 "$JOBTIME" 13 "$PERCENT" 5 "$ERR"
        rm $TMPRUNRIVET
    fi
done

if (( CERNVMCOUNTER == 0 )); then
    echo "No Theory job running"
fi

echo -e "\n--- Number of Theory jobs for host $HOST: $CERNVMCOUNTER ----------------------------------------------------------------------------"

-------------

This is how it looks now: [screenshot not preserved]
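One possible refinement, offered as a sketch: the fixed /tmp/runRivet.log path can be clobbered if two copies of the script run at the same time (for example one under watch and one started by hand). mktemp gives each run its own file, and an EXIT trap removes it when the script exits, including via an early exit:

```bash
# Private temporary file for this run; give up if it can't be created.
TMPRUNRIVET=$(mktemp) || exit 1
# Remove it when the script exits.
trap 'rm -f "$TMPRUNRIVET"' EXIT

# Inside the slot loop, copy as before; cp overwrites the file each time,
# so the per-slot 'rm $TMPRUNRIVET' is no longer needed:
#     cp "$BASEDIR/slots/$SLOT/$RUNRIVET" "$TMPRUNRIVET"
```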
Anne Havinga

Joined: 4 Mar 20
Posts: 13
Credit: 5,760,632
RAC: 6,163
Message 51675 - Posted: 12 Mar 2025, 17:20:23 UTC - in response to Message 51666.  

Thanks again for the update.
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51688 - Posted: 15 Mar 2025, 7:36:48 UTC - in response to Message 51675.  
Last modified: 15 Mar 2025, 7:55:57 UTC

I received some jobs that don't use the Event method to report how the job is progressing. Instead, they use 'Integrate', like this in the runRivet.log file: Integrate 318 of 760:

Replace this:
--------------
TOTALEVENT=$(grep "\[runRivet\]" $TMPRUNRIVET | awk '{print$18}')

if [ -z "${TOTALEVENT}" ]; then
TOTALEVENT=0
fi

PROCESSEDEVENT=$(grep "events processed" $TMPRUNRIVET | tac $TMPRUNRIVET | awk '/events processed/{print $1;exit}')
--------------

With this:
--------------
# Check if it is a job that doesn't use Event but Integrate
INTEGRATE=$(awk -v p="Integrate" '$1 == p' $TMPRUNRIVET | awk '{last=$0} END{print last}' | sed 's/.$//')

if [ ${INTEGRATE:+1} ]; then
    # Uses Integrate
    TOTALEVENT=$(echo $INTEGRATE | awk '{print $4}')
    PROCESSEDEVENT=$(echo $INTEGRATE | awk '{print $2}')
else
    # Uses Event
    TOTALEVENT=$(grep "\[runRivet\]" $TMPRUNRIVET | awk '{print$18}')

    if [ -z "${TOTALEVENT}" ]; then
        TOTALEVENT=0
    fi

    PROCESSEDEVENT=$(grep "events processed" $TMPRUNRIVET | tac $TMPRUNRIVET | awk '/events processed/{print $1;exit}')
fi

--------------

It just checks for an exact match of "Integrate" as the first word of a line in runRivet.log. This could cause trouble if an app logs the word 'Integrate' at the start of a line for some other reason, but so far it's working well. The check can always be improved :)


This is one of those jobs: [screenshot not preserved]
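One way to make that check stricter, as a sketch: match the whole line with a regex instead of only the first word. This assumes every progress line looks exactly like "Integrate 318 of 760:" (which may not hold for every SHERPA job) and ends in a plain newline; last_integrate is a hypothetical helper that prints the done and total counts:

```bash
# Print "<done> <total>" from the last line shaped exactly like "Integrate <done> of <total>:".
last_integrate() {
    grep -E '^Integrate [0-9]+ of [0-9]+:$' "$1" | tail -n 1 |
        sed -E 's/^Integrate ([0-9]+) of ([0-9]+):$/\1 \2/'
}

# Usage inside the slot loop (sketch); read fails on empty input,
# so the values are only overwritten when an Integrate line was found:
# if read -r PROCESSEDEVENT TOTALEVENT < <(last_integrate "$TMPRUNRIVET"); then
#     ERR="Pre"
# fi
```

Unlike $1 == "Integrate", this ignores any other line that merely starts with that word, and it drops the trailing colon without the separate sed 's/.$//' step.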
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1446
Credit: 9,710,908
RAC: 377
Message 51689 - Posted: 15 Mar 2025, 9:15:09 UTC - in response to Message 51688.  

I received some jobs that don't use the Event method to report how the job is progressing. Instead, they use 'Integrate', like this in the runRivet.log file: Integrate 318 of 760:
This is one of the Theory tasks with the SHERPA event generator.
These jobs have several (I think 4) steps before the real event generation starts.
These steps consist of integration and initialisation.
Most of the time, the last part (event generation) takes less time than the other steps combined, so the total runtime is difficult to predict.
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51690 - Posted: 15 Mar 2025, 10:23:51 UTC - in response to Message 51689.  

These jobs have several (I think 4) steps before the real event generation starts. These steps consist of integration and initialisation.


Good to know. I will keep an eye on it and make the necessary script adjustments so it can handle the transition from the preparation phase to the event phase. Thanks!
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51692 - Posted: 15 Mar 2025, 16:21:20 UTC - in response to Message 51689.  

@Crystal Pellet
The script now handles the Integrate/prepare phase. Jobs that have an Integrate/prepare period show its progress, with Pre in the Err column to indicate that the job is preparing. As soon as a job has gone through its prepare sequence and started the actual event generation, it shows the event progress as usual. I have tested back and forth with a saved runRivet.log file and it seems to work, but I have to wait for more jobs of that type to be sure.

@Anne Havinga
I don't know if I noticed it first this time :) Apparently I have a hard time writing the M for minutes. If you look at the Elapsed time calculation in the updated script in my previous post, I did it again :lol: but it's fixed now.


Another updated version. I don't know how many more job types will need updates. Time will tell.


#!/bin/bash

DATE=$(date +"%Y-%m-%d %H:%M:%S")
HOST=$(hostname)
BASEDIR=/var/lib/boinc-client
ERRLOG=stderr.txt # not used yet in this script, this file is in $BASEDIR/slots/$SLOT/stderr.txt for each job - check if you have problems
RUNRIVET=cernvm/shared/runRivet.log
TMPRUNRIVET=/tmp/runRivet.log
JOBINPUT=init_data.xml
JOBNAME=""
JOBTIME=""
JOBSTART=""
JOBCURRENT=""
TOTALEVENT=0
PROCESSEDEVENT=0
EVENTTOGO=0
PERCENT=0
CERNVMCOUNTER=0
INTEGRATE=""
ERR=""


# Find all Boinc slots
SLOTLIST=$(ls $BASEDIR/slots)

echo -e "\n"
echo "--- LHC Theory - $HOST ---- $DATE -------------------------------------------------------------------------"
printf "%*s%*s%*s%*s%*s%*s%*s%*s\n" 6 "Slot" 27 "Job id" 14 "Total events" 18 "Processed events" 18 "Remaining events" 18 "Elapsed time" 13 "Completed %" 5 "Err"
echo "-------------------------------------------------------------------------------------------------------------------------"

for SLOT in $SLOTLIST
do
    # Check if the slot is a Theory job. Only Theory jobs have the cernvm folder; also check that runRivet.log exists.
    if [ -d $BASEDIR/slots/"$SLOT"/cernvm ] && [ -f $BASEDIR/slots/"$SLOT"/$RUNRIVET ]; then
        # Work with a copy of the runRivet.log file to avoid errors if the file is modified while the script runs.
        cp $BASEDIR/slots/"$SLOT"/$RUNRIVET $TMPRUNRIVET
        ERR=""

        # Keep track of how many Theory jobs there are
        ((CERNVMCOUNTER++))

        # Calculate runtime. This is not CPU time, it's the total time from when the slot was created in BOINC until now.
        JOBSTART=$(stat --format %w $BASEDIR/slots/"$SLOT"/boinc_lockfile | awk -F'.' '{print $1}')
        JOBCURRENT=$(date +"%Y-%m-%d %H:%M:%S")
        diff=$(($(date -d "$JOBCURRENT" +'%s') - $(date -d "$JOBSTART" +'%s')))
        days=$(($(date -d @$diff +'%-j')-1))
        JOBTIME=$(date -d @"$diff" +"$days"' day(s) %H:%M')

        JOBNAME="Theory_"$(grep -Pom1 '<result_name>Theory_\K[^<]+' $BASEDIR/slots/"$SLOT"/$JOBINPUT)

        TOTALEVENT=$(grep "\[runRivet\]" $TMPRUNRIVET | awk '{print$18}')
        if [ -z "${TOTALEVENT}" ]; then
            TOTALEVENT=0
            ERR="*"
        fi

        # If PROCESSEDEVENT is a number, the job is in the event phase, so skip the Integrate check.
        # If it is empty (or garbled), the job may still be in the Integrate phase, so check for it.
        PROCESSEDEVENT=$(grep "events processed" $TMPRUNRIVET | tail -1 | awk '/events processed/{print $1;exit}')
        if [ -n "$PROCESSEDEVENT" ] && [ "$PROCESSEDEVENT" -eq "$PROCESSEDEVENT" ] 2>/dev/null; then
            NOINTEGRATE=1
        else
            NOINTEGRATE=0
        fi
        if [ -z "${PROCESSEDEVENT}" ]; then
            PROCESSEDEVENT=0
            ERR="*"
        fi

        # Override with Integrate to handle the job transition from the Integrate phase (integration and initialisation)
        # to the Event phase. Info from @Crystal Pellet at LHC.
        # Check if the job is in the integration/initialisation phase
        if [ "$NOINTEGRATE" -eq "0" ]; then
            INTEGRATE=$(awk -v p="Integrate" '$1 == p' $TMPRUNRIVET | awk '{last=$0} END{print last}' | sed 's/.$//')
            if [ ${INTEGRATE:+1} ]; then
                # Uses Integrate
                TOTALEVENT=$(echo "$INTEGRATE" | awk '{print $4}')
                PROCESSEDEVENT=$(echo "$INTEGRATE" | awk '{print $2}')
                ERR="Pre"
            fi
        fi

        # Check that PROCESSEDEVENT is a number, and that TOTALEVENT is above 0 so awk doesn't divide by zero
        if [ -n "$PROCESSEDEVENT" ] && [ "$PROCESSEDEVENT" -eq "$PROCESSEDEVENT" ] 2>/dev/null; then
            if [ "$TOTALEVENT" -gt 0 ] && [ "$TOTALEVENT" -ge "$PROCESSEDEVENT" ]; then
                EVENTTOGO=$(( TOTALEVENT - PROCESSEDEVENT ))
                PERCENT=$(awk -v a="$PROCESSEDEVENT" -v b="$TOTALEVENT" 'BEGIN {printf("%.1f\n",100*a/b)}')
            else
                EVENTTOGO=0
                PERCENT=0
                ERR="*"
            fi
        else
            PROCESSEDEVENT=0
            EVENTTOGO=0
            PERCENT=0
            ERR="*"
        fi

        printf "%*s%*s%*s%*s%*s%*s%*s%*s\n" 6 "$SLOT" 27 "$JOBNAME" 14 "$TOTALEVENT" 18 "$PROCESSEDEVENT" 18 "$EVENTTOGO" 18 "$JOBTIME" 13 "$PERCENT" 5 "$ERR"
        rm $TMPRUNRIVET
    fi
done

if (( CERNVMCOUNTER == 0 )); then
    echo "No Theory job running"
fi

echo -e "\n--- Number of Theory jobs for host $HOST: $CERNVMCOUNTER ----------------------------------------------------------------------------"
Anne Havinga

Joined: 4 Mar 20
Posts: 13
Credit: 5,760,632
RAC: 6,163
Message 51718 - Posted: 18 Mar 2025, 18:08:02 UTC - in response to Message 51692.  

@ seanr22a

Yes, I did notice it and corrected it in my own copy :-).
Thanks again for this update; however, I found a minor fault. The calculated elapsed time is one hour too much: when a job has just started, the elapsed time starts at 0 days and 1:00 hours. As I am not a scripting and/or regex expert, it's hard for me to correct. It would be nice to have it fixed if possible. Thanks in advance.
kotenok2000
Joined: 21 Feb 11
Posts: 83
Credit: 577,613
RAC: 12
Message 51720 - Posted: 18 Mar 2025, 21:53:37 UTC

There is another problem: it assumes the BOINC data directory is /var/lib/boinc-client, while the latest versions from https://boinc.berkeley.edu/linux_install.php install it in /var/lib/boinc.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2629
Credit: 268,527,686
RAC: 133,033
Message 51721 - Posted: 19 Mar 2025, 6:00:41 UTC

@ seanr22a

I suggest not using this forum to publish your script.

The much better place would be https://github.com/.
Looks like you are already registered there.
Just create your own repository and make it public, e.g.
https://github.com/seanr22a/lhcathome___or_any_name_you_choose

Then post a link to that repository here.
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51726 - Posted: 19 Mar 2025, 13:50:42 UTC - in response to Message 51720.  

There is another problem: it assumes the BOINC data directory is /var/lib/boinc-client, while the latest versions from https://boinc.berkeley.edu/linux_install.php install it in /var/lib/boinc.


That is why the script has the variable BASEDIR=/var/lib/boinc-client, which you are supposed to change to match your installation :)
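A possible alternative to editing the variable by hand, as a sketch: keep any BASEDIR already set in the environment, otherwise take the first candidate directory that has a slots folder. The candidate list is just the two paths mentioned in this thread, and find_basedir is a hypothetical helper:

```bash
# Print the first argument that contains a "slots" directory; return 1 if none does.
find_basedir() {
    local d
    for d in "$@"; do
        if [ -d "$d/slots" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# An environment value wins; detection only runs when BASEDIR is unset or empty.
BASEDIR=${BASEDIR:-$(find_basedir /var/lib/boinc-client /var/lib/boinc)}
[ -n "$BASEDIR" ] || echo "No BOINC data directory found; set BASEDIR manually" >&2
```

With this in place, a non-standard location can still be given per run, e.g. BASEDIR=/some/other/path ./theory.sh.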
seanr22a

Joined: 29 Nov 18
Posts: 35
Credit: 2,433,431
RAC: 2,230
Message 51727 - Posted: 19 Mar 2025, 14:03:45 UTC - in response to Message 51718.  
Last modified: 19 Mar 2025, 14:56:23 UTC

@ seanr22a

Thanks again for this update; however, I found a minor fault. The calculated elapsed time is one hour too much: when a job has just started, the elapsed time starts at 0 days and 1:00 hours.


I will take a look at it. Waiting for another batch of Theory jobs so I have something to work with.

I fixed some other small errors and added support for one more type of Theory job. I'm testing that now but need more Theory jobs.

I will post the script here one more time when I've done some testing. After that I will take a look at GitHub as admin @computezrmle wanted me to do.

[EDIT]
I made a quick test script with exactly the same time calculation as in the Theory script.

What I found was quite funny :)

This is the time info as it is in the theory script right now:

Jobstart = 2025-03-19 21:43:37
Jobcurrent = 2025-03-19 21:43:57
diff = 20
days = 0
Jobtime = 0 day(s) 07:00

So the time difference should have been 00:00, but at my location it adds 7 hours: date -d @"$diff" treats $diff as seconds after 1970-01-01 00:00 UTC and prints that moment in the local time zone. I'm in Asia/Bangkok (UTC+7), which explains the issue. At your location the time zone is +1 hour.

To fix this replace:
JOBTIME=$(date -d @"$diff" +"$days"' day(s) %H:%M')

With:
JOBTIME=$(TZ=GMT date -d @"$diff" +"$days"' day(s) %H:%M')

This sets the time zone for that date command (only inside the script) to GMT, which has an offset of 0. I hope daylight saving time doesn't mess with this as well :)
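Two caveats about this fix, offered as hedged observations rather than a tested patch. GMT itself has no daylight saving time, so DST should not affect the TZ=GMT version. However, the days= line just above JOBTIME still runs date -d @$diff without TZ=GMT, so the day count is computed in local time and can be one too high near a day boundary (at UTC+7, a job that has run 17 hours would already show 1 day), and the %-j trick only works for durations under a year. An alternative that avoids date formatting entirely is plain integer arithmetic on the difference in seconds. format_elapsed is a hypothetical helper; using GNU stat --format %W (birth time in epoch seconds, 0 when the file system does not record it) for the start time is an assumption about your system. Note that %w, as used in the script, prints "-" in that case:

```bash
# Format a duration in seconds like JOBTIME: "<days> day(s) HH:MM".
# Pure integer arithmetic, so time zone and DST cannot shift the result.
format_elapsed() {
    local secs=$1
    printf '%d day(s) %02d:%02d\n' $(( secs / 86400 )) $(( secs % 86400 / 3600 )) $(( secs % 3600 / 60 ))
}

# Possible use in the slot loop (sketch):
# start=$(stat --format %W "$BASEDIR/slots/$SLOT/boinc_lockfile")
# now=$(date +%s)
# JOBTIME=$(format_elapsed $(( now - start )))

format_elapsed 20      # prints 0 day(s) 00:00
format_elapsed 90061   # prints 1 day(s) 01:01
```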
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2629
Credit: 268,527,686
RAC: 133,033
Message 51728 - Posted: 19 Mar 2025, 14:23:00 UTC - in response to Message 51727.  

...as admin @computezrmle wanted me to do.

I'm not an admin, and I'm not forcing you to do so.
It's just the case that this forum is not made to handle things like code management.
Github is made exactly for this.


©2025 CERN