Message boards : ATLAS application : One Year native-Linux (SL69 and CentOS)

maeax

Joined: 2 May 07
Posts: 1556
Credit: 57,510,376
RAC: 200,154
Message 36122 - Posted: 30 Jul 2018, 7:06:56 UTC

In a few days, native Linux for ATLAS will have been running for ONE year.
Is it time to leave test mode?
ID: 36122
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1992
Credit: 143,784,292
RAC: 95,494
Message 36126 - Posted: 30 Jul 2018, 9:57:42 UTC - in response to Message 36122.  

... Is it time to leave test mode?

Yes:
- would make the project selection more transparent
- statistics would appear on the apps page

No:
- suspend/resume still doesn't work
ID: 36126
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 366
Credit: 13,258,168
RAC: 6,930
Message 36161 - Posted: 1 Aug 2018, 10:59:46 UTC

Thank you for reminding us of this anniversary :)

The main reason for keeping it in test is the extra steps required to run in native mode, i.e. installing and configuring CVMFS and Singularity. In VirtualBox mode the BOINC client checks whether VirtualBox is installed and, if not, does not download any tasks. But there are no such checks for CVMFS and Singularity, so if you don't have them you still download native jobs, but they all fail immediately. We think it's better to avoid this behaviour, so making people take the extra step of enabling test applications makes it more likely they will set up their hosts correctly.
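
Since the client does no such check itself, a volunteer could run one by hand before enabling test applications. The following is only a minimal sketch: it assumes the tools are installed under the command names cvmfs_config and singularity, and the helper name missing_tools is made up for this example.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of BOINC or CVMFS.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The two tools the native app needs, according to this thread.
missing=$(missing_tools cvmfs_config singularity)

if [ -n "$missing" ]; then
    echo "Not ready for native ATLAS tasks, missing:" $missing
else
    echo "cvmfs_config and singularity found on PATH"
fi
```

Finding the binaries does not prove that the repositories actually mount; running cvmfs_config probe afterwards would be the next thing to try.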
ID: 36161
Profile F6FGZ looking for DX !

Joined: 7 Jan 07
Posts: 39
Credit: 15,368,844
RAC: 525
Message 36162 - Posted: 1 Aug 2018, 11:01:35 UTC - in response to Message 36126.  


No:
- suspend/resume still doesn't work

Hello,

In the BOINC Manager, I checked the box "Leave non-GPU tasks in memory while suspended" under Options / Computing preferences..., tab "Disk and memory", and it seems to work.
ID: 36162
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1992
Credit: 143,784,292
RAC: 95,494
Message 36169 - Posted: 1 Aug 2018, 13:30:02 UTC - in response to Message 36162.  

Do your tasks really suspend?
I use the same setting but it's only the BOINC client that reports the tasks as suspended.
The scientific app always continues running in the background.
ID: 36169
Profile F6FGZ looking for DX !

Joined: 7 Jan 07
Posts: 39
Credit: 15,368,844
RAC: 525
Message 36177 - Posted: 1 Aug 2018, 16:15:20 UTC - in response to Message 36169.  
Last modified: 1 Aug 2018, 16:17:45 UTC

Do your tasks really suspend?
I use the same setting but it's only the BOINC client that reports the tasks as suspended.
The scientific app always continues running in the background.

You are right: monitoring with htop, I see the task python -tt etc ... still running.
The big difference is that the iteration number doesn't restart from zero after a while.
ID: 36177
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36188 - Posted: 2 Aug 2018, 0:45:58 UTC - in response to Message 36169.  

The scientific app always continues running in the background.
By "scientific app" I assume you mean athena.py? When I suspend native tasks all athena.py processes become zombies and then disappear from top within a few seconds, usually. cvmfs runs for a few secs and then it disappears too. Those are the happy times :)

On rare occasions cvmfs jumps from its normal low CPU and memory usage to much higher usage, the athena.py processes drop from the normal ~98% CPU to about 80%, and they continue running. Sometimes it goes on like that for 15 minutes. Sometimes it's still that way after 30 minutes, at which point I just shake my head, walk away and try to think of something else. Again, that's very rare.
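
For anyone who wants to check this without staring at top: a rough sketch that reads the kernel's state letter for a process from /proc. It is Linux-only and assumes that a truly paused process shows state T (stopped); if the app is held back some other way, the CPU column in top is the better indicator. The helper names are invented for this example, and a harmless sleep stands in for a real task.

```shell
#!/bin/sh
# Print the one-letter state the kernel reports for a PID (R, S, D, T, Z ...).
proc_state() {
    sed -n 's/^State:[[:space:]]*\([A-Za-z]\).*/\1/p' "/proc/$1/status" 2>/dev/null
}

# Succeed only if the process is stopped (state T).
is_stopped() {
    [ "$(proc_state "$1")" = "T" ]
}

# Demo on a stand-in process instead of a real ATLAS task.
sleep 60 &
demo_pid=$!
kill -STOP "$demo_pid"
sleep 1
if is_stopped "$demo_pid"; then
    echo "PID $demo_pid is stopped"
else
    echo "PID $demo_pid is not stopped (state: $(proc_state "$demo_pid"))"
fi
kill -CONT "$demo_pid"
kill "$demo_pid"
wait "$demo_pid" 2>/dev/null || true
```

On a real host one could loop over the output of pgrep -f athena.py and call is_stopped on each PID right after suspending the task in the BOINC Manager.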
ID: 36188
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36190 - Posted: 2 Aug 2018, 0:50:07 UTC - in response to Message 36177.  

You are right: monitoring with htop, I see the task python -tt etc ... still running.
The big difference is that the iteration number doesn't restart from zero after a while.

I have not observed with htop. I will try it too.
ID: 36190
maeax

Joined: 2 May 07
Posts: 1556
Credit: 57,510,376
RAC: 200,154
Message 36269 - Posted: 5 Aug 2018, 9:08:27 UTC - in response to Message 36161.  

Thank you for reminding us of this anniversary :)

The main reason for keeping it in test is the extra steps required to run in native mode, i.e. installing and configuring CVMFS and Singularity. In VirtualBox mode the BOINC client checks whether VirtualBox is installed and, if not, does not download any tasks. But there are no such checks for CVMFS and Singularity, so if you don't have them you still download native jobs, but they all fail immediately. We think it's better to avoid this behaviour, so making people take the extra step of enabling test applications makes it more likely they will set up their hosts correctly.

What about 1,000 collisions for native Linux instead of 200?
Fast PCs can do this work in less than one day!
For that, computezrmle's argument for "No" in this thread (suspend/resume) would have to be resolved first.
ID: 36269
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36270 - Posted: 5 Aug 2018, 15:27:20 UTC - in response to Message 36269.  

What about 1,000 collisions for native Linux instead of 200?

+.5 (500 collisions)
ID: 36270
maeax

Joined: 2 May 07
Posts: 1556
Credit: 57,510,376
RAC: 200,154
Message 36271 - Posted: 5 Aug 2018, 15:34:17 UTC - in response to Message 36270.  
Last modified: 5 Aug 2018, 15:59:13 UTC

For us volunteers they have to convert from 1,000 (the default) to 200 (in the past only 50).
Edit:
With fewer than four cores it is not possible (250 per core versus 50 at the moment).
If there are problems in the infrastructure or elsewhere on the volunteer's side, then we blow a lot of energy into the wind!
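
The per-core figures come from a simple division, sketched here for a four-core host. The helper name events_per_core is made up for this example, and the division is integer division.

```shell
#!/bin/sh
# Integer number of collisions each core has to process for one task.
events_per_core() {
    echo $(( $1 / $2 ))
}

echo "1000 collisions on 4 cores: $(events_per_core 1000 4) per core"
echo " 200 collisions on 4 cores: $(events_per_core 200 4) per core"
```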
ID: 36271
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36272 - Posted: 5 Aug 2018, 18:21:27 UTC - in response to Message 36271.  

If there are problems in the infrastructure or elsewhere on the volunteer's side, then we blow a lot of energy into the wind!

Sometimes many small steps work better than big steps.

Just an idea.... Plug 'n Crunch... a complete "live Linux with BOINC for ATLAS native" distro in an .iso that can be burned to DVD or USB stick. Live, so installing to HD is not required. It auto-runs a setup script that analyses system resources, then creates an appropriate app_config.xml and makes recommendations to the user for the website settings. KISS... just ATLAS native, no other sub-projects, no other projects, users are prevented from playing with the settings via BOINC Manager and boinccmd, all the cores get used, it's a dedicated 24/7 ATLAS cruncher, no options, no frustrations, no decisions.

No hassles with Linux installation. No VBox. No frustration with settings and options and docs spread all over the place. Power off, remove the media, reboot ---> returns to whatever you had.

Make the ISO a free download but offer bootable DVD and USB stick for cost of media plus shipping.
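
To make the setup-script part a bit more concrete, here is a rough sketch of how such a script might generate app_config.xml from the core count. The app name ATLAS and the plan class native_mt are assumptions that would have to be checked against the project's applications page, and make_app_config is a made-up helper name. The script only prints the XML; where to write it is left to the caller.

```shell
#!/bin/sh
# Print an app_config.xml that sets the cores per task to the given number.
# App name and plan class below are assumptions, not confirmed values.
make_app_config() {
    ncpus=$1
    cat <<EOF
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>$ncpus</avg_ncpus>
  </app_version>
</app_config>
EOF
}

# Use every core the kernel reports; fall back to 1 if nproc is missing.
cores=$(nproc 2>/dev/null || echo 1)
make_app_config "$cores"
```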
ID: 36272
maeax

Joined: 2 May 07
Posts: 1556
Credit: 57,510,376
RAC: 200,154
Message 36500 - Posted: 19 Aug 2018, 17:22:53 UTC
Last modified: 19 Aug 2018, 17:33:28 UTC

Native Linux on SL69 is running very well, BUT...
when rebooting or starting for the first time:

probing of cvmfs/atlas.cern.ch... ok
probing of cvmfs/atlas-condb.cern.ch... ok
probing of cvmfs/grid.cern.ch... ok
together these need about 2-3 minutes to succeed!

Has anyone else had the same experience?

BTW: all hosts have openhtc.io and the new kernel 2.6.32-754.3.5.el6.x86_64 from 18/8/15
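
To compare numbers between hosts, the probe time could be measured rather than estimated. A minimal sketch with a made-up helper elapsed_seconds; it uses sleep as a stand-in so that it runs anywhere, and it only resolves whole seconds.

```shell
#!/bin/sh
# Run a command, discard its output, and print the whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $(( end - start ))
}

# Stand-in command; on a real host this could be:
#   elapsed_seconds cvmfs_config probe
echo "took $(elapsed_seconds sleep 2) s"
```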
ID: 36500
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1992
Credit: 143,784,292
RAC: 95,494
Message 36501 - Posted: 19 Aug 2018, 20:03:49 UTC - in response to Message 36500.  

After a reboot you may run "cvmfs_config wipecache" before the first WU starts.
ID: 36501
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36504 - Posted: 19 Aug 2018, 22:33:38 UTC - in response to Message 36501.  

After a reboot you may run "cvmfs_config wipecache" before the first WU starts.

Is it a good idea to wipe the cache every time the client starts? If so, and if the client is auto-started via a SystemV init script (which is how installing BOINC from most distros' repositories configures it), then maybe it's a good idea to put a "cvmfs_config wipecache" statement in /etc/init.d/boinc-client?

On my Ubuntu rigs /etc/init.d/boinc-client has the function start() as below. I am thinking of adding a wipecache statement, the line reading 'cvmfs_config wipecache' just after the first else. Should it be enclosed in single quotes, double quotes or not quoted? Sorry, I don't know bash very well.

start()
{
    log_begin_msg "Starting $DESC: $NAME"
    if is_running; then
        log_progress_msg "already running"
    else
        'cvmfs_config wipecache'
        if [ -n "$DISPLAY" -a -x /usr/bin/xhost ]; then
            # grant the boinc client to perform GPU computing
            xhost +si:localuser:$BOINC_USER || echo -n "xhost error ignored, GPU computing may not be possible"
        fi
        if [ -n "$VALGRIND_OPTIONS" ]; then
            start-stop-daemon --start --quiet --background --pidfile $PIDFILE \
                --make-pidfile --user $BOINC_USER --chuid $BOINC_USER \
                --chdir $BOINC_DIR --exec /usr/bin/valgrind -- $VALGRIND_OPTIONS $BOINC_CLIENT $BOINC_OPTS
        else
            start-stop-daemon --start --quiet --background --pidfile $PIDFILE \
                --make-pidfile --user $BOINC_USER --chuid $BOINC_USER \
                --chdir $BOINC_DIR --exec $BOINC_CLIENT -- $BOINC_OPTS
        fi
    fi
    log_end_msg 0

    if [ "$SCHEDULE" = "1" ]; then
        schedule
    fi
}
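
On the quoting question itself, a small experiment may help. This sketch replaces the real cvmfs_config with a stand-in function so that it runs on a machine without CVMFS. It compares the unquoted form with the form wrapped in single quotes. In a POSIX shell a quoted string counts as one word, so the shell looks for a command whose name contains a space and reports status 127 (command not found).

```shell
#!/bin/sh
# Hypothetical stand-in so the demo runs on a machine without CVMFS.
cvmfs_config() { echo "cvmfs_config called with: $*"; }

# Unquoted: the shell sees the command "cvmfs_config" plus one argument.
unquoted_out=$(cvmfs_config wipecache)
echo "$unquoted_out"

# Quoted as a single word: the shell looks for a command literally named
# "cvmfs_config wipecache", which normally does not exist.
quoted_status=0
'cvmfs_config wipecache' 2>/dev/null || quoted_status=$?
echo "quoted form exit status: $quoted_status"
```

So in an init script the statement would presumably go in unquoted, and only if the script runs with enough privileges for wipecache to succeed.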
ID: 36504



©2022 CERN