1) Message boards : Sixtrack Application : Sixtrack Error while computing on machine with many cores and "slow" storage. (Message 34089)
Posted 26 Jan 2018 by Thord
Post:
[quote]
My suggestions:
- Install more RAM.
- Avoid using a RAM disk (it's fast but using the RAM for the disk cache is more efficient).
- Reduce your swappiness "sysctl vm.swappiness=1".

In addition to those suggestions, you could also enlarge the write cache. Linux already has one by default.

For 16 GB of memory I would suggest starting with these settings (they work in Ubuntu):

Set the write cache to about 4 GB for 16 GB of main memory:
sudo sysctl vm.dirty_background_bytes=4000000000 (amount of dirty data at which background flushing to disk starts)
sudo sysctl vm.dirty_bytes=5000000000 (maximum amount of dirty data; beyond this, processes that write are throttled until pages are flushed)
sudo sysctl vm.dirty_writeback_centisecs=500 (the flusher threads check the cache every 5 seconds)
sudo sysctl vm.dirty_expire_centisecs=360000 (pages that have been dirty for more than 1 hour are flushed)

I actually use larger values with 32 GB of memory, but these should help reduce the disk thrashing. One advantage of a cache over a RAM disk is that Linux automatically reclaims cache space when it is needed for working memory (clean pages can be dropped at once, while dirty pages must first be written to disk), though I rarely have to put that to the test.
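Values set with `sysctl` on the command line are lost at the next reboot. A minimal sketch of one way to keep them, assuming a distribution that reads drop-in files from /etc/sysctl.d/ (Ubuntu does); the file name 60-boinc-writecache.conf is a hypothetical choice. The function writes to whatever path it is given, so the result can be checked somewhere harmless before installing it.

```shell
#!/bin/sh
# Sketch: write the write-cache settings from this post as a sysctl drop-in.
# The file name is HYPOTHETICAL; install with e.g.
#   sudo sh this-script.sh /etc/sysctl.d/60-boinc-writecache.conf
set -eu

# $1 = path of the configuration file to (over)write
write_sysctl_conf() {
    cat > "$1" <<'EOF'
# Write cache sized for 16 GB of RAM (values from the forum post)
vm.dirty_background_bytes = 4000000000
vm.dirty_bytes = 5000000000
# Flusher threads wake every 5 s; dirty pages expire after 1 hour
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 360000
# Low swappiness, as suggested above
vm.swappiness = 1
EOF
}

# Only write a file when exactly one path is given on the command line.
if [ $# -eq 1 ]; then
    write_sysctl_conf "$1"
fi
```

After installing the file, `sudo sysctl --system` reloads all sysctl configuration files without a reboot.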


Thanks, I will test this when I start running again; right now some other work is running. Also, the SixTrack servers seem overloaded: there are so many uploads that downloads were disabled.

Update:
I have been testing some more. Setting up the cache did help for a while, but once the cache was full the disk could not keep up: with random writes from 32 threads to a mechanical disk I get a write speed of less than 1 MB/s. So 16 threads seems to be about what a single disk can handle, although this should be no problem for an SSD. I also tried a USB stick: no problems during normal running, but errors started to occur with the 10-second work units. As a side note, with a USB stick and the settings above it took about 5 minutes to sync when rebooting.
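The effect described here, the cache filling up faster than the disk can drain it, can be watched in the Dirty and Writeback counters of /proc/meminfo. A minimal sketch; the 5-second sampling interval and the default limit (the 5000000000-byte vm.dirty_bytes value suggested above) are assumptions. As Dirty approaches 100% of the limit, processes that write start being throttled, which would be consistent with the stalls described.

```shell
#!/bin/sh
# Sketch: show how full the Linux write cache is, from the Dirty and
# Writeback counters of a meminfo-format file (values there are in kB).
set -eu

# Print "<dirty_kB> <writeback_kB> <percent_of_limit>".
# $1 = meminfo file, $2 = dirty limit in bytes. ASSUMPTION: the default is
# the 5000000000-byte vm.dirty_bytes value suggested above.
dirty_status() {
    awk -v limit="${2:-5000000000}" '
        /^Dirty:/     { dirty = $2 }
        /^Writeback:/ { wb = $2 }
        END {
            # A limit of 0 means no byte limit is set; report 0% then.
            pct = (limit > 0) ? dirty * 1024 * 100 / limit : 0
            printf "%d %d %d\n", dirty, wb, pct
        }
    ' "$1"
}

# Run as "sh dirty-watch.sh watch" to sample every 5 seconds until Ctrl-C.
if [ "${1:-}" = "watch" ] && [ -r /proc/meminfo ]; then
    while :; do
        # vm.dirty_bytes reads as 0 when the ratio-based limit is in use
        dirty_status /proc/meminfo "$(cat /proc/sys/vm/dirty_bytes)"
        sleep 5
    done
fi
```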

So I am back to keeping the BOINC folder on /dev/shm. It takes about 1 GB of RAM, which is less than the cache, and only grows as needed.
The main drawback, as I see it, is that you have to remember to copy it to and from disk when restarting the machine.
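The copy to and from disk can be scripted. A minimal sketch, assuming the BOINC client has been stopped before copying (on Ubuntu, for example, with `sudo systemctl stop boinc-client`) and using two hypothetical paths, /dev/shm/boinc for the RAM copy and /var/lib/boinc-saved for the disk copy; how the client is pointed at the RAM directory is not shown. Copying under a temporary name first means an interrupted save never destroys the last good copy.

```shell
#!/bin/sh
# Sketch: save a BOINC data directory kept in /dev/shm to disk before a
# reboot, and restore it afterwards. Both paths are HYPOTHETICAL defaults
# and can be overridden through the environment. Stop the BOINC client
# before running "save" or "restore", or the copy may be inconsistent.
set -eu

RAM_DIR=${RAM_DIR:-/dev/shm/boinc}
DISK_DIR=${DISK_DIR:-/var/lib/boinc-saved}

# Replace directory $2 with a copy of directory $1. The copy is made under a
# temporary name first, so a failed copy leaves the old $2 untouched.
copy_tree() {
    src=$1
    dst=$2
    rm -rf "$dst.new" "$dst.old"   # leftovers from an interrupted run
    cp -a "$src" "$dst.new"
    if [ -e "$dst" ]; then
        mv "$dst" "$dst.old"
    fi
    mv "$dst.new" "$dst"
    rm -rf "$dst.old"
}

case "${1:-}" in
    save)    copy_tree "$RAM_DIR" "$DISK_DIR" ;;   # before shutdown
    restore) copy_tree "$DISK_DIR" "$RAM_DIR" ;;   # after boot
esac
```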

It would be good if the timing issue could be fixed in either the SixTrack application or the BOINC client.
2) Message boards : Sixtrack Application : Sixtrack Error while computing on machine with many cores and "slow" storage. (Message 33744)
Posted 9 Jan 2018 by Thord
Post:

My suggestions:
- Install more RAM.
- Avoid using a RAM disk (it's fast but using the RAM for the disk cache is more efficient).
- Reduce your swappiness "sysctl vm.swappiness=1".

In addition to those suggestions, you could also enlarge the write cache. Linux already has one by default.

For 16 GB of memory I would suggest starting with these settings (they work in Ubuntu):

Set the write cache to about 4 GB for 16 GB of main memory:
sudo sysctl vm.dirty_background_bytes=4000000000 (amount of dirty data at which background flushing to disk starts)
sudo sysctl vm.dirty_bytes=5000000000 (maximum amount of dirty data; beyond this, processes that write are throttled until pages are flushed)
sudo sysctl vm.dirty_writeback_centisecs=500 (the flusher threads check the cache every 5 seconds)
sudo sysctl vm.dirty_expire_centisecs=360000 (pages that have been dirty for more than 1 hour are flushed)

I actually use larger values with 32 GB of memory, but these should help reduce the disk thrashing. One advantage of a cache over a RAM disk is that Linux automatically reclaims cache space when it is needed for working memory (clean pages can be dropped at once, while dirty pages must first be written to disk), though I rarely have to put that to the test.
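Since the values above are tuned for 16 GB and larger values are used on the 32 GB machine, one hedged way to adapt them is to keep the same proportions of RAM: 1/4 before background flushing starts and 5/16 as the hard limit. That proportional rule is an assumption of this sketch, not something stated in the post. The sketch reads MemTotal from a meminfo-format file and only prints the commands, so they can be reviewed before running them.

```shell
#!/bin/sh
# Sketch: scale the 16 GB write-cache settings to the RAM of this machine.
# ASSUMPTION: the same proportions (1/4 of RAM for the background threshold,
# 5/16 for the hard limit) are a sensible starting point on other sizes.
set -eu

# $1 = MemTotal in kB; prints "<background_bytes> <dirty_bytes>"
scale_dirty_limits() {
    mem_bytes=$(( $1 * 1024 ))
    echo "$(( mem_bytes / 4 )) $(( mem_bytes * 5 / 16 ))"
}

# Print (do not run) the sysctl commands; $1 = meminfo file (optional).
print_sysctl_commands() {
    meminfo=${1:-/proc/meminfo}
    mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    set -- $(scale_dirty_limits "$mem_kb")
    echo "sudo sysctl vm.dirty_background_bytes=$1"
    echo "sudo sysctl vm.dirty_bytes=$2"
    echo "sudo sysctl vm.dirty_writeback_centisecs=500"   # 500 cs = 5 s
    echo "sudo sysctl vm.dirty_expire_centisecs=360000"   # 360000 cs = 1 hour
}

if [ -r /proc/meminfo ]; then
    print_sysctl_commands
fi
```

For a machine reporting 16 GiB, this gives slightly more than the post's round numbers (about 4.3 GB and 5.4 GB), which stays close to the suggested starting point.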


Thanks, I will test this when I start running again; right now some other work is running. Also, the SixTrack servers seem overloaded: there are so many uploads that downloads were disabled.
3) Message boards : Sixtrack Application : Sixtrack Error while computing on machine with many cores and "slow" storage. (Message 33639)
Posted 3 Jan 2018 by Thord
Post:
The data below was taken with all the BOINC files residing in memory:

On the TR (Threadripper) machine, with 32 SixTrack tasks + 1 setiathome-cuda task + Firefox:

top - 09:06:51 up 1 day, 9:06, 13 users, load average: 33.42, 33.81, 33.90
Tasks: 521 total, 36 running, 304 sleeping, 0 stopped, 2 zombie
%Cpu(s): 1.9 us, 0.1 sy, 93.1 ni, 4.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16355028 total, 2138648 free, 12150888 used, 2065492 buff/cache
KiB Swap: 33554428 total, 33344764 free, 209664 used. 2268360 avail Mem

On the Opteron machine, with 64 SixTrack tasks:

top - 09:08:56 up 5 days, 16:24, 4 users, load average: 66.14, 66.75, 66.65
Tasks: 772 total, 65 running, 395 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4%us, 0.0%sy, 97.4%ni, 1.9%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32904536k total, 26145756k used, 6758780k free, 378784k buffers
Swap: 33554428k total, 267776k used, 33286652k free, 2594904k cached

So it seems OK. With the BOINC files on disk, memory usage on the TR machine was almost 100% (more active disk cache), but the load was about the same.

I also tried increasing the checkpoint interval from 60 seconds to 600, but it made no big difference; maybe I should try increasing it to one hour to see if that helps.
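In BOINC this interval is the "checkpoint at most every ... seconds" preference, stored as `<disk_interval>` in global_prefs_override.xml in the client's data directory. A minimal sketch for trying one hour (3600 s). The data directory /var/lib/boinc-client (the Debian/Ubuntu package location) is an assumption, and because the sketch writes the whole file, any other local overrides in it would be lost. Whether SixTrack honours the setting is not verified here.

```shell
#!/bin/sh
# Sketch: raise BOINC's checkpoint interval through a minimal
# global_prefs_override.xml. ASSUMPTIONS: the Debian/Ubuntu data directory
# below, and that the file holds no other overrides worth keeping (it is
# overwritten completely).
set -eu

BOINC_DIR=${BOINC_DIR:-/var/lib/boinc-client}

# $1 = output file, $2 = checkpoint interval in seconds
write_prefs_override() {
    cat > "$1" <<EOF
<global_preferences>
    <disk_interval>$2</disk_interval>
</global_preferences>
EOF
}

# Run as root with "apply" to write the file and ask the running client to
# re-read it. boinccmd may also need --passwd, depending on the RPC setup.
if [ "${1:-}" = "apply" ]; then
    write_prefs_override "$BOINC_DIR/global_prefs_override.xml" 3600
    cd "$BOINC_DIR"   # assumption: boinccmd finds the RPC password file here
    boinccmd --read_global_prefs_override
fi
```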
4) Message boards : Sixtrack Application : Sixtrack Error while computing on machine with many cores and "slow" storage. (Message 33627)
Posted 2 Jan 2018 by Thord
Post:
Hi!

I have had problems with computation errors on LHC-classic (not running in VirtualBox) on a 64-core Opteron machine when many work units were ready at the same time, or when suspending/resuming operations. I suspected a bug in the Linux NUMA kernel but have not been able to pinpoint the problem (other applications have been running fine).

Recently I got a Threadripper machine with 16 cores/32 threads. When testing LHC on that machine I noticed that the disk (a standard HDD, no SSD) was accessed continuously, giving very poor response times when starting other programs.

I also got some errors while computing on this machine. So I moved the BOINC work folders to a RAM disk (/dev/shm), and since then I have not seen any "Errors while computing" on either the Threadripper or the Opteron machine.

Machines are:
ID: 10359398
ID: 10515966

So I guess there is some timing issue between the BOINC client and the SixTrack application.

It would be good if the team could look into this.
5) Message boards : Number crunching : Strange performance difference between Windows And Linux. (Message 27593)
Posted 11 Sep 2015 by Thord
Post:
Has anybody noticed the performance difference between Windows and Linux on some work units?

For example, on this one:

http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=10359398

The Windows client takes almost 20 times longer than the Linux client to complete this work unit.

Any clue what the reason for this could be?
6) Message boards : LHC@home Science : If Neutrinos have no mass, can they escape a black hole? (Message 16655)
Posted 1 Apr 2007 by Thord
Post:
> Read this page. This details the theory and consequences of a massive photon.
>
> http://www.phys.lsu.edu/students/kristina/PhMass/PhMass.html
>
Very interesting page (except that the equations were nearly unreadable). However, this suggests a speculation. If the photon has a rest mass, then perhaps there could also be three types of photons, just as there are three types of neutrinos. As the muon and tauon photons are likely to be very rare, I think the measurements described above are not possible for them. But the rest mass of the muon and tauon photons may be much larger than that of the electron photon, so perhaps they could be detected in some other experiment.

Any ideas on this??

/Thord.


7) Questions and Answers : Unix/Linux : Boinc not suspending projects properly and mixing them up (Message 9165)
Posted 4 Aug 2005 by Thord
Post:
I am running BOINC 4.43 on Linux, and it seems the problem is still there. On a dual-CPU machine running multiple projects, the manager display is more often wrong than right.
/Thord.



©2024 CERN