Message boards : ATLAS application : ATLAS native version 2.73

Aurum
Joined: 12 Jun 18
Posts: 109
Credit: 38,284,991
RAC: 15,603
Message 40622 - Posted: 23 Nov 2019, 19:31:14 UTC

Apparently the nthreads variable is needed. This rig had last DLed 6c WUs.
<app_config>
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>8.0</avg_ncpus>
</app_version>
</app_config>
DLed two 8c WUs. From client_state:
<app_version>
    <app_name>ATLAS</app_name>
    <version_num>273</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>8.000000</avg_ncpus>
    <flops>1621673709.674330</flops>
    <plan_class>native_mt</plan_class>
    <api_version>7.7.0</api_version>
    <cmdline>--nthreads 6</cmdline>
Adding <cmdline>--nthreads 8</cmdline> DLed 22 8c WUs.
Thanks so much. Now it will be easy to keep my cluster Fat 'n Happy.

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 2049
Credit: 154,398,907
RAC: 144,603
Message 40623 - Posted: 23 Nov 2019, 19:53:15 UTC - in response to Message 40622.  

> <cmdline>--nthreads 6</cmdline>

I wonder where the --nthreads 6 setting comes from.
Either
- from the server if the web preferences are set to 6 cores?
- it was set before and you did not restart the BOINC client after you removed the <cmdline> tag?

Anyway, you are safe if both values are in sync.
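
One way to check that both values are in sync is to parse app_config.xml and compare avg_ncpus with the --nthreads value in cmdline. The Python sketch below is only an illustration: the function name find_thread_mismatches is made up for this example, and treating a missing cmdline as a mismatch is an assumption based on the behavior reported above, not documented BOINC behavior.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def find_thread_mismatches(app_config_xml: str) -> List[Tuple[str, float, Optional[int]]]:
    """Return (plan_class, avg_ncpus, nthreads) for each <app_version>
    where --nthreads is missing or differs from avg_ncpus."""
    root = ET.fromstring(app_config_xml)
    mismatches = []
    for version in root.iter("app_version"):
        plan_class = version.findtext("plan_class", default="") or ""
        avg_ncpus = float(version.findtext("avg_ncpus", default="0") or "0")
        cmdline = version.findtext("cmdline", default="") or ""
        # Look for "--nthreads N" inside the cmdline string.
        match = re.search(r"--nthreads\s+(\d+)", cmdline)
        nthreads = int(match.group(1)) if match else None
        # Assumption: a missing --nthreads counts as "not in sync".
        if nthreads is None or nthreads != avg_ncpus:
            mismatches.append((plan_class, avg_ncpus, nthreads))
    return mismatches
```

An empty list means every app_version entry in the file has matching values. Remember that the client only picks up changes after the config files are re-read or the BOINC client is restarted.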

Aurum
Message 40629 - Posted: 24 Nov 2019, 14:09:29 UTC

This fix is working so well that I now have an insurmountable problem. Overnight I've accumulated 250 ATLAS WUs in the upload queue. My transfer speed has slowed to 8 kBps.
Why do ISPs have a DL speed 20 times faster than the UL speed???
Can completed ATLAS WUs be shrunk from their present 100 to 250 MB???

computezrmle
Message 40632 - Posted: 24 Nov 2019, 16:18:04 UTC - in response to Message 40629.  

> This fix is working so well that I now have an insurmountable problem. Overnight I've accumulated 250 ATLAS WUs in the upload queue. My transfer speed has slowed to 8 kBps.
> Why do ISPs have a DL speed 20 times faster than the UL speed???
> Can completed ATLAS WUs be shrunk from their present 100 to 250 MB???

You are not alone.
Most people forget to plan for resources other than CPU power and number of cores:
RAM, disk size, disk speed, overheating, a local network over wi-fi instead of a fast cable, routers that silently drop open connections because there are too many of them...
And not even a Squid proxy can compensate for a saturated upload line.
;-(

Aurum
Message 40633 - Posted: 24 Nov 2019, 17:14:37 UTC - in response to Message 40632.  
Last modified: 24 Nov 2019, 17:32:20 UTC

> RAM, disk size, disk speed, overheating, a local network over wi-fi instead of a fast cable, routers that silently drop open connections because there are too many of them...
None of those are a problem for me. Just my cable internet upload speed of 6 MBps.
Can CERN do something to make the return files smaller???

I've tried this in my cc_config:
<max_file_xfers>6</max_file_xfers>
<max_file_xfers_per_project>6</max_file_xfers_per_project>
But to no avail.
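
If the uplink itself is the bottleneck, allowing more parallel transfers only changes how many uploads share the line, not how much data the line can carry. A rough back-of-the-envelope sketch in Python, with loud assumptions: the function name backlog_drain_hours is hypothetical, the "6 MBps" above is read as 6 megabits per second, the line is assumed to be fully usable for uploads, and the file sizes are the 110 MB and 250 MB figures mentioned in this thread.

```python
def backlog_drain_hours(n_files: int, mb_per_file: float, uplink_mbit_per_s: float) -> float:
    """Hours needed to upload n_files of mb_per_file megabytes over one shared uplink."""
    total_megabits = n_files * mb_per_file * 8  # 1 byte = 8 bits
    return total_megabits / uplink_mbit_per_s / 3600  # seconds -> hours


# Assumed scenario: 250 queued results, 6 Mbit/s uplink, no other traffic.
for size_mb in (110, 250):
    hours = backlog_drain_hours(250, size_mb, 6.0)
    print(f"250 files x {size_mb} MB at 6 Mbit/s: {hours:.1f} h")
```

Under these assumptions the queue needs roughly 10 hours at 110 MB per file and roughly 23 hours at 250 MB per file, before any new results are added.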

Aurum
Message 40635 - Posted: 24 Nov 2019, 19:00:41 UTC

I've run ATLAS WUs from 1 to 16 CPUs. Up to 8 the return file size is always about 110 MB. At 12c it's about 250 MB.
Since this does not scale with the number of CPUs and then it doubles in size it would seem it carries a lot of dead weight.
CERN please jettison the flotsam and jetsam.

computezrmle
Message 40637 - Posted: 24 Nov 2019, 21:01:52 UTC - in response to Message 40635.  

> I've run ATLAS WUs from 1 to 16 CPUs. Up to 8 the return file size is always about 110 MB. At 12c it's about 250 MB.
> Since this does not scale with the number of CPUs and then it doubles in size it would seem it carries a lot of dead weight.
> CERN please jettison the flotsam and jetsam.

Might be a misinterpretation.

ATLAS sends out tasks from different batches.
Runtimes and sizes of input and output files are similar for tasks from the same batch but can be highly variable if you compare one batch with another batch.

Batches in progress can be seen here:
http://lhcathome.web.cern.ch/projects/atlas

Aurum
Message 40658 - Posted: 25 Nov 2019, 17:12:21 UTC

Yea, shortly after I said that I saw four 4c WUs that were 250 MB. No rhyme or reason.
Since the return files are so large the slow upload speed of ADSL connections is easily swamped.
I have to cut my ATLAS work in half if I stand any chance of clearing my upload logjam.
If CERN wants to maximize BOINC work then they should see if they can reduce the size of the largest return files.

maeax
Joined: 2 May 07
Posts: 1622
Credit: 75,821,778
RAC: 228,878
Message 40659 - Posted: 25 Nov 2019, 17:56:31 UTC - in response to Message 40658.  

> If CERN wants to maximize BOINC work then they should see if they can reduce the size of the largest return files.

ATLAS is optimized; you can find some news about it in the ATLAS section of the message boards.
I have 30 MBits upload, and at the moment CERN connects at 1.5 MBits for ATLAS.
And this is OK: 3 min. for 240 MByte!

maeax
Message 40660 - Posted: 25 Nov 2019, 19:00:09 UTC - in response to Message 40659.  

The edit window for my previous message has expired, so here is an addition:
I just checked the upload speed on both setups:
1.5 MBits on Windows, 25 MBits in the native VM (I don't know what causes the difference).

Aurum
Message 40663 - Posted: 25 Nov 2019, 19:52:44 UTC - in response to Message 40659.  
Last modified: 25 Nov 2019, 19:54:20 UTC

>> If CERN wants to maximize BOINC work then they should see if they can reduce the size of the largest return files.
> ATLAS is optimized; you can find some news about it in the ATLAS section of the message boards.
I read the titles for the last 2 years and found nothing relevant. I'd be glad to read it if I knew what you suggest I read.
> I have 30 MBits upload, and at the moment CERN connects at 1.5 MBits for ATLAS.
> And this is OK: 3 min. for 240 MByte!
How many ATLAS WUs are you uploading at once??? I'm trying to UL a couple hundred from the same IP.
By "ATLAS is optimized" do you mean the file size is as small as it humanly can be and it can never get smaller, or something else???

Aurum
Message 40666 - Posted: 25 Nov 2019, 20:11:03 UTC

I wish I understood the impact of changing the number of CPUs in an ATLAS WU. Does an ATLAS WU run faster with more CPUs?
Since the return file size remains the same regardless of whether I have 1, 2, 3, or 16 CPUs in that WU I might as well just use 16c WUs and run only one at a time to reduce the number of files I have to upload.
Which means I'll get far too many 16c ATLAS WUs DLed and none to not enough nT and ST WUs.

Fixing Preferences so that one could specify maximum number of WUs each project would send to a computer (like WCG does) would give crunchers flexibility. ATLAS would have the added field #CPUs. E.g., if I set the limit for ATLAS to 3 WUs and I have 4 on my computer LHC@home would not send me another ATLAS WU until I got down to two WUs and then send just one.

maeax
Message 40669 - Posted: 25 Nov 2019, 20:55:22 UTC - in response to Message 40663.  

> By "ATLAS is optimized" do you mean the file size is as small as it humanly can be and it can never get smaller, or something else???

Yes.
You have to find your best configuration for Atlas yourself (CPU,Threads,RAM etc..) native or VM...

computezrmle
Message 40670 - Posted: 25 Nov 2019, 21:33:11 UTC - in response to Message 40666.  

> I wish I understood the impact of changing the number of CPUs in an ATLAS WU.

Each ATLAS task processes (currently) 200 events.

Phase 1
During this setup phase the app uses 1 core to extract the events from the input files and set up a global queue.
Sometimes less than 1 core while (lots of) downloads are in progress.

Phase 2
Then n worker threads are created and each worker picks an event from the queue to calculate a hit.
This continues until all events are processed.

Phase 3
Worst case at the end of phase 2 is that only 1 event is left and n-1 workers remain idle.
On average 50% of your cores will be idle during this phase.

Phase 4
Last phase (running on 1 core) is to collect the results from all workers and prepare the HITS file.


During all phases n cores remain allocated for this task from the perspective of your BOINC client and can't be used by other BOINC work.

Example:
If an 8 core host is running an 8-core ATLAS setup it will show 7 idle cores and only 1 running core during phases 1, 3 and 4.
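
The four phases above can be turned into a toy model that shows why more cores shorten the wall-clock time but lower the overall core utilization. This is only a sketch under stated assumptions: the function name atlas_task_model is hypothetical, every event is assumed to take the same time (so phases 2 and 3 collapse into full "rounds" of n events), and the durations for setup, per event, and merge are invented placeholders, not measured ATLAS values. Only the 200 events per task come from the description above.

```python
import math


def atlas_task_model(n_cores, events=200, setup_s=600.0, event_s=300.0, merge_s=300.0):
    """Return (wall_clock_seconds, core_efficiency) for one task.

    Phases 1 and 4 run on a single core; phases 2 and 3 process the events
    in rounds of n_cores events each, the last round possibly not full.
    """
    rounds = math.ceil(events / n_cores)
    wall = setup_s + rounds * event_s + merge_s
    # Core-seconds of real work, independent of the number of cores.
    busy = setup_s + events * event_s + merge_s
    # BOINC keeps n_cores allocated for the whole wall-clock time.
    efficiency = busy / (n_cores * wall)
    return wall, efficiency


for n in (1, 4, 8, 16):
    wall, eff = atlas_task_model(n)
    print(f"{n:2d} cores: {wall / 3600:5.2f} h wall clock, {eff:.0%} of allocated core time used")
```

With these placeholder numbers the efficiency drops as the core count grows, because the single-core phases and the partly filled last round weigh more heavily. Real tasks differ from batch to batch, so the numbers only illustrate the trend.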

Aurum
Message 40671 - Posted: 25 Nov 2019, 22:17:17 UTC

maeax, I don't think we're talking about the same dog's breakfast :-)

computezrmle, Thx! That's the behavior I've seen and now I know what's going on in the blackbox :-)
I think I'll go with:
<app_config>
<app>
    <name>ATLAS</name>
    <!-- Xeon E5-2699v4  22c44t  L3 Cache = 55 MB  -->
    <max_concurrent>2</max_concurrent>
</app>
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>6</avg_ncpus>
    <cmdline>--nthreads 6</cmdline>
</app_version>
</app_config>
and leave No Limit/No Limit with ST & nT checked for a few days to see how it shakes out.

Greger
Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 40817 - Posted: 6 Dec 2019, 17:02:14 UTC
Last modified: 6 Dec 2019, 17:30:54 UTC

I have a question about cernvm-fs. It seems to mount filesystems that are stored in /tmp/rootfs-xxxxx, each about 1.3 GB in size.
Are these filesystems reused for the next task, or are they never purged?

I ask because some hosts end up with a full disk. A host with a 250 GB disk, of which BOINC uses 20-30 GB, can fill up because the system disk, /tmp included, grows to the maximum and then the boinc-client crashes. There can be twice as many folders as tasks running concurrently, so either they are not reused properly or they are not purged after being unmounted.
I could not find anything in the documentation or the troubleshooting section about these rootfs folders, or whether commands such as reload, wipecache, or even a restart of autofs would handle them. The system itself purges them only at startup.

A reboot fixes it because the system then purges the folders, but should that really be necessary?

computezrmle
Message 40818 - Posted: 6 Dec 2019, 18:31:32 UTC - in response to Message 40817.  

Just to be sure.
Are you talking about the CVMFS client which is required for ATLAS native and Theory native?

If yes, then you may check CVMFS_CACHE_BASE and CVMFS_QUOTA_LIMIT in /etc/cvmfs/default.conf.

Default values are:
CVMFS_CACHE_BASE=/var/lib/cvmfs
CVMFS_QUOTA_LIMIT=4000

These settings create a boot persistent cache below /var/lib/cvmfs that does not use more than 4GB disk space.
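
To read these two values from a script, a very small parser is enough for simple KEY=VALUE lines. The sketch below is an assumption-laden illustration: parse_cvmfs_settings and cache_quota_gb are hypothetical helper names, the parser ignores any shell logic a real config file might contain, and the quota is treated as megabytes to match the 4000 = 4GB statement above.

```python
def parse_cvmfs_settings(conf_text: str) -> dict:
    """Collect CVMFS_* assignments from plain KEY=VALUE lines."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("CVMFS_"):
            settings[key] = value.strip().strip("'\"")
    return settings


def cache_quota_gb(settings: dict, default_mb: int = 4000) -> float:
    """Cache quota in GB, falling back to the default quoted in this thread."""
    return int(settings.get("CVMFS_QUOTA_LIMIT", default_mb)) / 1000
```

For example, passing the text of /etc/cvmfs/default.conf to parse_cvmfs_settings and the result to cache_quota_gb gives the upper bound of the persistent cache. The output of cvmfs_config showconfig remains the more reliable source for the values that are actually in effect.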

Greger
Message 40819 - Posted: 6 Dec 2019, 19:58:09 UTC - in response to Message 40818.  

Yes, the cernvm-fs client, not the server.

The cache uses the defaults: the config shows CVMFS_QUOTA_LIMIT=4000 and CVMFS_CACHE_BASE=/var/lib/cvmfs, which holds 37 MB in total, setup files included.
This looks fine to me. I checked with
cvmfs_config showconfig
which prints all lines with their parameters. None of these lines mention the filesystem that gets mounted during operation.

I tracked the problem down to /tmp on the local host, since that folder uses a lot of storage. The rootfs folders are used by ATLAS, and the issue is that these folders keep accumulating and are not wiped after a task completes. My guess is that the filesystems are either meant to be reused or they fail to get removed.

A host with a 250 GB disk that runs only BOINC has a hard limit of 100 GB, but in normal operation a mix of projects, including a few running ATLAS tasks, uses 20-30 GB. When a host runs for several weeks or months without a restart, it appears to end up with a full disk. The boinc-client then crashes because the system disk is full, even though the BOINC limit itself is fine. The CVMFS filesystem is not counted as BOINC data, so it keeps growing until the system disk is full. The system does not clean up these rootfs folders, perhaps because it does not get the information it needs to do so.

computezrmle
Message 40825 - Posted: 7 Dec 2019, 9:08:25 UTC - in response to Message 40819.  

Sorry.
It looks like I don't really understand your problem.
Especially what data trashes your filesystem.
Could you post a few examples, e.g. pathnames and filenames, of those data?

Use of /tmp by CVMFS doesn't make sense if this path is not mentioned in the configuration.
Hence the question is which process writes to /tmp.
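
One way to collect the pathnames and sizes asked for here is a short script that lists the rootfs folders under /tmp. The Python sketch below is illustrative only: the helper names dir_size_bytes and rootfs_usage are made up, the rootfs-* pattern is taken from the folder names reported earlier in this thread, and the sizes are apparent file sizes rather than allocated disk blocks. It does not identify which process created the folders.

```python
import os
from pathlib import Path


def dir_size_bytes(path) -> int:
    """Sum the apparent sizes of all files below path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                pass
    return total


def rootfs_usage(base="/tmp", pattern="rootfs-*") -> dict:
    """Map each matching directory under base to its size in bytes."""
    usage = {}
    for entry in sorted(Path(base).glob(pattern)):
        if entry.is_dir():
            usage[str(entry)] = dir_size_bytes(entry)
    return usage
```

Calling rootfs_usage() with the defaults and comparing the number of entries with the number of running ATLAS tasks would show whether folders are left behind. Reading folders owned by other users may require elevated privileges.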

lazlo_vii
Joined: 20 Nov 19
Posts: 21
Credit: 1,074,330
RAC: 0
Message 40829 - Posted: 7 Dec 2019, 11:28:57 UTC - in response to Message 40819.  

> Yes, the cernvm-fs client, not the server.
>
> The cache uses the defaults: the config shows CVMFS_QUOTA_LIMIT=4000 and CVMFS_CACHE_BASE=/var/lib/cvmfs, which holds 37 MB in total, setup files included.
> This looks fine to me. I checked with
> cvmfs_config showconfig
> which prints all lines with their parameters. None of these lines mention the filesystem that gets mounted during operation.
>
> I tracked the problem down to /tmp on the local host, since that folder uses a lot of storage. The rootfs folders are used by ATLAS, and the issue is that these folders keep accumulating and are not wiped after a task completes. My guess is that the filesystems are either meant to be reused or they fail to get removed.
>
> A host with a 250 GB disk that runs only BOINC has a hard limit of 100 GB, but in normal operation a mix of projects, including a few running ATLAS tasks, uses 20-30 GB. When a host runs for several weeks or months without a restart, it appears to end up with a full disk. The boinc-client then crashes because the system disk is full, even though the BOINC limit itself is fine. The CVMFS filesystem is not counted as BOINC data, so it keeps growing until the system disk is full. The system does not clean up these rootfs folders, perhaps because it does not get the information it needs to do so.

If I understand what CVMFS does, it just mounts a remote filesystem locally, like an advanced version of NFS exporting / of the host. Because it is a remote filesystem, it isn't writing anything in /cvmfs to your local system. If the files are written anywhere other than /cvmfs, they would take up space on the local system. In that case, if you are running a distro that uses systemd, check:

man systemd-tmpfiles
man tmpfiles.d


As per https://askubuntu.com/questions/1086034/which-process-cleans-tmp-under-systemd-on-18-04lts-answered-here-no-duplicat
©2022 CERN