Message boards : Theory Application : New Native Theory Version 1.1
Joined: 6 Sep 08 Posts: 118 Credit: 12,657,568 RAC: 4,021
Not sure if I'm making progress or not... I found and installed libseccomp 2.4.1 (the regular way, from an lxc PPA), which produced this:

```
18:17:01 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:17:02 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
container_linux.go:336: starting container process caused "process_linux.go:293: applying cgroup configuration for process caused \"mountpoint for devices not found\""
18:17:02 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:17:02 (3846): cranky exited; CPU time 0.005645
18:17:02 (3846): app exit status: 0xce
18:17:02 (3846): called boinc_finish(195)
18:17:03 BST +01:00 2019-08-03: cranky-0.0.29: [INFO]
```

I found this https://stackoverflow.com/questions/22555264/docker-hello-world-not-working/22555932#22555932 and this https://github.com/opencontainers/runc/issues/798 so I installed cgroup-lite 1.11, with this result:

```
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Checking runc.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Creating the filesystem.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
18:34:48 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] ===> [runRivet] Sat Aug 3 17:34:48 UTC 2019 [boinc pp mb-inelastic 7000 - - phojet 1.12a default 100000 84]
18:34:53 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:34:54 (4451): cranky exited; CPU time 0.044983
18:34:54 (4451): app exit status: 0xce
18:34:54 (4451): called boinc_finish(195)
```

All subsequent tasks end like this:

```
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Checking runc.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Creating the filesystem.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
standard_init_linux.go:203: exec user process caused "too many levels of symbolic links"
18:45:44 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:45:44 (4733): cranky exited; CPU time 0.014341
18:45:44 (4733): app exit status: 0xce
18:45:44 (4733): called boinc_finish(195)
18:45:44 BST +01:00 2019-08-03: cranky-0.0.29: [INFO]
```

I haven't explicitly added any symlinks. Now I'm stuck.
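For anyone hitting the same "mountpoint for devices not found" error, a few generic checks show whether the devices cgroup controller is actually mounted. This is only a sketch, assuming a cgroup-v1 system; mount points can differ between distributions:

```
# List the controllers the kernel knows about; 'devices' should appear here
grep devices /proc/cgroups

# Check whether a cgroup filesystem is mounted for the devices controller
mount | grep -E 'cgroup.*devices'

# If it is missing, mounting it by hand is one possible workaround (cgroup v1 only)
sudo mkdir -p /sys/fs/cgroup/devices
sudo mount -t cgroup -o devices cgroup /sys/fs/cgroup/devices
```

Installing a package such as cgroup-lite (as above) is supposed to do the equivalent automatically at boot.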
Joined: 2 May 07 Posts: 2258 Credit: 174,552,610 RAC: 33,626
The third run will need a lot of time to finish this task:

```
[boinc pp jets 13000 250,-,4160 - sherpa 2.2.0 default 2000 72]
```

The first computer canceled after 4 days, the second after 26 days: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=117408514
Joined: 2 May 07 Posts: 2258 Credit: 174,552,610 RAC: 33,626
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=120079734

```
05:14:23 CEST +02:00 2019-08-09: cranky-0.0.29: [INFO] ===> [runRivet] Fri Aug 9 03:14:22 UTC 2019 [boinc pp jets 8000 25 - pythia6 6.423 pro-q2o 100000 86]
06:07:08 CEST +02:00 2019-08-09: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
```

Both ran into status code 1.
Joined: 2 May 07 Posts: 2258 Credit: 174,552,610 RAC: 33,626
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=122039899

```
[boinc pp jets 8000 25 - pythia8 8.180 default 100000 90]
16:07:04 UTC +00:00 2019-08-19: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
```

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=121978893
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0
Hi everyone! Theory Native is single core only, right? Thanks!!

Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us
Joined: 14 Jan 10 Posts: 1439 Credit: 9,629,765 RAC: 2,577
> Theory Native is single core only, right?

Yes it is, but when you have idle CPU time left it will use some extra CPU time to speed up the job. Example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=244323423
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
> Theory Native is single core only, right?

It used to use two cores, but seems to be one now.
Joined: 15 Jun 08 Posts: 2571 Credit: 258,922,359 RAC: 119,059
> Theory Native is single core only, right?

From the perspective of your BOINC client: YES

But! A typical Theory native pstree looks like this:

```
cranky-0.0.29───runc─┬─job───runRivet.sh─┬─rivetvm.exe
                     │                   ├─runRivet.sh───sleep
                     │                   ├─rungen.sh───pythia8.exe
                     │                   └─sleep
                     └─8*[{runc}]
```

In this example the 2 main processes are rivetvm.exe and pythia8.exe. Their total CPU share is usually greater than 1.
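To reproduce such a view on your own host, something along these lines should work (a sketch; it assumes the standard pstree and pgrep tools are installed and that one Theory native task is running):

```
# Show the process tree of the oldest process whose command line matches 'cranky'
pstree -p "$(pgrep -of cranky)"
```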
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
> In this example the 2 main processes are rivetvm.exe and pythia8.exe.

I have noticed that on native ATLAS even more so. You can set whatever you want in an app_config.xml, but it will use all your cores somehow. That can make it difficult to run alongside other projects, by the way. I think it is best to devote a machine to it.
Joined: 15 Jun 08 Posts: 2571 Credit: 258,922,359 RAC: 119,059
IIRC ATLAS native requires the nthreads option to be set, e.g.:

```
<avg_ncpus>2.0</avg_ncpus>
<cmdline>--nthreads 2</cmdline>
```

VBox apps do not need "nthreads" if <avg_ncpus> is already set.
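For reference, a minimal complete app_config.xml around those two lines might look like the sketch below. The app name and plan class are assumptions here; check the entries in your client_state.xml for the exact values:

```
<app_config>
    <app_version>
        <app_name>ATLAS</app_name>          <!-- assumed; verify against client_state.xml -->
        <plan_class>native_mt</plan_class>  <!-- assumed plan class of the native app -->
        <avg_ncpus>2.0</avg_ncpus>          <!-- cores BOINC budgets for each task -->
        <cmdline>--nthreads 2</cmdline>     <!-- threads the ATLAS job itself starts -->
    </app_version>
</app_config>
```

The file belongs in the project's directory under the BOINC data directory and is picked up via "Options → Read config files" or a client restart.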
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
> IIRC ATLAS native requires the nthreads option to be set, e.g.

The last time I did native ATLAS before putting in an app_config, it just used all the cores available. That was eight cores (as shown by BOINC), which maybe is all that ATLAS will use anyway?
Joined: 13 Jul 05 Posts: 169 Credit: 15,000,737 RAC: 0
> The last time I did native ATLAS before putting in an app_config, it just used all the cores available.

But did you actually apply any limit? I've never seen ATLAS native disobey the "Max # CPUs" set through the website (which I believe translates into said --nthreads, for those of us who don't mess with XML), and yes, I've tried at least values of 1, 2, 4 and "No limit"... IIRC the ATLAS limit is 12 cores, but that's from a while back. Even at 8 cores there's noticeable inefficiency from the single-threaded start-up/shutdown phases.
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
> I've never seen ATLAS native disobey the "Max # CPUs" set through the website (which I believe translates into said --nthreads, for those of us who don't mess with XML), and yes, I've tried at least values of 1, 2, 4 and "No limit"...

BOINC will show however many cores you set in the app_config. But run a "top" command and you will see that all the cores are being used regardless of what you have set (at least up to 8, the maximum I tried). (I normally just set "Max # CPUs" to either 8 or unlimited, so I don't know how a lesser value would affect it; I use the app_config to limit it further.)
Joined: 13 Jul 05 Posts: 169 Credit: 15,000,737 RAC: 0
But run a "top" command, and you will see that all the cores are being used regardless of what you have set (at least up to 8, the maximum I tried). I think I misunderstood what you were trying to say (and then expressed myself badly). Long ago, in Sixtrack-only days, "Max # CPUs" meant roughly "Max. total cores to be used for BOINCing, leaving the rest free for the user's day job." VBox and Atlas-native tasks instead interpret it to mean "Cores/task", so you're right that that setting can't be used to control overall load on the machine any more. I presently still have "Max # CPUs" set to 4, to get the 8-core machine to refill with Atlas efficiently after running Sixtracks; top gives 7587 boinc 39 19 2618m 1.8g 11m R 99.8 11.4 263:33.42 athena.py 7589 boinc 39 19 2602m 1.8g 11m R 99.5 11.5 263:42.48 athena.py 32249 boinc 39 19 2597m 1.8g 44m R 99.1 11.3 1:15.13 athena.py 7586 boinc 39 19 2601m 1.8g 12m R 98.8 11.4 263:40.89 athena.py 7588 boinc 39 19 2602m 1.8g 11m R 97.8 11.4 263:19.11 athena.py 32250 boinc 39 19 2597m 1.8g 44m R 96.5 11.3 1:15.65 athena.py 32248 boinc 39 19 2597m 1.8g 44m R 95.2 11.3 1:16.12 athena.py 32251 boinc 39 19 2597m 1.8g 44m R 95.2 11.3 1:14.79 athena.py i.e. there are two 4-core tasks, one just started and one nearly 5 hours in. (A couple of weeks back there'd have been one 4-core task and some Sixtracks). So, indeed I've never seen Atlas-native (or indeed VBox) disobey the "Max # CPUs" set through the website, with the proviso that "Max # CPUs" there actually means "Cores/task", rather than what it used to/should do. The next step would be to then limit the number of tasks running at any one time, which is probably best done through an app_config; I can't remember if I've ever tried setting "Max # jobs" to just one, and anyway Atlas-native has, er, idiosyncratic ideas about how many tasks it queues at the client for strange beancounting purposes. (I read your The last time I did native ATLAS before putting in an app_config, it just used all the cores available.as implying that an Atlas-native task would grab any extra unused cores it sees at runtime irrespective of --nthreads, however that's been set, which I don't believe is true.) |
Joined: 28 Sep 04 Posts: 739 Credit: 50,843,250 RAC: 40,550
> So, indeed I've never seen ATLAS native (or indeed VBox) disobey the "Max # CPUs" set through the website, with the proviso that "Max # CPUs" there actually means "cores per task", rather than what it used to/should do.

This "Max # CPUs" is in the project settings. The one you are probably confusing it with is still in the BOINC settings and is called "Use at most XX % of the CPUs".
Joined: 15 Jun 08 Posts: 2571 Credit: 258,922,359 RAC: 119,059
Could the ATLAS discussion please be continued in the ATLAS thread?

> Long ago, in Sixtrack-only days, "Max # CPUs" meant roughly "Max. total cores to be used for BOINCing, leaving the rest free for the user's day job."

The option "Max # CPUs" was never meant to limit the total cores for BOINC, since BOINC has an option for that which can be used in app_config.xml. Instead, "Max # CPUs" was introduced to set exactly what it is used for today.

Unfortunately ATLAS uses "Max # CPUs" to also limit the number of tasks that can be downloaded - a request from (unknown) accountants, as David Cameron explained a long while ago. This results in client buffers that are only partly filled, especially on CPUs with lots of cores.
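The app_config.xml option alluded to is presumably project_max_concurrent, which caps how many tasks of one project run at the same time (a sketch; the value 4 is just an example):

```
<app_config>
    <!-- run at most 4 tasks from this project concurrently, regardless of app -->
    <project_max_concurrent>4</project_max_concurrent>
</app_config>
```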
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
> (I read your "The last time I did native ATLAS before putting in an app_config, it just used all the cores available." as implying that an ATLAS native task would grab any extra unused cores it sees at runtime irrespective of --nthreads, however that's been set, which I don't believe is true.)

I have not used --nthreads for a while and don't recall how it behaves, so you are probably correct. I just use <avg_ncpus>, which uses all the (virtual) cores, i.e. threads, on native ATLAS. I don't think it does that on native Theory, but I am not running it at the moment.
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0
A couple of months ago I got fed up with a series of blue-screen-of-death loops on an old 2-core Athlon that wasn't happy with a Windows update, so I completely reformatted it and put Linux Mint on instead. After a few failed attempts, I have got it successfully running Theory Native.

I'm not entirely convinced that it's doing exactly as it should, as during setup it didn't seem to like cvmfs_config and autofs, although "probe" returned OKs, and some finished tasks look too quick to have done much (unless this Native is waaay faster than on VBox/Windows). However, it is returning McPlots, so it must be working OK.

It's running an ordinary VBox Theory, where "Show VM Console" and "Show Graphics" let me see various outputs. My Native task doesn't have those buttons, although I can see it is currently running a Herwig through "top" in a terminal. Can I get to those remote-desktop live outputs (events processed, etc.) without being too Linuxy? (I'm not a convert yet. I still prefer point-and-click to Terminal.)
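For anyone wanting to repeat the CVMFS sanity checks mentioned above, both commands ship with the cvmfs client package (the repository name is the one Theory native mounts, per the logs earlier in this thread):

```
# Should report OK for every configured repository
cvmfs_config probe

# Show cache and network details for the Theory repository
cvmfs_config stat cernvm-prod.cern.ch
```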
Joined: 15 Jun 08 Posts: 2571 Credit: 258,922,359 RAC: 119,059
Theory native doesn't have a point-and-click interface to your BOINC client. To monitor the progress of your running task, like in console 2 of a VBox task, you may do the following:

Open a console window (either directly at the Linux host or remotely from another computer). In this console window run the command:

```
tail -Fn100 /path_to_your_boinc_client/slots/x/cernvm/shared/runRivet.log
```

x must be replaced by the slot number of your running task.
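If you don't know the slot number, letting the shell glob over all slots also works. The path below assumes the default Debian/Ubuntu BOINC data directory; adjust it to your installation:

```
# Follow the Theory log in whichever slot(s) it appears
tail -Fn100 /var/lib/boinc-client/slots/*/cernvm/shared/runRivet.log
```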
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0
OK, thanks, I thought it might be fiddly 😵

I don't think I'll be setting up the suspend/resume stuff either. Well, not until I have the time to get it wrong a few times.