Issues Native Theory application
Author | Message |
---|---|
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
[quote]I avoid sixtrack. It is too easy, and requires no special software. Anyone can run it, so I let them.[/quote] Good point. It illustrates the lost opportunity cost concept very well... when you crunch sixtrack you lose the opportunity to crunch a task that many other volunteers cannot crunch. |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
[quote]Good point. It illustrates the lost opportunity cost concept very well... when you crunch sixtrack you lose the opportunity to crunch a task that many other volunteers cannot crunch.[/quote] Good economics. I try to optimize the total return, while having fun at the same time (that is an economic benefit also). |
Joined: 14 Jan 10 Posts: 1439 Credit: 9,630,548 RAC: 2,231 |
It rarely happens, but sometimes an error between all the valids. After 1.5 hours runtime: Exit status 195 (0x000000C3) EXIT_CHILD_FAILED https://lhcathome.cern.ch/lhcathome/result.php?resultid=220279871 |
Joined: 14 Jan 10 Posts: 1439 Credit: 9,630,548 RAC: 2,231 |
Another error: https://lhcathome.cern.ch/lhcathome/result.php?resultid=220489764
Exit status - 195 (0x000000C3) EXIT_CHILD_FAILED

Job description:
===> [runRivet] Wed Apr 3 08:48:44 UTC 2019 [boinc pp jets 8000 25 - pythia8 8.230 tune-monashstar 100000 38]

In BOINC's Event log:
Wed 03 Apr 2019 12:19:44 PM CEST | LHC@home | Computation for task TheoryN_2279-789428-38_0 finished
Wed 03 Apr 2019 12:19:44 PM CEST | LHC@home | Output file TheoryN_2279-789428-38_0_r937641198_result for task TheoryN_2279-789428-38_0 absent

Last lines of the runRivet.log after 100000 events processed:
Generator run finished successfully
100000 events processed
dumping histograms...
Rivet.Analysis.Handler: INFO Finalising analyses
terminate called after throwing an instance of 'YODA::LowStatsError'
what(): Requested variance of a distribution with only one effective entry
./runRivet.sh: line 376: 263 Aborted (core dumped) $rivetExecString (wd: /shared/tmp/tmp.jIgxbeAbd0)
INFO: waiting for jobs completion timeout=49
[1] 262 Done env $origEnv $generatorExecString
[3]+ 264 Running display_service $tmpd_dump "$beam $process $energy $params $generator $version $tune" &
Processing histograms...
input = /shared/tmp/tmp.jIgxbeAbd0/flat
output = /shared
./runRivet.sh: line 850: 264 Killed display_service $tmpd_dump "$beam $process $energy $params $generator $version $tune" (wd: /shared)
ERROR: following histograms should be produced according to run parameters, but missing from Rivet output:
ATLAS_2015_I1393758_d01-x01-y01
ATLAS_2015_I1393758_d02-x01-y01
ATLAS_2015_I1393758_d03-x01-y01
ATLAS_2015_I1393758_d04-x01-y01
ATLAS_2015_I1393758_d05-x01-y01
ATLAS_2015_I1393758_d06-x01-y01
ATLAS_2015_I1393758_d07-x01-y01
ATLAS_2015_I1393758_d08-x01-y01
ATLAS_2015_I1393758_d09-x01-y01
ATLAS_2015_I1393758_d10-x01-y01
ATLAS_2015_I1393758_d11-x01-y01
ATLAS_2015_I1393758_d12-x01-y01
ATLAS_2016_I1419070_d01-x01-y01
ATLAS_2016_I1419070_d02-x01-y01
ATLAS_2016_I1419070_d03-x01-y01
ATLAS_2016_I1419070_d04-x01-y01
ATLAS_2016_I1419070_d05-x01-y01
ATLAS_2016_I1419070_d06-x01-y01
ATLAS_2016_I1419070_d07-x01-y01
ATLAS_2016_I1419070_d08-x01-y01
ATLAS_2016_I1419070_d09-x01-y01
ATLAS_2016_I1419070_d10-x01-y01
ATLAS_2016_I1419070_d11-x01-y01
ATLAS_2016_I1419070_d12-x01-y01
check mapping of above histograms in configuration file: configuration/rivet-histograms.map |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
[quote]It rarely happens, but sometimes an error between all the valids.[/quote] Same here. So far, 3 out of ~100 have failed with the same error as mentioned above: https://lhcathome.cern.ch/lhcathome/result.php?resultid=221596097 https://lhcathome.cern.ch/lhcathome/result.php?resultid=221500071 https://lhcathome.cern.ch/lhcathome/result.php?resultid=221484725 Any idea why that happens? |
Joined: 14 Jan 10 Posts: 1439 Credit: 9,630,548 RAC: 2,231 |
[quote]Any idea why that happens?[/quote] From the previous post: a mismatch between the run parameters and the Rivet output. The project has to solve this. I'm not sure whether your errors have the same cause. |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
FWIW, I have always wondered whether you could run VBox and Native work units on the same machine, if you used two separate BOINC instances. It turns out that you can, if you put the VBox ones in the original BOINC instance (it does not run in the second one, at least under Ubuntu 16.04.6). I am running CMS in the first BOINC instance, and Native Theory in the second, with four cores each on an i7-4790. My ultimate goal is to remove VirtualBox entirely, and run native ATLAS in the first instance. That way, if Native Theory hangs up, I can at least limit the number of cores affected (until bronco comes up with a fix). |
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
[quote]That way, if Native Theory hangs up, I can at least limit the number of cores affected (until bronco comes up with a fix).[/quote]

What do you mean by "if Native Theory hangs up"? If you mean the problem where the task runs into the deadline and doesn't stop, the latest version of my watchdog handles that by aborting the task 1 hour before deadline. It would be nice if there was a way to do a graceful shutdown but native Theory doesn't have that facility. Aborting the task isn't really what most volunteers will regard as a solution but it's better than just letting the task run until the server cancels it (which the server doesn't seem to be doing ATM).

The only other problem I have noticed with native Theory is tasks ending with the 195 EXIT_CHILD_FAILED error which I don't understand and don't have a way to handle, yet. The 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED problem doesn't seem to affect native Theory which is fortunate because I see no way for a watchdog running on the user's account to detect the condition.

So I think my watchdog does everything it can possibly do for both native and VBox Theory. Unless somebody has any further suggestions, I believe it's ready for beta test :) |
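The abort-one-hour-before-deadline rule from the post above could be sketched roughly like this. It is not the actual watchdog code: the name `should_abort`, the `ABORT_MARGIN` constant and passing the deadline in as a `datetime` are assumptions for illustration. How the deadline is read from BOINC and how the abort is issued are deliberately left out.

```python
from datetime import datetime, timedelta, timezone

# Assumed margin, taken from the post: abort 1 hour before the deadline.
ABORT_MARGIN = timedelta(hours=1)


def should_abort(deadline, now=None, margin=ABORT_MARGIN):
    """Return True once `now` is within `margin` of the task's deadline."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= deadline - margin


if __name__ == "__main__":
    # Hypothetical deadline, only for demonstration.
    deadline = datetime(2019, 4, 10, 12, 0, tzinfo=timezone.utc)
    early = datetime(2019, 4, 10, 10, 30, tzinfo=timezone.utc)
    late = datetime(2019, 4, 10, 11, 0, tzinfo=timezone.utc)
    print(should_abort(deadline, now=early))
    print(should_abort(deadline, now=late))
```

Keeping the decision in a small pure function makes it easy to test without a running BOINC client.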
Joined: 14 Jan 10 Posts: 1439 Credit: 9,630,548 RAC: 2,231 |
[quote]The 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED problem doesn't seem to affect native Theory which is fortunate because I see no way for a watchdog running on the user's account to detect the condition.[/quote] I have had a native Theory with that condition and reported that Feb 19th at the dev-project. Extracted: In BOINC Manager: Aborting task Theory_2279-790023-18_2: exceeded disk limit: 3038.16MB > 1907.35MB runRivet.log 3184721191 bytes. Job: [boinc pp jets 8000 250,-,4160 - sherpa 1.2.3 default 31000 18] |
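The figures in that message can be cross-checked with a little arithmetic. The sketch below assumes that "MB" in the BOINC message means 1024 × 1024 bytes, and that the disk bound was 2,000,000,000 bytes. That value is only inferred because it reproduces the reported 1907.35MB. It was not read from the task itself.

```python
MIB = 1024 * 1024  # assumption: BOINC's "MB" is 2**20 bytes


def to_mb(n_bytes):
    """Convert a byte count to MB under the assumption above."""
    return n_bytes / MIB


log_bytes = 3_184_721_191      # size of runRivet.log quoted in the post
assumed_bound = 2_000_000_000  # hypothetical bound that yields 1907.35 MB

print(f"runRivet.log : {to_mb(log_bytes):.2f} MB")
print(f"assumed bound: {to_mb(assumed_bound):.2f} MB")
print(f"log alone exceeds bound: {log_bytes > assumed_bound}")
```

Under those assumptions the log file by itself comes to roughly 3037 MB of the 3038.16MB reported, so nearly all of the overshoot is runRivet.log.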
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
[quote][quote]The 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED problem doesn't seem to affect native Theory which is fortunate because I see no way for a watchdog running on the user's account to detect the condition.[/quote] I have had a native Theory with that condition and reported that Feb 19th at the dev-project.[/quote]

Hmm. Don't know if my watchdog can detect that. Or perhaps I should say I don't know how to make it detect that.

For Theory VBox it's easy. The script simply recursively walks the directory rooted at the task's slot dir and sums the sizes of all the files it finds. Running the script as root (or making the user a member of the boinc group) ensures the script has read permission for all pathnames encountered. It encounters < 100 files.

For native Theory it's not so easy. Walking the slot folder causes thousands of no-read-permission exceptions, which of course are trapped and handled in the script. The problem is that it finds either:
1) thousands of files whose sizes total ~10 X <rsc_disk_bound>, which triggers a task abort
2) just a few files that never total more than 0.01 X <rsc_disk_bound>

Sometimes it just hangs on certain paths as if it's waiting for a response from the OS's stat function. Sometimes the response comes, sometimes not, in which case the script hangs forever. I assume the problem walking the slot dir is because native Theory runs in a runc container owned by user boinc-client. Sometimes the walk recurses into directories that appear to belong to CVMFS and that seems to be where it throws exceptions or hangs. |
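A walk of the kind described might look like the sketch below. It is not the watchdog's actual code, and `dir_size_bytes` is a hypothetical name. Not following symlinks, counting only regular files, and refusing to descend into directories on another filesystem (so that mounts such as CVMFS are left alone) are assumptions about what might help. Nothing in the post confirms that they would cure the hangs.

```python
import os
import stat
import tempfile


def dir_size_bytes(root):
    """Sum regular-file sizes under root; return (total_bytes, skipped_paths)."""
    total = 0
    skipped = 0
    root_dev = os.lstat(root).st_dev

    def on_error(err):
        # os.walk calls this when a directory cannot be listed
        nonlocal skipped
        skipped += 1

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error,
                                                followlinks=False):
        keep = []
        for d in dirnames:
            try:
                # stay on the filesystem the slot dir lives on
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                    keep.append(d)
            except OSError:
                skipped += 1
        dirnames[:] = keep  # prune in place so os.walk does not descend

        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                skipped += 1
                continue
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total, skipped


if __name__ == "__main__":
    # Small self-check on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        with open(os.path.join(tmp, "a.log"), "wb") as f:
            f.write(b"x" * 10)
        with open(os.path.join(tmp, "sub", "b.log"), "wb") as f:
            f.write(b"y" * 5)
        print(dir_size_bytes(tmp))
```

Returning the number of skipped paths alongside the total lets the caller judge how far the sum can be trusted when many entries were unreadable.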
Joined: 15 Jun 08 Posts: 2572 Credit: 259,073,336 RAC: 110,042 |
Ever heard of "du"? :-) |
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Nope. Apparently neither did Bill Gates until he bought Sysinternals. Thanks for that :-)) |
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
Is there no way to fix this?

/cvmfs/grid.cern.ch/vc/containers/runc: symbol lookup error: /cvmfs/grid.cern.ch/vc/containers/runc: undefined symbol: seccomp_version
https://lhcathome.cern.ch/lhcathome/result.php?resultid=254625750

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.6 LTS
Release: 14.04
Codename: trusty

uname -r
4.4.0-142-generic

ldd /cvmfs/grid.cern.ch/vc/containers/runc
linux-vdso.so.1 => (0x00007fffd3974000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f726f6ea000)
libseccomp.so.2 => /usr/lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007f726f4ce000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f726f2ca000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f726ef01000)
/lib64/ld-linux-x86-64.so.2 (0x00007f727026a000) |
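One way to check whether the installed libseccomp exports the symbol that runc complains about is to try resolving it with `ctypes`. This is only a sketch: `has_symbol` is a hypothetical helper, and the library name `libseccomp.so.2` is taken from the `ldd` output above.

```python
import ctypes


def has_symbol(library, symbol):
    """Return True/False if the library loads, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    # ctypes looks the symbol up on attribute access and raises
    # AttributeError when it is missing, which hasattr() turns into False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    print(has_symbol("libseccomp.so.2", "seccomp_version"))
```

A result of `False` would be consistent with the "undefined symbol" error, while `None` only says the library could not be loaded from the default search path.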
Joined: 6 Sep 08 Posts: 118 Credit: 12,700,897 RAC: 7,265 |
From this post you may find that you need a more recent version of libseccomp than the one that comes with Ubuntu 14.04. If you can, move to 16.04. That one works. |
Joined: 15 Jun 08 Posts: 2572 Credit: 259,073,336 RAC: 110,042 |
https://en.wikipedia.org/wiki/Ubuntu_version_history#1404 "Normal LTS support is set to continue until 25 April 2019" Hence you may want to upgrade your OS. |
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
I have Xubuntu 18.04 on another partition, but all my work stuff is still on Xubuntu 14.04. LHC runs just fine here (Xubuntu 18.04). ;) Thank you guys, I'm going to try those commands when I get back there. Here is another problem on Xubuntu 14.04: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5169&postid=40843#40843 :( |