Very short tasks
Author | Message |
---|---|
Joined: 14 Jan 10 Posts: 1446 Credit: 9,708,961 RAC: 766 |
In your example result, the science application running inside the VM failed with exit code 1. That sometimes happens. But in your task list I see a lot of "EXIT_ABORTED_BY_CLIENT" errors with the message "Process still present 5 min after writing finish file; aborting". This often indicates an overloaded system. |
Joined: 14 Jan 10 Posts: 1446 Credit: 9,708,961 RAC: 766 |
"ATLAS and SixTrack tasks run without any problems. Is there anything I can do to 'relieve' the system to run Theory tasks? Maybe assign fewer CPU cores to the LHC project?" You could reduce the number of CPUs in use by BOINC to 87.5% (2 cores free). I suppose you also have a task running on your GPU. Another way is to use an app_config.xml to limit the number of tasks LHC runs at once. This file should be placed in the LHC project folder and reread via BOINC Manager's option "Read config files". Example app_config.xml: <app_config> <project_max_concurrent>8</project_max_concurrent> <app> <name>ATLAS</name> <max_concurrent>2</max_concurrent> <fraction_done_exact/> </app> <app> <name>CMS</name> <max_concurrent>1</max_concurrent> <fraction_done_exact/> </app> <app> <name>sixtrack</name> <max_concurrent>16</max_concurrent> <fraction_done_exact/> </app> <app> <name>sixtracktest</name> <max_concurrent>16</max_concurrent> <fraction_done_exact/> </app> <app> <name>Theory</name> <max_concurrent>6</max_concurrent> </app> <app_version> <app_name>ATLAS</app_name> <plan_class>vbox64_mt_mcore_atlas</plan_class> <avg_ncpus>4.000000</avg_ncpus> <cmdline>--memory_size_mb 6600</cmdline> </app_version> <app_version> <app_name>CMS</app_name> <plan_class>vbox64</plan_class> <avg_ncpus>1.000000</avg_ncpus> <cmdline>--memory_size_mb 2048 --nthreads 1</cmdline> </app_version> <app_version> <app_name>Theory</app_name> <plan_class>vbox64_theory</plan_class> <avg_ncpus>1.000000</avg_ncpus> <cmdline>--memory_size_mb 730 --nthreads 1</cmdline> </app_version> </app_config> |
Joined: 26 Nov 10 Posts: 11 Credit: 1,435,923 RAC: 0 |
Hi Christoph, The detailed job log indicates that the job failure is due to a network connectivity problem: the machine is not able to download some of the files from the CVMFS network file system for job execution. I am not sure what the root cause is, but it could be a weak network connection or the firewall configuration. FYI: - the performance statistics for this machine: http://mcplots-dev.cern.ch/production.php?view=user&system=3&userid=608791#10646229 - relevant part of the detailed log (IO error): $ cat pool/failed/2390/2390-1140084-3.tgz.log ===> [runRivet] Wed May 13 19:25:52 UTC 2020 [boinc pp mb-nsd 2360 - - pythia6 6.428 392 100000 3] ... Building rivetvm ... make: Entering directory `/shared/rivetvm' ... /cvmfs/sft.cern.ch/lcg/releases/fjcontrib/1.041-66c72/x86_64-slc6-gcc8-opt/lib/libfastjetcontribfragile.so: file not recognized: Input/output error collect2: error: ld returned 1 exit status make: *** [rivetvm.exe] Error 1 make: Leaving directory `/shared/rivetvm' ERROR: fail to compile rivetvm |
Joined: 14 Jan 10 Posts: 1446 Credit: 9,708,961 RAC: 766 |
Great, Anton, that you're looking into it in detail. The strange thing is that during VM startup "Probing /cvmfs/sft.cern.ch" is OK. |
Joined: 24 Oct 04 Posts: 1195 Credit: 61,944,750 RAC: 82,676 |
https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project Christoph, you can put one of your PCs in a separate group (work, home, school) and then set that group to not get Theory tasks. As for your connection, if you want to take a quick look, open the VM Console as a new task starts running; within the first 3 minutes you will see whether it makes it to *runRivet*. With a slow connection it will just end up like this: (screenshot) instead of what you want, like this: (screenshot) VB tasks here usually need a download/upload speed of 1.5 Mbps or better, or they will fail most of the time. |
Joined: 15 Jun 08 Posts: 2628 Credit: 267,218,631 RAC: 128,928 |
Some ideas to check. It might be unlikely, but the Theory vdi file could be corrupt for some reason. If so, shut down the BOINC client whenever work allows, then remove the file. It will be downloaded automatically when you restart BOINC. Is the computer connected via Wi-Fi? That is not recommended, as lots of data has to be transferred regularly. What about the connection to your ISP? Download bandwidth? Upload bandwidth? Typical latency? It also might be helpful if you make your computers visible to others: https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project This avoids going via mcplots. |
Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
The value in <max_concurrent> needs to be 1 or higher. 0 is not accepted as a value, so a line such as <max_concurrent>0</max_concurrent> would be ignored. |
Joined: 24 Oct 04 Posts: 1195 Credit: 61,944,750 RAC: 82,676 |
"I put the machine in a separate group and deactivated Theory application for this group." You're welcome, Christoph. Yes, you will find that a slow internet connection can be the biggest problem with VB tasks. That can happen with any kind of ISP, especially if they throttle your speed after you use a certain amount of data under your contract (I have been doing this with VB tasks for over 9 years, first on DSL and now on satellite). In my case they throttle my speed once my monthly total is used, and since I run VB tasks mine is used up in the first 6 hours. So I have to watch closely, run *speed tests*, and check my Windows 10 Task Manager to see how fast my connection is before I try to start any. Even at full speed (up to 30 Mbps) I have to be careful not to start lots of these VB tasks at once and instead start a few at a time. Another tip: keep the VM Console open right after you start a new task so you can catch any of the typical FAIL warnings (there is one you can just ignore). On page 2 of the VM Console you will see a timer that gives it 1 min 40 secs to finish that part before it goes to page 3 and checks CVMFS (the file system); if it doesn't finish within the 1 min 40 secs, the next page is where you will see that it failed. Another problem is that these tasks can fail to reach runRivet yet keep running for hours, and they just end up as Invalid/computer error tasks. If you catch them when they start, you can abort them and not waste hours of your time. Here is a snapshot of that part: (screenshot) I have this goofy *Bonus time* on my ISP account between 2am and 8am, so I had to get up early just to post this and start some of the VB tasks I had waiting to run after 8am; it is now 6:50am PDT where I am. Good luck |
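
The advice in this thread that every max_concurrent value in app_config.xml should be 1 or higher can be checked before asking BOINC to read the file. The following is only a minimal sketch in Python: the function name `find_bad_limits` and the `SAMPLE` text are hypothetical and made up for illustration, and the script reflects the rule as stated by the poster above rather than BOINC's own validation logic.

```python
import xml.etree.ElementTree as ET


def find_bad_limits(xml_text):
    """Return the text of every <max_concurrent> element that is not an integer >= 1.

    Hypothetical helper: it only applies the rule quoted in the thread
    ("needs to be 1 or higher"); it does not replicate BOINC's parser.
    """
    root = ET.fromstring(xml_text)
    bad = []
    for elem in root.iter("max_concurrent"):
        text = (elem.text or "").strip()
        try:
            ok = int(text) >= 1
        except ValueError:
            ok = False  # empty or non-numeric value
        if not ok:
            bad.append(text)
    return bad


# Illustrative sample only, not a recommended configuration.
SAMPLE = """<app_config>
  <app>
    <name>Theory</name>
    <max_concurrent>0</max_concurrent>
  </app>
  <app>
    <name>ATLAS</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>"""

if __name__ == "__main__":
    for value in find_bad_limits(SAMPLE):
        print(f"<max_concurrent> has value {value!r}; expected 1 or higher")
```

To check a real file, read its contents and pass the text to `find_bad_limits`; an empty list means no offending values were found.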
©2025 CERN