Message boards : Number crunching : ATLAS errors

Bill F
Joined: 2 Jun 07
Posts: 32
Credit: 1,583,340
RAC: 0
Message 37073 - Posted: 22 Oct 2018, 1:02:50 UTC

I have re-enabled ATLAS on LHC@home to run some of these WUs, and every one so far has failed. When I looked at one of the WUs, I noticed that it had been assigned to three different systems and failed on all three.

wuid=102285314

Does a bad batch of WUs ever get generated?

Thanks
Bill F
In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37074 - Posted: 22 Oct 2018, 3:55:40 UTC - in response to Message 37073.  
Last modified: 22 Oct 2018, 4:00:02 UTC

I've never seen a bad batch of ATLAS tasks and I crunch quite a few of them. Check out my hosts and see that ATLAS tasks are doing just fine here. Drill down through the work units I've received and note that a great many of the tasks had failed on 1 or more hosts before I received them and completed them successfully. So what you are seeing is nothing new.

Check out the hosts that failed your task and note that none of them have ever returned a success. They are all misconfigured. One of them doesn't even have VT-x enabled. Look around and you will find numerous hosts that are misconfigured and failing every VirtualBox task they receive.

On the other hand, it could be that the ATLAS tasks in my cache are the tail end of a good batch and that as soon as I finish those I will start getting tasks from the bad batch and that they will all fail. But I doubt it. BTW, you might notice that I run the ATLAS native app but it makes no difference... the native app gets exactly the same tasks as the VBox app.

So my guess is there is something wrong on your host rather than in the tasks. Someone more familiar with the error codes in your task's logs will be along soon and they'll maybe be able to pinpoint the problem for you but until then you might want to work through Yeti's Checklist Version 3.

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 6,816,509
RAC: 5,873
Message 37107 - Posted: 28 Oct 2018, 16:45:34 UTC - in response to Message 37073.  

I have had every ATLAS job fail over the past three weeks, no matter which machine I run them on. All of my systems run Windows 10 with VirtualBox 5.1.30. The message is "Too many total results". Task ID 207844096.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37108 - Posted: 28 Oct 2018, 19:11:48 UTC - in response to Message 37107.  

Your taskID 207844096 was a 4 core task on an 8 GB RAM machine... bad idea.

The VirtualBox tasks are EXTREMELY fussy. You can't just install VirtualBox and let 'em rip. You need to line up about a dozen ducks in a nice straight row and keep them that way or VBox tasks will fail. Theory tasks are a little less demanding but ATLAS and LHCb are an absolute b**ch. The only reason your LHCb tasks are verifying is because LHCb is broken in a way that prevents LHCb tasks from revving up to full speed. If they did get to full revs they would likely be failing too if run on 4 cores on 8 GB RAM. So.... bump it down to just 1 single core ATLAS task running at a time, no Theory, no LHCb, just ATLAS. If that doesn't work then that means you need to reconfigure a bunch of other stuff too as per Yeti's checklist but worry about those bridges if and when you get to them.

If you want to run Theory and ATLAS simultaneously then you ABSOLUTELY MUST set things up to limit the number of tasks that run simultaneously. You can sort of configure that in your web prefs but it doesn't work well. The best way is to use an app_config.xml file.
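
For illustration, here is a minimal sketch of what such an app_config.xml might look like. It assumes the LHC@home application short names are ATLAS and Theory (verify them in client_state.xml on your host; they are assumptions here), and that the file is saved in the LHC@home project folder under the BOINC data directory. The limits are example values, not recommendations.

```xml
<!-- app_config.xml: sketch only. The app names are assumed; verify them on your host. -->
<app_config>
  <!-- Run at most one ATLAS task at a time -->
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <!-- Run at most two Theory tasks at a time (illustrative value) -->
  <app>
    <name>Theory</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <!-- Optional overall cap across all tasks of this project (illustrative value) -->
  <project_max_concurrent>3</project_max_concurrent>
</app_config>
```

After saving the file, the client should pick it up when you choose Options -> Read config files in BOINC Manager, or when the client restarts.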

LHCb? Disable it. It's fubar.

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 6,816,509
RAC: 5,873
Message 37110 - Posted: 28 Oct 2018, 20:31:00 UTC - in response to Message 37108.  

I've lowered the CPU limit to 1 but that does not seem to fix anything.

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 6,816,509
RAC: 5,873
Message 37111 - Posted: 28 Oct 2018, 20:35:47 UTC - in response to Message 37110.  

On my other machine, I have 4 cores but that machine does not select any LHC project work. It is a Lenovo 7072-CTO running Windows 10 Pro, with 16 GB of RAM, 1 TB of disk and 4 cores. I have reset the project but no work is selected.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37114 - Posted: 28 Oct 2018, 21:59:51 UTC - in response to Message 37110.  

Quoting Cphipps: "I've lowered the CPU limit to 1 but that does not seem to fix anything."

Are you using the website prefs to adjust the limit? If so then tasks downloaded prior to lowering the CPU limit will run with the old CPU limit. Looking at your list of tasks for that host it appears that you still have tasks in your cache that are tagged to run as 4 core tasks. The only way to change CPU limit after tasks are downloaded is via an app_config.xml. You can abort those tasks because they're gonna fail anyway. New tasks will then be tagged as 1 core unless you have an app_config.xml that specifies otherwise. Remember... even if they run single core they might still fail for other reasons but at least then you will know the failure isn't due to too many cores and then you can work on getting the next duck lined up by examining the stderr output for clues as to what you need to adjust next. Yes, it's a complicated process but it's the only way. One duck at a time. Eventually you'll get all the ducks in a row and then the tasks will succeed.
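
As a rough sketch of the app_config.xml route, the fragment below asks the client to treat ATLAS VirtualBox tasks as single-core. The app name and especially the plan class are assumptions made for illustration; look up the real values in client_state.xml on your host before using it. If an app_config.xml already exists, the `<app_version>` element goes inside its existing `<app_config>` root.

```xml
<!-- app_config.xml: sketch only. app_name and plan_class are assumed values. -->
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <!-- Assumed plan class; copy the exact string from client_state.xml -->
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <!-- Number of CPUs the client should budget for each task -->
    <avg_ncpus>1</avg_ncpus>
  </app_version>
</app_config>
```

Use Options -> Read config files in BOINC Manager (or restart the client) afterwards, then check the task list to confirm the core count shown for each task.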

So a question for you... are you using an app_config.xml or are you setting CPU limit via the website prefs? Either way is fine but there are pitfalls to each method.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37115 - Posted: 28 Oct 2018, 22:23:42 UTC - in response to Message 37111.  

Quoting Cphipps: "On my other machine, I have 4 cores but that machine does not select any LHC project work. It is a 16 GB machine with 1 TB of disk and 4 cores, Lenovo 7072-CTO, Windows 10 Pro. I have reset the project but no work is selected."


16 GB gives you more leeway.

If you're not getting any work then it's most likely due to one of the following reasons:

1) You are running other projects on that host and the scheduler thinks LHC tasks have received more than their share so it's fetching work for other projects until they catch up.

2) The Lenovo is associated with a different venue (default, home, school, work) than your other host (the one that does get work) and the VBox tasks are disabled for that venue.

So... determine which venue you have associated with the Lenovo and then see if the VBox apps are enabled for that venue. Caution: If this is the case then don't fix it the easy way by associating the Lenovo with the same venue as the other host. It will be easier to set things up if the 2 hosts are associated with different venues.

If it's neither of the above reasons then we'll look at other possible reasons but let's work on those 2 for now.

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 6,816,509
RAC: 5,873
Message 37120 - Posted: 29 Oct 2018, 1:46:55 UTC - in response to Message 37115.  

My LHC preferences are set to home on the Lenovo 7072-CTO. I have 3 other computers that are set to select LHC projects. I have gotten other tasks to run on VBox but not any LHC tasks. How can I determine whether the VBox apps are disabled? The Cosmology tasks seem to work OK.

Thanks

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37122 - Posted: 29 Oct 2018, 9:16:08 UTC - in response to Message 37120.  

Find your name on the right side of this page, just below the LHC@home banner. Click your name, it takes you to one of your preferences pages. Click on the LHC@home preferences link. Then you should see Primary (default) preferences near the top with some check boxes. In the section below that you should see either a link that says something like "click to add a home venue" or, if you already clicked it at some earlier date you will see a section for "home" similar to the section for "Primary (default) preferences". Select which tasks you want for the Lenovo there. Keep it simple for now. Click "Edit preferences" and check only
    Use CPU
    Theory application


Then set "Max # of jobs for this project" to 3 and "Max # of CPUs for this project" to 1. That will allow 3 X 1 core Theory tasks to run simultaneously. That leaves 1 core free to help keep the computer responsive. If it remains responsive then you might try 4 CPUs but it's not recommended. Keep it simple for now... start out with 3.

Double check everything then click "Update preferences".

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 6,816,509
RAC: 5,873
Message 37125 - Posted: 29 Oct 2018, 13:06:36 UTC - in response to Message 37122.  

Thanks, I have done this, so let's see how it works.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37129 - Posted: 29 Oct 2018, 16:34:42 UTC - in response to Message 37125.  

You're welcome.

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 6,816,509
RAC: 5,873
Message 37184 - Posted: 2 Nov 2018, 23:14:20 UTC - in response to Message 37129.  

Hi Bill. I did everything that was suggested and the Lenovo 7072-CTO still does not select anything. The host name for this machine is System-F. Do you have any other suggestions?

Thanks

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37186 - Posted: 3 Nov 2018, 1:15:26 UTC - in response to Message 37184.  
Last modified: 3 Nov 2018, 1:16:11 UTC

Bill? As in Bronco Billy? I am honored :)

If that doesn't help then it could be that BOINC cannot see VirtualBox. If BOINC cannot see VirtualBox then it won't request tasks that need VirtualBox. Let's have a look at the Event Log...
1) restart BOINC client
2) click Tools -> Event Log
3) copy 'n paste the first 40 lines of the Event Log here

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 6,816,509
RAC: 5,873
Message 37223 - Posted: 4 Nov 2018, 1:45:29 UTC - in response to Message 37186.  

Thanks Bill. Here is a copy of the BOINC startup log.
11/3/2018 1:23:25 AM | | Starting BOINC client version 7.12.1 for windows_x86_64
11/3/2018 1:23:25 AM | | log flags: file_xfer, sched_ops, task
11/3/2018 1:23:25 AM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
11/3/2018 1:23:25 AM | | Data directory: C:\ProgramData\BOINC
11/3/2018 1:23:25 AM | | Running under account Cphipps
11/3/2018 1:23:27 AM | | CAL: ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (CAL version 1.4.1848, 512MB, 479MB available, 208 GFLOPS peak)
11/3/2018 1:23:27 AM | | OpenCL: AMD/ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (driver version 1800.11 (VM), device version OpenCL 1.2 AMD-APP (1800.11), 512MB, 479MB available, 208 GFLOPS peak)
11/3/2018 1:23:27 AM | | Host name: System-F
11/3/2018 1:23:27 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
11/3/2018 1:23:27 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm avx vmx smx tm2 pbe
11/3/2018 1:23:27 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17763.00)
11/3/2018 1:23:27 AM | | Memory: 15.96 GB physical, 16.21 GB virtual
11/3/2018 1:23:27 AM | | Disk: 917.89 GB total, 682.15 GB free
11/3/2018 1:23:27 AM | | Local time is UTC -7 hours
11/3/2018 1:23:27 AM | | No WSL found.
11/3/2018 1:23:27 AM | | VirtualBox version: 5.1.30
11/3/2018 1:23:27 AM | Cosmology@Home | URL http://www.cosmologyathome.org/; Computer ID 239354; resource share 100
11/3/2018 1:23:27 AM | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 11501533; resource share 100
11/3/2018 1:23:27 AM | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10331395; resource share 100
11/3/2018 1:23:27 AM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 584666; resource share 100
11/3/2018 1:23:27 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7350937; resource share 100
11/3/2018 1:23:27 AM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 3287431; resource share 100
11/3/2018 1:23:27 AM | yoyo@home | URL http://www.rechenkraft.net/yoyo/; Computer ID 133869; resource share 100
11/3/2018 1:23:27 AM | LHC@home | General prefs: from LHC@home (last modified 02-Nov-2018 16:02:36)
11/3/2018 1:23:27 AM | LHC@home | Computer location: home
11/3/2018 1:23:27 AM | | General prefs: using separate prefs for home
11/3/2018 1:23:27 AM | | Reading preferences override file
11/3/2018 1:23:27 AM | | Preferences:
11/3/2018 1:23:27 AM | | max memory usage when active: 13073.85 MB
11/3/2018 1:23:27 AM | | max memory usage when idle: 9805.39 MB
11/3/2018 1:23:27 AM | | max disk usage: 683.42 GB
11/3/2018 1:23:27 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
11/3/2018 1:23:27 AM | | Setting up project and slot directories
11/3/2018 1:23:27 AM | | Checking active tasks
11/3/2018 1:23:27 AM | | Using account manager BOINCstatsBAM!
11/3/2018 1:23:27 AM | | Setting up GUI RPC socket
11/3/2018 1:23:27 AM | | Checking presence of 582 project files
11/3/2018 1:23:28 AM | | Contacting account manager at http://bam.boincstats.com/
11/3/2018 1:23:40 AM | LHC@home | Sending scheduler request: Requested by project.
11/3/2018 1:23:40 AM | LHC@home | Requesting new tasks for CPU and AMD/ATI GPU
11/3/2018 1:23:45 AM | | Account manager: BAM! User: 169778, lcdrxxx
11/3/2018 1:23:45 AM | | Account manager: BAM! Host: 524429
11/3/2018 1:23:45 AM | | Account manager: Number of BAM! connections for this host: 1618
11/3/2018 1:23:45 AM | | Account manager contact succeeded
11/3/2018 1:23:45 AM | LHC@home | Scheduler request completed: got 0 new tasks
11/3/2018 1:23:45 AM | LHC@home | No tasks sent
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for SixTrack
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for sixtracktest
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for LHCb Simulation
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for Theory Simulation
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for ATLAS Simulation
11/3/2018 1:23:50 AM | yoyo@home | Sending scheduler request: To report completed tasks.
11/3/2018 1:23:50 AM | yoyo@home | Reporting 1 completed tasks
11/3/2018 1:23:50 AM | yoyo@home | Not requesting tasks: don't need (CPU: not highest priority project; AMD/ATI GPU: not highest priority project)
11/3/2018 1:23:54 AM | yoyo@home | Scheduler request completed
11/3/2018 1:39:21 AM | yoyo@home | Computation for task ogr_181103000504_3_0 finished
11/3/2018 1:39:22 AM | LHC@home | Sending scheduler request: To fetch work.
11/3/2018 1:39:22 AM | LHC@home | Requesting new tasks for CPU and AMD/ATI GPU
11/3/2018 1:39:23 AM | yoyo@home | Started upload of ogr_181103000504_3_0_0
11/3/2018 1:39:23 AM | yoyo@home | Started upload of ogr_181103000504_3_0_1
11/3/2018 1:39:25 AM | yoyo@home | Finished upload of ogr_181103000504_3_0_0
11/3/2018 1:39:25 AM | LHC@home | Scheduler request completed: got 0 new tasks
11/3/2018 1:39:25 AM | LHC@home | No tasks sent
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for SixTrack
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for sixtracktest
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for LHCb Simulation
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for Theory Simulation
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for ATLAS Simulation

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 37224 - Posted: 4 Nov 2018, 8:26:48 UTC

Windows 10 needs VirtualBox 5.2 or higher.
You may download version 5.2.20 here: https://www.virtualbox.org/wiki/Downloads

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 6,816,509
RAC: 5,873
Message 37233 - Posted: 4 Nov 2018, 18:16:52 UTC - in response to Message 37224.  

I have updated VirtualBox. Still the same results:
11/4/2018 10:11:53 AM | | Starting BOINC client version 7.12.1 for windows_x86_64
11/4/2018 10:11:53 AM | | log flags: file_xfer, sched_ops, task
11/4/2018 10:11:53 AM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
11/4/2018 10:11:53 AM | | Data directory: C:\ProgramData\BOINC
11/4/2018 10:11:53 AM | | Running under account Cphipps
11/4/2018 10:12:00 AM | | CAL: ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (CAL version 1.4.1848, 512MB, 479MB available, 208 GFLOPS peak)
11/4/2018 10:12:00 AM | | OpenCL: AMD/ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (driver version 1800.11 (VM), device version OpenCL 1.2 AMD-APP (1800.11), 512MB, 479MB available, 208 GFLOPS peak)
11/4/2018 10:12:00 AM | | Host name: System-F
11/4/2018 10:12:00 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
11/4/2018 10:12:00 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm avx vmx smx tm2 pbe
11/4/2018 10:12:00 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17763.00)
11/4/2018 10:12:00 AM | | Memory: 15.96 GB physical, 16.21 GB virtual
11/4/2018 10:12:00 AM | | Disk: 917.89 GB total, 684.97 GB free
11/4/2018 10:12:00 AM | | Local time is UTC -8 hours
11/4/2018 10:12:00 AM | | No WSL found.
11/4/2018 10:12:00 AM | | VirtualBox version: 5.2.20
11/4/2018 10:12:01 AM | Cosmology@Home | URL http://www.cosmologyathome.org/; Computer ID 239354; resource share 100
11/4/2018 10:12:01 AM | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 11501533; resource share 100
11/4/2018 10:12:01 AM | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10331395; resource share 100
11/4/2018 10:12:01 AM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 584666; resource share 100
11/4/2018 10:12:01 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7350937; resource share 100
11/4/2018 10:12:01 AM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 3287431; resource share 100
11/4/2018 10:12:01 AM | yoyo@home | URL http://www.rechenkraft.net/yoyo/; Computer ID 133869; resource share 100
11/4/2018 10:12:01 AM | LHC@home | General prefs: from LHC@home (last modified 02-Nov-2018 16:02:36)
11/4/2018 10:12:01 AM | LHC@home | Computer location: home
11/4/2018 10:12:01 AM | | General prefs: using separate prefs for home
11/4/2018 10:12:01 AM | | Reading preferences override file
11/4/2018 10:12:01 AM | | Preferences:
11/4/2018 10:12:01 AM | | max memory usage when active: 13073.85 MB
11/4/2018 10:12:01 AM | | max memory usage when idle: 9805.39 MB
11/4/2018 10:12:01 AM | | max disk usage: 686.36 GB
11/4/2018 10:12:01 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
11/4/2018 10:12:01 AM | | Setting up project and slot directories
11/4/2018 10:12:01 AM | | Checking active tasks
11/4/2018 10:12:01 AM | | Using account manager BOINCstatsBAM!
11/4/2018 10:12:01 AM | | Setting up GUI RPC socket
11/4/2018 10:12:01 AM | | Checking presence of 601 project files
11/4/2018 10:12:14 AM | LHC@home | Sending scheduler request: Requested by project.
11/4/2018 10:12:14 AM | LHC@home | Requesting new tasks for CPU
11/4/2018 10:12:19 AM | LHC@home | Scheduler request completed: got 0 new tasks
11/4/2018 10:12:19 AM | LHC@home | No tasks sent
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for SixTrack
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for sixtracktest
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for LHCb Simulation
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for Theory Simulation
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for ATLAS Simulation

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37234 - Posted: 4 Nov 2018, 18:49:06 UTC - in response to Message 37233.  

Please post a copy of file C:\ProgramData\BOINC\global_prefs_override.xml

Cphipps
Joined: 1 Aug 14
Posts: 15
Credit: 6,816,509
RAC: 5,873
Message 37235 - Posted: 4 Nov 2018, 20:45:31 UTC - in response to Message 37234.  

Thanks. Here is a copy of the file requested.
<?xml version="1.0"?>
<global_preferences>
  <run_on_batteries>1</run_on_batteries>
  <run_if_user_active>1</run_if_user_active>
  <run_gpu_if_user_active>1</run_gpu_if_user_active>
  <suspend_cpu_usage>0.000000</suspend_cpu_usage>
  <start_hour>0.000000</start_hour>
  <end_hour>0.000000</end_hour>
  <net_start_hour>0.000000</net_start_hour>
  <net_end_hour>0.000000</net_end_hour>
  <leave_apps_in_memory>0</leave_apps_in_memory>
  <confirm_before_connecting>1</confirm_before_connecting>
  <hangup_if_dialed>1</hangup_if_dialed>
  <dont_verify_images>0</dont_verify_images>
  <work_buf_min_days>0.000000</work_buf_min_days>
  <work_buf_additional_days>0.250000</work_buf_additional_days>
  <max_ncpus_pct>100.000000</max_ncpus_pct>
  <cpu_scheduling_period_minutes>15.000000</cpu_scheduling_period_minutes>
  <disk_interval>180.000000</disk_interval>
  <disk_max_used_gb>0.000000</disk_max_used_gb>
  <disk_max_used_pct>90.000000</disk_max_used_pct>
  <disk_min_free_gb>0.000000</disk_min_free_gb>
  <vm_max_used_pct>75.000000</vm_max_used_pct>
  <ram_max_used_busy_pct>80.000000</ram_max_used_busy_pct>
  <ram_max_used_idle_pct>60.000000</ram_max_used_idle_pct>
  <max_bytes_sec_up>0.000000</max_bytes_sec_up>
  <max_bytes_sec_down>0.000000</max_bytes_sec_down>
  <cpu_usage_limit>100.000000</cpu_usage_limit>
  <daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
  <daily_xfer_period_days>0</daily_xfer_period_days>
</global_preferences>

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37236 - Posted: 4 Nov 2018, 21:37:39 UTC - in response to Message 37235.  

I don't see any reason for not receiving tasks. It's requesting tasks but not receiving :-(
It won't fix the problem but I recommend changing
<vm_max_used_pct>75.000000</vm_max_used_pct>
to
<vm_max_used_pct>100.000000</vm_max_used_pct>
to reduce the chance of a "missing heartbeat" error.

If the machine is too sluggish for you to use then reduce the number of CPUs allowed with
<max_ncpus_pct>99.000000</max_ncpus_pct>
instead of throttling the VM
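
Put together, the two changes would look roughly like this inside C:\ProgramData\BOINC\global_prefs_override.xml. This is only a fragment for orientation: every other element from the posted file stays as it is, and the second change is optional.

```xml
<global_preferences>
  <!-- ... all other elements from the posted file remain unchanged ... -->

  <!-- Raised from 75 to 100, as recommended in this post -->
  <vm_max_used_pct>100.000000</vm_max_used_pct>

  <!-- Optional: lowered from 100 only if the machine feels too sluggish -->
  <max_ncpus_pct>99.000000</max_ncpus_pct>
</global_preferences>
```

The client should re-read the file after Options -> Read local prefs file in BOINC Manager, or after a client restart; the exact menu wording is assumed from the 7.x Manager and may differ in other versions.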