ATLAS errors
Author | Message |
---|---|
Send message Joined: 2 Jun 07 Posts: 33 Credit: 1,583,519 RAC: 16 |
I have re-enabled ATLAS as a project on LHC to run some of these WUs, and every one so far has failed. When I looked at one of the WUs I noticed that it had been assigned to three different systems and failed on all three (wuid=102285314). Does a bad batch of WUs ever get generated? Thanks, Bill F. In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic; There was no expiration date. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
I've never seen a bad batch of ATLAS tasks and I crunch quite a few of them. Check out my hosts and see that ATLAS tasks are doing just fine here. Drill down through the work units I've received and note that a great many of the tasks I've received failed on 1 or more hosts before I received them and completed them successfully. So what you are seeing is nothing new. Check out the hosts that failed your task and note that none of them have ever returned a success. They are all misconfigured. One of them doesn't even have VT-x enabled. Look around and you will find numerous hosts that are misconfigured and failing every VirtualBox task they receive. On the other hand, it could be that the ATLAS tasks in my cache are the tail end of a good batch and that as soon as I finish those I will start getting tasks from the bad batch and that they will all fail. But I doubt it. BTW, you might notice that I run the ATLAS native app but it makes no difference... the native app gets exactly the same tasks as the VBox app. So my guess is there is something wrong on your host rather than in the tasks. Someone more familiar with the error codes in your task's logs will be along soon and they'll maybe be able to pinpoint the problem for you, but until then you might want to work through Yeti's Checklist Version 3. |
Send message Joined: 1 Aug 14 Posts: 15 Credit: 6,966,828 RAC: 4,501 |
I have had every ATLAS job fail over the past three weeks, no matter what machine I run them on. All of my systems run Windows 10 with VirtualBox 5.1.30. The message is "Too many total results"; task ID 207844096. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Your taskID 207844096 was a 4 core task on an 8 GB RAM machine... bad idea. The VirtualBox tasks are EXTREMELY fussy. You can't just install VirtualBox and let 'em rip. You need to line up about a dozen ducks in a nice straight row and keep them that way or VBox tasks will fail. Theory tasks are a little less demanding but ATLAS and LHCb are an absolute b**ch. The only reason your LHCb tasks are verifying is because LHCb is broken in a way that prevents LHCb tasks from revving up to full speed. If they did get to full revs they would likely be failing too if run on 4 cores on 8 GB RAM. So.... bump it down to just 1 single core ATLAS task running at a time, no Theory, no LHCb, just ATLAS. If that doesn't work then that means you need to reconfigure a bunch of other stuff too as per Yeti's checklist but worry about those bridges if and when you get to them. If you want to run Theory and ATLAS simultaneously then you ABSOLUTELY MUST set things up to limit the number of tasks that run simultaneously. You can sort of configure that in your web prefs but it doesn't work well. The best way is to use an app_config.xml file. LHCb? Disable it. It's fubar. |
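The app_config.xml approach mentioned in the post above can be sketched as follows. This is only a minimal sketch, not a tested LHC@home configuration: the app names `ATLAS` and `Theory` are assumptions (the names your client actually uses can be checked in client_state.xml), and the helper `build_app_config` is hypothetical. `<max_concurrent>` and `<project_max_concurrent>` are the standard BOINC app_config.xml elements for limiting how many tasks run at once.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits, project_max=None):
    """Return app_config.xml text that limits concurrent tasks per app.

    limits: dict mapping app name -> maximum tasks running at once.
    The app names are assumptions; use the names from client_state.xml.
    """
    root = ET.Element("app_config")
    for name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if project_max is not None:
        # Optional cap on all tasks of the project taken together.
        ET.SubElement(root, "project_max_concurrent").text = str(project_max)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # One ATLAS task and one Theory task at a time, two LHC tasks in total.
    print(build_app_config({"ATLAS": 1, "Theory": 1}, project_max=2))
```

The output would be saved as app_config.xml in the LHC@home project folder inside the BOINC data directory, after which the client has to re-read its config files (or be restarted). The exact folder name depends on the installation.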
Send message Joined: 1 Aug 14 Posts: 15 Credit: 6,966,828 RAC: 4,501 |
I've lowered the CPU limit to 1 but that does not seem to fix anything. |
Send message Joined: 1 Aug 14 Posts: 15 Credit: 6,966,828 RAC: 4,501 |
On my other machine, I have 4 cores but that machine does not select any LHC project work. It is a Lenovo 7072-CTO with 16 GB of RAM, 1 TB of disk and 4 cores, running Windows 10 Pro. I have reset the project but no work is selected. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"I've lowered the CPU limit to 1 but that does not seem to fix anything."
Are you using the website prefs to adjust the limit? If so then tasks downloaded prior to lowering the CPU limit will run with the old CPU limit. Looking at your list of tasks for that host it appears that you still have tasks in your cache that are tagged to run as 4 core tasks. The only way to change CPU limit after tasks are downloaded is via an app_config.xml. You can abort those tasks because they're gonna fail anyway. New tasks will then be tagged as 1 core unless you have an app_config.xml that specifies otherwise. Remember... even if they run single core they might still fail for other reasons but at least then you will know the failure isn't due to too many cores and then you can work on getting the next duck lined up by examining the stderr output for clues as to what you need to adjust next. Yes, it's a complicated process but it's the only way. One duck at a time. Eventually you'll get all the ducks in a row and then the tasks will succeed. So a question for you... are you using an app_config.xml or are you setting CPU limit via the website prefs? Either way is fine but there are pitfalls to each method. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"On my other machine, I have 4 cores but that machine does not select any LHC project work. It is a 16gb machine with 1tb of disk and 4 cores Lenovo 7072-CTO, Windows 10 Pro. I have reset the project but no work is selected."
16 GB gives you more leeway. If you're not getting any work then it's most likely due to one of the following reasons:
1) You are running other projects on that host and the scheduler thinks LHC tasks have received more than their share so it's fetching work for other projects until they catch up.
2) The Lenovo is associated with a different venue (default, home, school, work) than your other host (the one that does get work) and the VBox tasks are disabled for that venue. So... determine which venue you have associated with the Lenovo and then see if the VBox apps are enabled for that venue. Caution: If this is the case then don't fix it the easy way by associating the Lenovo with the same venue as the other host. It will be easier to set things up if the 2 hosts are associated with different venues.
If it's neither of the above reasons then we'll look at other possible reasons but let's work on those 2 for now. |
Send message Joined: 1 Aug 14 Posts: 15 Credit: 6,966,828 RAC: 4,501 |
My LHC preferences are set to home on the Lenovo 7072-CTO machine. I have 3 other computers that are set to select LHC projects. I have gotten other tasks to run on VBox but not any LHC tasks. How can I determine if VBox apps are disabled? The Cosmology tasks seem to work OK. Thanks |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Find your name on the right side of this page, just below the LHC@home banner. Click your name, it takes you to one of your preferences pages. Click on the LHC@home preferences link. Then you should see Primary (default) preferences near the top with some check boxes. In the section below that you should see either a link that says something like "click to add a home venue" or, if you already clicked it at some earlier date you will see a section for "home" similar to the section for "Primary (default) preferences". Select which tasks you want for the Lenovo there. Keep it simple for now. Click "Edit preferences" and check only
Theory application
|
Send message Joined: 1 Aug 14 Posts: 15 Credit: 6,966,828 RAC: 4,501 |
Thanks, I have done this so let's see how this works. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
You're welcome. |
Send message Joined: 1 Aug 14 Posts: 15 Credit: 6,966,828 RAC: 4,501 |
Hi Bill. I did everything that was suggested and the Lenovo 7072-CTO still does not select anything. The host name for this machine is System-F. Do you have any other suggestions? Thanks |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Bill? As in Bronco Billy? I am honored :) If that doesn't help then it could be that BOINC cannot see VirtualBox. If BOINC cannot see VirtualBox then it won't request tasks that need VirtualBox. Let's have a look at the Event Log... 1) restart BOINC client 2) click Tools -> Event Log 3) copy 'n paste the first 40 lines of the Event Log here |
Send message Joined: 1 Aug 14 Posts: 15 Credit: 6,966,828 RAC: 4,501 |
Thanks Bill. Here is a copy of the BOINC startup log.
11/3/2018 1:23:25 AM | | Starting BOINC client version 7.12.1 for windows_x86_64
11/3/2018 1:23:25 AM | | log flags: file_xfer, sched_ops, task
11/3/2018 1:23:25 AM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
11/3/2018 1:23:25 AM | | Data directory: C:\ProgramData\BOINC
11/3/2018 1:23:25 AM | | Running under account Cphipps
11/3/2018 1:23:27 AM | | CAL: ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (CAL version 1.4.1848, 512MB, 479MB available, 208 GFLOPS peak)
11/3/2018 1:23:27 AM | | OpenCL: AMD/ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (driver version 1800.11 (VM), device version OpenCL 1.2 AMD-APP (1800.11), 512MB, 479MB available, 208 GFLOPS peak)
11/3/2018 1:23:27 AM | | Host name: System-F
11/3/2018 1:23:27 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
11/3/2018 1:23:27 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm avx vmx smx tm2 pbe
11/3/2018 1:23:27 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17763.00)
11/3/2018 1:23:27 AM | | Memory: 15.96 GB physical, 16.21 GB virtual
11/3/2018 1:23:27 AM | | Disk: 917.89 GB total, 682.15 GB free
11/3/2018 1:23:27 AM | | Local time is UTC -7 hours
11/3/2018 1:23:27 AM | | No WSL found.
11/3/2018 1:23:27 AM | | VirtualBox version: 5.1.30
11/3/2018 1:23:27 AM | Cosmology@Home | URL http://www.cosmologyathome.org/; Computer ID 239354; resource share 100
11/3/2018 1:23:27 AM | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 11501533; resource share 100
11/3/2018 1:23:27 AM | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10331395; resource share 100
11/3/2018 1:23:27 AM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 584666; resource share 100
11/3/2018 1:23:27 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7350937; resource share 100
11/3/2018 1:23:27 AM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 3287431; resource share 100
11/3/2018 1:23:27 AM | yoyo@home | URL http://www.rechenkraft.net/yoyo/; Computer ID 133869; resource share 100
11/3/2018 1:23:27 AM | LHC@home | General prefs: from LHC@home (last modified 02-Nov-2018 16:02:36)
11/3/2018 1:23:27 AM | LHC@home | Computer location: home
11/3/2018 1:23:27 AM | | General prefs: using separate prefs for home
11/3/2018 1:23:27 AM | | Reading preferences override file
11/3/2018 1:23:27 AM | | Preferences:
11/3/2018 1:23:27 AM | | max memory usage when active: 13073.85 MB
11/3/2018 1:23:27 AM | | max memory usage when idle: 9805.39 MB
11/3/2018 1:23:27 AM | | max disk usage: 683.42 GB
11/3/2018 1:23:27 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
11/3/2018 1:23:27 AM | | Setting up project and slot directories
11/3/2018 1:23:27 AM | | Checking active tasks
11/3/2018 1:23:27 AM | | Using account manager BOINCstatsBAM!
11/3/2018 1:23:27 AM | | Setting up GUI RPC socket
11/3/2018 1:23:27 AM | | Checking presence of 582 project files
11/3/2018 1:23:28 AM | | Contacting account manager at http://bam.boincstats.com/
11/3/2018 1:23:40 AM | LHC@home | Sending scheduler request: Requested by project.
11/3/2018 1:23:40 AM | LHC@home | Requesting new tasks for CPU and AMD/ATI GPU
11/3/2018 1:23:45 AM | | Account manager: BAM! User: 169778, lcdrxxx
11/3/2018 1:23:45 AM | | Account manager: BAM! Host: 524429
11/3/2018 1:23:45 AM | | Account manager: Number of BAM! connections for this host: 1618
11/3/2018 1:23:45 AM | | Account manager contact succeeded
11/3/2018 1:23:45 AM | LHC@home | Scheduler request completed: got 0 new tasks
11/3/2018 1:23:45 AM | LHC@home | No tasks sent
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for SixTrack
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for sixtracktest
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for LHCb Simulation
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for Theory Simulation
11/3/2018 1:23:45 AM | LHC@home | No tasks are available for ATLAS Simulation
11/3/2018 1:23:50 AM | yoyo@home | Sending scheduler request: To report completed tasks.
11/3/2018 1:23:50 AM | yoyo@home | Reporting 1 completed tasks
11/3/2018 1:23:50 AM | yoyo@home | Not requesting tasks: don't need (CPU: not highest priority project; AMD/ATI GPU: not highest priority project)
11/3/2018 1:23:54 AM | yoyo@home | Scheduler request completed
11/3/2018 1:39:21 AM | yoyo@home | Computation for task ogr_181103000504_3_0 finished
11/3/2018 1:39:22 AM | LHC@home | Sending scheduler request: To fetch work.
11/3/2018 1:39:22 AM | LHC@home | Requesting new tasks for CPU and AMD/ATI GPU
11/3/2018 1:39:23 AM | yoyo@home | Started upload of ogr_181103000504_3_0_0
11/3/2018 1:39:23 AM | yoyo@home | Started upload of ogr_181103000504_3_0_1
11/3/2018 1:39:25 AM | yoyo@home | Finished upload of ogr_181103000504_3_0_0
11/3/2018 1:39:25 AM | LHC@home | Scheduler request completed: got 0 new tasks
11/3/2018 1:39:25 AM | LHC@home | No tasks sent
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for SixTrack
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for sixtracktest
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for LHCb Simulation
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for Theory Simulation
11/3/2018 1:39:25 AM | LHC@home | No tasks are available for ATLAS Simulation |
Send message Joined: 14 Jan 10 Posts: 1280 Credit: 8,487,777 RAC: 1,773 |
Windows 10 needs VirtualBox 5.2 or higher. You can download version 5.2.20 here: https://www.virtualbox.org/wiki/Downloads |
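The version requirement in the post above can be checked against a pasted Event Log with a few lines of Python. This is a rough sketch under stated assumptions: the helper names are hypothetical, the 5.2 minimum is taken from the post rather than from any official compatibility table, and the pattern relies on the "VirtualBox version:" line that the BOINC client prints at startup, as seen in the logs in this thread.

```python
import re


def virtualbox_version(event_log):
    """Return the VirtualBox version reported in the log as a tuple, or None."""
    match = re.search(r"VirtualBox version:\s*(\d+(?:\.\d+)*)", event_log)
    if match is None:
        return None  # no such line, so the client did not report VirtualBox
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(event_log, minimum=(5, 2)):
    """True if the reported version is at least `minimum` (5.2 per the post)."""
    version = virtualbox_version(event_log)
    return version is not None and version >= minimum


if __name__ == "__main__":
    line = "11/3/2018 1:23:27 AM | | VirtualBox version: 5.1.30"
    print(virtualbox_version(line), meets_minimum(line))
```

A missing "VirtualBox version:" line is treated as a failure here, which matches the earlier remark that BOINC will not request VirtualBox tasks if it cannot see VirtualBox.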
Send message Joined: 1 Aug 14 Posts: 15 Credit: 6,966,828 RAC: 4,501 |
Have updated VirtualBox. Still same results.
11/4/2018 10:11:53 AM | | Starting BOINC client version 7.12.1 for windows_x86_64
11/4/2018 10:11:53 AM | | log flags: file_xfer, sched_ops, task
11/4/2018 10:11:53 AM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
11/4/2018 10:11:53 AM | | Data directory: C:\ProgramData\BOINC
11/4/2018 10:11:53 AM | | Running under account Cphipps
11/4/2018 10:12:00 AM | | CAL: ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (CAL version 1.4.1848, 512MB, 479MB available, 208 GFLOPS peak)
11/4/2018 10:12:00 AM | | OpenCL: AMD/ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (driver version 1800.11 (VM), device version OpenCL 1.2 AMD-APP (1800.11), 512MB, 479MB available, 208 GFLOPS peak)
11/4/2018 10:12:00 AM | | Host name: System-F
11/4/2018 10:12:00 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
11/4/2018 10:12:00 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm avx vmx smx tm2 pbe
11/4/2018 10:12:00 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17763.00)
11/4/2018 10:12:00 AM | | Memory: 15.96 GB physical, 16.21 GB virtual
11/4/2018 10:12:00 AM | | Disk: 917.89 GB total, 684.97 GB free
11/4/2018 10:12:00 AM | | Local time is UTC -8 hours
11/4/2018 10:12:00 AM | | No WSL found.
11/4/2018 10:12:00 AM | | VirtualBox version: 5.2.20
11/4/2018 10:12:01 AM | Cosmology@Home | URL http://www.cosmologyathome.org/; Computer ID 239354; resource share 100
11/4/2018 10:12:01 AM | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 11501533; resource share 100
11/4/2018 10:12:01 AM | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10331395; resource share 100
11/4/2018 10:12:01 AM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 584666; resource share 100
11/4/2018 10:12:01 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7350937; resource share 100
11/4/2018 10:12:01 AM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 3287431; resource share 100
11/4/2018 10:12:01 AM | yoyo@home | URL http://www.rechenkraft.net/yoyo/; Computer ID 133869; resource share 100
11/4/2018 10:12:01 AM | LHC@home | General prefs: from LHC@home (last modified 02-Nov-2018 16:02:36)
11/4/2018 10:12:01 AM | LHC@home | Computer location: home
11/4/2018 10:12:01 AM | | General prefs: using separate prefs for home
11/4/2018 10:12:01 AM | | Reading preferences override file
11/4/2018 10:12:01 AM | | Preferences:
11/4/2018 10:12:01 AM | | max memory usage when active: 13073.85 MB
11/4/2018 10:12:01 AM | | max memory usage when idle: 9805.39 MB
11/4/2018 10:12:01 AM | | max disk usage: 686.36 GB
11/4/2018 10:12:01 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
11/4/2018 10:12:01 AM | | Setting up project and slot directories
11/4/2018 10:12:01 AM | | Checking active tasks
11/4/2018 10:12:01 AM | | Using account manager BOINCstatsBAM!
11/4/2018 10:12:01 AM | | Setting up GUI RPC socket
11/4/2018 10:12:01 AM | | Checking presence of 601 project files
11/4/2018 10:12:14 AM | LHC@home | Sending scheduler request: Requested by project.
11/4/2018 10:12:14 AM | LHC@home | Requesting new tasks for CPU
11/4/2018 10:12:19 AM | LHC@home | Scheduler request completed: got 0 new tasks
11/4/2018 10:12:19 AM | LHC@home | No tasks sent
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for SixTrack
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for sixtracktest
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for LHCb Simulation
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for Theory Simulation
11/4/2018 10:12:19 AM | LHC@home | No tasks are available for ATLAS Simulation |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Please post a copy of file C:\ProgramData\BOINC\global_prefs_override.xml |
Send message Joined: 1 Aug 14 Posts: 15 Credit: 6,966,828 RAC: 4,501 |
Thanks. Here is a copy of the file requested.
<?xml version="1.0"?>
<global_preferences>
  <run_on_batteries>1</run_on_batteries>
  <run_if_user_active>1</run_if_user_active>
  <run_gpu_if_user_active>1</run_gpu_if_user_active>
  <suspend_cpu_usage>0.000000</suspend_cpu_usage>
  <start_hour>0.000000</start_hour>
  <end_hour>0.000000</end_hour>
  <net_start_hour>0.000000</net_start_hour>
  <net_end_hour>0.000000</net_end_hour>
  <leave_apps_in_memory>0</leave_apps_in_memory>
  <confirm_before_connecting>1</confirm_before_connecting>
  <hangup_if_dialed>1</hangup_if_dialed>
  <dont_verify_images>0</dont_verify_images>
  <work_buf_min_days>0.000000</work_buf_min_days>
  <work_buf_additional_days>0.250000</work_buf_additional_days>
  <max_ncpus_pct>100.000000</max_ncpus_pct>
  <cpu_scheduling_period_minutes>15.000000</cpu_scheduling_period_minutes>
  <disk_interval>180.000000</disk_interval>
  <disk_max_used_gb>0.000000</disk_max_used_gb>
  <disk_max_used_pct>90.000000</disk_max_used_pct>
  <disk_min_free_gb>0.000000</disk_min_free_gb>
  <vm_max_used_pct>75.000000</vm_max_used_pct>
  <ram_max_used_busy_pct>80.000000</ram_max_used_busy_pct>
  <ram_max_used_idle_pct>60.000000</ram_max_used_idle_pct>
  <max_bytes_sec_up>0.000000</max_bytes_sec_up>
  <max_bytes_sec_down>0.000000</max_bytes_sec_down>
  <cpu_usage_limit>100.000000</cpu_usage_limit>
  <daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
  <daily_xfer_period_days>0</daily_xfer_period_days>
</global_preferences> |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
I don't see any reason for not receiving tasks. It's requesting tasks but not receiving any :-( It won't fix the problem, but I recommend changing <vm_max_used_pct>75.000000</vm_max_used_pct> to <vm_max_used_pct>100.000000</vm_max_used_pct> to reduce the chance of a "missing heartbeat" error. If the machine is then too sluggish for you to use, reduce the number of CPUs allowed with <max_ncpus_pct>99.000000</max_ncpus_pct> instead of throttling the VM.
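For anyone who prefers to script the change suggested above rather than edit the file by hand, here is a minimal sketch. The helper `set_override` is hypothetical, the sample XML is a cut-down stand-in for the posted file, and the sketch assumes the file is clean XML (the stray "-" a browser's XML viewer puts in front of the root element would have to be removed first). Back up the real global_prefs_override.xml before changing it.

```python
import xml.etree.ElementTree as ET


def set_override(xml_text, tag, value):
    """Return the prefs XML with <tag> set to value (added if missing)."""
    root = ET.fromstring(xml_text)
    element = root.find(tag)
    if element is None:
        element = ET.SubElement(root, tag)
    # Use the same six-decimal style as the values in the posted file.
    element.text = f"{float(value):.6f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<global_preferences>"
        "<vm_max_used_pct>75.000000</vm_max_used_pct>"
        "<max_ncpus_pct>100.000000</max_ncpus_pct>"
        "</global_preferences>"
    )
    print(set_override(sample, "vm_max_used_pct", 100))
```

The client only picks up the new values after it re-reads the local prefs file or is restarted.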
©2024 CERN