Message boards : ATLAS application : ATLAS vbox and native 3.01
maeax

Joined: 2 May 07
Posts: 2244
Credit: 173,902,375
RAC: 456
Message 47958 - Posted: 1 Apr 2023, 7:26:55 UTC - in response to Message 47956.  

Toggleton,
I'm seeing the same: 238 MByte instead of 1.08 and 1.09 GByte.
BellyNitpicker

Joined: 16 Jun 20
Posts: 8
Credit: 2,318,092
RAC: 0
Message 47959 - Posted: 1 Apr 2023, 21:22:15 UTC - in response to Message 47957.  

With Version 3.01 something went wrong. Maybe more RAM is needed.


Yes, I know my OS is Darwin. I've allocated 12.8 GB RAM. That's all you can have; the rest is allocated to other usage. Do you need more?
BellyNitpicker

Joined: 16 Jun 20
Posts: 8
Credit: 2,318,092
RAC: 0
Message 47960 - Posted: 1 Apr 2023, 21:34:37 UTC - in response to Message 47955.  

I don't think ATLAS tasks like to be started and stopped.
I would pick a single project application to run and stick to only that for now.


I didn't switch the tasks. That was the BOINC manager.

After an earlier runaway, I: suspended the project; set no new tasks; aborted all existing tasks; unsuspended; reset the project; allowed new tasks.

All tasks downloaded (8) were ATLAS of the same type.

After more than 7 minutes of processing, the active task was made pending and a new one started. My task switching was set to "every 60 mins"; I've now set it to 9 hours. I have also set my preference to ATLAS Simulation only FTB, with a max of 2 jobs.

But only after I've run down the Ubuntu VM I started in place of Darwin BOINC, as the VM has a few days of work on it at the moment.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 34,609
Message 47961 - Posted: 2 Apr 2023, 6:14:14 UTC - in response to Message 47960.  

BellyNitpicker wrote:
However, my Ubuntu VMs run without problem

How many of them do you run concurrently and how much RAM do they allocate in total?


BellyNitpicker wrote:
I: suspended the project; set no new tasks; aborted all existing tasks; unsuspended; reset the project; allowed new tasks.

This is part of the problem, since all the logs, which may have included useful hints, are gone.
So far the server shows successful ATLAS v2.03 logs but not a single one from v3.01.

Suggestion:
Run and report a Theory vbox task to see whether that succeeds - Theory requires far fewer resources.
Once Theory works fine, run a single ATLAS vbox task and let it send the logs back to the server.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 47963 - Posted: 3 Apr 2023, 9:27:16 UTC

@David Cameron:
The memory usage is not as expected.
In ATLAS_vbox_3.01_job.xml the memory usage is set to 4000 MB,
but at the moment the old method of allocating memory for the VMs is still in use: 3000 MB + 900 MB/core, i.e. 3900, 4800, 5700, 6600 ... 10200 MB for 1, 2, 3, 4 ... 8 (unlimited) cores.
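For illustration, the old scaling rule as a minimal shell sketch (the 3000 MB base and 900 MB/core constants are taken from the formula above):

# Old VM memory scheme: 3000 MB base + 900 MB per core
for cores in 1 2 3 4 8; do
    echo "$cores core(s): $((3000 + 900 * cores)) MB"
done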
Harri Liljeroos
Joined: 28 Sep 04
Posts: 732
Credit: 49,363,408
RAC: 17,955
Message 47964 - Posted: 3 Apr 2023, 13:24:05 UTC - in response to Message 47963.  

@David Cameron:
The memory usage is not as expected.
In ATLAS_vbox_3.01_job.xml the memory usage is set to 4000 MB,
but at the moment the old method of allocating memory for the VMs is still in use: 3000 MB + 900 MB/core, i.e. 3900, 4800, 5700, 6600 ... 10200 MB for 1, 2, 3, 4 ... 8 (unlimited) cores.

In the slot directory there is an init_data.xml file where you can find the line: <rsc_memory_bound>6920601600.000000</rsc_memory_bound>
This is for my 4-core ATLAS task.
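To check what the client assigned on your own machine, something like this should work (assuming a default Linux BOINC data directory; adjust the path for your system):

grep -H '<rsc_memory_bound>' /var/lib/boinc-client/slots/*/init_data.xml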
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 47965 - Posted: 3 Apr 2023, 16:54:09 UTC - in response to Message 47964.  

In the slot directory there is an init_data.xml file where you can find the line: <rsc_memory_bound>6920601600.000000</rsc_memory_bound>
This is for my 4-core ATLAS task.
Yeah, that's the old formula: 3000 MB + 4 * 900 MB = 6600 MB = 6920601600 bytes.
On the development system we tested the new application with much lower memory and it was working fine even with 3000 MB RAM,
but here for safety it is set to 4000 MB. However, this setting in the xml-file is not respected on the production system.
Until this is fixed you could lower the memory usage yourself by using an app_config.xml in the lhcathome.cern.ch_lhcathome-project directory.
Example:
<app_config>
 <app_version>
  <app_name>ATLAS</app_name>
  <plan_class>vbox64_mt_mcore_atlas</plan_class>
  <cmdline>--memory_size_mb 4000 --nthreads 4</cmdline>
 </app_version>
</app_config>
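After saving the file the client needs to re-read it - either restart the client, use 'Options -> Read config files' in the Manager, or from a shell:

# tell a running client to re-read cc_config.xml / app_config.xml
boinccmd --read_cc_config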
maeax

Joined: 2 May 07
Posts: 2244
Credit: 173,902,375
RAC: 456
Message 47967 - Posted: 4 Apr 2023, 10:01:53 UTC

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=209638954
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | job summary report
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | --------------------------------------------------
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | PanDA job id: 5807945461
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | task id: 32930087
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | error 1/1: 1305: Failed to execute payload:PyJobTransforms.transform.execute 2023-04-04 09:47:55,790 CRITICAL Transform executor raised TransformValidationExceptio
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | status: LOG_TRANSFER = DONE
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | pilot state: failed
[2023-04-04 09:52:51] 2023-04-04 07:51:40,780 | INFO | transexitcode: 65
[2023-04-04 09:52:51] 2023-04-04 07:51:40,780 | INFO | exeerrorcode: 65
[2023-04-04 09:52:51] 2023-04-04 07:51:40,780 | INFO | exeerrordiag: Non-zero return code from EVNTtoHITS (1)
[2023-04-04 09:52:51] 2023-04-04 07:51:40,780 | INFO | exitcode: 65
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 47977 - Posted: 6 Apr 2023, 7:36:10 UTC - in response to Message 47965.  

In the slot directory there is an init_data.xml file where you can find the line: <rsc_memory_bound>6920601600.000000</rsc_memory_bound>
This is for my 4-core ATLAS task.
Yeah, that's the old formula: 3000 MB + 4 * 900 MB = 6600 MB = 6920601600 bytes.
On the development system we tested the new application with much lower memory and it was working fine even with 3000 MB RAM,
but here for safety it is set to 4000 MB. However, this setting in the xml-file is not respected on the production system.
Until this is fixed you could lower the memory usage yourself by using an app_config.xml in the lhcathome.cern.ch_lhcathome-project directory.
Example:
<app_config>
 <app_version>
  <app_name>ATLAS</app_name>
  <plan_class>vbox64_mt_mcore_atlas</plan_class>
  <cmdline>--memory_size_mb 4000 --nthreads 4</cmdline>
 </app_version>
</app_config>


We are still running the old Run 2 tasks here, so we need the memory to scale up with the number of cores. Once we exclusively run Run 3 tasks we can remove the memory scaling and use a fixed value of 4 GB.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 47978 - Posted: 6 Apr 2023, 8:58:33 UTC - in response to Message 47977.  

We are still running the old Run 2 tasks here so we need the memory to scale up with number of cores. Once we exclusively run Run 3 tasks we can remove the memory scaling and use a fixed value of 4GB.
In my opinion it does not matter whether it's a Run 2 or Run 3 task. In both cases the new ATLAS_vbox_3.01_image.vdi is used.
I tested several setups with 2, 3, 4, 5 and 8 threads with 'only' 4096 MB of RAM, and all tasks were successful with almost no swapping.
8 threads:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=391160006
https://lhcathome.cern.ch/lhcathome/result.php?resultid=391160097
https://lhcathome.cern.ch/lhcathome/result.php?resultid=391459825
https://lhcathome.cern.ch/lhcathome/result.php?resultid=391459378
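For reference, an app_config.xml matching that 8-thread / 4096 MB setup is a straightforward variant of the example posted earlier in this thread:

<app_config>
 <app_version>
  <app_name>ATLAS</app_name>
  <plan_class>vbox64_mt_mcore_atlas</plan_class>
  <cmdline>--memory_size_mb 4096 --nthreads 8</cmdline>
 </app_version>
</app_config>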
BellyNitpicker

Joined: 16 Jun 20
Posts: 8
Credit: 2,318,092
RAC: 0
Message 47986 - Posted: 9 Apr 2023, 7:57:02 UTC - in response to Message 47961.  

This is part of the problem since all logs are gone which may have included useful hints.
So far the server shows successful ATLAS v2.03 logs but not a single one from v3.01.


Here is a log. I reduced the CPUs available to three, but left memory at 13 GB. Two previously lost tasks were downloaded, this time for 3 CPUs. Both advertised their estimated duration as 4h20m. I ran one to conclusion; its actual duration was 35h15m. Remember that the 5-CPU tasks also ran on for a very long time beyond their estimated time.

NB. Prior to the events of the last week, I had been running Ubuntu VMs alongside BOINC on Darwin (with LHC as its only project) for more than two years without conflict or problem (except for the short-term VBox memory leak in an earlier version of macOS/Darwin). The three VMs have a total memory allocation of 12 GB, so there's 7 GB (give or take) left over for system activity.

Thu 6 Apr 16:10:02 2023 | | cc_config.xml not found - using defaults
Thu 6 Apr 16:10:02 2023 | | Starting BOINC client version 7.20.4 for x86_64-apple-darwin
Thu 6 Apr 16:10:02 2023 | | log flags: file_xfer, sched_ops, task
Thu 6 Apr 16:10:02 2023 | | Libraries: libcurl/7.79.1 SecureTransport zlib/1.2.11 c-ares/1.17.2
Thu 6 Apr 16:10:02 2023 | | Data directory: /Library/Application Support/BOINC Data
Thu 6 Apr 16:10:02 2023 | | OpenCL: Intel GPU 0: Intel(R) UHD Graphics 630 (driver version 1.2(Jan 10 2023 21:29:09), device version OpenCL 1.2, 1536MB, 1536MB available, 230 GFLOPS peak)
Thu 6 Apr 16:10:02 2023 | | OpenCL CPU: Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
Thu 6 Apr 16:10:02 2023 | | Host name: NandLBTScience
Thu 6 Apr 16:10:02 2023 | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz [x86 Family 6 Model 158 Stepping 10]
Thu 6 Apr 16:10:02 2023 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni pclmulqdq dtes64 mon dscpl vmx smx est tm2 ssse3 fma cx16 tpr pdcm sse4_1 sse4_2 x2apic movbe popcnt aes pcid xsave osxsave seglim64 tsctmr avx rdrand f16c
Thu 6 Apr 16:10:02 2023 | | OS: Mac OS X 12.6.3 (Darwin 21.6.0)
Thu 6 Apr 16:10:02 2023 | | Memory: 32.00 GB physical, 209.57 GB virtual
Thu 6 Apr 16:10:02 2023 | | Disk: 465.63 GB total, 209.57 GB free
Thu 6 Apr 16:10:02 2023 | | Local time is UTC +1 hours
Thu 6 Apr 16:10:02 2023 | | VirtualBox version: 7.0.6r155176
Thu 6 Apr 16:10:02 2023 | | General prefs: from https://climateprediction.net/ (last modified 23-Dec-2021 14:43:38)
Thu 6 Apr 16:10:02 2023 | | Computer location: home
Thu 6 Apr 16:10:02 2023 | | General prefs: no separate prefs for home; using your defaults
Thu 6 Apr 16:10:02 2023 | | Reading preferences override file
Thu 6 Apr 16:10:02 2023 | | Preferences:
Thu 6 Apr 16:10:02 2023 | | max memory usage when active: 13107.20 MB
Thu 6 Apr 16:10:02 2023 | | max memory usage when idle: 13107.20 MB
Thu 6 Apr 16:10:02 2023 | | max disk usage: 40.00 GB
Thu 6 Apr 16:10:02 2023 | | max CPUs used: 5
Thu 6 Apr 16:10:02 2023 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Thu 6 Apr 16:10:02 2023 | | Setting up project and slot directories
Thu 6 Apr 16:10:02 2023 | | Checking active tasks
Thu 6 Apr 16:10:02 2023 | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10667017; resource share 10
Thu 6 Apr 16:10:02 2023 | | Setting up GUI RPC socket
Thu 6 Apr 16:10:02 2023 | | Checking presence of 0 project files
Thu 6 Apr 16:10:22 2023 | | General prefs: from https://climateprediction.net/ (last modified 23-Dec-2021 14:43:38)
Thu 6 Apr 16:10:22 2023 | | Computer location: home
Thu 6 Apr 16:10:22 2023 | | General prefs: no separate prefs for home; using your defaults
Thu 6 Apr 16:10:22 2023 | | Reading preferences override file
Thu 6 Apr 16:10:22 2023 | | Preferences:
Thu 6 Apr 16:10:22 2023 | | max memory usage when active: 13107.20 MB
Thu 6 Apr 16:10:22 2023 | | max memory usage when idle: 13107.20 MB
Thu 6 Apr 16:10:22 2023 | | max disk usage: 40.00 GB
Thu 6 Apr 16:10:22 2023 | | Number of usable CPUs has changed from 5 to 3.
Thu 6 Apr 16:10:22 2023 | | max CPUs used: 3
Thu 6 Apr 16:10:22 2023 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Thu 6 Apr 16:10:38 2023 | LHC@home | update requested by user
Thu 6 Apr 16:10:43 2023 | LHC@home | project resumed by user
Thu 6 Apr 16:10:44 2023 | LHC@home | Master file download succeeded
Thu 6 Apr 16:10:49 2023 | LHC@home | Sending scheduler request: Requested by user.
Thu 6 Apr 16:10:49 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Thu 6 Apr 16:10:50 2023 | LHC@home | Scheduler request completed
Thu 6 Apr 16:10:50 2023 | LHC@home | Project requested delay of 6 seconds
Thu 6 Apr 16:10:51 2023 | LHC@home | work fetch resumed by user
Thu 6 Apr 16:11:00 2023 | LHC@home | Sending scheduler request: To fetch work.
Thu 6 Apr 16:11:00 2023 | LHC@home | Requesting new tasks for CPU
Thu 6 Apr 16:11:01 2023 | LHC@home | Scheduler request completed: got 2 new tasks
Thu 6 Apr 16:11:01 2023 | LHC@home | Resent lost task FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_0
Thu 6 Apr 16:11:01 2023 | LHC@home | Resent lost task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0
Thu 6 Apr 16:11:01 2023 | LHC@home | Project requested delay of 6 seconds
Thu 6 Apr 16:11:03 2023 | LHC@home | Started download of vboxwrapper_26206_x86_64-apple-darwin
Thu 6 Apr 16:11:03 2023 | LHC@home | Started download of ATLAS_vbox_3.01_job.xml
Thu 6 Apr 16:11:04 2023 | LHC@home | Finished download of vboxwrapper_26206_x86_64-apple-darwin
Thu 6 Apr 16:11:04 2023 | LHC@home | Finished download of ATLAS_vbox_3.01_job.xml
Thu 6 Apr 16:11:04 2023 | LHC@home | Started download of ATLAS_vbox_3.01_image.vdi
Thu 6 Apr 16:11:04 2023 | LHC@home | Started download of FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_EVNT.32794571._000076.pool.root.1
Thu 6 Apr 16:11:11 2023 | LHC@home | Sending scheduler request: To fetch work.
Thu 6 Apr 16:11:11 2023 | LHC@home | Requesting new tasks for CPU
Thu 6 Apr 16:11:12 2023 | LHC@home | Scheduler request completed: got 0 new tasks
Thu 6 Apr 16:11:12 2023 | LHC@home | No tasks sent
Thu 6 Apr 16:11:12 2023 | LHC@home | No tasks are available for ATLAS Simulation
Thu 6 Apr 16:11:12 2023 | LHC@home | This computer has reached a limit on tasks in progress
Thu 6 Apr 16:11:12 2023 | LHC@home | Project requested delay of 6 seconds
Thu 6 Apr 16:13:13 2023 | LHC@home | work fetch suspended by user
Thu 6 Apr 16:14:30 2023 | LHC@home | Finished download of FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_EVNT.32794571._000076.pool.root.1
Thu 6 Apr 16:14:30 2023 | LHC@home | Started download of FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_input.tar.gz
Thu 6 Apr 16:14:31 2023 | LHC@home | Finished download of FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_input.tar.gz
Thu 6 Apr 16:14:31 2023 | LHC@home | Started download of boinc_job_script.SuPtZn
Thu 6 Apr 16:14:32 2023 | LHC@home | Finished download of boinc_job_script.SuPtZn
Thu 6 Apr 16:14:32 2023 | LHC@home | Started download of SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_EVNT.32794571._000076.pool.root.1
Thu 6 Apr 16:18:32 2023 | LHC@home | Finished download of SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_EVNT.32794571._000076.pool.root.1
Thu 6 Apr 16:18:32 2023 | LHC@home | Started download of SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_input.tar.gz
Thu 6 Apr 16:18:33 2023 | LHC@home | Finished download of SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_input.tar.gz
Thu 6 Apr 16:18:33 2023 | LHC@home | Started download of boinc_job_script.Ao4zBs
Thu 6 Apr 16:18:34 2023 | LHC@home | Finished download of boinc_job_script.Ao4zBs
Thu 6 Apr 16:19:00 2023 | LHC@home | Finished download of ATLAS_vbox_3.01_image.vdi
Thu 6 Apr 16:19:37 2023 | LHC@home | Starting task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0
Thu 6 Apr 21:11:17 2023 | LHC@home | Sending scheduler request: Requested by project.
Thu 6 Apr 21:11:17 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Thu 6 Apr 21:11:18 2023 | LHC@home | Scheduler request completed
Thu 6 Apr 21:11:18 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 02:11:19 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 02:11:19 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 02:11:20 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 02:11:20 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 07:11:20 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 07:11:20 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 07:11:21 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 07:11:21 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 12:11:24 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 12:11:24 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 12:11:26 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 12:11:26 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 15:24:20 2023 | LHC@home | task FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_0 suspended by user
Fri 7 Apr 17:11:30 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 17:11:30 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 17:11:31 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 17:11:31 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 22:11:33 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 22:11:33 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 22:11:34 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 22:11:34 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 22:25:22 2023 | LHC@home | Aborting task FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_0; not started and deadline has passed
Fri 7 Apr 22:25:54 2023 | LHC@home | Sending scheduler request: To report completed tasks.
Fri 7 Apr 22:25:54 2023 | LHC@home | Reporting 1 completed tasks
Fri 7 Apr 22:25:54 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 22:25:55 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 22:25:55 2023 | LHC@home | Project requested delay of 6 seconds
Sat 8 Apr 03:25:57 2023 | LHC@home | Sending scheduler request: Requested by project.
Sat 8 Apr 03:25:57 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Sat 8 Apr 03:25:58 2023 | LHC@home | Scheduler request completed
Sat 8 Apr 03:25:58 2023 | LHC@home | Result SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0 is no longer usable
Sat 8 Apr 03:25:58 2023 | LHC@home | Project requested delay of 6 seconds
Sat 8 Apr 03:26:08 2023 | LHC@home | Sending scheduler request: To report completed tasks.
Sat 8 Apr 03:26:08 2023 | LHC@home | Reporting 1 completed tasks
Sat 8 Apr 03:26:08 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Sat 8 Apr 03:26:09 2023 | LHC@home | Scheduler request completed
Sat 8 Apr 03:26:09 2023 | LHC@home | Project requested delay of 6 seconds
Sat 8 Apr 03:26:09 2023 | LHC@home | [error] garbage_collect(); still have active task for acked result SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0; state 5
Sat 8 Apr 03:26:10 2023 | LHC@home | [error] garbage_collect(); still have active task for acked result SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0; state 6
Sat 8 Apr 03:26:10 2023 | LHC@home | Computation for task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0 finished
Sat 8 Apr 03:26:10 2023 | LHC@home | Output file SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0_r1880572617_ATLAS_result for task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0 absent
Sat 8 Apr 03:26:20 2023 | LHC@home | Sending scheduler request: To report completed tasks.
Sat 8 Apr 03:26:20 2023 | LHC@home | Reporting 1 completed tasks
Sat 8 Apr 03:26:20 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Sat 8 Apr 03:26:21 2023 | LHC@home | Scheduler request completed
Sat 8 Apr 03:26:21 2023 | LHC@home | Project requested delay of 6 seconds
Sat 8 Apr 03:26:21 2023 | LHC@home | [error] Got ack for task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0, but can't find it
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 34,609
Message 47989 - Posted: 9 Apr 2023, 12:21:24 UTC - in response to Message 47986.  

There's still no log (stderr.txt) from an ATLAS task.

The BOINC Manager log shows that you did a project reset while you had tasks in progress.
It also shows that you changed the #cores from 5 to 3, which most likely crashed ATLAS.
ATLAS is a multicore app and each task is bound to a fixed #cores when it starts.
Hence, you must not change the #cores while a multicore task is in progress.

You have been asked to run and report a Theory task before you run another ATLAS task.
Theory tasks are less resource-hungry, and even if they fail, the stderr.txt reported back to the server may contain important information.
Hence, uncheck CMS and ATLAS on your prefs page until this is done.

Now, to start with a clean environment it's recommended that you strictly follow these steps in order (a command-line sketch for some of them follows the list):
1. set all projects to NNT (no new tasks)
2. cancel all tasks from LHC@home that are not yet started
3. let all running tasks from LHC@home finish or cancel them
4. wait until all tasks from LHC@home are ready to be reported
5. "Update Project" to report them
6. On the project prefs page uncheck all apps but Theory
7. Also uncheck "If no work for selected applications is available, accept work from other applications?"
8. Set a small work buffer in your local client settings
9. If no tasks from LHC@home are left on your computer, reset the project
10. Allow work from LHC@home and request work
11. Set NNT once you got work
12. Run at least 1 Theory task and report it
13. Check whether that task appears on your computer's webpage:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10667017
14. On the prefs page uncheck Theory and enable ATLAS
15. Repeat steps 10.-13. with ATLAS
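For those who prefer a shell, steps 1, 5, 9 and 10 map onto boinccmd roughly like this (a sketch; use the project URL exactly as your client logs it):

URL=https://lhcathome.cern.ch/lhcathome/
boinccmd --project $URL nomorework     # step 1: set no new tasks (NNT)
boinccmd --project $URL update         # step 5: report finished tasks
boinccmd --project $URL reset          # step 9: reset the project
boinccmd --project $URL allowmorework  # step 10: allow new work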
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 34,609
Message 47993 - Posted: 11 Apr 2023, 10:05:48 UTC

It looks like all ATLAS (native) tasks currently running on all of my computers:
- are configured to process 500 events instead of 200
- are using input files around 570 MB each


It would be nice to get a clear statement from CERN whether this is the new standard now.
As far as I understand the discussion at -dev, the 500-eventers were just tests:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=614

Options:
- leave it at 200
- increase it to 500
- increase it to 2000
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 47994 - Posted: 11 Apr 2023, 11:51:14 UTC
Last modified: 11 Apr 2023, 15:00:04 UTC

These new tasks, created just this morning with a download of 570 MB and a HITS upload of 190 MB, are the awaited 'Run 3' tasks.
With 'top' only one python process is shown, with a reserved memory of 2.4 GB and almost 800% CPU for 8 threads.
The monitoring with ALT-F2 is garbled, showing only one random worker out of the 8 and scrolling a lot of lines every minute.
The initialization phase of these tasks lasts about 3 times longer.
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 47996 - Posted: 11 Apr 2023, 20:11:43 UTC - in response to Message 47994.  

I can confirm we are now mostly running the Run 3 tasks. I asked for some test tasks here and got more than I thought :)

These tasks are processing 500 events but each event is faster than before so the overall run time should be only a little bit longer.

Also as you have noticed the console monitoring doesn’t work due to changes in the logging format. The “top” monitoring on Alt-F3 also shows a single python process instead of multiple athena.py processes.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 47997 - Posted: 12 Apr 2023, 5:40:17 UTC - in response to Message 47996.  

These tasks are processing 500 events but each event is faster than before so the overall run time should be only a little bit longer.
On a laptop i5-6200U using 3 core-VMs:
200 events Run 2: Run Time 35,265.99s CPU 100,259.10s
500 events Run 3: Run Time 32,849.15s CPU 93,464.47s
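Per event that works out to 100,259 / 200 ≈ 501 CPU-seconds for Run 2 versus 93,464 / 500 ≈ 187 CPU-seconds for Run 3 - roughly 2.7 times faster per event, which is why the total run time hardly changes despite 2.5 times as many events.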
rbpeake
Joined: 17 Sep 04
Posts: 105
Credit: 32,824,862
RAC: 88
Message 48002 - Posted: 12 Apr 2023, 21:26:19 UTC - in response to Message 47996.  

I can confirm we are now mostly running the Run 3 tasks. I asked for some test tasks here and got more than I thought :)

These tasks are processing 500 events but each event is faster than before so the overall run time should be only a little bit longer.

Also as you have noticed the console monitoring doesn’t work due to changes in the logging format. The “top” monitoring on Alt-F3 also shows a single python process instead of multiple athena.py processes.


Just idle curiosity: what is it about the Run 3 tasks that makes them run more efficiently than those of Run 2?
Thanks.
Regards,
Bob P.
Sesson

Joined: 4 Apr 19
Posts: 31
Credit: 4,423,972
RAC: 0
Message 48105 - Posted: 16 May 2023, 14:29:01 UTC - in response to Message 48002.  

ATLAS Athena framework undergoes important renovation

I hope the ATLAS project can set a smaller default memory limit for VirtualBox VMs that is appropriate for Run 3 tasks.
zepingouin
Joined: 7 Jan 07
Posts: 41
Credit: 16,102,983
RAC: 6
Message 48644 - Posted: 23 Sep 2023, 16:54:44 UTC - in response to Message 48105.  

I modified the command line to monitor ATLAS native 3.01:
In this example, I used 2 CPUs per task, hence the tail -n2.
sudo watch -n10 "find /var/lib/boinc-client/slots/ \( -name \"log.EVNTtoHITS\" -o -name \"AthenaMP.log\" \) |sort |xargs -I {} -n1 sh -c \"egrep 'INFO.*Run:Event ' {} |tail -n2\"|sort -k 7,7"

An example of output:
17:32:48 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       390     0    INFO          Run:Event 450000:20848791       (200th event for this worker) took 82.67 s. New average 93.67 +- 3.91
17:32:44 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       391     1    INFO          Run:Event 450000:20848792       (192th event for this worker) took 45.15 s. New average 98.49 +- 3.622
17:32:03 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       362     0    INFO          Run:Event 450000:22570763       (186th event for this worker) took 40.7 s. New average 96.78 +- 3.699
17:32:53 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       363     1    INFO          Run:Event 450000:22570764       (178th event for this worker) took 128.6 s. New average 102.1 +- 4.028
17:33:07 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       312     1    INFO          Run:Event 450000:22644313       (159th event for this worker) took 209.2 s. New average 95.61 +- 3.997
17:33:01 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       313     0    INFO          Run:Event 450000:22644314       (155th event for this worker) took 152.8 s. New average 99 +- 4.297
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 34,609
Message 48645 - Posted: 23 Sep 2023, 17:59:29 UTC - in response to Message 48644.  

Nice!
You got the idea and modified it according to your needs.
+1