Message boards : ATLAS application : ATLAS vbox and native 3.01

Previous · 1 · 2 · 3 · 4 · Next

maeax
Joined: 2 May 07
Posts: 2096
Credit: 159,671,803
RAC: 143,689
Message 47958 - Posted: 1 Apr 2023, 7:26:55 UTC - in response to Message 47956.  

Toggleton,
Seeing the same: 238 MB instead of 1.08 and 1.09 GB.
BellyNitpicker
Joined: 16 Jun 20
Posts: 8
Credit: 2,318,092
RAC: 0
Message 47959 - Posted: 1 Apr 2023, 21:22:15 UTC - in response to Message 47957.  

With version 3.01 something went wrong. Maybe more RAM is needed.


Yes, I know my OS is Darwin. I've allocated 12.8 GB of RAM to BOINC. That's all you can have; the rest is allocated to other uses. Do you need more?
BellyNitpicker
Joined: 16 Jun 20
Posts: 8
Credit: 2,318,092
RAC: 0
Message 47960 - Posted: 1 Apr 2023, 21:34:37 UTC - in response to Message 47955.  

I don't think the ATLAS tasks like to be started and stopped.
I would pick a single project application to run and stick to only that for now.


I didn't switch the tasks. That was the BOINC manager.

After an earlier runaway, I suspended the project; set no new tasks; aborted all existing tasks; unsuspended; reset the project; allowed new tasks.

All tasks downloaded (8) were ATLAS of the same type.

After more than 7 minutes of processing, the active task was made pending and a new one was started. My task-switching interval was set to "every 60 mins"; I've now set it to 9 hours. I have also set my preferences to run only ATLAS Simulation FTB, with a maximum of 2 jobs.

But only when I've run down the Ubuntu VM I started in place of Darwin BOINC, as the VM has a few days of work on it at the moment.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2411
Credit: 226,150,800
RAC: 128,371
Message 47961 - Posted: 2 Apr 2023, 6:14:14 UTC - in response to Message 47960.  

BellyNitpicker wrote:
However, my Ubuntu VMs run without problem

How many of them do you run concurrently and how much RAM do they allocate in total?


BellyNitpicker wrote:
I suspended the project; set no new tasks; aborted all existing tasks; unsuspended; reset the project; allowed new tasks.

This is part of the problem, since all logs that may have included useful hints are gone.
So far the server shows successful ATLAS v2.03 logs but not a single one from v3.01.

Suggestion:
Run and report a Theory vbox task to see whether that succeeds; Theory requires far fewer resources.
Once Theory works fine, run a single ATLAS vbox task and let it send the logs back to the server.
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1279
Credit: 8,484,048
RAC: 1,651
Message 47963 - Posted: 3 Apr 2023, 9:27:16 UTC

@David Cameron:
The memory usage is not as expected.
In ATLAS_vbox_3.01_job.xml the memory usage is set to 4000MB,
but at the moment the old method for VM memory is still used: 3000MB + 900MB per core, i.e. 3900, 4800, 5700, 6600, ..., 10200MB for 1, 2, 3, 4, ..., 8 (unlimited) cores.
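The old scaling rule can be checked with a few lines of Python. This is a sketch of the arithmetic only, assuming the 3000 MB base and 900 MB per core quoted above; it is not the project's server code, and the names are hypothetical.

```python
BASE_MB = 3000     # fixed part of the old VM memory formula
PER_CORE_MB = 900  # added for every core the task is configured with


def old_vm_memory_mb(cores: int) -> int:
    """VM memory in MB under the old rule: 3000 MB + 900 MB per core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return BASE_MB + PER_CORE_MB * cores


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"{n} core(s): {old_vm_memory_mb(n)} MB")
```

For an 8-core task that is 10200 MB, i.e. 6200 MB more than the fixed 4000 MB set in ATLAS_vbox_3.01_job.xml.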
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,618,769
RAC: 15,794
Message 47964 - Posted: 3 Apr 2023, 13:24:05 UTC - in response to Message 47963.  

In the slot directory there is an init_data.xml file where you can find the line: <rsc_memory_bound>6920601600.000000</rsc_memory_bound>
This is for my 4-core ATLAS task.
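To see which limit a running task actually got, the value can be read from init_data.xml and converted from bytes to MB (1 MB = 1024 * 1024 bytes); 6920601600 bytes is exactly 6600 MB. A minimal sketch: the `app_init_data` root element and the trimmed sample are assumptions for illustration (the real file contains many more entries), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical excerpt of a slot's init_data.xml; only the
# rsc_memory_bound value is taken from the post above.
SAMPLE = """<app_init_data>
<rsc_memory_bound>6920601600.000000</rsc_memory_bound>
</app_init_data>"""


def memory_bound_mb(xml_text: str) -> float:
    """Return rsc_memory_bound converted from bytes to MB (1 MB = 1024 * 1024 bytes)."""
    root = ET.fromstring(xml_text)
    value = root.findtext("rsc_memory_bound")  # direct child of the root element
    if value is None:
        raise ValueError("rsc_memory_bound not found")
    return float(value) / (1024 * 1024)


if __name__ == "__main__":
    print(memory_bound_mb(SAMPLE))
```

On a real host, pass the contents of slots/<N>/init_data.xml from the BOINC data directory; the slot number differs per task.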
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1279
Credit: 8,484,048
RAC: 1,651
Message 47965 - Posted: 3 Apr 2023, 16:54:09 UTC - in response to Message 47964.  

Harri Liljeroos wrote:
In slot directory there is a init_data.xml file where you can find line: <rsc_memory_bound>6920601600.000000</rsc_memory_bound>
This is for my 4 core Atlas task.

Yeah, that's the old formula: 3000MB + 4 * 900MB = 6600MB = 6920601600 bytes.
On the development system we tested the new application with much less memory, and it worked fine even with 3000MB RAM,
but here it is set to 4000MB for safety. However, this setting in the xml file is not respected on the production system.
Until this is fixed, you can lower the memory usage yourself with an app_config.xml in the lhcathome.cern.ch_lhcathome-project directory.
Example:
<app_config>
 <app_version>
  <app_name>ATLAS</app_name>
  <plan_class>vbox64_mt_mcore_atlas</plan_class>
  <cmdline>--memory_size_mb 4000 --nthreads 4</cmdline>
 </app_version>
</app_config>
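If you prefer to generate this file, the same content can be produced with Python's standard xml.etree.ElementTree. A sketch, assuming Python 3.9+ (for ET.indent); the app name, plan class and command-line options are copied from the example above, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def make_app_config(memory_mb: int, nthreads: int,
                    app_name: str = "ATLAS",
                    plan_class: str = "vbox64_mt_mcore_atlas") -> str:
    """Build app_config.xml text that passes memory and thread options to the app."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "cmdline").text = (
        f"--memory_size_mb {memory_mb} --nthreads {nthreads}"
    )
    ET.indent(root)  # pretty-print with two-space indentation
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_app_config(4000, 4))
```

Save the printed output as app_config.xml in the lhcathome.cern.ch_lhcathome-project directory.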
maeax
Joined: 2 May 07
Posts: 2096
Credit: 159,671,803
RAC: 143,689
Message 47967 - Posted: 4 Apr 2023, 10:01:53 UTC

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=209638954
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | job summary report
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | --------------------------------------------------
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | PanDA job id: 5807945461
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | task id: 32930087
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | error 1/1: 1305: Failed to execute payload:PyJobTransforms.transform.execute 2023-04-04 09:47:55,790 CRITICAL Transform executor raised TransformValidationExceptio
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | status: LOG_TRANSFER = DONE
[2023-04-04 09:52:51] 2023-04-04 07:51:40,779 | INFO | pilot state: failed
[2023-04-04 09:52:51] 2023-04-04 07:51:40,780 | INFO | transexitcode: 65
[2023-04-04 09:52:51] 2023-04-04 07:51:40,780 | INFO | exeerrorcode: 65
[2023-04-04 09:52:51] 2023-04-04 07:51:40,780 | INFO | exeerrordiag: Non-zero return code from EVNTtoHITS (1)
[2023-04-04 09:52:51] 2023-04-04 07:51:40,780 | INFO | exitcode: 65
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 47977 - Posted: 6 Apr 2023, 7:36:10 UTC - in response to Message 47965.  

We are still running the old Run 2 tasks here so we need the memory to scale up with number of cores. Once we exclusively run Run 3 tasks we can remove the memory scaling and use a fixed value of 4GB.
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1279
Credit: 8,484,048
RAC: 1,651
Message 47978 - Posted: 6 Apr 2023, 8:58:33 UTC - in response to Message 47977.  

David Cameron wrote:
We are still running the old Run 2 tasks here so we need the memory to scale up with number of cores. Once we exclusively run Run 3 tasks we can remove the memory scaling and use a fixed value of 4GB.

In my opinion it does not matter whether it's a Run 2 or a Run 3 task: in both cases the new ATLAS_vbox_3.01_image.vdi is used.
I tested several setups with 2, 3, 4, 5 and 8 threads with 'only' 4096 MB of RAM, and all tasks were successful with almost no swapping.
8 threads:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=391160006
https://lhcathome.cern.ch/lhcathome/result.php?resultid=391160097
https://lhcathome.cern.ch/lhcathome/result.php?resultid=391459825
https://lhcathome.cern.ch/lhcathome/result.php?resultid=391459378
BellyNitpicker
Joined: 16 Jun 20
Posts: 8
Credit: 2,318,092
RAC: 0
Message 47986 - Posted: 9 Apr 2023, 7:57:02 UTC - in response to Message 47961.  

computezrmle wrote:
This is part of the problem since all logs are gone which may have included useful hints.
So far the server shows successful ATLAS v2.03 logs but not a single one from v3.01.


Here is a log. I reduced the CPUs available to three, but left memory at 13GB. Two previously lost tasks were downloaded, this time for 3 CPUs. Both advertised an estimated duration of 4h20m. I ran one to completion; its actual duration was 35h15m. Remember that the 5-CPU tasks also ran for a very long time beyond their estimated time.

NB: Prior to the events of the last week, I had been running Ubuntu VMs alongside BOINC on Darwin (with LHC as its only project) for more than two years without conflicts or problems (except for the short-term VBox memory leak in an earlier version of macOS/Darwin). The three VMs have a total memory allocation of 12GB, so there's 7GB (give or take) left over for system activity.
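The arithmetic behind that estimate, using the figures from the event log below (32.00 GB physical, BOINC limit 13107.20 MB) and the 12 GB for the three VMs. A small sketch; treating 1 GB as 1024 MB is an assumption, and the names are hypothetical.

```python
PHYSICAL_MB = 32 * 1024    # "Memory: 32.00 GB physical" in the event log
BOINC_LIMIT_MB = 13107.20  # "max memory usage when active: 13107.20 MB"
OTHER_VMS_MB = 12 * 1024   # three Ubuntu VMs, 12 GB in total (from the post)


def leftover_mb(physical: float, boinc: float, vms: float) -> float:
    """Memory left for the host OS after BOINC's limit and the other VMs are subtracted."""
    return physical - boinc - vms


if __name__ == "__main__":
    left = leftover_mb(PHYSICAL_MB, BOINC_LIMIT_MB, OTHER_VMS_MB)
    print(f"{left:.1f} MB (about {left / 1024:.1f} GB) left for the host")
```

That leaves about 7372.8 MB, roughly 7.2 GB, consistent with the "7GB (give or take)" estimate; the BOINC limit itself is 12.8 GB.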

Thu 6 Apr 16:10:02 2023 | | cc_config.xml not found - using defaults
Thu 6 Apr 16:10:02 2023 | | Starting BOINC client version 7.20.4 for x86_64-apple-darwin
Thu 6 Apr 16:10:02 2023 | | log flags: file_xfer, sched_ops, task
Thu 6 Apr 16:10:02 2023 | | Libraries: libcurl/7.79.1 SecureTransport zlib/1.2.11 c-ares/1.17.2
Thu 6 Apr 16:10:02 2023 | | Data directory: /Library/Application Support/BOINC Data
Thu 6 Apr 16:10:02 2023 | | OpenCL: Intel GPU 0: Intel(R) UHD Graphics 630 (driver version 1.2(Jan 10 2023 21:29:09), device version OpenCL 1.2, 1536MB, 1536MB available, 230 GFLOPS peak)
Thu 6 Apr 16:10:02 2023 | | OpenCL CPU: Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
Thu 6 Apr 16:10:02 2023 | | Host name: NandLBTScience
Thu 6 Apr 16:10:02 2023 | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz [x86 Family 6 Model 158 Stepping 10]
Thu 6 Apr 16:10:02 2023 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni pclmulqdq dtes64 mon dscpl vmx smx est tm2 ssse3 fma cx16 tpr pdcm sse4_1 sse4_2 x2apic movbe popcnt aes pcid xsave osxsave seglim64 tsctmr avx rdrand f16c
Thu 6 Apr 16:10:02 2023 | | OS: Mac OS X 12.6.3 (Darwin 21.6.0)
Thu 6 Apr 16:10:02 2023 | | Memory: 32.00 GB physical, 209.57 GB virtual
Thu 6 Apr 16:10:02 2023 | | Disk: 465.63 GB total, 209.57 GB free
Thu 6 Apr 16:10:02 2023 | | Local time is UTC +1 hours
Thu 6 Apr 16:10:02 2023 | | VirtualBox version: 7.0.6r155176
Thu 6 Apr 16:10:02 2023 | | General prefs: from https://climateprediction.net/ (last modified 23-Dec-2021 14:43:38)
Thu 6 Apr 16:10:02 2023 | | Computer location: home
Thu 6 Apr 16:10:02 2023 | | General prefs: no separate prefs for home; using your defaults
Thu 6 Apr 16:10:02 2023 | | Reading preferences override file
Thu 6 Apr 16:10:02 2023 | | Preferences:
Thu 6 Apr 16:10:02 2023 | | max memory usage when active: 13107.20 MB
Thu 6 Apr 16:10:02 2023 | | max memory usage when idle: 13107.20 MB
Thu 6 Apr 16:10:02 2023 | | max disk usage: 40.00 GB
Thu 6 Apr 16:10:02 2023 | | max CPUs used: 5
Thu 6 Apr 16:10:02 2023 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Thu 6 Apr 16:10:02 2023 | | Setting up project and slot directories
Thu 6 Apr 16:10:02 2023 | | Checking active tasks
Thu 6 Apr 16:10:02 2023 | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10667017; resource share 10
Thu 6 Apr 16:10:02 2023 | | Setting up GUI RPC socket
Thu 6 Apr 16:10:02 2023 | | Checking presence of 0 project files
Thu 6 Apr 16:10:22 2023 | | General prefs: from https://climateprediction.net/ (last modified 23-Dec-2021 14:43:38)
Thu 6 Apr 16:10:22 2023 | | Computer location: home
Thu 6 Apr 16:10:22 2023 | | General prefs: no separate prefs for home; using your defaults
Thu 6 Apr 16:10:22 2023 | | Reading preferences override file
Thu 6 Apr 16:10:22 2023 | | Preferences:
Thu 6 Apr 16:10:22 2023 | | max memory usage when active: 13107.20 MB
Thu 6 Apr 16:10:22 2023 | | max memory usage when idle: 13107.20 MB
Thu 6 Apr 16:10:22 2023 | | max disk usage: 40.00 GB
Thu 6 Apr 16:10:22 2023 | | Number of usable CPUs has changed from 5 to 3.
Thu 6 Apr 16:10:22 2023 | | max CPUs used: 3
Thu 6 Apr 16:10:22 2023 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Thu 6 Apr 16:10:38 2023 | LHC@home | update requested by user
Thu 6 Apr 16:10:43 2023 | LHC@home | project resumed by user
Thu 6 Apr 16:10:44 2023 | LHC@home | Master file download succeeded
Thu 6 Apr 16:10:49 2023 | LHC@home | Sending scheduler request: Requested by user.
Thu 6 Apr 16:10:49 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Thu 6 Apr 16:10:50 2023 | LHC@home | Scheduler request completed
Thu 6 Apr 16:10:50 2023 | LHC@home | Project requested delay of 6 seconds
Thu 6 Apr 16:10:51 2023 | LHC@home | work fetch resumed by user
Thu 6 Apr 16:11:00 2023 | LHC@home | Sending scheduler request: To fetch work.
Thu 6 Apr 16:11:00 2023 | LHC@home | Requesting new tasks for CPU
Thu 6 Apr 16:11:01 2023 | LHC@home | Scheduler request completed: got 2 new tasks
Thu 6 Apr 16:11:01 2023 | LHC@home | Resent lost task FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_0
Thu 6 Apr 16:11:01 2023 | LHC@home | Resent lost task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0
Thu 6 Apr 16:11:01 2023 | LHC@home | Project requested delay of 6 seconds
Thu 6 Apr 16:11:03 2023 | LHC@home | Started download of vboxwrapper_26206_x86_64-apple-darwin
Thu 6 Apr 16:11:03 2023 | LHC@home | Started download of ATLAS_vbox_3.01_job.xml
Thu 6 Apr 16:11:04 2023 | LHC@home | Finished download of vboxwrapper_26206_x86_64-apple-darwin
Thu 6 Apr 16:11:04 2023 | LHC@home | Finished download of ATLAS_vbox_3.01_job.xml
Thu 6 Apr 16:11:04 2023 | LHC@home | Started download of ATLAS_vbox_3.01_image.vdi
Thu 6 Apr 16:11:04 2023 | LHC@home | Started download of FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_EVNT.32794571._000076.pool.root.1
Thu 6 Apr 16:11:11 2023 | LHC@home | Sending scheduler request: To fetch work.
Thu 6 Apr 16:11:11 2023 | LHC@home | Requesting new tasks for CPU
Thu 6 Apr 16:11:12 2023 | LHC@home | Scheduler request completed: got 0 new tasks
Thu 6 Apr 16:11:12 2023 | LHC@home | No tasks sent
Thu 6 Apr 16:11:12 2023 | LHC@home | No tasks are available for ATLAS Simulation
Thu 6 Apr 16:11:12 2023 | LHC@home | This computer has reached a limit on tasks in progress
Thu 6 Apr 16:11:12 2023 | LHC@home | Project requested delay of 6 seconds
Thu 6 Apr 16:13:13 2023 | LHC@home | work fetch suspended by user
Thu 6 Apr 16:14:30 2023 | LHC@home | Finished download of FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_EVNT.32794571._000076.pool.root.1
Thu 6 Apr 16:14:30 2023 | LHC@home | Started download of FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_input.tar.gz
Thu 6 Apr 16:14:31 2023 | LHC@home | Finished download of FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_input.tar.gz
Thu 6 Apr 16:14:31 2023 | LHC@home | Started download of boinc_job_script.SuPtZn
Thu 6 Apr 16:14:32 2023 | LHC@home | Finished download of boinc_job_script.SuPtZn
Thu 6 Apr 16:14:32 2023 | LHC@home | Started download of SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_EVNT.32794571._000076.pool.root.1
Thu 6 Apr 16:18:32 2023 | LHC@home | Finished download of SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_EVNT.32794571._000076.pool.root.1
Thu 6 Apr 16:18:32 2023 | LHC@home | Started download of SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_input.tar.gz
Thu 6 Apr 16:18:33 2023 | LHC@home | Finished download of SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_input.tar.gz
Thu 6 Apr 16:18:33 2023 | LHC@home | Started download of boinc_job_script.Ao4zBs
Thu 6 Apr 16:18:34 2023 | LHC@home | Finished download of boinc_job_script.Ao4zBs
Thu 6 Apr 16:19:00 2023 | LHC@home | Finished download of ATLAS_vbox_3.01_image.vdi
Thu 6 Apr 16:19:37 2023 | LHC@home | Starting task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0
Thu 6 Apr 21:11:17 2023 | LHC@home | Sending scheduler request: Requested by project.
Thu 6 Apr 21:11:17 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Thu 6 Apr 21:11:18 2023 | LHC@home | Scheduler request completed
Thu 6 Apr 21:11:18 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 02:11:19 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 02:11:19 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 02:11:20 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 02:11:20 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 07:11:20 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 07:11:20 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 07:11:21 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 07:11:21 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 12:11:24 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 12:11:24 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 12:11:26 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 12:11:26 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 15:24:20 2023 | LHC@home | task FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_0 suspended by user
Fri 7 Apr 17:11:30 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 17:11:30 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 17:11:31 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 17:11:31 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 22:11:33 2023 | LHC@home | Sending scheduler request: Requested by project.
Fri 7 Apr 22:11:33 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 22:11:34 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 22:11:34 2023 | LHC@home | Project requested delay of 6 seconds
Fri 7 Apr 22:25:22 2023 | LHC@home | Aborting task FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_0; not started and deadline has passed
Fri 7 Apr 22:25:54 2023 | LHC@home | Sending scheduler request: To report completed tasks.
Fri 7 Apr 22:25:54 2023 | LHC@home | Reporting 1 completed tasks
Fri 7 Apr 22:25:54 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Fri 7 Apr 22:25:55 2023 | LHC@home | Scheduler request completed
Fri 7 Apr 22:25:55 2023 | LHC@home | Project requested delay of 6 seconds
Sat 8 Apr 03:25:57 2023 | LHC@home | Sending scheduler request: Requested by project.
Sat 8 Apr 03:25:57 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Sat 8 Apr 03:25:58 2023 | LHC@home | Scheduler request completed
Sat 8 Apr 03:25:58 2023 | LHC@home | Result SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0 is no longer usable
Sat 8 Apr 03:25:58 2023 | LHC@home | Project requested delay of 6 seconds
Sat 8 Apr 03:26:08 2023 | LHC@home | Sending scheduler request: To report completed tasks.
Sat 8 Apr 03:26:08 2023 | LHC@home | Reporting 1 completed tasks
Sat 8 Apr 03:26:08 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Sat 8 Apr 03:26:09 2023 | LHC@home | Scheduler request completed
Sat 8 Apr 03:26:09 2023 | LHC@home | Project requested delay of 6 seconds
Sat 8 Apr 03:26:09 2023 | LHC@home | [error] garbage_collect(); still have active task for acked result SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0; state 5
Sat 8 Apr 03:26:10 2023 | LHC@home | [error] garbage_collect(); still have active task for acked result SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0; state 6
Sat 8 Apr 03:26:10 2023 | LHC@home | Computation for task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0 finished
Sat 8 Apr 03:26:10 2023 | LHC@home | Output file SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0_r1880572617_ATLAS_result for task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0 absent
Sat 8 Apr 03:26:20 2023 | LHC@home | Sending scheduler request: To report completed tasks.
Sat 8 Apr 03:26:20 2023 | LHC@home | Reporting 1 completed tasks
Sat 8 Apr 03:26:20 2023 | LHC@home | Not requesting tasks: "no new tasks" requested via Manager
Sat 8 Apr 03:26:21 2023 | LHC@home | Scheduler request completed
Sat 8 Apr 03:26:21 2023 | LHC@home | Project requested delay of 6 seconds
Sat 8 Apr 03:26:21 2023 | LHC@home | [error] Got ack for task SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0, but can't find it
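The wall-clock run time of a task can be read off this kind of event log by pairing its "Starting task" and "Computation for task ... finished" entries. A sketch, assuming the `Day D Mon HH:MM:SS YYYY | project | message` layout shown above and an English/C locale for the day and month names; the function name is hypothetical.

```python
from datetime import datetime

# Timestamp layout used by the event log above, e.g. "Thu 6 Apr 16:19:37 2023".
FMT = "%a %d %b %H:%M:%S %Y"

TASK = "SRvLDmC4o12nfZGDcpSWOuwoABFKDmABFKDm4gFKDmfrZKDmJuWicn_0"
LOG = [
    f"Thu 6 Apr 16:19:37 2023 | LHC@home | Starting task {TASK}",
    "Fri 7 Apr 15:24:20 2023 | LHC@home | task FzWLDmF0o12nsSi4apGgGQJmABFKDmABFKDmjv4TDmAmnKDm8Wupsn_0 suspended by user",
    f"Sat 8 Apr 03:26:10 2023 | LHC@home | Computation for task {TASK} finished",
]


def task_wall_time(log_lines, task):
    """Time between the 'Starting task' and 'Computation ... finished' entries of a task."""
    start = end = None
    for line in log_lines:
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not an event-log entry
        stamp, _project, message = (p.strip() for p in parts)
        if message == f"Starting task {task}":
            start = datetime.strptime(stamp, FMT)
        elif message == f"Computation for task {task} finished":
            end = datetime.strptime(stamp, FMT)
    if start is None or end is None:
        raise ValueError(f"start or finish of {task} not found")
    return end - start


if __name__ == "__main__":
    delta = task_wall_time(LOG, TASK)
    print(f"{delta.total_seconds() / 3600:.2f} h")
```

For this task it gives 35 h 06 min 33 s of wall-clock time, in line with the roughly 35 hours reported in the post.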
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2411
Credit: 226,150,800
RAC: 128,371
Message 47989 - Posted: 9 Apr 2023, 12:21:24 UTC - in response to Message 47986.  

There's still no log (stderr.txt) from an ATLAS task.

BOINC manager's log shows that you did a project reset while you had tasks in progress.
It also shows that you changed #cores from 5 to 3 which most likely crashed ATLAS.
ATLAS is a multicore app and each task is set to a distinct #cores when the task starts.
Hence, you must not change #cores while a multicore task is in progress.

You have been asked to run and report a Theory task before you run another ATLAS task.
Theory tasks are less resource hungry and even if they fail the stderr.txt reported back to the server may show important information.
Hence, uncheck CMS and ATLAS on your prefs page until this is done.

Now, to start with a clean environment, it's recommended that you strictly follow these steps in order:
1. set all projects to NNT (no new tasks)
2. cancel all tasks from LHC@home that are not yet started
3. let all running tasks from LHC@home finish or cancel them
4. wait until all tasks from LHC@home are ready to be reported
5. "Update Project" to report them
6. On the project prefs page uncheck all apps but Theory
7. Also uncheck "If no work for selected applications is available, accept work from other applications?"
8. Set a small work buffer in your local client settings
9. If no tasks from LHC@home are left on your computer, reset the project
10. Allow work from LHC@home and request work
11. Set NNT once you got work
12. Run at least 1 Theory task and report it
13. Check whether that task appears on your computer's webpage:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10667017
14. On the prefs page uncheck Theory and enable ATLAS
15. Repeat steps 10-13 with ATLAS
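Several of the client-side steps in this list can also be issued from a terminal with boinccmd. A sketch, assuming boinccmd is on the PATH and can authenticate to the local client (for example by running it from the BOINC data directory or passing --passwd); the project URL is the one from the event log earlier in the thread, and the function names are hypothetical. Steps done in the Manager or on the preferences web page are not covered.

```python
import subprocess

# Project URL as it appears in the BOINC event log earlier in this thread.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def project_cmd(op: str) -> list[str]:
    """Build a boinccmd call for a project-level operation (nomorework, update, reset, ...)."""
    return ["boinccmd", "--project", PROJECT_URL, op]


# Only the client-side steps from the list above, keyed by step number.
STEPS: dict[int, list[list[str]]] = {
    1: [project_cmd("nomorework")],                             # set NNT
    5: [project_cmd("update")],                                 # report finished tasks
    9: [project_cmd("reset")],                                  # only when no LHC@home tasks are left
    10: [project_cmd("allowmorework"), project_cmd("update")],  # allow and request work
    11: [project_cmd("nomorework")],                            # NNT again once work has arrived
}


def run_step(step: int, dry_run: bool = True) -> list[list[str]]:
    """Return the commands for a step; execute them only when dry_run is False."""
    cmds = STEPS[step]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    for step in sorted(STEPS):
        for cmd in run_step(step):
            print(f"step {step}: {' '.join(cmd)}")
```

With dry_run left at True the function only returns the commands, so they can be reviewed before anything is executed.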
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2411
Credit: 226,150,800
RAC: 128,371
Message 47993 - Posted: 11 Apr 2023, 10:05:48 UTC

It looks like all ATLAS (native) tasks currently running on all of my computers:
- are configured to process 500 events instead of 200
- are using input files around 570 MB each


It would be nice to get a clear statement from CERN whether this is the new standard now.
As far as I understand the discussion on -dev, the 500-event tasks were just tests:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=614

Options:
- leave it at 200
- increase it to 500
- increase it to 2000
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1279
Credit: 8,484,048
RAC: 1,651
Message 47994 - Posted: 11 Apr 2023, 11:51:14 UTC
Last modified: 11 Apr 2023, 15:00:04 UTC

These new tasks, created just this morning with the download of 570MB and a HITS-upload of 190MB, are the awaited 'Run 3' tasks.
With 'top', only one python process is shown, with reserved memory of 2.4 GB and almost 800% CPU for 8 threads.
The ALT-F2 console monitoring is garbled: it shows only one random worker out of 8 and scrolls many lines every minute.
The initialization phase of these tasks takes about 3 times longer.
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 47996 - Posted: 11 Apr 2023, 20:11:43 UTC - in response to Message 47994.  

I can confirm we are now mostly running the Run 3 tasks. I asked for some test tasks here and got more than I thought :)

These tasks are processing 500 events but each event is faster than before so the overall run time should be only a little bit longer.

Also as you have noticed the console monitoring doesn’t work due to changes in the logging format. The “top” monitoring on Alt-F3 also shows a single python process instead of multiple athena.py processes.
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1279
Credit: 8,484,048
RAC: 1,651
Message 47997 - Posted: 12 Apr 2023, 5:40:17 UTC - in response to Message 47996.  

David Cameron wrote:
These tasks are processing 500 events but each event is faster than before so the overall run time should be only a little bit longer.

On a laptop (i5-6200U) using 3-core VMs:
200 events Run 2: Run Time 35,265.99s CPU 100,259.10s
500 events Run 3: Run Time 32,849.15s CPU 93,464.47s
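Normalising these figures per event makes the comparison clearer. A sketch of the arithmetic with the numbers from this post; it is a single pair of tasks on one laptop, so it says little about the average, and the names are hypothetical.

```python
# Figures from the post above (i5-6200U laptop, 3-core VMs).
RUNS = {
    "Run 2": {"events": 200, "run_time_s": 35_265.99, "cpu_s": 100_259.10},
    "Run 3": {"events": 500, "run_time_s": 32_849.15, "cpu_s": 93_464.47},
}


def per_event(run: dict) -> tuple[float, float]:
    """Return (wall-clock seconds, CPU seconds) per event."""
    return run["run_time_s"] / run["events"], run["cpu_s"] / run["events"]


if __name__ == "__main__":
    for name, run in RUNS.items():
        wall, cpu = per_event(run)
        print(f"{name}: {wall:.1f} s wall, {cpu:.1f} s CPU per event")
```

Per event this works out to about 176.3 s wall / 501.3 s CPU for Run 2 versus about 65.7 s / 186.9 s for Run 3, i.e. roughly 2.7 times less CPU per event in this one comparison.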
rbpeake
Joined: 17 Sep 04
Posts: 99
Credit: 30,701,303
RAC: 5,849
Message 48002 - Posted: 12 Apr 2023, 21:26:19 UTC - in response to Message 47996.  

Just idle curiosity: what is it about the Run 3 tasks that makes them run more efficiently than those of Run 2?
Thanks.
Regards,
Bob P.
Sesson
Joined: 4 Apr 19
Posts: 31
Credit: 3,857,032
RAC: 8,412
Message 48105 - Posted: 16 May 2023, 14:29:01 UTC - in response to Message 48002.  

ATLAS Athena framework undergoes important renovation

I hope the ATLAS project can set a smaller default memory limit for VirtualBox VMs that is appropriate for Run 3 tasks.
zepingouin
Joined: 7 Jan 07
Posts: 41
Credit: 15,959,427
RAC: 19
Message 48644 - Posted: 23 Sep 2023, 16:54:44 UTC - in response to Message 48105.  

I modified the command line to monitor ATLAS native 3.01.
In this example I used 2 CPUs per task, hence the tail -n2.
sudo watch -n10 "find /var/lib/boinc-client/slots/ \( -name \"log.EVNTtoHITS\" -o -name \"AthenaMP.log\" \) |sort |xargs -I {} sh -c \"grep -E 'INFO.*Run:Event ' {} |tail -n2\"|sort -k 7,7"

An example of output:
17:32:48 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       390     0    INFO          Run:Event 450000:20848791       (200th event for this worker) took 82.67 s. New average 93.67 +- 3.91
17:32:44 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       391     1    INFO          Run:Event 450000:20848792       (192th event for this worker) took 45.15 s. New average 98.49 +- 3.622
17:32:03 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       362     0    INFO          Run:Event 450000:22570763       (186th event for this worker) took 40.7 s. New average 96.78 +- 3.699
17:32:53 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       363     1    INFO          Run:Event 450000:22570764       (178th event for this worker) took 128.6 s. New average 102.1 +- 4.028
17:33:07 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       312     1    INFO          Run:Event 450000:22644313       (159th event for this worker) took 209.2 s. New average 95.61 +- 3.997
17:33:01 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       313     0    INFO          Run:Event 450000:22644314       (155th event for this worker) took 152.8 s. New average 99 +- 4.297
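To post-process this output instead of only watching it, each progress line can be parsed with a regular expression. A sketch: the pattern is derived only from the sample lines above, the dictionary keys and function name are hypothetical, and other software releases may log differently.

```python
import re

# Built from the sample lines above; only the fields used below are captured.
EVENT_RE = re.compile(
    r"Run:Event\s+(?P<run>\d+):(?P<event>\d+)\s+"
    r"\((?P<nth>\d+)\w* event for this worker\)\s+"
    r"took\s+(?P<took>[\d.]+)\s+s\.\s+"
    r"New average\s+(?P<avg>[\d.]+)"
)

SAMPLE = [
    "17:32:48 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       390     0    INFO          Run:Event 450000:20848791       (200th event for this worker) took 82.67 s. New average 93.67 +- 3.91",
    "17:32:44 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       391     1    INFO          Run:Event 450000:20848792       (192th event for this worker) took 45.15 s. New average 98.49 +- 3.622",
]


def parse_event_line(line: str):
    """Return a dict of the progress fields, or None if the line is not a progress line."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return {
        "run": int(m["run"]),
        "event": int(m["event"]),
        "nth_for_worker": int(m["nth"]),
        "took_s": float(m["took"]),
        "avg_s": float(m["avg"]),
    }


if __name__ == "__main__":
    for line in SAMPLE:
        info = parse_event_line(line)
        if info is not None:
            print(f"event {info['event']}: #{info['nth_for_worker']} for its worker, "
                  f"{info['took_s']} s (average {info['avg_s']} s)")
```

For the first sample line this yields event 20848791, the worker's 200th event, 82.67 s for that event and a running average of 93.67 s.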
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2411
Credit: 226,150,800
RAC: 128,371
Message 48645 - Posted: 23 Sep 2023, 17:59:29 UTC - in response to Message 48644.  

Nice!
You got the idea and modified it according to your needs.
+1


©2024 CERN