Message boards : Theory Application : High disk reads

Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 865
Credit: 719,978,266
RAC: 177,301
Message 50785 - Posted: 14 Oct 2024, 18:11:13 UTC
Last modified: 14 Oct 2024, 18:11:42 UTC

Has anyone else seen this? Good thing my SSD is fast: some tasks are pushing on towards 100 TB of data.



[Screenshot: Task Manager]

ID: 50785
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 865
Credit: 719,978,266
RAC: 177,301
Message 50786 - Posted: 14 Oct 2024, 18:16:32 UTC

ID: 50786
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2633
Credit: 271,791,872
RAC: 92,432
Message 50787 - Posted: 14 Oct 2024, 19:57:03 UTC - in response to Message 50785.  

Looks like the affected VMs are busy with swapping.
What happens if you configure them to use 1 GB RAM (or even 2 GB)?
Besides reducing disk activity, this should also increase the average CPU usage (currently around 33 %).
ID: 50787
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 865
Credit: 719,978,266
RAC: 177,301
Message 50788 - Posted: 14 Oct 2024, 20:18:07 UTC - in response to Message 50787.  

From the earlier discussion, is it just this batch of WUs that could do with more memory?

I don't see a lot of writes, only reads, so it doesn't seem like swapping?

Anyhow, I turned it up to 1 GB; as you can see, I have plenty of memory for the ATLAS task, so no problem for me.
ID: 50788
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2633
Credit: 271,791,872
RAC: 92,432
Message 50791 - Posted: 15 Oct 2024, 7:13:37 UTC - in response to Message 50788.  

One of my computers has currently been running a Herwig7 (native) task for more than 7.5 days.
That task has 3 main processes, Herwig, rivetvm.exe and runRivet.sh, which together use more than 900 MB of physical RAM.

A standard Theory VM is configured with 630 MB of "physical RAM", which means the same task would be forced to swap out large amounts of data.
Now, just a guess:
Imagine the scientific processes use a large data array (or a large DB) and traverse it for each event they process.
They would then constantly drop parts of the array from RAM in order to read the next parts from disk.
This could explain the huge read activity as well as the low CPU usage, since the CPU has to wait for that data.
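The guess above matches a well-known worst case for least-recently-used (LRU) caching. The toy simulation below is only an illustration under stated assumptions, not a model of Herwig, Rivet or the Linux memory manager: it assumes a strict LRU page cache (Linux only approximates LRU), counts RAM and the data array in 1 MB blocks, and uses the thread's rough figures (a 630 MB VM against a working set of roughly 900 MB, ignoring the OS and the CVMFS cache). The function name `count_block_reads` is hypothetical.

```python
from collections import OrderedDict


def count_block_reads(ram_blocks: int, array_blocks: int, passes: int) -> int:
    """Count simulated disk reads when an array of `array_blocks` blocks is
    scanned sequentially `passes` times through a strict LRU cache that can
    hold `ram_blocks` blocks (toy model, not the real kernel behaviour)."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    reads = 0
    for _ in range(passes):
        for block in range(array_blocks):
            if block in cache:
                cache.move_to_end(block)  # hit: mark as most recently used
            else:
                reads += 1  # miss: the block has to come from disk
                cache[block] = None
                if len(cache) > ram_blocks:
                    cache.popitem(last=False)  # evict the least recently used block
    return reads


# ~900 MB working set scanned 10 times (e.g. once per event)
print(count_block_reads(ram_blocks=630, array_blocks=900, passes=10))
print(count_block_reads(ram_blocks=1024, array_blocks=900, passes=10))
```

With the smaller cache, every block is evicted shortly before it is needed again, so all 9000 accesses go to disk; with room for the whole array, only the first pass (900 blocks) does. If the guess is right, this is why more VM RAM could cut disk reads sharply rather than gradually.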

Besides that, the VM needs RAM for the OS and the CVMFS cache.
The CVMFS cache can be several GB, and each object that is not already in the page cache has to be read from disk.

It might be a good idea to increase the default RAM setting for Theory VMs on the project server.
ID: 50791
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1448
Credit: 9,719,650
RAC: 230
Message 50794 - Posted: 15 Oct 2024, 8:14:31 UTC - in response to Message 50785.  

@Toby: Did you notice whether this high disk-read throughput occurred during the integration phase or the event-processing phase? I suppose the latter.
For me it was during the event-processing phase, and computezrmle confirmed that he saw three processes that are only used during event processing.

@computezrmle: At the moment I'm running a Herwig7.2.1 on a slow laptop (6th gen), but after 4 days it is still in the integration phase (integrate 460 of 760).
I gave this VM 1024 MB of RAM and 2 threads. After the integration, however, only 4000 events have to be processed.
At the moment: Swap: 965116 free, 83456 used and 196224 avail Mem.
Mem: 72576 free, 559484 used and 370996 cache.
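The memory figures above can be cross-checked with a little arithmetic. This sketch assumes they were copied from the summary lines of `top` inside the VM and are in KiB (top's default unit); the post does not say so, so both points are assumptions. In top's summary, the memory total is free + used + buff/cache and the swap total is free + used.

```python
# Figures from the post, assumed to be KiB from top's summary lines.
MEM_KIB = {"free": 72576, "used": 559484, "buff/cache": 370996}
SWAP_KIB = {"free": 965116, "used": 83456}
AVAIL_MEM_KIB = 196224


def kib_to_mib(kib: int) -> float:
    """Convert KiB to MiB."""
    return kib / 1024


mem_total_kib = sum(MEM_KIB.values())    # free + used + buff/cache
swap_total_kib = sum(SWAP_KIB.values())  # free + used

print(f"RAM total  ~{kib_to_mib(mem_total_kib):.1f} MiB, free {kib_to_mib(MEM_KIB['free']):.1f} MiB")
print(f"Swap total ~{kib_to_mib(swap_total_kib):.1f} MiB, used {kib_to_mib(SWAP_KIB['used']):.1f} MiB")
print(f"Available  ~{kib_to_mib(AVAIL_MEM_KIB):.1f} MiB")
```

Under these assumptions the RAM total comes to about 979.5 MiB, in line with a VM given 1024 MB (the rest presumably reserved outside what top counts), while only about 70.9 MiB is free and about 81.5 MiB has been swapped out.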
ID: 50794
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2633
Credit: 271,791,872
RAC: 92,432
Message 50795 - Posted: 15 Oct 2024, 8:41:16 UTC - in response to Message 50794.  

> I gave this VM 1024 MB of RAM and 2 threads.

I'd leave the default of 1 thread for Theory VMs for the following reasons:
- BOINC (LHC@home) configures the Theory vbox app as single-core and gets confused if you change this
- On saturated computers, VirtualBox needs significantly more internal CPU cycles to run multi-core VMs than single-core ones
- There's only 1 process using nearly all CPU%/TIME (here: Herwig)
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                  
 617962 boinc3    39  19 1202136 858012  49436 R 97.54 0.651     90,00 Herwig                                                                                                                                   
  21468 boinc3    39  19  477256  32928   3864 S 1.536 0.025  39:25.23 rivetvm.exe                                                                                                                              
   7028 boinc3    39  19   32464  13032   1972 S 0.230 0.010  15:53.74 runRivet.sh
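A quick check of the ">900MB" figure against the RES column of the listing above. This sketch assumes RES is in KiB (top's default) and that the VM's RAM setting is in MiB; the helper `ram_margin_kib` is hypothetical. It only accounts for the three science processes, so the OS, the CVMFS cache and the page cache still have to fit into whatever margin is left.

```python
# RES values (assumed KiB) of the three processes in the top listing above.
RES_KIB = {"Herwig": 858012, "rivetvm.exe": 32928, "runRivet.sh": 13032}


def ram_margin_kib(vm_ram_mib: int, used_kib: int) -> int:
    """RAM left after the listed processes (negative means a shortfall)."""
    return vm_ram_mib * 1024 - used_kib


total_kib = sum(RES_KIB.values())
print(f"Resident total: {total_kib} KiB (~{total_kib * 1024 / 1e6:.0f} MB)")

for vm_ram_mib in (630, 1024):  # default Theory VM vs. the 1 GB setting in this thread
    margin = ram_margin_kib(vm_ram_mib, total_kib)
    if margin < 0:
        print(f"{vm_ram_mib} MB VM: short by {-margin} KiB before the OS and CVMFS cache")
    else:
        print(f"{vm_ram_mib} MB VM: {margin} KiB left for the OS, CVMFS cache and page cache")
```

The three processes total 903972 KiB (about 926 MB), so with the default 630 MB they alone would not fit, whereas 1024 MB leaves about 141 MiB for everything else.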
ID: 50795
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 865
Credit: 719,978,266
RAC: 177,301
Message 50809 - Posted: 15 Oct 2024, 19:10:31 UTC - in response to Message 50791.  

Makes sense. I imagine there is a large write at the beginning to dump the DB to the page file, then many reads later.

Since the other projects use more memory, it would make sense for the Theory tasks to get something similar, or the Theory tasks could be reserved for lighter work to support users with less powerful computers. A 3rd option would be to split the work into 2 types that users can opt into.
ID: 50809
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 865
Credit: 719,978,266
RAC: 177,301
Message 50811 - Posted: 15 Oct 2024, 19:15:41 UTC - in response to Message 50794.  
Last modified: 15 Oct 2024, 19:18:12 UTC

Looks like it's processing?

ID: 50811
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2633
Credit: 271,791,872
RAC: 92,432
Message 50812 - Posted: 15 Oct 2024, 19:19:59 UTC - in response to Message 50811.  

It would be interesting to see whether a higher RAM value for the VM leads to lower disk read activity combined with higher CPU usage (in VBox Manager).
ID: 50812
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 865
Credit: 719,978,266
RAC: 177,301
Message 50813 - Posted: 15 Oct 2024, 19:29:17 UTC - in response to Message 50812.  

BOINC decided to run ATLAS now, so there aren't any new Theory tasks running with 1 GB yet; they will come up at some point.



A little sad with these deadlines

ID: 50813
maeax

Joined: 2 May 07
Posts: 2266
Credit: 175,669,348
RAC: 3,819
Message 50814 - Posted: 15 Oct 2024, 19:35:30 UTC - in response to Message 50813.  
Last modified: 15 Oct 2024, 19:36:47 UTC

At the moment, only Herwig is running in mcplots:
https://mcplots-dev.cern.ch/production.php?view=runs&rev=2794&display=succ

A few are successful, but most are not!
ID: 50814
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1448
Credit: 9,719,650
RAC: 230
Message 50817 - Posted: 16 Oct 2024, 5:36:35 UTC - in response to Message 50814.  

> At the moment, only Herwig is running in mcplots:
> https://mcplots-dev.cern.ch/production.php?view=runs&rev=2794&display=succ
>
> A few are successful, but most are not!
You can't count 'unknown' as not successful. This is how I read the figures:
19423 attempts
 1684 success
  267 failure
17472 unknown
ID: 50817
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1199
Credit: 67,117,675
RAC: 96,139
Message 50818 - Posted: 16 Oct 2024, 8:14:52 UTC

Herwig7 used to always run with no problems. I probably have some of those saved in my records, since I tend to save a few of each version of event generator (I have always watched them start so I could tell which one it was). But lately they want to run for 10 days, and when I check the running log they are actually running the entire time.
ID: 50818
Erich56

Joined: 18 Dec 15
Posts: 1868
Credit: 135,782,604
RAC: 99,806
Message 50824 - Posted: 16 Oct 2024, 16:15:49 UTC - in response to Message 50817.  

> You can't count 'unknown' as not successful. This is how I read the figures:
> 19423 attempts
>  1684 success
>   267 failure
> 17472 unknown
What exactly does "unknown" mean? The high number of "unknown" jobs puzzles me. Are they useful for the science or not?
ID: 50824
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1448
Credit: 9,719,650
RAC: 230
Message 50825 - Posted: 16 Oct 2024, 16:32:15 UTC - in response to Message 50824.  

> What exactly does "unknown" mean? The high number of "unknown" jobs puzzles me. Are they useful for the science or not?
Unknown means: jobs still in the pipeline (on the server, in the (re)send queue, being processed by a client, or elsewhere).

Latest figures:

19423 attempts
 1757 success
  271 failure
17395 unknown
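Since "unknown" jobs are still in the pipeline, a success rate only makes sense over finished jobs (success + failure). A small sketch of that arithmetic with the two snapshots posted in this thread; `success_share` is a hypothetical helper, and checking that the categories add up to the attempts is an assumption about how the mcplots table is meant to be read.

```python
def success_share(attempts: int, success: int, failure: int, unknown: int) -> float:
    """Return success / (success + failure) after checking that the
    categories add up to the number of attempts."""
    if success + failure + unknown != attempts:
        raise ValueError("success + failure + unknown must equal attempts")
    return success / (success + failure)


# (attempts, success, failure, unknown) as posted in this thread
SNAPSHOTS = {
    "first": (19423, 1684, 267, 17472),
    "latest": (19423, 1757, 271, 17395),
}

for name, counts in SNAPSHOTS.items():
    print(f"{name}: {success_share(*counts):.1%} of finished jobs succeeded")
```

Both snapshots come out at roughly 86 % of finished jobs succeeding (86.3 % and 86.6 %). Between them, 77 jobs left the "unknown" bucket (73 successes and 4 failures), which fits the reading that "unknown" jobs are simply not finished yet.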
ID: 50825
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 865
Credit: 719,978,266
RAC: 177,301
Message 50829 - Posted: 16 Oct 2024, 18:23:20 UTC

1GB looks good.

ID: 50829
Erich56

Joined: 18 Dec 15
Posts: 1868
Credit: 135,782,604
RAC: 99,806
Message 50834 - Posted: 17 Oct 2024, 13:03:10 UTC - in response to Message 50829.  

> 1GB looks good.
Is this app_config setting the correct way to increase the memory to 1 GB?
<cmdline>--memory_size_mb 1024</cmdline>
ID: 50834
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1448
Credit: 9,719,650
RAC: 230
Message 50835 - Posted: 17 Oct 2024, 13:10:52 UTC - in response to Message 50834.  
Last modified: 17 Oct 2024, 13:11:13 UTC

> Is this app_config setting the correct way to increase the memory to 1 GB?
> <cmdline>--memory_size_mb 1024</cmdline>

An example of a whole app_config.xml for Windows:
<app_config>
 <project_max_concurrent>3</project_max_concurrent>
 <app>
  <name>ATLAS</name>
  <max_concurrent>1</max_concurrent>
 </app>
 <app>
  <name>CMS</name>
  <max_concurrent>1</max_concurrent>
 </app>
 <app>
  <name>Theory</name>
  <max_concurrent>2</max_concurrent>
 </app>
 <app_version>
  <app_name>ATLAS</app_name>
  <plan_class>vbox64_mt_mcore_atlas</plan_class>
  <avg_ncpus>3</avg_ncpus>
  <cmdline>--memory_size_mb 4096 --nthreads 3</cmdline>
 </app_version>
 <app_version>
  <app_name>CMS</app_name>
  <plan_class>vbox64_mt_mcore_cms</plan_class>
  <avg_ncpus>3</avg_ncpus>
  <cmdline>--memory_size_mb 2048 --nthreads 3</cmdline>
 </app_version>
 <app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64_theory</plan_class>
  <avg_ncpus>1</avg_ncpus>
  <cmdline>--memory_size_mb 1024 --nthreads 1</cmdline>
 </app_version>
</app_config>
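When editing a file like this, it's easy for `avg_ncpus` and `--nthreads` to drift apart or for a `--memory_size_mb` value to be mistyped. The following is a hypothetical Python helper, not part of BOINC or LHC@home, that parses an app_config.xml with the standard library and lists the memory and thread settings per `app_version`, flagging entries where `avg_ncpus` differs from `--nthreads`. It assumes every option in `<cmdline>` takes a value, as both options used in this thread do; the sample deliberately contains one mismatched entry.

```python
import shlex
import xml.etree.ElementTree as ET


def check_app_config(xml_text: str) -> list[str]:
    """List memory/thread settings per app_version and flag entries whose
    avg_ncpus differs from --nthreads (hypothetical helper, not part of BOINC)."""
    root = ET.fromstring(xml_text)
    report = []
    for av in root.findall("app_version"):
        name = av.findtext("app_name", default="?")
        ncpus = float(av.findtext("avg_ncpus", default="1"))
        args = shlex.split(av.findtext("cmdline", default=""))
        # Pair each "--option" with the token that follows it (assumes every option takes a value).
        opts = {flag: value for flag, value in zip(args, args[1:]) if flag.startswith("--")}
        mem = opts.get("--memory_size_mb", "default")
        threads = opts.get("--nthreads")
        report.append(f"{name}: memory {mem} MB, avg_ncpus {ncpus:g}, nthreads {threads}")
        if threads is not None and float(threads) != ncpus:
            report.append(f"  warning: {name} avg_ncpus ({ncpus:g}) != --nthreads ({threads})")
    return report


SAMPLE = """<app_config>
 <app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64_theory</plan_class>
  <avg_ncpus>1</avg_ncpus>
  <cmdline>--memory_size_mb 1024 --nthreads 1</cmdline>
 </app_version>
 <app_version>
  <app_name>CMS</app_name>
  <plan_class>vbox64_mt_mcore_cms</plan_class>
  <avg_ncpus>3</avg_ncpus>
  <cmdline>--memory_size_mb 2048 --nthreads 4</cmdline>
 </app_version>
</app_config>"""

for line in check_app_config(SAMPLE):
    print(line)
```

To check a real file, read it with `open(path).read()` and pass the text to `check_app_config`; the path to the project directory depends on the BOINC installation.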
ID: 50835
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 865
Credit: 719,978,266
RAC: 177,301
Message 50838 - Posted: 18 Oct 2024, 19:50:48 UTC

After some time it's still good: there are fewer reads than writes overall, and about 45 MB of page file.
ID: 50838


©2025 CERN