High disk reads
Author | Message |
---|---|
Joined: 27 Sep 08 Posts: 865 Credit: 720,047,390 RAC: 173,391 |
Has anyone else seen this? Good thing my SSD is fast: some tasks are pushing on towards 100 TB of data. [Screenshot: Task Manager] |
Joined: 15 Jun 08 Posts: 2634 Credit: 271,838,578 RAC: 91,465 |
Looks like the affected VMs are busy swapping. What happens if you configure them to use 1 GB RAM (or even 2 GB)? Besides reducing disk activity, this measure should increase average CPU usage (currently around 33 %). |
Joined: 27 Sep 08 Posts: 865 Credit: 720,047,390 RAC: 173,391 |
From the earlier discussion, is it just this batch of WUs that could do with more memory? I don't see a lot of writes, only reads, so it doesn't seem like swapping. Anyhow, I turned it up to 1 GB; as you can see, I have plenty of memory for the ATLAS task, so it's no problem for me. |
Joined: 15 Jun 08 Posts: 2634 Credit: 271,838,578 RAC: 91,465 |
One of my computers has been running a Herwig7 task (native) for more than 7.5 days. That task has 3 main processes, Herwig, rivetvm.exe and runRivet.sh, which use a total of >900 MB physical RAM. A standard Theory VM is configured with 630 MB "physical RAM", which means the same task would be forced to swap out large amounts of data. Now just a guess: imagine the scientific processes use a large data array (or a large DB) and traverse it for each event they process. Then they constantly drop parts of the array from RAM to read the next parts from disk. This could explain the huge read activity as well as the low CPU usage, since the CPU has to wait for that data. Besides that, the VM needs RAM for the OS and the CVMFS cache. The CVMFS cache can be several GB, and each object that is not already in the page cache has to be read from disk. It might be a good idea to increase the default RAM setting for Theory VMs on the project server. |
Joined: 14 Jan 10 Posts: 1448 Credit: 9,719,650 RAC: 230 |
@Toby: Did you notice whether this high disk-read throughput was during the integration phase or the event-processing phase? I suppose the latter. For me it was during the event-processing phase, and computezrmle confirmed that he saw three processes that are only used during event processing. @computezrmle: At the moment I'm running a Herwig7.2.1 task on a slow laptop (6th gen), but after 4 days it is still in the integration phase (integrate 460 of 760). I gave this VM 1024 MB of RAM and 2 threads. After the integration, however, only 4000 events have to be processed. At the moment: Swap: 965116 free, 83456 used, 196224 avail Mem. Mem: 72576 free, 559484 used, 370996 cache. |
Joined: 15 Jun 08 Posts: 2634 Credit: 271,838,578 RAC: 91,465 |
Quote: "I gave this VM 1024MB of RAM and 2 threads." I'd leave the default of 1 thread for Theory VMs for the following reasons: (1) BOINC (LHC@home) configures the Theory vbox as single-core and gets confused if you change this; (2) on saturated computers, VirtualBox needs significantly more internal CPU cycles to run multi-core VMs than single-core ones; (3) there's only 1 process using nearly all CPU%/TIME (here: Herwig). `PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND` `617962 boinc3 39 19 1202136 858012 49436 R 97.54 0.651 90,00 Herwig` `21468 boinc3 39 19 477256 32928 3864 S 1.536 0.025 39:25.23 rivetvm.exe` `7028 boinc3 39 19 32464 13032 1972 S 0.230 0.010 15:53.74 runRivet.sh` |
Joined: 27 Sep 08 Posts: 865 Credit: 720,047,390 RAC: 173,391 |
Makes sense. I imagine there is a large write at the beginning to dump the DB to the page file, then many reads later. Since the other projects use more memory, it would make sense for the Theory tasks to get something similar, or for the Theory tasks to be reserved for lighter work to support the users with less powerful computers. A third option would be to split the work into 2 types that users can opt into. |
Joined: 27 Sep 08 Posts: 865 Credit: 720,047,390 RAC: 173,391 |
Looks like it's processing? [screenshot] |
Joined: 15 Jun 08 Posts: 2634 Credit: 271,838,578 RAC: 91,465 |
It would be interesting to see whether a higher RAM value for the VM leads to lower disk read activity combined with higher CPU usage (in Vbox manager). |
Joined: 27 Sep 08 Posts: 865 Credit: 720,047,390 RAC: 173,391 |
BOINC decided to run ATLAS now, so no new Theory tasks have run with 1 GB yet; they will come up at some point. [screenshot] A little sad with these deadlines. [screenshot] |
Joined: 2 May 07 Posts: 2266 Credit: 175,669,348 RAC: 3,819 |
Herwig only in mcplot atm: https://mcplots-dev.cern.ch/production.php?view=runs&rev=2794&display=succ A few are successful, but most are not! |
Joined: 14 Jan 10 Posts: 1448 Credit: 9,719,650 RAC: 230 |
Quote: "Herwig only in mcplot atm:" You can't count "unknown" as not successful. How I read the figures: 19423 attempts, 1684 success, 267 failure, 17472 unknown. |
Joined: 24 Oct 04 Posts: 1199 Credit: 67,133,848 RAC: 92,117 |
Herwig7 always used to run with no problems, and I probably have some of those saved in my records, since I tend to save a few of each version of each event generator; I have always watched them start so I could tell which one it was. But lately they want to run for 10 days, and according to the running log they are actually running the entire time. |
Joined: 18 Dec 15 Posts: 1868 Credit: 135,786,992 RAC: 94,102 |
Quote: "Herwig only in mcplot atm: You can't call 'unknown' as not successful. How I read the figures:" What exactly does "unknown" mean? The high number of "unknown" irritates me. Are they useful for the science, or are they not? |
Joined: 14 Jan 10 Posts: 1448 Credit: 9,719,650 RAC: 230 |
Quote: "what exactly does 'unknown' mean? The high number of 'unknown' irritates me. Are they useful for the science, or are they not?" Unknown means: jobs still in the pipeline (on the server, in the (re)send queue, being processed by a client, or elsewhere). Latest figures: 19423 attempts, 1757 success, 271 failure, 17395 unknown. |
Joined: 27 Sep 08 Posts: 865 Credit: 720,047,390 RAC: 173,391 |
1 GB looks good. [screenshot] |
Joined: 18 Dec 15 Posts: 1868 Credit: 135,786,992 RAC: 94,102 |
Quote: "1GB looks good." Is this app_config setting the correct way to increase the memory to 1 GB: `<cmdline>--memory_size_mb 1024</cmdline>` |
Joined: 14 Jan 10 Posts: 1448 Credit: 9,719,650 RAC: 230 |
Quote: "is this app_config setting the correct way to increase the memory to 1GB:" An example of a whole app_config.xml for Windows: `<app_config> <project_max_concurrent>3</project_max_concurrent> <app> <name>ATLAS</name> <max_concurrent>1</max_concurrent> </app> <app> <name>CMS</name> <max_concurrent>1</max_concurrent> </app> <app> <name>Theory</name> <max_concurrent>2</max_concurrent> </app> <app_version> <app_name>ATLAS</app_name> <plan_class>vbox64_mt_mcore_atlas</plan_class> <avg_ncpus>3</avg_ncpus> <cmdline>--memory_size_mb 4096 --nthreads 3</cmdline> </app_version> <app_version> <app_name>CMS</app_name> <plan_class>vbox64_mt_mcore_cms</plan_class> <avg_ncpus>3</avg_ncpus> <cmdline>--memory_size_mb 2048 --nthreads 4</cmdline> </app_version> <app_version> <app_name>Theory</app_name> <plan_class>vbox64_theory</plan_class> <avg_ncpus>1</avg_ncpus> <cmdline>--memory_size_mb 1024 --nthreads 1</cmdline> </app_version> </app_config>` |
Joined: 27 Sep 08 Posts: 865 Credit: 720,047,390 RAC: 173,391 |
After some time it's still good; there are fewer reads than writes overall, with about 45 MB of page file. |
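
The memory argument made earlier in the thread (three processes using more than 900 MB against a Theory VM configured with 630 MB) can be checked with a little arithmetic. The sketch below is only an illustration, not something posted in the thread: it assumes the RES column of the quoted process listing is in KiB (the usual default for `top`), treats the VM's "MB" setting as MiB, and ignores what the guest OS and the CVMFS cache need on top. The names `RES_KIB`, `total_mib` and `shortfall_mib` are made up for this example.

```python
# Resident memory (RES column) of the three processes from the listing
# quoted in the thread. Assumption: the values are in KiB.
RES_KIB = {
    "Herwig": 858012,
    "rivetvm.exe": 32928,
    "runRivet.sh": 13032,
}


def total_mib(res_kib):
    """Sum the resident sizes and convert KiB to MiB."""
    return sum(res_kib.values()) / 1024


def shortfall_mib(res_kib, vm_ram_mib):
    """Return how many MiB do not fit into the VM's RAM (0.0 if all fits).

    This deliberately ignores the guest OS and the CVMFS cache, so the
    real pressure on a VM is higher than the number returned here.
    """
    return max(0.0, total_mib(res_kib) - vm_ram_mib)


if __name__ == "__main__":
    print(f"processes need about {total_mib(RES_KIB):.0f} MiB")
    # 630 is the default mentioned in the thread; 1024 and 2048 are the
    # values suggested as alternatives.
    for vm_ram in (630, 1024, 2048):
        missing = shortfall_mib(RES_KIB, vm_ram)
        print(f"VM with {vm_ram} MiB: about {missing:.0f} MiB would not fit")
```

Under these assumptions the three processes alone exceed the 630 MB default by a wide margin, while they fit into 1 GB, which is consistent with the reports above that disk reads dropped after raising the setting.
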
©2025 CERN