21) Message boards : Theory Application : Theory simulation takes way too long (Message 50040)
Posted 12 days ago by computezrmle
Post:
If I see a long running task on any of my systems that has a small chance to finish, I let it run.
If I see a task like the one in question, I cancel it.

On your system it's your decision.
You already mentioned the relevant numbers.
Why do you ask anybody else?
22) Message boards : Theory Application : Theory simulation takes way too long (Message 50038)
Posted 12 days ago by computezrmle
Post:
Theory tasks usually start with #events = 100000.
In very rare cases they don't finish within the 10 day limit.

If
- the long runtime is not caused by a local issue and
- mcplots does not get enough valid results for a given set of input parameters,
the same task type is reissued with a lower #events.

This reduction may happen repeatedly until enough valid results are returned.
A statement like "Cern-IT have no interest" is simply wrong.
23) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 50026)
Posted 14 days ago by computezrmle
Post:
some people have not been running the VirtualBox extension pack

It is not a must to install the extension pack if you just want to run a headless VM.

I've also seen some errors activating the "multiattach" feature we use...

This is most likely solved with the upcoming new vboxwrapper version from github.
I'll inform Laurence as soon as it is approved and merged over there.
24) Message boards : ATLAS application : Thank you and goodbye! (Message 50012)
Posted 16 days ago by computezrmle
Post:
You don't see an empty ATLAS queue here, do you?
That means scientists submit work to LHC@home.
Lots of work as can be seen here:
https://lhcathome.cern.ch/lhcathome/atlas_job.php

The details of the campaigns are usually not explained here.
So I guess they don't submit work just for fun.
25) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 50008)
Posted 16 days ago by computezrmle
Post:
Couldn't resist and started a test on a laptop.

The first task started a 4-core VM and used CMS_2022_09_07_prod.vdi as harddisk (CMS Simulation v70.20 (vbox64)).
I requested a second task, got one, but a new CMS_2022_09_07.vdi was downloaded too (CMS Simulation v70.20 (vbox64_mt_mcore_cms)).
I checked the info of this 'fresh' HD with: vboxmanage.exe showhdinfo d:\boinc1\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07.vdi

The result:
VBoxManage.exe: error: Cannot register the hard disk 'D:\boinc1\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07.vdi' {8fb925ef-3497-4bfb-88e3-bbab2930787f} because a hard disk 'D:\Boinc1\projects\lhcathomedev.cern.ch_lhcathome-dev\CMS_2022_09_07.vdi' with UUID {8fb925ef-3497-4bfb-88e3-bbab2930787f} already exists

So this task will crash when the UUID is not changed.
For the moment I'll do that for me locally with: vboxmanage internalcommands sethduuid "D:/boinc1/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07.vdi"

UUID changed to: 40a2c82d-7fc7-4ed0-a7c6-163a7d3df252

The catch: existing files are checked against their md5 hash (usually after a BOINC restart).
Since the new UUID is written to the vdi file, the md5 hash then no longer matches the one sent by the project server.
So a new UUID only works as long as you don't shut down BOINC.
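
To illustrate why the md5 check fails after the UUID change, here is a minimal sketch. It does not touch a real vdi file: it creates a dummy file, overwrites 16 bytes in place (the size of a UUID) and compares the hashes. The file name and the byte offset are made up for the demo and say nothing about the real vdi header layout.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo_hash_changes_after_header_edit():
    """Overwrite 16 bytes of a dummy file and return (md5 before, md5 after)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in file, NOT a real VirtualBox disk image.
        path = os.path.join(tmp, "dummy.vdi")
        with open(path, "wb") as handle:
            handle.write(b"\x00" * 4096)
        before = md5_of_file(path)
        with open(path, "r+b") as handle:
            handle.seek(100)  # arbitrary offset, chosen only for this demo
            handle.write(bytes(range(1, 17)))  # 16 bytes, the size of a UUID
        after = md5_of_file(path)
    return before, after


if __name__ == "__main__":
    old_hash, new_hash = demo_hash_changes_after_header_edit()
    print("before:", old_hash)
    print("after: ", new_hash)
```

Even a change of a few bytes gives a completely different digest, so the client has no way to tell a locally patched vdi file from a corrupted one.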

You could also use "<dont_check_file_sizes>1</dont_check_file_sizes>" (sic!) in cc_config.xml to bypass the md5 check, but this affects all projects this client is connected to and could have other (unwanted) side effects.

The most reliable and permanent solution would be to get a correctly prepared vdi file from the project.
26) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 50007)
Posted 16 days ago by computezrmle
Post:
According to CERN Grafana, CMS has been distributing new singlecore tasks since late yesterday afternoon (UTC).
27) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 50006)
Posted 16 days ago by computezrmle
Post:
It has a different plan class.

Create an extra section in your app_config.xml and use
- <plan_class>vbox64</plan_class> for singlecore
- <plan_class>vbox64_mt_mcore_cms</plan_class> for multicore

See:
https://lhcathome.cern.ch/lhcathome/apps.php
https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration
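
As a rough sketch, the two sections could be generated like this. Assumptions on my side: the app name "CMS" (check the apps page or client_state.xml for the exact short name), 4 cores for the multicore plan class, and the use of avg_ncpus as the only setting. Adjust the values to your own needs.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name="CMS", multicore_cpus=4):
    """Build an app_config.xml text with one <app_version> per plan class.

    app_name and multicore_cpus are assumptions, not values confirmed
    by the project.
    """
    root = ET.Element("app_config")
    plan_classes = (("vbox64", 1), ("vbox64_mt_mcore_cms", multicore_cpus))
    for plan_class, ncpus in plan_classes:
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = app_name
        ET.SubElement(version, "plan_class").text = plan_class
        ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    ET.indent(root)  # pretty-print, needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config())
```

The printed text would go into app_config.xml in the project directory. Afterwards let the client reread its config files or restart it.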
28) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 50001)
Posted 17 days ago by computezrmle
Post:
We talk about Windows here.

For many years you haven't got the point:
CMS runs inside a Linux VM.
That Linux VM is the very same on a Windows host, on a (any) Linux host and even on Apple.


CMS need a correct functionally TCP/IP connection.
After 15 min. without connection, they are canceled.

Feel free to try to explain anything you want, but nobody forces you to explain things you obviously don't understand.
29) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 49996)
Posted 17 days ago by computezrmle
Post:
Magic Quantum Mechanic wrote:
((Lots of text)) ...
I just love typing in the dark at 2am)

Lots of text to comment on a post that wasn't for you.
Would have been better to type with the light switched on.
30) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 49993)
Posted 17 days ago by computezrmle
Post:
Right.
A BOINC client that is attached to -dev and -prod has to deal with up to 4 CMS vdi files:
vdi-dev-old
vdi-dev-new
vdi-prod-old
vdi-prod-new

VirtualBox holds all virtual disks from the same user in one media registry, which leads to conflicts if two disks have the same name and/or the same UUID.
Hint_1: VirtualBox adds the UUID to the vdi file.
Hint_2: Using VBoxManage to clone the vdi file creates a new UUID while copying it does not.

To avoid those conflicts the process should be:
Create a fresh vdi file for each app_version or use VBoxManage to clone the original one.

That way the new set of vdi files should look like:
vdi-dev-old -> CMS_2022_09_07.vdi (unchanged)
vdi-dev-new -> CMS_<release date>_mt_dev.vdi (for future mt releases)
vdi-prod-old -> CMS_2022_09_07_prod.vdi (unchanged)
vdi-prod-new -> CMS_<release date>_mt_prod.vdi (for future mt releases) (it's currently CMS_2022_09_07.vdi)

Hence, an updated vbox64_mt_mcore_cms app should be published.
ATLAS follows this scheme but it somehow got lost for CMS.
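
The clone step mentioned above could be scripted roughly like this. It is only a sketch: it assumes a VirtualBox version that knows "VBoxManage clonemedium" (older releases call it "clonehd"), and the file names in the usage example are placeholders, not the names the project would publish.

```python
import shutil
import subprocess


def build_clone_command(source_vdi, target_vdi, vboxmanage="VBoxManage"):
    """Return the argument list for cloning a disk image.

    Cloning (unlike a plain file copy) gives the target a new UUID.
    """
    return [vboxmanage, "clonemedium", "disk", source_vdi, target_vdi]


def clone_vdi(source_vdi, target_vdi):
    """Run the clone; raises if VBoxManage is missing or the clone fails."""
    if shutil.which("VBoxManage") is None:
        raise RuntimeError("VBoxManage not found on PATH")
    subprocess.run(build_clone_command(source_vdi, target_vdi), check=True)


if __name__ == "__main__":
    # Placeholder file names, only to show the resulting command line.
    print(" ".join(build_clone_command("original.vdi", "clone_for_mt.vdi")))
```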
31) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 49988)
Posted 17 days ago by computezrmle
Post:
This is (as of now) your most recently returned CMS task.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=410078311

The VM was a 1-core VM:
2024-04-21 16:17:27 (3178473): Setting CPU Count for VM. (1)


The task ran the envelope but didn't get a CMS job since the 1-core job queue is still dry.
Be aware that the envelope queue and the job queue are different.
The latter is much deeper in the process and has no direct connection to BOINC.

A good indicator is to compare runtime with CPU time.
Here: 33 min 40 sec vs. 2 min 9 sec
This means the VM tried a couple of times without success to get a job and finally gave up.
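
Worked out with the numbers above, as a small sketch. The threshold of 0.2 is my own arbitrary pick for illustration, not a project value, and the calculation assumes a 1-core VM (for a multicore VM the CPU time would have to be divided by the number of cores first).

```python
def cpu_efficiency(runtime_seconds, cpu_seconds):
    """Fraction of the wall-clock runtime that was spent as CPU time."""
    if runtime_seconds <= 0:
        raise ValueError("runtime must be positive")
    return cpu_seconds / runtime_seconds


def looks_like_empty_envelope(runtime_seconds, cpu_seconds, threshold=0.2):
    """Heuristic only; the threshold is an assumption, not a project value."""
    return cpu_efficiency(runtime_seconds, cpu_seconds) < threshold


if __name__ == "__main__":
    runtime = 33 * 60 + 40  # 33 min 40 sec = 2020 s
    cpu_time = 2 * 60 + 9   # 2 min 9 sec = 129 s
    print(f"efficiency: {cpu_efficiency(runtime, cpu_time):.1%}")
    print("probably no job:", looks_like_empty_envelope(runtime, cpu_time))
```

Here the VM used the CPU for only about 6% of its runtime, which fits an envelope that mostly waited for a job.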

Since the short runtimes confuse BOINC's work fetch algorithm you will now get (in connection with a large work buffer) far too many CMS envelopes.
Once the job queue starts again to send jobs this may lead to a situation where your computer can't return all envelopes before the deadline.
Hence, keep your work buffer as small as possible.
32) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 49981)
Posted 18 days ago by computezrmle
Post:
ATM there are only 4-core jobs in the backend queue.
The singlecore backend queue is empty.
33) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 49953)
Posted 21 days ago by computezrmle
Post:
If the #cores must be configured at batch creation time, then please make a decision.

Either keep only the singlecore app
or drop the singlecore app and send out a multicore app with a fixed #cores that is in sync with the backend.

Do not mix batches having different core settings as this would break BOINC's work fetch, runtime estimation, credit system ...
34) Questions and Answers : Unix/Linux : theory simulation gets error at 14-minute mark (Message 49933)
Posted 27 days ago by computezrmle
Post:
I would also like to note that there is no warning/notice that additional software needs to be installed to run LHC tasks

It is mentioned at the homepage as well as a couple of times at the FAQ page:

https://lhcathome.cern.ch/lhcathome/
"Please note that some of the applications on LHC@home requre Virtual Box to be installed."

https://lhcathome.web.cern.ch/faq


As for CVMFS there's a pinned thread in "Number crunching":
Recommended CVMFS Configuration for Native Apps - HowTo v2

I accept that new volunteers might think this project works like "attach and forget", but I don't agree with those who immediately complain but obviously don't check the requirements, the forum and their logfiles when they get nothing but errors.
35) Questions and Answers : Windows : LHC not downloading work on new Windows 11 PC (Message 49930)
Posted 27 days ago by computezrmle
Post:
That's the point.
BOINC does not bundle a VM.
It bundles a VirtualBox software package which can be used to run VMs.
Each (vbox) task from this project then runs vboxwrapper which creates, runs and finally removes a distinct VirtualBox VM for the task.

It usually does not matter where you get VirtualBox from, as long as it is a more or less recent version (currently 6.1 or 7.0).
All of them suffer from the same problems that are discussed in the other thread (and many posts before).
It's the Windows options that must be correctly used.

And that's also the reason why it makes no sense to discuss any self-created VMs in this context.
Neither WSL based ones nor VirtualBox based ones.
36) Questions and Answers : Windows : LHC not downloading work on new Windows 11 PC (Message 49927)
Posted 28 days ago by computezrmle
Post:
WSL

Nope.
WSL is not "BOINC supplied".
And I asked rob about his interpretation.
37) Questions and Answers : Windows : LHC not downloading work on new Windows 11 PC (Message 49925)
Posted 28 days ago by computezrmle
Post:
The AMD Ryzen 9 7900X has virtualization support, but this is turned OFF by default, and report "virtualization not supported". Check in the BIOS that virtualization is enabled, then turn it ON. After rebooting that error message should disappear, however other messages may then appear, which may take a fair bit of work to clear them. Most common is trying to run the "BOINC supplied" form of virtual machine on a computer that has (or has had) the MS version of Linux installed, the two just aren't compatible with each other despite having very similar names....

What is this?
Could you please explain what you understand as '"BOINC supplied" form of virtual machine'?
38) Questions and Answers : Windows : LHC not downloading work on new Windows 11 PC (Message 49922)
Posted 28 days ago by computezrmle
Post:
Found it:
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10848959

This is what the computer details report:
Virtualbox (7.0.14) installed, CPU does not have hardware virtualization support

Now, check the forum for posts explaining this, e.g. this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6098&postid=49307#49307
39) Questions and Answers : Windows : LHC not downloading work on new Windows 11 PC (Message 49921)
Posted 28 days ago by computezrmle
Post:
Hyper-V disabled, WSL installed.

This is mutually exclusive.
If you have WSL enabled, Hyper-V is not disabled and hence may block VirtualBox.

Please make your computers visible to check this.
40) Questions and Answers : Unix/Linux : theory simulation gets error at 14-minute mark (Message 49913)
Posted 29 days ago by computezrmle
Post:
I could also just steal the deb packages from the Ubuntu 23 apt sources

You can try this with a precompiled package, but there's a risk that the new package is built against libraries that are not up to date on your running system.
In most cases this affects BOINC Manager, which is built against wxWidgets.

You may follow these steps:
1. Download the new BOINC client package to a temporary directory but do NOT install it
2. cd to that directory
3. extract the files boinc, boinccmd and boincmgr
4. run "ldd boinc", "ldd boinccmd", "ldd boincmgr"

If any of the commands (mostly "ldd boincmgr") prints errors like "foobar.x.yz => not found" that library is missing.
Either install it (exactly the requested version) or do not use that BOINC version.
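
If you prefer to script the check, a minimal sketch could look like this. It simply runs ldd and collects the "not found" lines. The function names are mine, and the default file names assume the three binaries were extracted into the current directory.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on one binary and return its missing libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    # Assumes the extracted files sit in the current directory.
    for name in ("./boinc", "./boinccmd", "./boincmgr"):
        print(name, "->", check_binary(name) or "nothing missing")
```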

If "ldd ..." doesn't print an error you have a really good chance that it works.
Try the manager first.
1. make a backup of the old "boincmgr" file
2. replace the old file with the new one
3. Run the new BOINC manager
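
Steps 1 and 2 could be scripted as below. Treat it as a sketch: the ".bak" and ".new" suffixes are my own choice, the paths in the usage comment are placeholders, and on most systems you need root rights to write to the directory where boincmgr is installed.

```python
import os
import shutil


def replace_with_backup(installed_path, new_path, suffix=".bak"):
    """Back up the installed file, then put the new file in its place.

    Returns the path of the backup so it can be restored if needed.
    """
    backup_path = installed_path + suffix
    shutil.copy2(installed_path, backup_path)  # step 1: keep the old file
    staged_path = installed_path + ".new"
    shutil.copy2(new_path, staged_path)        # copy next to the target first
    os.replace(staged_path, installed_path)    # step 2: swap it in
    return backup_path


# Usage with placeholder paths (adjust to your installation):
# replace_with_backup("/usr/bin/boincmgr", "/tmp/boinc_new/boincmgr")
```

To roll back, copy the ".bak" file over the replaced one.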

If this succeeds it should be safe to also replace boinc/boinccmd and restart BOINC.




©2024 CERN