Message boards :
Theory Application :
New version v300.20
Message board moderation
Author | Message |
---|---|
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
This new version has an updated vboxwrapper and the images are cloned to ensure a unique ID. |
Send message Joined: 14 Jan 10 Posts: 1409 Credit: 9,325,730 RAC: 9,392 |
<multiattach_vdi_file>Theory_2024_04_29_dev.xml</multiattach_vdi_file> |
Send message Joined: 28 Sep 04 Posts: 719 Credit: 48,121,938 RAC: 32,196 |
All seem to fail with this error:

Command: VBoxManage -q showhdinfo "C:\ProgramData\BOINC/projects/lhcathome.cern.ch_lhcathome/Theory_2024_04_29_dev.xml"
Output: VBoxManage.exe: error: Could not find file for the medium 'C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome\Theory_2024_04_29_dev.xml' (VERR_FILE_NOT_FOUND)
VBoxManage.exe: error: Details: code VBOX_E_FILE_ERROR (0x80bb0004), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp
2024-04-29 20:56:12 (21480): Could not create VM
2024-04-29 20:56:12 (21480): ERROR: VM failed to start
2024-04-29 20:56:12 (21480): Powering off VM.
2024-04-29 20:56:12 (21480): Deregistering VM. (boinc_106b2e9625f4f065, slot#5)
2024-04-29 20:56:12 (21480): Removing network bandwidth throttle group from VM.
2024-04-29 20:56:12 (21480): Removing VM from VirtualBox. |
Send message Joined: 18 Dec 15 Posts: 1782 Credit: 116,755,102 RAC: 74,714 |
same problem here |
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
Sorry, I started from the wrong xml file. New version v300.30 will fix this. |
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,099,564 RAC: 30,947 |
All the new Theory tasks are failing immediately on openSUSE Leap 15.5, and presumably on all previous versions as well. This appears to be because the new tasks require a glibc version that is not yet available in the main repos (2.31 is the current version, while the tasks appear to require 2.34). It looks like this situation will prevail until Leap 15.6 is released in early June; that release should include glibc 2.38 -- at least, that is what I am reading in the 15.6 repos. Anyone running openSUSE 15.5 or earlier has two options: 1) stop fetching Theory tasks until you have upgraded your system; 2) replace the OS with either the slowroll version (http://download.opensuse.org/slowroll/) or with Tumbleweed. |
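A quick way to check which glibc a host actually has (an illustrative check, not project-official; `ldd` ships with glibc, so its banner reports the library version):

```shell
# Print the installed glibc version banner, e.g. "ldd (GNU libc) 2.31".
# The tasks reportedly need >= 2.34, so Leap 15.5's 2.31 would be too old.
ldd --version | head -n1
```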
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,099,564 RAC: 30,947 |
For the past 90 minutes, all Theory tasks have failed on my system after 3 minutes. https://lhcathome.cern.ch/lhcathome/result.php?resultid=410987839 |
Send message Joined: 14 Jan 10 Posts: 1409 Credit: 9,325,730 RAC: 9,392 |
> For the past 90 minutes, all Theory tasks have failed on my system after 3 minutes.

Could you retry after you have deleted the Theory vdi's from the VirtualBox Media Manager without removing them from disk? |
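For the command-line inclined, a hedged sketch of that cleanup, assuming VirtualBox's VBoxManage is on PATH; the vdi filename below is only a placeholder for whatever `VBoxManage list hdds` shows on your host, and `closemedium` without the `--delete` flag only unregisters the medium, leaving the file on disk:

```shell
# Hypothetical helper: unregister a Theory vdi from the Media Manager
# without deleting the file itself (note: no --delete flag is passed).
detach_theory_vdi() {
    if command -v VBoxManage >/dev/null 2>&1; then
        VBoxManage closemedium disk "$1"
    else
        echo "VBoxManage not found; skipped $1"
    fi
}

# Placeholder filename; list the real entries first with:
#   VBoxManage list hdds
detach_theory_vdi "Theory_example.vdi"
```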
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,099,564 RAC: 30,947 |
> For the past 90 minutes, all Theory tasks have failed on my system after 3 minutes.
> Could you retry after you have deleted the Theory-vdi's from VirtualBox Media manager without removing them from disk?

Did you mean to delete the Theory*.vdi's from the media manager? I had a task where the VM crashed, but it was still listed in the media manager. I deleted that entry, but left the Theory*.vdi entries alone. I just had a bunch of Theory tasks fail with the same error, and one of them left an orphan task file in the media manager. My primary interest in LHC is the CMS stuff anyway, so I'm probably not going to re-enable Theory tasks. |
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 250,933,425 RAC: 128,055 |
> ... My primary interest in LHC is the CMS stuff ...

For nearly a week CMS sent only 4-core jobs. Since then, each CMS task your computer ran did a basic setup, ran 2 short benchmarks and shut itself down without doing any scientific work. To run those 4-core jobs the VM must be configured to allocate at least 4 cores (e.g. via the web prefs). Your VMs report 1 core:

2024-05-13 05:24:06 (31720): Setting CPU Count for VM. (1) |
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,099,564 RAC: 30,947 |
> ... My primary interest in LHC is the CMS stuff ...

Could you then explain to me why, over the past week, my system has run and completed over 1000 CMS tasks, and received proper credit for same? Or is this observation about the previous week?

> To run those 4-core jobs the VM must be configured to allocate at least 4 cores (e.g. via the web prefs).

The last time I configured my LHC account for multi-core tasks was shortly after I joined. I had been receiving tasks from all 3 projects before I made that change; after it, all I got from LHC was multi-core ATLAS tasks.

Furthermore, the credit given for multi-core tasks is pathetic. Credit is awarded for total run-time, not CPU time. Well, let's look at that. Suppose we have a task that will run on one core for 4 hours. Close enough for government work, the CPU time will also be 4 hours. Now suppose that same task runs on 4 cores. It will require only 1 hour of run time and will consume the same 4 hours of CPU time, but it will be given only 1/4 the credit, even though it has consumed the same amount of computer resources as the first task. Where is the incentive to run 4-core tasks instead of 1-core tasks, when the credit is only 1/4 as much?

Please do not mistake me for a credit whore, because I am not; I simply think that my computer resources should be worth the same no matter how much real time it takes to complete the work. Getting the finished results back to the people who want them in the shortest possible time is not the primary concern in any of this; in fact, it is of no concern to me except that the work be returned before the task's allotted time expires. |
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 250,933,425 RAC: 128,055 |
I already explained what your VMs did: "... ran a basic setup, ran 2 short benchmarks and shut itself down ..." That's why the runtime/CPU time is so low. Real job processing takes 2-6 h (averages). Compare that to computers running 4-core VMs.

The short runtime pattern is typical for a VM not getting a job via WMAgent. The same pattern can be seen when the backend queue is empty. BOINC credits are given for valid envelopes no matter whether the VM processed a scientific job or not. BOINC simply does not understand the various return codes from the deeper script levels. Hence those are mostly hidden and a "success" is reported.

In addition:
- Multicore is new for CMS, hence some backend settings need to be tested/adjusted.
- Credit issues have been discussed multiple times for years. Find related posts and try to understand them. |
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,099,564 RAC: 30,947 |
> I already explained what your VMs did:

Please assume you are talking to someone with zero understanding of the inner workings of BOINC and VBox. That "ran a basic setup...." bit was so obscure to me as to be meaningless. Your latest seems to be suggesting that it is somehow my fault that the VM was not getting a job via WMAgent. I fail to see how that is even possible. I was set up to receive single-core tasks, that is what I got, and they ran as long as they ran. I should not (could not?) have been receiving 4-core tasks that somehow got run in only 1 thread.

OK, so I set my preferences to 4-CPU tasks; it took a lot of fiddling to get my client to realize it had nothing from LHC in the job queue (it kept telling me I didn't need anything from this site) -- basically I had to turn off both Einstein and Rosetta -- and even then I had to turn off ATLAS and Theory before I got any CMS tasks. Now I have CMS tasks running on 4 threads -- but that is 1/3 of the total capability of my machine, and I still have Einstein and Rosetta to bring back in. It looks like I will be pretty much stuck with 2, maybe 3, CMS tasks tops -- and that doesn't even take into consideration that I might wish to bring ATLAS and Theory back into the picture. I sure hope it doesn't take too long before the client gets it all sorted out -- ATM, the CMS tasks have been running for about an hour, and the client is telling me they still have almost 17 hours to go before completion.

> In addition:

I hardly have the time to go searching through years of threads in multiple forums to find relevant threads, even if I start guessing at which search phrases *might* possibly find relevant material. HOWEVER... this is hardly an issue I am going to pursue any further. The only important thing right now is to get my system doing the projects I want to run. |
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,099,564 RAC: 30,947 |
Hello-ooooo? Is anyone looking at this stuff? Preferably someone who can fix it? Theory tasks are still failing after only 3 minutes. This one is from a week ago: https://lhcathome.cern.ch/lhcathome/result.php?resultid=411034415 And this one is from a few minutes ago: https://lhcathome.cern.ch/lhcathome/result.php?resultid=411228620 |
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,209 RAC: 24,770 |
> VBoxManage: error: Could not find a bandwidth group named 'boinc_372f2cc2be2d23c7_net'
> average upload speed 6525877.03 KB/s
> average download speed 7079.84 KB/s

Can you take a look at your networking for this VM? |
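While the VM still exists, one way to check is to list the bandwidth groups VirtualBox knows for it (a sketch assuming VBoxManage is on PATH; the VM name below is taken from the error message and may differ on your host):

```shell
# List all bandwidth groups registered for the named VM; the throttle
# group 'boinc_372f2cc2be2d23c7_net' should appear here if it exists.
if command -v VBoxManage >/dev/null 2>&1; then
    VBoxManage bandwidthctl "boinc_372f2cc2be2d23c7" list 2>&1 || true
else
    echo "VBoxManage not found"
fi
```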
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,099,564 RAC: 30,947 |
> VBoxManage: error: Could not find a bandwidth group named 'boinc_372f2cc2be2d23c7_net'

And how would I go about doing that? That VM was removed from my system an hour ago. |
Send message Joined: 2 May 07 Posts: 2220 Credit: 173,696,209 RAC: 24,770 |
I have no idea what's going wrong with your network. You can limit network parameters in openSUSE, for example. |
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,099,564 RAC: 30,947 |
> I have no idea what's going wrong with your network.

There's absolutely nothing wrong with my network. The error is about one or more files that are not present, not about networking errors. |
Send message Joined: 15 Jun 08 Posts: 2519 Credit: 250,933,425 RAC: 128,055 |
Post the output of:

mount | grep shm ; ls -hal /dev/shm/

Then prepare for a reboot. Then reboot. |
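For anyone following along, the requested diagnostics as a copy-pasteable block (both are standard Linux commands; they show how the shared-memory tmpfs is mounted and what is currently in it):

```shell
# Show the mount line for the shared-memory tmpfs (size and options)...
mount | grep shm

# ...and list its contents, where stale BOINC/VirtualBox shared-memory
# segments would show up.
ls -hal /dev/shm/
```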
Send message Joined: 4 Sep 22 Posts: 90 Credit: 15,099,564 RAC: 30,947 |
> Post the output of ...

Why? |
©2024 CERN