Message boards : Theory Application : New version v300.20


1 · 2 · Next

AuthorMessage
Profile Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 50078 - Posted: 29 Apr 2024, 14:32:41 UTC

This new version has an updated vboxwrapper and the images are cloned to ensure a unique ID.
ID: 50078
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1330
Credit: 8,761,505
RAC: 5,803
Message 50081 - Posted: 29 Apr 2024, 15:18:45 UTC - in response to Message 50078.  

<multiattach_vdi_file>Theory_2024_04_29_dev.xml</multiattach_vdi_file>
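For reference, the element above points at an .xml descriptor where a disk image is expected; assuming the usual naming scheme, the intended value would presumably look like this (filename hypothetical):

```xml
<multiattach_vdi_file>Theory_2024_04_29_dev.vdi</multiattach_vdi_file>
```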
ID: 50081
Harri Liljeroos

Joined: 28 Sep 04
Posts: 684
Credit: 44,404,872
RAC: 16,354
Message 50082 - Posted: 29 Apr 2024, 17:59:56 UTC

All seem to fail with error:
Command:
VBoxManage -q showhdinfo "C:\ProgramData\BOINC/projects/lhcathome.cern.ch_lhcathome/Theory_2024_04_29_dev.xml" 
Output:
VBoxManage.exe: error: Could not find file for the medium 'C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome\Theory_2024_04_29_dev.xml' (VERR_FILE_NOT_FOUND)
VBoxManage.exe: error: Details: code VBOX_E_FILE_ERROR (0x80bb0004), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp

2024-04-29 20:56:12 (21480): Could not create VM
2024-04-29 20:56:12 (21480): ERROR: VM failed to start
2024-04-29 20:56:12 (21480): Powering off VM.
2024-04-29 20:56:12 (21480): Deregistering VM. (boinc_106b2e9625f4f065, slot#5)
2024-04-29 20:56:12 (21480): Removing network bandwidth throttle group from VM.
2024-04-29 20:56:12 (21480): Removing VM from VirtualBox.

ID: 50082
Erich56

Joined: 18 Dec 15
Posts: 1721
Credit: 107,699,155
RAC: 102,219
Message 50083 - Posted: 29 Apr 2024, 18:23:07 UTC

same problem here
ID: 50083
Profile Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 50085 - Posted: 29 Apr 2024, 19:44:12 UTC - in response to Message 50083.  

Sorry, I started from the wrong xml file. New version v300.30 will fix this.
ID: 50085
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,727,640
RAC: 21,066
Message 50087 - Posted: 29 Apr 2024, 22:11:51 UTC

All the new Theory tasks are failing immediately on openSUSE Leap 15.5, and presumably on all earlier versions as well.
This appears to be because the new tasks require a glibc version that is not yet available in the main repos (2.31 is the current version, while the tasks appear to require 2.34).
It looks like this situation will persist until Leap 15.6 is released in early June; that release should include glibc 2.38 -- at least, that is what I am reading in the 15.6 repos.

Anyone running openSUSE 15.5 or earlier has two options:
1) stop fetching Theory tasks until you have upgraded your system;
2) replace the OS with either the slowroll version (http://download.opensuse.org/slowroll/) or with Tumbleweed.
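Before deciding, you can check which glibc your system actually provides (standard commands, should work on any glibc-based distro):

```shell
# Print the glibc version the system C library reports
getconf GNU_LIBC_VERSION

# ldd ships with glibc and prints its own version too
ldd --version | head -n1
```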
ID: 50087
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,727,640
RAC: 21,066
Message 50169 - Posted: 11 May 2024, 23:55:48 UTC

For the past 90 minutes, all Theory tasks have failed on my system after 3 minutes.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=410987839
ID: 50169
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1330
Credit: 8,761,505
RAC: 5,803
Message 50170 - Posted: 12 May 2024, 7:47:38 UTC - in response to Message 50169.  

For the past 90 minutes, all Theory tasks have failed on my system after 3 minutes.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=410987839
Could you retry after you have deleted the Theory-vdi's from VirtualBox Media manager without removing them from disk?
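If you prefer the command line over the Media Manager GUI, the same cleanup can be done with VBoxManage; the UUID placeholder below is hypothetical, and omitting --delete keeps the backing file on disk:

```shell
# List disk images currently registered with VirtualBox (skipped if not installed)
if command -v VBoxManage >/dev/null 2>&1; then
    vbox_present=1
    VBoxManage list hdds
    # Unregister a medium by UUID without deleting the backing file
    # (adding --delete would also remove it from disk):
    # VBoxManage closemedium disk <UUID>
else
    vbox_present=0
    echo "VBoxManage not found in PATH"
fi
```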
ID: 50170
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,727,640
RAC: 21,066
Message 50173 - Posted: 13 May 2024, 9:32:32 UTC - in response to Message 50170.  

For the past 90 minutes, all Theory tasks have failed on my system after 3 minutes.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=410987839
Could you retry after you have deleted the Theory-vdi's from VirtualBox Media manager without removing them from disk?

Did you mean to delete the Theory*.vdi's from the media manager? I had a task where the VM crashed, but it was still listed in the media manager. I deleted that, but left the Theory*.vdi entries alone.
I just had a bunch of Theory tasks fail with the same error, and one of them left an orphan task file in the media manager.
My primary interest in LHC is the CMS stuff anyway, so I'm probably not going to re-enable Theory tasks.
ID: 50173
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2450
Credit: 232,561,118
RAC: 130,538
Message 50174 - Posted: 13 May 2024, 12:19:20 UTC - in response to Message 50173.  

... My primary interest in LHC is the CMS stuff ...

For nearly a week CMS sent only 4-core jobs.
Since then each CMS task your computer ran did a basic setup, ran 2 short benchmarks and shut itself down without doing any scientific work.

To run those 4-core jobs the VM must be configured to allocate at least 4 cores (e.g. via the web prefs).
Your VMs report 1 core:
2024-05-13 05:24:06 (31720): Setting CPU Count for VM. (1)
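You can confirm what the wrapper actually allocated on your own machine by grepping the slot logs for that same line (the path assumes a default Linux BOINC data directory; adjust for your install):

```shell
# Grep the wrapper output in active BOINC slot directories for the CPU count line.
# NOTE: /var/lib/boinc is an assumption (the default on many Linux installs).
cpu_lines=$(grep -rh "Setting CPU Count" /var/lib/boinc/slots/ 2>/dev/null || true)
if [ -n "$cpu_lines" ]; then
    printf '%s\n' "$cpu_lines"
else
    echo "no vbox slot logs found under /var/lib/boinc/slots/"
fi
```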
ID: 50174
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,727,640
RAC: 21,066
Message 50180 - Posted: 14 May 2024, 0:58:45 UTC - in response to Message 50174.  

... My primary interest in LHC is the CMS stuff ...

For nearly a week CMS sent only 4-core jobs.
Since then each CMS task your computer ran did a basic setup, ran 2 short benchmarks and shut itself down without doing any scientific work.

Could you then explain to me why, over the past week, my system has run and completed over 1000 CMS tasks, and received proper credit for same? Or is this observation about the previous week?

To run those 4-core jobs the VM must be configured to allocate at least 4 cores (e.g. via the web prefs).
Your VMs report 1 core:
2024-05-13 05:24:06 (31720): Setting CPU Count for VM. (1)

The last time I configured my LHC account for multi-core tasks was shortly after I joined. I had been receiving tasks from all 3 projects before I made that change; after it, all I got from LHC was multi-core ATLAS tasks.
Furthermore, the credit given for multi-core tasks is pathetic. Credit is awarded for total run time, not CPU time.
Well, let's look at that. Suppose we have a task that will run on one core for 4 hours. Close enough for government work, the CPU time will also be 4 hours. Now suppose that same task runs on 4 cores. It will require only 1 hour of run time and will consume the same 4 hours of CPU time, yet it will be given only 1/4 the credit, even though it has consumed the same amount of computer resources as the first task. Where is the incentive to run 4-core tasks instead of 1-core tasks, when the credit is only 1/4 as much?
Please do not mistake me for a credit whore, because I am not; I simply think that my computer resources should be worth the same no matter how much real time it takes to complete the work. Getting the finished results back to the people who want them in the shortest possible time is not the primary concern in any of this; in fact, it is of no concern to me except that the work be returned before the task's allotted time expires.
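The arithmetic above can be written out explicitly (the hours are the hypothetical figures from the example, and credit is assumed proportional to elapsed runtime):

```shell
# One task, two ways of running it
single_runtime_h=4; single_cores=1   # 4 h on 1 core
multi_runtime_h=1;  multi_cores=4    # 1 h on 4 cores

# CPU time consumed is identical in both cases
echo "single-core CPU-hours: $((single_runtime_h * single_cores))"   # 4
echo "4-core CPU-hours:      $((multi_runtime_h * multi_cores))"     # 4

# Runtime-based credit scales with elapsed time only,
# so the 4-core run earns a quarter of the credit
echo "credit ratio (4-core / single-core): $multi_runtime_h/$single_runtime_h"
```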
ID: 50180
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2450
Credit: 232,561,118
RAC: 130,538
Message 50182 - Posted: 14 May 2024, 7:34:52 UTC - in response to Message 50180.  

I already explained what your VMs did:
"... ran a basic setup, ran 2 short benchmarks and shut itself down ..."

That's why the runtime/CPU time is so low.
Real job processing takes 2-6 h on average.
Compare that to computers running 4-core VMs.

The short runtime pattern is typical for a VM not getting a job via WMAgent.
The same pattern can be seen when the backend queue is empty.

BOINC credits are given for valid envelopes no matter whether the VM processed a scientific job or not.
BOINC simply does not understand the various return codes from the deeper script levels.
Hence those are mostly hidden and a "success" is reported.


In addition:
- Multicore is new for CMS, hence some backend settings need to be tested/adjusted.
- Credit issues have been discussed multiple times over the years. Find the related posts and try to understand them.
ID: 50182
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,727,640
RAC: 21,066
Message 50183 - Posted: 14 May 2024, 12:21:49 UTC - in response to Message 50182.  

I already explained what your VMs did:
"... ran a basic setup, ran 2 short benchmarks and shut itself down ..."

That's why the runtime/CPU time is so low.
Real job processing takes 2-6 h (averages).
Compare that to computers running 4-core VMs.

The short runtime pattern is typical for a VM not getting a job via WMAgent.
The same pattern can be seen when the backend queue is empty.

BOINC credits are given for valid envelopes no matter whether the VM processed a scientific job or not.
BOINC simply does not understand the various return codes from the deeper script levels.
Hence those are mostly hidden and a "success" is reported.

Please assume you are talking to someone with zero understanding of the inner workings of BOINC and VBox. That "ran a basic setup..." bit was so obscure to me as to be meaningless.
Your latest post seems to suggest that it is somehow my fault that the VM was not getting a job via WMAgent. I fail to see how that is even possible. I was set up to receive single-core tasks; that is what I got, and they ran as long as they ran. I should not (could not?) have been receiving 4-core tasks that somehow got run in only 1 thread.

OK, so I set my preferences to 4-CPU tasks; it took a lot of fiddling to get my client to realize it had nothing from LHC in the job queue (it kept telling me I didn't need anything from this site) -- basically I had to turn off both Einstein and Rosetta -- and even then I had to turn off Atlas and Theory before I got any CMS tasks. Now I have CMS tasks running on 4 threads -- but that is 1/3 of the total capability of my machine, and I still have Einstein and Rosetta to bring back in. It looks like I will be pretty much stuck with 2, maybe 3, CMS tasks tops -- and that doesn't even take into consideration that I might wish to bring Atlas and Theory back into the picture.
I sure hope it doesn't take too long before the client gets it all sorted out -- ATM, the CMS tasks have been running for about an hour, and the client is telling me they still have almost 17 hours to go before completion.

In addition:
- Multicore is new for CMS, hence some backend settings need to be tested/adjusted.
- Credit issues have been dicussed multiple times for years. Find related posts and try to understand them.

I hardly have the time to go searching through years of threads in multiple forums to find the relevant ones, even if I start guessing at which search phrases *might* turn up relevant material.
HOWEVER... this is hardly an issue I am going to pursue any further. The only important thing right now is to get my system doing the projects I want to run.
ID: 50183
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,727,640
RAC: 21,066
Message 50205 - Posted: 20 May 2024, 1:27:26 UTC

Hello-ooooo? Is anyone looking at this stuff? Preferably someone who can fix it?
Theory tasks are still failing after only 3 minutes. This one is from a week ago:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=411034415

And this one is from a few minutes ago:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=411228620
ID: 50205
maeax

Joined: 2 May 07
Posts: 2158
Credit: 162,586,938
RAC: 122,852
Message 50206 - Posted: 20 May 2024, 2:20:06 UTC - in response to Message 50205.  

VBoxManage: error: Could not find a bandwidth group named 'boinc_372f2cc2be2d23c7_net'
average upload speed 6525877.03 KB/sec
average download speed 7079.84 KB/sec
Can you take a look at your networking for this VM?
ID: 50206
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,727,640
RAC: 21,066
Message 50207 - Posted: 20 May 2024, 2:39:07 UTC - in response to Message 50206.  

VBoxManage: error: Could not find a bandwidth group named 'boinc_372f2cc2be2d23c7_net'
average upload speed 6525877.03 KB/sec
average download speed 7079.84 KB/sec
Can you take a look at your networking for this VM?

And how would I go about doing that? That VM was removed from my system an hour ago.
ID: 50207
maeax

Joined: 2 May 07
Posts: 2158
Credit: 162,586,938
RAC: 122,852
Message 50208 - Posted: 20 May 2024, 2:46:38 UTC - in response to Message 50207.  

I have no idea what is going wrong with your network.
You could limit network parameters in openSUSE, for example.
ID: 50208
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,727,640
RAC: 21,066
Message 50209 - Posted: 20 May 2024, 3:05:59 UTC - in response to Message 50208.  

I have no idea what is going wrong with your network.
You could limit network parameters in openSUSE, for example.

There's absolutely nothing wrong with my network.
The error is about one or more files that are not present, not about networking errors.
ID: 50209
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2450
Credit: 232,561,118
RAC: 130,538
Message 50210 - Posted: 20 May 2024, 5:42:29 UTC - in response to Message 50209.  

Post the output of
mount |grep shm ; ls -hal /dev/shm/

Then prepare for a reboot.
Then reboot.
ID: 50210
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,727,640
RAC: 21,066
Message 50211 - Posted: 20 May 2024, 6:20:55 UTC - in response to Message 50210.  

Post the output of
mount |grep shm ; ls -hal /dev/shm/

Then prepare for a reboot.
Then reboot.

Why?
ID: 50211



©2024 CERN