CMS Tasks Failing
Author | Message |
---|---|
Send message Joined: 15 Feb 06 Posts: 67 Credit: 460,896 RAC: 0 |
I tried switching off CMS tasks in my Preferences... It doesn't work. In desperation I've stopped all work from the project. Thankfully, LHC *does* respect that (there are a number that don't). However, if it's any help, I can confirm that CMS is broken even with a clean install on a brand new computer. sQuonk Plague of Mice Intel Core i3-9100 CPU@3.60 GHz, but it's doing its bit just the same. |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
VT-x needs to be enabled in the BIOS of an Intel PC, and Hyper-V in Windows needs to be DISABLED. If you still get errors after a reboot, please report them. |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
"vt-x need to be enabled in the BIOS of a Intel-PC," Is there a good recipe for disabling Hyper-V? I tried several methods found on the Web, but for https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10653693 I still get "Virtualization: Virtualbox (6.1.12) installed, CPU does not have hardware virtualization support". I thought it might have been because I had Windows Subsystem for Linux installed, but after I removed that, Hyper-V still comes back every time I boot. |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
I have only one Intel PC (an HP), and on it Hyper-V is not enabled in Windows Features; all my other PCs are AMD (SVM for virtualization in the BIOS). I have no idea why Hyper-V is enabled again after a reboot, sorry. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
"Is there a good recipe for disabling Hyper-V?" A recent comment posted by Microsoft: https://docs.microsoft.com/en-us/troubleshoot/windows-client/application-management/virtualization-apps-not-work-with-hyper-v |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
"Is there a good recipe for disabling Hyper-V?" Thanks. I've done most of those I think, but I'll go through it step-by-step. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
After a long time I got a CMS task, which ended in failure. Condor ended in 10656 s. Is that right? Tullio |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
"After a long time I got a CMS task, which ended in failure. Condor ended in 10656 s. Is that right?" CPU time seems credible for running one job, but unfortunately there's not enough information in your log file to say why it then failed. I'm seeing suspicions of network problems overall, but nothing concrete to put my finger on just yet. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I am running Atlas, Theory and SixTrack on this same CPU, plus QuChemPedIA@home, all using VirtualBox save SixTrack, and all run well. On QuChem, which uses VirtualBox because it is a Linux project, I am faster than most Linux CPUs, even those with 128 processors. I am using a 6-processor Intel i5 9400F CPU. I am ranked 51 in RAC. Tullio |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Another CMS task failed with exactly the same message. I am now running 4 Theory tasks. I had to remove the McAfee antivirus program to run Atlas tasks and am now using Windows Defender. Tullio |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Condor ended after 10637 seconds. Atlas and Theory tasks all complete. QuChemPedIA@home, using VirtualBox, runs perfectly. I am number 50 in the RAC ranking, although my Intel i5 CPU is far inferior to the Intel i7 and AMD Ryzen Threadripper CPUs running Linux. They take longer, even though they don't use VirtualBox. Tullio. |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
Tullio, can you ping vocms0267.cern.ch from a shell? This Condor server uses port 9618 while a CMS task is running. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Pinged it from a Linux virtual machine on a Windows 10 host. 27 packets transmitted, 0 packet loss. Thanks anyway. Tullio. Wi-Fi at 5 GHz. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
At the end of a subtask calculation, CMS uploads a 120 MB result file directly from inside the VM to vc-cms-output.s3.cern.ch. Besides that, CMS reports the status to: vc-cms-output.s3.cern.ch (port 443), WMAgent (port 4080) and HTCondor (port 9618). From the logs and the total runtime it can be seen that the errors always happen at that point, and since the VM doesn't get a 2nd subtask, the whole task fails after a couple of attempts. Of course, this is nasty behaviour, but ATM we have to deal with it. The important thing is to find out whether the result upload, the reporting or the request for fresh work fails. Unfortunately the logfile doesn't give us any details. Wi-Fi might be a factor. It's nice to know that your Wi-Fi is running at 5 GHz, but this doesn't tell us anything about the connection stability or net data rates. A cable connection should be used whenever possible. The upload of the 120 MB result file should be visible in the network monitoring, either on the host or at the internet router. If this upload fails, corresponding error messages appear on the VM consoles; you may look at ALT-F4, ALT-F5 ... and post them here. Another factor could be a malware protection suite that is configured to firewall some of the communication packets but lets through the packets used for the VM's basic network tests. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I am completing Atlas tasks on 2 CPUs and QuChem tasks, where I am in the top 50 users against many Linux hosts with up to 128 processors. My tasks using VirtualBox are faster than those of most Linux hosts, with rare exceptions, and those hosts don't use VirtualBox. Tullio |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
"...and QuChem tasks where I am in the top 50 users against many Linux hosts with up to 128 processors." That's very nice. Nonetheless, I don't know the network requirements of QuChem. Hence I can't compare them to CMS. Even ATLAS has other requirements than CMS, especially regarding the server-side job distribution systems. ATLAS contacts Panda while CMS contacts HTCondor and WMAgent. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
OK. But I also complete Theory tasks, which I have been running since it was called Test4Theory@home, on invitation by Ben Segal, who sent me a handwritten letter and a polo shirt. Happy New Year, Ben! Tullio |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
+1. Found this thread about HTCondor from 7 years ago: https://lists.cs.wisc.edu/archive/htcondor-users/2013-January/msg00139.shtml |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
One of my VMs got its 1st CMS subtask this morning at 5:34:40 UTC and successfully finished it at 8:50:01 UTC. This is a runtime of >3:15. A few seconds after the result upload the same VM got its 2nd subtask. Hence, I doubt the VMs are affected by a 7-year-old issue that might have been a disk space error on a Windows machine rather than a bug inside a VM running Linux. This should not initiate an OS war; it's just that, independent of the host OS, CMS always runs on the same Linux VM image. In addition, inside this VM the scientific apps are encapsulated a second time in a Singularity container. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Frankly, I don't know much about containers. I used to run Test4Theory@home tasks on a Linux host. Now I run QuChem Linux tasks on a Windows 10 host using a wrapper. They all run well and that satisfies me. Tullio |
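
A ping only shows that a host answers ICMP; it doesn't show that the TCP ports mentioned in the thread are reachable. The port check could be sketched in Python roughly as follows. This is a minimal sketch, not a project tool: `check_tcp`, `report` and `ENDPOINTS` are names made up for illustration, and the hosts and ports are simply the ones quoted in the posts above (vocms0267.cern.ch on 9618 for HTCondor, vc-cms-output.s3.cern.ch on 443 for the result upload). WMAgent on port 4080 is left out because the thread doesn't name its host. Run on the BOINC host, it only approximates what the VM sees, since CMS opens these connections from inside the VM.

```python
import socket

# Hosts and ports as quoted in the thread; the names used here are illustrative only.
ENDPOINTS = [
    ("vocms0267.cern.ch", 9618),        # HTCondor server
    ("vc-cms-output.s3.cern.ch", 443),  # result upload and status reporting
]


def check_tcp(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def report(endpoints):
    """Print one line per endpoint and return the results as a dict."""
    results = {}
    for host, port in endpoints:
        ok = check_tcp(host, port)
        results[(host, port)] = ok
        print(f"{host}:{port} -> {'reachable' if ok else 'NOT reachable'}")
    return results
```

Calling `report(ENDPOINTS)` prints one line per endpoint. A successful connect only shows that the port is open from wherever the script runs; it says nothing about whether the 120 MB upload itself completes, and a firewall or malware protection suite may still treat the VM's traffic differently.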
©2024 CERN