What is this nonsense?
Author | Message |
---|---|
Send message Joined: 4 Sep 22 Posts: 91 Credit: 15,771,109 RAC: 15,908 |
I have now had 52 CMS tasks in a row fail, and all the logs I've checked show these messages:<br>`2024-07-25 15:59:32 (21088): Guest Log: [INFO] Requesting an idtoken from LHC@home`<br>`2024-07-25 15:59:33 (21088): Guest Log: [INFO] CMS application starting. Check log files.`<br>`2024-07-25 16:11:48 (21088): Guest Log: [ERROR] VM expects at least 4 CPUs but reports only 2.`<br>`2024-07-25 16:11:48 (21088): Guest Log: [DEBUG] Volunteer: hadron (806228)`<br>`2024-07-25 16:11:48 (21088): Guest Log: [INFO] Shutting Down.`<br>`2024-07-25 16:12:18 (21088): VM Completion File Detected.`<br>`2024-07-25 16:12:18 (21088): VM Completion Message: VM expects at least 4 CPUs but reports only 2.`<br>So what? Is CMS now demanding that I must run tasks on 4 CPUs? This is the `<app_version>` section for CMS:<br>`<app_version> <app_name>CMS</app_name> <avg_ncpus>2</avg_ncpus> <plan_class>vbox64_mt_mcore_cms</plan_class> <cmdline>--nthreads 2</cmdline> </app_version>`<br>It's been like this since CMS became capable of running on multiple threads, without problem until now. So I have changed the settings to run the tasks on 4 threads, and so far, things are looking OK. |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
2024-07-25 16:12:18 (21088): VM Completion Message: VM expects at least 4 CPUs but reports only 2. |
Send message Joined: 4 Sep 22 Posts: 91 Credit: 15,771,109 RAC: 15,908 |
"2024-07-25 16:12:18 (21088): VM Completion Message: VM expects at least 4 CPUs but reports only 2."<br>Yes, maeax. I can read, but you haven't even tried to answer my question. Once again, is this the reason why all those CMS tasks failed after only 14 to 16 minutes? |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
Ok, but what if CERN IT had changed it in the .xml? |
Send message Joined: 4 Sep 22 Posts: 91 Credit: 15,771,109 RAC: 15,908 |
Ok, you're just speculating. Anyway, that file is overridden by whatever is in the app_config.xml file. I've set my config back to 2 threads per task to see if the error returns -- and yes, it has. 5 tasks failed within 14 to 16 minutes. Back to 4 CPUs, even if they do take 12 to 13 hours to run. My next question is one I know you will not/cannot have an answer for, maeax -- just what gives anyone at CERN the right to dictate how I am allowed to allocate the resources of my computer? |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,737,455 RAC: 1,317 |
Ok, we don't dictate how you allocate your resources. We do specify what resources are required to properly run our simulations. |
Send message Joined: 4 Sep 22 Posts: 91 Credit: 15,771,109 RAC: 15,908 |
"My next question is one I know you will not/can not have an answer for, maeax -- just what gives anyone at CERN the right to dictate how I am allowed to allocate the resources of my computer?"<br>OK, so are you saying that yes, CMS tasks will only run if they are allocated 4 threads? If that is true, then you most certainly are dictating how I allocate my resources. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,880,596 RAC: 39,051 |
@Ivan +1 @hadron Did you really believe that your computer delivered valid results within 30 min/2.5 min while other computers need many hours? How naive! In reality your computer got credits for empty envelopes without any scientific payload (for many weeks!). This has now been stopped. Like all volunteers you have to respect the requirements and set up your VMs accordingly. Your choice is to either do so or to leave. |
Send message Joined: 4 Sep 22 Posts: 91 Credit: 15,771,109 RAC: 15,908 |
@Ivan Please tell me then, just why can I not set these tasks to run on only 2 threads? Every task running on only 2 has failed. That has never happened before. Now, the tasks will not run unless I give them 4 threads. Why? This does not happen with Atlas; those run just fine with however many threads I allow them to have, from 1 to 8. Why can CMS tasks not be configured the same way? And no, I was not naive when all those tasks were completing with no real work being done. I just assumed that was because they were testing things to make sure they had got it right. Back then, I could set CMS tasks to run on any number of threads between 1 and 4, and they completed just fine. Now they will not -- they must be given 4 threads, or they will fail. If all you have to offer is "like it or leave", then I think your help desk "expert" credentials are in serious need of review. |
Send message Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
"This has now been stopped."<br>Since then it seems I only get this when requesting CMS tasks:<br>`LHC@home 26 Jul 19:21:48 Requesting new tasks for CPU`<br>`LHC@home 26 Jul 19:21:49 Scheduler request completed: got 0 new tasks`<br>`LHC@home 26 Jul 19:21:49 No tasks sent`<br>`LHC@home 26 Jul 19:21:49 No tasks are available for CMS Simulation`<br>Even resetting and detaching the project did not solve this. No app_config.xml is in use, and the project settings are 1 job, with the number of CPUs set to either no limit or 4. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,880,596 RAC: 39,051 |
The CMS patch activated last night affects the process inside the VM. It has nothing to do with BOINC (especially the work fetch). Hence, BOINC related issues are not caused by the CMS patch. |
Send message Joined: 4 Sep 22 Posts: 91 Credit: 15,771,109 RAC: 15,908 |
"The CMS patch activated last night affects the process inside the VM."<br>That is less than useless. This is not a BOINC-related issue; this is a CMS issue. Once again: CMS tasks now will not run on fewer than 4 threads. Why? Maybe Ivan can shed some light on this? Please? |
Send message Joined: 9 Feb 09 Posts: 25 Credit: 2,384,730 RAC: 3,576 |
Hello, the points don't matter to me. I offer my computer to help with scientific tasks when I'm not using it, and I don't expect anything in return. Computer hardware unfortunately becomes obsolete in a matter of years, and a multitude of computers are simply wasted on video games. You as an individual can decide how you want your machine to work, but what you cannot do is demand that others who want to use your machine adapt to your preferences if those do not match what they need for their results and program. There will be cases where relatively old hardware cannot meet the minimum requirements of the work sent to it, and other, more modern hardware that could possibly do it better and faster, with better results. In your case and that of some others, you received tasks that ran without doing any calculation work; in other cases, users fetch more than 5,000 tasks that all end up aborted, creating enormous work for the servers, day after day, 24/7. I think some limits need to be set in both cases. What I don't understand is the difference in credit for the same CPU time in CMS: if you run it on Linux you get 30,000 points, and if you run it on Windows you get 3,000 points... I don't care, but others do... |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,834,089 RAC: 16,184 |
Yeah, my Windows 10 CMS tasks run for about the same time and my credit average is 1,300. Never got ones like this, of course. |
Send message Joined: 3 Nov 12 Posts: 59 Credit: 141,502,698 RAC: 81,802 |
Give it a few days to align. This will stabilize at 1,000 to 3,000 credits per WU. It's a BOINC thing... There is no difference between operating systems. |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
For ATLAS, the project sets a limit of 24 tasks when the venue is set to no limit (not 1-8). The default venue for the .xml from the project is the problem here. |
Send message Joined: 4 Sep 22 Posts: 91 Credit: 15,771,109 RAC: 15,908 |
Hello, thank you for your input. Points don't matter to me either. I only use them to determine if things are running smoothly. If the recent average for a project is fairly constant over time, then things are probably OK, and don't need much attention from me; if it starts to drop unexpectedly, then I am interested in knowing why. As for my computer, it is all less than 2 years old. The CPU and system board are Ryzen 9/AM4, so this is certainly not an issue.<br>"You as an individual can decide how you want your machine to work, but what you cannot do is demand that others who want to use your machine adapt to your preferences if they do not match what they are looking for for their results and program."<br>Of course they have a right to determine minimum requirements. I would simply like to be advised well in advance if those requirements are going to change. CMS tasks used to run on fewer than 4 threads; now they will not. There was, as far as I am aware, no advance warning of this change. It would have been nice to know; then I would not have had over 60 tasks fail with no clear indication of why.<br>"In your case and that of some others, you take tasks where a task was executed without any calculation work, in the case of others, they basically get more than 5000 tasks that all become aborted, giving enormous work to the servers, day after day, 24 hours/7 days."<br>I already addressed this: I assume they were doing final testing to make sure everything worked properly before sending out any real tasks that performed meaningful scientific work.<br>"What I don't understand is why the difference in bonus points for the same CPU work time in CMS, if you run it on Linux they give you 30,000 points, if you run it on Windows they give you 3,000 points... I don't care but to others...."<br>That is very strange. This is the first I've known about this. Credit for work done should be based on the amount of actual work done, not on which operating system one is using. |
Send message Joined: 9 Feb 09 Posts: 25 Credit: 2,384,730 RAC: 3,576 |
Name: CMS_501246_1721941877.683423_0<br>Workunit: 224447234<br>Created: 25 Jul 2024, 21:11:22 UTC<br>Sent: 25 Jul 2024, 21:58:17 UTC<br>Report deadline: 25 Aug 2024, 21:58:17 UTC<br>Received: 26 Jul 2024, 12:45:19 UTC<br>Server state: Over<br>Outcome: Success<br>Client state: Done<br>Exit status: 0 (0x00000000)<br>Computer ID: 10815183<br>Run time: 13 hours 51 min 11 sec<br>CPU time: 2 days 4 hours 7 min 44 sec<br>Validate state: Valid<br>Credit: 25,611.60<br>Device peak FLOPS: 32.17 GFLOPS<br>Application version: CMS Simulation v70.30 (vbox64_mt_mcore_cms) x86_64-pc-linux-gnu<br>Peak working set size: 1.86 GB<br>Peak swap size: 3.56 GB<br>Peak disk usage: 1.91 GB<br>The thing is that, looking a little more closely, I realize that CMS credit is the same on both Linux and Windows (1,500 to 3,000 credits), and I was surprised by the credit awarded to the Linux machine of the user from the previous posts once it actually started finishing CMS work that involves real calculation. hadron, it's your computer: 25,611 credits. None of your CMS tasks goes below 25,000 credits. And not only that: with the earlier CMS tasks that finished after 150 seconds of CPU time, you were already getting 1,150 credits. Your issue goes back a long time; it is not new. Your computer receives a 10x credit bonus for the same task performed on another computer. |
Send message Joined: 4 Sep 22 Posts: 91 Credit: 15,771,109 RAC: 15,908 |
"hadron it's your computer. 25611 credits"<br>No, it is not my computer. Credit is handed out by the LHC@H servers, I believe. I think at most, it may be my BOINC client needing to recalibrate the credit it claims now that a) the tasks require 4 threads and take 12 or 13 hours to complete, and b) for a rather long time, the do-nothing tasks were running on 2 threads and taking only minutes. Once again, this is not an old issue; it is completely new. I have already said I don't care about what was going on before; I have said more than once my belief that the do-nothing tasks were just the CMS project making sure all the kinks and bugs were out of the system before they restarted the real work. |
Send message Joined: 17 Sep 04 Posts: 105 Credit: 32,824,853 RAC: 389 |
I agree that a more prominent notice of the new CMS requirement would be a good idea. What is somewhat misleading is that under project setup for CMS you can choose different numbers of CPUs to use without being aware that 4 CPUs is the only configuration that will return valid CMS results. This is different from the other LHC projects. Regards, Bob P. |
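
For anyone hitting the same `VM expects at least 4 CPUs but reports only 2` failure, the workaround reported in this thread amounts to raising the thread count in `app_config.xml` from 2 to 4. The following is only a minimal sketch, assuming the standard BOINC `app_config.xml` layout. The inner `<app_version>` values mirror the section quoted in the first post, with just the two thread settings changed. The enclosing `<app_config>` root element does not appear in the thread and is added here as an assumption.

```xml
<!-- Sketch of an app_config.xml for LHC@home CMS tasks.
     Assumption: the <app_config> root wrapper; the thread only quotes
     the inner <app_version> section. -->
<app_config>
  <app_version>
    <app_name>CMS</app_name>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <!-- CPUs BOINC budgets per task; the first post had 2 here -->
    <avg_ncpus>4</avg_ncpus>
    <!-- Thread count passed to the VM wrapper; keep it equal to avg_ncpus -->
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>
```

After editing the file, the BOINC client needs to re-read its config files (for example via "Read config files" in the BOINC Manager's Options menu) or be restarted. The value 4 reflects what posters here reported as working in July 2024; the project may change the requirement later.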
©2024 CERN