Message boards : CMS Application : What is this nonsense?
hadron

Joined: 4 Sep 22
Posts: 91
Credit: 15,858,970
RAC: 20,710
Message 50504 - Posted: 25 Jul 2024, 22:23:51 UTC
Last modified: 25 Jul 2024, 23:13:39 UTC

I have now had 52 CMS tasks in a row fail and all the logs I've checked show these messages:

2024-07-25 15:59:32 (21088): Guest Log: [INFO] Requesting an idtoken from LHC@home
2024-07-25 15:59:33 (21088): Guest Log: [INFO] CMS application starting. Check log files.
2024-07-25 16:11:48 (21088): Guest Log: [ERROR] VM expects at least 4 CPUs but reports only 2.
2024-07-25 16:11:48 (21088): Guest Log: [DEBUG] Volunteer: hadron (806228)
2024-07-25 16:11:48 (21088): Guest Log: [INFO] Shutting Down.
2024-07-25 16:12:18 (21088): VM Completion File Detected.
2024-07-25 16:12:18 (21088): VM Completion Message: VM expects at least 4 CPUs but reports only 2.

So what? Is CMS now demanding that I must run tasks on 4 CPUs?

This is the <app_version> section for CMS:
<app_version>
    <app_name>CMS</app_name>
    <avg_ncpus>2</avg_ncpus>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <cmdline>--nthreads 2</cmdline>
</app_version>
It's been like this since CMS became capable of running on multiple threads, without problem until now.

So I have changed the settings to run the tasks on 4 threads, and so far, things are looking OK.
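
For anyone hitting the same error, here is a minimal sketch of what a 4-thread version of that section might look like as a complete app_config.xml. The app name and plan class are copied from the section quoted above. The value 4 comes only from the "expects at least 4 CPUs" log message, and the enclosing <app_config> element is the usual BOINC wrapper; none of this is a setting confirmed by the project.

```xml
<!-- Sketch only: assumes the file lives in the LHC@home project directory -->
<app_config>
    <app_version>
        <app_name>CMS</app_name>
        <plan_class>vbox64_mt_mcore_cms</plan_class>
        <!-- Number of CPUs BOINC budgets for each CMS task -->
        <avg_ncpus>4</avg_ncpus>
        <!-- Thread count passed to the VM wrapper; keep it equal to avg_ncpus -->
        <cmdline>--nthreads 4</cmdline>
    </app_version>
</app_config>
```

After saving the file, the client has to re-read its config files (or be restarted), and the new values should apply only to tasks started afterwards.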
ID: 50504
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 50505 - Posted: 26 Jul 2024, 4:53:07 UTC - in response to Message 50504.  

2024-07-25 16:12:18 (21088): VM Completion Message: VM expects at least 4 CPUs but reports only 2.
ID: 50505
hadron

Joined: 4 Sep 22
Posts: 91
Credit: 15,858,970
RAC: 20,710
Message 50506 - Posted: 26 Jul 2024, 12:48:41 UTC - in response to Message 50505.  

2024-07-25 16:12:18 (21088): VM Completion Message: VM expects at least 4 CPUs but reports only 2.

Yes, maeax. I can read, but you haven't even tried to answer my question.
Once again, is this the reason why all those CMS tasks failed after only 14 to 16 minutes?
ID: 50506
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 50507 - Posted: 26 Jul 2024, 13:00:58 UTC - in response to Message 50506.  

OK, but what if CERN IT had changed it in the .xml?
ID: 50507
hadron

Joined: 4 Sep 22
Posts: 91
Credit: 15,858,970
RAC: 20,710
Message 50508 - Posted: 26 Jul 2024, 15:40:34 UTC - in response to Message 50507.  

OK, but what if CERN IT had changed it in the .xml?

You're just speculating. Anyway, that file is overridden by whatever is in the app_config.xml file.
I've set my config back to 2 threads per task to see if the error returns -- and yes, it has. 5 tasks failed within 14 to 16 minutes.
Back to 4 CPUs, even if they do take 12 to 13 hours to run.

My next question is one I know you will not/can not have an answer for, maeax -- just what gives anyone at CERN the right to dictate how I am allowed to allocate the resources of my computer?
ID: 50508
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 29 Aug 05
Posts: 1060
Credit: 7,737,455
RAC: 1,081
Message 50509 - Posted: 26 Jul 2024, 15:59:11 UTC - in response to Message 50508.  

OK, but what if CERN IT had changed it in the .xml?

You're just speculating. Anyway, that file is overridden by whatever is in the app_config.xml file.
I've set my config back to 2 threads per task to see if the error returns -- and yes, it has. 5 tasks failed within 14 to 16 minutes.
Back to 4 CPUs, even if they do take 12 to 13 hours to run.

My next question is one I know you will not/can not have an answer for, maeax -- just what gives anyone at CERN the right to dictate how I am allowed to allocate the resources of my computer?

We don't dictate how you allocate your resources. We do specify what resources are required to properly run our simulations.
ID: 50509
hadron

Joined: 4 Sep 22
Posts: 91
Credit: 15,858,970
RAC: 20,710
Message 50510 - Posted: 26 Jul 2024, 16:06:30 UTC - in response to Message 50509.  

My next question is one I know you will not/can not have an answer for, maeax -- just what gives anyone at CERN the right to dictate how I am allowed to allocate the resources of my computer?

We don't dictate how you allocate your resources. We do specify what resources are required to properly run our simulations.

OK, so are you saying that yes, CMS tasks will only run if they are allocated 4 threads?
If that is true, then you most certainly are dictating how I allocate my resources.
ID: 50510
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2534
Credit: 254,130,347
RAC: 54,019
Message 50512 - Posted: 26 Jul 2024, 16:44:21 UTC

@Ivan
+1


@hadron
Did you really believe that your computer delivered valid results within 30 min/2.5 min while other computers need many hours?
How naive!

In reality your computer got credits for empty envelopes without any scientific payload (for many weeks!).
This has now been stopped.
Like all volunteers you have to respect the requirements and set up your VMs accordingly.
Your choice is to either do so or to leave.
ID: 50512
hadron

Joined: 4 Sep 22
Posts: 91
Credit: 15,858,970
RAC: 20,710
Message 50513 - Posted: 26 Jul 2024, 17:00:27 UTC - in response to Message 50512.  

@Ivan
+1


@hadron
Did you really believe that your computer delivered valid results within 30 min/2.5 min while other computers need many hours?
How naive!

In reality your computer got credits for empty envelopes without any scientific payload (for many weeks!).
This has now been stopped.
Like all volunteers you have to respect the requirements and set up your VMs accordingly.
Your choice is to either do so or to leave.

Please tell me then, just why can I not set these tasks to run on only 2 threads? Every task running on only 2 has failed. That has never happened before.
Now, the tasks will not run unless I give them 4 threads. Why?
This does not happen with Atlas; those run just fine with however many threads I allow them to have, from 1 to 8. Why can CMS tasks not be configured the same way?
And no, I was not naive when all those tasks were completing with no real work being done. I just assumed that was because they were testing things to make sure they had got it right. Back then, I could set CMS tasks to run on any number of threads between 1 and 4, and they completed just fine. Now they will not -- they must be given 4 threads, or they will fail.
If all you have to offer is "like it or leave", then I think your help desk "expert" credentials are in serious need of review.
ID: 50513
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1419
Credit: 9,474,701
RAC: 2,980
Message 50514 - Posted: 26 Jul 2024, 17:31:29 UTC - in response to Message 50512.  

This has now been stopped.
Since then it seems I only get this when requesting CMS-tasks:

LHC@home 26 Jul 19:21:48 Requesting new tasks for CPU
LHC@home 26 Jul 19:21:49 Scheduler request completed: got 0 new tasks
LHC@home 26 Jul 19:21:49 No tasks sent
LHC@home 26 Jul 19:21:49 No tasks are available for CMS Simulation

Even resetting and detaching the project did not solve this. No app_config.xml is in use, and the project settings are 1 job with either no CPU limit or # of CPUs set to 4.
ID: 50514
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2534
Credit: 254,130,347
RAC: 54,019
Message 50515 - Posted: 26 Jul 2024, 17:42:31 UTC - in response to Message 50514.  

The CMS patch activated last night affects the process inside the VM.
It has nothing to do with BOINC (especially the work fetch).
Hence, BOINC related issues are not caused by the CMS patch.
ID: 50515
hadron

Joined: 4 Sep 22
Posts: 91
Credit: 15,858,970
RAC: 20,710
Message 50516 - Posted: 26 Jul 2024, 20:44:35 UTC - in response to Message 50515.  

The CMS patch activated last night affects the process inside the VM.
It has nothing to do with BOINC (especially the work fetch).
Hence, BOINC related issues are not caused by the CMS patch.

That is less than useless. This is not a BOINC-related issue; this is a CMS issue.

Once again: CMS tasks now will not run on fewer than 4 threads. Why? Maybe Ivan can shed some light on this? Please?
ID: 50516
Emmanuel Mar

Joined: 9 Feb 09
Posts: 25
Credit: 2,399,496
RAC: 4,230
Message 50517 - Posted: 26 Jul 2024, 21:10:45 UTC - in response to Message 50516.  
Last modified: 26 Jul 2024, 21:26:39 UTC

Hello,
The points don't matter to me. I offer my computer to help with scientific tasks when I'm not using it, and I don't expect anything in return. Computer hardware unfortunately becomes obsolete in a matter of years, and a multitude of computers are simply wasted on video games.
You as an individual can decide how you want your machine to work, but what you cannot do is demand that others who want to use your machine adapt to your preferences if those preferences do not match what they need for their results and program. There will be cases where relatively old hardware cannot meet the minimum requirements of the work sent to it, and other, more modern hardware could possibly do it better and faster.

In your case and that of some others, tasks were executed without any calculation work. In other cases, users fetch more than 5000 tasks that all end up aborted, creating enormous work for the servers day after day, 24 hours a day, 7 days a week.

I think some limits need to be set in both cases.

What I don't understand is the difference in credit for the same CPU time in CMS: if you run it on Linux you get 30,000 points, and if you run it on Windows you get 3,000 points. I don't care, but others might.
ID: 50517
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1173
Credit: 54,863,223
RAC: 15,797
Message 50518 - Posted: 26 Jul 2024, 21:55:45 UTC - in response to Message 50517.  


What I don't understand is the difference in credit for the same CPU time in CMS: if you run it on Linux you get 30,000 points, and if you run it on Windows you get 3,000 points. I don't care, but others might.


Yeah, my Windows 10 CMS tasks run for the same time and my credit averages 1,300.

Never got ones like this, of course.

ID: 50518
Saturn911

Joined: 3 Nov 12
Posts: 59
Credit: 141,713,931
RAC: 84,767
Message 50519 - Posted: 26 Jul 2024, 22:18:31 UTC - in response to Message 50518.  
Last modified: 26 Jul 2024, 22:23:06 UTC

Give it a few days to settle.
This will stabilize at 1000 to 3000 credits per WU.
It's a BOINC thing...
There is no difference between operating systems.
ID: 50519
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 1,652
Message 50520 - Posted: 26 Jul 2024, 22:19:16 UTC
Last modified: 26 Jul 2024, 22:20:33 UTC

For ATLAS, the project sets a limit of 24 tasks
when the venue setting is "no limit" (rather than 1-8).
The default venue for the .xml from the project is the problem here.
ID: 50520
hadron

Joined: 4 Sep 22
Posts: 91
Credit: 15,858,970
RAC: 20,710
Message 50521 - Posted: 26 Jul 2024, 22:27:17 UTC - in response to Message 50517.  

Hello,
The points don't matter to me. I offer my computer to help with scientific tasks when I'm not using it, and I don't expect anything in return. Computer hardware unfortunately becomes obsolete in a matter of years, and a multitude of computers are simply wasted on video games.

Thank you for your input. Points don't matter to me either. I only use them to determine if things are running smoothly. If the recent average for a project is fairly constant over time, then things are probably OK, and don't need much attention from me; if it starts to drop unexpectedly, then I am interested in knowing why.
As for my computer, it is all less than 2 years old. The CPU and system board are Ryzen 9/AM4, so this is certainly not an issue.

You as an individual can decide how you want your machine to work, but what you cannot do is demand that others who want to use your machine adapt to your preferences if those preferences do not match what they need for their results and program.

Of course they have a right to determine minimum requirements. I would simply like to be advised well in advance if those requirements are going to change. CMS tasks used to run on fewer than 4 threads; now they will not. There was, as far as I am aware, no advance warning of this change. It would have been nice to know; then I would not have had over 60 tasks fail with no clear indication of why.

In your case and that of some others, tasks were executed without any calculation work. In other cases, users fetch more than 5000 tasks that all end up aborted, creating enormous work for the servers day after day, 24 hours a day, 7 days a week.

I already addressed this: I assume they were doing final testing to make sure everything worked properly before sending out any real tasks that performed meaningful scientific work.

What I don't understand is the difference in credit for the same CPU time in CMS: if you run it on Linux you get 30,000 points, and if you run it on Windows you get 3,000 points. I don't care, but others might.

That is very strange. This is the first I've known about this. Credit for work done should be based on the amount of actual work done, not on which operating system one is using.
ID: 50521
Emmanuel Mar

Joined: 9 Feb 09
Posts: 25
Credit: 2,399,496
RAC: 4,230
Message 50522 - Posted: 26 Jul 2024, 22:56:32 UTC - in response to Message 50521.  
Last modified: 26 Jul 2024, 23:14:38 UTC

Name CMS_501246_1721941877.683423_0
Workunit 224447234
Created 25 Jul 2024, 21:11:22 UTC
Sent 25 Jul 2024, 21:58:17 UTC
Report deadline 25 Aug 2024, 21:58:17 UTC
Received 26 Jul 2024, 12:45:19 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 10815183
Run time 13 hours 51 min 11 sec
CPU time 2 days 4 hours 7 min 44 sec
Validate state Valid
Credit 25,611.60
Device peak FLOPS 32.17 GFLOPS
Application version CMS Simulation v70.30 (vbox64_mt_mcore_cms)
x86_64-pc-linux-gnu
Peak working set size 1.86 GB
Peak swap size 3.56 GB
Peak disk usage 1.91 GB

Looking a little closer, I realize that CMS credit is the same on both Linux and Windows (1500 to 3000 credits). What surprised me was the credit this user from the previous posts gets on his Linux machine now that it is actually finishing CMS tasks with real calculation work.

hadron, it's your computer: 25,611 credits.
None of your CMS tasks gets less than 25,000 credits.

Not only that: the earlier CMS tasks that finished after 150 seconds of CPU time already earned you 1150 credits. Your issue goes back a long time; it is not new.

Your computer receives 10 times the credit that another computer receives for the same task.
ID: 50522
hadron

Joined: 4 Sep 22
Posts: 91
Credit: 15,858,970
RAC: 20,710
Message 50523 - Posted: 26 Jul 2024, 23:51:27 UTC - in response to Message 50522.  

hadron, it's your computer: 25,611 credits.
None of your CMS tasks gets less than 25,000 credits.

Not only that: the earlier CMS tasks that finished after 150 seconds of CPU time already earned you 1150 credits. Your issue goes back a long time; it is not new.

Your computer receives 10 times the credit that another computer receives for the same task.

No, it is not my computer. Credit is handed out by the LHC@H servers, I believe. I think at most, it may be my BOINC client needing to re-calibrate the credit it claims now that a) the tasks require 4 threads and take 12 or 13 hours to complete, and b) for a rather long time, the do-nothing tasks were running on 2 threads and taking only minutes.
Once again, this is not an old issue; it is completely new. I have already said I don't care about what was going on before; I have said more than once my belief that the do-nothing tasks were just the CMS project making sure all the kinks and bugs were out of the system before they restarted the real work.
ID: 50523
rbpeake

Joined: 17 Sep 04
Posts: 105
Credit: 32,824,862
RAC: 290
Message 50530 - Posted: 27 Jul 2024, 15:38:51 UTC - in response to Message 50523.  

I agree that a more prominent notice of the new CMS requirement would be a good idea.

What is somewhat misleading is that under project setup for CMS you can choose different numbers of CPUs to use without being aware that 4 CPUs is the only configuration that will return valid CMS results. This is different from the other LHC projects.
Regards,
Bob P.
ID: 50530

©2024 CERN