Message boards : CMS Application : Short CMS-Tasks ok?

maeax

Joined: 2 May 07
Posts: 2260
Credit: 175,581,097
RAC: 9,467
Message 51243 - Posted: 5 Dec 2024, 6:26:20 UTC

2024-12-05 05:24:50 (15036): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-05 05:49:33 (15036): Guest Log: [INFO] glidein exited with return value 0.
2024-12-05 05:49:33 (15036): Guest Log: [INFO] Shutting Down.
2024-12-05 05:49:33 (15036): VM Completion File Detected.
2024-12-05 05:49:33 (15036): VM Completion Message: glidein exited with return value 0.

Are these short CMS tasks doing useful work?
Is there someone in CERN IT who can give us an answer?
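
For reference, the run time of such a task can be read straight off the log. Here is a minimal Python sketch that does the subtraction. The function names are made up for illustration, and it assumes the timestamp format shown in the lines above. It only measures how short the task was; it cannot tell whether the task did useful work.

```python
from datetime import datetime

# Two lines copied from the log above.
LOG = """\
2024-12-05 05:24:50 (15036): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-05 05:49:33 (15036): Guest Log: [INFO] glidein exited with return value 0.
"""


def timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def glidein_runtime(log_text):
    """Return the time between application start and glidein exit, or None."""
    start = end = None
    for line in log_text.splitlines():
        if "CMS application starting" in line:
            start = timestamp(line)
        elif "glidein exited" in line:
            end = timestamp(line)
    if start is None or end is None:
        return None
    return end - start


print(glidein_runtime(LOG))  # 0:24:43
```

So this task ran for a little under 25 minutes from application start to glidein exit.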
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 859
Credit: 704,340,953
RAC: 184,017
Message 51244 - Posted: 5 Dec 2024, 8:29:02 UTC - in response to Message 51243.  

They are not doing any productive work, based on the discussion in the other thread. They give a little credit to the crunchers.

They are working on it; there are a few posts from Ivan in the other recent threads.
Erich56

Joined: 18 Dec 15
Posts: 1843
Credit: 126,961,044
RAC: 132,850
Message 51245 - Posted: 5 Dec 2024, 10:06:44 UTC - in response to Message 51244.  

They are not doing any productive work, based on the discussion in the other thread. They give a little credit to the crunchers.

They are working on it; there are a few posts from Ivan in the other recent threads.
But the question remains (and I know I am repeating myself): why are these useless tasks being sent out instead of being removed from the distribution queue?
Guy

Joined: 9 Feb 08
Posts: 55
Credit: 1,528,489
RAC: 2,661
Message 51246 - Posted: 5 Dec 2024, 11:06:54 UTC
Last modified: 5 Dec 2024, 11:29:10 UTC

The server hardware is being swapped around, and major reconfigurations are taking place. In other words, the machines the volunteers connect to are taken off line sporadically during this work.
No work is available while the entire LHC@home crew is busy doing this. There's just no need to generate work for the BOINC volunteers while the system is mostly off line.
The work units that 'pop out' are just empty data transport vehicles with no actual LHC@home data in them for crunching. The data transport system is functioning, but there are no "passengers" (in this case, data) on board.

There are gaps but often there is work available from the LHC@home project during this maintenance. I still have a very long Theory job running from last week. Last week saw ATLAS jobs available too. Sometimes you get all three at the same time. CMS, ATLAS & Theory!

These empty tasks are, some think, a bit of a waste of time. Stop them if you want.
I stopped pulling CMS work units a few days ago.
To do this:
Click on the "Project" item in the menu bar at the top of any LHC@home web page.
In the drop down list select "Preferences".
Click "Edit preferences".
Un-check the "CMS Simulation".
Un-check "If no work for selected applications is available, accept work from other applications?" (Leave everything else alone!)
Click "Update preferences".

At this point all you can do is keep an eye on the CMS Application (this) message board for news of new work available.
You could look at the "Computing -> Server status" page - but it doesn't say if the jobs are hollow or not. Check the message boards.

On the technical side, for example -
The errors I found in the stderr output of the various CMS simulations I downloaded showed one LHC@home server after another going off line and coming back on again while the crew worked. Each job writes this stderr to the CERN servers upon completion.
To find this particular stderr output (yes, there is more than one for your task), follow the steps below. It's best to do this in another browser tab while you read the instructions here.
Click on the "Project" item in the menu bar at the top of any LHC@home web page.
In the drop down list select "Account" to open your account page.
Click "View" next to "Tasks".
In the page that opens is a table of your current and recent tasks.
(IMHO it's not easy to tell which job you want in this list. You have to click each one's Task number and look at the "Name" or the "Date" to identify it.)
Find the job you're interested in examining and click on its number in the first column; that's its Task number.
Note that the stderr output is only available for completed tasks, whether they ended in an error or not.
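
Once the stderr text is copied out of the task page, a few lines of Python can pull out the interesting entries instead of reading the whole thing by eye. This is only a sketch: the function names are made up for illustration, and the sample lines are copied from the snippet in this post. It assumes errors are tagged `[ERROR]` as they are in that snippet.

```python
import re

# Sample lines copied from the stderr snippet in this post (shortened).
STDERR = """\
2024-11-17 20:03:37 (14664): Guest Log: [INFO] Testing connection to HTCondor
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to WMAgent
2024-11-17 20:04:40 (14664): Guest Log: [ERROR] Could not connect to all required network services
"""

ERROR_RE = re.compile(r"\[ERROR\] (.*)$")
CONNECT_RE = re.compile(r"Could not connect to (\S+) on port (\d+)")


def find_errors(text):
    """Return the message part of every line tagged [ERROR]."""
    errors = []
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            errors.append(match.group(1))
    return errors


def unreachable_hosts(text):
    """Return (host, port) pairs named in 'Could not connect' errors."""
    return [(host, int(port)) for host, port in CONNECT_RE.findall(text)]


for message in find_errors(STDERR):
    print(message)
print(unreachable_hosts(STDERR))
```

To use it on a real task, paste the full stderr text into `STDERR` (or read it from a saved file) and run the same two functions.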

This example snippet shows the error logged in CERN's stderr by my computer when one of those functioning but empty transport vehicles arrived last week.
It shows that the "HTCondor" service (vocms0840.cern.ch, port 9618) could not be reached. Yes, all this for just one server off line!
...
2024-11-17 20:03:37 (14664): Guest Log: [INFO] Testing connection to HTCondor
2024-11-17 20:03:53 (14664): Guest Log: [DEBUG] Status run 1 of up to 3: 1
2024-11-17 20:04:14 (14664): Guest Log: [DEBUG] Status run 2 of up to 3: 1
2024-11-17 20:04:39 (14664): Guest Log: [DEBUG] Status run 3 of up to 3: 1
2024-11-17 20:04:39 (14664): Guest Log: [DEBUG] run 1
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: run 2
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: run 3
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
2024-11-17 20:04:39 (14664): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1)
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT for EID 8 [137.138.156.85:9618]
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to WMAgent
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to EOSCMS
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to CMS-Factory
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to CMS-Frontier
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to Frontier
2024-11-17 20:04:40 (14664): Guest Log: [DEBUG] Check your firewall and your network load
2024-11-17 20:04:40 (14664): Guest Log: [ERROR] Could not connect to all required network services
...
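
The failing step in that log is a plain TCP connection test (done with Ncat inside the VM). Roughly the same check can be repeated from the host machine with Python's standard `socket` module. This is a sketch, not what the CMS VM actually runs: `can_connect` is a made-up helper name, and the 10 second timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures alike.
        return False


def report(host, port):
    """Build a one-line, human-readable result for the given host and port."""
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    return f"{host}:{port} is {state}"
```

For the server named in the `[ERROR]` line above, the call would be `print(report("vocms0840.cern.ch", 9618))`. A "NOT reachable" result can also come from a local firewall or a busy network, as the log's own hint says, and a "reachable" result only means the port accepts connections, not that real work is being handed out.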

So it's just a matter of time before we see a completion of the maintenance upgrades.
It's a big old system y'all. Patience needed by all.
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 859
Credit: 704,340,953
RAC: 184,017
Message 51249 - Posted: 5 Dec 2024, 19:54:29 UTC - in response to Message 51245.  

I can only imagine it's challenging to stop the generation of work units even if they do nothing. As Guy said, if it bothers you, don't take work from CMS for the time being.
mmonnin

Joined: 22 Mar 17
Posts: 66
Credit: 25,047,948
RAC: 35,030
Message 51252 - Posted: 6 Dec 2024, 0:24:56 UTC - in response to Message 51246.  

Bro, you're replying to the LHC overall #3, #11 and #18 users with nearly 1b credit. They know how to edit BOINC preferences.
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1193
Credit: 59,394,728
RAC: 71,828
Message 51259 - Posted: 8 Dec 2024, 17:05:20 UTC - in response to Message 51252.  

They are not doing any actual work, but they still use the usual amount of memory to run. Here is when one just finished and a new one started, in the first 4 minutes.

maeax

Joined: 2 May 07
Posts: 2260
Credit: 175,581,097
RAC: 9,467
Message 51262 - Posted: 10 Dec 2024, 4:28:46 UTC - in response to Message 51259.  

I have cut back to a single ATLAS task for one day.
I am testing this every 24 hours. All other projects are deselected.



©2025 CERN