Short CMS-Tasks ok?

Author	Message
maeax
Joined: 2 May 07 Posts: 2278 Credit: 178,775,457 RAC: 1,891
Message 51243 - Posted: 5 Dec 2024, 6:26:20 UTC

2024-12-05 05:24:50 (15036): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-05 05:49:33 (15036): Guest Log: [INFO] glidein exited with return value 0.
2024-12-05 05:49:33 (15036): Guest Log: [INFO] Shutting Down.
2024-12-05 05:49:33 (15036): VM Completion File Detected.
2024-12-05 05:49:33 (15036): VM Completion Message: glidein exited with return value 0.

Are these short CMS tasks doing good work?
Is there someone in CERN-IT who can give us an answer?

ID: 51243
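For anyone who wants to check how long such a task actually ran, here is a minimal Python sketch that measures the time between the "CMS application starting" line and the last "glidein exited with return value" line. It assumes the line layout seen in the excerpt above (timestamp, process ID in parentheses, message); the function name `glidein_runtime` and the regular expression are illustrative assumptions, not part of BOINC or the CMS application.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed line layout, taken only from the excerpt in this thread:
#   "YYYY-MM-DD HH:MM:SS (pid): message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): (.*)$")


def glidein_runtime(log_text: str) -> Optional[timedelta]:
    """Return the time between CMS start and glidein exit, or None if either is missing."""
    start = None
    end = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(2)
        if "CMS application starting" in message and start is None:
            start = stamp  # first start line wins
        elif "glidein exited with return value" in message:
            end = stamp  # last exit line wins
    if start is None or end is None:
        return None
    return end - start


SAMPLE_LOG = """\
2024-12-05 05:24:50 (15036): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-05 05:49:33 (15036): Guest Log: [INFO] glidein exited with return value 0.
2024-12-05 05:49:33 (15036): Guest Log: [INFO] Shutting Down.
2024-12-05 05:49:33 (15036): VM Completion File Detected.
2024-12-05 05:49:33 (15036): VM Completion Message: glidein exited with return value 0.
"""

print(glidein_runtime(SAMPLE_LOG))
```

For the log quoted above this gives a runtime of just under 25 minutes. The sketch only measures the duration; whether a given runtime means the task was empty is something only the project team can confirm.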

Toby Broom (Volunteer moderator)
Joined: 27 Sep 08 Posts: 888 Credit: 760,919,945 RAC: 349,222
Message 51244 - Posted: 5 Dec 2024, 8:29:02 UTC - in response to Message 51243.

They are not doing any productive work, based on the discussion in the other thread. They give a little credit to the crunchers.

They are working on it; there are a few posts from Ivan in the other recent threads.

ID: 51244

Erich56
Joined: 18 Dec 15 Posts: 1923 Credit: 149,489,638 RAC: 144,096
Message 51245 - Posted: 5 Dec 2024, 10:06:44 UTC - in response to Message 51244.

> They are not doing any productive work, based on the discussion in the other thread. They give a little credit to the crunchers.
>
> They are working on it; there are a few posts from Ivan in the other recent threads.

Whereas the question still is (and I know I am repeating myself): why are these useless tasks being sent out, instead of being removed from the distribution queue?

ID: 51245

Guy
Joined: 9 Feb 08 Posts: 61 Credit: 2,161,811 RAC: 3,668
Message 51246 - Posted: 5 Dec 2024, 11:06:54 UTC
Last modified: 5 Dec 2024, 11:29:10 UTC

The server hardware is being swapped around. Major reconfigurations are taking place. In other words, the machines the volunteers connect to are taken offline sporadically during this.

There is no work available while the entire LHC@home crew is busy doing this. There's just no need to generate work for the BOINC volunteers while the system is mostly offline.

The work units that 'pop out' are just empty data transport vehicles with no actual LHC@home data for crunching in them. The data transport system is functioning, but there are no "passengers" (in this case, data) in them.

There are gaps, but often there is work available from the LHC@home project during this maintenance. I still have a very long Theory job running from last week. Last week saw ATLAS jobs available too. Sometimes you get all three at the same time: CMS, ATLAS & Theory!

These empty tasks are, some think, a bit of a waste of time. Stop them if you want. I stopped pulling CMS work units a few days ago. To do this:

1. Click on the "Project" item in the menu bar at the top of any LHC@home web page.
2. In the drop-down list select "Preferences".
3. Click "Edit preferences".
4. Un-check "CMS Simulation".
5. Un-check "If no work for selected applications is available, accept work from other applications?" (Leave everything else alone!)
6. Click "Update preferences".

At this point all you can do is keep an eye on the CMS Application (this) message board for news of new work available. You could look at the "Computing -> Server status" page, but it doesn't say if the jobs are hollow or not. Check the message boards.

On the technical side, for example: the errors I found in the stderr output of the various CMS simulations I downloaded revealed one LHC@home server after another going offline and coming back on again while the crew worked. Each job generates this stderr on the CERN servers upon completion.

To find this particular stderr output (yes, there's more than one for your task; it's best to do this in another browser tab while you read the instructions here):

1. Click on the "Project" item in the menu bar at the top of any LHC@home web page.
2. In the drop-down list select "Account" to open your account page.
3. Click Tasks View.
4. The page that opens has a table of your current and recent tasks. (IMHO it's not easy to tell which job you want in this list. You have to click each one's Task number and look at the "Name" or the "Date" to identify it.)
5. Find the job you're interested in examining and click on its number in the first column; that's its Task number.

The stderr output is only available for completed tasks, error or not.

This example snippet shows the error logged in CERN's stderr by my computer when *one* of those functioning but empty transport vehicles arrived last week. It shows that a server called "HTCondor" was offline. Yes, all this for just one server offline!

...
2024-11-17 20:03:37 (14664): Guest Log: [INFO] Testing connection to HTCondor
2024-11-17 20:03:53 (14664): Guest Log: [DEBUG] Status run 1 of up to 3: 1
2024-11-17 20:04:14 (14664): Guest Log: [DEBUG] Status run 2 of up to 3: 1
2024-11-17 20:04:39 (14664): Guest Log: [DEBUG] Status run 3 of up to 3: 1
2024-11-17 20:04:39 (14664): Guest Log: [DEBUG] run 1
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: run 2
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: run 3
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
2024-11-17 20:04:39 (14664): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1)
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT for EID 8 [137.138.156.85:9618]
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to WMAgent
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to EOSCMS
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to CMS-Factory
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to CMS-Frontier
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to Frontier
2024-11-17 20:04:40 (14664): Guest Log: [DEBUG] Check your firewall and your network load
2024-11-17 20:04:40 (14664): Guest Log: [ERROR] Could not connect to all required network services
...

So it's just a matter of time before we see a completion of the maintenance upgrades. It's a big old system y'all. Patience needed by all.

ID: 51246
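Picking the failed endpoint out of a long stderr dump by eye is tedious, so here is a small Python sketch that lists every host and port named in an "[ERROR] Could not connect to ... on port ..." line. It relies only on the wording of the error line in the snippet above; the function name `failed_connections` and the pattern are illustrative assumptions, and other error messages from the CMS application may be worded differently.

```python
import re
from typing import List, Tuple

# Pattern based solely on the error line quoted in this thread:
#   "[ERROR] Could not connect to <host> on port <port>"
ERROR_RE = re.compile(r"\[ERROR\] Could not connect to (\S+) on port (\d+)")


def failed_connections(stderr_text: str) -> List[Tuple[str, int]]:
    """Return (host, port) pairs for every failed connection reported in the log."""
    failures = []
    for line in stderr_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            failures.append((match.group(1), int(match.group(2))))
    return failures


SAMPLE_STDERR = """\
2024-11-17 20:03:37 (14664): Guest Log: [INFO] Testing connection to HTCondor
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to WMAgent
2024-11-17 20:04:40 (14664): Guest Log: [ERROR] Could not connect to all required network services
"""

for host, port in failed_connections(SAMPLE_STDERR):
    print(f"unreachable: {host}:{port}")
```

The summary line "Could not connect to all required network services" is deliberately not matched, since it names no host or port.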

Toby Broom (Volunteer moderator)
Joined: 27 Sep 08 Posts: 888 Credit: 760,919,945 RAC: 349,222
Message 51249 - Posted: 5 Dec 2024, 19:54:29 UTC - in response to Message 51245.

I can only imagine it's challenging to stop the generation of work units even if they do nothing.

As Guy said, if it bothers you, don't take work from CMS for the time being.

ID: 51249

mmonnin
Joined: 22 Mar 17 Posts: 82 Credit: 29,783,045 RAC: 4,180
Message 51252 - Posted: 6 Dec 2024, 0:24:56 UTC - in response to Message 51246.

> The server hardware is being swapped around. Major reconfigurations are taking place. In other words, the machines the volunteers connect to are taken offline sporadically during this.
>
> There is no work available while the entire LHC@home crew is busy doing this. There's just no need to generate work for the BOINC volunteers while the system is mostly offline.
>
> The work units that 'pop out' are just empty data transport vehicles with no actual LHC@home data for crunching in them. The data transport system is functioning, but there are no "passengers" (in this case, data) in them.
>
> There are gaps, but often there is work available from the LHC@home project during this maintenance. I still have a very long Theory job running from last week. Last week saw ATLAS jobs available too. Sometimes you get all three at the same time: CMS, ATLAS & Theory!
>
> These empty tasks are, some think, a bit of a waste of time. Stop them if you want. I stopped pulling CMS work units a few days ago. To do this:
>
> 1. Click on the "Project" item in the menu bar at the top of any LHC@home web page.
> 2. In the drop-down list select "Preferences".
> 3. Click "Edit preferences".
> 4. Un-check "CMS Simulation".
> 5. Un-check "If no work for selected applications is available, accept work from other applications?" (Leave everything else alone!)
> 6. Click "Update preferences".
>
> --------
>
> So it's just a matter of time before we see a completion of the maintenance upgrades. It's a big old system y'all. Patience needed by all.

Bro, you're replying to the LHC overall #3, #11 and #18 users with nearly 1b credit. They know how to edit BOINC preferences.

ID: 51252

Magic Quantum Mechanic
Joined: 24 Oct 04 Posts: 1242 Credit: 85,089,983 RAC: 136,034
Message 51259 - Posted: 8 Dec 2024, 17:05:20 UTC - in response to Message 51252.

Not doing any actual work but they still use the usual memory to run them... here is when one just finished and a new one starts in the first 4 minutes

ID: 51259

maeax
Joined: 2 May 07 Posts: 2278 Credit: 178,775,457 RAC: 1,891
Message 51262 - Posted: 10 Dec 2024, 4:28:46 UTC - in response to Message 51259.

I have reduced to one ATLAS task for one day, and I am testing this every 24 hours.
All other projects are deselected.

ID: 51262

LHC@home