Message boards : CMS Application : Short CMS-Tasks ok?
Joined: 2 May 07 · Posts: 2260 · Credit: 175,581,097 · RAC: 9,467
2024-12-05 05:24:50 (15036): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-05 05:49:33 (15036): Guest Log: [INFO] glidein exited with return value 0.
2024-12-05 05:49:33 (15036): Guest Log: [INFO] Shutting Down.
2024-12-05 05:49:33 (15036): VM Completion File Detected.
2024-12-05 05:49:33 (15036): VM Completion Message: glidein exited with return value 0.

Are these short CMS tasks doing useful work? Is there someone in CERN IT who can give us an answer?
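For scale: the gap between the first and last timestamps in that stderr excerpt is under 25 minutes, which is why these tasks look so short. A trivial way to compute it (timestamps copied from the log above; the snippet is only illustrative):

```python
from datetime import datetime

# Timestamps copied from the stderr excerpt above.
FMT = "%Y-%m-%d %H:%M:%S"
started  = datetime.strptime("2024-12-05 05:24:50", FMT)
finished = datetime.strptime("2024-12-05 05:49:33", FMT)

# Wall-clock time between "CMS application starting" and "glidein exited".
print("Task wall time:", finished - started)   # -> 0:24:43
```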
Joined: 27 Sep 08 · Posts: 859 · Credit: 704,340,953 · RAC: 184,017
They are not doing any productive work, based on the discussion in the other thread; they just give a little credit to the crunchers. The team is working on it; there are a few posts from Ivan in the other recent threads.
Joined: 18 Dec 15 · Posts: 1843 · Credit: 126,961,044 · RAC: 132,850
> They are not doing any productive work, based on the discussion in the other thread; they just give a little credit to the crunchers.

Whereas the question still is (and I know I am repeating myself): why are these useless tasks being sent out, instead of being removed from the distribution queue???
Joined: 9 Feb 08 · Posts: 55 · Credit: 1,528,489 · RAC: 2,661
The server hardware is being swapped around. Major reconfigurations are taking place. In other words, the machines the volunteers connect to are taken off line sporadically during this.

Little to no work is available while the entire LHC@home crew is busy doing this. There's just no need to cater for generating work for the BOINC volunteers while the system is mostly off line. The work units that "pop out" are just empty data-transport vehicles with no actual LHC@home data for crunching in them. The data-transport system is functioning, but there are no "passengers" (in this case, data) in them.

There are gaps, but often there is work available from the LHC@home project during this maintenance. I still have a very long Theory job running from last week. Last week saw ATLAS jobs available too. Sometimes you get all three at the same time: CMS, ATLAS & Theory!

These empty tasks are, some think, a bit of a waste of time. Stop them if you want. I stopped pulling CMS work units a few days ago. To do this:

- Click on the "Project" item in the menu bar at the top of any LHC@home web page.
- In the drop-down list, select "Preferences".
- Click "Edit preferences".
- Un-check "CMS Simulation".
- Un-check "If no work for selected applications is available, accept work from other applications?" (Leave everything else alone!)
- Click "Update preferences".

At this point all you can do is keep an eye on the CMS Application (this) message board for news of new work being available. You could look at the "Computing -> Server status" page, but it doesn't say whether the jobs are hollow or not. Check the message boards.

On the technical side, for example: the errors I found logged in the stderr output generated by the various CMS simulations I downloaded revealed one LHC@home server after another going off line and coming back on again while the crew worked. Each job generates this stderr on the CERN servers upon completion.

To find this particular stderr output (yes, there's more than one for your task; it's best to do this in another browser tab while you read the instructions here):

- Click on the "Project" item in the menu bar at the top of any LHC@home web page.
- In the drop-down list, select "Account" to open your account page.
- Next to "Tasks", click "View".
- The page that opens contains a table of your current and recent tasks. (IMHO it's not easy to tell which job you want in this list. You have to click each one's Task number and look at the "Name" or the "Date" to identify it.)
- Find the job you're interested in examining and click on its number in the first column - that's its Task number.

Note that the stderr output is only available for completed tasks, error or not.

This example snippet shows the error logged in the stderr at CERN by my computer when one of those functioning-but-empty transport vehicles arrived last week. It shows that a server called "HTCondor" was off line. Yes, all this for just one server being off line!

...
2024-11-17 20:03:37 (14664): Guest Log: [INFO] Testing connection to HTCondor
2024-11-17 20:03:53 (14664): Guest Log: [DEBUG] Status run 1 of up to 3: 1
2024-11-17 20:04:14 (14664): Guest Log: [DEBUG] Status run 2 of up to 3: 1
2024-11-17 20:04:39 (14664): Guest Log: [DEBUG] Status run 3 of up to 3: 1
2024-11-17 20:04:39 (14664): Guest Log: [DEBUG] run 1
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: run 2
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: run 3
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2024-11-17 20:04:39 (14664): Guest Log: NCAT DEBUG: Using system default trusted CA certificates and those in /usr/share/ncat/ca-bundle.crt.
2024-11-17 20:04:39 (14664): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsi_new2(): nsi_new (IOD #1)
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsock_connect_tcp(): TCP connection requested to 137.138.156.85:9618 (IOD #1) EID 8
2024-11-17 20:04:39 (14664): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT TIMEOUT for EID 8 [137.138.156.85:9618]
2024-11-17 20:04:39 (14664): Guest Log: Ncat: Connection timed out.
2024-11-17 20:04:39 (14664): Guest Log: [ERROR] Could not connect to vocms0840.cern.ch on port 9618
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to WMAgent
2024-11-17 20:04:39 (14664): Guest Log: [INFO] Testing connection to EOSCMS
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to CMS-Factory
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to CMS-Frontier
2024-11-17 20:04:40 (14664): Guest Log: [INFO] Testing connection to Frontier
2024-11-17 20:04:40 (14664): Guest Log: [DEBUG] Check your firewall and your network load
2024-11-17 20:04:40 (14664): Guest Log: [ERROR] Could not connect to all required network services
...

So it's just a matter of time before we see the completion of the maintenance upgrades. It's a big old system, y'all. Patience needed by all.
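For anyone who wants to reproduce that check outside the VM: the [INFO]/[ERROR] lines above come from a plain TCP connection attempt against the HTCondor collector (vocms0840.cern.ch, port 9618, as shown in the [ERROR] line). A rough Python sketch of such a probe, assuming nothing about the VM's actual test script beyond what the log shows:

```python
import socket

# Host and port taken from the [ERROR] line in the log above.
# This is only an illustrative probe, not the script the CMS VM actually runs.
HOST, PORT = "vocms0840.cern.ch", 9618

def can_connect(host: str, port: int, timeout: float = 20.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if can_connect(HOST, PORT):
    print(f"{HOST}:{PORT} is reachable")
else:
    print(f"{HOST}:{PORT} connection failed or timed out")
```

A timeout here would normally point at a local firewall (which is what the [DEBUG] hint in the log suggests checking); during the maintenance window described above it simply meant the collector itself was off line.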
Joined: 27 Sep 08 · Posts: 859 · Credit: 704,340,953 · RAC: 184,017
I can only imagine it's challenging to stop the generation of work units, even if they do nothing. As the guy said, if it bothers you, don't take work from CMS for the time being.
Joined: 22 Mar 17 · Posts: 66 · Credit: 25,047,948 · RAC: 35,030
> The server hardware is being swapped around. Major reconfigurations are taking place. In other words, the machines the volunteers connect to are taken off line sporadically during this.

Bro, you're replying to the LHC overall #3, #11 and #18 users with nearly 1b credit. They know how to edit BOINC preferences.
Joined: 24 Oct 04 · Posts: 1193 · Credit: 59,394,728 · RAC: 71,828
Not doing any actual work, but they still use the usual memory to run them. Here is when one just finished and a new one starts in the first 4 minutes:

[screenshot: memory usage as one CMS task finishes and the next one starts]
Joined: 2 May 07 · Posts: 2260 · Credit: 175,581,097 · RAC: 9,467
I have reduced to one ATLAS task for one day and am testing this every 24 hours. All other projects are deselected.