no new WUs available
Author | Message |
---|---|
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
so then at least it would not hurt if there's some kind of announcement that it's not sure whether those tasks are faulty or not.... To be fair, I've not seen any announcement from CMS suggesting the mass of us re-attach...but for which other reason would they fill the queue with new tasks? So every volunteer can decide whether to participate in the testing or not. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
so then at least it would not hurt if there's some kind of announcement that it's not sure whether those tasks are faulty or not.... To be fair, I've not seen any announcement from CMS suggesting the mass of us re-attach...but for which other reason would they fill the queue with new tasks? True, it would not hurt. The question is this.... would it help? Would it help even 50% of volunteers? Even 25%? How many volunteers come here every day and read every post or news item? Aside from that, they did post warnings some time ago. I don't know what you got from that warning but it seemed obvious to me that they could not locate the problem any other way and that CMS would be a problem until further notice. I don't know how you can claim that tasks in a queue is equivalent to an "OK, it's fixed" notice. |
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
I don't know how you can claim that tasks in a queue is equivalent to an "OK, it's fixed" notice. Sorry, but to me this would be logical. Maybe though that my logic is wrong. At any rate, with these recent tasks the very bad thing was that they were running for more than 12 hours, and then they failed. If you now calculate 12+ hours times so and so many tasks crunched by so and so many users: most probably thousands of hours of CPU time for nothing. Not really the best method to find out whether tasks are okay or faulty. Sorry, that's my modest opinion. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
I don't know how you can claim that tasks in a queue is equivalent to an "OK, it's fixed" notice. Sorry, but to me this would be logical. Maybe though that my logic is wrong. But if it's the only way then it's the only way. That's not an opinion. That's a fact. |
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
But if it's the only way then it's the only way. This may be right. However, it would definitely be sufficient to send out 100 or 200 tasks for the beginning and watch the results. And if all 100 or 200 fail, then everything is clear. It definitely doesn't need to waste thousands of hours of CPU time (according to the info from the Project Status Page, 1,118 tasks are still in progress at this moment). |
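The arithmetic behind this argument can be sketched as follows. This is only a rough illustration: it assumes every failing task burns the 12 hours mentioned earlier in the thread and that the Project Status figure means 1118 tasks. The names `cpu_hours`, `HOURS_PER_FAILED_TASK` and `TASKS_IN_PROGRESS` are made up for this example.

```python
# Rough estimate of CPU time at risk if every task runs ~12 hours and then fails.
# Both constants are assumptions taken from figures quoted in this thread.
HOURS_PER_FAILED_TASK = 12   # "running for more than 12 hours, and then they failed"
TASKS_IN_PROGRESS = 1118     # figure quoted from the Project Status Page


def cpu_hours(tasks, hours_per_task=HOURS_PER_FAILED_TASK):
    """Return the CPU hours consumed if all `tasks` fail after `hours_per_task`."""
    return tasks * hours_per_task


for label, count in [
    ("pilot batch of 100", 100),
    ("pilot batch of 200", 200),
    ("all tasks in progress", TASKS_IN_PROGRESS),
]:
    print(f"{label}: {cpu_hours(count):,} CPU hours at risk")
```

Under these assumptions a pilot batch puts far fewer CPU hours at risk than the full queue, which is the point being argued. Whether a small batch would expose the actual failure mode is a separate question that the numbers do not answer.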
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
It definitely doesn't need to waste thousands of hours of CPU time (according to the info from the Project Status Page, 1,118 tasks are still in progress at this moment). Definitely does not? If I thought there was a possibility that you know all the details of why CMS isn't working and knowledge of all the possible failure modes and how to test those modes then I might agree with you. But I think you don't have that knowledge. |
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
sorry, I fully disagree. For sure one doesn't need thousands of tasks for testing by the volunteers, in order to find out whether they work or fail. Sending out a few hundred and looking into what comes back should be more than sufficient. It definitely doesn't need to waste thousands of hours of CPU time (according to the info from the Project Status Page, 1,118 tasks are still in progress at this moment). BTW, right now there are new tasks for download again. And I'll keep my fingers away from them. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
sorry, I fully disagree. For sure one doesn't need thousands of tasks for testing by the volunteers, in order to find out whether they work or fail. But they've known for months that they fail. The problem they are trying to solve is not "do they fail?" but rather "why do they fail?" and for that it seems they need to examine thousands of failures. Sending out a few hundred and looking into what comes back should be more than sufficient. Sufficient maybe for what you think the problem might be but perhaps the problem is even more complicated than you are able to envision. BTW, right now there are new tasks for download again. And I'll keep my fingers away from them. Awww come on, be a sport. You have more than enough credits already. Give up some CPU cycles and help them crack this tough nut. |
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
:-) :-) :-) |
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
a few hours ago, new CMS tasks were made available. However, after about 10 minutes, they fail with
206 (0x000000CE) EXIT_INIT_FAILURE
excerpt from stderr:
2018-10-04 15:07:36 (7608): Guest Log: AUTHENTICATE:1003:Failed to authenticate with any method
2018-10-04 15:07:36 (7608): Guest Log: AUTHENTICATE:1004:Failed to authenticate using GSI
2018-10-04 15:07:36 (7608): Guest Log: GSI:5004:Failed to authenticate. Globus is reporting error (655360:13)
2018-10-04 15:07:36 (7608): Guest Log: 10/04/18 15:07:31 recognized DC_NOP as command name, using command 60011.
2018-10-04 15:07:36 (7608): Guest Log: 10/04/18 15:07:32 Condor GSI authentication failure
2018-10-04 15:07:36 (7608): Guest Log: GSS Major Status: Authentication Failed
2018-10-04 15:07:36 (7608): Guest Log: GSS Minor Status Error Chain:
2018-10-04 15:07:36 (7608): Guest Log: globus_gss_assist: Error during context initialization
2018-10-04 15:07:36 (7608): Guest Log: globus_gsi_gssapi: Unable to verify remote side's credentials
2018-10-04 15:07:36 (7608): Guest Log: globus_gsi_gssapi: SSLv3 handshake problems: Couldn't do ssl handshake
2018-10-04 15:07:36 (7608): Guest Log: OpenSSL Error: s3_pkt.c:1259: in library: SSL routines, function SSL3_READ_BYTES: tlsv1 alert unknown ca SSL alert number 48
2018-10-04 15:07:36 (7608): Guest Log: 10/04/18 15:07:32 SECMAN: required authentication with local collector failed, so aborting command DC_SEC_QUERY.
2018-10-04 15:07:36 (7608): Guest Log: [ERROR] Could not ping HTCondor.
2018-10-04 15:07:36 (7608): Guest Log: [INFO] Shutting Down. |
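For anyone wanting to triage such a stderr excerpt quickly, here is a minimal sketch that pulls the failure signatures out of the Guest Log lines. The line pattern and the substrings are inferred only from the excerpt above, not from any BOINC, vboxwrapper or HTCondor documentation, and the names `GUEST_LOG`, `SIGNATURES` and `summarize` are hypothetical. The only outside fact used is that TLS alert number 48 is `unknown_ca`.

```python
import re

# Pattern inferred from the excerpt above (an assumption, not an official log format):
# "<date> <time> (<pid>): Guest Log: <message>"
GUEST_LOG = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \(\d+\): Guest Log: (.*)$"
)

# Substrings copied from the excerpt; the labels are this sketch's own reading.
SIGNATURES = {
    "Failed to authenticate": "authentication failure",
    "alert unknown ca": "TLS alert 48 (unknown CA)",
    "Could not ping HTCondor": "HTCondor unreachable",
}


def summarize(stderr_text):
    """Return the distinct failure labels found, in order of first appearance."""
    findings = []
    for line in stderr_text.splitlines():
        match = GUEST_LOG.match(line.strip())
        if not match:
            continue  # not a Guest Log line
        message = match.group(1)
        for needle, label in SIGNATURES.items():
            if needle in message and label not in findings:
                findings.append(label)
    return findings


sample = """\
2018-10-04 15:07:36 (7608): Guest Log: AUTHENTICATE:1003:Failed to authenticate with any method
2018-10-04 15:07:36 (7608): Guest Log: OpenSSL Error: s3_pkt.c:1259: in library: SSL routines, function SSL3_READ_BYTES: tlsv1 alert unknown ca SSL alert number 48
2018-10-04 15:07:36 (7608): Guest Log: [ERROR] Could not ping HTCondor.
"""

print(hex(206))           # the reported exit code in hexadecimal
print(summarize(sample))
```

Read this way, the excerpt suggests a certificate trust problem during the GSI handshake rather than anything on the volunteer's machine, though that interpretation is a guess from the log text alone.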
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
one day later, I now noticed in the Server Status page that the queue is still being filled with new tasks. So I was wondering whether the error I experienced yesterday has been eliminated. I downloaded one task, and 10 minutes later it failed again:
...
2018-10-05 08:21:50 (3056): Guest Log: AUTHENTICATE:1003:Failed to authenticate with any method
2018-10-05 08:21:50 (3056): Guest Log: AUTHENTICATE:1004:Failed to authenticate using GSI
2018-10-05 08:21:50 (3056): Guest Log: GSI:5004:Failed to authenticate. Globus is reporting error (655360:13)
...
I am really questioning what's the reasoning behind issuing that many faulty tasks. |
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
I am really questioning what's the reasoning behind issuing that many faulty tasks. There should not be any tasks. Yesterday there was an issue with the hypervisor running the server VM. We switched to the backup server, but the original is active yet not accessible. It has gone rogue. I have deprecated the CMS app to stop the tasks being downloaded. |
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
can anyone from LHC@home tell us when - if at all - CMS will be back? Thanks in advance for any information. |
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
can anyone from LHC@home tell us when - if at all - CMS will be back? Would be grateful for any kind of reply |
Send message Joined: 18 Dec 15 Posts: 1831 Credit: 119,660,110 RAC: 48,822 |
The CMS queue ran dry. Same is true for Theory. Any major problem over there about which we should know? |
Send message Joined: 29 Aug 05 Posts: 1065 Credit: 7,942,611 RAC: 14,859 |
The CMS queue ran dry. Oh, yes, sorry; I see that now. No reason I'm aware of. I'll alert CERN IT. My monitors say there are jobs available so the question is why are we not sending BOINC tasks? [Edit] emails sent. Tasks are still being sent from the -dev platform, so something's gone awry with LHC@Home. [/Edit] |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,464,569 RAC: 67,494 |
Got a few between 22:09 and 23:29 UTC last night but since then the queue seems to be empty again. |
Send message Joined: 29 Aug 05 Posts: 1065 Credit: 7,942,611 RAC: 14,859 |
|
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,464,569 RAC: 67,494 |
Thanks. Back in the game since 9:12 UTC. |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,464,569 RAC: 67,494 |
Since this morning all CMS tasks fail with EXIT_NO_SUB_TASKS |
©2025 CERN