Message boards : CMS Application : no new WUs available

Erich56

Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 36836 - Posted: 23 Sep 2018, 14:12:55 UTC - in response to Message 36834.  

... To be fair, I've not seen any announcement from CMS suggesting the mass of us re-attach...
but for which other reason would they fill the queue with new tasks?

To test stuff that can be tested only when the queue is full.
so then at least it would not hurt if there's some kind of announcement that it's not sure whether those tasks are faulty or not.
So every volunteer can decide whether to participate in the testing or not.
ID: 36836
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36840 - Posted: 23 Sep 2018, 16:31:35 UTC - in response to Message 36836.  

... To be fair, I've not seen any announcement from CMS suggesting the mass of us re-attach...
but for which other reason would they fill the queue with new tasks?

To test stuff that can be tested only when the queue is full.
so then at least it would not hurt if there's some kind of announcement that it's not sure whether those tasks are faulty or not.
So every volunteer can decide whether to participate in the testing or not.

True, it would not hurt. The question is this.... would it help? Would it help even 50% of volunteers? Even 25%? How many volunteers come here every day and read every post or news item?

Aside from that, they did post warnings some time ago. I don't know what you got from that warning but it seemed obvious to me that they could not locate the problem any other way and that CMS would be a problem until further notice. I don't know how you can claim that tasks in a queue is equivalent to an "OK, it's fixed" notice.
ID: 36840
Erich56
Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 36842 - Posted: 23 Sep 2018, 17:32:12 UTC - in response to Message 36840.  

I don't know how you can claim that tasks in a queue is equivalent to an "OK, it's fixed" notice.
sorry, but to me this would be logical. Maybe though that my logic is wrong.

At any rate, with these recent tasks the very bad thing was that they were running for more than 12 hours, and then they failed.
If you now calculate 12+ hours times so and so many tasks crunched by so and so many users: most probably thousands of hours of CPU time for nothing.
Not really the best method to find out whether tasks are okay or faulty.
Sorry, that's my modest opinion.
ID: 36842
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36845 - Posted: 23 Sep 2018, 19:28:06 UTC - in response to Message 36842.  

I don't know how you can claim that tasks in a queue is equivalent to an "OK, it's fixed" notice.
sorry, but to me this would be logical. Maybe though that my logic is wrong.

At any rate, with these recent tasks the very bad thing was that they were running for more than 12 hours, and then they failed.
If you now calculate 12+ hours times so and so many tasks crunched by so and so many users: most probably thousands of hours of CPU time for nothing.
Not really the best method to find out whether tasks are okay or faulty.
Sorry, that's my modest opinion.

But if it's the only way then it's the only way. That's not an opinion. That's a fact.
ID: 36845
Erich56
Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 36848 - Posted: 23 Sep 2018, 20:16:40 UTC - in response to Message 36845.  

But if it's the only way then it's the only way..
this may be right. However, it would definitely be sufficient to send out 100 or 200 tasks for the beginning and watch the results. And if all 100 or 200 fail, then everything is clear.
It definitely doesn't need to waste thousands of hours of CPU time (according to the info from the Project Status Page, 1.118 tasks are still in at this moment).
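As a rough back-of-the-envelope check of the numbers in this post (a sketch under stated assumptions, not project data): it reads "1.118" as 1,118 tasks in European notation and assumes every task burns about 12 hours before failing, as reported earlier in the thread. The function name is made up for illustration.

```python
def wasted_cpu_hours(tasks: int, hours_per_task: float) -> float:
    """CPU hours lost if every task runs `hours_per_task` hours and then fails."""
    return tasks * hours_per_task


# Assumption: ~12 hours per failed task (figure quoted earlier in the thread).
HOURS_PER_FAILED_TASK = 12

# Compare the proposed small test batches with the reported queue size.
for batch in (100, 200, 1118):
    hours = wasted_cpu_hours(batch, HOURS_PER_FAILED_TASK)
    print(f"{batch:>5} tasks -> {hours:>7} CPU hours")
```

Under these assumptions a 200-task batch costs 2,400 CPU hours while 1,118 tasks cost 13,416, which is the gap the post is pointing at. Real runtimes vary, so the totals are only indicative.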
ID: 36848
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36849 - Posted: 23 Sep 2018, 20:39:22 UTC - in response to Message 36848.  

It definitely doesn't need to waste thousands of hours of CPU time (according to the info from the Project Status Page, 1.118 tasks are still in at this moment).

Definitely does not? If I thought there was a possibility that you know all the details of why CMS isn't working and knowledge of all the possible failure modes and how to test those modes then I might agree with you. But I think you don't have that knowledge.
ID: 36849
Erich56
Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 36889 - Posted: 26 Sep 2018, 10:43:50 UTC - in response to Message 36849.  

It definitely doesn't need to waste thousands of hours of CPU time (according to the info from the Project Status Page, 1.118 tasks are still in at this moment).

Definitely does not? If I thought there was a possibility that you know all the details of why CMS isn't working and knowledge of all the possible failure modes and how to test those modes then I might agree with you. But I think you don't have that knowledge.
sorry, I fully disagree. For sure one doesn't need thousands of tasks for testing by the volunteers, in order to find out whether they work or fail. Sending out a few hundred and looking into what comes back should be more than sufficient.

BTW, right now there are new tasks for download again. And I'll keep my fingers away from them.
ID: 36889
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36893 - Posted: 26 Sep 2018, 16:14:07 UTC - in response to Message 36889.  

sorry, I fully disagree. For sure one doesn't need thousands of tasks for testing by the volunteers, in order to find out whether they work or fail.
But they've known for months that they fail. The problem they are trying to solve is not "do they fail?" but rather "why do they fail" and for that it seems they need to examine thousands of failures.

Sending out a few hundred and looking into what comes back should be more than sufficient.

Sufficient maybe for what you think the problem might be but perhaps the problem is even more complicated than you are able to envision.

BTW, right now there are new tasks for download again. And I'll keep my fingers away from them.

Awww come on, be a sport. You have more than enough credits already. Give up some CPU cycles and help them crack this tough nut.
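One way to picture the "why do they fail" argument is a minimal sketch with purely hypothetical numbers: if some failure mode shows up in only a small fraction of tasks, the chance of catching it at least once grows with batch size. The 0.1% frequency and the function name below are illustrative assumptions; nothing in this thread says what the real failure modes or their rates are.

```python
def chance_of_seeing_mode(frequency: float, tasks: int) -> float:
    """Probability that at least one of `tasks` independent tasks
    hits a failure mode that occurs with probability `frequency`."""
    return 1.0 - (1.0 - frequency) ** tasks


# Hypothetical rare failure mode: 1 task in 1,000 (an assumption, not a measurement).
RARE_MODE_FREQUENCY = 0.001

for batch in (200, 2000):
    p = chance_of_seeing_mode(RARE_MODE_FREQUENCY, batch)
    print(f"{batch:>5} tasks -> {p:.1%} chance of seeing the rare mode at least once")
```

With these made-up numbers a 200-task batch would usually miss the rare mode, while a batch of a couple of thousand would usually catch it. Whether that matches the actual CMS problem is unknown.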
ID: 36893
Erich56
Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 36896 - Posted: 26 Sep 2018, 16:40:38 UTC - in response to Message 36893.  

:-) :-) :-)
ID: 36896
Erich56
Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 36949 - Posted: 4 Oct 2018, 13:25:55 UTC

a few hours ago, new CMS tasks were made available. However, after about 10 minutes, they fail with 206 (0x000000CE) EXIT_INIT_FAILURE

excerpt from stderr:

2018-10-04 15:07:36 (7608): Guest Log: AUTHENTICATE:1003:Failed to authenticate with any method
2018-10-04 15:07:36 (7608): Guest Log: AUTHENTICATE:1004:Failed to authenticate using GSI
2018-10-04 15:07:36 (7608): Guest Log: GSI:5004:Failed to authenticate. Globus is reporting error (655360:13)
2018-10-04 15:07:36 (7608): Guest Log: 10/04/18 15:07:31 recognized DC_NOP as command name, using command 60011.
2018-10-04 15:07:36 (7608): Guest Log: 10/04/18 15:07:32 Condor GSI authentication failure
2018-10-04 15:07:36 (7608): Guest Log: GSS Major Status: Authentication Failed
2018-10-04 15:07:36 (7608): Guest Log: GSS Minor Status Error Chain:
2018-10-04 15:07:36 (7608): Guest Log: globus_gss_assist: Error during context initialization
2018-10-04 15:07:36 (7608): Guest Log: globus_gsi_gssapi: Unable to verify remote side's credentials
2018-10-04 15:07:36 (7608): Guest Log: globus_gsi_gssapi: SSLv3 handshake problems: Couldn't do ssl handshake
2018-10-04 15:07:36 (7608): Guest Log: OpenSSL Error: s3_pkt.c:1259: in library: SSL routines, function SSL3_READ_BYTES: tlsv1 alert unknown ca SSL alert number 48
2018-10-04 15:07:36 (7608): Guest Log: 10/04/18 15:07:32 SECMAN: required authentication with local collector failed, so aborting command DC_SEC_QUERY.
2018-10-04 15:07:36 (7608): Guest Log: [ERROR] Could not ping HTCondor.
2018-10-04 15:07:36 (7608): Guest Log: [INFO] Shutting Down.
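For anyone scanning a long stderr like this one, here is a small sketch that pulls the "Guest Log" messages out of the text and picks up any TLS alert numbers. The line layout is inferred only from the excerpt above, the helper names are made up for illustration, and the alert table is a short subset of the standard TLS alert codes (48 is unknown_ca).

```python
import re

# Layout inferred from the excerpt: "<date> <time> (<pid>): Guest Log: <message>"
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \(\d+\): Guest Log: (?P<msg>.*)$"
)
ALERT_RE = re.compile(r"SSL alert number (\d+)")

# Subset of the standard TLS alert descriptions.
TLS_ALERTS = {42: "bad_certificate", 45: "certificate_expired", 48: "unknown_ca"}


def guest_messages(stderr_text: str) -> list[str]:
    """Return the message part of every 'Guest Log' line."""
    messages = []
    for line in stderr_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            messages.append(match.group("msg"))
    return messages


def tls_alert_numbers(messages: list[str]) -> list[int]:
    """Collect every TLS alert number mentioned in the messages."""
    return [int(m.group(1)) for msg in messages for m in ALERT_RE.finditer(msg)]


SAMPLE = """\
2018-10-04 15:07:36 (7608): Guest Log: GSS Major Status: Authentication Failed
2018-10-04 15:07:36 (7608): Guest Log: OpenSSL Error: s3_pkt.c:1259: in library: SSL routines, function SSL3_READ_BYTES: tlsv1 alert unknown ca SSL alert number 48
2018-10-04 15:07:36 (7608): Guest Log: [ERROR] Could not ping HTCondor.
"""

for number in tls_alert_numbers(guest_messages(SAMPLE)):
    print(number, TLS_ALERTS.get(number, "other alert"))
```

In the excerpt, alert 48 means the peer rejected the certificate chain because it did not recognise the issuing CA, which fits the GSI authentication failures logged around it. It does not say which side's certificate or CA bundle was at fault.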
ID: 36949
Erich56
Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 36956 - Posted: 5 Oct 2018, 6:51:05 UTC

one day later, I now noticed in the Server Status page that the queue is still being filled with new tasks.
So I was wondering whether the error I experienced yesterday has been eliminated.
I downloaded one task, and 10 minutes later it failed again:

...2018-10-05 08:21:50 (3056): Guest Log: AUTHENTICATE:1003:Failed to authenticate with any method
2018-10-05 08:21:50 (3056): Guest Log: AUTHENTICATE:1004:Failed to authenticate using GSI
2018-10-05 08:21:50 (3056): Guest Log: GSI:5004:Failed to authenticate. Globus is reporting error (655360:13)

...
I am really questioning what's the reasoning behind issuing that many faulty tasks.
ID: 36956
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 374
Credit: 238,712
RAC: 0
Message 36958 - Posted: 5 Oct 2018, 8:07:28 UTC - in response to Message 36956.  

I am really questioning what's the reasoning behind issuing that many faulty tasks.


There should not be any tasks. Yesterday there was an issue with the hypervisor running the server VM. We switched to the backup server, but the original is active yet not accessible. It has gone rogue. I have deprecated the CMS app to stop the tasks being downloaded.
ID: 36958
Erich56
Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 37839 - Posted: 27 Jan 2019, 15:08:43 UTC

can anyone from LHC@home tell us when - if at all - CMS will be back?

Thanks in advance for any information.
ID: 37839
Erich56
Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 38010 - Posted: 14 Feb 2019, 11:55:55 UTC - in response to Message 37839.  

can anyone from LHC@home tell us when - if at all - CMS will be back?

Thanks in advance for any information.
would be grateful for any kind of reply
ID: 38010
Erich56
Joined: 18 Dec 15
Posts: 1689
Credit: 103,908,217
RAC: 121,874
Message 39112 - Posted: 13 Jun 2019, 6:47:24 UTC

The CMS queue ran dry.
Same is true for Theory.

Any major problem over there about which we should know?
ID: 39112
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,230
RAC: 352
Message 39122 - Posted: 13 Jun 2019, 20:37:06 UTC - in response to Message 39112.  
Last modified: 13 Jun 2019, 20:53:53 UTC

The CMS queue ran dry.
Same is true for Theory.

Any major problem over there about which we should know?

Oh, yes, sorry; I see that now. No reason I'm aware of. I'll alert CERN IT. My monitors say there are jobs available so the question is why are we not sending BOINC tasks?
[Edit] Emails sent. Tasks are still being sent from the -dev platform, so something's gone awry with LHC@Home. [/Edit]
ID: 39122
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2413
Credit: 226,499,689
RAC: 131,832
Message 39124 - Posted: 14 Jun 2019, 5:30:57 UTC

Got a few between 22:09 and 23:29 UTC last night but since then the queue seems to be empty again.
ID: 39124
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,230
RAC: 352
Message 39126 - Posted: 14 Jun 2019, 10:41:02 UTC - in response to Message 39124.  

Got a few between 22:09 and 23:29 UTC last night but since then the queue seems to be empty again.

OK, a change of password, inter alia, had stopped the task submission script from working. Tasks are available again now.
ID: 39126
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2413
Credit: 226,499,689
RAC: 131,832
Message 39127 - Posted: 14 Jun 2019, 11:32:02 UTC - in response to Message 39126.  

Thanks.
Back in the game since 9:12 UTC.
ID: 39127
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2413
Credit: 226,499,689
RAC: 131,832
Message 39297 - Posted: 6 Jul 2019, 9:25:48 UTC

Since this morning all CMS tasks fail with EXIT_NO_SUB_TASKS
ID: 39297