Message boards : CMS Application : Jobs in the Queue / VirtualBox 5.1.14

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,464,258
RAC: 5,837
Message 28556 - Posted: 19 Jan 2017, 15:07:50 UTC
Last modified: 19 Jan 2017, 15:08:59 UTC

As part of the race I'm running CMS tasks.

On some machines they appear to be working, but one machine produces only errors.

Are you out of jobs at the moment?

The machine had VirtualBox 5.0.x installed; I have upgraded it to 5.1.14. Theory tasks are running fine on this box.

So the question is: what is the reason? Out of jobs, or VirtualBox 5.1.14?

Any ideas?

EDIT: This is the box: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10338055


Supporting BOINC, a great concept !
ID: 28556
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1004
Credit: 6,268,179
RAC: 264
Message 28557 - Posted: 19 Jan 2017, 15:36:00 UTC - in response to Message 28556.  

There are definitely jobs. I'm running 5.1.10 on my Linux boxes; I can try an upgrade if no-one else has any suggestions. I'm presuming you've checked its firewall? :-)
ID: 28557
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,328,050
RAC: 122,965
Message 28558 - Posted: 19 Jan 2017, 15:46:15 UTC - in response to Message 28556.  

The new process (wmagent) communicates via TCP port 4080.
Is it open?
ID: 28558
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1004
Credit: 6,268,179
RAC: 264
Message 28559 - Posted: 19 Jan 2017, 15:54:54 UTC - in response to Message 28558.  

The new process (wmagent) communicates via TCP port 4080.
Is it open?

The volunteers don't talk directly to wmagent, do they? In the machine logs it's successfully contacted the condor server.
ID: 28559
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,328,050
RAC: 122,965
Message 28560 - Posted: 19 Jan 2017, 16:04:28 UTC - in response to Message 28559.  

The new process (wmagent) communicates via TCP port 4080.
Is it open?

The volunteers don't talk directly to wmagent, do they? In the machine logs it's successfully contacted the condor server.

My firewall currently shows open connections between the VMs and 188.184.82.11 on TCP port 4080.
Must be either CMS or ATLAS.
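
For anyone wanting to test this from the host side, here is a minimal sketch of a TCP reachability check. The function name `tcp_port_reachable` and the 5-second timeout are illustrative assumptions, not part of any project tooling; the address and port are the ones quoted above. A successful check from the host does not prove the VM itself can get through, since that depends on how the firewall treats the VM's traffic.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests the TCP handshake
    and says nothing about whether the service behind the port is healthy.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (not executed here, since it needs network access):
#   tcp_port_reachable("188.184.82.11", 4080)
```

If the call returns False while other outbound connections work, a firewall rule blocking port 4080 is one plausible cause, though not the only one.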
ID: 28560
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1004
Credit: 6,268,179
RAC: 264
Message 28561 - Posted: 19 Jan 2017, 16:12:04 UTC - in response to Message 28560.  
Last modified: 19 Jan 2017, 16:23:25 UTC

The new process (wmagent) communicates via TCP port 4080.
Is it open?

The volunteers don't talk directly to wmagent, do they? In the machine logs it's successfully contacted the condor server.

My firewall currently shows open connections between the VMs and 188.184.82.11 on TCP port 4080.
Must be either CMS or ATLAS.

vocms0159.cern.ch: vocms0159 (Special condor pool, CMS@HOME project)

YLSNED! Perhaps we should be testing this link at startup too?

BTW, I updated to 5.1.14 on one box and the tasks resumed nicely. Hmm, that box doesn't have any connections to vocms0159 but the other one does. Perhaps it's something needed at start-up that can then go away if the task pauses. OK, there it is in StarterLog; it seems to appear at the start of each job:
01/19/17 09:00:15 (pid:4142) Submitting machine is "vocms0159.cern.ch"
01/19/17 09:00:15 (pid:4142) setting the orig job name in starter
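
To see which submit host each job used, lines like the ones above can be pulled out of a StarterLog with a small script. This is only a sketch: the regular expression is modelled on the two quoted lines (assuming the MM/DD/YY timestamp layout they show), and the names `LINE_RE` and `submitting_machines` are made up for illustration. Other log layouts may need a different pattern.

```python
import re
from datetime import datetime

# Pattern modelled on the StarterLog lines quoted above (an assumption;
# other HTCondor log configurations may format lines differently).
LINE_RE = re.compile(
    r'^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'\(pid:(?P<pid>\d+)\) '
    r'Submitting machine is "(?P<host>[^"]+)"'
)


def submitting_machines(lines):
    """Yield (timestamp, pid, host) for each 'Submitting machine' line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # Timestamps are read as month/day/two-digit-year.
            ts = datetime.strptime(m.group("ts"), "%m/%d/%y %H:%M:%S")
            yield ts, int(m.group("pid")), m.group("host")


# The two lines quoted in this post, used as sample input.
sample = [
    '01/19/17 09:00:15 (pid:4142) Submitting machine is "vocms0159.cern.ch"',
    '01/19/17 09:00:15 (pid:4142) setting the orig job name in starter',
]
hosts = [host for _, _, host in submitting_machines(sample)]
```

Only the first sample line matches, so `hosts` ends up holding the single submit host named in the log.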

ID: 28561
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,464,258
RAC: 5,837
Message 28562 - Posted: 19 Jan 2017, 17:18:07 UTC

On my personal machine I see (on ALT-F5):



Meanwhile this error has gone.

On ALT-F2 I can now see it running a job again.


Supporting BOINC, a great concept !
ID: 28562
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,464,258
RAC: 5,837
Message 28563 - Posted: 19 Jan 2017, 17:24:54 UTC - in response to Message 28558.  
Last modified: 19 Jan 2017, 17:25:25 UTC

The new process (wmagent) communicates via TCP port 4080.
Is it open?

No, it was not really open.

When I last configured my firewall, this port was not in the official list.

How could other PCs on my network crunch CMS successfully while this port was closed?


Supporting BOINC, a great concept !
ID: 28563
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,328,050
RAC: 122,965
Message 28568 - Posted: 19 Jan 2017, 18:46:43 UTC

I also noticed the HTTP 404 temporarily during the upload of a job result.
When a new job starts the console shows the log of this new job.

TCP port 4080 was introduced with CMS version 47.80 in the dev-project.
See: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=329&postid=4542
ID: 28568
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,464,258
RAC: 5,837
Message 28569 - Posted: 19 Jan 2017, 18:52:57 UTC - in response to Message 28568.  

TCP port 4080 was introduced with CMS version 47.80 in the dev-project.
See: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=329&postid=4542

Here we are running 47.60, so it should not be this port.

Regarding this port, I have made an enhancement request on the forum.

At the moment the box I mentioned crunches Theory without problems, so it would be interesting to find out what the problem really is.


Supporting BOINC, a great concept !
ID: 28569
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,328,050
RAC: 122,965
Message 28570 - Posted: 19 Jan 2017, 19:13:51 UTC - in response to Message 28569.  

TCP port 4080 was introduced with CMS version 47.80 in the dev-project.
See: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=329&postid=4542

Here we are running 47.60, so it should not be this port.

Regarding this port, I have made an enhancement request on the forum.

At the moment the box I mentioned crunches Theory without problems, so it would be interesting to find out what the problem really is.

The WU-internal job distribution process changed (as I understand it) from CRAB to WMAgent. This also affects version 47.60.
WMAgent needs port 4080.

The dev-project has used WMAgent since version 47.80.
Theory uses a different job distribution process and is therefore not affected.

The port has been listed on the FAQ page since 2016-12-20.
ID: 28570
BITLab Argo

Joined: 16 Jul 05
Posts: 24
Credit: 35,251,537
RAC: 0
Message 28623 - Posted: 23 Jan 2017, 0:09:31 UTC - in response to Message 28569.  

At the moment the box I mentioned crunches Theory without problems, so it would be interesting to find out what the problem really is.


I have that too: 4 identical boxes; all 4 crunch Theory, but only 3 will crunch CMS.
And the last one again simply gives the impression that Condor has no jobs available; no concrete error is reported anywhere.
ID: 28623



©2024 CERN