Author | Message |
computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert

Send message Joined: 15 Jun 08 Posts: 2683 Credit: 286,886,869 RAC: 54,760
|
I am currently seeing dozens of downloads like the following while the CPU is nearly idle:
[03/Feb/2017:11:44:44 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-623-0-logArchive.tar.gz? HTTP/1.1" 200 162500 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:44:50 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-912-0-logArchive.tar.gz? HTTP/1.1" 200 164637 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:44:56 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-702-0-logArchive.tar.gz? HTTP/1.1" 200 163641 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:02 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0024/0/d852627a-e9e4-11e6-9115-02163e018309-122-0-logArchive.tar.gz? HTTP/1.1" 200 162555 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:08 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-620-0-logArchive.tar.gz? HTTP/1.1" 200 163602 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:14 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-532-0-logArchive.tar.gz? HTTP/1.1" 200 165181 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:20 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-512-0-logArchive.tar.gz? HTTP/1.1" 200 161051 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:26 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-613-0-logArchive.tar.gz? HTTP/1.1" 200 165700 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:32 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0024/0/d852627a-e9e4-11e6-9115-02163e018309-57-0-logArchive.tar.gz? HTTP/1.1" 200 162480 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:38 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-197-0-logArchive.tar.gz? HTTP/1.1" 200 163014 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:44 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-989-0-logArchive.tar.gz? HTTP/1.1" 200 162294 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:50 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-984-0-logArchive.tar.gz? HTTP/1.1" 200 161827 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:45:56 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-713-0-logArchive.tar.gz? HTTP/1.1" 200 166123 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:46:01 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0023/0/5c914ef8-e9d5-11e6-9115-02163e018309-606-0-logArchive.tar.gz? HTTP/1.1" 200 162029 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:11:46:08 +0100] "GET http://vc-cms-output.cs3.cern.ch/unmerged/logs/prod/2017/2/3/ireid_MonteCarlo_eff_IDR_CMS_Home_170129_122247_3856/Production/0024/0/d852627a-e9e4-11e6-9115-02163e018309-39-0-logArchive.tar.gz? HTTP/1.1" 200 165348 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
They seem to belong to a merge job.
Shouldn't those jobs be kept inside the CERN network?
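For anyone who wants to quantify this traffic on their own proxy, here is a minimal sketch that tallies requests and bytes from log lines in the format quoted above. The regular expression is inferred only from those sample lines, and the function names (`parse_line`, `summarize`) are hypothetical helpers, not part of Squid or gfal2.

```python
import re
from collections import Counter

# Layout inferred from the quoted lines:
# [timestamp] "METHOD URL HTTP/x.y" status bytes "referer" "agent" RESULT:HIERARCHY
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\]\s+"(?P<method>\S+)\s+(?P<url>\S+)\s+HTTP/[\d.]+"\s+'
    r'(?P<status>\d+)\s+(?P<size>\d+)\s+.*?(?P<result>TCP_\w+):(?P<hier>\w+)\s*$'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = int(rec["size"])
    return rec

def summarize(lines):
    """Count requests and sum bytes per (method, kind).

    kind is 'logArchive' for URLs containing 'logArchive.tar.gz', else 'other'.
    """
    counts = Counter()
    total_bytes = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines in a different format
        kind = "logArchive" if "logArchive.tar.gz" in rec["url"] else "other"
        key = (rec["method"], kind)
        counts[key] += 1
        total_bytes[key] += rec["size"]
    return counts, total_bytes
```

Feeding the proxy's access log through `summarize` gives a quick count of how many `logArchive` downloads a single task triggered and how much data they moved.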
|
|
computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert

|
After the download of more than 500 files the job got stuck.
I noticed that the VM requested access to TCP port 1094.
This was previously not documented.
Any comments from the developers?
|
|
ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist

Send message Joined: 29 Aug 05 Posts: 1110 Credit: 9,448,850 RAC: 8,830
|
Yes, those are certainly unmerged jobs in the downloads, and no, they shouldn't be getting out. I'll make sure the crew is aware.
[Edit] Oh, hang on, those are log files! What's going on here? [/Edit]
|
|
ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist

|
> After the download of more than 500 files the job got stuck.
> I noticed that the VM requested access to TCP port 1094.
> This was previously not documented.
> Any comments from the developers?
Port 1094 is for rootd access, directly accessing remote files from ROOT programmes (and others). It's used in CMSSW to access data (.root) files stored on remote systems.
This is all very strange.
|
|
computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert

|
>> After the download of more than 500 files the job got stuck.
>> I noticed that the VM requested access to TCP port 1094.
>> This was previously not documented.
>> Any comments from the developers?
> Port 1094 is for rootd access, directly accessing remote files from ROOT programmes (and others). It's used in CMSSW to access data (.root) files stored on remote systems.
> This is all very strange.
My VMs produce .root URLs as output of every job.
Normally they upload completely although TCP port 1094 is closed, except for the highlighted one below (the first entry, which ended with status 0 and TCP_MISS_ABORTED).
That one would have been an error 151 some weeks before (you already explained that).
[03/Feb/2017:08:32:56 +0100] "PUT http://vc-cms-output.cs3.cern.ch/unmerged/DMWM_Test/QCD_Pt-40toInf_fwdJet_bwdJet_Tune4C_2p76TeV-pythia8/GEN-SIM/MonteCarlo_eff_CMS_Home_IDR_v2-v11/00022/9AE1FACC-D5E9-E611-A20A-080027DA302A.root? HTTP/1.1" 0 75423829 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS_ABORTED:HIER_DIRECT
[03/Feb/2017:08:58:25 +0100] "PUT http://vc-cms-output.cs3.cern.ch/unmerged/DMWM_Test/QCD_Pt-40toInf_fwdJet_bwdJet_Tune4C_2p76TeV-pythia8/GEN-SIM/MonteCarlo_eff_CMS_Home_IDR_v2-v11/00022/001B67D8-D7E9-E611-9243-080027AFEF7C.root? HTTP/1.1" 200 68874340 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
[03/Feb/2017:10:06:12 +0100] "PUT http://vc-cms-output.cs3.cern.ch/unmerged/DMWM_Test/QCD_Pt-40toInf_fwdJet_bwdJet_Tune4C_2p76TeV-pythia8/GEN-SIM/MonteCarlo_eff_CMS_Home_IDR_v2-v11/00023/E2348D5F-E3E9-E611-A954-080027DA302A.root? HTTP/1.1" 200 75157412 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
I would like to close TCP port 1094 if it is not necessary for normal tasks.
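To pick out uploads like the aborted one from a longer log, something along these lines may help. It is only a sketch: it assumes the field order seen in the three lines above, treats any non-2xx status or an `ABORTED` result code as a failure, and `failed_uploads` is a hypothetical name.

```python
import shlex

def failed_uploads(lines):
    """Return (url, bytes, result) for PUT requests that did not complete cleanly."""
    failed = []
    for line in lines:
        try:
            # shlex keeps each double-quoted field (request, agent) as one token
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes, not a line in the expected format
        if len(tokens) < 8 or not (tokens[3].isdigit() and tokens[4].isdigit()):
            continue
        request = tokens[2].split()
        if len(request) != 3:
            continue
        method, url, _protocol = request
        status, size, result = int(tokens[3]), int(tokens[4]), tokens[7]
        succeeded = 200 <= status < 300 and "ABORTED" not in result
        if method == "PUT" and not succeeded:
            failed.append((url, size, result))
    return failed
```

Applied to the three PUT lines above, this should report only the first one, which was logged with status 0 and TCP_MISS_ABORTED.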
|
|
ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist

|
>>> After the download of more than 500 files the job got stuck.
>>> I noticed that the VM requested access to TCP port 1094.
>>> This was previously not documented.
>>> Any comments from the developers?
>> Port 1094 is for rootd access, directly accessing remote files from ROOT programmes (and others). It's used in CMSSW to access data (.root) files stored on remote systems.
>> This is all very strange.
> My VMs produce .root URLs as output of every job.
> Normally they upload completely although TCP port 1094 is closed - except the highlighted one below.
> That one would have been an error 151 some weeks before (you already explained that).
As far as I'm aware, we don't use (x)rootd to write files; we stage out with gfal-cp. rootd is usually used to read files that are to be further processed. It's the mechanism by which we can access our data files on remote systems, and by using xrootd "redirectors" we don't even need to know where they are.
> [03/Feb/2017:08:32:56 +0100] "PUT http://vc-cms-output.cs3.cern.ch/unmerged/DMWM_Test/QCD_Pt-40toInf_fwdJet_bwdJet_Tune4C_2p76TeV-pythia8/GEN-SIM/MonteCarlo_eff_CMS_Home_IDR_v2-v11/00022/9AE1FACC-D5E9-E611-A20A-080027DA302A.root? HTTP/1.1" 0 75423829 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS_ABORTED:HIER_DIRECT
> [03/Feb/2017:08:58:25 +0100] "PUT http://vc-cms-output.cs3.cern.ch/unmerged/DMWM_Test/QCD_Pt-40toInf_fwdJet_bwdJet_Tune4C_2p76TeV-pythia8/GEN-SIM/MonteCarlo_eff_CMS_Home_IDR_v2-v11/00022/001B67D8-D7E9-E611-9243-080027AFEF7C.root? HTTP/1.1" 200 68874340 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
> [03/Feb/2017:10:06:12 +0100] "PUT http://vc-cms-output.cs3.cern.ch/unmerged/DMWM_Test/QCD_Pt-40toInf_fwdJet_bwdJet_Tune4C_2p76TeV-pythia8/GEN-SIM/MonteCarlo_eff_CMS_Home_IDR_v2-v11/00023/E2348D5F-E3E9-E611-A954-080027DA302A.root? HTTP/1.1" 200 75157412 "-" "gfal2-util/1.3.2 gfal2/2.11.1 neon/0.0.29" TCP_MISS:HIER_DIRECT
> I would like to close TCP port 1094 if it is not necessary for normal tasks.
As your examples show, the gfal/davs protocols use HTTP (I believe over port 80) so I can't think of a reason for 1094 to be open.
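If you want to confirm what your firewall currently allows before and after closing the port, a plain TCP connect test is usually enough. This is a rough sketch using only the Python standard library; `tcp_port_open` is a hypothetical helper, and the host you pass in is up to you (no server name is implied here beyond what appears in the logs).

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and attempts a full TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name-resolution failures
        return False
```

Calling it once with port 80 and once with port 1094 against the same host shows whether the two are treated differently from your machine. A False result only says the connection failed from where the script ran, not why.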
|
|
ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist

|
> Yes, those are certainly unmerged jobs in the downloads, and no, they shouldn't be getting out. I'll make sure the crew is aware.
> [Edit] Oh, hang on, those are log files! What's going on here? [/Edit]
Ah! I've just found out that as well as Merge jobs there are also LogCollect jobs. I'll ask Laurence to try to identify them so they can be kept within CERN as well.
|
|
computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert

|
> ... to try to identify them so they can be kept within CERN as well.
+1
|
|
ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist

|
OK, Laurence has just told me that LogCollect jobs are now being excluded from Volunteer machines as well as Merge jobs. From a scan of the Condor queue on Friday afternoon, I believe these are the only two categories of CMS_JobType apart from Production (which is the category Volunteers can process). If anyone spots another category, please let me know!
|
|