Message boards : Theory Application : New version 300.00
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 40395 - Posted: 11 Nov 2019, 14:25:03 UTC

With this new version, the real Theory job goes via the BOINC server rather than being pulled into the VM with HTCondor. Jobs will last on average 2 hours rather than 12.
ID: 40395
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 951
Credit: 6,323,990
RAC: 1,942
Message 40397 - Posted: 11 Nov 2019, 14:54:15 UTC - in response to Message 40395.  

It seems 32-bit is no longer supported !?
ID: 40397
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1443
Credit: 76,490,486
RAC: 100,214
Message 40398 - Posted: 11 Nov 2019, 14:59:00 UTC
Last modified: 11 Nov 2019, 15:04:55 UTC

I get nothing but:
Mo 11 Nov 2019 15:53:23 CET | LHC@home | No tasks are available for Theory Simulation

Are you sure the server has been restarted?
Does the app_version from the preferences page point to the new app_version?


<edit>
Just got a task.
</edit>
ID: 40398
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 40399 - Posted: 11 Nov 2019, 15:39:39 UTC - in response to Message 40397.  

Will be added tomorrow.
ID: 40399
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 951
Credit: 6,323,990
RAC: 1,942
Message 40401 - Posted: 11 Nov 2019, 22:21:48 UTC

I did not see this during testing on LHCathome-dev; I am seeing it here for the first time.
The INFO line with the job name, normally shown in Console ALT-F1 after the job has started, was empty.
Job name was: ===> [runRivet] Mon Nov 11 17:42:44 UTC 2019 [boinc pp jets 7000 80,-,1760 - pythia6 6.428 z2 100000 157]
and this INFO was also missing in the result: https://lhcathome.cern.ch/lhcathome/result.php?resultid=251561245
ID: 40401
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1443
Credit: 76,490,486
RAC: 100,214
Message 40402 - Posted: 12 Nov 2019, 9:11:11 UTC

Since this is a single-core app, I expect the credit points to be more realistic.
This helps volunteers running hosts with fewer cores.

++1
ID: 40402
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 951
Credit: 6,323,990
RAC: 1,942
Message 40403 - Posted: 12 Nov 2019, 15:07:30 UTC

I saw you made vbox32 version available.

The tasks are ready after 2 minutes and valid ??
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10416365

Missing file or directory:

ID: 40403
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 951
Credit: 6,323,990
RAC: 1,942
Message 40412 - Posted: 12 Nov 2019, 20:45:33 UTC

Error while computing: https://lhcathome.cern.ch/lhcathome/result.php?resultid=251612143
2019-11-12 11:16:20 (5988): Guest Log: 11:16:26 CET +01:00 2019-11-12: cranky: [INFO] ===> [runRivet] Tue Nov 12 10:16:24 UTC 2019 [boinc pp jets 8000 25 - pythia6 6.428 365 100000 174]

2019-11-12 12:57:43 (5988): Status Report: Job Duration: '691200.000000'
2019-11-12 12:57:43 (5988): Status Report: Elapsed Time: '6000.951002'
2019-11-12 12:57:43 (5988): Status Report: CPU Time: '6121.385639'
2019-11-12 13:36:00 (5988): Guest Log: 13:34:05 CET +01:00 2019-11-12: cranky: [ERROR] Container 'runc' terminated with status code 1.

2019-11-12 13:36:00 (5988): Guest Log: [ERROR] Job Failed
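
The Status Report values in a log like this are plain seconds, which makes them hard to read at a glance. A minimal sketch, assuming the line format shown above, that extracts them and prints them as durations; the helper name `parse_status` and the regular expression are illustrative and not part of the BOINC wrapper:

```python
import re
from datetime import timedelta

# Matches lines such as:
#   ... Status Report: Elapsed Time: '6000.951002'
STATUS_RE = re.compile(r"Status Report: (?P<key>[^:]+): '(?P<value>[0-9.]+)'")


def parse_status(lines):
    """Return a dict mapping each Status Report key to its value in seconds."""
    values = {}
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            values[match.group("key")] = float(match.group("value"))
    return values


# The three Status Report lines copied from the log above.
log = [
    "2019-11-12 12:57:43 (5988): Status Report: Job Duration: '691200.000000'",
    "2019-11-12 12:57:43 (5988): Status Report: Elapsed Time: '6000.951002'",
    "2019-11-12 12:57:43 (5988): Status Report: CPU Time: '6121.385639'",
]

status = parse_status(log)
for key, seconds in status.items():
    # Round to whole seconds so the timedelta prints without microseconds.
    print(f"{key}: {timedelta(seconds=round(seconds))}")
```

Read this way, the job duration limit is 8 days, while the task had been running for about 1 hour 40 minutes at the time of this report.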
ID: 40412
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 40413 - Posted: 12 Nov 2019, 21:14:01 UTC - in response to Message 40403.  

Crystal Pellet wrote:
I saw you made vbox32 version available.

The tasks are ready after 2 minutes and valid ??
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10416365

Missing file or directory:


Thanks for the alert. Will fix it tomorrow.
ID: 40413
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 40421 - Posted: 13 Nov 2019, 9:01:50 UTC - in response to Message 40413.  
Last modified: 13 Nov 2019, 9:01:57 UTC

Laurence wrote:
Thanks for the alert. Will fix it tomorrow.

Should be fixed. Need to check.
ID: 40421
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 951
Credit: 6,323,990
RAC: 1,942
Message 40429 - Posted: 13 Nov 2019, 11:48:31 UTC - in response to Message 40421.  
Last modified: 13 Nov 2019, 15:13:00 UTC

Laurence wrote:
Thanks for the alert. Will fix it tomorrow.

Should be fixed. Need to check.

Testing Theory vbox32:
The good thing: There is a job running now

Remarks:
- Console ALT-F2 only shows: Running job output should appear here - no events shown, although I see with ALT-F3 that agile-runmc and rivetvm.exe are running
- In BOINC Manager 'Show graphics' localhost:xxxxx (webapi address) not reachable, so no Theory pictures and no access to logs.
- Console ALT-F1 is missing some lines that are usual in the 64-bit version:



Edit: Task finished.
ID: 40429
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 939
Credit: 40,262,835
RAC: 20,032
Message 40442 - Posted: 14 Nov 2019, 10:28:05 UTC

I just downloaded the new 300.02 version (333.08 MB vdi) on my old faithful x86 machine and see it has the new, faster estimated run time, so I will start it up and see how it goes, and maybe try the x64 version on another one. This is still the second 24 hours of a new month on my satellite ISP account, so the download only took about 6.5 minutes instead of the usual 10 hours.
ID: 40442
Erich56
Joined: 18 Dec 15
Posts: 1283
Credit: 23,078,312
RAC: 2,804
Message 40459 - Posted: 14 Nov 2019, 20:03:18 UTC - in response to Message 40442.  

For some unknown reason, on one of the PCs (Windows 10) the new version (300.00 as well as 300.02) does not work.
The time count in the BOINC manager is working, also the progress percentage bar. However, no CPU usage.
When I push "Show VM Console" in the BOINC manager, part of the text shown is:

Warning: "cvmfs_config probe sft.cern.ch' failed
ERROR: Could not source logging functions ...


The stderr of such a task which I then aborted after 1 day 3 hours is here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=251670948

Before, various VM tasks have been running properly for years.

Anyone any idea what the problem is?
ID: 40459
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 40466 - Posted: 15 Nov 2019, 8:58:21 UTC - in response to Message 40459.  

Erich56 wrote:
For some unknown reason, on one of the PCs (Windows 10) the new version (300.00 as well as 300.02) does not work.
The time count in the BOINC manager is working, also the progress percentage bar. However, no CPU usage.
When I push "Show VM Console" in the BOINC manager, part of the text shown is:

Warning: "cvmfs_config probe sft.cern.ch' failed
ERROR: Could not source logging functions ...


The stderr of such a task which I then aborted after 1 day 3 hours is here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=251670948

Before, various VM tasks have been running properly for years.

Anyone any idea what the problem is?


It suggests that the VM does not have network access.
ID: 40466
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,027,000
RAC: 0
Message 40467 - Posted: 15 Nov 2019, 9:24:55 UTC - in response to Message 40466.  

Theory Simulation
Unsent 1939
In progress 13821

My notebook gets no more than 1 task.

Tried an app_config with max_project_concurrent = 4 and max_concurrent (for Theory) = 4.
Tried ncpus > 1000.
Tried work queue = 10 days.

Always "No tasks are available for Theory Simulation".

Tried 4 boinc clients.
3 clients got 1 task, 1 client got 0 task...
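
For reference, the two limits mentioned above live in the client's app_config.xml in the project directory. A minimal sketch that generates such a file with Python's standard library; the app name `Theory` and the helper `build_app_config` are assumptions for illustration (the real short app name should be checked against client_state.xml), and the client's documented element for the project-wide limit is `project_max_concurrent`:

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent, project_max_concurrent):
    """Build the text of an app_config.xml with one <app> entry."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed short app name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # Project-wide cap on simultaneously running tasks.
    ET.SubElement(root, "project_max_concurrent").text = str(project_max_concurrent)
    # ET.indent only exists on Python 3.9+; it is purely cosmetic here.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config("Theory", 4, 4))
```

These settings only cap how many tasks run at once on the client; they do not raise the number of tasks the server is willing to send.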
ID: 40467
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 336
Credit: 237,918
RAC: 0
Message 40468 - Posted: 15 Nov 2019, 12:13:29 UTC - in response to Message 40467.  

Luigi R. wrote:
Theory Simulation
Unsent 1939
In progress 13821

My notebook gets no more than 1 task.

Tried an app_config with max_project_concurrent = 4 and max_concurrent (for Theory) = 4.
Tried ncpus > 1000.
Tried work queue = 10 days.

Always "No tasks are available for Theory Simulation".

Tried 4 boinc clients.
3 clients got 1 task, 1 client got 0 task...


I changed the server settings to reduce the number of tasks given out to avoid large work queues on hosts with permanent errors. I thought the setting was per ncpus but this may not be the case. I have updated the configuration, please try again.
ID: 40468
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,027,000
RAC: 0
Message 40471 - Posted: 15 Nov 2019, 13:31:15 UTC - in response to Message 40468.  

All right, 4 clients got 8 tasks. Now I can go back to 1 client.
ID: 40471
Erich56
Joined: 18 Dec 15
Posts: 1283
Credit: 23,078,312
RAC: 2,804
Message 40475 - Posted: 16 Nov 2019, 7:23:57 UTC - in response to Message 40466.  

Erich56 wrote:
For some unknown reason, on one of the PCs (Windows 10) the new version (300.00 as well as 300.02) does not work.
The time count in the BOINC manager is working, also the progress percentage bar. However, no CPU usage.
When I push "Show VM Console" in the BOINC manager, part of the text shown is:

Warning: "cvmfs_config probe sft.cern.ch' failed
ERROR: Could not source logging functions ...


The stderr of such a task which I then aborted after 1 day 3 hours is here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=251670948

Before, various VM tasks have been running properly for years.

Anyone any idea what the problem is?


Laurence wrote:
It suggests that the VM does not have network access.

which is really strange.
As mentioned before, up to v263.98 there was no problem at all. I now tried CMS - no problem either.
So the question is: what is "special" with Theory v300.00 and v300.02 ?
ID: 40475
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1443
Credit: 76,490,486
RAC: 100,214
Message 40476 - Posted: 16 Nov 2019, 8:46:06 UTC

This may not be directly related, but a few weeks ago Nils explained:
- The project runs more than one main server under lhcathome.cern.ch
- not all of them serve tasks from all subprojects

Even if all servers handed out tasks from all subprojects, an arbitrary server can only serve a task that is already in its (limited) shared memory.


Recent DNS replies
dig lhcathome.cern.ch.
;; ANSWER SECTION:
lhcathome.cern.ch.      60      IN      A       137.138.156.174
lhcathome.cern.ch.      60      IN      A       188.184.84.156


and reverse DNS
dig -x 137.138.156.174
;; ANSWER SECTION:
174.156.138.137.in-addr.arpa. 10800 IN  PTR     boincai01.cern.ch.

dig -x 188.184.84.156
;; ANSWER SECTION:
156.84.184.188.in-addr.arpa. 10800 IN   PTR     boincai12.cern.ch.
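
The names queried by `dig -x` are simply the IPv4 octets reversed under `in-addr.arpa`. A small sketch using Python's standard `ipaddress` module that derives those names without any network access; the helper name `reverse_name` is illustrative:

```python
import ipaddress


def reverse_name(ip):
    """Return the reverse-DNS (PTR) query name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


# The two A records returned for lhcathome.cern.ch above.
for ip in ("137.138.156.174", "188.184.84.156"):
    print(ip, "->", reverse_name(ip))
```

This only builds the query name; resolving it to boincai01 or boincai12 still needs an actual DNS lookup.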



So, suppose a client requests Theory work from boincai01.cern.ch (137.138.156.174) and this server has only SixTrack and ATLAS tasks available. It makes no sense to hammer it with requests for the next 60 s, as every request would go to the very same IP.

Even after the DNS record times out (after 60 s) there's a 50% chance of getting the same IP again:
- the IP list is randomly sorted
- the clients use the first list entry


The only chance to get Theory work from this server within the 60 s period would be that other clients fetch work from it and the server manages to refill its shared memory with Theory tasks.
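
To put a number on the 50% argument: a rough sketch, assuming each DNS refresh independently puts either of the two servers first with equal probability (a simplification of real resolver behaviour). The function names are illustrative:

```python
import random


def prob_same_server(n_lookups, n_servers=2):
    """Probability that n independent lookups all put one given server first."""
    return (1.0 / n_servers) ** n_lookups


def simulate(n_lookups, n_servers=2, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability, for comparison."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Server 0 stands for the one without Theory tasks.
        if all(rng.randrange(n_servers) == 0 for _ in range(n_lookups)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, prob_same_server(n), simulate(n))
```

Under that assumption, the chance of landing on the same server after three consecutive refreshes is 0.125, so a client that waits out the TTL between requests should reach the other server fairly quickly.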


Due to the DNS TTL of 60 s, wouldn't it be a good idea to also extend the BOINC server's request delay to 60 s?
ID: 40476
m
Joined: 6 Sep 08
Posts: 110
Credit: 6,711,735
RAC: 1,061
Message 40477 - Posted: 16 Nov 2019, 12:44:59 UTC - in response to Message 40475.  
Last modified: 16 Nov 2019, 12:56:51 UTC

I have the same problem. First reported with the dev tasks here.
It occurs on both Windows and Linux hosts: on all of them, every task fails. The Linux hosts have CVMFS installed. When run from the command line, the local CVMFS works, so the system here isn't blocking access, but it seems not to work from inside the VM. It fails both via the local proxy and with a direct connection. VBox is v5.2.x. I haven't (yet) tried different VBox versions.
ID: 40477

©2020 CERN