New version 300.00
Author | Message |
---|---|
Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
With this new version, the real Theory job goes via the BOINC server rather than being pulled into the VM with HTCondor. Jobs will last on average 2 hours rather than 12. |
Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
It seems 32-bit is no longer supported!? |
Joined: 15 Jun 08 Posts: 2534 Credit: 253,918,691 RAC: 41,269 |
I get nothing but: "Mo 11 Nov 2019 15:53:23 CET | LHC@home | No tasks are available for Theory Simulation". Does the app_version from the preferences page point to the new app_version? Edit: Just got a task. |
Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
Will be added tomorrow. |
Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
I did not see this during testing on LHCathome-dev; I am seeing it here for the first time. The INFO line with the job name, normally shown in Console ALT-F1 after the job has started, was empty. The job name was: ===> [runRivet] Mon Nov 11 17:42:44 UTC 2019 [boinc pp jets 7000 80,-,1760 - pythia6 6.428 z2 100000 157] and this INFO line was also missing in the result: https://lhcathome.cern.ch/lhcathome/result.php?resultid=251561245 |
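To check for the missing INFO line in a saved log, the bracketed job description can be pulled out with a short script. This is only a sketch: the field names below are assumptions chosen for illustration, since the log line itself does not label its fields.

```python
import re

# Hypothetical field names: the log does not label these positions.
FIELDS = ["mode", "beam", "process", "energy", "params", "specific",
          "generator", "version", "tune", "events", "seed"]

def parse_runrivet_line(line):
    """Return the job description from a '[runRivet]' log line as a dict.

    Returns None when the line carries no bracketed job description,
    which is the situation reported in the post above.
    """
    match = re.search(r"\[runRivet\].*\[([^\]]+)\]", line)
    if match is None:
        return None
    values = match.group(1).split()
    if len(values) != len(FIELDS):
        return None
    return dict(zip(FIELDS, values))

line = ("===> [runRivet] Mon Nov 11 17:42:44 UTC 2019 "
        "[boinc pp jets 7000 80,-,1760 - pythia6 6.428 z2 100000 157]")
job = parse_runrivet_line(line)
print(job["generator"], job["version"], job["events"])
```

A `None` result for every line of a task's stderr would match the symptom described here: the job ran, but its name never reached the log.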
Joined: 15 Jun 08 Posts: 2534 Credit: 253,918,691 RAC: 41,269 |
Since this is a single-core app I expect the credit points to be more realistic. This helps volunteers running hosts with fewer cores. ++1 |
Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
I saw you made a vbox32 version available. The tasks finish after 2 minutes and are marked valid?? https://lhcathome.cern.ch/lhcathome/results.php?hostid=10416365 Missing file or directory: |
Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
Error while computing: https://lhcathome.cern.ch/lhcathome/result.php?resultid=251612143 2019-11-12 11:16:20 (5988): Guest Log: 11:16:26 CET +01:00 2019-11-12: cranky: [INFO] ===> [runRivet] Tue Nov 12 10:16:24 UTC 2019 [boinc pp jets 8000 25 - pythia6 6.428 365 100000 174] 2019-11-12 12:57:43 (5988): Status Report: Job Duration: '691200.000000' 2019-11-12 12:57:43 (5988): Status Report: Elapsed Time: '6000.951002' 2019-11-12 12:57:43 (5988): Status Report: CPU Time: '6121.385639' 2019-11-12 13:36:00 (5988): Guest Log: 13:34:05 CET +01:00 2019-11-12: cranky: [ERROR] Container 'runc' terminated with status code 1. 2019-11-12 13:36:00 (5988): Guest Log: [ERROR] Job Failed |
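The three "Status Report" lines in the log above can be turned into numbers to see how far the task got before the container failed. A minimal sketch, assuming the values are seconds and that "Job Duration" is a time limit rather than an expected run time (neither is stated in the log):

```python
import re

# Status lines copied from the log quoted above.
LOG = """\
2019-11-12 12:57:43 (5988): Status Report: Job Duration: '691200.000000'
2019-11-12 12:57:43 (5988): Status Report: Elapsed Time: '6000.951002'
2019-11-12 12:57:43 (5988): Status Report: CPU Time: '6121.385639'
"""

def parse_status_report(text):
    """Map each 'Status Report' key to its value as a float."""
    pattern = re.compile(r"Status Report: ([^:]+): '([0-9.]+)'")
    return {key: float(value) for key, value in pattern.findall(text)}

report = parse_status_report(LOG)
days_allowed = report["Job Duration"] / 86400   # assumed to be seconds
hours_elapsed = report["Elapsed Time"] / 3600   # assumed to be seconds
print(f"limit: {days_allowed:.1f} days, elapsed: {hours_elapsed:.2f} hours")
```

Under those assumptions the task was well inside its limit at the time of the report, which suggests the failure came from the job itself and not from a timeout.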
Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
"I saw you made a vbox32 version available." Thanks for the alert. Will fix it tomorrow. |
Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
"Thanks for the alert. Will fix it tomorrow." Should be fixed. Need to check. |
Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
"Thanks for the alert. Will fix it tomorrow." Testing Theory vbox32. The good thing: there is a job running now. Remarks: - Console ALT-F2 only shows "Running job output should appear here" - no events are shown, although I see with ALT-F3 that agile-runmc and rivetvm.exe are running. - In BOINC Manager, 'Show graphics' does not work: localhost:xxxxx (the webapi address) is not reachable, so there are no Theory pictures and no access to the logs. - Console ALT-F1 is missing some lines that are usual in the 64-bit version: Edit: Task finished. |
Joined: 24 Oct 04 Posts: 1173 Credit: 54,842,560 RAC: 16,395 |
I just downloaded the new 300.02 version (333.08 MB vdi) on my old faithful x86 machine and see it has the new, shorter estimated time, so I will start it up and see how it goes, and maybe try the x64 version on another machine. This is still the second 24 hours of a new month on my satellite ISP account, and the download only took about 6.5 minutes instead of the usual 10 hours. |
Joined: 18 Dec 15 Posts: 1814 Credit: 118,527,999 RAC: 32,122 |
For an unknown reason, the new version (300.00 as well as 300.02) does not work on one of my PCs (Windows 10). The elapsed time in the BOINC Manager keeps counting, and the progress bar advances, but there is no CPU usage. When I click "Show VM Console" in the BOINC Manager, part of the text shown is: Warning: "cvmfs_config probe sft.cern.ch' failed ERROR: Could not source logging functions ... The stderr of one such task, which I aborted after 1 day 3 hours, is here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=251670948 Before this, various VM tasks had been running properly for years. Does anyone have any idea what the problem is? |
Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
"for unknown reason, on one of the PCs (Windows 10) the new version (300.00 as well as 300.02) does not work." It suggests that the VM does not have network access. |
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
Theory Simulation: Unsent 1939, In progress 13821. My notebook gets no more than 1 task. I tried an app_config with project_max_concurrent = 4 and max_concurrent (for Theory) = 4. Tried ncpus > 1000. Tried a work queue of 10 days. Always "No tasks are available for Theory Simulation". Tried 4 BOINC clients: 3 clients got 1 task each, 1 client got 0 tasks... |
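For reference, the app_config settings mentioned above could be generated as follows. This is a sketch only: the app name "Theory" is an assumption (the real short name should be taken from the client's own state files), and `build_app_config` is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name, max_concurrent, project_max_concurrent):
    """Build the text of an app_config.xml with per-app and per-project limits."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.SubElement(root, "project_max_concurrent").text = str(project_max_concurrent)
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

# "Theory" is an assumed app name, used here for illustration.
xml_text = build_app_config("Theory", 4, 4)
print(xml_text)
```

These are client-side limits on how many tasks run at once, which fits what was observed: changing them made no difference while the restriction was on the server side.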
Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
"Theory Simulation" I changed the server settings to reduce the number of tasks given out, to avoid large work queues on hosts with permanent errors. I thought the setting was per ncpus but this may not be the case. I have updated the configuration, please try again. |
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
All right, 4 clients got 8 tasks. Now I can go back to 1 client. |
Joined: 18 Dec 15 Posts: 1814 Credit: 118,527,999 RAC: 32,122 |
"for unknown reason, on one of the PCs (Windows 10) the new version (300.00 as well as 300.02) does not work." Which is really strange. As mentioned before, up to v263.98 there was no problem at all. I have now tried CMS: no problem either. So the question is: what is "special" about Theory v300.00 and v300.02? |
Joined: 15 Jun 08 Posts: 2534 Credit: 253,918,691 RAC: 41,269 |
This may not be directly related, but a few weeks ago Nils explained: - The project runs more than one main server under lhcathome.cern.ch - not all of them serve tasks from all subprojects. Even if all servers handed out tasks from all subprojects, an arbitrary server can only serve a task that is already in its (limited) shared memory. Recent DNS replies: dig lhcathome.cern.ch. ;; ANSWER SECTION: lhcathome.cern.ch. 60 IN A 137.138.156.174 lhcathome.cern.ch. 60 IN A 188.184.84.156 and reverse DNS: dig -x 137.138.156.174 ;; ANSWER SECTION: 174.156.138.137.in-addr.arpa. 10800 IN PTR boincai01.cern.ch. dig -x 188.184.84.156 ;; ANSWER SECTION: 156.84.184.188.in-addr.arpa. 10800 IN PTR boincai12.cern.ch. So, suppose a client requests Theory work from boincai01.cern.ch (137.138.156.174) and this server has only SixTrack and ATLAS available: it makes no sense to hammer it with requests for the next 60 s, as every request would go to the very same IP. Even after the DNS record times out (after 60 s) there's a 50% chance of getting the same IP again: - the IP list is randomly sorted - the clients use the first list entry. The only chance to get Theory work from this server within the 60 s period would be that other clients get work from there and the server manages to refill its shared memory with Theory tasks. Given the DNS TTL of 60 s, wouldn't it be a good idea to also extend the BOINC server's request delay to 60 s? |
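The 50% figure above can be extended to several consecutive lookups. The sketch below assumes a simplified model that the post does not spell out: each lookup after a TTL expiry is independent, the address list is shuffled uniformly, and the client always takes the first entry. The function names are hypothetical.

```python
import random

def prob_stuck(n_servers, n_lookups):
    """Chance that n_lookups independent lookups all put the same server first."""
    return (1.0 / n_servers) ** n_lookups

def simulate(n_servers, n_lookups, trials=100_000, seed=1):
    """Monte Carlo estimate of the same probability under the same model."""
    rng = random.Random(seed)
    stuck = 0
    for _ in range(trials):
        # Server 0 stands for the one that has no Theory tasks loaded.
        if all(rng.randrange(n_servers) == 0 for _ in range(n_lookups)):
            stuck += 1
    return stuck / trials

for lookups in (1, 2, 3):
    print(lookups, prob_stuck(2, lookups), simulate(2, lookups))
```

With two servers, a client has a 1 in 8 chance of still being directed to the same server after three TTL periods under this model, so repeated failures over a few minutes are not unusual.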
Joined: 6 Sep 08 Posts: 118 Credit: 12,569,612 RAC: 946 |
I have the same problem. I first reported it with the dev tasks here. It occurs on both Windows and Linux hosts: on all of them, every task fails. The Linux hosts have CVMFS installed. When run from the command line, the local CVMFS works, so the system here isn't blocking access, but access from inside the VM seems to fail. It fails both via the local proxy and with a direct connection. VBox is v5.2.x. I haven't (yet) tried different VBox versions. |
©2024 CERN