New version 263.95

Author	Message
Jim1348 Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0	Message 39276 - Posted: 4 Jul 2019, 15:35:04 UTC - in response to Message 39275. As mentioned a couple of times this is caused by the fact that ATLAS doesn't correctly respect the #cores parameter (as it was originally introduced). This is an ongoing problem that can be compensated for but it's confusing for a lot of volunteers. It really needs to be fixed. It is because their accountants insist that they be counted wrong. I am not making this up. Don't ask again, or they might make it worse. ID: 39276

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1115 Credit: 49,710,908 RAC: 14,193	Message 39282 - Posted: 4 Jul 2019, 20:03:58 UTC 14 Valids and just one of these so far https://lhcathome.cern.ch/lhcathome/result.php?resultid=236599310 ID: 39282

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1115 Credit: 49,710,908 RAC: 14,193	Message 39285 - Posted: 5 Jul 2019, 9:59:42 UTC - in response to Message 39282. Well, I got 23 Valids and many more running, BUT I got 3 of these: https://lhcathome.cern.ch/lhcathome/result.php?resultid=236609765 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED. And just one of these for some reason: https://lhcathome.cern.ch/lhcathome/result.php?resultid=236599310 194 (0x000000C2) EXIT_ABORTED_BY_CLIENT. And then two of those typical "[ERROR] Condor ended after 1032 seconds." failures, but the next 2 on that host are running now. 3am ID: 39285

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1115 Credit: 49,710,908 RAC: 14,193	Message 39291 - Posted: 6 Jul 2019, 5:40:11 UTC Last modified: 6 Jul 2019, 6:31:42 UTC https://lhcathome.cern.ch/lhcathome/result.php?resultid=236608596 12 of these now, and I have seen hundreds of these for other members. (I just took a quick look: the single-core and 2-core tasks are working for me, but not the 4-core tasks, and it isn't because of RAM. I saw quite a few on other members' hosts that failed with a single core, but I didn't look at what VB version they used. Right now only a few of us are running this VB Theory version, and for some reason most of them have their PCs hidden, so we can't see whether they have the same problems or other types.) I can think of another name for this error right now. ID: 39291

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1272 Credit: 8,479,164 RAC: 2,361	Message 39293 - Posted: 6 Jul 2019, 7:45:04 UTC - in response to Message 39291. "12 of these now and I have seen hundreds for other members with these." The best solution is to increase the value of <rsc_fpops_bound>2000000000000000.000000</rsc_fpops_bound> tenfold, which should be done server-side by the admins. A temporary solution on your side is to decrease the # of cores, or to reduce the value of <job_duration>64800</job_duration> in the file Theory_2017_05_29.xml (project directory) to 43200, or even lower if needed. ID: 39293

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2401 Credit: 225,352,862 RAC: 123,063	Message 39294 - Posted: 6 Jul 2019, 7:57:49 UTC - in response to Message 39293. "... reducing the value of <job_duration>64800</job_duration> in the file Theory_2017_05_29.xml (project directory) to 43200 or even lower if needed." Not good, as it will kill the running job when it hits the limit. Reducing the #cores on the web preferences page would be better until the admins have raised the rsc_fpops_bound. ID: 39294

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1272 Credit: 8,479,164 RAC: 2,361	Message 39295 - Posted: 6 Jul 2019, 8:07:08 UTC - in response to Message 39294. "Not good as it will kill the running job when it hits the limit." That's true, but that is happening now too, just a bit later, leaving the user with an errored task and no credit although he/she has done ~13 hours of useful work. ID: 39295

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1115 Credit: 49,710,908 RAC: 14,193	Message 39310 - Posted: 7 Jul 2019, 9:21:57 UTC - in response to Message 39308. Last modified: 7 Jul 2019, 9:23:52 UTC "Edit: link on URL added for you computezmle and unhidden host and for MAGIC Quantum Mechanic as you unlike hidden host for theory." Thank you, PurpleHat. It helps those of us who run hundreds of these tasks to be able to look at other members' hosts, to see whether we are all having the same errors and, often, to compare similar computers when we have problems here or where we test them before they get here. Hidden computers don't help us with these projects. Any time I have any problem at all, I prefer to check whether it is just me or whether the same problem happens for many others running the same tasks on the same OSes, and to see how many cores per task they use, for the same reasons. ID: 39310

Harri Liljeroos Joined: 28 Sep 04 Posts: 675 Credit: 43,523,451 RAC: 15,579	Message 39323 - Posted: 9 Jul 2019, 18:03:40 UTC I just started to do Theory tasks with the new app version on one of my hosts after a one-week break. All tasks are failing after about 13.5 hours with error 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED. Why are these tasks not allowed to continue to the 18-hour mark like they used to on the previous app version? Here's one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=236942503 ID: 39323

Erich56 Joined: 18 Dec 15 Posts: 1687 Credit: 102,936,740 RAC: 125,148	Message 39324 - Posted: 9 Jul 2019, 19:41:16 UTC Harri, what is the #CPUs in your Web Settings? ID: 39324

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1272 Credit: 8,479,164 RAC: 2,361	Message 39325 - Posted: 9 Jul 2019, 19:43:17 UTC - in response to Message 39324. "Harri, what is the #CPUs in your Web Settings?" A part of the answer: Setting Memory Size for VM. (1730MB) 11 cores! ID: 39325

Harri Liljeroos Joined: 28 Sep 04 Posts: 675 Credit: 43,523,451 RAC: 15,579	Message 39329 - Posted: 9 Jul 2019, 20:48:52 UTC - in response to Message 39325. "Harri, what is the #CPUs in your Web Settings?" "A part of the answer: Setting Memory Size for VM. (1730MB) 11 cores!" The number of CPUs in web settings is 4, but I run all tasks with just 1 CPU; I use 4 CPUs when I run Atlas tasks. These settings come from app_config.xml. The computer has 64 GB of memory, so that is not a problem (20 GB is in use now with 10 Theory tasks running concurrently, along with 1 CPDN task, 2 Einstein GPU tasks and 2 Seti GPU tasks). So should I change the number of CPUs in web settings to 1? ID: 39329

Harri Liljeroos Joined: 28 Sep 04 Posts: 675 Credit: 43,523,451 RAC: 15,579	Message 39332 - Posted: 10 Jul 2019, 9:12:29 UTC During the night a few tasks were valid. Their runtime was < 51309 seconds, which seems to be the limit for this 197 error (2000000.00G/38.72G). I don't know where the 38.72G comes from. The problem with these failed tasks is that they are not cleared from the VM Manager; they stay there until manually removed (or maybe a BOINC restart would clear them, I haven't tried). This has also caused a problem with some tasks that ran for 43 seconds and are now postponed. I will just abort them and set the web preferences to 1 CPU before downloading any new Theory tasks. It is Win 10 Patch Tuesday, so I'll just wait for the updates and then restart my computer. ID: 39332

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1272 Credit: 8,479,164 RAC: 2,361	Message 39345 - Posted: 13 Jul 2019, 5:27:25 UTC Even VMs with a single-core setting can suffer from an <rsc_fpops_bound> that is set too low. Admins, please increase that value (tenfold). https://lhcathome.cern.ch/lhcathome/result.php?resultid=237057862 LHC@home 13 Jul 03:07:03 Aborting task Theory_3495168_1562869652.901284_0: exceeded elapsed time limit 103033.71 (2000000.00G/19.34G) This time it was a long-running Sherpa job needing more time: ===> [runRivet] Fri Jul 12 07:17:07 CEST 2019 [boinc pp ue 200 4 - sherpa 2.2.4 default 7000 78] ID: 39345
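
The figures in parentheses in the abort messages above can be turned into a limit with a little arithmetic. The following is a minimal Python sketch, assuming the BOINC client derives the elapsed-time limit as rsc_fpops_bound divided by its speed estimate for the app (the two numbers printed as 2000000.00G/38.72G and 2000000.00G/19.34G). The function and constant names are hypothetical, chosen only for illustration. The quotient of the rounded log figures lands close to, but not exactly on, the limits reported in the thread (51309 s and 103033.71 s).

```python
# Sketch only: approximate the elapsed-time limit from the values quoted in the thread.
# Assumption: limit (seconds) ~= rsc_fpops_bound / estimated app speed in FLOPS.

RSC_FPOPS_BOUND = 2_000_000_000_000_000.0  # <rsc_fpops_bound> as posted in the thread
JOB_DURATION_S = 64_800                    # <job_duration> as posted (18 hours)


def elapsed_limit_seconds(fpops_bound: float, flops: float) -> float:
    """Return the approximate elapsed-time limit in seconds."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return fpops_bound / flops


def report(flops_g: float, factor: float = 1.0) -> str:
    """Describe the limit for a speed estimate given in GFLOPS and a bound multiplier."""
    limit = elapsed_limit_seconds(RSC_FPOPS_BOUND * factor, flops_g * 1e9)
    side = "below" if limit < JOB_DURATION_S else "above"
    return (f"{flops_g:.2f}G, bound x{factor:g}: "
            f"limit {limit / 3600:.1f} h ({side} job_duration)")


if __name__ == "__main__":
    # Speed estimates taken from the two log excerpts in the thread.
    for flops_g in (38.72, 19.34):
        for factor in (1.0, 10.0):
            print(report(flops_g, factor))
```

Under this assumption, the 38.72G estimate gives a limit of about 14.3 hours, which is below the 18-hour job_duration, while a tenfold bound would push it to roughly 143 hours. That is consistent with the requests in this thread to raise the bound rather than shorten the job.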

maeax Joined: 2 May 07 Posts: 2090 Credit: 158,733,549 RAC: 128,321	Message 39346 - Posted: 13 Jul 2019, 5:48:21 UTC Crystal, there is a new version 263.96 active since yesterday? Do you know what has changed? ID: 39346

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1115 Credit: 49,710,908 RAC: 14,193	Message 39347 - Posted: 13 Jul 2019, 6:50:59 UTC - in response to Message 39346. "Crystal there is a new Version 263.96 active since yesterday? Do you know the changes?" I was about to try several Theory tasks again, since my new month of high-speed satellite starts in 15 minutes. But since it is a new vdi, I will wait and see whether anyone else has these working on the Win 10 OS before I do that, so I don't end up having to do it again if they don't work. Instead I am going to run several CMS-dev tasks and see if they decide to work just by having them start faster, since they were not working last week on the Win 10 OS (and run sixtracks here for now, since they don't depend on internet speed or even a connection). ID: 39347

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2401 Credit: 225,352,862 RAC: 123,063	Message 39348 - Posted: 13 Jul 2019, 7:21:05 UTC The server still sends out v263.95. I guess it has not been restarted. A check of the manually downloaded vdi shows that the CVMFS typo I mentioned here has been corrected: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=479&postid=6435 It might solve the X509 error once it is available. ID: 39348

Erich56 Joined: 18 Dec 15 Posts: 1687 Credit: 102,936,740 RAC: 125,148	Message 39363 - Posted: 16 Jul 2019, 5:15:49 UTC - in response to Message 39348. "The server still sends out v263.95. I guess it has not been restarted. ... It might solve the X509 error when it is available." In fact, I haven't had this error with v263.95 (only with v263.90). ID: 39363
