Message boards : ATLAS application : Confused

keputnam
Joined: 27 Sep 04
Posts: 94
Credit: 3,758,301
RAC: 4,865
Message 43305 - Posted: 4 Sep 2020, 17:44:29 UTC
Last modified: 4 Sep 2020, 17:49:23 UTC

Can anyone explain this to me?


Run Time      CPU Time      Credit
120,886.49    140,595.50       590.26
107,254.38     36,754.30     2,149.71



Also, I have another complaint about the scheduler. I had five WUs cancelled by the server because I would have returned them late. Fair enough.

But it sent me another 15 with an 8-day return date!

There is no way in hell I'll get through them all on time

What gives?
maeax

Joined: 2 May 07
Posts: 1513
Credit: 49,674,385
RAC: 154,627
Message 43306 - Posted: 4 Sep 2020, 19:08:50 UTC - in response to Message 43305.  

Can anyone explain this to me?
Run Time      CPU Time      Credit
120,886.49    140,595.50       590.26
107,254.38     36,754.30     2,149.71

This is the good news:
2020-09-04 08:39:58 (11796): Guest Log: Looking for outputfile HITS.22420244._023297.pool.root.1
2020-09-04 08:39:59 (11796): Guest Log: HITS file was successfully produced

You have restarted this ATLAS task many times, so each time it begins again from the first collision (of 200 in total).
The server aborts tasks because the deadline window is only about three days.
If an ATLAS task is not sent back to the server in time, CERN has to rely on the "Agile Boincers" to finish it quickly. They need a result within a few days, not a week after the task was issued.

You can try an app_config to get only ONE ATLAS task at a time, so you have time to finish it, but don't interrupt the task while it is running.
If that is not possible on your side, please don't run ATLAS, sorry.
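For reference, limiting the client to a single running ATLAS task can be done with an app_config.xml in the LHC@home project directory. A minimal sketch, assuming the app is named ATLAS (check client_state.xml on your own host for the exact app name):

```xml
<!-- app_config.xml: run at most one ATLAS task at a time.
     "ATLAS" is the assumed app name; verify it in client_state.xml. -->
<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
```

After saving the file, use Options -> Read config files in the BOINC Manager (or restart the client) so it takes effect.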
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43307 - Posted: 4 Sep 2020, 19:32:48 UTC

I'd like to point out that "Cancelled by server" is common on many projects. It only affects tasks you have not yet started, once enough valid results have come back to satisfy the workunit's quorum.
This is a way to reduce wasted computation, and your client will happily fetch other tasks instead.
It happens to my hosts every day on other projects.

This occurs when some hosts have a short "Average turnaround time" while others take much longer. If network activity is set to "always" and you have decent bandwidth, try reducing your work cache: in the computing preferences, lower "Store up to an additional X days of work". I would suggest at most 1 day. Your host reports around 3 days, so many tasks will be cancelled as no longer needed, wasting bandwidth and storage on your host.
If the client handles several projects, you could go even lower, since the other projects act as backups when a server is down or out of work.
keputnam

Joined: 27 Sep 04
Posts: 94
Credit: 3,758,301
RAC: 4,865
Message 43308 - Posted: 4 Sep 2020, 19:57:05 UTC - in response to Message 43306.  

Already have an app_config with max_concurrent set to 1

We took three power hits over about 19 hours, so that would account for three restarts

I think I did one for Windows maintenance, too




As for the aborted tasks and the ridiculous number of new tasks sent:

I haven't changed my relative resource share in over six months, nor added any new projects, and the scheduler SHOULD be smart enough not to send me work I will never complete on time.
keputnam

Joined: 27 Sep 04
Posts: 94
Credit: 3,758,301
RAC: 4,865
Message 43309 - Posted: 4 Sep 2020, 19:59:09 UTC - in response to Message 43307.  

Oh, I realize the purpose, but as far as I can remember this has only ever happened on ATLAS.

And this is my third go-round on this circus.

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 430
Credit: 117,525,067
RAC: 0
Message 43665 - Posted: 21 Nov 2020, 22:59:09 UTC

Just saw this thread.

The scheduler was designed at a time when only single-core WUs existed, and for those it works very well.

It has real problems balancing multi-core WUs; if you want to run these, it may be necessary to help it. At LHC you have the option to say "give me only 1 workunit". This is set up here: https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project

Choose 'Max # jobs' and set it to one or two, or whatever you would like.


Supporting BOINC, a great concept!
Henry Nebrensky

Joined: 13 Jul 05
Posts: 157
Credit: 14,665,461
RAC: 0
Message 43787 - Posted: 4 Dec 2020, 23:27:31 UTC - in response to Message 43665.  

The scheduler has real problems balancing multi-core WUs; if you want to run these, it may be necessary to help it.

I don't have too many problems mixing multi-core ATLAS with Theory or SixTrack on 10617965. Keeping a steady stream of jobs on hand helps. Making changes such as to #Cores will upset the client - I just grit my teeth and wait a couple of days while it sorts itself out.
I find that trying to micro-manage BOINC makes things worse, not better. :(
Jim1348

Joined: 15 Nov 14
Posts: 589
Credit: 21,798,687
RAC: 3,377
Message 43788 - Posted: 5 Dec 2020, 0:11:43 UTC - in response to Message 43787.  

Making changes such as to #Cores will upset the client - I just grit my teeth and wait a couple of days while it sorts itself out.
I find that trying to micro-manage BOINC makes things worse, not better. :(

Right. But sometimes it fixes things. I could only get four native ATLAS tasks at a time on my 12-core Ryzen 3600 until I changed Max # CPUs from 1 to 2.
Now it has sent me four more, for a total of eight. You never know how it will react.
greg_be

Joined: 28 Dec 08
Posts: 294
Credit: 2,486,747
RAC: 1,440
Message 43799 - Posted: 6 Dec 2020, 18:19:14 UTC - in response to Message 43788.  
Last modified: 6 Dec 2020, 18:26:43 UTC

Making changes such as to #Cores will upset the client - I just grit my teeth and wait a couple of days while it sorts itself out.
I find that trying to micro-manage BOINC makes things worse, not better. :(

Right. But sometimes it fixes things. I could only get four native ATLAS at a time for my 12-core Ryzen 3600, until I changed Max # CPUs from 1 to 2.
Now it has sent me four more, for a total of eight. You never know how it will react.



I have a 16-core machine and run a bunch of different projects. So to keep ATLAS from crashing, I set the web preferences to use only 4 cores, and in my app_config I also restrict it to 4 cores. Memory is set to 6600 MB and I have it load 8 jobs into the queue. This works just fine.
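A per-task core cap along those lines can go in the same app_config.xml. This is only a sketch: the app name and plan_class below are assumptions, so check client_state.xml on your host for the exact strings before using it.

```xml
<!-- app_config.xml sketch: cap ATLAS at 4 cores per task.
     The app_name and plan_class values are assumptions; verify
     them against client_state.xml on your own host. -->
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>4.0</avg_ncpus>
  </app_version>
</app_config>
```

Note the memory limit and queue depth aren't set here: ATLAS memory is controlled via the project's web preferences, and the queue depth follows from the client's work-buffer settings.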



©2022 CERN