Message boards : Number crunching : Host messing up tons of results
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 27374 - Posted: 12 Apr 2015, 6:56:14 UTC - in response to Message 27373.  

If units fail due to a timeout, I think you put other users' jobs in the waste bin, since they lose their already executed tasks because of your timeout. Admins, correct me if it's not like that.

No, users don't lose their tasks. The WU will be sent out again to a different user, but the project needs it back urgently and has to wait and wait for it. That is not good for the project, because it needs the results in a short time.


Supporting BOINC, a great concept !
ID: 27374 · Report as offensive     Reply Quote
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,180,005
RAC: 0
Message 27375 - Posted: 12 Apr 2015, 6:56:16 UTC - in response to Message 27373.  
Last modified: 12 Apr 2015, 7:02:36 UTC

32 tasks is the limit for my host. 32 tasks lasting ~80 s each (like this one) would all finish in about 320 s. A larger number reduces the probability of getting only flash tasks. Is there a way to know how long a task will take before it runs?
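
To make the arithmetic explicit, here is a minimal Python sketch. It assumes equal-length tasks and a host that runs 8 tasks at once; the core count is not stated above and is assumed here only because it reproduces the 320 s figure. The function name `batch_wall_time` is made up for illustration.

```python
import math


def batch_wall_time(n_tasks, seconds_per_task, n_cores):
    """Wall-clock seconds to finish n_tasks equal-length tasks,
    running at most n_cores tasks at the same time."""
    # Tasks run in "rounds" of up to n_cores tasks each.
    rounds = math.ceil(n_tasks / n_cores)
    return rounds * seconds_per_task


# 32 tasks of ~80 s each on an assumed 8-core host
print(batch_wall_time(32, 80, 8))  # 320
```

With short tasks the whole batch is gone in a few minutes, which is why a larger queue lowers the chance of the host sitting idle.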

Another reason is that I've often seen that not many tasks are available. Although I do a little "bunkering", I finish my work before the deadline (except that one time).


P.S. The errors on the other machine (ID: 10356455) were caused by a Win8 failure after an update, so there was no chance to cancel them. ;)

[/OT]
ID: 27375 · Report as offensive     Reply Quote
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 58
Credit: 4,010,807
RAC: 28
Message 27376 - Posted: 12 Apr 2015, 10:12:03 UTC - in response to Message 27375.  

Setting a week's worth of work is really too long, as you discovered when you got more 8 hr WUs. I usually set a cache of 3-4 days.

The only way I know of to see how long a WU is going to take is to look at the time 'remaining' in the tasks list. So no, you can't 'cherry pick' the longer WUs.

And yeah, LHC often runs out of WUs; that's normal ;), which is why I have it running alongside other projects.
Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H, DHEP, CPDN, E@H.
Main rig - Ryzen 3600, MSI B450 Gm Pro C AC, 32GB DDR4 3200, RX580 8GB, Win10 64bit
2nd rig - i7 4930k @4.1 GHz, 16 GB DDR3 1866, HD 7870XT 3GB(DS), Win7 64bit
ID: 27376 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27377 - Posted: 12 Apr 2015, 10:13:41 UTC - in response to Message 27375.  

32 tasks is the limit for my host. 32 tasks lasting ~80 s each (like this one) would all finish in about 320 s. A larger number reduces the probability of getting only flash tasks. Is there a way to know how long a task will take before it runs?

I am afraid not; the point of these studies is to find out whether the particles
will circulate for a long time or not.

Another reason is that I've often seen that not many tasks are available. Although I do a little "bunkering", I finish my work before the deadline (except that one time).


P.S. The errors on the other machine (ID: 10356455) were caused by a Win8 failure after an update, so there was no chance to cancel them. ;)

[/OT]


Don't worry too much; it is just that, as already discussed, cancelling WUs slows down the study because rerun tasks go to the end of the queue.
Eric.
ID: 27377 · Report as offensive     Reply Quote
alvin
Joined: 12 Mar 12
Posts: 128
Credit: 20,013,377
RAC: 0
Message 27384 - Posted: 15 Apr 2015, 19:43:23 UTC - in response to Message 27377.  
Last modified: 15 Apr 2015, 19:48:20 UTC

A new host is generating inconclusives, mostly after 2-10 seconds:
http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=9934863

Grubix, this one is yours:
http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=9874961&offset=0&show_names=0&state=3&appid=
Let's see what we can do about it.
ID: 27384 · Report as offensive     Reply Quote
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 1
Message 27385 - Posted: 16 Apr 2015, 8:21:57 UTC - in response to Message 27369.  

I got 14 errors on this host, apparently caused by CMS-dev. The Sixtrack WU fails with "LHC@home 1.0 | Aborting task.....exceeded disk limit: 6860.42MB > 572.20MB" before it gets a chance to start.
CMS appears to run fine and I'm not seeing any debris left behind after it finishes.
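
For anyone comparing such messages across hosts, a small Python sketch can pull the two figures out of the log line quoted above. The regular expression and the function name `parse_disk_limit` are illustrative assumptions based only on that one message, not on any documented BOINC log format.

```python
import re

# Pattern guessed from the single message quoted in this post.
LIMIT_RE = re.compile(r"exceeded disk limit:\s*([\d.]+)MB\s*>\s*([\d.]+)MB")


def parse_disk_limit(message):
    """Return (used_mb, limit_mb, excess_mb), or None if the line does not match."""
    match = LIMIT_RE.search(message)
    if match is None:
        return None
    used = float(match.group(1))
    limit = float(match.group(2))
    return used, limit, used - limit


msg = "LHC@home 1.0 | Aborting task.....exceeded disk limit: 6860.42MB > 572.20MB"
used, limit, excess = parse_disk_limit(msg)
print(f"used {used:.2f} MB, limit {limit:.2f} MB, over by {excess:.2f} MB")
```

An overrun of several gigabytes before the task has even started suggests something large was already sitting in the slot, rather than Sixtrack itself writing too much.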
ID: 27385 · Report as offensive     Reply Quote
Grubix
Joined: 3 Jul 08
Posts: 20
Credit: 8,281,604
RAC: 0
Message 27386 - Posted: 16 Apr 2015, 9:28:46 UTC - in response to Message 27384.  

Hi Costa, thanks for the hint. I also saw it last night and was very surprised. Yesterday I could not do anything, because the computer is not at my home. This morning I turned the computer off for a few minutes, but the error remained.

Almost none of the invalid tasks are "pni" or "sse" tasks. The tasks with "pni" or "sse" are valid.

Eric, if the host is interesting for your diagnostics, feel free to give me appropriate instructions. But I only have access to the host on weekdays.

Bye, Grubix.
ID: 27386 · Report as offensive     Reply Quote
alvin
Joined: 12 Mar 12
Posts: 128
Credit: 20,013,377
RAC: 0
Message 27387 - Posted: 16 Apr 2015, 9:38:36 UTC - in response to Message 27386.  

Check that you have enough space.
Check that there are no other CPU-consuming tasks.
After Eric has replied to you, you might also reset the project, reinstall BOINC, etc.
ID: 27387 · Report as offensive     Reply Quote
[TA]Assimilator1
Joined: 29 Nov 13
Posts: 58
Credit: 4,010,807
RAC: 28
Message 27388 - Posted: 16 Apr 2015, 17:57:17 UTC - in response to Message 27387.  

Other CPU intensive tasks shouldn't cause any errors.
ID: 27388 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27390 - Posted: 18 Apr 2015, 2:42:18 UTC - in response to Message 27385.  

Hi Ray; it seems more than a coincidence that at least one, maybe
two, other clients have reported the same error. It seems to be
cross talk with CMS or CMS-dev. Eric.
ID: 27390 · Report as offensive     Reply Quote
alvin
Joined: 12 Mar 12
Posts: 128
Credit: 20,013,377
RAC: 0
Message 27391 - Posted: 18 Apr 2015, 3:38:40 UTC
Last modified: 18 Apr 2015, 3:38:55 UTC

Half inconclusives, a quarter invalids, only 10% valid:
http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=10137504
ID: 27391 · Report as offensive     Reply Quote
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 1
Message 27392 - Posted: 18 Apr 2015, 17:08:21 UTC - in response to Message 27390.  
Last modified: 18 Apr 2015, 17:27:55 UTC

I have noticed that when CMS finishes, it doesn't quite do a full cleanup and leaves a .vdi disk image in the slot. I guess that Sixtrack tries to use that slot, thinking it is empty, but finds the .vdi, which pushes it over the permitted size and causes the error.
I have reset LHC 1.0 just in case anything else has become corrupted, but I'll need to wait for the next batch of work to see if that fixes it. Until a fix can be found, I'll only let CMS and Sixtrack run when the other isn't, and I will delete the relevant slot to clear the debris. On one machine where I did this this morning after a CMS WU finished, Sixtrack is running fine again.
For now, the two projects don't seem to play well together so I will keep them separated.
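
A rough way to check for such leftovers without deleting anything is sketched below in Python. It assumes the slots live in a `slots` folder inside the BOINC data directory and that leftover images end in `.vdi`, as described above; the function name `find_leftover_images` is hypothetical, and the location of the data directory differs per installation.

```python
from pathlib import Path


def find_leftover_images(slots_dir, suffix=".vdi"):
    """List (path, size_in_MB) for every file ending in `suffix` under slots_dir.

    This only reports files; it never deletes anything.
    """
    root = Path(slots_dir)
    if not root.is_dir():
        # Wrong or missing directory: nothing to report.
        return []
    found = []
    for path in sorted(root.rglob("*" + suffix)):
        if path.is_file():
            size_mb = path.stat().st_size / (1024 * 1024)
            found.append((path, size_mb))
    return found
```

Calling `find_leftover_images("path/to/BOINC/slots")` with the real slots path would list any images still sitting there, together with their sizes. It makes sense to check only while no VirtualBox task is running, since the image of a running task is not debris.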
ID: 27392 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27393 - Posted: 19 Apr 2015, 2:36:58 UTC - in response to Message 27392.  

Thanks Ray; I have forwarded your message to admins. Eric.
ID: 27393 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27394 - Posted: 19 Apr 2015, 2:46:12 UTC - in response to Message 27368.  

See Message 27392 from Ray and my reply. Eric.
ID: 27394 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27395 - Posted: 19 Apr 2015, 2:50:34 UTC - in response to Message 27369.  

Thanks Magic; somehow missed the significance of your message.
I had rather a busy time when it arrived. Ray confirms. Eric.
ID: 27395 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27397 - Posted: 19 Apr 2015, 2:50:36 UTC - in response to Message 27369.  

Thanks Magic; somehow missed the significance of your message.
I had rather a busy time when it arrived. Ray confirms. Eric.
ID: 27397 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,310
RAC: 3,828
Message 27398 - Posted: 19 Apr 2015, 19:39:46 UTC - in response to Message 27397.  

Thanks Magic; somehow missed the significance of your message.
I had rather a busy time when it arrived. Ray confirms. Eric.


Any time Eric,

I did wonder if anyone would catch that and just now remembered to check back here.

As for those .vdi files hanging around in VB, that also happens often with Atlas tasks, but since I am used to it, I clean them up every day, as I run Atlas/VB and vLHC/VB 24/7.

I did run some CMS, and the first 3 tasks worked but started to mess around with all my other tasks (except the vLHC ones, for some reason).

Then, with a new wrapper, the next 2 CMS tasks failed when I started them with my other tasks running, so I decided to stop testing CMS for now. It does appear, though, that CMS will run if you aren't running other tasks like I do.

When I stopped running CMS, I was still getting other Atlas tasks and even GPUs crashing. What I ended up doing to fix that was a complete removal of all files connected to CMS on this PC. After that it all went back to normal; in fact, I added some LHC tasks (since this is an 8-core) and ran those at the same time without problems.

(I guess you have to email me to get me to stay up to date sometimes )
Volunteer Mad Scientist For Life
ID: 27398 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27399 - Posted: 20 Apr 2015, 2:54:40 UTC - in response to Message 27398.  

Thanks AGAIN, and especially for the additional info.
I was just too lazy/busy to mail you personally... :-)
Will do so in future. Eric.
ID: 27399 · Report as offensive     Reply Quote
William C Wilson
Joined: 11 Sep 08
Posts: 25
Credit: 384,225
RAC: 0
Message 27412 - Posted: 27 Apr 2015, 10:51:53 UTC - in response to Message 27356.  

Thanks for the hint. I will try this in a few days, as I am overloaded with work right now. Strange, the CPU BIOS settings are the same, but I will try. The machine is now all 64-bit, including apps. I will try the compat settings. I hope it works, and I will get back to you with the results.
William C Wilson
São Paulo Brazil
ID: 27412 · Report as offensive     Reply Quote