Message boards : Number crunching : Two instantaneous crashes
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18485 - Posted: 31 Oct 2007, 22:31:54 UTC
Last modified: 31 Oct 2007, 22:32:26 UTC

Hi

I've just downloaded two SixTrack 4.67 tasks onto this single-core AMD with boinc 5.10.20. Both appear to have started simultaneously (I've never seen this phenomenon on the single-core before) and both crashed simultaneously with 1 sec of computing time. Has anyone any idea how two tasks can start together on a single-core? (Both still show in my account as in progress as this has only just occurred.)

When I looked at the tasks in my account I noticed a second problem. I reckon I must have crunched a total of about 20 tasks for LHC, all recently, yet only 4 are shown - the 2 that have just crashed and 2 previous successes. The credit of 176 is, I think, correct, but isn't of course consistent with only 2 completed tasks. Any ideas?

http://lhcathome.cern.ch/lhcathome/hosts_user.php?show_all=1&sort=rpc_time

ID: 18485
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 18487 - Posted: 31 Oct 2007, 22:58:26 UTC - in response to Message 18485.  

Hi

I've just downloaded two SixTrack 4.67 tasks onto this single-core AMD with boinc 5.10.20. Both appear to have started simultaneously (I've never seen this phenomenon on the single-core before) and both crashed simultaneously with 1 sec of computing time. Has anyone any idea how two tasks can start together on a single-core? (Both still show in my account as in progress as this has only just occurred.)

When I looked at the tasks in my account I noticed a second problem. I reckon I must have crunched a total of about 20 tasks for LHC, all recently, yet only 4 are shown - the 2 that have just crashed and 2 previous successes. The credit of 176 is, I think, correct, but isn't of course consistent with only 2 completed tasks. Any ideas?

http://lhcathome.cern.ch/lhcathome/hosts_user.php?show_all=1&sort=rpc_time


re fewer results shown than you reckon you've crunched:

Yes, I was surprised by that when I joined LHC at the weekend. What I reckon is that the process called 'purging' - removing interim results from the BOINC database once validated - is set even more aggressively here than on other BOINC projects. I'm used to Einstein results remaining visible for a week or more, and SETI results for at least a day: but my first LHC result was history in well under 12 hours.

And the simultaneous start - are you sure that they didn't run consecutively, but for less than a second and within the same 'quantized' 1 second reporting interval?
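That quantizing effect is easy to see with a few lines of Python (hypothetical timestamps, not BOINC's actual logging code):

```python
# Hypothetical sketch: two strictly consecutive sub-second runs can share
# the same whole-second timestamps, so they look simultaneous in a log
# that only reports whole seconds.

def display_second(t):
    """Truncate a precise time (in seconds) to the whole second a log shows."""
    return int(t)

# Suppose task 1 runs from t=54.10 to t=54.35 and task 2 from t=54.40
# to t=54.65 (made-up times within the same minute).
task1 = (54.10, 54.35)
task2 = (54.40, 54.65)

shown = [tuple(display_second(t) for t in task) for task in (task1, task2)]
print(shown)  # [(54, 54), (54, 54)]: identical start and end seconds
```

At 1-second resolution the two runs are indistinguishable from a simultaneous start.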

PS Did you mean these results? hosts_user.php shows the reader's results, not the poster's.
ID: 18487
PovAddict

Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 18488 - Posted: 31 Oct 2007, 23:19:57 UTC - in response to Message 18487.  

PS Did you mean these results? hosts_user.php shows the reader's results, not the poster's.

Unless it has the userid explicitly.
http://lhcathome.cern.ch/lhcathome/hosts_user.php?userid=86197
ID: 18488
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18489 - Posted: 1 Nov 2007, 0:26:56 UTC

Sorry about giving the generic link and not my own. Yes, that's the right computer.

On investigation, the 2 tasks do seem to have run consecutively and not together. Two models over in 3 seconds according to the messages. In the Tasks window both crashes appeared to happen simultaneously. You can see I'm accustomed to crunching in slow motion on CPDN; I've never previously seen anything happen like this in a flash of lightning.

Anyway, at least it means boinc didn't misbehave (in this respect!).

Thanks for the help.

ID: 18489
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18497 - Posted: 1 Nov 2007, 4:27:59 UTC

The LHC server seems to consider these two 1-second results a great success. Over - Success - Done. Exit code 0. I can find no crash or error messages or codes and the tasks didn't go through the Ready to report stage.

Maybe they were bewitched on Hallowe'en.
ID: 18497
Brian Silvers

Joined: 3 Jan 07
Posts: 124
Credit: 7,065
RAC: 0
Message 18498 - Posted: 1 Nov 2007, 4:34:19 UTC - in response to Message 18497.  
Last modified: 1 Nov 2007, 4:41:15 UTC

The LHC server seems to consider these two 1-second results a great success. Over - Success - Done. Exit code 0. I can find no crash or error messages or codes and the tasks didn't go through the Ready to report stage.

Maybe they were bewitched on Hallowe'en.


These are just results where the simulation determined that the beam could not make it around the track. Sometimes that happens in less than a second, other times it may take a minute or two... I had 4 of these out of my allotted 10 so far for today...
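That "fast success" pattern can be mimicked with a toy tracking loop (purely illustrative, not SixTrack's actual code): the run returns normally, with exit code 0, as soon as the particle is lost, however early that happens.

```python
# Toy illustration (not SixTrack): a tracking simulation that ends
# normally, with exit status 0, the moment the particle leaves the
# aperture. An almost-instant finish is still a valid result.
import random

APERTURE = 1.0  # arbitrary beam-pipe half-width in these toy units

def track(max_turns, seed):
    random.seed(seed)
    x = 0.9  # starting amplitude, deliberately close to the aperture
    for turn in range(max_turns):
        x *= random.uniform(0.95, 1.06)  # toy per-turn amplitude change
        if abs(x) > APERTURE:
            return turn + 1  # beam lost: a physics outcome, not an error
    return max_turns  # particle survived the whole run

turns = track(max_turns=1_000_000, seed=1)
print(f"done after {turns} turns")  # either way the process exits with 0
```

An unstable particle can be lost within the first handful of turns, which is why a 1-second task can legitimately report Success.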

Now, if only they could get them to validate instead of sitting as pending...
ID: 18498
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18499 - Posted: 1 Nov 2007, 4:44:56 UTC

Thanks for the info, Brian. I'm a newbie on LHC and had no idea what could cause this. I've looked at the results from other members who received the same workunit and exactly the same thing happened to them.

I now know that my computer didn't behave badly, which is a relief.

ID: 18499
Jon Boy UK - Wales

Joined: 12 Sep 06
Posts: 13
Credit: 47,187
RAC: 0
Message 18503 - Posted: 1 Nov 2007, 12:44:12 UTC

Hello,

Quite remarkable, I should say.

The only strange event I ever recall is that my 8-CPU Compaq server crunched 9 WUs successfully. :0)

The WUs were all from different projects.

My server somehow gained a phantom physical CPU rather than a phantom/ghost WU.

How, I don't know...

Kind regards,

John Gray
ID: 18503
KAMasud

Joined: 7 Oct 06
Posts: 114
Credit: 23,192
RAC: 0
Message 18504 - Posted: 1 Nov 2007, 14:30:32 UTC


:) LoL! mo.v you should have kept a backup copy ;) like we do at CPDN.
Regards
Masud.
ID: 18504
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18506 - Posted: 1 Nov 2007, 14:54:42 UTC

Jon Boy, if you mean an extra record of your computer on the project's web pages, that extra record is generated when you restore a backup. To get rid of the superfluous record, go to the detailed page for one of the computer records; at the bottom you'll see the Merge button. This merges the different records of the same computer into one. Merge is better than Delete.

When you've done that you need to update all your projects on all your computers to help the project servers work out again who you are.

Superfluous records of the same computers were also generated by one of the new BOINC versions. These can't always be merged.

Only records with identical descriptions will merge. I have 2 descriptions of this same old computer; I expect they were generated by different versions of BOINC. Because the descriptions aren't identical, it's a waste of time trying to merge them.
ID: 18506
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18507 - Posted: 1 Nov 2007, 14:56:31 UTC
Last modified: 1 Nov 2007, 14:58:09 UTC

KAMasud

I've never before seen anything so different from CPDN! Next time I'll know not to panic. My first 20 or so tasks from LHC ran perfectly.
ID: 18507
Jon Boy UK - Wales

Joined: 12 Sep 06
Posts: 13
Credit: 47,187
RAC: 0
Message 18508 - Posted: 1 Nov 2007, 16:00:45 UTC

Hello,

No, I don't mean that at all (or anything remotely near to what you're on about),
even though I understand what you're on about.

My server has 8 physical CPUs installed...
So my server can run/crunch/work on 8 WUs at a time...

( 1 CPU = 1 running instance )

But for some reason unknown to me it was working on 9 WUs... all at the same time.

Technically this is impossible - that's why it's strange.

I could have 20 or 30 or 100 WUs in a list waiting
*** ( BUT it can only run 8 at a time ) *** as the server is ( 8-way... )

Kind regards,

John
ID: 18508
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18510 - Posted: 1 Nov 2007, 18:16:07 UTC
Last modified: 1 Nov 2007, 18:23:33 UTC

At first I thought both tasks were active together on the single core here. But when I looked in the BM messages at the exact second when each event took place, it turned out that the first task crashed in one second, then the second task got the CPU to itself for the following second.

Did you actually see 9 tasks listed as running in the Tasks tab, all at the same time? Did any of the tasks crash or terminate in a flash? If so, I wonder whether your BM messages might reveal something similar to what happened here. But if you saw 9 X Running, that's most bizarre.

(Maybe the manufacturer added a 9th core as a freebie!)
ID: 18510
Jon Boy UK - Wales

Joined: 12 Sep 06
Posts: 13
Credit: 47,187
RAC: 0
Message 18511 - Posted: 1 Nov 2007, 18:59:03 UTC - in response to Message 18510.  

At first I thought both tasks were active together on the single core here. But when I looked in the BM messages at the exact second when each event took place, it turned out that the first task crashed in one second, then the second task got the CPU to itself for the following second.

Did you actually see 9 tasks listed as running in the Tasks tab, all at the same time? Did any of the tasks crash or terminate in a flash? If so, I wonder whether your BM messages might reveal something similar to what happened here. But if you saw 9 X Running, that's most bizarre.

(Maybe the manufacturer added a 9th core as a freebie!)



Hello,

Yes indeed, 9 WUs were running in the Tasks tab all at the same time.
They all completed successfully, uploaded without problems and received the customary credit too.

The 9 WUs concerned were all from different projects ( LHC, SETI, Rosetta bla bla bla ).

The only reason I even noticed was that I was waiting for them all to finish so I could do my annual maintenance on them and shut them down for a bit of an economy drive (electric) :0)

I have:
2 x 8-CPU Compaq data centre servers
7 x dual-core Compaq workstations
1 x quad-core (2 physical + 2 virtual, hyper-threading) Compaq workstation

34 CPUs altogether...
ID: 18511
zombie67 [MM]

Joined: 24 Nov 06
Posts: 76
Credit: 7,919,897
RAC: 13
Message 18513 - Posted: 1 Nov 2007, 19:40:04 UTC - in response to Message 18511.  

The 9 WUs concerned were all from different projects ( LHC, SETI, Rosetta bla bla bla )

Was one of the projects DepSpid?
Dublin, California
Team: SETI.USA

ID: 18513
Jon Boy UK - Wales

Joined: 12 Sep 06
Posts: 13
Credit: 47,187
RAC: 0
Message 18515 - Posted: 1 Nov 2007, 20:29:57 UTC - in response to Message 18513.  

The 9 WUs concerned were all from different projects ( LHC, SETI, Rosetta bla bla bla )

Was one of the projects DepSpid?



Hello,

No, sorry. I don't recall exactly which 9 they were, but it was around the time I started crunching/added Superlink.

Kind Regards,

John
ID: 18515
zombie67 [MM]

Joined: 24 Nov 06
Posts: 76
Credit: 7,919,897
RAC: 13
Message 18519 - Posted: 2 Nov 2007, 0:46:38 UTC - in response to Message 18515.  

The 9 WUs concerned were all from different projects ( LHC, SETI, Rosetta bla bla bla )

Was one of the projects DepSpid?

No, sorry. I don't recall exactly which 9 they were, but it was around the time I started crunching/added Superlink.


No, it wasn't DepSpid? Or no, you don't remember?

I ask because DepSpid is a non-CPU Intensive application, which means you can run up to 10 DepSpid tasks at the same time as other projects run one per core. So with an 8-core machine, you can be running up to 18 total tasks at the same time (8 normal + 10 DepSpid).
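The arithmetic is simply additive, as this sketch shows (the 10-task cap is the figure quoted above):

```python
# Sketch of the arithmetic quoted above: non-CPU-intensive tasks (e.g.
# DepSpid) run in addition to the usual one-task-per-core limit.
def max_concurrent_tasks(cores, non_cpu_intensive_cap=10):
    return cores + non_cpu_intensive_cap

print(max_concurrent_tasks(8))  # 8 normal + 10 DepSpid = 18
```

So a 9th "impossible" task is entirely possible if one of the nine wasn't counting against the cores at all.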
Dublin, California
Team: SETI.USA

ID: 18519



©2024 CERN