Message boards : Number crunching : Two instantaneous crashes

mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18485 - Posted: 31 Oct 2007, 22:31:54 UTC
Last modified: 31 Oct 2007, 22:32:26 UTC

Hi

I've just downloaded two SixTrack 4.67 tasks onto this single-core AMD with boinc 5.10.20. Both appear to have started simultaneously (I've never seen this phenomenon on the single-core before) and both crashed simultaneously with 1 sec of computing time. Has anyone any idea how two tasks can start together on a single-core? (Both still show in my account as in progress as this has only just occurred.)

When I looked at the tasks in my account I noticed a second problem. I reckon I must have crunched a total of about 20 tasks for LHC, all recently, yet only 4 are shown - the 2 that have just crashed and 2 previous successes. The credit of 176 is, I think, correct, but isn't of course consistent with only 2 completed tasks. Any ideas?

http://lhcathome.cern.ch/lhcathome/hosts_user.php?show_all=1&sort=rpc_time

ID: 18485
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 18487 - Posted: 31 Oct 2007, 22:58:26 UTC - in response to Message 18485.  


re fewer results shown than you reckon you've crunched:

Yes, I was surprised by that when I joined LHC at the weekend. What I reckon is that the process called 'purging' - removing interim results from the BOINC database once validated - is set even more aggressively here than on other BOINC projects. I'm used to Einstein results remaining visible for a week or more, and SETI results for at least a day: but my first LHC was history in well under 12 hours.

And the simultaneous start - are you sure that they didn't run consecutively, but for less than a second and within the same 'quantized' 1 second reporting interval?
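To illustrate how that can look, here is a minimal sketch in Python. It assumes the displayed timestamps are simply truncated to whole seconds; the function names, the start time and the sub-second durations are made up for illustration, and this is not BOINC's actual code.

```python
# Hypothetical illustration: two tasks run one after the other, but each
# takes well under a second, so whole-second timestamps make them look
# simultaneous. Times are kept in integer milliseconds to avoid float noise.

def displayed_second(time_ms: int) -> int:
    """Truncate a millisecond timestamp to the whole second shown in a log."""
    return time_ms // 1000

def run_consecutively(start_ms: int, tasks: list[tuple[str, int]]) -> list[tuple[str, int, int]]:
    """Return (name, begin_ms, end_ms) for tasks run strictly one after another."""
    events = []
    now = start_ms
    for name, duration_ms in tasks:
        begin = now
        now += duration_ms  # the next task starts only when this one ends
        events.append((name, begin, now))
    return events

# Assumed numbers: both tasks finish within the same clock second.
events = run_consecutively(10_100, [("task_a", 300), ("task_b", 400)])

for name, begin, end in events:
    print(name, "started at second", displayed_second(begin),
          "and finished at second", displayed_second(end))
```

Both tasks report the same start and finish second even though task_b only begins once task_a has ended, which is what a one-second reporting interval would hide.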

PS Did you mean these results? hosts_user.php shows the reader's results, not the poster's.
ID: 18487
PovAddict

Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 18488 - Posted: 31 Oct 2007, 23:19:57 UTC - in response to Message 18487.  

PS Did you mean these results? hosts_user.php shows the reader's results, not the poster's.

Unless it has the userid explicitly.
http://lhcathome.cern.ch/lhcathome/hosts_user.php?userid=86197
ID: 18488
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18489 - Posted: 1 Nov 2007, 0:26:56 UTC

Sorry about giving the generic link and not my own. Yes, that's the right computer.

On investigation, the 2 tasks do seem to have run consecutively and not together. Two models over in 3 seconds according to the messages. In the Tasks window both crashes appeared to happen simultaneously. You can see I'm accustomed to crunching in slow motion on CPDN; I've never previously seen anything happen like this in a flash of lightning.

Anyway, at least it means boinc didn't misbehave (in this respect!).

Thanks for the help.

ID: 18489
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18497 - Posted: 1 Nov 2007, 4:27:59 UTC

The LHC server seems to consider these two 1-second results a great success. Over - Success - Done. Exit code 0. I can find no crash or error messages or codes and the tasks didn't go through the Ready to report stage.

Maybe they were bewitched on Hallowe'en.
ID: 18497
Brian Silvers

Joined: 3 Jan 07
Posts: 124
Credit: 7,065
RAC: 0
Message 18498 - Posted: 1 Nov 2007, 4:34:19 UTC - in response to Message 18497.  
Last modified: 1 Nov 2007, 4:41:15 UTC


These are just results where the simulation determined that the beam could not make it around the track. Sometimes that happens in less than a second, other times it may take a minute or two... I had 4 of these out of my allotted 10 so far for today...

Now, if only they could get them to validate instead of sitting as pending...
ID: 18498
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18499 - Posted: 1 Nov 2007, 4:44:56 UTC

Thanks for the info, Brian. I'm a newbie on LHC and had no idea what could cause this. I've looked at the results from other members who received the same workunit and exactly the same thing happened to them.

I now know that my computer didn't behave badly, which is a relief.

ID: 18499
Jon Boy UK - Wales

Joined: 12 Sep 06
Posts: 13
Credit: 47,187
RAC: 0
Message 18503 - Posted: 1 Nov 2007, 12:44:12 UTC

Hello,

Quite remarkable, I should say.

The only strange event I can recall is that my 8-CPU Compaq server once crunched 9 WUs successfully. :0)

The WUs were all from different projects.

My server somehow gained a phantom physical CPU rather than a phantom/ghost WU.

How, I don't know...

Kind regards,

John Gray
ID: 18503
KAMasud

Joined: 7 Oct 06
Posts: 114
Credit: 23,192
RAC: 0
Message 18504 - Posted: 1 Nov 2007, 14:30:32 UTC


:) LoL! mo.v you should have kept a back up copy ;) like we do at CPDN.
Regards
Masud.
ID: 18504
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18506 - Posted: 1 Nov 2007, 14:54:42 UTC

Jon Boy, if you mean an extra record of your computer on your server web pages for the project, this extra record is generated when you restore a backup. To get rid of the superfluous record, you go to the detailed page for one of the computer records and at the bottom you'll see the Merge button. This merges the different records of the same computer into one. Merge is better than Delete.

When you've done that you need to update all your projects on all your computers to help the project servers work out again who you are.

Superfluous records of the same computers were also generated by one of the new BOINC versions. These can't always be merged.

Only records with identical descriptions will merge. I have 2 descriptions of this same old computer; I expect they were generated by different versions of boinc. Because the descriptions aren't identical it's a waste of time trying to merge them.
ID: 18506
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18507 - Posted: 1 Nov 2007, 14:56:31 UTC
Last modified: 1 Nov 2007, 14:58:09 UTC

KAMasud

I've never before seen anything so different from CPDN! Next time I'll know not to panic. My first 20 or so tasks from LHC ran perfectly.
ID: 18507
Jon Boy UK - Wales

Joined: 12 Sep 06
Posts: 13
Credit: 47,187
RAC: 0
Message 18508 - Posted: 1 Nov 2007, 16:00:45 UTC

Hello,

No, I don't mean that at all (or anything remotely near what you're on about),
even though I understand what you're on about.

My server has 8 physical CPUs installed,
so it can run/crunch/work on 8 WUs at a time.

( 1x cpu = 1x running instance )

But for some reason unknown to me it was working on 9 WUs, all at the same time.

Technically this is impossible - that's why it's strange.

I could have 20 or 30 or 100 WUs in a list waiting,
*** BUT it can only run 8 at a time ***, as the server is 8-way.

Kind regards,

John
ID: 18508
mo.v

Joined: 16 Oct 07
Posts: 8
Credit: 344
RAC: 0
Message 18510 - Posted: 1 Nov 2007, 18:16:07 UTC
Last modified: 1 Nov 2007, 18:23:33 UTC

At first I thought both tasks were active together on the single core here. But when I looked in the BM messages at the exact second when each event took place, it turned out that the first task crashed in one second, then the second task got the CPU to itself for the following second.

Did you actually see 9 tasks listed as running in the Tasks tab, all at the same time? Did any of the tasks crash or terminate in a flash? If so, I wonder whether your BM messages might reveal something similar to what happened here. But if you saw 9 X Running, that's most bizarre.

(Maybe the manufacturer added a 9th core as a freebie!)
ID: 18510
Jon Boy UK - Wales

Joined: 12 Sep 06
Posts: 13
Credit: 47,187
RAC: 0
Message 18511 - Posted: 1 Nov 2007, 18:59:03 UTC - in response to Message 18510.  


Hello,

Yes, 9 WUs were indeed running in the Tasks tab, all at the same time.
They all completed successfully, uploaded without problems and received the customary credit.

The 9 WUs concerned were all from different projects (LHC, SETI, Rosetta, blah blah blah).

The only reason I even noticed was that I was waiting for them all to finish so I could do my annual maintenance on them and shut them down for a bit of an economy drive (electric) :0)

I have:
2 x 8 cpu Compaq data centre servers
7 x dual core Compaq workstations
1 x quad (2x physical & 2x virtual-hyper threading) core compaq workstation

34 CPUs altogether.
ID: 18511
zombie67 [MM]

Joined: 24 Nov 06
Posts: 76
Credit: 7,953,478
RAC: 1
Message 18513 - Posted: 1 Nov 2007, 19:40:04 UTC - in response to Message 18511.  

The 9 WUs concerned were all from different projects (LHC, SETI, Rosetta, blah blah blah).

Was one of the projects DepSpid?
Dublin, California
Team: SETI.USA

ID: 18513
Jon Boy UK - Wales

Joined: 12 Sep 06
Posts: 13
Credit: 47,187
RAC: 0
Message 18515 - Posted: 1 Nov 2007, 20:29:57 UTC - in response to Message 18513.  

The 9 WUs concerned were all from different projects (LHC, SETI, Rosetta, blah blah blah).

Was one of the projects DepSpid?



Hello,

No, sorry. I don't recall exactly which 9 they were, but it was around the time I started crunching/added Superlink.

Kind Regards,

John
ID: 18515
zombie67 [MM]

Joined: 24 Nov 06
Posts: 76
Credit: 7,953,478
RAC: 1
Message 18519 - Posted: 2 Nov 2007, 0:46:38 UTC - in response to Message 18515.  

The 9 WUs concerned were all from different projects (LHC, SETI, Rosetta, blah blah blah).

Was one of the projects DepSpid?

No, sorry. I don't recall exactly which 9 they were, but it was around the time I started crunching/added Superlink.


No, it wasn't DepSpid? Or no, you don't remember?

I ask because DepSpid is a non-CPU-intensive application, which means you can run up to 10 DepSpid tasks at the same time as the other projects run one task per core. So with an 8-core machine, you can be running up to 18 tasks in total at the same time (8 normal + 10 DepSpid).
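The arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the cap of 10 non-CPU-intensive tasks is the figure quoted in this post, not something checked against the BOINC client.

```python
# Hypothetical sketch of the task count described above. The cap of 10
# non-CPU-intensive tasks is an assumption taken from the forum post.

def max_running_tasks(cpu_cores: int,
                      non_cpu_intensive_tasks: int,
                      non_cpu_intensive_cap: int = 10) -> int:
    """CPU-intensive tasks run one per core; non-CPU-intensive tasks run
    on top of that, up to the assumed cap."""
    if cpu_cores < 0 or non_cpu_intensive_tasks < 0:
        raise ValueError("counts must be non-negative")
    extra = min(non_cpu_intensive_tasks, non_cpu_intensive_cap)
    return cpu_cores + extra

# An 8-way server with one non-CPU-intensive task would show 9 running.
print(max_running_tasks(8, 1))
# The upper bound mentioned in the post: 8 normal + 10 non-CPU-intensive.
print(max_running_tasks(8, 10))
```

On that reading, a single non-CPU-intensive task alongside 8 ordinary ones would be enough to account for 9 tasks showing as running on an 8-CPU machine.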
Dublin, California
Team: SETI.USA

ID: 18519
