Message boards : Number crunching : All tasks end with error

Mattia Verga

Joined: 27 Sep 04
Posts: 20
Credit: 23,880
RAC: 0
Message 25783 - Posted: 6 Sep 2013, 17:15:59 UTC

All tasks with the new SixTrack 446.03 are erroring out on my PC. I checked and saw that they are also crashing for some other users, but not for all:
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9028119
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9032177
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9112361

The error is "-226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS"

I already tried resetting the LHC project in BOINC... what's going on? (Other BOINC projects are working fine on my machine.)
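
For reference, the two numbers in the error string above are the same value: the hexadecimal form is the 64-bit two's-complement encoding of -226. A minimal Python sketch to check the arithmetic (the helper name `to_signed64` is made up for illustration):

```python
def to_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit integer as a signed two's-complement value."""
    value &= (1 << 64) - 1  # keep only the low 64 bits
    # If the top bit is set, the value is negative: subtract 2**64.
    return value - (1 << 64) if value & (1 << 63) else value


# Hex value copied from the error string (underscores are only for readability).
exit_code = to_signed64(0xffff_ffff_ffff_ff1e)
print(exit_code)
```

So the hex value adds no extra information; the code to look up is simply -226, ERR_TOO_MANY_EXITS.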
ID: 25783
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25784 - Posted: 6 Sep 2013, 17:54:00 UTC - in response to Message 25783.  

Thanks for the feedback. I am at home now so it would
be good if you could just tell me which Linux you are running.
I found your AMD description. Thanks. Eric.
ID: 25784
z0ny

Joined: 17 Feb 06
Posts: 2
Credit: 1,686,368
RAC: 0
Message 25785 - Posted: 6 Sep 2013, 18:14:18 UTC
Last modified: 6 Sep 2013, 18:32:36 UTC

Same problem... currently 1009 tasks, 766 with errors. I tried resetting the project, but nothing changed.
Debian jessie: Linux server 3.9-1-amd64 #1 SMP Debian 3.9.8-1 x86_64 GNU/Linux

Debian wheezy (older BOINC) seems to be OK.
ID: 25785
SeersantLoom

Joined: 4 Jan 07
Posts: 3
Credit: 2,197,570
RAC: 0
Message 25787 - Posted: 6 Sep 2013, 20:27:02 UTC

I've noticed this for a couple of days. Task error logs (stderr.txt) are filling with "... No heartbeat from client for 30 sec - exiting" and end with "Compute error" (-226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS). Resetting project had no effect.
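
A small sketch for counting these timeouts in a saved copy of a task's stderr.txt. The message text is taken from the log line quoted above; the function name and the sample input are made up for illustration, and real log lines may carry a different prefix.

```python
import re

# Pattern based on the message quoted above; the number of seconds is captured
# in case it differs between client versions (an assumption, not verified).
HEARTBEAT_RE = re.compile(r"No heartbeat from client for (\d+) sec")


def count_heartbeat_timeouts(stderr_text: str) -> int:
    """Count the lines that report a missed client heartbeat."""
    return sum(1 for line in stderr_text.splitlines() if HEARTBEAT_RE.search(line))


# Hypothetical sample input, not copied from a real task.
sample = (
    "No heartbeat from client for 30 sec - exiting\n"
    "some unrelated line\n"
    "No heartbeat from client for 30 sec - exiting\n"
)
print(count_heartbeat_timeouts(sample))
```

For a real task, read the file contents first, for example with `open("stderr.txt").read()`, and pass the text to `count_heartbeat_timeouts`.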

I unpacked the .zip file of one of the failing tasks into a separate directory and ran sixtrack there: the .fort files are being updated and are growing, and it hasn't crashed yet.


Linux: Linux 3.9.6-gentoo-Intel-Core-i7 #1 SMP Thu Jun 20 21:48:50 EEST 2013 x86_64 Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz GenuineIntel GNU/Linux

BOINC: 7.2.0 (x64)
SixTrack: 446.03 (pni)
sixtrack_lin64_4463_pni.exe (in projects dir)
'file' reports it as "sixtrack_lin64_4463_pni.exe: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, not stripped", guess it isn't really a 64-bit app?
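
The same check can be done without the `file` utility by reading the ELF header directly: the file starts with the magic bytes `\x7fELF`, and the fifth byte (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit one. A minimal Python sketch; the function names are made up for illustration:

```python
ELF_MAGIC = b"\x7fELF"
# EI_CLASS values from the ELF specification: ELFCLASS32 = 1, ELFCLASS64 = 2.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Return '32-bit' or '64-bit' given the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return ELF_CLASSES.get(header[4], "unknown")


def elf_class_of_file(path: str) -> str:
    """Read the first five bytes of the file at `path` and classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

Calling `elf_class_of_file("sixtrack_lin64_4463_pni.exe")` from the projects directory should agree with what `file` reports above.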

Had a look at my tasks:
Inconclusive (1):
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9018280
Error (306):
first one: http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=8972676
Also, some tasks seem to be OK:
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9086803

All tasks in "Error" list belong to SixTrack v446.03 (pni)
ID: 25787
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,704,381
RAC: 253
Message 25788 - Posted: 6 Sep 2013, 20:53:57 UTC - in response to Message 25787.  
Last modified: 6 Sep 2013, 20:55:25 UTC

Yep. That is the same problem as discussed in this thread:
http://lhcathomeclassic.cern.ch/sixtrack/forum_thread.php?id=3767

The errors and error messages described here are the same.
ID: 25788
SeersantLoom

Joined: 4 Jan 07
Posts: 3
Credit: 2,197,570
RAC: 0
Message 25795 - Posted: 7 Sep 2013, 9:51:57 UTC

Update: in standalone mode, sixtrack completed in ~6h as expected.

Could this be a communication problem between sixtrack and boinc_client, or something wrong in init_data.xml?
ID: 25795
Mattia Verga

Joined: 27 Sep 04
Posts: 20
Credit: 23,880
RAC: 0
Message 25799 - Posted: 7 Sep 2013, 13:22:55 UTC - in response to Message 25784.  

Thanks for the feedback. I am at home now so it would
be good if you could just tell me which Linux you are running.
I found your AMD description. Thanks. Eric.


I'm running Fedora 19 64-bit. Here are the first lines of the BOINC output:

sab 07 set 2013 15:08:15 CEST |  | No config file found - using defaults
sab 07 set 2013 15:08:15 CEST |  | Starting BOINC client version 7.0.65 for x86_64-pc-linux-gnu
sab 07 set 2013 15:08:15 CEST |  | log flags: file_xfer, sched_ops, task
sab 07 set 2013 15:08:15 CEST |  | Libraries: libcurl/7.29.0 NSS/3.15.1 zlib/1.2.7 libidn/1.26 libssh2/1.4.3
sab 07 set 2013 15:08:15 CEST |  | Data directory: /home/marvin/Boinc
sab 07 set 2013 15:08:15 CEST |  | Processor: 4 AuthenticAMD AMD Phenom(tm) II X4 965 Processor [Family 16 Model 4 Stepping 3]
sab 07 set 2013 15:08:15 CEST |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save
sab 07 set 2013 15:08:15 CEST |  | OS: Linux: 3.10.10-200.fc19.x86_64
sab 07 set 2013 15:08:15 CEST |  | Memory: 7.80 GB physical, 7.78 GB virtual
sab 07 set 2013 15:08:15 CEST |  | Disk: 86.89 GB total, 80.09 GB free
sab 07 set 2013 15:08:15 CEST |  | Local time is UTC +2 hours
sab 07 set 2013 15:08:15 CEST |  | No usable GPUs found
sab 07 set 2013 15:08:15 CEST | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6981088; resource share 100
sab 07 set 2013 15:08:15 CEST | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 7130990; resource share 100
sab 07 set 2013 15:08:15 CEST | SETI@home Beta Test | URL http://setiweb.ssl.berkeley.edu/beta/; Computer ID 63147; resource share 100
sab 07 set 2013 15:08:15 CEST | Asteroids@home | URL http://asteroidsathome.net/boinc/; Computer ID 23986; resource share 100
sab 07 set 2013 15:08:15 CEST | LHC@home 1.0 | URL http://lhcathomeclassic.cern.ch/sixtrack/; Computer ID 10286524; resource share 100
sab 07 set 2013 15:08:15 CEST | malariacontrol.net | URL http://www.malariacontrol.net/; Computer ID 628816; resource share 100
sab 07 set 2013 15:08:15 CEST | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 515018; resource share 100
sab 07 set 2013 15:08:15 CEST | SimOne@home | URL http://mmgboinc.unimi.it/; Computer ID 3627; resource share 100
sab 07 set 2013 15:08:15 CEST | rosetta@home | URL http://boinc.bakerlab.org/rosetta/; Computer ID 1612749; resource share 100
sab 07 set 2013 15:08:15 CEST | SETI@home | General prefs: from SETI@home (last modified 19-Mar-2013 23:43:11)
sab 07 set 2013 15:08:15 CEST | SETI@home | Computer location: home
sab 07 set 2013 15:08:15 CEST | SETI@home | General prefs: no separate prefs for home; using your defaults
sab 07 set 2013 15:08:15 CEST |  | Reading preferences override file
sab 07 set 2013 15:08:15 CEST |  | Preferences:
sab 07 set 2013 15:08:15 CEST |  | max memory usage when active: 5589.97MB
sab 07 set 2013 15:08:15 CEST |  | max memory usage when idle: 7586.39MB
sab 07 set 2013 15:08:15 CEST |  | max disk usage: 5.00GB
sab 07 set 2013 15:08:15 CEST |  | max CPUs used: 2
sab 07 set 2013 15:08:15 CEST |  | suspend work if non-BOINC CPU load exceeds 50 %
sab 07 set 2013 15:08:15 CEST |  | (to change preferences, visit a project web site or select Preferences in the Manager)
sab 07 set 2013 15:08:15 CEST |  | Not using a proxy

ID: 25799
Mattia Verga

Joined: 27 Sep 04
Posts: 20
Credit: 23,880
RAC: 0
Message 25808 - Posted: 8 Sep 2013, 8:47:34 UTC - in response to Message 25787.  

'file' reports it as "sixtrack_lin64_4463_pni.exe: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, not stripped", guess it isn't really a 64-bit app?


I think that's a good point... the executable seems to be a 32-bit one, not 64-bit, and the 'statically linked' part makes me wonder if it could be a library problem... maybe SixTrack works for those who also have 32-bit libraries installed, and crashes for those who have only 64-bit libraries.

At the very least there's something wrong with the compiler: it outputs 32-bit executables for 64-bit systems.
ID: 25808
z0ny

Joined: 17 Feb 06
Posts: 2
Credit: 1,686,368
RAC: 0
Message 25818 - Posted: 9 Sep 2013, 17:39:02 UTC - in response to Message 25785.  

I downgraded boinc-client from 7.2.7 to 7.0.27 (the same version as on my second computer) and it seems to be OK.
ID: 25818
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25819 - Posted: 9 Sep 2013, 20:48:58 UTC

Just to be clear: all our executables are 32-bit. They run
on 64-bit systems OK. (In theory a 64-bit executable should be
numerically compatible. However, this is not likely to be the case
in practice. I will look at this one day, but right now I need to sort out
the problems with our Linux executable.)
Thanks for all the feedback. Eric.
ID: 25819
Tom95134

Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 25821 - Posted: 11 Sep 2013, 3:04:54 UTC - in response to Message 25819.  

FYI...

Running Windows 7 (x64), BOINC 7.0.64 (x64), SixTrack 446.03.

No problems... yet.

ID: 25821
computerguy09

Joined: 26 Oct 04
Posts: 6
Credit: 1,696,248
RAC: 0
Message 25824 - Posted: 12 Sep 2013, 17:28:13 UTC

I tried making sure that any 32-bit libraries that could help were installed on one of my Ubuntu boxes. It didn't help.

Both of the Ubuntu 12.04 boxes get failed WUs right away. They are running BOINC 7.0.65.

These also have the most cores, so I'm just trying to run LHC on my Windows boxes for now...
ID: 25824
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25825 - Posted: 13 Sep 2013, 5:54:36 UTC

Our executables are all statically linked; there should be NO dependence on
libraries. They do depend on system calls though, and on the BOINC lib,
the API lib and the Fortran API. Working hard on fixing this.
ID: 25825
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25827 - Posted: 13 Sep 2013, 20:06:59 UTC

No problem running LHC on my Linux box, but my BOINC client is good old 6.10.58.
Tullio
ID: 25827
Tex1954

Joined: 24 Apr 11
Posts: 37
Credit: 1,295,012
RAC: 0
Message 25828 - Posted: 14 Sep 2013, 6:18:14 UTC

I get nothing but errors running Linux Mint Cinnamon 14.

Processor AMD 1045T.

8-)
ID: 25828
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25830 - Posted: 14 Sep 2013, 9:00:02 UTC

I had a system crash while running SuSE Linux 12.3 and went back to SuSE Linux 12.1. My CPU is an Opteron 1210 at 1.8 GHz. I ran a full check of the CPU, RAM and disks using version 1.8 of the diagnostic tools supplied by SUN when I acquired the SUN workstation in 2008. All seems OK.
Tullio
ID: 25830


