21) Message boards : Sixtrack Application : EXIT_DISK_LIMIT_EXCEEDED (Message 40086)
Posted 7 Oct 2019 by Alessio Mereghetti
Post:
An update on this issue - I managed to grant credit to the tasks of the affected study that failed with EXIT_DISK_LIMIT_EXCEEDED because of the inconsistent setting value.
To avoid cheating, the credit is lower than the full credit that would have been granted had the tasks run to the end and been validated - in the end, all the tasks failed before completing, and there was no way to validate the partial results.

Please post here if something odd related to this study happens.

Happy crunching,
A.
22) Message boards : Sixtrack Application : EXIT_DISK_LIMIT_EXCEEDED (Message 40065)
Posted 2 Oct 2019 by Alessio Mereghetti
Post:
Thanks, PDW, for spotting this problem again.

The failures are due to a (log) file growing beyond the DISK request.
The user was not aware that he should have increased the request when submitting extremely long jobs (1e7 turns, whereas we typically simulate a factor of 10 fewer).

On the code side, the next release won't generate this (log) file unless explicitly requested by the user.
For the affected tasks, I am looking into the possibility of granting some credit for the CPU time anyway, even if the results are not going to be validated...

I apologise for the inconvenience, and thanks again for the support!
A.
23) Message boards : News : The SixTrack team welcomes the LHC@Home volunteers at the CERN open days (Message 39918)
Posted 13 Sep 2019 by Alessio Mereghetti
Post:
I'll be joining saturday to meet you and understand a bit more why my cpu is burning :-) Could you tell us where is located the meeting point ? I read that it's the building 504… but on which site is that ? 

The meeting point is restaurant number 2 on the main CERN site (i.e. Meyrin). I think that the LHC@Home stand should be located on the ground floor, in the hall.
See you tomorrow!
A.
24) Message boards : News : The SixTrack team welcomes the LHC@Home volunteers at the CERN open days (Message 39917)
Posted 13 Sep 2019 by Alessio Mereghetti
Post:
I would make the 5,200 mile drive if my car could cross the Atlantic and North America or just fly and get a Hilton room for a month if I got 25 cents per credit here and that might even get the wife to go

Many many thanks!
25) Message boards : News : The SixTrack team at the LHC@Home desk for the CERN open days (Message 39908)
Posted 12 Sep 2019 by Alessio Mereghetti
Post:
Dear volunteers,

thanks to those who have filled in the doodle we circulated last week:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5130&postid=39794#39794

We decided to deliver a presentation every day in the most populated time slots out of the doodle poll, i.e. on Sat. 14th Sep, between 03:00 and 04:00 PM, and on Sun 15th Sep, between 02:00 and 03:00 PM.
The meeting point will be the LHC@Home desk in R2 (building 504), at the beginning of the time slot. We will have to walk a few minutes to a meeting room where the presentations will take place. We will be back at the meeting point by the end of the time slot at the latest.

Looking forward to shaking hands and meeting you,
Alessio and Massimo, for the SixTrack team
26) Message boards : Number crunching : LHC on android (Message 39843)
Posted 6 Sep 2019 by Alessio Mereghetti
Post:
Thanks, Crystal Pellet, I was too fast :)

Here are the public ones :)
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10604711
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10604713

Happy crunching!
A.
27) Message boards : Sixtrack Application : SIXTRACKTEST (Message 39839)
Posted 6 Sep 2019 by Alessio Mereghetti
Post:
Hi Ray,
thanks for the suggestion. We are looking into something of this kind, i.e. we split the job into sub-steps, and we collect the partial results together with the checkpoint-restart files.
The machinery in SixTrack is almost ready, and for BOINC the process will be completely transparent; we are now modifying the software on the scientists' side - it will come in a few months, with a major code rewrite. Slow, but moving :)
Happy crunching
28) Message boards : Number crunching : LHC on android (Message 39838)
Posted 6 Sep 2019 by Alessio Mereghetti
Post:
@H. Miersch: Full steam crunching, thanks a lot!!!
https://lhcathome.cern.ch/lhcathome_ops/db_action.php?table=result&detail=low&hostid=10604711&sort_by=sent_time
https://lhcathome.cern.ch/lhcathome_ops/db_action.php?table=result&detail=low&hostid=10604713&sort_by=sent_time

@nicodemusw: yes, error 31 is our starting point

Thanks for the quick feedback, and happy crunching
29) Message boards : Number crunching : LHC on android (Message 39818)
Posted 4 Sep 2019 by Alessio Mereghetti
Post:
sorry for the late reply.

@H. Miersch and @arizonadeux : thanks for posting the issue. The problem is not that we do not send out tasks for android - at job submission we do not target specific app plan classes. There must be something preventing the feeder from accepting the requests from your hosts - I will contact IT about that, maybe the log files can tell us something more. In the meantime, could you please check that you flagged the android app as test?

@nicodemusw : your android host actually crunched tasks:
https://lhcathome.cern.ch/lhcathome_ops/db_action.php?table=result&hostid=10609518&sort_by=sent_time&detail=low&nresults=40
The point is that, since your android version is >8, the process is killed immediately by the OS. James is looking into this on our side (though not full time), together with the BOINC devel team - it is not a problem of SixTrack per se.

An app of the production SixTrack is flagged as test just to make volunteers aware that something can go wrong with it. This is the case for the android app, which runs only on OS versions <8 (again, the OS kills the process), so we let people with an appropriate OS version contribute, and let the others know that the app may fail.
30) Message boards : News : The SixTrack team welcomes the LHC@Home volunteers at the CERN open days (Message 39794)
Posted 2 Sep 2019 by Alessio Mereghetti
Post:
Dear volunteers,

following Nils's post on the MBs:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5129&postid=39763#39763
the SixTrack team is looking forward to welcoming you at CERN and thanking you for the CPU time you make available to us. To do so in the best way, we would like to know when you will most likely be passing by the IT stand, so that we can concentrate our efforts on the time when most of you can be there. Hence, please find below a doodle that we will use to target the optimal time window:
https://doodle.com/poll/qpw36awgspufawi7

Thanks a lot in advance, and happy crunching!
Alessio and Massimo, for the SixTrack team
31) Message boards : Sixtrack Application : Inconclusive results (Message 39736)
Posted 26 Aug 2019 by Alessio Mereghetti
Post:
I have re-run two (randomly picked) WUs out of the 19 inconclusive ones - my CPU times suggest that the linux machine is correct (I don't have access to the result files produced by the volunteers though).

The windows host is going crazy:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10584523
The tasks labelled as 'validation inconclusive' actually exit with an error, eg:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=241473594

I am contacting the owner - the machine seems to be very powerful and valuable.
Thanks for pointing this out.
A.
32) Message boards : Sixtrack Application : Inconclusive results (Message 39735)
Posted 26 Aug 2019 by Alessio Mereghetti
Post:
Thanks for pointing these out.

Odd behavior - it seems that the windows machine is not computing results correctly, whereas the linux one does. I am re-running a couple of cases, just to be sure that the linux host is behaving correctly.
Due to the long backlog, we will need to wait a bit for the task to be re-sent to a wingman.
33) Message boards : Sixtrack Application : SIXTRACKTEST (Message 39734)
Posted 26 Aug 2019 by Alessio Mereghetti
Post:
Thanks for pointing this out.

Not clear what happened (I cannot even see the owner of the machine) - it seems like the machine did not even start the others...
34) Message boards : Sixtrack Application : SIXTRACKTEST (Message 39662)
Posted 20 Aug 2019 by Alessio Mereghetti
Post:
I think that your host has been hit by some very short (successful) tasks (with basically no dynamic aperture, a perfectly physical case) which led the BOINC server to think that the host is super-fast.
The FPOPs in the error messages:
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 17457.49 (1920000000.00G/109981.44G)</message>
<stderr_txt>

</stderr_txt>
]]>

are too high to be real:
109981.44G


We had a similar issue in 2017:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4273
which led to updating the validator, but I think we need to fine-tune it even further.
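For reference, the numbers in the error message above are consistent with a time limit obtained by dividing the job's FPOPs bound by the speed estimated for the host. Below is a minimal Python sketch of that arithmetic, assuming that the two figures in parentheses are the FPOPs bound and the estimated FLOPS (both in units of 1e9); the function name is only illustrative and not part of BOINC.

```python
def elapsed_time_limit(fpops_bound_g: float, est_flops_g: float) -> float:
    """Return the elapsed time limit in seconds.

    Both arguments are expressed in units of 1e9 (the 'G' in the message).
    """
    if est_flops_g <= 0:
        raise ValueError("estimated speed must be positive")
    return fpops_bound_g / est_flops_g


# Figures taken from the error message quoted in this post.
limit = elapsed_time_limit(1920000000.00, 109981.44)
print(f"{limit:.2f} s")  # 17457.49 s, the limit reported in the message

# Hypothetical, more ordinary host speed (5 GFLOPS), for comparison only.
print(f"{elapsed_time_limit(1920000000.00, 5.0):.0f} s")
```

With an over-estimated host speed the limit shrinks to a few hours, so a genuinely long task gets killed well before it can finish.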

Thanks for the precious feedback!
A.
35) Message boards : Sixtrack Application : SIXTRACKTEST (Message 39653)
Posted 19 Aug 2019 by Alessio Mereghetti
Post:
exactly - timewise, the final results may come later than going with single 10^7 turns jobs, but the overall BOINC and volunteer processing should be more efficient.
In addition, this option could give us the opportunity to resume any job/study from its ending point :)
36) Message boards : Sixtrack Application : SIXTRACKTEST (Message 39649)
Posted 19 Aug 2019 by Alessio Mereghetti
Post:
Hi,
sorry for the late reply - just back from vacation.

The 10^7 turns jobs are (should be) sent only on sixtracktest (for the time being) due to their duration and so as not to interfere with regular production. We are planning to go into production with such a long time range of beam dynamics using split jobs (eg 10^7 turns = 10 consecutive jobs * 10^6 turns) instead of a single job.
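
To illustrate the splitting, here is a minimal Python sketch that only does the bookkeeping of the turn ranges. It is not the actual SixTrack or BOINC machinery: the function name and the 1-based inclusive ranges are assumptions made for this example, and in practice each sub-job would start from the checkpoint of the previous one.

```python
def split_turns(total_turns: int, turns_per_job: int) -> list[tuple[int, int]]:
    """Return consecutive (first_turn, last_turn) ranges, 1-based and inclusive."""
    if total_turns <= 0 or turns_per_job <= 0:
        raise ValueError("turn counts must be positive")
    ranges = []
    start = 1
    while start <= total_turns:
        # The last chunk may be shorter if the total is not an exact multiple.
        end = min(start + turns_per_job - 1, total_turns)
        ranges.append((start, end))
        start = end + 1
    return ranges


jobs = split_turns(10**7, 10**6)
print(len(jobs))          # 10
print(jobs[0], jobs[-1])  # (1, 1000000) (9000001, 10000000)
```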

I have asked the scientist submitting these jobs to proceed slowly so as not to flood volunteers with such long jobs. The first batch of jobs was sent out with the usual delay_bound of ~1w. We then increased the parameter to 2w, in order to decrease the number of jobs killed because the deadline was not met (which wastes useful resources). For instance, the four tasks reported by Crystal Pellet in his thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4296&postid=39537#39537

w-c6_job.B1topenergy.b6onIRoff_c6.2052__1__s__62.31_60.32__6.1_8.1__7__70.5_1_sixvf_boinc106_2
w-c6_job.B1topenergy.b6onIRoff_c6.2052__1__s__62.31_60.32__4.1_6.1__7__10.5_1_sixvf_boinc7_0
w-c2_job.B2topenergy.b6onIRon_c2.2052__1__s__62.31_60.32__8.1_10.1__7__34.5_1_sixvf_boinc520_2
w-c2_job.B2topenergy.b6onIRon_c2.2052__1__s__62.31_60.32__6.1_8.1__7__63_1_sixvf_boinc490_2

belong to this second batch.

Concerning the WU with the errors reported by mmonnin https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4296&postid=39551#39551 and maeax https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4296&postid=39535#39535:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=120080417
all tasks had a report deadline 2w after the sent time stamp, and the two failing tasks failed way earlier than that deadline.
For https://lhcathome.cern.ch/lhcathome/result.php?resultid=238806725, the issue seems to be related to the rsc_fpops_bound parameter, whereas for the other task https://lhcathome.cern.ch/lhcathome/result.php?resultid=238806726, the issue might be something different - maybe a transient problem with the HD: https://boinc.mundayweb.com/wiki/index.php?title=Process_exited_with_code_2_(0x2,_-254)

A.
37) Message boards : Sixtrack Application : Server Status (Message 39648)
Posted 19 Aug 2019 by Alessio Mereghetti
Post:
Hi,
sorry for the late reply - just back from holidays.
Do you still see the issue? I have just put my PC online and got 35 SixTrack tasks straight away.
Cheers,
A.
38) Message boards : Number crunching : Validation Pendind since 02.JUN.2019 (Message 39469)
Posted 30 Jul 2019 by Alessio Mereghetti
Post:
Hi,
thanks to computezrmle for the correct replies.

Concerning the comment by Crystal Pellet:

What surely would help, is when a 'resend' (3rd, 4th wingman) is needed,
that special created task is placed in front of the queue and not at the end.
This is normal BOINC-practice, but not at LHC.

This would make life a lot easier for SixTrack users - let's see what the IT experts say.
Cheers,
A.
39) Message boards : Number crunching : LHC on android (Message 39444)
Posted 27 Jul 2019 by Alessio Mereghetti
Post:
Android is also beta for the sixtrack app - nevertheless, it works only if your android version is <8
If you ask for work, what does the log say?
40) Message boards : Sixtrack Application : Inconclusive, valid/invalid results (Message 39399)
Posted 20 Jul 2019 by Alessio Mereghetti
Post:
Hello, maeax,

thanks a lot for spotting this. At first glance I feared we had run into a corner case of a calculation not correctly coded, hence leading to different results on different platforms. Then, we checked by re-running the WU with the two exes - your result matches the linux one, as expected, whereas the windows one did not match the result from the other volunteer.
The windows and linux results match. Hence, we concluded that the other host most probably experienced a memory corruption not related to the code or the input files.

The wingman should confirm this.
Happy crunching!
A.

