21) Questions and Answers : Getting started : Issues changing email address (Message 35244)
Posted 12 May 2018 by captainjack
Post:
I can't change my email address here or at the test site either. Maybe when they fix it here, it will work there too.
22) Message boards : ATLAS application : Download failures (Message 32824)
Posted 13 Oct 2017 by captainjack
Post:
The task fetch seems to ignore the parameter for "Max # CPUs". For computer 10476963 the Max # CPUs was changed to 2, but the server keeps sending 4 core tasks. The client_state.xml says

<app_version>
<app_name>ATLAS</app_name>
<version_num>101</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>4.000000</avg_ncpus>
<max_ncpus>2.000000</max_ncpus>


Seems odd that the max_ncpus is 2, but the avg_ncpus is 4.

Computer is using the default preferences.

Please let me know if I can provide more information.
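
For anyone who wants to check their own client_state.xml for the same mismatch, here is a rough Python sketch. It assumes the values shown above; the closing </app_version> tag is added only so the fragment parses (the excerpt above is cut off), and the function name ncpus_mismatch is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the excerpt in the post. The closing tag is an
# addition so the snippet parses; the original excerpt is truncated.
FRAGMENT = """
<app_version>
    <app_name>ATLAS</app_name>
    <version_num>101</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>4.000000</avg_ncpus>
    <max_ncpus>2.000000</max_ncpus>
</app_version>
"""


def ncpus_mismatch(xml_text):
    """Return (avg_ncpus, max_ncpus) if avg exceeds max, else None."""
    node = ET.fromstring(xml_text.strip())
    avg = float(node.findtext("avg_ncpus"))
    cap = float(node.findtext("max_ncpus"))
    return (avg, cap) if avg > cap else None


print(ncpus_mismatch(FRAGMENT))
```

With the values from the post this reports the pair (4.0, 2.0), which is the inconsistency described above.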
23) Message boards : Number crunching : Less boinc credits than on other projects? (Message 30502)
Posted 26 May 2017 by captainjack
Post:
RaimundD,

Just to make sure you know, WCG takes the BOINC points and multiplies them by 7 to get WCG points. If you want to know how many BOINC points you get at WCG, you can check one of the accumulator web sites like boincstats.
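
As a quick illustration of that conversion, here is a small Python sketch. The factor of 7 is the one stated above; the function name and the 7000-point figure are just examples.

```python
# Conversion factor stated in the post: WCG points = BOINC points * 7.
WCG_POINTS_PER_BOINC_POINT = 7


def wcg_points_to_boinc_points(wcg_points):
    """Convert a WCG points total back to BOINC points."""
    return wcg_points / WCG_POINTS_PER_BOINC_POINT


print(wcg_points_to_boinc_points(7000))  # 1000.0
```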
24) Message boards : ATLAS application : New app version 1.01 (Message 29178)
Posted 10 Mar 2017 by captainjack
Post:
Just tried one on Linux. Task ran for about 20 minutes then got this:

2017-03-10 12:49:18 (8776): Guest Log: - Last 10 lines from /home/atlas01/RunAtlas/Panda_Pilot_5904_1489171051/PandaJob_3273309522_1489171055/athena_stdout.txt -
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.preExecute 2017-03-10 12:38:27,950 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.preExecute 2017-03-10 12:38:27,952 INFO Now writing wrapper for substep executor EVNTtoHITS
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2017-03-10 12:38:27,952 INFO Valgrind not engaged
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.preExecute 2017-03-10 12:38:27,952 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.execute 2017-03-10 12:38:27,952 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.execute 2017-03-10 12:46:25,442 INFO EVNTtoHITS executor returns 65
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.validate 2017-03-10 12:46:26,351 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (65) (Error code 65)
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.trfExe.validate 2017-03-10 12:46:26,365 INFO Scanning logfile log.EVNTtoHITS for errors
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.transform.execute 2017-03-10 12:46:26,588 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"
2017-03-10 12:49:18 (8776): Guest Log: PyJobTransforms.transform.execute 2017-03-10 12:46:29,792 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider")

Task number 124796186

Let me know if you need more info.
25) Message boards : LHCb Application : Low CPU usage (Message 28087)
Posted 8 Dec 2016 by captainjack
Post:
Getting nothing but these error messages.

2016-12-08 15:01:50 (22444): Guest Log: [INFO] Job finished in slot1 with unknown exit code.


And no CPU usage.

Turning these off until I hear that they are working again.
26) Message boards : LHCb Application : Condor exited after 608s without running a job (Message 27989)
Posted 28 Nov 2016 by captainjack
Post:
Looks like it is working for me now. The task has made it past the 608 second mark and is using a full CPU thread.

Thanks for getting the image updated.

Will post again if anything changes.
27) Message boards : Number crunching : "New" project, old problem (LHCb) (Message 27971)
Posted 27 Nov 2016 by captainjack
Post:
jjv,

Yes it is a known problem and has been reported on the "LHCb Application" topic. The virtual machine can't communicate with the HTCondor server so it waits 600+ seconds then aborts. My recommendation would be to turn it off and monitor this post https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4014&postid=27898 to see when the project admins get it fixed.
28) Message boards : Sixtrack Application : Very low CPU-usage on Windows with SixTrack tasks (Message 27968)
Posted 26 Nov 2016 by captainjack
Post:
I noticed something similar on my Windows 10 machine.

When a Sixtrack task started, it used about 40% of a thread until the task was ~7% complete (~3 minutes). Then the % complete jumped back to almost 0% and it started using 100% of a thread.
29) Message boards : LHCb Application : Condor exited after 608s without running a job (Message 27951)
Posted 25 Nov 2016 by captainjack
Post:
Just tried another one and it looks like the problem still exists. Task number 108722165.

2016-11-24 19:16:39 (3560): Guest Log: [DEBUG] HTCondor ping
2016-11-24 19:16:49 (3560): Guest Log: [DEBUG] 0
2016-11-24 19:27:10 (3560): Guest Log: [ERROR] Condor exited after 627s without running a job.
2016-11-24 19:27:10 (3560): Guest Log: [INFO] Shutting Down.
2016-11-24 19:27:10 (3560): VM Completion File Detected.
2016-11-24 19:27:10 (3560): VM Completion Message: Condor exited after 627s without running a job.


Let me know if you need more information.
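
If you have a lot of these logs to go through, something like the following Python sketch can pull out the wait times. It only assumes the message format shown in the log lines above; the pattern and function name are illustrative and not part of BOINC or HTCondor.

```python
import re

# Sample lines copied from the log in the post.
LOG = """\
2016-11-24 19:16:39 (3560): Guest Log: [DEBUG] HTCondor ping
2016-11-24 19:16:49 (3560): Guest Log: [DEBUG] 0
2016-11-24 19:27:10 (3560): Guest Log: [ERROR] Condor exited after 627s without running a job.
2016-11-24 19:27:10 (3560): Guest Log: [INFO] Shutting Down.
"""

# Matches only the Guest Log error line, not the VM Completion Message.
PATTERN = re.compile(r"\[ERROR\] Condor exited after (\d+)s without running a job")


def condor_idle_exits(log_text):
    """Return the wait time in seconds for each matching error line."""
    return [int(m.group(1)) for m in PATTERN.finditer(log_text)]


print(condor_idle_exits(LOG))  # [627]
```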
30) Message boards : LHCb Application : Condor exited after 608s without running a job (Message 27920)
Posted 23 Nov 2016 by captainjack
Post:
Still no response from HTCondor. That must be why the average run time for all Beauty tasks is 0.16 hours.

Time to turn this one off for a while.
31) Message boards : News : LHC@home consolidation (Message 27912)
Posted 22 Nov 2016 by captainjack
Post:
Crystal Pellet and Laurence,

Thanks for the clarification. I had completely misinterpreted the usage for the "Max # CPUs" parameter. Now I know how I will have to set up my app_config.xml.
32) Message boards : News : LHC@home consolidation (Message 27904)
Posted 22 Nov 2016 by captainjack
Post:
On the profile preferences, there is a parameter for "Max # CPUs". What is that parameter supposed to do?

One of my profiles is set for Max # jobs = 3 and Max # CPUs = 1. There are 3 tasks downloaded and all three of them are running.

Thanks for the insight.
33) Questions and Answers : Sixtrack : VirtualBox is not installed (Message 27886)
Posted 19 Nov 2016 by captainjack
Post:
CERN has decided to merge all of their volunteer projects into one umbrella project and the new merged project was previously the sixtrack project. You can read about that here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4002&postid=27816

The new and improved project will include virtualbox projects along with the sixtrack project, which is not a virtualbox project.

If you go to the new project web site, and go to your preferences, you can see the other projects that will be included. Many of those projects require virtualbox.

Apparently, the new BOINC server checks to see if virtualbox is installed even if you do not have any of the virtualbox projects selected.

Your question was also asked by another person in a different thread. Apparently no one has been able to figure out why BOINC checks to see if you have virtualbox installed if you do not have any virtualbox projects selected.

Hope that helps.
34) Message boards : News : LHC@home consolidation (Message 27851)
Posted 15 Nov 2016 by captainjack
Post:
Ah, now I can see the options in Windows. I was taking the old link and putting https in front of it. That gets a security certificate error. The https web site has a different address. When I linked from BOINC, it worked. I have updated my favorite link to the new address.
35) Message boards : News : LHC@home consolidation (Message 27848)
Posted 15 Nov 2016 by captainjack
Post:
When I access the web site using Ubuntu/Firefox, I can see options to let me select applications that I want to run, number of tasks to download at one time and number of tasks to run at one time for each profile.

When I access the web site using Windows 10, I do not see any of those options.

When I access the web site using Windows 10 and use the https:// address, the browser shows a Security Certificate Error.
36) Message boards : Number crunching : Wrong applications sent to my computer? (Message 27508)
Posted 8 Jun 2015 by captainjack
Post:
Jim Martin,

I just removed the LHC@Home project and re-added it using both methods and they both worked fine for me.

I added it first using the "LHC@Home" listing in the pick-list under tools -> add project. As soon as it was added, it said that the master file was downloaded. I looked in the BOINC\projects\lhcathomeclassic.cern.ch_sixtrack folder and didn't see anything. I went ahead and removed it again and added it back in using the http://lhcathomeclassic.cern.ch/sixtrack address and it was re-attached. It said the master file was downloaded but I didn't see anything in the BOINC\projects\lhcathomeclassic.cern.ch_sixtrack folder. Then it downloaded a task, the task appeared in the project folder and now the task is running.

Please try to add it again and if it still doesn't show up, post the related messages from your event log. Maybe that will help give us some clues as to why it is not adding.

As a general question, are you able to add any other projects?
37) Message boards : Number crunching : Wrong applications sent to my computer? (Message 27506)
Posted 7 Jun 2015 by captainjack
Post:
Jim Martin,

If you open BOINC Manager, then "Tools", then "Add Project" it should bring up another window where you can select "Add Project" and click "Next". Then a list of projects you can pick from should pop up where you can choose "LHC@Home".

Or if you want to type the name in yourself, use "http://lhcathomeclassic.cern.ch/sixtrack".

Hope that helps.
38) Message boards : Number crunching : Computation error on network loss (Message 25698)
Posted 24 Aug 2013 by captainjack
Post:
Looks like you are running Linux and got a signal 11 error. The same thing happens over at WCG. Some of their projects will abort with a signal 11 and some keep running. Some people believe it is as much of a Linux problem as it is a BOINC/research problem. It never happens on my Windows 7 boxes.

Many people are reluctant to switch to Windows for a variety of reasons, one of which is that Linux is reputed to be 10-15% faster than Windows. For some of the WCG projects, Linux is about twice as fast as Windows.

So we keep running Linux knowing that it will have an occasional hiccup but there will be an overall speed gain.

Hope that helps.
39) Message boards : Number crunching : Computation errors (Message 25585)
Posted 15 May 2013 by captainjack
Post:
Hi jelle,

Are you running Linux? Since you are using BOINC 7.0.65, I'm guessing that is the case.

On my Linux Ubuntu machines, there are a couple of things I know of that can cause a signal 11 error.

1. There were network problems (lost communications to the internet or other such issues) and BOINC gets a signal 11 error. Not much you can do about that one. Happened to me a couple of days ago and I had 3 jobs error out. If this is the cause, you should be able to look at the "Event Log" and see the network error messages.

2. When Linux gets busy doing something else and BOINC doesn't get any CPU cycles for a while, BOINC can get a signal 11 and error out the task. The recommended solution for that one is to go into your "Computing preferences" under "Your account". In the section for "Processor Usage", find the parameter "Suspend work when NON-BOINC CPU usage is above" and set it to 35%. That is the way my profile is set and I haven't had any of those errors in a while.

Hope that helps,
CaptainJack
40) Message boards : Number crunching : power glitch causes signal 11 error? (Message 25469)
Posted 7 Mar 2013 by captainjack
Post:
Yep, that's a known problem with Linux.

I've had my Ubuntu machines get that error when there was an internet connection problem. Windows machines would continue to work just fine.

The BOINC people know about it and the WCG team knows about it.

AFAIK, we are still waiting for a fix.



©2024 CERN