Message boards : ATLAS application : ATLAS native app

maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,978
RAC: 139,751
Message 32093 - Posted: 25 Aug 2017, 20:33:29 UTC

Since 20:00 UTC the first task has been running on SL69!
I will see the result tomorrow, because the runtime shown is 11 hours.
Thank you, ATLAS team.
ID: 32093
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1127
Credit: 49,750,513
RAC: 9,551
Message 32095 - Posted: 26 Aug 2017, 3:40:20 UTC - in response to Message 32093.  

> Since 20:00 UTC the first task has been running on SL69!
> I will see the result tomorrow, because the runtime shown is 11 hours.
> Thank you, ATLAS team.


You're welcome, Axel.
Volunteer Mad Scientist For Life
ID: 32095
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,978
RAC: 139,751
Message 32096 - Posted: 26 Aug 2017, 5:21:03 UTC

Thanks Magic.

Overnight, the first three ATLAS tasks on SL69 finished.
BUT... each task earned fewer than 50 cobblestones.
Linux was first presented on 25 Aug 1991 :-).

Now I'll take a look at running it on openSUSE 42.x.

What is your favorite Linux, Magic?
ID: 32096
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 32101 - Posted: 26 Aug 2017, 7:53:05 UTC - in response to Message 32096.  

Sorry for the late response.

> Now I'll take a look at running it on openSUSE 42.x.

This will not work (you will get no native tasks), because right now it HAS to be either CentOS 7 or SL6.
Once they have fixed all remaining issues, it will be available for other Linux distros, but for now it is restricted to those 2. I guess they will post here once it is ready.
ID: 32101
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,978
RAC: 139,751
Message 32106 - Posted: 26 Aug 2017, 9:16:00 UTC

ID: 32106
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 32107 - Posted: 26 Aug 2017, 10:13:51 UTC - in response to Message 32106.  
Last modified: 26 Aug 2017, 10:23:39 UTC

Yes, I understood this post the same way you did. But I tried it on Debian (with working CVMFS and Singularity) and only received vbox tasks. Then I wrote here that only vbox tasks were being sent, and David replied with this:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4396&postid=31980
But I am curious what you will report :-)
ID: 32107
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,978
RAC: 139,751
Message 32108 - Posted: 26 Aug 2017, 10:41:42 UTC - in response to Message 32107.  
Last modified: 26 Aug 2017, 10:45:49 UTC

openSUSE is only an option for the future.
I noticed that it is a test version.
Did you uninstall VirtualBox on your Debian machine, as David wrote?

We can only test the new Linux app step by step.

Edit: These ATLAS tasks come from the production server.
ID: 32108
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 32110 - Posted: 26 Aug 2017, 12:20:19 UTC - in response to Message 32108.  

> Did you uninstall VirtualBox on your Debian machine, as David wrote?

Yes. But BOINC just complains that VirtualBox is not installed, and the server does not send anything.
ID: 32110
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1127
Credit: 49,750,513
RAC: 9,551
Message 32121 - Posted: 26 Aug 2017, 20:04:52 UTC - in response to Message 32096.  

> Thanks Magic.
>
> What is your favorite Linux, Magic?


Well, I like to test things, but since I live just NW of Microsoft Corporation, I am required to say I only run Windows OSes.

BUT I can't wait for the day when I can get unlimited-data high-speed internet. Running VB tasks uses up all 80GB of my monthly allowance in one week, so I get throttled down to about 2MBps for the last 3 weeks, and CERN and VB tend to have problems starting these VB tasks before the time limit, so they end up Invalid.

It takes 3MBps or better to be sure they will all start up and get beyond HTCondor Ping.

That is why I wish we had SixTrack tasks right now: you don't have to leave all the computers online to run them, so once I have loaded all mine up I can just unplug the Ethernet to make sure no data is lost (I do the same with Einstein GPUs).

So, since it is 1pm here, it is not easy to even start these VB tasks one at a time (same with my vLHC-dev tasks).

19 more days before I get to run at 45MBps download, so I hope we get a new batch of SixTracks.

(and the Atlas-alpha is even harder to get running)
Volunteer Mad Scientist For Life
ID: 32121
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,978
RAC: 139,751
Message 32123 - Posted: 27 Aug 2017, 16:25:37 UTC
Last modified: 27 Aug 2017, 16:27:34 UTC

The scratch folder grows to several GB.
After a few ATLAS tasks, the message is:
There is not enough space to start a new task in BOINC.
The entries in this scratch folder start with ax, gb, 4f...
After deleting them, it is possible to get a new task.
BUT at the moment the ATLAS production server has ZERO tasks. :-(

By the way, Magic, we hope your ISP hears your words.
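To see how large the scratch folder has grown before BOINC complains, a minimal Python sketch can add up the file sizes below it and compare the total with a quota given in MB. The helper names are hypothetical, the path /scratch/cvmfs is an assumption (it is the CVMFS_CACHE_BASE shown in Juha's configuration in this thread), treating MB as 1024 × 1024 bytes is our assumption, and reading the CVMFS cache may require root:

```python
import os

MB = 1024 * 1024  # assumption: treat the quota's "MB" as 1024 * 1024 bytes


def directory_size_bytes(path):
    """Sum the sizes of regular files below path; symlinks are not followed."""
    total = 0
    for root, _dirs, files in os.walk(path):  # a missing path yields nothing
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable while we were walking
    return total


def over_quota(path, quota_mb):
    """True if the files under path take more than quota_mb megabytes."""
    return directory_size_bytes(path) > quota_mb * MB


# Example with the hypothetical cache location (may need root to read):
# print(directory_size_bytes("/scratch/cvmfs") / MB, "MB")
```

Deleting the folders by hand, as described above, frees the space, but CVMFS also manages this cache itself within its configured quota.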
ID: 32123
Juha

Joined: 22 Mar 17
Posts: 30
Credit: 360,676
RAC: 0
Message 32126 - Posted: 27 Aug 2017, 20:05:54 UTC - in response to Message 32123.  

The scratch directory is CVMFS's cache directory; see the Cache Settings section of the CVMFS documentation. You can check your current configuration with:

cvmfs_config showconfig atlas.cern.ch
...
CVMFS_CACHE_BASE=/scratch/cvmfs    # from /etc/cvmfs/default.local
CVMFS_QUOTA_LIMIT=4096    # from /etc/cvmfs/default.local


For me, the difference in runtime between an empty cache and a cache holding all the needed files is about 40 minutes. I definitely want the files cached.
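To pick these values out in a script instead of by eye, a minimal Python sketch could parse the KEY=VALUE    # from FILE lines that cvmfs_config showconfig prints. The function name and the regular expression are our own assumptions, based only on the two output lines above:

```python
import re

# Matches lines such as: CVMFS_QUOTA_LIMIT=4096    # from /etc/cvmfs/default.local
LINE_RE = re.compile(r"^(CVMFS_[A-Z0-9_]+)=(.*?)\s*(?:#\s*from\s+(\S+))?$")


def parse_showconfig(text):
    """Return {key: (value, source_file_or_None)} for each CVMFS_* line."""
    settings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key, value, source = match.groups()
            settings[key] = (value, source)
    return settings


sample = """\
CVMFS_CACHE_BASE=/scratch/cvmfs    # from /etc/cvmfs/default.local
CVMFS_QUOTA_LIMIT=4096    # from /etc/cvmfs/default.local
"""

config = parse_showconfig(sample)
print(config["CVMFS_CACHE_BASE"][0])        # /scratch/cvmfs
print(int(config["CVMFS_QUOTA_LIMIT"][0]))  # 4096
```

Keeping the "from" file alongside each value shows which file to edit when you want to change a setting.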
ID: 32126
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,978
RAC: 139,751
Message 32127 - Posted: 28 Aug 2017, 8:37:54 UTC

Thank you for your answer, Juha.

I changed the quota from 4,096 to 2,048.

After cvmfs_config reload and sudo service autofs restart,
the message is still shown,
but not every time, and I also got tasks from the ATLAS production server together with this "need more space" message!
OK, I had changed the preferences from one task to two tasks. Maybe this is the reason.
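The quota change itself is a one-line edit in /etc/cvmfs/default.local, the file Juha's showconfig output points to. As a hedged sketch, the hypothetical helper set_config_value below rewrites (or appends) a KEY=VALUE line in the file's text; writing the result back needs root, and cvmfs_config reload is still needed afterwards:

```python
import re


def set_config_value(text, key, value):
    """Return text with the KEY=... line set to KEY=value (appended if missing)."""
    new_line = f"{key}={value}"
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function as the replacement avoids backslash escapes in value.
        return pattern.sub(lambda _match: new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


before = "CVMFS_CACHE_BASE=/scratch/cvmfs\nCVMFS_QUOTA_LIMIT=4096\n"
after = set_config_value(before, "CVMFS_QUOTA_LIMIT", 2048)
print(after, end="")
# CVMFS_CACHE_BASE=/scratch/cvmfs
# CVMFS_QUOTA_LIMIT=2048
```

Commented-out lines such as #CVMFS_QUOTA_LIMIT=... are left alone, because the pattern only matches lines that start with the key itself.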
ID: 32127
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2411
Credit: 226,361,197
RAC: 132,051
Message 32128 - Posted: 28 Aug 2017, 9:20:14 UTC

CVMFS can be configured to use a local proxy (Squid).
That way you can keep the quota low and serve the requests from the proxy's cache.
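For a local proxy, the relevant setting is CVMFS_HTTP_PROXY in /etc/cvmfs/default.local. The sketch below only builds that line. The URL http://localhost:3128 (Squid's default port on the same machine) is a hypothetical example, and the separators reflect our understanding of the CVMFS proxy syntax ("|" between load-balanced proxies in one group, ";" between failover groups, DIRECT for no proxy), so please check them against the CVMFS documentation:

```python
def proxy_setting(groups, fallback_direct=False):
    """Build a CVMFS_HTTP_PROXY line (assumed syntax, see lead-in).

    groups: list of lists of proxy URLs. Proxies inside one group are joined
    with "|", groups are joined with ";" and tried in order.
    """
    parts = ["|".join(group) for group in groups if group]
    if fallback_direct or not parts:
        parts.append("DIRECT")  # connect to the servers without a proxy
    return 'CVMFS_HTTP_PROXY="' + ";".join(parts) + '"'


print(proxy_setting([["http://localhost:3128"]]))
# CVMFS_HTTP_PROXY="http://localhost:3128"
print(proxy_setting([]))
# CVMFS_HTTP_PROXY="DIRECT"
```

With a Squid in front, the proxy cache absorbs repeated downloads, which is what lets the CVMFS quota stay small.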
ID: 32128
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,978
RAC: 139,751
Message 32129 - Posted: 28 Aug 2017, 12:41:03 UTC
Last modified: 28 Aug 2017, 13:20:53 UTC

I use Proxy=DIRECT, and that is OK for me.

With 4 CPUs, I ran two tasks with 2 CPUs per task.
They finished after 600 seconds with a 0x1 error.
Memory: 10,240 MB.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=153951224

After a reboot, I got the same error with ONE task and 2 CPUs.

After changing default.local to PROXY=DIRECT and reloading CVMFS, it runs.

After a reboot, cvmfs_config probe and cvmfs_config chksetup are needed to check whether something is wrong.
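The post-reboot checks can be scripted. A minimal Python sketch, assuming that cvmfs_config probe and cvmfs_config chksetup return a non-zero exit status when something is wrong (chksetup probably needs root, so run the script with sudo); the function names are hypothetical:

```python
import subprocess
import sys

# The two checks mentioned above.
CHECKS = [["cvmfs_config", "probe"], ["cvmfs_config", "chksetup"]]


def check_ok(cmd):
    """Run cmd and return True if it exits with status 0."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return False  # command not installed or not on PATH
    return result.returncode == 0


def failed_checks(checks=CHECKS):
    """Return the commands (as strings) that did not exit with status 0."""
    return [" ".join(cmd) for cmd in checks if not check_ok(cmd)]


def report(checks=CHECKS, stream=sys.stderr):
    """Print each failed check to stream; return True if all passed."""
    failed = failed_checks(checks)
    for cmd in failed:
        print(f"FAILED: {cmd}", file=stream)
    return not failed
```

Calling report() after boot, and only starting BOINC when it returns True, would avoid tasks failing because CVMFS is not ready.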
ID: 32129
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 242
Credit: 5,800,306
RAC: 0
Message 32131 - Posted: 29 Aug 2017, 6:42:21 UTC

I have one host where all my native ATLAS tasks fail with "validate error" after 6 minutes.

On my other host (my desktop PC), the ATLAS native app works fine. Both hosts are running CentOS 7 and have CvmFS and Singularity installed.

Our ATLAS friends suggested checking the home directory settings for Singularity; in fact, the account running BOINC had different home directories on the 2 boxes. I have now given the account on the PC with the failing tasks the same home directory settings as on the PC that runs the native app successfully. However, it still fails.

Any further hints on this from the ATLAS team? Shall I re-install Singularity?
ID: 32131
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32134 - Posted: 29 Aug 2017, 10:22:46 UTC - in response to Message 32131.  

Hi Nils,

This computer is only running vbox WU, and I suspect the validate error is due to not enough memory in the VM. The message "FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider" usually means running out of memory, and I've seen others say that the memory we specify for single-core tasks is not enough. I'll see if we can increase it.

This computer is running native tasks but they are failing because it looks like cvmfs is not running:

Checking for CVMFS
check cvmfs return values are 256, 256
CVMFS not found, aborting
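The "256, 256" values look like raw Unix wait statuses rather than plain exit codes. We have not seen the wrapper script, so this is an assumption: if the check used something like Python's os.system on Linux, 256 decodes to exit code 1, meaning the CVMFS check command ran but reported a failure. A minimal decoder:

```python
def exit_code_from_wait_status(status):
    """Decode a raw Unix wait status (as returned by os.system on Linux).

    The low 7 bits hold the terminating signal (0 = exited normally);
    the exit code sits in the next byte.
    """
    signal_number = status & 0x7F
    if signal_number:
        return -signal_number  # killed by a signal, subprocess-style
    return (status >> 8) & 0xFF


print(exit_code_from_wait_status(256))  # 1
print(exit_code_from_wait_status(0))    # 0
```

On Python 3.9 and later, os.waitstatus_to_exitcode performs the same conversion.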
ID: 32134
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 242
Credit: 5,800,306
RAC: 0
Message 32150 - Posted: 31 Aug 2017, 6:20:39 UTC - in response to Message 32134.  

Hi David,

Thanks. The first node used to run vbox tasks happily, but it ran out of memory, as you said. It has also run the native app successfully in the past.

I am not sure why the /cvmfs mount was lost on the second host. After re-installing CVMFS and Singularity on Tuesday, I now get a different error on this host:

Job failed: Non-zero failed job return code: 65

Ref. result: 154053235.
ID: 32150
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,978
RAC: 139,751
Message 32154 - Posted: 31 Aug 2017, 7:27:50 UTC
Last modified: 31 Aug 2017, 7:43:18 UTC

The boinc_conf under atlasathome shows a change of BOINC (7.5.1) on 26 Aug 2017 at 6 UTC.

I have SL69 on two PCs, and one of them runs with only one CPU instead of the two defined in the preferences.

The SL69 installation is identical on both.

This BOINC (7.5.1) also appears in boinc_conf as the old one.

Edit: This SL69 host has finished work with low cobblestones.

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10496079
ID: 32154
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32156 - Posted: 31 Aug 2017, 7:38:46 UTC - in response to Message 32150.  

> Job failed: Non-zero failed job return code: 65
>
> Ref. result: 154053235.


This is the same error as the vbox tasks so I think it's a memory problem again. This machine has only 3448MB of memory, we should probably require at least 4GB for native tasks.
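A quick way to check a Linux host against the "at least 4GB" suggestion is to read MemTotal from /proc/meminfo. The 4096 MB threshold and the helper names are hypothetical, and the sample line is made up so that it corresponds to the 3448MB host above:

```python
REQUIRED_MB = 4096  # hypothetical threshold based on the "at least 4GB" suggestion


def mem_total_mb(meminfo_text):
    """Return MemTotal in MB (1 MB = 1024 kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) // 1024  # the kernel reports kB
    raise ValueError("MemTotal not found")


def enough_memory(meminfo_text, required_mb=REQUIRED_MB):
    return mem_total_mb(meminfo_text) >= required_mb


def read_meminfo(path="/proc/meminfo"):
    with open(path) as handle:
        return handle.read()


sample = "MemTotal:        3530752 kB\n"  # made-up line equal to 3448 MB
print(mem_total_mb(sample), enough_memory(sample))  # 3448 False
```

On a real host, enough_memory(read_meminfo()) performs the same check against the live values.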
ID: 32156
maeax

Joined: 2 May 07
Posts: 2099
Credit: 159,815,978
RAC: 139,751
Message 32160 - Posted: 31 Aug 2017, 10:26:39 UTC - in response to Message 32154.  

> I have SL69 on two PCs, and one of them runs with only one CPU instead of the two defined in the preferences.


Sorry, my fault. During the installation of the third PC with SL69 today,
I changed the processor usage for that PC to 95% instead of 100%.
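This would also explain the one-CPU behaviour if that PC has 2 cores. Assuming the BOINC client rounds the product of the CPU count and the "use at most X% of the CPUs" preference down, and never goes below 1 (our understanding of the client, not something stated in this thread), 95% of 2 CPUs leaves a single usable CPU. The function name is hypothetical:

```python
def usable_cpus(n_cpus, max_ncpus_pct):
    """Approximate how many CPUs the BOINC client will use.

    Assumes the client truncates n_cpus * pct / 100 and never goes below 1.
    """
    n = int(n_cpus * max_ncpus_pct / 100)
    return max(n, 1)


for cores in (2, 4):
    print(cores, usable_cpus(cores, 100), usable_cpus(cores, 95))
# 2 2 1
# 4 4 3
```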
ID: 32160

©2024 CERN