Message boards : ATLAS application : ATLAS issues
Masky
Joined: 29 Mar 18
Posts: 8
Credit: 270,958
RAC: 0
Message 35407 - Posted: 1 Jun 2018, 12:36:51 UTC

Hello people,

For a few days now I have had issues with ATLAS tasks.
Do you guys know what's going on?
https://imgur.com/a/EKfooOs
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,760,567
RAC: 232,601
Message 35411 - Posted: 1 Jun 2018, 21:35:01 UTC - in response to Message 35407.  

It happens every now and again for me; I just abort them and it goes back to normal.
Masky
Joined: 29 Mar 18
Posts: 8
Credit: 270,958
RAC: 0
Message 35412 - Posted: 1 Jun 2018, 23:11:14 UTC - in response to Message 35411.  

Every ATLAS task ends up the same way: it keeps being rescheduled to restart later, and I can't manage it.
No other projects get loaded either, so the PC sits idle all night.
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,760,567
RAC: 232,601
Message 35414 - Posted: 2 Jun 2018, 8:57:02 UTC

I agree it's irritating. You could try downgrading to the 5.1.x branch of VirtualBox; it seems more reliable.
wiseguy
Joined: 1 Nov 05
Posts: 1
Credit: 291,028
RAC: 0
Message 35441 - Posted: 6 Jun 2018, 15:52:05 UTC

ATLAS is simply not able to download anything. It worked earlier.
Error:

2018.06.06. 17:44:47 | ATLAS@home | [error] No scheduler URLs found in master file

What can I do?
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,947,974
RAC: 137,227
Message 35442 - Posted: 6 Jun 2018, 15:59:49 UTC - in response to Message 35441.  

ATLAS is simply not able to download anything. It worked earlier.
Error:

2018.06.06. 17:44:47 | ATLAS@home | [error] No scheduler URLs found in master file

What can I do?

You used a retired URL.
Reconnect to this one:
https://lhcathome.cern.ch/lhcathome/
JamesF
Joined: 4 Feb 18
Posts: 1
Credit: 552,252
RAC: 0
Message 35474 - Posted: 10 Jun 2018, 13:00:22 UTC - in response to Message 35412.  

I am seeing the same issue. Did you find a resolution at all?
PHILIPPE
Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 35476 - Posted: 10 Jun 2018, 14:50:54 UTC - in response to Message 35474.  

Hello, this may be a communication error between vboxwrapper and VirtualBox.

ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time

BOINC will be notified that it needs to clean up the environment.
This is a temporary problem and so this job will be rescheduled for another time.


To solve it :
Open the VirtualBox Manager and go to File > Virtual Media Manager. In that window you may find some VDIs that need to be removed, since they can interfere with new tasks trying to get a slot.

[Screenshots: what you do not want to see in the media list, with the good and the bad entries there.]
Erase all the VDIs marked with a yellow triangle and keep only the one with a green triangle. Don't touch the others.
This will clean up your environment.

Sometimes you have to delete some of them manually in the VirtualBox Manager, when BOINC fails to delete the VDIs.
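If you prefer a terminal to the Media Manager, `VBoxManage list hdds` prints one block of `Key: value` lines per registered disk image. The Python sketch below lists candidates for cleanup, assuming that the yellow-triangle entries are the ones whose `State:` field reads `inaccessible` (that mapping is an assumption, so compare with the GUI before removing anything). `parse_hdds` and `inaccessible_disks` are illustrative names, and the script deletes nothing.

```python
import re
import shutil
import subprocess

def parse_hdds(text):
    """Split `VBoxManage list hdds` output into one dict per disk image."""
    disks = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")  # split on the first colon only (keeps C:/... intact)
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            disks.append(entry)
    return disks

def inaccessible_disks(text):
    """Locations of images whose State is 'inaccessible' (assumed = yellow triangle in the GUI)."""
    return [d.get("Location", "?") for d in parse_hdds(text)
            if d.get("State", "").lower() == "inaccessible"]

if __name__ == "__main__" and shutil.which("VBoxManage"):
    # Run as the same user that runs BOINC, otherwise you see a different media registry.
    out = subprocess.run(["VBoxManage", "list", "hdds"],
                         capture_output=True, text=True, check=True).stdout
    for location in inaccessible_disks(out):
        print("candidate for removal:", location)
```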
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,947,974
RAC: 137,227
Message 35478 - Posted: 10 Jun 2018, 16:06:54 UTC - in response to Message 35474.  

Some basic thoughts.

Your computer page shows that you run a 4-core system with 2 GPUs (NVIDIA + INTEL).
If your GPUs run on full load, your system needs 2 CPU cores to support the GPUs.
Thus you will have only 2 CPU cores left for other work.
Your task overview shows that you run ATLAS using a 4-core setup.
This may sooner or later cause timing problems.

I suggest trying a 2-core setup for ATLAS.
If you do so, you should also use an app_config.xml to avoid errors caused by the default RAM setting being too low.
<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
    <report_results_immediately/>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 4800</cmdline>
  </app_version>
</app_config>
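The same file can be generated for other core counts. This Python sketch assumes, as in the post, that each fully loaded GPU ties up one CPU core (4 cores - 2 GPU cores = 2 threads for ATLAS). It takes the RAM value as an explicit parameter because the post only gives 4800 MB for the 2-thread case. `build_app_config` is a hypothetical helper; the output still has to be saved as app_config.xml in the project's folder inside the BOINC data directory. `ET.indent` needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET

def build_app_config(total_cores, gpu_support_cores, memory_mb):
    """Return app_config.xml text for a multi-core ATLAS setup like the one above."""
    # Cores left for CPU work once each busy GPU has its own feeder core; never below 1.
    nthreads = max(1, total_cores - gpu_support_cores)
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "ATLAS"
    ET.SubElement(app, "max_concurrent").text = "1"
    ET.SubElement(app, "report_results_immediately")
    ver = ET.SubElement(root, "app_version")
    ET.SubElement(ver, "app_name").text = "ATLAS"
    ET.SubElement(ver, "plan_class").text = "vbox64_mt_mcore_atlas"
    ET.SubElement(ver, "avg_ncpus").text = f"{float(nthreads):.1f}"
    ET.SubElement(ver, "cmdline").text = f"--nthreads {nthreads} --memory_size_mb {memory_mb}"
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")

# The setup from the post: 4 cores, 2 of them feeding GPUs, 4800 MB RAM.
print(build_app_config(4, 2, 4800))
```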




Error 1:
"Vboxwrapper lost communication with VirtualBox"
"BOINC will be notified that it needs to clean up the environment"

This is mostly caused by:
- a crash
- unclean shutdown
- ...

Each of these can leave some old files behind in a "slot" folder, or leave the links to those files in the VirtualBox control files.
You may:
1. stop your BOINC client
2. run your VirtualBox GUI (use the same user that usually runs BOINC!)
3. Remove all VMs that are located in a BOINC slot but not "running" or "paused"
4. restart your BOINC client (better: restart your computer)
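Step 3 can also be checked from a terminal. The Python sketch below compares `VBoxManage list vms` with `VBoxManage list runningvms` and prints registered VMs that are not running. It assumes BOINC's VMs can be recognised by a `boinc_` name prefix (an assumption; the steps above identify them by their location in a slot folder), and whether paused VMs show up in the running list is worth checking on your VirtualBox version. It only prints candidates; the removal itself is still done in the GUI.

```python
import re
import shutil
import subprocess

# One line of `VBoxManage list vms` output: "name" {uuid}
VM_LINE = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]+)\}$')

def parse_vm_list(text):
    """Parse `VBoxManage list vms` / `list runningvms` output into {uuid: name}."""
    vms = {}
    for line in text.splitlines():
        m = VM_LINE.match(line.strip())
        if m:
            vms[m.group("uuid")] = m.group("name")
    return vms

def stale_boinc_vms(all_vms_text, running_vms_text, prefix="boinc_"):
    """Names of registered VMs with the (assumed) BOINC prefix that are not currently running."""
    running = parse_vm_list(running_vms_text)
    return sorted(name for uuid, name in parse_vm_list(all_vms_text).items()
                  if name.startswith(prefix) and uuid not in running)

def vbox(*args):
    """Run VBoxManage with the given arguments and return its standard output."""
    return subprocess.run(["VBoxManage", *args],
                          capture_output=True, text=True, check=True).stdout

if __name__ == "__main__" and shutil.which("VBoxManage"):
    # Step 1 first: stop the BOINC client. Step 2: run this as the user that runs BOINC.
    for name in stale_boinc_vms(vbox("list", "vms"), vbox("list", "runningvms")):
        print("not running, candidate for removal:", name)
```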



If these errors appear again, consider repeating the cleanup and downgrading to the most recent VirtualBox 5.1.x.
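To confirm which branch is installed before and after a downgrade, the string printed by `VBoxManage --version` can be checked. A minimal Python sketch, assuming the output starts with major.minor.patch (typically followed by `r` and a revision number); the function names are illustrative:

```python
import re
import shutil
import subprocess

def parse_vbox_version(text):
    """Turn `VBoxManage --version` output like '5.1.38r...' into a (major, minor, patch) tuple."""
    m = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if not m:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in m.groups())

def on_51_branch(text):
    """True if the installed VirtualBox belongs to the 5.1.x branch."""
    return parse_vbox_version(text)[:2] == (5, 1)

if __name__ == "__main__" and shutil.which("VBoxManage"):
    out = subprocess.run(["VBoxManage", "--version"], capture_output=True, text=True).stdout
    print(out.strip(), "-> 5.1.x branch" if on_51_branch(out) else "-> not on 5.1.x")
```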



Error 2: Image files marked with a yellow triangle.
This is nasty (I also have lots of them) but not responsible for the errors you notice.
Just clean up the list from time to time as described by PHILIPPE.
Rabinovitch
Joined: 11 May 07
Posts: 23
Credit: 3,631,975
RAC: 0
Message 35512 - Posted: 13 Jun 2018, 15:45:17 UTC

Guys, why are 100% of the tasks on this host ending with errors?
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 35513 - Posted: 13 Jun 2018, 15:56:30 UTC - in response to Message 35512.  

Guys, why are 100% of the tasks on this host ending with errors?

Do you have CVMFS installed and configured?
Checking for CVMFS
ls: cannot access '/cvmfs/atlas.cern.ch/repo/sw': No such file or directory
cvmfs_config doesn't exist, check cvmfs with cmd ls /cvmfs/atlas.cern.ch/repo/sw
ls /cvmfs/atlas.cern.ch/repo/sw failed,aborting the jobs
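The log shows the check that failed: the task tries to list /cvmfs/atlas.cern.ch/repo/sw and aborts if it can't. A minimal Python sketch of the same test, assuming a Linux host where CVMFS repositories appear under /cvmfs (`cvmfs_repo_ok` is a made-up name), lets you verify the mount before fetching new ATLAS tasks:

```python
import os

def cvmfs_repo_ok(path="/cvmfs/atlas.cern.ch/repo/sw"):
    """Roughly the task's startup check: can the ATLAS software area be listed?"""
    try:
        os.listdir(path)
        return True
    except OSError as exc:  # e.g. FileNotFoundError, as in the log above
        print(f"ls {path} failed: {exc}")
        return False

if __name__ == "__main__":
    print("CVMFS looks usable" if cvmfs_repo_ok() else "CVMFS not usable; ATLAS tasks would abort")
```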
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,947,974
RAC: 137,227
Message 35514 - Posted: 13 Jun 2018, 16:05:13 UTC - in response to Message 35512.  

Looks like a CVMFS error:
Checking for CVMFS
cvmfs_config doesn't exist, check cvmfs with cmd ls /cvmfs/atlas.cern.ch/repo/sw
ls /cvmfs/atlas.cern.ch/repo/sw failed,aborting the jobs


You may check your CVMFS installation.
Then run:
"sudo cvmfs_config wipecache"
"cvmfs_config probe"

What is the output of "cvmfs_config probe"?
Rabinovitch
Joined: 11 May 07
Posts: 23
Credit: 3,631,975
RAC: 0
Message 35524 - Posted: 14 Jun 2018, 9:11:44 UTC - in response to Message 35513.  

Guys, why are 100% of the tasks on this host ending with errors?

Do you have CVMFS installed and configured?


Now I do:


deti@DetiPC ~ $ sudo cvmfs_config chksetup
OK
deti@DetiPC ~ $ cvmfs_config probe
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
deti@DetiPC ~ $ 


Let's see if this helps...
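To check the probe result from a script (for example before letting BOINC fetch ATLAS work), a small parser is enough. This sketch assumes the "Probing /cvmfs/<repo>... OK" line format shown above and treats the three repositories probed there as the required set; `failed_repos` is a hypothetical helper, and a repository missing from the output counts as failed.

```python
import re
import shutil
import subprocess

# The three repositories probed in the output above (assumed to be the ones ATLAS needs).
REQUIRED = ("atlas.cern.ch", "atlas-condb.cern.ch", "grid.cern.ch")
PROBE_LINE = re.compile(r"^Probing /cvmfs/(?P<repo>\S+?)\.\.\. (?P<status>\w+)")

def failed_repos(probe_output, required=REQUIRED):
    """Return the required repositories that were not reported as OK."""
    status = {}
    for line in probe_output.splitlines():
        m = PROBE_LINE.match(line.strip())
        if m:
            status[m.group("repo")] = m.group("status")
    return [repo for repo in required if status.get(repo) != "OK"]

if __name__ == "__main__" and shutil.which("cvmfs_config"):
    out = subprocess.run(["cvmfs_config", "probe"], capture_output=True, text=True).stdout
    bad = failed_repos(out)
    print("all repositories OK" if not bad else "not OK: " + ", ".join(bad))
```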
gyllic
Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35525 - Posted: 14 Jun 2018, 11:47:11 UTC - in response to Message 35524.  
Last modified: 14 Jun 2018, 12:03:03 UTC

cvmfs should now work.

Your most recent tasks show that singularity is not installed.
If your OS is not SLC6 (which is obviously the case) you also have to install singularity:
https://singularity.lbl.gov/

When singularity is working you should finally be good to go.
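A quick way to confirm the singularity step is to look for the executable and ask it for its version. This is a minimal Python sketch, assuming the binary is installed on the PATH under the name `singularity`; it only shows that the tool is present, not that containers actually run.

```python
import shutil
import subprocess

def singularity_version():
    """Return the output of `singularity --version`, or None if it is not on the PATH."""
    exe = shutil.which("singularity")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None

if __name__ == "__main__":
    version = singularity_version()
    print(version if version else "singularity not found on PATH")
```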
Rabinovitch
Joined: 11 May 07
Posts: 23
Credit: 3,631,975
RAC: 0
Message 35530 - Posted: 16 Jun 2018, 4:25:26 UTC - in response to Message 35525.  
Last modified: 16 Jun 2018, 4:27:24 UTC


Your most recent tasks show that singularity is not installed.
If your OS is not SLC6 (which is obviously the case) you also have to install singularity:
https://singularity.lbl.gov/

When singularity is working you should finally be good to go.


Finally it works... What a complicated project to participate in! I highly doubt there will be many volunteers ready to go through that whole quest with cvmfs and singularity just to help scientists for free...

What is SLC6?
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,130,430
RAC: 104,897
Message 35531 - Posted: 16 Jun 2018, 5:11:12 UTC - in response to Message 35530.  

SLC6 = Scientific Linux CERN 6 (e.g. version 6.9).
CentOS, like SL 6.9, also needs no Singularity installation.
Gloria Cicconofri
Joined: 13 Apr 18
Posts: 9
Credit: 35,148
RAC: 0
Message 36502 - Posted: 19 Aug 2018, 21:54:34 UTC

Hi everyone!
Sorry to bother, but I can't understand what's happening here. Lately I've been receiving a fairly small number of tasks from LHC, and all of them, after a few hours of computing, gave the exact same result: error while computing.
I can't really understand what's wrong. I'm using BOINC for other projects, and none of them has this problem. Is there a way you can help me fix it?
Thanks a lot!
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,502,974
RAC: 4,007
Message 36503 - Posted: 19 Aug 2018, 22:26:03 UTC - in response to Message 36502.  

Hi everyone!
Sorry to bother, but I can't understand what's happening here. Lately I've been receiving a fairly small number of tasks from LHC, and all of them, after a few hours of computing, gave the exact same result: error while computing.
I can't really understand what's wrong. I'm using BOINC for other projects, and none of them has this problem. Is there a way you can help me fix it?
Thanks a lot!


Your tasks say you ran out of disk space, so maybe try adjusting your BOINC Manager settings:

Options - Computing Preferences - Disk and Memory

Try that and see if it helps.
Gloria Cicconofri
Joined: 13 Apr 18
Posts: 9
Credit: 35,148
RAC: 0
Message 36505 - Posted: 19 Aug 2018, 22:45:50 UTC - in response to Message 36503.  
Last modified: 19 Aug 2018, 22:46:12 UTC

Thanks a lot! As soon as I get a new ATLAS task I'll let you know whether it worked.
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36506 - Posted: 19 Aug 2018, 23:26:08 UTC - in response to Message 36503.  

Your tasks say you ran out of disk space, so maybe try adjusting your BOINC Manager settings:

Yes, it is a disk space problem, but not the kind you are thinking of. It's the "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED" error, which seems to fool a lot of people. Maybe the solution you offered will work, but it's not likely.

Harri Liljeroos explains the cause of this error thoroughly in https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4773&postid=36371#36371. Note that it is not a problem of BOINC not having enough disk space assigned in the user preferences.

The solution comes from working out why <rsc_disk_bound>xxx</rsc_disk_bound> is being exceeded. Possible reasons include (but are not limited to):
1) ATLAS tasks being pre-empted by other projects/tasks, causing a very large snapshot file to be saved in the slot folder
2) old snapshots or other garbage left behind by previous tasks and never deleted
3) a combination of 1) and 2)
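To see how close a slot is to that limit, you can compare the size of the slot folder with the task's rsc_disk_bound. This Python sketch assumes the bound is read from the `<workunit>` entries in client_state.xml (in the BOINC data folder) and is given in bytes; `folder_bytes`, `disk_bounds` and `usage_fraction` are illustrative names, and matching a slot folder to its workunit is left to you.

```python
import os
import xml.etree.ElementTree as ET

def folder_bytes(path):
    """Total size in bytes of all files below path, e.g. one BOINC slot folder."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file disappeared while walking the tree
    return total

def disk_bounds(client_state_xml):
    """Map workunit name -> rsc_disk_bound (assumed bytes) from client_state.xml text."""
    root = ET.fromstring(client_state_xml)
    return {wu.findtext("name"): float(wu.findtext("rsc_disk_bound", "0"))
            for wu in root.iter("workunit")}

def usage_fraction(slot_path, bound_bytes):
    """Fraction of the disk bound a slot currently uses; above 1.0 the limit is exceeded."""
    return folder_bytes(slot_path) / bound_bytes if bound_bytes > 0 else float("inf")
```

Usage: read client_state.xml into a string, pass it to `disk_bounds`, then call `usage_fraction` on the matching folder under `slots/`.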

I would try the following:

1) set "no new tasks" for all projects and drain the cache completely
2) delete all the slot folders in the BOINC data folder
3) set "switch between tasks every __ minutes" to a very large value to ensure that ATLAS tasks are not pre-empted
4) do not allow the OS to install updates and reboot the system whenever it wishes
5) install updates manually and give VBox ample time to shut down running tasks before rebooting
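Step 3 is normally set in the BOINC Manager. On a headless client the same preference can, as far as I know, go into global_prefs_override.xml in the BOINC data folder, where "switch between tasks every __ minutes" corresponds to `cpu_scheduling_period_minutes`. A minimal Python sketch, assuming that tag name and using one day (1440 minutes) as an arbitrary "very large value". If the file already exists, add the element to it instead of replacing it, and make the client re-read it (e.g. `boinccmd --read_global_prefs_override`) or restart it.

```python
import xml.etree.ElementTree as ET

def prefs_override(switch_minutes):
    """Build a minimal global_prefs_override.xml body setting the task-switch interval."""
    root = ET.Element("global_preferences")
    # Assumed to be the tag behind "Switch between tasks every __ minutes".
    ET.SubElement(root, "cpu_scheduling_period_minutes").text = f"{switch_minutes:g}"
    return ET.tostring(root, encoding="unicode")

print(prefs_override(1440))  # one day between task switches
```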


©2024 CERN