Thread 'ATLAS issues'

Masky Joined: 29 Mar 18 Posts: 8 Credit: 270,958 RAC: 0	Message 35407 - Posted: 1 Jun 2018, 12:36:51 UTC
Hello people, for a few days now I have been having issues with ATLAS tasks. Do you guys know what's going on? https://imgur.com/a/EKfooOs

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 899 Credit: 771,840,627 RAC: 186,831	Message 35411 - Posted: 1 Jun 2018, 21:35:01 UTC - in response to Message 35407.
It happens every now and again for me. I just abort them and it goes back to normal.

Masky Joined: 29 Mar 18 Posts: 8 Credit: 270,958 RAC: 0	Message 35412 - Posted: 1 Jun 2018, 23:11:14 UTC - in response to Message 35411.
Every ATLAS task ends up in the same state: "unmanageable, restarting later". No tasks from other projects get loaded either, so the PC sits idle all night.

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 899 Credit: 771,840,627 RAC: 186,831	Message 35414 - Posted: 2 Jun 2018, 8:57:02 UTC
I agree it's irritating. You could try downgrading to the 5.1.x branch of VirtualBox; it seems more reliable.
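To confirm which VirtualBox branch is installed before and after a downgrade, `VBoxManage --version` prints a string such as `5.2.12r122591`. The minimal Python sketch below only classifies that string, assuming the MAJOR.MINOR.PATCH format shown. It does not install or remove anything, and `is_51_branch` is a hypothetical helper name.

```python
import re

def vbox_branch(version_output: str) -> tuple[int, int]:
    """Return (major, minor) from `VBoxManage --version` output, e.g. '5.2.12r122591'."""
    # Assumed format: MAJOR.MINOR.PATCH, optionally followed by 'r<revision>'.
    match = re.match(r"\s*(\d+)\.(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_output!r}")
    return int(match.group(1)), int(match.group(2))

def is_51_branch(version_output: str) -> bool:
    """True if the installed VirtualBox belongs to the 5.1.x branch suggested here."""
    return vbox_branch(version_output) == (5, 1)
```

Feed it the output of `VBoxManage --version`, run as the same user that runs BOINC.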

wiseguy Joined: 1 Nov 05 Posts: 1 Credit: 291,028 RAC: 0	Message 35441 - Posted: 6 Jun 2018, 15:52:05 UTC
ATLAS is simply not able to download anything. It worked earlier. Error:
2018.06.06. 17:44:47 | ATLAS@home | [error] No scheduler URLs found in master file
What can I do?

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2724 Credit: 299,002,782 RAC: 39,191	Message 35442 - Posted: 6 Jun 2018, 15:59:49 UTC - in response to Message 35441.
[quote]ATLAS is simply not able to download anything. It worked earlier. Error:
2018.06.06. 17:44:47 | ATLAS@home | [error] No scheduler URLs found in master file
What can I do?[/quote]
You used a retired URL. Reconnect to this one: https://lhcathome.cern.ch/lhcathome/

JamesF Joined: 4 Feb 18 Posts: 1 Credit: 552,252 RAC: 0	Message 35474 - Posted: 10 Jun 2018, 13:00:22 UTC - in response to Message 35412.
I am seeing the same issue. Did you find a resolution at all?

PHILIPPE Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0	Message 35476 - Posted: 10 Jun 2018, 14:50:54 UTC - in response to Message 35474.
Hello, this may be a communication error between vboxwrapper and VirtualBox:
ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time
BOINC will be notified that it needs to clean up the environment. This is a temporary problem and so this job will be rescheduled for another time.
To solve it: go to the VirtualBox Manager, then File / Virtual Media Manager. In that window you may find some .vdi files that need to be removed, since they can interfere with new tasks trying to get a slot.
(Screenshots not preserved: "This is what you do not want to see", and the good and the bad entries.)
Erase all the .vdi files that have a yellow triangle and keep only the one with a green triangle. Don't touch the others. This will clean up your environment. Sometimes, when BOINC fails to delete the .vdi files, you have to delete some of them manually in the VirtualBox Manager.
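The same list that the Virtual Media Manager shows can be printed with `VBoxManage list hdds`. The minimal Python sketch below picks out the entries to look at, assuming two things: the output is blank-line-separated blocks of `Key: value` lines, and the yellow-triangle entries are the ones reported with `State: inaccessible`. It only reports. The actual removal is still done in the GUI as described above. The function names are hypothetical.

```python
def parse_media_list(text: str) -> list[dict[str, str]]:
    """Split `VBoxManage list hdds` output into one dict per medium (blank-line separated)."""
    media: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                media.append(current)
                current = {}
            continue
        # Split on the first colon only, so Windows paths like C:\... stay intact.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        media.append(current)
    return media

def inaccessible_media(text: str) -> list[str]:
    """Locations of media whose State is 'inaccessible' (assumed to be the yellow triangles)."""
    return [
        m.get("Location", m.get("UUID", "?"))
        for m in parse_media_list(text)
        if m.get("State", "").lower() == "inaccessible"
    ]
```

Run `VBoxManage list hdds` as the user that runs BOINC and pass its output to `inaccessible_media`.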

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2724 Credit: 299,002,782 RAC: 39,191	Message 35478 - Posted: 10 Jun 2018, 16:06:54 UTC - in response to Message 35474.
Basic thoughts.
Your computer page shows that you run a 4-core system with 2 GPUs (NVIDIA + Intel). If your GPUs run at full load, your system needs 2 CPU cores to support them, which leaves only 2 CPU cores for other work.
Your task overview shows that you run ATLAS with a 4-core setup. This may sooner or later cause timing problems. I suggest you try a 2-core setup for ATLAS. If you do so, you should also use an app_config.xml to avoid errors caused by a too-low default RAM setting.
[pre]<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
    <report_results_immediately/>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 4800</cmdline>
  </app_version>
</app_config>[/pre]
Error 1:
"Vboxwrapper lost communication with VirtualBox"
"BOINC will be notified that it needs to clean up the environment"
This is mostly caused by:
- a crash
- an unclean shutdown
- ...
All of these can leave old files behind in a "slot" folder, or leave links to those files in the VirtualBox control files.
You may:
1. stop your BOINC client
2. run your VirtualBox GUI (use the same user that usually runs BOINC!)
3. remove all VMs that are located in a BOINC slot but are not "running" or "paused"
4. restart your BOINC client (better: restart your computer)
If these errors appear again, consider repeating the cleanup and downgrading to the most recent VirtualBox 5.1.x.
Error 2:
Image files marked with a yellow triangle.
This is nasty (I also have lots of them) but not responsible for the errors you notice. Just clean up the list from time to time as described by PHILIPPE.
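The 2-core app_config.xml above can be generated for other core counts with the minimal Python sketch below, which also covers the core arithmetic. Two values are assumptions: one CPU core per fully loaded GPU is read from this specific host, and `memory_mb` is simply passed through (4800 is the post's value for 2 threads; no formula for other counts is implied). The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

def cores_left(total_cores: int, busy_gpus: int, cores_per_gpu: int = 1) -> int:
    """CPU cores left for CPU work after feeding GPUs (1 core per busy GPU is an assumption)."""
    return max(total_cores - busy_gpus * cores_per_gpu, 0)

def atlas_app_config(nthreads: int, memory_mb: int, max_concurrent: int = 1,
                     plan_class: str = "vbox64_mt_mcore_atlas") -> str:
    """Build an app_config.xml with the same elements as the post, for nthreads cores."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "ATLAS"
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.SubElement(app, "report_results_immediately")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = "ATLAS"
    ET.SubElement(version, "plan_class").text = plan_class
    # avg_ncpus is what BOINC reserves, --nthreads is what the VM uses; the post sets both to 2.
    ET.SubElement(version, "avg_ncpus").text = f"{float(nthreads):.1f}"
    ET.SubElement(version, "cmdline").text = f"--nthreads {nthreads} --memory_size_mb {memory_mb}"
    ET.indent(root)  # pretty-print, Python 3.9+
    # ElementTree writes "<tag />"; write "<tag/>" to match the post exactly.
    return ET.tostring(root, encoding="unicode").replace(" />", "/>")
```

For the host discussed here, `atlas_app_config(nthreads=cores_left(4, 2), memory_mb=4800)` reproduces the file above. Save it as app_config.xml in the LHC@home folder under the BOINC `projects` directory, then use Options > Read config files in BOINC Manager or restart the client.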
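Step 3 of the cleanup above ("remove all VMs that are located in a BOINC slot but are not running or paused") can be checked per VM from `VBoxManage showvminfo <vm> --machinereadable`, which prints `key="value"` lines including `CfgFile` and `VMState`. The minimal Python sketch below makes that decision. It assumes a folder named `slots` in the VM's config path marks a BOINC slot. It only reports, so do the removal in the GUI with BOINC stopped (step 1). The function names are hypothetical.

```python
from pathlib import PureWindowsPath

def parse_machinereadable(text: str) -> dict[str, str]:
    """Parse `showvminfo --machinereadable` key="value" lines into a dict."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip().strip('"')] = value.strip().strip('"')
    return info

def is_stale_boinc_vm(text: str) -> bool:
    """True if the VM lives in a BOINC slot folder but is neither running nor paused."""
    info = parse_machinereadable(text)
    # PureWindowsPath splits on both '\' and '/', so Windows and Linux paths both work.
    parts = [p.lower() for p in PureWindowsPath(info.get("CfgFile", "")).parts]
    return "slots" in parts and info.get("VMState") not in {"running", "paused"}
```

Get the VM names with `VBoxManage list vms`, then pass each VM's `showvminfo --machinereadable` output to `is_stale_boinc_vm`.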

Rabinovitch Joined: 11 May 07 Posts: 23 Credit: 3,631,975 RAC: 0	Message 35512 - Posted: 13 Jun 2018, 15:45:17 UTC
Guys, why are 100% of the tasks on this host ending with errors?

Jim1348 Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0	Message 35513 - Posted: 13 Jun 2018, 15:56:30 UTC - in response to Message 35512.
[quote]Guys, why are 100% of the tasks on this host ending with errors?[/quote]
Do you have CVMFS installed and configured?
Checking for CVMFS
ls: cannot access '/cvmfs/atlas.cern.ch/repo/sw': No such file or directory
cvmfs_config doesn't exist, check cvmfs with cmd ls /cvmfs/atlas.cern.ch/repo/sw
ls /cvmfs/atlas.cern.ch/repo/sw failed,aborting the jobs

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2724 Credit: 299,002,782 RAC: 39,191	Message 35514 - Posted: 13 Jun 2018, 16:05:13 UTC - in response to Message 35512.
Looks like a CVMFS error:
[pre]Checking for CVMFS
cvmfs_config doesn't exist, check cvmfs with cmd ls /cvmfs/atlas.cern.ch/repo/sw
ls /cvmfs/atlas.cern.ch/repo/sw failed,aborting the jobs[/pre]
You may want to check your CVMFS installation. Then run:
"sudo cvmfs_config wipecache"
"cvmfs_config probe"
What is the output of "cvmfs_config probe"?
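Before running those commands, you can reproduce the two checks the job log complains about: whether `cvmfs_config` is on the PATH, and whether `/cvmfs/atlas.cern.ch/repo/sw` can be listed. Here is a minimal Python sketch of that. It only looks and does not install or configure anything. `cvmfs_precheck` and the other names are hypothetical.

```python
import os
import posixpath
import shutil

def atlas_sw_path(cvmfs_root: str = "/cvmfs") -> str:
    """Directory the ATLAS job lists before starting (taken from the log above)."""
    return posixpath.join(cvmfs_root, "atlas.cern.ch", "repo", "sw")

def listable(path: str) -> bool:
    """Equivalent of `ls <path>` succeeding.

    With the usual autofs setup, accessing the path is also what triggers the CVMFS mount.
    """
    try:
        os.listdir(path)
    except OSError:
        return False
    return True

def cvmfs_precheck(cvmfs_root: str = "/cvmfs") -> dict[str, bool]:
    """Report the two conditions the job log complains about."""
    return {
        "cvmfs_config on PATH": shutil.which("cvmfs_config") is not None,
        "atlas repo listable": listable(atlas_sw_path(cvmfs_root)),
    }
```

On the failing host, both entries of `cvmfs_precheck()` would be expected to come back `False`.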

Rabinovitch Joined: 11 May 07 Posts: 23 Credit: 3,631,975 RAC: 0	Message 35524 - Posted: 14 Jun 2018, 9:11:44 UTC - in response to Message 35513.
[quote][quote]Guys, why are 100% of the tasks on this host ending with errors?[/quote]
Do you have CVMFS installed and configured?[/quote]
Now I do:
deti@DetiPC ~ $ sudo cvmfs_config chksetup
OK
deti@DetiPC ~ $ cvmfs_config probe
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
deti@DetiPC ~ $
Let's see if this helps...
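If you want to check `cvmfs_config probe` output automatically (for example after a reboot), the minimal Python sketch below parses the `Probing <repo>... <status>` lines shown above. It reports which repositories did not answer `OK`. The required list is just the three repositories probed in this post, so treat it as an assumption for other setups. The function names are hypothetical.

```python
import re

# "Probing /cvmfs/atlas.cern.ch... OK" -> ("/cvmfs/atlas.cern.ch", "OK")
PROBE_LINE = re.compile(r"^Probing\s+(\S+?)\.\.\.\s+(.*?)\s*$")

# The three repositories probed in this post (assumption for other setups).
REQUIRED = ("/cvmfs/atlas.cern.ch", "/cvmfs/atlas-condb.cern.ch", "/cvmfs/grid.cern.ch")

def parse_probe(output: str) -> dict[str, bool]:
    """Map each probed repository to True if its status was exactly 'OK'."""
    results: dict[str, bool] = {}
    for line in output.splitlines():
        match = PROBE_LINE.match(line.strip())
        if match:
            results[match.group(1)] = match.group(2) == "OK"
    return results

def missing_repos(output: str, required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Required repositories that were not probed or did not report OK."""
    results = parse_probe(output)
    return [repo for repo in required if not results.get(repo, False)]
```

An empty list from `missing_repos` means all three repositories answered `OK`, as in the output above.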

gyllic Joined: 9 Dec 14 Posts: 202 Credit: 2,659,192 RAC: 136	Message 35525 - Posted: 14 Jun 2018, 11:47:11 UTC - in response to Message 35524. Last modified: 14 Jun 2018, 12:03:03 UTC
CVMFS should now work. Your most recent tasks show that Singularity is not installed. If your OS is not SLC6 (which is obviously the case here), you also have to install Singularity: https://singularity.lbl.gov/ Once Singularity is working, you should finally be good to go.

Rabinovitch Joined: 11 May 07 Posts: 23 Credit: 3,631,975 RAC: 0	Message 35530 - Posted: 16 Jun 2018, 4:25:26 UTC - in response to Message 35525. Last modified: 16 Jun 2018, 4:27:24 UTC
[quote]Your most recent tasks show that Singularity is not installed. If your OS is not SLC6 (which is obviously the case here), you also have to install Singularity: https://singularity.lbl.gov/ Once Singularity is working, you should finally be good to go.[/quote]
Finally it works... What a complicated project to participate in! I highly doubt that many volunteers will be willing to go through that whole quest with CVMFS and Singularity just to help scientists for free... What is SLC6?

maeax Joined: 2 May 07 Posts: 2279 Credit: 178,779,667 RAC: 277	Message 35531 - Posted: 16 Jun 2018, 5:11:12 UTC - in response to Message 35530.
SLC6 = Scientific Linux CERN 6 (e.g. version 6.9). Like SL 6.9, CentOS also needs no Singularity installation.
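A minimal Python sketch of the rule of thumb from this thread: SLC6 needs no Singularity, and, taken at face value from the post above (which gives no version), neither does CentOS. EL6-era systems predate `/etc/os-release`, so the sketch reads the `/etc/redhat-release` string instead. The wording matched (for example `Scientific Linux CERN SLC release 6.9 (Carbon)`) is an assumption. The function names are hypothetical.

```python
import re
import shutil

# Assumed wording, e.g. "Scientific Linux CERN SLC release 6.9 (Carbon)" or "CentOS release 6.9 (Final)".
RELEASE = re.compile(r"(Scientific Linux|CentOS).*?release\s+(\d+)", re.IGNORECASE)

def singularity_required(redhat_release: str) -> bool:
    """Thread's rule of thumb: SLC6 needs no Singularity; CentOS neither (no version given)."""
    match = RELEASE.search(redhat_release)
    if match is None:
        return True  # not an SL/CentOS release string (e.g. Mint, Ubuntu): install Singularity
    family, major = match.group(1).lower(), match.group(2)
    if family == "scientific linux":
        return major != "6"
    return False  # CentOS, taken at face value from the thread

def singularity_on_path() -> bool:
    """True if a `singularity` executable is found on PATH."""
    return shutil.which("singularity") is not None
```

Pass the contents of `/etc/redhat-release` if that file exists, or an empty string if it does not. If `singularity_required(...)` is true and `singularity_on_path()` is false, the ATLAS tasks are expected to fail as described above.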

Gloria Cicconofri Joined: 13 Apr 18 Posts: 9 Credit: 35,148 RAC: 0	Message 36502 - Posted: 19 Aug 2018, 21:54:34 UTC
Hi everyone! Sorry to bother you, but I can't understand what's happening here. Lately I've been receiving only a small number of tasks from LHC, and all of them, after a few hours of computing, gave the exact same result: error while computing. I can't really understand what's wrong. I'm using BOINC for other projects, and none of them gave this problem. Is there a way you can help me fix it? Thanks a lot!

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1261 Credit: 92,820,347 RAC: 110,519	Message 36503 - Posted: 19 Aug 2018, 22:26:03 UTC - in response to Message 36502.
[quote]Hi everyone! Sorry to bother you, but I can't understand what's happening here. Lately I've been receiving only a small number of tasks from LHC, and all of them, after a few hours of computing, gave the exact same result: error while computing. I can't really understand what's wrong. I'm using BOINC for other projects, and none of them gave this problem. Is there a way you can help me fix it? Thanks a lot![/quote]
Your tasks say you ran out of disk space, so maybe try adjusting BOINC Manager > Options > Computing preferences > Disk and memory. Try that and see if it helps.

Gloria Cicconofri Joined: 13 Apr 18 Posts: 9 Credit: 35,148 RAC: 0	Message 36505 - Posted: 19 Aug 2018, 22:45:50 UTC - in response to Message 36503. Last modified: 19 Aug 2018, 22:46:12 UTC
Thanks a lot! As soon as I get a new ATLAS task I'll let you know whether it worked.

bronco Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0	Message 36506 - Posted: 19 Aug 2018, 23:26:08 UTC - in response to Message 36503.
[quote]Your tasks say you ran out of disk space, so maybe try adjusting BOINC Manager[/quote]
Yes, it is a disk space problem, but not the kind of disk space problem you are thinking of. It's the "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED" error, which seems to fool a lot of people. The solution you offered might work, but it's not likely. Harri Liljeroos explains the cause of this error thoroughly in https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4773&postid=36371#36371. Note that it is not a problem of BOINC having too little disk space assigned in the user preferences. The solution comes from working out why <rsc_disk_bound>xxx</rsc_disk_bound> is being exceeded. Possible reasons include (but are not limited to):
1) ATLAS tasks being pre-empted by other projects/tasks, which causes an ultra-large snapshot file to be saved in the slot folder
2) old snapshots or other garbage left behind by previous tasks and never deleted
3) a combination of 1) and 2)
I would try the following:
1) set "no new tasks" for all projects and drain the cache completely
2) delete all the slot folders in the BOINC data folder
3) set "switch between tasks every __ minutes" to a very large value so that ATLAS tasks are not pre-empted
4) do not allow the OS to install updates and reboot the system whenever it wishes
5) install updates manually and give VBox ample time to shut down running tasks before rebooting
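To see how close a slot is to its limit, you can compare the total size of the slot folder with that task's rsc_disk_bound. The minimal Python sketch below does this. It assumes the slot's `init_data.xml` carries an `<rsc_disk_bound>` element in bytes, and that slots live under the BOINC data folder (for example `/var/lib/boinc-client/slots/N` on Debian-family Linux or `C:\ProgramData\BOINC\slots\N` on Windows). The helper names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path (0 if path does not exist)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def parse_disk_bound(init_data_xml: str) -> float:
    """Read rsc_disk_bound (bytes) from init_data.xml text (assumed to contain it)."""
    value = ET.fromstring(init_data_xml).findtext("rsc_disk_bound")
    if value is None:
        raise ValueError("no <rsc_disk_bound> element found")
    return float(value)

def headroom(bound_bytes: float, used_bytes: float) -> float:
    """Bytes left before the bound; negative means EXIT_DISK_LIMIT_EXCEEDED territory."""
    return bound_bytes - used_bytes

def slot_headroom(slot_dir: str) -> float:
    """Headroom for one slot folder, e.g. <BOINC data>/slots/3."""
    with open(os.path.join(slot_dir, "init_data.xml"), encoding="utf-8") as fh:
        bound = parse_disk_bound(fh.read())
    return headroom(bound, dir_size(slot_dir))
```

Run `slot_headroom` on the slot of a running ATLAS task. If the value shrinks sharply after the task has been pre-empted, that points to reason 1) above.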