Message boards : ATLAS application : Atlas-native App finished in Seconds

maeax

Joined: 2 May 07
Posts: 732
Credit: 27,358,847
RAC: 39,338
Message 37777 - Posted: 18 Jan 2019, 16:44:15 UTC

This computer is finishing Atlas-native App tasks in a few seconds and still getting Cobblestones.
It also has a lot of crashed tasks:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10451475&offset=0&show_names=0&state=4&appid=
maeax

Joined: 2 May 07
Posts: 732
Credit: 27,358,847
RAC: 39,338
Message 37791 - Posted: 19 Jan 2019, 12:15:33 UTC - in response to Message 37777.  

I tested a task using 7 CPUs and it ran successfully:
Run time 29 min 54 sec
CPU time 2 hours 6 min 9 sec
Validate state Valid
Credit 58.22
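
As a rough check of how well the 7 CPUs were used, the run time and CPU time reported above can be turned into an average number of busy cores. This is only a sketch of the arithmetic; the helper `to_seconds` and the variable names are made up for illustration.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

# Figures from the task result quoted above.
run_time = to_seconds(minutes=29, seconds=54)          # wall-clock time: 1794 s
cpu_time = to_seconds(hours=2, minutes=6, seconds=9)   # summed over all threads: 7569 s
threads = 7

avg_cores = cpu_time / run_time      # average number of cores kept busy
efficiency = avg_cores / threads     # fraction of the 7 CPUs actually used

print(f"average busy cores: {avg_cores:.2f}")
print(f"efficiency: {efficiency:.0%}")
```

So on average a bit more than 4 of the 7 CPUs were busy during the run.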
BelgianEnthousiast

Joined: 5 Apr 15
Posts: 18
Credit: 5,676,431
RAC: 12,208
Message 37793 - Posted: 20 Jan 2019, 11:13:52 UTC - in response to Message 37791.  

Hi All,

I'm running into trouble with just about any kind of LHC task these days.
Initially I ran 5-core applications (Theory, Atlas and, if available, CMS, LHCb, etc.) without any issues for about a year, I guess, since they became available.

Then I started getting "VM unmanageable" errors on Atlas about 2-3 months ago. Restarting BOINC after a reboot solved this issue, but I had to reboot my system constantly, which is not very handy.

I uninstalled VirtualBox (version 6 at that time) and reinstalled it to try to solve the "VM unmanageable" error, but to no avail.

I then stopped all LHC WUs and searched for any settings that could cause the problem.
Just out of curiosity, I then enabled 7-core applications, thinking "maybe 5-core WUs carry a flaw that 7-core WUs don't have".

To my horror, all of the tasks errored out within 30 seconds of starting, with the message "WU aborted" and no further information.
When checking the standard log, I can't even see where the error originates.

I then scaled back to 6-core WUs: still the same issue, "WU aborted" after 30 seconds at most.

I scaled back to 5-cores. I now even have the same issue on both of them...

FYI. There's nothing wrong with my processor or memory, I tried running WorldCommunityGrid and it runs just fine on 9 cores (out of 12, 2 more cores are assigned to GPUGrid)

I then decided to take drastic action: remove VirtualBox and BOINC and reinstall them, keeping the application settings for now.

I did it, but unfortunately I simply got the same errors again on LHC... "WU aborted" after 30 seconds.

Is there an easy way to enable logging? (I see a lot of generic BOINC logging capabilities, but nothing specific to WUs.)

Should I remove BOINC and VirtualBox completely, including the existing settings, and start all over again?

Would appreciate your help as it's a pity wasting so much time (struggling for nearly 2 months now) ! :-)
Gunde

Joined: 9 Jan 15
Posts: 37
Credit: 272,432,243
RAC: 531,698
Message 37794 - Posted: 20 Jan 2019, 12:38:43 UTC - in response to Message 37793.  

When I checked the log from the last task today, I found this:
VBoxManage.exe: error: VT-x is disabled in the BIOS for all CPU modes (VERR_VMX_MSR_ALL_VMX_DISABLED)

Check in your BIOS whether VT-x is enabled.
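
If you want to look for this error in a saved task log yourself, a small script can pick out the relevant lines. This is a minimal sketch: it only searches for the error code from the line quoted above, the function name `find_vtx_errors` is made up, and the first line of the sample text is a placeholder rather than real log output.

```python
# Error code copied from the VBoxManage message quoted above.
VTX_DISABLED = "VERR_VMX_MSR_ALL_VMX_DISABLED"

def find_vtx_errors(log_text):
    """Return the log lines that report VT-x as disabled."""
    return [line.strip() for line in log_text.splitlines() if VTX_DISABLED in line]

# Sample input: the first line is a placeholder, the second is the quoted error.
sample_log = """\
(placeholder for unrelated log lines)
VBoxManage.exe: error: VT-x is disabled in the BIOS for all CPU modes (VERR_VMX_MSR_ALL_VMX_DISABLED)
"""

hits = find_vtx_errors(sample_log)
for line in hits:
    print(line)
```

An empty result does not prove that VT-x is enabled; it only means this particular error code is not in the text you scanned.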
BelgianEnthousiast

Joined: 5 Apr 15
Posts: 18
Credit: 5,676,431
RAC: 12,208
Message 37905 - Posted: 1 Feb 2019, 16:53:11 UTC - in response to Message 37794.  

Hi Gunde,

It was a combination of factors in fact.

I upgraded the BIOS to the latest version to patch an Intel security bug, but at the same time it wreaked havoc on the BIOS settings, effectively resetting the VT-x setting.

In parallel, something went wrong in the interaction between BOINC 7.14.2 and VirtualBox 6.0.4, which got screwed up by me installing other software (silly me...).

On top of that, BOINC seems to have a good memory: it kept a record that VT-x was disabled in the .xml file in the BOINC_DATA directory (see details in Yeti's checklist).

So, in the end, this is what fixed it:
flashing back the latest working version of the BIOS and enabling VT-x again;
removing the software I had installed;
removing BOINC and VirtualBox 6;
reinstalling BOINC 7.14.2 and VirtualBox 5.2.26;
correcting the VT-x entry BOINC had remembered in the xml file.

And finally it works again like a charm! The only pity is that I lost nearly an entire month troubleshooting the darn thing...
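
For anyone wanting to check whether BOINC still remembers VT-x as disabled, here is a rough sketch. It assumes the xml file is `client_state.xml` and that the flag is stored as a `<p_vm_extensions_disabled>` element; neither name is spelled out in this post, so treat both as assumptions and compare with Yeti's checklist. The function name and the sample text are made up for illustration.

```python
import re

# Assumed element name; the post only says "the .xml file in the BOINC_DATA directory".
FLAG = "p_vm_extensions_disabled"

def vm_extensions_disabled(state_text):
    """Return True/False for the stored flag, or None if the element is absent."""
    match = re.search(rf"<{FLAG}>\s*(\d+)\s*</{FLAG}>", state_text)
    if match is None:
        return None
    return match.group(1) != "0"

# Made-up fragment standing in for the contents of the state file.
sample = """\
<host_info>
    <p_vm_extensions_disabled>1</p_vm_extensions_disabled>
</host_info>
"""

print(vm_extensions_disabled(sample))
```

To run it on a real file, read the file's text and pass it to the function. BOINC presumably rewrites this file while it is running, so it seems safest to stop the client before changing anything by hand.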

Which makes me pose the question : does Atlas/LHC really need VM's to run ? Why can WorldComGrid, ClimatePrediction or Rosetta run without it ?
(just asking out of pure ignorance, apologies for that !)

Nice weekend to all !

B.E.
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,194,277
RAC: 10,575
Message 37907 - Posted: 1 Feb 2019, 22:10:29 UTC - in response to Message 37905.  

Which makes me pose the question : does Atlas/LHC really need VM's to run ?


The apps need to run on a platform that has specific capabilities readily available on Linux as standard features or easy add-ons. I'm not saying those same capabilities couldn't be built into or added onto Windows or OS X, but the experts believe it's less work to virtualize the environment and suffer the performance hit. Can't say I disagree with them.

Why can WorldComGrid, ClimatePrediction or Rosetta run without it ?

They feel that given their needs they can get statistically valid results from apps skillfully designed to compile and run on multiple platforms.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 739
Credit: 6,027,121
RAC: 1,035
Message 37911 - Posted: 2 Feb 2019, 10:13:02 UTC - in response to Message 37905.  

Which makes me pose the question : does Atlas/LHC really need VM's to run ?

BOINC is an important, but not a major, partner for running ATLAS jobs.
Most partners are scientific institutions running hosts with Scientific Linux, so the scientists develop their applications only for Linux and not for other OS's.



©2019 CERN