ATLAS issues
Author | Message |
---|---|
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
All Atlas tasks on my main Linux host produce a HITS file, contrary to the Windows 10 PC. Tullio |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"All Atlas tasks on my main Linux host produce a HITS file, contrary to the Windows 10 PC." HITS file reporting in the stderr output from ATLAS VBox tasks is totally unreliable. Here is one of your recently completed ATLAS VBox tasks: https://lhcathome.cern.ch/lhcathome/result.php?resultid=205493869. The stderr output says nothing about HITS. It says neither that a HITS file was successfully produced nor that it failed to produce one. However, if you check the bigpanda report for that task at https://bigpanda.cern.ch/job?pandaid=3993741609 you will see that it did produce a HITS file. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Thanks bronco. Atlas and SixTrack tasks are the only ones running on both the Windows 10 PC and the two Linux hosts. All the others fail with "condor job not running". Yet the PC has 22 GB RAM and 4 cores (though the Windows Task Manager says 2 cores and 4 logical processors), while the Linux boxen have only 2 cores and 8 GB RAM. I am running 2-core tasks on the Windows 10 PC and only one-core tasks on the Linux boxen, which run SuSE Leap 42.3 and 15.0. Why the version number of SuSE Leap went back from 42.3 to 15.0 I don't know, but I suspect it has something to do with SuSE Linux Enterprise System, which is now at 15.0 and is optimized to connect to the Microsoft Azure cloud, which I certainly won't do. Tullio |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 169 |
Yes Tullio, openSUSE Leap 15.0 uses the same kernel as the Enterprise version. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I am running on it two Einstein@home Continuous Gravitational Wave Search tasks which, according to Bruce Allen, chief of Einstein@home, and his wife Maria Alessandra Papa, lead scientist for gravitational wave searches, I should not have received, but I got them and am running them. Tullio |
Send message Joined: 18 Dec 15 Posts: 1824 Credit: 119,083,784 RAC: 18,791 |
"HITS file reporting in stderr output from ATLAS Vbox tasks is totally unreliable. Here is one of your recently completed ATLAS VBox tasks... https://lhcathome.cern.ch/lhcathome/result.php?resultid=205493869. The stderr output says nothing about HITS. It doesn't say HITS file successfully produced nor does it say failure to produce HITS. However if you check the bigpanda report for that task at https://bigpanda.cern.ch/job?pandaid=3993741609 you see that it did produce a HITS file." I have had the same experience many times. So yes, whatever stderr says may not mean a thing. Maybe whether stderr reports it correctly depends on the BOINC version and/or the VBox version, because in my case a HITS file has been shown in the stderr every time since I updated BOINC and VBox. In any case, one can always check back in bigpanda - what is shown there is fact. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
My BOINC on the Windows PC is 7.12.1 with VBox 5.2.18. The Linux host has BOINC 7.8.3 and VBox 5.2.16, and it always reports a HITS file in the stderr.txt file. Tullio |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
My latest Windows task reports a HITS file. Tullio |
Send message Joined: 24 Oct 04 Posts: 1181 Credit: 54,887,670 RAC: 2,140 |
My latest Windows task reports a HITS file. https://lhcathome.cern.ch/lhcathome/result.php?resultid=206025343 |
Send message Joined: 18 Dec 15 Posts: 1824 Credit: 119,083,784 RAC: 18,791 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=206025343 Seeing this here as well, I am wondering what the notice "2018-08-26 03:58:46 (7576): Error creating VirtualBox instance! rc = 0x80004002" right at the beginning of the stderr means. I have this in all of my ATLAS tasks, regardless of which VB version I have been using during the past years. |
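For anyone who wants to read a code like rc = 0x80004002: assuming it is a standard Windows COM HRESULT (the thread does not confirm this), it can be split into a failure bit, a facility and an error code. The sketch below does only that split; the function name and the one-entry lookup table are illustrative, and it says nothing about why the VBox wrapper logs the message. 0x80004002 is the value Windows defines as E_NOINTERFACE ("No such interface supported").

```python
# Minimal sketch: split a 32-bit HRESULT into its standard fields.
# Assumption: the "rc" printed in stderr is a Windows COM HRESULT.

# Illustrative lookup table with a single well-known value.
KNOWN_HRESULTS = {0x80004002: "E_NOINTERFACE"}


def decode_hresult(value):
    """Return the failure flag, facility, code and (if known) name."""
    value &= 0xFFFFFFFF  # treat the input as an unsigned 32-bit number
    return {
        "failure": bool(value >> 31),       # bit 31: severity (1 = failure)
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: facility
        "code": value & 0xFFFF,             # bits 0-15: error code
        "name": KNOWN_HRESULTS.get(value, "unknown"),
    }


if __name__ == "__main__":
    print(decode_hresult(0x80004002))
```

Facility 0 is the generic (null) facility, so the code itself does not point at a specific subsystem.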
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 169 |
|
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
In the 14 August issue of "Nature" there is an article about the Atlas strategy. At least some in the Atlas collaboration want to search not only already simulated events, like those processed by BOINC users, but all kinds of events. On the one hand this will require more processing power; on the other hand it may diminish the importance of what we are doing. I haven't read any comment on this subject here. Tullio |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4620&postid=35483#35483 translation: do not allow your spirit to be caught up in the madness of VBox, run ATLAS native and rejoice on the path of least resistance |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"On the 14 August issue of "Nature" there is an article about the Atlas strategy. At least some in the Atlas Cooperation group want to search not only already simulated events, like those processed by BOINC users but all kind of events. If this on one side will require more processing power on the other hand may diminish the importance of what we are doing. I haven't read any comment on this subject here." Here is the link: https://www.nature.com/articles/d41586-018-05972-7 But why can't we help out with AI using BOINC? On GPUGrid, the Quantum Chemistry project uses BOINC to train its machine learning system. They can then use the results to run on GPUs in-house. They are estimating energies and forces, but I don't see why it could not be applied to other areas. http://www.gpugrid.net/forum_thread.php?id=4707&nowrap=true#49606 |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I am running GPUGRID, both the CPU tasks and the GPU tasks, on a Linux box. On my Windows 10 PC, the GTX 1050 Ti (not overclocked) gets too hot (80 °C) and the computing stops. Tullio |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 169 |
"translation: do not allow your spirit to be caught up in the madness of VBox, run ATLAS native and rejoice on the path of least resistance" Without VBox, Atlas and Windows have a problem. Tullio, I have openSUSE 15.0 running now. WCG..., but I will be testing Atlas native! |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
OK, let me know if you succeed in running Atlas native on Leap 15.0. It is running long Einstein@home tasks very well. Tullio |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I am running an Atlas one core task on my HP Linux laptop with an AMD E-450 CPU. Strangely enough, the "top" command shows a CPU usage which can reach 117% for VBoxHeadless. Tullio |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Not so strange. It happens here too. In fact if you watch close enough and for long enough you'll see that it happens on every host. It's not an indication that something has gone awry. It's because CPU cycles are hard to count (and account for). Top doesn't see everything and if you do "man top" and pore over the minutiae you'll see that top isn't even the accountant. Top is merely the poor SOB that collects the reports from other accountants and tries to assemble all the details into a sensible report for users. The final report is close but it's not 100% accurate. Also, tasks sometimes run on more than 1 core briefly even though they are "assigned" a single core. The whole notion of core assignment and core affinity is far more complicated than what the average user realizes. In the BOINC world we bandy the term "assigned cores" around as if it's written in stone but in reality it's just a convenient concept that assists BOINC devs in creating more/less accurate algorithms and code for estimating how many tasks can be downloaded and completed before deadline. |
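A rough way to see how a figure like 117% can come about: a per-process CPU percentage is essentially the CPU time the process consumed during a sampling interval divided by the wall-clock length of that interval. VBoxHeadless has more than one thread, so when a second thread gets some time on another core the ratio goes above 100%. The sketch below is only an illustration of that arithmetic, not top's actual code; the function name and the sample numbers are made up.

```python
# Minimal sketch of the arithmetic behind a per-process CPU percentage.
# Assumption: usage = CPU time consumed in an interval / wall-clock time,
# expressed relative to ONE core, so multi-threaded processes can exceed 100%.


def cpu_percent(cpu_seconds_start, cpu_seconds_end, wall_seconds):
    """Return CPU usage over one interval as a percentage of a single core."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * (cpu_seconds_end - cpu_seconds_start) / wall_seconds


if __name__ == "__main__":
    interval = 3.0        # hypothetical refresh interval in seconds
    main_thread = 3.0     # hypothetical: one thread busy for the whole interval
    helper_thread = 0.51  # hypothetical: a second thread busy briefly on another core
    usage = cpu_percent(0.0, main_thread + helper_thread, interval)
    print(f"{usage:.0f}%")
```

With these made-up numbers the process used 3.51 s of CPU time in a 3 s window, which is more than one core's worth even though the task is nominally a one-core task.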
Send message Joined: 30 Aug 18 Posts: 3 Credit: 1,002 RAC: 0 |
I have recently started to devote some of my computer's time to LHC Atlas jobs. Most of the jobs have failed, and the failure only shows up after more than 10 hours of processing time have been spent on them. It would be better if the code could detect when things are going wrong and give proper messages so that the issue can be resolved, instead of the job just terminating or going invalid. Recent example: at the start the job showed 16 hours, then it ran for nearly 2 days, and at the end what I get is a big ZERO with a "Validate error" message. 1) There isn't any proper indication that anything is going wrong. I don't understand what failed in the validation, but I doubt every sub-job had issues. 2) The completion time estimate should be somewhat accurate. https://lhcathome.cern.ch/lhcathome/result.php?resultid=206483510 - Can someone take a look and let me know what exactly failed here? Thanks. |
©2025 CERN