Message boards : ATLAS application : ATLAS issues

tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 36554 - Posted: 25 Aug 2018, 7:47:02 UTC

All Atlas tasks on my main Linux host produce a HITS file, unlike those on the Windows 10 PC.
Tullio
ID: 36554
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36555 - Posted: 25 Aug 2018, 9:14:36 UTC - in response to Message 36554.  

All Atlas tasks on my main Linux host produce a HITS file, unlike those on the Windows 10 PC.
Tullio

HITS file reporting in the stderr output from ATLAS VBox tasks is totally unreliable. Here is one of your recently completed ATLAS VBox tasks: https://lhcathome.cern.ch/lhcathome/result.php?resultid=205493869. The stderr output says nothing about HITS: it says neither that a HITS file was successfully produced nor that it failed to produce one. However, if you check the bigpanda report for that task at https://bigpanda.cern.ch/job?pandaid=3993741609, you will see that it did produce a HITS file.
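
A minimal Python sketch of the check described above. It only tests whether the stderr text mentions HITS at all, and builds the bigpanda link from a PanDA job ID. The function names and the sample stderr text are hypothetical and not taken from the ATLAS wrapper; the URL pattern is the one quoted in this post.

```python
import re

# Assumption: any standalone "HITS" token counts as a mention. The real
# wording of the stderr line is not specified in this thread.
HITS_TOKEN = re.compile(r"\bHITS\b")


def stderr_hits_status(stderr_text: str) -> str:
    """Classify stderr text as 'mentioned' or 'silent'.

    'silent' does NOT mean failure: the task linked above was silent in
    stderr, yet bigpanda showed a HITS file.
    """
    return "mentioned" if HITS_TOKEN.search(stderr_text) else "silent"


def bigpanda_job_url(panda_id: int) -> str:
    """Build the bigpanda report URL for a PanDA job ID."""
    return f"https://bigpanda.cern.ch/job?pandaid={panda_id}"


if __name__ == "__main__":
    sample = "hypothetical stderr text without the keyword"
    if stderr_hits_status(sample) == "silent":
        print("stderr is silent, check:", bigpanda_job_url(3993741609))
```

The point of returning "silent" rather than "failed" is that a missing line in stderr should send you to bigpanda, not be read as a lost result.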
ID: 36555
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 36556 - Posted: 25 Aug 2018, 12:59:18 UTC - in response to Message 36555.  
Last modified: 25 Aug 2018, 13:00:16 UTC

Thanks, bronco. Atlas and SixTrack tasks are the only ones running on both the Windows 10 PC and the two Linux hosts. All the others fail with condor job not running. Yet the PC has 22 GB RAM and 4 cores (though the Windows Task Manager says 2 cores and 4 logical processors), while the Linux boxen have only 2 cores and 8 GB RAM. I am running two-core tasks on the Windows 10 PC and only one-core tasks on the Linux boxen, which run SuSE Leap 42.3 and 15.0. Why the version number of SuSE Leap went back from 42.3 to 15.0 I don't know, but I suspect it has something to do with SuSE Linux Enterprise, which is now at 15.0 and is optimized to connect to the Microsoft Azure cloud, which I certainly won't do.
Tullio
ID: 36556
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,116,273
RAC: 104,623
Message 36557 - Posted: 25 Aug 2018, 15:41:02 UTC - in response to Message 36556.  

Yes Tullio,
openSUSE Leap 15.0 has the same kernel as the Enterprise version.
ID: 36557
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 36558 - Posted: 25 Aug 2018, 15:48:16 UTC - in response to Message 36557.  

On it I am running two Einstein@home Continuous Gravitational Wave Search tasks which, according to Bruce Allen, chief of Einstein@home, and his wife Maria Alessandra Papa, lead scientist for gravitational wave searches, I should not have received. But I got them and am running them.
Tullio
ID: 36558
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 36559 - Posted: 25 Aug 2018, 19:24:44 UTC - in response to Message 36555.  

HITS file reporting in the stderr output from ATLAS VBox tasks is totally unreliable. Here is one of your recently completed ATLAS VBox tasks: https://lhcathome.cern.ch/lhcathome/result.php?resultid=205493869. The stderr output says nothing about HITS: it says neither that a HITS file was successfully produced nor that it failed to produce one. However, if you check the bigpanda report for that task at https://bigpanda.cern.ch/job?pandaid=3993741609, you will see that it did produce a HITS file.
I have had the same experience many times. So, yes, whatever stderr says may not mean a thing.
Maybe whether stderr shows it correctly depends on the BOINC version and/or the VBox version,
because in my case a HITS file has been shown in the stderr every time since I updated BOINC and VBox.
In any case, one can always check in bigpanda; what is shown there is fact.
ID: 36559
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 36560 - Posted: 26 Aug 2018, 1:55:14 UTC - in response to Message 36559.  

My BOINC on the Windows PC is 7.12.1 and VBox is 5.2.18. The Linux host has BOINC 7.8.3 and VBox 5.2.16. The Linux host always reports a HITS file in the stderr.txt file.
Tullio
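
For anyone comparing the versions listed above, here is a small Python sketch that compares them numerically. The helper names are hypothetical; the version strings are the ones from this post. Comparing the raw strings would be misleading, because "7.12.1" sorts before "7.8.3" as text.

```python
def parse_version(text):
    """Turn '7.12.1' into (7, 12, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Versions as reported in this post.
hosts = {
    "Windows 10 PC": {"BOINC": "7.12.1", "VBox": "5.2.18"},
    "Linux host": {"BOINC": "7.8.3", "VBox": "5.2.16"},
}


def newer_host(component):
    """Return the name of the host with the newer version of a component."""
    return max(hosts, key=lambda name: parse_version(hosts[name][component]))


if __name__ == "__main__":
    for component in ("BOINC", "VBox"):
        print(component, "is newer on the", newer_host(component))
```

Both components come out newer on the Windows PC, so on these two hosts at least, the one that always reports HITS in stderr is the one with the older versions.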
ID: 36560
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 36561 - Posted: 26 Aug 2018, 7:16:50 UTC

My latest Windows task reports a HITS file.
Tullio
ID: 36561
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1114
Credit: 49,501,728
RAC: 4,157
Message 36562 - Posted: 26 Aug 2018, 7:43:59 UTC - in response to Message 36561.  

My latest Windows task reports a HITS file.
Tullio


https://lhcathome.cern.ch/lhcathome/result.php?resultid=206025343
ID: 36562
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 36563 - Posted: 26 Aug 2018, 11:59:53 UTC - in response to Message 36562.  

https://lhcathome.cern.ch/lhcathome/result.php?resultid=206025343
Seeing this here as well, I am wondering what the notice

2018-08-26 03:58:46 (7576): Error creating VirtualBox instance! rc = 0x80004002

right at the beginning of the stderr means.
I have this in all of my ATLAS tasks, regardless of which VBox version I have been using over the past years.
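
As a partial answer on the number itself: 0x80004002 is the standard Windows COM result code E_NOINTERFACE ("No such interface supported"). The Python sketch below pulls the rc value out of the log line quoted above and splits it into the usual HRESULT fields. The function names are hypothetical, and the sketch only decodes the number; it does not explain why the wrapper logs it or whether it matters for the task.

```python
import re

LOG_LINE = (
    "2018-08-26 03:58:46 (7576): "
    "Error creating VirtualBox instance! rc = 0x80004002"
)

# Only the one code discussed here; this is not a complete table.
KNOWN_HRESULTS = {0x80004002: "E_NOINTERFACE"}


def extract_rc(line):
    """Return the rc value from a log line as an int, or None if absent."""
    match = re.search(r"rc = (0x[0-9A-Fa-f]+)", line)
    return int(match.group(1), 16) if match else None


def decode_hresult(value):
    """Split a 32-bit HRESULT into severity, facility and code fields."""
    return {
        "failure": bool((value >> 31) & 1),  # top bit set means failure
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility number
        "code": value & 0xFFFF,              # low 16 bits
        "name": KNOWN_HRESULTS.get(value, "unknown"),
    }


if __name__ == "__main__":
    rc = extract_rc(LOG_LINE)
    print(hex(rc), decode_hresult(rc))
```

Facility 0 is the generic COM facility, so the code is not specific to VirtualBox itself.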
ID: 36563
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,116,273
RAC: 104,623
Message 36564 - Posted: 26 Aug 2018, 12:35:34 UTC - in response to Message 36563.  

ID: 36564
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 36565 - Posted: 26 Aug 2018, 16:19:49 UTC

In the 14 August issue of "Nature" there is an article about the Atlas strategy. At least some in the Atlas Collaboration want to search not only for already simulated events, like those processed by BOINC users, but for all kinds of events. While this will require more processing power on the one hand, on the other it may diminish the importance of what we are doing. I haven't read any comment on this subject here.
Tullio
ID: 36565
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36566 - Posted: 26 Aug 2018, 16:44:49 UTC - in response to Message 36564.  

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4620&postid=35483#35483

translation: do not allow your spirit to be caught up in the madness of VBox, run ATLAS native and rejoice on the path of least resistance
ID: 36566
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 36567 - Posted: 26 Aug 2018, 18:18:51 UTC - in response to Message 36565.  

In the 14 August issue of "Nature" there is an article about the Atlas strategy. At least some in the Atlas Collaboration want to search not only for already simulated events, like those processed by BOINC users, but for all kinds of events. While this will require more processing power on the one hand, on the other it may diminish the importance of what we are doing. I haven't read any comment on this subject here.
Tullio

Here is the link: https://www.nature.com/articles/d41586-018-05972-7

But why can't we help out with AI using BOINC? On GPUGrid, their Quantum Chemistry project uses BOINC to train their system for machine learning. They can then use the results to run on GPUs in-house. They are estimating energies and forces, but I don't know why it could not be applied to other areas.
http://www.gpugrid.net/forum_thread.php?id=4707&nowrap=true#49606
ID: 36567
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 36568 - Posted: 27 Aug 2018, 8:10:49 UTC - in response to Message 36567.  

I am running GPUGRID, both the CPU tasks and the GPU tasks, on a Linux box. On my Windows 10 PC with a GTX 1050 Ti (not overclocked) it gets too hot (80 °C) and the computing stops.
Tullio
ID: 36568
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,116,273
RAC: 104,623
Message 36569 - Posted: 27 Aug 2018, 8:26:37 UTC - in response to Message 36566.  

translation: do not allow your spirit to be caught up in the madness of VBox, run ATLAS native and rejoice on the path of least resistance

Without vbox, Atlas and Windows have a problem.

Tullio,
I have openSUSE 15.0 active now. WCG..., but I will be testing Atlas native!
ID: 36569
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 36570 - Posted: 27 Aug 2018, 10:19:37 UTC - in response to Message 36569.  

OK, let me know if you succeed in running Atlas native on Leap 15.0. It is running long Einstein@home tasks very well.
Tullio
ID: 36570
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 36605 - Posted: 1 Sep 2018, 23:33:33 UTC

I am running a one-core Atlas task on my HP Linux laptop with an AMD E-450 CPU. Strangely enough, the "top" command shows CPU usage that can reach 117% for VBoxHeadless.
Tullio
ID: 36605
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36606 - Posted: 2 Sep 2018, 2:20:41 UTC - in response to Message 36605.  

Not so strange. It happens here too. In fact, if you watch closely enough and for long enough you'll see that it happens on every host. It's not an indication that something has gone awry. It's because CPU cycles are hard to count (and account for). Top doesn't see everything, and if you do "man top" and pore over the minutiae you'll see that top isn't even the accountant. Top is merely the poor SOB that collects the reports from other accountants and tries to assemble all the details into a sensible report for users. The final report is close but it's not 100% accurate. Also, tasks sometimes run briefly on more than 1 core even though they are "assigned" a single core.
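
To make the arithmetic concrete, here is a rough Python sketch of how a figure like 117% can arise. In its default mode top reports a process's CPU usage as a share of a single CPU, summed over all of the process's threads, so a process with several threads can exceed 100%. The function name and the per-thread numbers are made up for illustration; they are not measurements of VBoxHeadless.

```python
def cpu_percent(thread_cpu_seconds, wall_seconds):
    """CPU usage as a percentage of ONE core over a sampling window.

    thread_cpu_seconds: CPU time each thread used during the window.
    wall_seconds: length of the window in real (wall-clock) time.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * sum(thread_cpu_seconds) / wall_seconds


if __name__ == "__main__":
    # Hypothetical 3-second sample: one busy thread plus two helper threads
    # that briefly ran on another core.
    sample = [2.91, 0.45, 0.15]
    print(f"{cpu_percent(sample, 3.0):.0f}%")
```

The busy thread alone stays under 100% of a core; it is the helper threads running elsewhere during the same window that push the total past it.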

The whole notion of core assignment and core affinity is far more complicated than the average user realizes. In the BOINC world we bandy the term "assigned cores" around as if it's written in stone, but in reality it's just a convenient concept that assists BOINC devs in creating more or less accurate algorithms and code for estimating how many tasks can be downloaded and completed before deadline.
ID: 36606
cIsCo

Joined: 30 Aug 18
Posts: 3
Credit: 1,002
RAC: 0
Message 36706 - Posted: 14 Sep 2018, 11:50:32 UTC

I have recently started to devote some of my computer's time to LHC Atlas jobs. Most of the jobs have failed, and the failure only shows up after more than 10 hours of processing time have been spent on them. It would be better if the code could detect when things are going wrong and give proper messages so the issue can be resolved, rather than the job just terminating or going invalid.

Recent example: a job showed 16 hours at the start, then ran for nearly 2 days, and at the end all I got was a big ZERO with a Validate error message.
1) There isn't any proper indication if anything is going wrong. I don't understand what failed in the validation, but I doubt every sub-job had issues.
2) The completion time estimate should be somewhat accurate.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=206483510 - Can someone take a look and let me know what exactly failed here?

Thanks.
ID: 36706