Message boards : ATLAS application : Successful Atlas

tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 30736 - Posted: 11 Jun 2017, 10:35:33 UTC
Last modified: 11 Jun 2017, 10:36:32 UTC

An Atlas multicore task has successfully completed on my Linux box while all other LHC tasks, except Sixtrack, fail both on the Windows 10 PC and the Linux SUN WS. Why?
Tullio
ID: 30736
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1369
Credit: 9,128,790
RAC: 3,905
Message 30737 - Posted: 11 Jun 2017, 12:19:58 UTC

It looks successful, but it wasn't.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=145020998

It did not return a HITS-file and the 4200MB RAM for the dual core VM was/is too low.
ID: 30737
Erich56

Joined: 18 Dec 15
Posts: 1747
Credit: 115,142,253
RAC: 90,812
Message 30738 - Posted: 11 Jun 2017, 13:30:17 UTC - in response to Message 30737.  

... and the 4200MB RAM for the dual core VM was/is too low.

When I was crunching 2-core ATLAS tasks, I set 5000 MB in the app_config.xml (although some 4500-4600 MB could be sufficient).
ID: 30738
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1156
Credit: 52,481,853
RAC: 59,570
Message 30741 - Posted: 11 Jun 2017, 22:17:29 UTC
Last modified: 11 Jun 2017, 22:20:37 UTC

I have run several hundred of these, and when I tried running a 6-core task on a PC with only 8 GB of RAM I had to be careful not to use the PC for anything else, or the tasks would crash.

On that one (it is an 8-core) I also had some vLHC-dev or LHC tasks running, so to have enough memory when I started a new 6-core Atlas task I had to suspend the other tasks and reboot, so that most of those 8 GB of RAM were free. Otherwise the task would not start properly and would just run until it decided to crash (which could take over an hour).

I didn't have to do that on the hosts with 16-24 GB of RAM, even when running 8-core Atlas tasks.

My main problem with these is that you need an ISP connection speed you can trust for the initial start-up, or the tasks will run and eventually end up Invalid. So I tended to watch mine in the log to make sure, instead of just wasting time.
Volunteer Mad Scientist For Life
ID: 30741
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 30744 - Posted: 12 Jun 2017, 7:35:36 UTC
Last modified: 12 Jun 2017, 8:13:53 UTC

My HP Linux laptop has 8 GB RAM. My Windows 10 PC has 24 GB RAM. My SUN WS has 8 GB RAM and is running Einstein@home tasks on both CPU and its GTX 750 Ti nVidia board. No errors and a whopping credit. Now a second Atlas task is running on 2 CPUs on the laptop, where all other LHC tasks fail miserably, save Sixtrack. All on default settings, no tricks.
Tullio
The second task has completed and is earning me some credits; I need them.
ID: 30744
Erich56

Joined: 18 Dec 15
Posts: 1747
Credit: 115,142,253
RAC: 90,812
Message 30749 - Posted: 12 Jun 2017, 8:42:00 UTC - in response to Message 30744.  

... Now a second Atlas task is running on 2 CPUs on the laptop, where all other LHC tasks fail miserably

Does CMS fail as well? With 8 GB of RAM, you could probably run 4 CMS tasks simultaneously (provided you have enough processor cores available).
ID: 30749
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 30758 - Posted: 12 Jun 2017, 11:22:44 UTC - in response to Message 30749.  

All LHC tasks fail, save Sixtrack and Atlas tasks. CP says a file is missing. Then why are they validated?
Tullio
ID: 30758
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1369
Credit: 9,128,790
RAC: 3,905
Message 30759 - Posted: 12 Jun 2017, 13:54:28 UTC - in response to Message 30758.  

All LHC tasks fail, save Sixtrack and Atlas tasks. CP says a file is missing. Then why are they validated?
Tullio

That's ATLAS being generous and granting credit for the CPU cycles spent.

You could try setting single core in your preferences or use an app_config.xml and set at least 4400MB for an ATLAS dual-core VM.
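
As a rough illustration of the app_config.xml route, the Python sketch below writes out such a file. The app name `ATLAS`, the plan class `vbox64_mt_mcore_atlas`, and the `--memory_size_mb` wrapper option are assumptions that this thread does not confirm; check them against the entries in your own client_state.xml before relying on the result.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, ncpus, memory_mb):
    """Return app_config.xml text containing a single <app_version> entry."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # Assumed wrapper option that overrides the memory given to the VM.
    ET.SubElement(version, "cmdline").text = f"--memory_size_mb {memory_mb}"
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed names; 2 cores and the 4400 MB minimum suggested in this post.
config_text = build_app_config("ATLAS", "vbox64_mt_mcore_atlas", 2, 4400)
print(config_text)
```

The resulting text would be saved as app_config.xml in the LHC@home project folder inside the BOINC data directory, after which the client needs to re-read its config files or be restarted.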
ID: 30759
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 30760 - Posted: 12 Jun 2017, 16:26:10 UTC - in response to Message 30759.  

Theory Simulation tasks use 630 MB and fail constantly. At least Atlas gives me some credits. I've been running Test4Theory@home tasks since November 2010 and they all worked before "consolidation".
Tullio
ID: 30760
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 30763 - Posted: 12 Jun 2017, 20:31:47 UTC - in response to Message 30760.  

Tullio, you remain a mystery...
Apparently no one has been able to help you...
But to maximize the chance that someone spots the cause, I am recording all the information given in your logs in this post.

The messages in your logs seem to differ from one Theory task to another, but some of the details are interesting.

With error 206 (0x000000CE) EXIT_INIT_FAILURE:

2017-06-12 10:13:42 (17522): Guest Log: [INFO] Reading volunteer information
2017-06-12 10:13:45 (17522): Guest Log: [INFO] Volunteer: tullio (96166) Host: 10454176
2017-06-12 10:13:45 (17522): Guest Log: [INFO] VMID: 11a6f992-647c-415d-969d-d7d3ca99f9ef
2017-06-12 10:13:48 (17522): Guest Log: [INFO] Using weak account key.
2017-06-12 10:13:48 (17522): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2017-06-12 10:14:06 (17522): Guest Log: [INFO] Theory application starting. Check log files.
2017-06-12 10:14:13 (17522): Guest Log: [DEBUG] HTCondor ping
2017-06-12 10:14:15 (17522): Guest Log: [DEBUG] 0
2017-06-12 10:57:18 (17522): Guest Log: [ERROR] Condor exited after 2587s without running a job.
2017-06-12 10:57:18 (17522): Guest Log: [INFO] Shutting Down.
2017-06-12 10:57:18 (17522): VM Completion File Detected.
2017-06-12 10:57:18 (17522): VM Completion Message: Condor exited after 2587s without running a job.
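
To spot this failure across many task logs, a short Python sketch can pull the Condor error out of the stderr text. The line layout is taken from the excerpt above; the function name `find_condor_failures` is hypothetical.

```python
import re

# Matches lines like:
# "<date> <time> (<pid>): Guest Log: [ERROR] Condor exited after 2587s without running a job."
CONDOR_EXIT = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((?P<pid>\d+)\): "
    r"Guest Log: \[ERROR\] Condor exited after (?P<secs>\d+)s without running a job"
)


def find_condor_failures(stderr_text):
    """Return (timestamp, seconds waited) for each idle Condor exit in the log."""
    hits = []
    for line in stderr_text.splitlines():
        match = CONDOR_EXIT.match(line.strip())
        if match:
            hits.append((match.group("stamp"), int(match.group("secs"))))
    return hits


sample = """\
2017-06-12 10:14:13 (17522): Guest Log: [DEBUG] HTCondor ping
2017-06-12 10:57:18 (17522): Guest Log: [ERROR] Condor exited after 2587s without running a job.
2017-06-12 10:57:18 (17522): VM Completion Message: Condor exited after 2587s without running a job.
"""
for stamp, secs in find_condor_failures(sample):
    print(f"{stamp}: Condor idle for {secs} s (about {secs // 60} min)")
```

Only the Guest Log error line is counted, so the repeated VM Completion Message does not produce a second hit.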


Documentation on weak account is here.

With error 194 (0x000000C2) EXIT_ABORTED_BY_CLIENT,
the previous messages disappear, but there are these:

2017-06-12 10:59:28 (12763): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2017-06-12 10:59:28 (12763): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2017-06-12 10:59:28 (12763): Guest Log: BIOS: Booting from Hard Disk...
2017-06-12 10:59:31 (12763): Guest Log: BIOS: KBD: unsupported int 16h function 03
2017-06-12 10:59:31 (12763): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2017-06-12 11:09:22 (12763): VM Heartbeat file specified, but missing.
2017-06-12 11:09:22 (12763): VM Heartbeat file specified, but missing file system status. (errno = '2')
2017-06-12 11:09:22 (12763): Capturing screenshot.
2017-06-12 11:09:23 (12763): Screenshot completed.
2017-06-12 11:09:23 (12763): Powering off VM.


and

Command: VBoxManage -q showvminfo "boinc_7a8b17380ef80d34" --machinereadable
Exit Code: -2135228415
Output:
VBoxManage: error: Could not find a registered machine named 'boinc_7a8b17380ef80d34'
VBoxManage: error: Details: code VBOX_E_OBJECT_NOT_FOUND (0x80bb0001), component VirtualBoxWrap, interface IVirtualBox, callee nsISupports
VBoxManage: error: Context: "FindMachine(Bstr(VMNameOrUuid).raw(), machine.asOutParam())" at line 2781 of file VBoxManageInfo.cpp
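
The negative exit code and the hexadecimal code in the error text are the same number: -2135228415 is 0x80bb0001 read as a signed 32-bit integer. A small Python check (the helper name `as_unsigned_hex` is hypothetical):

```python
def as_unsigned_hex(exit_code, bits=32):
    """Reinterpret a signed exit code as an unsigned value and format it as hex."""
    return hex(exit_code & ((1 << bits) - 1))


print(as_unsigned_hex(-2135228415))  # VBoxManage exit code from the log above
print(as_unsigned_hex(206))          # BOINC error 206, shown as 0x000000CE
print(as_unsigned_hex(194))          # BOINC error 194, shown as 0x000000C2
```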


and

2017-06-12 11:09:22 (12763):
Command: VBoxManage -q controlvm "boinc_7a8b17380ef80d34" keyboardputscancode 0x39
Exit Code: 0
Output:
VBoxManage: error: Error: '0x39' is not a hex byte!


Some documentation for the controlvm command line:
VBoxManage controlvm <vm> keyboardputscancode <hex> [<hex>...] Sends commands using keycodes to the VM. Keycodes are documented in the public domain, e.g. http://www.win.tue.nl/~aeb/linux/kbd/scancodes-1.html

It deals with translating keyboard keys into scancodes, but it is rather difficult to understand.
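
Judging by the error text, this VBoxManage build seems to want bare two-digit hex bytes, so `39` rather than `0x39`; that reading is an assumption drawn from the message alone. In scancode set 1 (the table linked above), 0x39 is the space bar press, and the matching release code is the press code with the top bit set. A Python sketch with hypothetical helper names that produces arguments in the bare form:

```python
def strip_hex_prefix(token):
    """Turn '0x39' into '39' and reject anything that is not one hex byte."""
    digits = token[2:] if token.lower().startswith("0x") else token
    value = int(digits, 16)  # raises ValueError on non-hex input
    if not 0 <= value <= 0xFF:
        raise ValueError(f"{token!r} is not a single byte")
    return f"{value:02x}"


def scancode_args(make_code):
    """Return press and release bytes for a set-1 scancode as bare hex strings."""
    if not 0x01 <= make_code <= 0x7F:
        raise ValueError("expected a single-byte set-1 make code")
    break_code = make_code | 0x80  # release code = press code with bit 7 set
    return [f"{make_code:02x}", f"{break_code:02x}"]


print(strip_hex_prefix("0x39"))
print(scancode_args(0x39))
```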
----------------------------------------------------------------------------------------------------------------------------------------------------------------
To give us more details:
Why did you need to use a weak account in the past?
Do you still need it now?
Are you sure this has been happening since the consolidation, or did you join another project since the consolidation that may interfere with the LHC project (by using another virtualizer...)?

I'm not able to help you, but maybe someone else can if they gather all the pieces of information given here and have encountered a similar case themselves.

Maybe you will be lucky...
ID: 30763
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1156
Credit: 52,481,853
RAC: 59,570
Message 30764 - Posted: 12 Jun 2017, 23:51:08 UTC

Well Tullio, I don't run any Linux, so I don't know whether VirtualBox is simply having a problem with your OS. Looking at the stderr of your failed Theory tasks, it starts out OK, but after about an hour the VM is *paused* for about an hour and then tries to start *running* again. It then runs for several more hours (they tend to do that even if they are not actually running a *job*).

THEN it says
Guest Log: [ERROR] Condor exited after 43809s without running a job.
2017-06-12 09:11:28 (24691): Guest Log: [INFO] Shutting Down.
2017-06-12 09:11:28 (24691): VM Completion File Detected.
2017-06-12 09:11:28 (24691): VM Completion Message: Condor exited after 43809s without running a job.


And it then shuts down, giving you the Invalid result.

These are not the same as what we were running at the old Atlas or T4T.

Here is an example of a Valid and Invalid task

Your Invalid task

A Valid task on Windows 10

And of course these tasks are nothing like the Einstein CPU or GPU tasks.

It is a 2-core with about 8 GB of RAM... do you run 2 tasks at the same time or just one?
Your CPU is almost 6 years old.

If it were a Windows 10 OS you could watch the Task Manager to see what the memory and CPU are doing, but I don't know what the equivalent is on Linux.
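
On Linux, a rough counterpart to watching Task Manager is reading `/proc/meminfo`. Here is a minimal Python sketch, assuming the usual `Name: value kB` layout of that file; the helper name `parse_meminfo` is hypothetical and the sample text is made up for illustration.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


# Made-up sample; on a real host use open("/proc/meminfo").read() instead.
sample = """\
MemTotal:        8055180 kB
MemFree:          512340 kB
MemAvailable:    2310456 kB
"""
info = parse_meminfo(sample)
print(f"available: {info['MemAvailable'] // 1024} MB of {info['MemTotal'] // 1024} MB")
```

Tools such as `top` or `free -m` show the same figures interactively.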
Volunteer Mad Scientist For Life
ID: 30764
Erich56

Joined: 18 Dec 15
Posts: 1747
Credit: 115,142,253
RAC: 90,812
Message 30765 - Posted: 13 Jun 2017, 5:47:59 UTC - in response to Message 30764.  

Your CPU is almost 6 years old.

Are you saying that a 6-year-old CPU has problems running such tasks?
ID: 30765
Erich56

Joined: 18 Dec 15
Posts: 1747
Credit: 115,142,253
RAC: 90,812
Message 30766 - Posted: 13 Jun 2017, 8:03:53 UTC - in response to Message 30765.  

Your CPU is almost 6 years old.

You are saying that a 6 years old CPU has problems to run such tasks?

I am only asking because the Intel® Core™2 Quad Q9550 (4-core) in one of my hosts is 9 years old, and I am successfully crunching 1 GPUGRID task, 2 CMS tasks, and 1 WCG task simultaneously.
ID: 30766
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 30768 - Posted: 13 Jun 2017, 16:53:06 UTC - in response to Message 30766.  

@ Tullio :

There may be another, easier way to resolve your situation.

Instead of investigating your host, change your account.

1°) Remove the LHC project from your BOINC client.
2°) Add a new project "LHC" in your BOINC client, using this project address:
https://lhcathome.cern.ch/lhcathome/
And then create a new account.
3°) Select the applications in the new account's web preferences and request work.

If the tasks succeed, then you know the problem is your old account and not your host.
If they fail, the problem is your computer and not your old account.

After this test, you may ask the site admins to merge the old account with the new one and restore your old credits to the new account (but ask them beforehand)...
ID: 30768
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1156
Credit: 52,481,853
RAC: 59,570
Message 30770 - Posted: 13 Jun 2017, 18:20:05 UTC - in response to Message 30766.  

Your CPU is almost 6 years old.

You are saying that a 6 years old CPU has problems to run such tasks?

I am just putting this question as the Intel® Core™2 Quad Q9550 (4-core) in one of my hosts is 9 years old, and I am successfully crunching 1 GPUGRID task, 2 CMS tasks, 1 WCG task simultaneously.


His is an AMD E-450 APU 2-core and yours is an Intel 2.67GHz Quad-core.
Big difference.
And of course the proper URL is supposed to be used here.
https://lhcathome.cern.ch/lhcathome/

I have 9 computers here at home, from the XP Pro 3-core Phenom that has been here since 2004 to the three 8-core Intel i7 3770's with Win 10.

I have never been to the WCG site, and the only GPU tasks I run are the Einstein ones; the current ones use a CPU core as well. The CMS tasks use more memory than a Theory task, but I have run hundreds of those at vLHC-dev as 2-core tasks, so I run four 2-core tasks on the four 8-core computers, since I have 16-24 GB of RAM on those (well, except the 8-core laptop, since it only has 8 GB of RAM right now). But those 2-core CMS tasks will not run with 4 GB of RAM if you try to run 2 at the same time.
Volunteer Mad Scientist For Life
ID: 30770



©2024 CERN