81) Message boards : Theory Application : Job size - download (Message 28492)
Posted 14 Jan 2017 by Luigi R.
Post:
As far as I can tell your machine is 8 cores with 8GB of RAM. From the memory perspective 2 Theory tasks are equivalent to 1 CMS task. When starting to run multiple VM tasks on a machine, start small and experiment by slowly increasing what you are running. Always start with a Theory task. If that works then it suggests there are no fundamental issues. Then try 1 CMS before trying 1 Theory and 1 CMS together. It has been mentioned by others that VM starts should be staged.

I think there are no issues on this machine. There are moments when all 8 VMs are running correctly. My 24GB of RAM is enough for 8 CMS tasks as well.


Today I'm experiencing many errors: 206 (0x000000CE) EXIT_INIT_FAILURE.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=112071733
[ERROR] Condor exited after 686s without running a job.


Sorry if I sound repetitive, but I see a bandwidth problem.
My host downloaded >2GB in 1.5 hours.
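
As a rough check of that figure, the average rate can be worked out directly. This is only a minimal sketch: the function name is hypothetical and it assumes decimal units (1 GB = 1000 MB).

```python
def average_rate_mb_per_s(total_gb: float, hours: float) -> float:
    """Average download rate in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    seconds = hours * 3600
    return total_gb * 1000 / seconds


rate = average_rate_mb_per_s(2.0, 1.5)
print(f"{rate:.2f} MB/s")  # prints "0.37 MB/s"
```

Since more than 2GB was downloaded, this value is a lower bound on the real average rate.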

I will try to disable CMS tasks and run only 4 Theory tasks to see if things improve.
Multicore VMs would be good.
82) Message boards : Theory Application : Job size - download (Message 28485)
Posted 13 Jan 2017 by Luigi R.
Post:
I have 24GB of RAM though.
83) Message boards : Theory Application : Job size - download (Message 28482)
Posted 13 Jan 2017 by Luigi R.
Post:
I tried to suspend (without leaving in memory) and resume it, but the same error occurred after the VM completed startup. Then I aborted it. The other tasks are 'gracefully' running. Now I have 8/8 VMs running.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=111877625
84) Message boards : ATLAS application : Small number of test tasks (Message 28480)
Posted 13 Jan 2017 by Luigi R.
Post:
Validate error: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=53646494
85) Message boards : Sixtrack Application : Getting NO sixtrack WUs even tho my prefs ARE to accept them & test WUs AND serv status keeps saying there are THOUSANDS of Sixtrack WUs to send?? (Message 28478)
Posted 13 Jan 2017 by Luigi R.
Post:
Hello Life ... oPEA,


In my opinion this is a common problem with the Sixtrack project. A few hundred available tasks are not enough to feed all LHC hosts (yours included). The server status is not as reliable as you think, because what you see is an hourly update. That handful of tasks is sent out in a couple of minutes. So one believes there are a lot of tasks, but most of the time there are 0 tasks ready to send.

I noticed a strange behaviour some time ago. If you set 10/10 days of additional work and your host gets a couple of hundred tasks, it will keep getting new work, because it requests new tasks every time it contacts the server to report finished ones. So the probability of getting new work is good.
86) Message boards : ATLAS application : Small number of test tasks (Message 28474)
Posted 13 Jan 2017 by Luigi R.
Post:
Ok, I will see if I can run some tasks. Do you need feedback? Only negative reports?
87) Message boards : Theory Application : Job size - download (Message 28472)
Posted 13 Jan 2017 by Luigi R.
Post:
Now I have 5 VMs running and 3 idling (2 CMS and 1 Theory).


Edit: After 20 minutes 2-3 VMs running.
Maybe I should try limiting the number of VMs to see if I can keep 1, 2, 3, etc. VMs running all the time?
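
One way to cap the number of VMs is an app_config.xml file in the project directory, using BOINC's max_concurrent setting. Below is a minimal sketch that generates such a file. The helper name is hypothetical, and the application names "Theory" and "CMS" are assumptions: the real short names should be checked in client_state.xml before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits: dict[str, int]) -> str:
    """Return app_config.xml text with one <app> entry per application."""
    root = ET.Element("app_config")
    for app_name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name  # assumed short name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Hypothetical limits: at most 2 Theory VMs and 1 CMS VM at a time.
config_text = build_app_config({"Theory": 2, "CMS": 1})
print(config_text)
```

The limits can then be raised one step at a time to find how many VMs stay busy.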

Edit2: After another 5 minutes 4 VMs running.
88) Message boards : Theory Application : Job size - download (Message 28471)
Posted 13 Jan 2017 by Luigi R.
Post:
Done! It's better. I have 6 VMs running and 2 VMs (1 CMS and 1 Theory) idling.

Processes list


CMS VM idling (process 5770) (elapsed time: 45 minutes)


Theory VM idling (process 23721) (elapsed time: 49 minutes)
89) Message boards : Theory Application : Job size - download (Message 28468)
Posted 13 Jan 2017 by Luigi R.
Post:
running VM


idling VM
90) Message boards : Theory Application : Job size - download (Message 28467)
Posted 13 Jan 2017 by Luigi R.
Post:
1MB per job doesn't seem too much.

So I don't understand why I have 1 VM running and 7 VMs idling today, 0 running yesterday and 8 running two days ago.

stderr.txt idling today

2017-01-13 12:44:05 (3239): vboxwrapper (7.7.26196): starting
2017-01-13 12:44:05 (3239): Feature: Checkpoint interval offset (474 seconds)
2017-01-13 12:44:05 (3239): Detected: VirtualBox VboxManage Interface (Version: 5.0.26)
2017-01-13 12:44:05 (3239): Detected: Minimum checkpoint interval (600.000000 seconds)
2017-01-13 12:44:05 (3239): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2017-01-13 12:44:05 (3239): Starting VM. (boinc_33a2224c153eb7ca, slot#6)
2017-01-13 12:44:15 (3239): Successfully started VM. (PID = '3970')
2017-01-13 12:44:15 (3239): Reporting VM Process ID to BOINC.
2017-01-13 12:44:15 (3239): VM state change detected. (old = 'poweroff', new = 'running')
2017-01-13 12:44:15 (3239): Detected: Web Application Enabled (http://localhost:56077)
2017-01-13 12:44:15 (3239): Detected: Remote Desktop Enabled (localhost:37732)
2017-01-13 12:44:15 (3239): Status Report: Job Duration: '64800.000000'
2017-01-13 12:44:15 (3239): Status Report: Elapsed Time: '26987.109348'
2017-01-13 12:44:15 (3239): Status Report: CPU Time: '13027.730000'
2017-01-13 12:44:15 (3239): Preference change detected
2017-01-13 12:44:15 (3239): Setting CPU throttle for VM. (100%)
2017-01-13 12:44:15 (3239): Setting network throttle for VM. (80KB)
2017-01-13 12:44:15 (3239): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 600 seconds))


stderr.txt running today

2017-01-13 12:44:05 (3240): vboxwrapper (7.7.26196): starting
2017-01-13 12:44:05 (3240): Feature: Checkpoint interval offset (327 seconds)
2017-01-13 12:44:05 (3240): Detected: VirtualBox VboxManage Interface (Version: 5.0.26)
2017-01-13 12:44:05 (3240): Detected: Minimum checkpoint interval (600.000000 seconds)
2017-01-13 12:44:05 (3240): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2017-01-13 12:44:05 (3240): Starting VM. (boinc_248c1324b9ac7c9c, slot#5)
2017-01-13 12:44:07 (3240): Successfully started VM. (PID = '3995')
2017-01-13 12:44:07 (3240): Reporting VM Process ID to BOINC.
2017-01-13 12:44:07 (3240): Guest Log: BIOS: VirtualBox 5.0.26
2017-01-13 12:44:07 (3240): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2017-01-13 12:44:07 (3240): VM state change detected. (old = 'poweroff', new = 'running')
2017-01-13 12:44:07 (3240): Detected: Web Application Enabled (http://localhost:33403)
2017-01-13 12:44:07 (3240): Detected: Remote Desktop Enabled (localhost:58296)
2017-01-13 12:44:07 (3240): Status Report: Job Duration: '64800.000000'
2017-01-13 12:44:07 (3240): Status Report: Elapsed Time: '26715.438963'
2017-01-13 12:44:07 (3240): Status Report: CPU Time: '14028.040000'
2017-01-13 12:44:07 (3240): Preference change detected
2017-01-13 12:44:07 (3240): Setting CPU throttle for VM. (100%)
2017-01-13 12:44:07 (3240): Setting network throttle for VM. (80KB)
2017-01-13 12:44:07 (3240): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 600 seconds))
2017-01-13 12:44:09 (3240): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2017-01-13 12:44:09 (3240): Guest Log: BIOS: Booting from Hard Disk...
2017-01-13 12:44:11 (3240): Guest Log: BIOS: KBD: unsupported int 16h function 03
2017-01-13 12:44:11 (3240): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2017-01-13 12:44:22 (3240): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2017-01-13 12:44:39 (3240): Guest Log: VBoxService 4.3.28 r100309 (verbosity: 0) linux.amd64 (May 13 2015 17:11:31) release log
2017-01-13 12:44:39 (3240): Guest Log: 00:00:00.000041 main Log opened 2017-01-13T11:44:36.426830000Z
2017-01-13 12:44:39 (3240): Guest Log: 00:00:00.000239 main OS Product: Linux
2017-01-13 12:44:39 (3240): Guest Log: 00:00:00.000272 main OS Release: 4.1.34-22.cernvm.x86_64
2017-01-13 12:44:39 (3240): Guest Log: 00:00:00.000292 main OS Version: #1 SMP Mon Oct 24 14:29:58 CEST 2016
2017-01-13 12:44:39 (3240): Guest Log: 00:00:00.000311 main OS Service Pack: #1 SMP Mon Oct 24 14:29:58 CEST 2016
2017-01-13 12:44:39 (3240): Guest Log: 00:00:00.000329 main Executable: /usr/sbin/VBoxService
2017-01-13 12:44:39 (3240): Guest Log: 00:00:00.000330 main Process ID: 2646
2017-01-13 12:44:39 (3240): Guest Log: 00:00:00.000330 main Package type: LINUX_64BITS_GENERIC
2017-01-13 12:44:39 (3240): Guest Log: 00:00:00.000852 main 4.3.28 r100309 started. Verbose level = 0
2017-01-13 12:47:58 (3240): Guest Log: [INFO] Mounting the shared directory
2017-01-13 12:47:58 (3240): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2017-01-13 12:47:58 (3240): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80
2017-01-13 12:47:58 (3240): Guest Log: [DEBUG] Connection to cern.ch 80 port [tcp/http] succeeded!
2017-01-13 12:47:58 (3240): Guest Log: [DEBUG] 0
2017-01-13 12:47:58 (3240): Guest Log: [DEBUG] Testing CVMFS connection to lhchomeproxy.cern.ch on port 3125
2017-01-13 12:48:03 (3240): Guest Log: [DEBUG] Connection to lhchomeproxy.cern.ch 3125 port [tcp/a13-an] succeeded!
2017-01-13 12:48:03 (3240): Guest Log: [DEBUG] 0
2017-01-13 12:48:03 (3240): Guest Log: [DEBUG] Testing VCCS connection to vccs1.cern.ch on port 443
2017-01-13 12:48:04 (3240): Guest Log: [DEBUG] Connection to vccs1.cern.ch 443 port [tcp/https] succeeded!
2017-01-13 12:48:04 (3240): Guest Log: [DEBUG] 0
2017-01-13 12:48:04 (3240): Guest Log: [DEBUG] Testing connection to Condor server on port 9618
2017-01-13 12:48:04 (3240): Guest Log: [DEBUG] Connection to vccondor01.cern.ch 9618 port [tcp/condor] succeeded!
2017-01-13 12:48:04 (3240): Guest Log: [DEBUG] 0
2017-01-13 12:48:04 (3240): Guest Log: [DEBUG] Probing CVMFS ...
2017-01-13 12:48:05 (3240): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2017-01-13 12:49:26 (3240): Guest Log: Probing /cvmfs/sft.cern.ch... OK
2017-01-13 12:49:26 (3240): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2017-01-13 12:49:26 (3240): Guest Log: 2.2.0.0 3335 4 21800 3799 13 1 589102 10240001 2 65024 0 20 95 15203 0 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.168.203:3125 1
2017-01-13 12:49:39 (3240): Guest Log: [INFO] Reading volunteer information
2017-01-13 12:49:39 (3240): Guest Log: [INFO] Volunteer: Luigi R. (282378) Host: 10408772
2017-01-13 12:49:39 (3240): Guest Log: [INFO] VMID: 3fc883b4-41f5-4a16-aa19-a080e214e4c6
2017-01-13 12:49:39 (3240): Guest Log: [INFO] Requesting an X509 credential from vLHC@home
2017-01-13 12:49:40 (3240): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2017-01-13 12:49:41 (3240): Guest Log: [INFO] Theory application starting. Check log files.
2017-01-13 12:49:41 (3240): Guest Log: [DEBUG] HTCondor ping
2017-01-13 12:49:45 (3240): Guest Log: [DEBUG] 0
2017-01-13 12:50:30 (3240): Guest Log: [INFO] New Job Starting in slot1
2017-01-13 12:50:30 (3240): Guest Log: [INFO] Condor JobID: 1042966.0 in slot1
2017-01-13 12:50:35 (3240): Guest Log: [INFO] MCPlots JobID: 34797771 in slot1
2017-01-13 12:55:54 (3240): Guest Log: [INFO] Job finished in slot1 with 0.
2017-01-13 12:56:00 (3240): Guest Log: [INFO] New Job Starting in slot1
2017-01-13 12:56:00 (3240): Guest Log: [INFO] Condor JobID: 1043020.0 in slot1
2017-01-13 12:56:06 (3240): Guest Log: [INFO] MCPlots JobID: 34799529 in slot1
2017-01-13 13:08:52 (3240): Guest Log: [INFO] Job finished in slot1 with 0.
2017-01-13 13:08:56 (3240): Guest Log: [INFO] New Job Starting in slot1
2017-01-13 13:08:56 (3240): Guest Log: [INFO] Condor JobID: 1043156.0 in slot1
2017-01-13 13:09:02 (3240): Guest Log: [INFO] MCPlots JobID: 34799673 in slot1
2017-01-13 13:44:05 (3240): Guest Log: [INFO] Job finished in slot1 with 0.
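
To tell a running VM from an idling one, the job lines in stderr.txt can be extracted automatically. The following is a minimal sketch, assuming the line format shown in the log above; the function name and regular expression are hypothetical, not part of vboxwrapper.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2017-01-13 12:50:30 (3240): Guest Log: [INFO] New Job Starting in slot1
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): Guest Log: \[INFO\] (.*)$"
)


def job_durations(log_text: str) -> list[float]:
    """Seconds between each 'New Job Starting' line and the next 'Job finished' line."""
    durations = []
    started = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(2)
        if message.startswith("New Job Starting"):
            started = stamp
        elif message.startswith("Job finished") and started is not None:
            durations.append((stamp - started).total_seconds())
            started = None
    return durations


sample = """\
2017-01-13 12:50:30 (3240): Guest Log: [INFO] New Job Starting in slot1
2017-01-13 12:55:54 (3240): Guest Log: [INFO] Job finished in slot1 with 0.
2017-01-13 12:56:00 (3240): Guest Log: [INFO] New Job Starting in slot1
2017-01-13 13:08:52 (3240): Guest Log: [INFO] Job finished in slot1 with 0.
"""
print(job_durations(sample))  # [324.0, 772.0]
```

An empty list means the log contains no completed job, which is what the idling VM's stderr.txt looks like.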
91) Message boards : Theory Application : Job size - download (Message 28465)
Posted 13 Jan 2017 by Luigi R.
Post:
Hello, I would like to know the size of 1 job. When I run many VMs (Theory and CMS), I often experience long idle periods. I guess that many concurrent downloads get stuck, or maybe the job size is too large for my ADSL (~600kb/s).
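
To get a feeling for the numbers, here is a minimal sketch of how long one job download could take when several VMs share the line. Everything in it is an assumption: the function name is hypothetical, "kb/s" is read as kilobytes per second, the link is split evenly between the VMs, and the 1 MB job size is only an illustrative value.

```python
def seconds_per_job(job_mb: float, link_kb_per_s: float, concurrent_vms: int) -> float:
    """Seconds for one VM to fetch a job when the link is split evenly."""
    share = link_kb_per_s / concurrent_vms  # kB/s available to each VM
    return job_mb * 1000 / share  # decimal units: 1 MB = 1000 kB


# Illustrative values: 1 MB job, ~600 kB/s link, 8 VMs downloading at once.
print(seconds_per_job(1.0, 600.0, 8))
```

If the ADSL figure is actually in kilobits per second, the result should be multiplied by 8.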
92) Message boards : Number crunching : Merge Credits from vLHC (Message 28332)
Posted 4 Jan 2017 by Luigi R.
Post:
Credit is assigned per application and the breakdown per application can be seen in the project statistics page.

Oh, that's what I was looking for.

Total credit will also be provided for the project and this will be used by the BOINC stats sites. As it is difficult to compare credit between different projects or applications, such comparisons should be avoided or at least viewed with this understanding in mind.

I agree.
93) Message boards : Number crunching : Merge Credits from vLHC (Message 28319)
Posted 3 Jan 2017 by Luigi R.
Post:
I could argue the same about Sixtrack credits because it was very difficult to get tasks in the past. Now someone can get a lot of credits while running VMs and easily overtake great Sixtrack contributors... but I will not do it.

I think it is ok to merge all the LHC credits. Maybe you could keep separate credit counts for every application within LHCatHome, but I prefer merged ones on my BOINC statistics.
94) Message boards : Number crunching : Sixtrack (notag/sse2/pni/sse3) (Message 27823)
Posted 16 Oct 2016 by Luigi R.
Post:
Hello, I think there is something wrong with the server's rating process.

I own an i7-4770k and I have more than one BOINC client on the same host.
The server thinks one of those clients is faster at running (no tag) workunits, I guess because of short tasks. That is totally wrong. Usually I get sse2 tasks. I don't know why I don't get sse3 workunits, but I will try to crunch them through the anonymous platform.
95) Message boards : Number crunching : Host messing up tons of results (Message 27375)
Posted 12 Apr 2015 by Luigi R.
Post:
32 tasks is the limit for my host. 32 tasks lasting ~80s each (like this) would finish in 320s. A larger number reduces the probability of getting only flash-tasks. Is there a way to know how long a task will take before running it?
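
The 320s figure can be reproduced with a short calculation. This is a minimal sketch: the function name is hypothetical, and it assumes 8 tasks run at the same time, which is the value that makes the numbers above agree.

```python
import math


def batch_wall_time(num_tasks: int, task_seconds: float, concurrent: int) -> float:
    """Wall-clock seconds to finish equal-length tasks run in full waves."""
    waves = math.ceil(num_tasks / concurrent)  # each wave runs `concurrent` tasks
    return waves * task_seconds


# 32 tasks of ~80 s each, assuming 8 run concurrently.
print(batch_wall_time(32, 80, 8))  # 320
```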

Another reason is that I've often seen there are not many available tasks. Although I do a little "bunker", I finish the work before the deadline (except that time).


P.S. The errors on the other machine (ID: 10356455) were caused by a Win8 failure after an update, so there was no chance to cancel them. ;)

[/OT]
96) Message boards : Number crunching : Host messing up tons of results (Message 27372)
Posted 11 Apr 2015 by Luigi R.
Post:
About cancelled WUs...

I have often had network issues with my repeater, and "flash"-tasks (that terminate in a few seconds) could leave my machine without work. So I edited ncpus in cc_config to get about 150 WUs and ensure a workload for an entire week, but I got too many ~8h tasks that weren't finishing on time.
97) Message boards : Number crunching : Host messing up tons of results (Message 27370)
Posted 11 Apr 2015 by Luigi R.
Post:
Hello, my machine (id: 10327477) has started to get some invalids. Is it normal?
98) Message boards : Number crunching : Host messing up tons of results (Message 27235)
Posted 28 Mar 2015 by Luigi R.
Post:
This sounds great. I thought inconclusive results would be removed soon from something like server cache. Well, I was wrong. My machine is ok.
99) Message boards : Number crunching : Host messing up tons of results (Message 27233)
Posted 28 Mar 2015 by Luigi R.
Post:
Same problem because of that host.

I've got inconclusive validation for 2 or 3 ~8h tasks, which means 1 core-day wasted. A bit frustrating.

Thank you for your support.

