1) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40833)
Posted 8 days ago by Gunde
Post:
Thanks, that could be what I need. I will check it.
2) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40823)
Posted 8 days ago by Gunde
Post:
Theory Native has migrated to application version 300.02 (native_theory), so it is the same application, just under another version on the project side. To get new/more tasks, this version needs to be enabled in your project preferences.
3) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40819)
Posted 9 days ago by Gunde
Post:
The cernvm-fs client, yes, not the server one.

The cache is set to the default: the config shows CVMFS_QUOTA_LIMIT=4000 and CVMFS_CACHE_BASE=/var/lib/cvmfs, with 37MB used in total, setup files included.
This looks fine to me, and I checked it with
cvmfs_config showconfig
which prints all the parameter lines. None of these lines mention the filesystems that get mounted during operation.
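To look at just the cache parameters and the actual usage, the standard cvmfs_config sub-commands can be filtered; a sketch, using atlas.cern.ch as an example repository:

cvmfs_config showconfig atlas.cern.ch | grep -E 'CVMFS_QUOTA_LIMIT|CVMFS_CACHE_BASE'
cvmfs_config stat atlas.cern.ch    # per-repository statistics, including current cache use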

I looked at /tmp on the local host, as this folder had high storage use. The rootfs folders are used by ATLAS, and the issue is that these folders grow and do not get wiped after a task completes. My thought is that the filesystems are either meant to be re-used, or they fail to get removed.

A host with a 250GB disk running only BOINC has a hard limit of 100GB, but normal operation with a mix of projects stays at 20-30GB, including a few ATLAS tasks running. When the host runs for several weeks or a month without a restart, it looks like it will suffer a full disk. This results in a boinc-client crash when the system disk fills up, even though the BOINC limit is fine: the CVMFS filesystems are not counted in the BOINC data, so they grow until the system disk is full. The system does not handle these rootfs folders, as it might not get the correct information to do so.
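To show what I mean, the leftover folders and the disk pressure are visible with plain shell tools:

du -sh /tmp/rootfs-* 2>/dev/null | sort -h   # size of each leftover rootfs folder
df -h /tmp                                   # free space on the filesystem that holds /tmp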
4) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40817)
Posted 9 days ago by Gunde
Post:
I have a question about cernvm-fs. It mounts a filesystem stored in /tmp/rootfs-xxxxx, each at a size of 1.3GB.
Is that filesystem re-used for the next work unit, or are these filesystem maps never purged?

I ask because some hosts end up with a full disk. A host with a 250GB disk can fill up: with 20-30GB for BOINC, and /tmp included on the system partition, it can grow to the maximum and then crash the boinc-client. There can be double the number of folders compared to tasks running concurrently, so either they are not re-used properly or they are not purged after being unmounted.
I could not find anything about these rootfs folders in the documentation or the troubleshooting guide. Would the commands to reload, to wipe the cache, or even a restart of autofs handle these folders? The system itself purges them only at start.
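For clarity, these are the commands I refer to (the autofs restart assumes a systemd host):

sudo cvmfs_config reload
sudo cvmfs_config wipecache
sudo systemctl restart autofs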

A reboot fixes it, as the system purges them then, but should that be needed?
5) Message boards : Theory Application : Native app show odd resources in status (Message 40740)
Posted 15 days ago by Gunde
Post:
A few of my hosts show a share of CPU resources assigned to tasks just like coprocessors get. I had not seen this for a CPU application until today.
Some tasks (not all) on these hosts show it, and it is not fixed to one host. Some show a higher value than others, but a target lower than 1 core/thread. I have attached screenshots of my hosts taken from the BOINC Manager.

second host with different usage:
Looking at the CPU time, it looks far too high for the runtime the tasks got.
No app_config.xml is used for Theory, only for ATLAS (see the sketch at the end of this post), and there have been no changes to cc_config in the last weeks. If this application were an MT task, it could show a target of the specified amount with a full share of CPUs, but as far as I am aware that is not in use now. Most tasks should be normal 1 core/thread tasks and therefore not MT. The application has been on the project since 300.02 was released. My thought is that something changed in the batch system, or my hosts have gone crazy.
Any clue why this happens?
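For reference, the ATLAS app_config I mention is along these lines. This is only a minimal sketch written from the shell: the plan_class name (native_mt) and the numbers are assumptions, not a copy of my file. The file goes in projects/lhcathome.cern.ch_lhcathome under the BOINC data directory and is picked up with Options->Read config files or a client restart.

cat > app_config.xml <<'EOF'
<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>2</max_concurrent>   <!-- assumed cap, adjust to taste -->
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>   <!-- assumed plan class for the native app -->
    <avg_ncpus>4</avg_ncpus>             <!-- cores per task -->
  </app_version>
</app_config>
EOF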
6) Message boards : Theory Application : Creating local CVMFS repository is failing (Message 40736)
Posted 16 days ago by Gunde
Post:
If the probe failed, try wiping the cache.

sudo cvmfs_config wipecache
7) Message boards : Number crunching : Not receiving VM tasks? (Message 40411)
Posted 12 Nov 2019 by Gunde
Post:
You could try Cosmology with camb_boinc2docker. If that works, you would know the problem is LHC-only.
8) Message boards : Number crunching : Not receiving VM tasks? (Message 40408)
Posted 12 Nov 2019 by Gunde
Post:
Your Ryzen got work but failed on start:

Waiting for VM "boinc_5128678a8ca33e95" to power on...
VBoxManage.exe: error: Not in a hypervisor partition (HVP=0) (VERR_NEM_NOT_AVAILABLE).
VBoxManage.exe: error: AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)
VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component ConsoleWrap, interface IConsole

AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)

Take some time and check the BIOS again on both hosts, and try another version of VirtualBox if it still fails.
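Before rebooting into the BIOS, Windows can report what it sees; systeminfo is a standard Windows tool (note it prints "A hypervisor has been detected" instead when Hyper-V is active):

systeminfo | findstr /C:"Virtualization"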
9) Message boards : Number crunching : Not receiving VM tasks? (Message 40406)
Posted 12 Nov 2019 by Gunde
Post:
The host you link to says: "Virtualization: Virtualbox (6.0.14) installed, CPU does not have hardware virtualization support".

Reboot, go into the BIOS, and enable Intel Virtualization Technology (Intel VT-x) in the advanced CPU settings.
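Once it is enabled, and if the host runs Linux, you can verify it from the OS without another trip to the BIOS:

lscpu | grep -i virtualization   # should print "Virtualization: VT-x" once enabled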
10) Message boards : Theory Application : Draining the Theory Queue (Message 40393)
Posted 11 Nov 2019 by Gunde
Post:
vbox or native?
11) Message boards : ATLAS application : ATLAS vbox version 2.00 (Message 40162)
Posted 15 Oct 2019 by Gunde
Post:
I have checked a few of my hosts, and there looks to be an issue getting work for the vbox VMs. The old vbox 1.01 application let me reach top, and I could see that only one Athena.py process was running.

I could not reach top from the console on the new 2.00 CentOS 7: it is stuck at the login prompt in every session, and the system monitor shows the VMs staying at low usage. I would like to see the CPU and RAM usage and which processes are running, but it is not possible.
Turning on the native application 2.72, the tasks fire up fine and run after getting data.
12) Message boards : ATLAS application : ATLAS vbox version 2.00 (Message 40160)
Posted 15 Oct 2019 by Gunde
Post:
Got a few ATLAS Simulation v2.00 (vbox64_mt_mcore_atlas) x86_64-pc-linux-gnu tasks, but the server fell back to v1.01 for the last downloaded ones.
Was there an issue with v2.00? Same as for others, I got a few long runners, up to 4 days now. Two of them stalled with a kernel panic and two with a reset adapter.

When checking the task list, it appears that many vbox v1.01 tasks got invalid/error on upload:

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>AqhLDmNm2dvnsSi4apGgGQJmABFKDmABFKDmRR5ZDmABFKDm4Ju39n_0_r644774494_ATLAS_result</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
</message>


Was the data purged on the server, or is there an issue on the servers receiving the result files?
The issue started yesterday, and I see several more from today.
13) Questions and Answers : Preferences : Ready to start... (Message 40140)
Posted 12 Oct 2019 by Gunde
Post:
The rule is to follow keyboard+mouse: any movement of these should affect BOINC. Unplugging and plugging in is not counted as 'in use'; as long as the system detects the hardware, any movement of these does count.

I would use this, give it a try:
14) Questions and Answers : Preferences : Ready to start... (Message 40139)
Posted 12 Oct 2019 by Gunde
Post:

The setting as it is pulls down the GPU as well, since the GPU requires the CPU to be active. That is all it should be doing and is designed to do. The separate GPU preference would be unnecessary, as it has to follow the CPU rule anyway, as long as it does not have an override to suspend longer than the CPU.
However, you could give the value something greater than 0. As it stands, a value of 0 for 'in use' suspends instantly; I suggest increasing it to a minimum of 1 min. Suspending and resuming that frequently can be bad for any project and hard for the boinc-client to handle. If you run ATLAS, Theory or CMS, suspension takes a snapshot of the VM, which includes the OS, the application and the job inside it. The estimated time to shut down a VM and create a snapshot is 10-30 seconds with a good SSD. Suspending and resuming tasks that often would most likely end with them in the state 'Postponed'.

Network looks clear and fine. Daily Schedules is also clear, so that would not hold things up.

That brings us to Status: as you show, the tasks end up suspended.

Your issue is probably the setting 'Suspend when non-BOINC CPU usage is above 25%' in computing. Try increasing that to something like 30, 50 or 90%, or uncheck it.

Computing
Suspend when computer is in use [0] -> 5 min
Suspend when non-BOINC CPU usage is above [25%] -> 80%

The usage above includes all load on the system except BOINC itself, so the system processes plus additional applications in the background can come close to the limit.
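For completeness, the same two settings can be put in a local override file; a minimal sketch, assuming the stock global_prefs_override.xml mechanism in the BOINC data directory (run_if_user_active, idle_time_to_run and suspend_cpu_usage are the standard BOINC element names):

cat > global_prefs_override.xml <<'EOF'
<global_preferences>
  <run_if_user_active>0</run_if_user_active>   <!-- suspend when computer is in use -->
  <idle_time_to_run>5</idle_time_to_run>       <!-- 'in use' means activity in the last 5 min -->
  <suspend_cpu_usage>80</suspend_cpu_usage>    <!-- suspend when non-BOINC CPU usage is above 80% -->
</global_preferences>
EOF
boinccmd --read_global_prefs_override   # tell the running client to pick it up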
Let me know if that changed it.
15) Questions and Answers : Preferences : Ready to start... (Message 40136)
Posted 12 Oct 2019 by Gunde
Post:
How would you like it?

If you want no limits, set [Run always] and [Network activity always].
As I understand it, you would like to keep 'Suspend when computer is in use'. This is based on the time you have set in Options->Computing Preferences->Computing->[When to Suspend].
SixTrack tasks that never started would change to 'Ready to start', tasks that already started would be 'Waiting to run', and tasks that resume would be in the state 'Running'.
Network is limited based on your rules, not on 'in use'.
Go to Options->Computing Preferences->Network [Usage Limits] & [Others], then check whether it is limited by time of day at Options->Computing Preferences->Daily Schedules. Make sure the time is set correctly if you really need to use this. It will say 'Waiting for network access' if a time window is set for network.

This could also happen if you use an app_config that requires network access for an application. Since you point out that no such setting is set, I would suggest setting 'Network activity always'.
It looks like you have set 'Run based on preferences' and 'Network based on preferences', which follow another layer of rules.
You can get additional info in the Event Log about when and why the client changes state.

Example: if network is limited to run based on a time you have set, the client will not run. ATLAS will show 'Waiting for network access', as it requires internet access to get jobs inside the VM. The Event Log will also say 'Suspending network activity - time of day'. In that case you can be sure it is set wrong in [Daily Schedules]. If it does not, then either the computer lost internet or an app_config is holding it.
Applications like SixTrack do not need network while running, so they are never suspended by this rule.
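If you prefer the command line, the boinccmd tool that ships with the client shows the same information as the Event Log; a quick sketch:

boinccmd --get_cc_status    # current run/network modes and the reasons for a suspension
boinccmd --get_messages 0   # dump the Event Log messages from sequence number 0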

Describe how you would like BOINC set up and we can help more. If not, just try the settings.
16) Message boards : ATLAS application : ATLAS native version 2.72 (Message 40129)
Posted 11 Oct 2019 by Gunde
Post:
Try removing singularity and using the built-in one. It worked for me on 18.10.

$ sudo rm -rf \
    /usr/local/libexec/singularity \
    /usr/local/var/singularity \
    /usr/local/etc/singularity \
    /usr/local/bin/singularity \
    /usr/local/bin/run-singularity \
    /usr/local/etc/bash_completion.d/singularity


https://sylabs.io/guides/3.0/user-guide/installation.html#remove-an-old-version
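Afterwards, you can confirm nothing is left on the PATH before starting new tasks (plain shell, nothing project-specific):

which singularity || echo "local singularity removed"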
17) Message boards : Number crunching : computation errors (Message 40001)
Posted 22 Sep 2019 by Gunde
Post:
I have not managed to build it on 19.04, as some libs have been deprecated. CERN would need to add support for 19.04. This could have changed, but I am not sure.

If all tasks failed, I would check the setup:
cvmfs_config chksetup


Then also probe:
cvmfs_config probe


Does it say OK?
18) Message boards : Theory Application : New version 263.90 (Message 39783)
Posted 1 Sep 2019 by Gunde
Post:
I did a test: a Theory #Unlimited 32-core task gave a load of 2.7, with around 3.6 cores in total for the system.
VirtualBox lists 16 cores as the supported usage, so I do not know how it would handle a 32-core Theory task.

Other projects such as Cosmology are affected by this limit: they hand out 32-core MT tasks in a Docker container, but those fail at start if the user does not set a limit.
19) Message boards : Theory Application : New version 263.90 (Message 39782)
Posted 31 Aug 2019 by Gunde
Post:
For the project it would help if the admins set the core count to what the applications can actually scale up to. For the user experience, and for the project, that would give a much better balance for MT tasks. I am happy to see that Theory has moved away from single-core tasks and also made a native application possible, but the jobs still run few processes per task (2-4 when last tested). I did a test last year and got a 40-core task at that time, and the host ended up at 6% CPU usage: nowhere near that core count. That could have changed since then, but based on experience of what my hosts can run, I would not use more than 4 cores today. Today I use 3 cores each, but there are probably still times when only 1 core is needed, when a task hits a long-runner job.

Looking at ATLAS, it scales up to as many as 12 cores, really uses that count, and finishes faster when the events are done.

Theory could be reduced to a maximum of 4 cores by default, and once the jobs sent to the VM are sorted out, it could be raised to the core count the application can actually scale to. This would do away with inflated credits for users who use app_config, and the project would get more tasks running and working effectively.

Remove "unlimited" in users LHC@home preferences to cores the application really could use as max target.
20) Message boards : Theory Application : Could not get X509 credentials (Message 39169)
Posted 22 Jun 2019 by Gunde
Post:
Got this on 3 old hosts I added.
Exit status	206 (0x000000CE) EXIT_INIT_FAILURE

<message>
The filename or extension is too long.
 (0xce) - exit code 206 (0xce)</message>


But the log shows more about why:

2019-06-23 00:33:02 (5080): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2019-06-23 00:33:03 (5080): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2019-06-23 00:33:34 (5080): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2019-06-23 00:33:35 (5080): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2019-06-23 00:34:06 (5080): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2019-06-23 00:34:06 (5080): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2019-06-23 00:34:37 (5080): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2019-06-23 00:34:38 (5080): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2019-06-23 00:35:09 (5080): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2019-06-23 00:35:10 (5080): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2019-06-23 00:35:41 (5080): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2019-06-23 00:35:42 (5080): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2019-06-23 00:36:14 (5080): Guest Log: [DEBUG]
2019-06-23 00:36:14 (5080): Guest Log: curl: (60) Peer certificate cannot be authenticated with known CA certificates
2019-06-23 00:36:14 (5080): Guest Log: More details here: http://curl.haxx.se/docs/sslcerts.html
2019-06-23 00:36:14 (5080): Guest Log: curl performs SSL certificate verification by default, using a "bundle"
2019-06-23 00:36:14 (5080): Guest Log:  of Certificate Authority (CA) public keys (CA certs). If the default
2019-06-23 00:36:14 (5080): Guest Log:  bundle file isn't adequate, you can specify an alternate file
2019-06-23 00:36:14 (5080): Guest Log:  using the --cacert option.
2019-06-23 00:36:14 (5080): Guest Log: If this HTTPS server uses a certificate signed by a CA represented in
2019-06-23 00:36:14 (5080): Guest Log:  the bundle, the certificate verification probably failed due to a
2019-06-23 00:36:14 (5080): Guest Log:  problem with the certificate (it might be expired, or the name might
2019-06-23 00:36:14 (5080): Guest Log:  not match the domain name in the URL).
2019-06-23 00:36:14 (5080): Guest Log: If you'd like to turn off curl's verification of the certificate, use
2019-06-23 00:36:14 (5080): Guest Log:  the -k (or --insecure) option.
2019-06-23 00:36:14 (5080): Guest Log: [DEBUG]
2019-06-23 00:36:14 (5080): Guest Log: ERROR: Couldn't find a valid proxy.
2019-06-23 00:36:14 (5080): Guest Log:        globus_sysconfig: File has zero length: File: /tmp/x509up_u0
2019-06-23 00:36:14 (5080): Guest Log: Use -debug for further information.
2019-06-23 00:36:15 (5080): Guest Log: [ERROR] Could not get an x509 credential
2019-06-23 00:36:15 (5080): Guest Log: [ERROR] The x509 proxy creation failed.
2019-06-23 00:36:15 (5080): Guest Log: [INFO] Shutting Down.


Not sure if the cert is renewed in the setup files or depends on what the server hands out. I will let the hosts retry later, but if nothing changes and they keep using the old certs, more hosts will suffer from it.
These hosts (Win 10 x64) have a direct connection to the server, with no proxy/VPN attached.

