21) Message boards : Number crunching : All tasks crashing with NS_ERROR_FAILURE error (Message 43960)
Posted 20 Dec 2020 by Greger
Post:
This could be a macOS issue, but I'm not sure. Google and the VirtualBox forum turn up a mix of solutions for this error.
On Linux the fix could be to update the kernel headers, reinstall VirtualBox, or add dkms. Some people solved it by moving it to another location or by adding the extension.

Related to thread https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5567

My post https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5567&postid=43959
22) Message boards : Theory Application : Theory (VB64) crashing on MacOS (Message 43959)
Posted 20 Dec 2020 by Greger
Post:
I'm not sure, but if you want to dig into the VirtualBox forums for macOS, this might be a way forward:
the error code points to https://forums.virtualbox.org/viewtopic.php?f=8&t=87306

There is also a solution on Stack Overflow to try:
https://stackoverflow.com/questions/52689672/virtualbox-ns-error-failure-0x80004005-macos

This error seems to appear with VirtualBox installs on versions of macOS 10.13.

To fix this issue, you have to uninstall VirtualBox (use the VirtualBox_uninstall.tool of the VirtualBox downloaded dmg).

Then install it again by running VirtualBox.pkg. At the end of the install, go to System Preferences, Security & Privacy, and click the Allow button.

Could it be that Oracle was blocked in Security & Privacy? Try allowing it.
23) Questions and Answers : Windows : All vBox WU in error (Message 43955)
Posted 20 Dec 2020 by Greger
Post:
Well, maybe you have a Xeon CPU with 64 cores, or maybe you only do word processing on your hosts, but I don't.

This is not about my hardware or what it is processing. That has no effect on solving the issue here.

But anyway, your arguments don't have any grip on reality. Why? Because for the last 6 hours there have not been many paused and resumed processes, and the VM from CERN could use 100% of the CPU time, yet all the tasks still ended in error, and more than a few of them did not have any pauses.


2020-12-20 09:40:29 (48180): VM state change detected. (old = 'Running', new = 'Paused')
2020-12-20 09:40:39 (48180): VM state change detected. (old = 'Paused', new = 'Running')
2020-12-20 09:51:31 (48180): VM state change detected. (old = 'Running', new = 'Paused')
2020-12-20 09:51:41 (48180): VM state change detected. (old = 'Paused', new = 'Running')
2020-12-20 09:53:14 (48180): Guest Log: [INFO] Mounting the shared directory


2020-12-20 09:54:02 (48180): VM Heartbeat file specified, but missing.
2020-12-20 09:54:02 (48180): VM Heartbeat file specified, but missing file system status. (errno = '2') 


And the VM is still throttled on the last task.

2020-12-20 09:33:39 (48180): Preference change detected
2020-12-20 09:33:39 (48180): Setting CPU throttle for VM. (65%)
2020-12-20 09:33:39 (48180): Setting checkpoint interval to 3600 seconds. (Higher value of (Preference: 3600 seconds) or (Vbox_job.xml: 600 seconds))


So again, instead of pointing out the habits or the systems of the people trying to help the community, please use some common sense and see that the same failure occurs too often for it to be the CPU cap.


You have set the CPU to be throttled, and virtual machines have a hard time handling that. You could set it back to 100%.
It is up to you. I can't help you if you're not open to changing back to the default settings.
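
To check how often a task was paused and which throttle was applied, the log lines quoted in this post can be counted with a short script. This is only a minimal sketch: the two patterns are copied from the lines shown above, and the function name `summarize_log` is made up for the example.

```python
import re

# Patterns copied from the log lines quoted in this post.
STATE_RE = re.compile(r"VM state change detected\. \(old = '(\w+)', new = '(\w+)'\)")
THROTTLE_RE = re.compile(r"Setting CPU throttle for VM\. \((\d+)%\)")


def summarize_log(text):
    """Count Running -> Paused transitions and report the last CPU throttle seen."""
    pauses = 0
    throttle = None
    for line in text.splitlines():
        state = STATE_RE.search(line)
        if state and state.groups() == ("Running", "Paused"):
            pauses += 1
        cap = THROTTLE_RE.search(line)
        if cap:
            throttle = int(cap.group(1))
    return {"pauses": pauses, "throttle_percent": throttle}


# Sample built from the lines quoted in this post.
SAMPLE = """\
2020-12-20 09:33:39 (48180): Setting CPU throttle for VM. (65%)
2020-12-20 09:40:29 (48180): VM state change detected. (old = 'Running', new = 'Paused')
2020-12-20 09:40:39 (48180): VM state change detected. (old = 'Paused', new = 'Running')
2020-12-20 09:51:31 (48180): VM state change detected. (old = 'Running', new = 'Paused')
2020-12-20 09:51:41 (48180): VM state change detected. (old = 'Paused', new = 'Running')
"""

if __name__ == "__main__":
    print(summarize_log(SAMPLE))
```

A throttle value below 100 together with a non-zero pause count would match the pattern described here, though it does not prove the throttle is the cause of the errors.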
24) Questions and Answers : Windows : All vBox WU in error (Message 43953)
Posted 20 Dec 2020 by Greger
Post:
2020-12-19 20:38:02 (28448): Setting CPU throttle for VM. (65%)


Virtual machines are fragile and leave the host little room to apply its own optimizations. Reducing CPU or RAM, or imposing other limits, makes the VM run in a state it was not intended for. A bad combination causes pause/resume cycles in these tasks.
An SSD would not help with these tasks; they would still be interrupted and leave unusable instances in VirtualBox, clogging it up. These need to be cleared out by hand, followed by a system restart.

Set BOINC to its defaults, clear the stale instances out, then restart the system.
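
Before clearing anything out by hand, it can help to see which VMs are still registered. The sketch below parses the output of `VBoxManage list vms`, which prints one `"name" {uuid}` pair per line. It assumes VBoxManage is on the PATH, and the helper names are made up for the example.

```python
import re
import subprocess

# `VBoxManage list vms` prints one VM per line as: "name" {uuid}
VM_LINE_RE = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]+)\}$')


def parse_vm_list(output):
    """Return (name, uuid) pairs parsed from `VBoxManage list vms` output."""
    vms = []
    for line in output.splitlines():
        match = VM_LINE_RE.match(line.strip())
        if match:
            vms.append((match.group("name"), match.group("uuid")))
    return vms


def list_registered_vms():
    """Run VBoxManage (assumed to be on PATH) and return the registered VMs."""
    result = subprocess.run(
        ["VBoxManage", "list", "vms"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vm_list(result.stdout)
```

Calling `list_registered_vms()` only lists the VMs. A leftover instance can then be removed by hand with `VBoxManage unregistervm <uuid> --delete`, after making sure that no task is still using it.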
25) Message boards : CMS Application : Had ~100 failures on CMS 50 (Message 43913)
Posted 15 Dec 2020 by Greger
Post:
5.8 GB of memory on both computers, each an 8-core system, is the bare minimum to handle this plus a few SixTrack tasks.
You had a CMS task that ran and validated, but the last one was aborted. It probably left other tasks waiting for RAM.

Please uncheck the boxes for native tasks and test applications. Your client had many failed tasks because CVMFS is not installed. If you want to run VirtualBox, you only get those tasks when native is unchecked, but I would suggest running SixTrack and maybe Theory until you have added more memory.
26) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 43884)
Posted 12 Dec 2020 by Greger
Post:
Would this update to the WMAgent code also address the IPv6 setup issue that we have?
27) Questions and Answers : Unix/Linux : The future of the CentOS Project is CentOS Stream (Message 43865)
Posted 11 Dec 2020 by Greger
Post:
CentOS Project shifts focus to CentOS Stream
Tuesday, 8 December 2020, Rich Bowen
The future of the CentOS Project is CentOS Stream, and over the next year we’ll be shifting focus from CentOS Linux, the rebuild of Red Hat Enterprise Linux (RHEL), to CentOS Stream, which tracks just ahead of a current RHEL release. CentOS Linux 8, as a rebuild of RHEL 8, will end at the end of 2021. CentOS Stream continues after that date, serving as the upstream (development) branch of Red Hat Enterprise Linux.

Meanwhile, we understand many of you are deeply invested in CentOS Linux 7, and we’ll continue to produce that version through the remainder of the RHEL 7 life cycle.

CentOS Stream will also be the centerpiece of a major shift in collaboration among the CentOS Special Interest Groups (SIGs). This ensures SIGs are developing and testing against what becomes the next version of RHEL. This also provides SIGs a clear single goal, rather than having to build and test for two releases. It gives the CentOS contributor community a great deal of influence in the future of RHEL. And it removes confusion around what “CentOS” means in the Linux distribution ecosystem.

When CentOS Linux 8 (the rebuild of RHEL8) ends, your best option will be to migrate to CentOS Stream 8, which is a small delta from CentOS Linux 8, and has regular updates like traditional CentOS Linux releases. If you are using CentOS Linux 8 in a production environment, and are concerned that CentOS Stream will not meet your needs, we encourage you to contact Red Hat about options.

We have an FAQ to help with your information and planning needs, as you figure out how this shift of project focus might affect you.


source: https://blog.centos.org/2020/12/future-is-centos-stream
28) Message boards : Number crunching : Not getting any tasks, though many are available (Message 43860)
Posted 11 Dec 2020 by Greger
Post:
The BIOS could be buggy. I would reboot, turn off SVM, save and restart, then go into the BIOS again and turn it back on.

It has been a long time since I used Windows, but there you can open Task Manager and go to the CPU section, which should show "Virtualization: Enabled". Sometimes boinc-client loses its connection to VirtualBox, and restarting the client helps.
If VirtualBox is full of failed VMs, they need to be cleaned out, followed by a system restart.
29) Message boards : Number crunching : Running Benchmark (Message 43858)
Posted 11 Dec 2020 by Greger
Post:
boinc.tacc is another project. What the log says is that you changed the location to home on their site and that the host is set to use it instead of the default location.
This happens when the host is told to use web preferences; it names that project because the last change was made there.
How TACC uses the settings could differ from other projects, but a change from default to home requires an action by the user.

If you want to avoid TACC, go to the LHC preferences at https://lhcathome.cern.ch/lhcathome/prefs.php?subset=global, make a small change, update LHC on the host, and choose to use the web preferences from LHC.
Or
ignore the log and make a small adjustment on the host; the host will then follow the override XML settings stored directly on the host.

The rule for the client is to follow the most recently changed preferences. That change was made on 14-Apr-2020, and no changes have been made since then on any other project, or on the account manager if the host uses one.
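
The "last changed wins" rule can be sketched in a few lines. This is only an illustration of the rule as described in this post, not the client's actual code: the function name is made up, the TACC date is the one from the post, and the LHC dates are hypothetical.

```python
from datetime import date


def effective_source(pref_changes, has_local_override=False):
    """Pick the preference source the client would follow.

    pref_changes maps a source name to the date its preferences were last
    changed. A local override file on the host, if present, takes priority.
    """
    if has_local_override:
        return "local override"
    # Otherwise the most recently changed preferences win.
    return max(pref_changes, key=pref_changes.get)


# The TACC date comes from the post; the LHC dates are hypothetical.
before = {"boinc.tacc": date(2020, 4, 14), "LHC@home": date(2019, 1, 1)}
after = {"boinc.tacc": date(2020, 4, 14), "LHC@home": date(2020, 12, 11)}

if __name__ == "__main__":
    print(effective_source(before))
    print(effective_source(after))
    print(effective_source(before, has_local_override=True))
```

With the `before` dates the TACC preferences are followed. A small change on the LHC site makes LHC the newest source, and a local override on the host takes priority over both.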
30) Message boards : ATLAS application : LHC sends 256 thread ATLAS native, not producing load (Message 43728)
Posted 27 Nov 2020 by Greger
Post:
For sure, yes.

To reduce overhead it would be great to run it on the default 12 cores per task, but to increase efficiency you can set it to a few cores per task. I use 4 cores per task, as that has been a good balance for most systems I use.

Looking at your systems and the OS you put on them, they do well with native. Ubuntu 18.04 has been friendlier to set up and is stable for now, but you also have a host with 20.04 that is doing well and looks solid.
For 20.04 LTS I built Singularity myself, as the container did not work with ATLAS; that could be an option for you.

The main thing I see, and what you should do with this amount of computing power, is cut down latency by adding a host with Squid to handle the amount of data. There is a great guide and a good config provided by
computezrmle at https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5473. I suggest you start using it and then work on the remaining host issues after that.

There are other good measures that work, such as cutting down geo IP and increasing RAM and fast storage on the host. I suggest you listen to the suggestions from computezrmle and other users, and read the guides and the FAQ for ports.
31) Message boards : ATLAS application : Web preferences when keeping multiple tasks in queue (Message 43624)
Posted 17 Nov 2020 by Greger
Post:
I am running native ATLAS, and have no problem running four ATLAS at once (two cores each).
I have often run four cores each also with no problem with multiple work units.

Maybe it is different on VirtualBox.


That's native. I'm running the vanilla VB version.


You can, if you change the memory to modules of the same type and brand.
32) Message boards : ATLAS application : unknown application 'ATLAS'. Known applications: None (Message 43608)
Posted 14 Nov 2020 by Greger
Post:
You can ignore that message or, as you say, just skip app_config and make the changes on the site instead. The simple solution is to not use app_config and run the defaults.

This is how boinc-client works, and it has nothing to do with how any particular project handles its applications. The application is unknown at start.
A project reset or repair would only wipe the master files and make the client download them again.
33) Message boards : Theory Application : Odd runtime and CPU time registered for a task (Message 43601)
Posted 13 Nov 2020 by Greger
Post:
Does anybody have an idea what could have happened?
Not really.
Probably you have in your project preferences "No Limit" for # of cpus and are using an app_config.xml.

Your machine has 8 threads and the number of CPU-seconds reported is about 8 times the real number.

The project preferences for this venue is set to 3 for # of CPUs. Below is the Theory section from app_config.xml:
	<app>
		<name>Theory</name>
		<max_concurrent>4</max_concurrent>
	</app>
	<app_version>
		<app_name>Theory</app_name>
		<plan_class>vbox64_mt_mcore</plan_class>
		<avg_ncpus>1.000000</avg_ncpus>
		<cmdline>--nthreads 1 --memory_size_mb 750</cmdline>
	</app_version> 

All other Theory tasks for this host have normal runtime and CPU time, this was the only one like this I happened to notice. The host does about 30-35 Theory tasks a day + 2 Atlas tasks a day. App_config sets Atlas also to use only 1 CPU core.


The CPU time is strange, but notice that the log reports the default 630 MB of memory. Your app is set to 750 MB, so you might need to re-check the config. If it does not pick up the memory setting, it may not be using the nthreads 1 setting either.

stderr
Setting Memory Size for VM. (630MB)
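
The comparison made above, 750 MB in the config against 630 MB in stderr, can be automated. This is a minimal sketch: it wraps the quoted Theory section in the `<app_config>` root element that a complete app_config.xml has, and the helper names are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# The Theory section quoted above, wrapped in the <app_config> root element
# that a complete app_config.xml file has.
APP_CONFIG = """\
<app_config>
  <app>
    <name>Theory</name>
    <max_concurrent>4</max_concurrent>
  </app>
  <app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64_mt_mcore</plan_class>
    <avg_ncpus>1.000000</avg_ncpus>
    <cmdline>--nthreads 1 --memory_size_mb 750</cmdline>
  </app_version>
</app_config>
"""

# The stderr line quoted above.
STDERR_LINE = "Setting Memory Size for VM. (630MB)"


def configured_memory_mb(xml_text, app_name):
    """Return the --memory_size_mb value from the app's cmdline, or None."""
    root = ET.fromstring(xml_text)
    for version in root.iter("app_version"):
        if version.findtext("app_name") == app_name:
            cmdline = version.findtext("cmdline") or ""
            match = re.search(r"--memory_size_mb\s+(\d+)", cmdline)
            if match:
                return int(match.group(1))
    return None


def logged_memory_mb(stderr_text):
    """Return the memory size the wrapper reported in stderr, or None."""
    match = re.search(r"Setting Memory Size for VM\. \((\d+)MB\)", stderr_text)
    return int(match.group(1)) if match else None


def config_applied(xml_text, app_name, stderr_text):
    """True when the logged VM memory matches the configured value."""
    return configured_memory_mb(xml_text, app_name) == logged_memory_mb(stderr_text)
```

For the values in this thread, `config_applied(APP_CONFIG, "Theory", STDERR_LINE)` is false, which suggests that this task did not pick up the app_config settings.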
34) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 43544)
Posted 1 Nov 2020 by Greger
Post:
No sub task.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=288545179
35) Message boards : ATLAS application : Slow download to LHC (Message 43527)
Posted 23 Oct 2020 by Greger
Post:
ok thanks

Edit: Now back to normal, at least for one host. It could have been a short-lived limit.
36) Message boards : ATLAS application : Slow download to LHC (Message 43523)
Posted 23 Oct 2020 by Greger
Post:
Downloads from lhcathome-upload.cern.ch (137.138.xx.xx, CERN-GPN) have been reduced to around 320 KB/s (2.7 Mbit/s on the network monitor) for all hosts. The hosts are fetching ATLAS at this stage. It looks like each host is fixed to this speed for each download.

And 188.185.xx.xxx is even lower, at 1.50 Mbit/s.

In total, with several hosts downloading at once, it maxes out at 25 Mbit/s.
During the downloads I ran a speed test against a local server and got full speed (around 300 Mbit/s), and Squid delivers full network speed when possible.

Upload is unaffected.
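
For comparing the two units used above, here is a small conversion sketch. The function name is made up, and whether the client counts a KB as 1000 or 1024 bytes is an assumption, so both are shown.

```python
def kbytes_to_mbit(kb_per_s, binary=False):
    """Convert kilobytes per second to megabits per second.

    binary=False treats 1 KB as 1000 bytes; binary=True treats it as 1024.
    """
    bytes_per_s = kb_per_s * (1024 if binary else 1000)
    return bytes_per_s * 8 / 1_000_000


print(round(kbytes_to_mbit(320), 2))               # 2.56
print(round(kbytes_to_mbit(320, binary=True), 2))  # 2.62
```

Both results are slightly below the 2.7 Mbit/s shown on the network monitor. That could be explained by the monitor also counting protocol overhead, but that is a guess.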



37) Message boards : LHC@home Science : Hat Trick Observation for Bosons (Message 43468)
Posted 5 Oct 2020 by Greger
Post:
Thanks Jim
38) Message boards : ATLAS application : computational error on windows 10 BOINC 17.16.11 and virtual box 6.1.14 (Message 43437)
Posted 29 Sep 2020 by Greger
Post:
No network access would mean that boinc-client is not set to always allow network activity. You can change this in BOINC Manager.

These tasks require internet access, as they download and upload job data while they are running. If a task starts without internet access, it fails after a few minutes.
39) Message boards : Sixtrack Application : EXIT_DISK_LIMIT_EXCEEDED (Message 43405)
Posted 24 Sep 2020 by Greger
Post:
The current batch has been great, with no disk-limit errors.

Thanks team
40) Message boards : ATLAS application : error on Atlas native: 195 (0x000000C3) EXIT_CHILD_FAILED (Message 43338)
Posted 11 Sep 2020 by Greger
Post:
Same issue as before, no changes yet.

As mentioned, it would work if you install any version of Singularity. This probably makes 20.04 an odd system to deal with regarding permissions for the pre-built Singularity container.

I tested it yesterday with 3.6.2, the latest release from GitHub.

