1) Message boards : Sixtrack Application : XBoinc (Message 47826)
Posted 8 Mar 2023 by marmot
Post:
Received a batch of 38 XBoinc at the test server.
They are 24 hours from deadline. I'm going to let them run past the deadline and test whether they error out from going over. None have completed on their own; 5 were aborted. They do not respond to a client shutdown call, but they suspend properly.
Not sure what work they are doing for 24 to 72+ hours, or whether they were supposed to depend on a GPU coprocessor.
Here's my post there: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=557&postid=7927
2) Message boards : ATLAS application : Atlas Simulation tasks stuck (Message 44841)
Posted 29 Apr 2021 by marmot
Post:
I have a couple of these also.

Going on the 5th day and always approaching 100% but never reaching it.



Found a solution.

Suspend the WU manually in BOINC.

Open VirtualBox Manager and find the newly saved-state ATLAS VM.
Delete the saved state.

Now the WU has to start over, and the corrupt execution state is gone.
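For anyone who prefers the command line, a rough equivalent of the same fix (a sketch only; "boinc_xxxxxxxx" stands in for whatever name your saved-state ATLAS VM actually shows, and VBoxManage must be on the PATH):

    REM suspend the stuck WU in BOINC first
    VBoxManage list vms
    REM note the name/UUID of the saved-state ATLAS VM, then throw its saved state away
    VBoxManage discardstate "boinc_xxxxxxxx"
    REM resume the WU in BOINC; it restarts from scratch without the corrupt saved state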

Although the credit was pitiful and, since the WU's started from scratch, it would probably have been just as well to abort them.
BUT, nothing new is learned unless you experiment.

Task 314464634 (WU 162960124): sent 22 Apr 2021 11:19:02 UTC, reported 28 Apr 2021 19:06:30 UTC, Completed and validated, run time 497,012.25 s, CPU time 553,831.60 s, credit 265.24, ATLAS Simulation v2.00 (vbox64_mt_mcore_atlas), windows_x86_64

Task 313918637 (WU 162691404): sent 21 Apr 2021 9:24:05 UTC, reported 29 Apr 2021 2:26:32 UTC, Completed and validated, run time 536,236.41 s, CPU time 500,751.30 s, credit 408.92, ATLAS Simulation v2.00 (vbox64_mt_mcore_atlas), windows_x86_64
3) Message boards : ATLAS application : Atlas Simulation tasks stuck (Message 44840)
Posted 29 Apr 2021 by marmot
Post:
I have more than enough memory and 16 cores.
I have found that if BOINC runs more than one instance of ATLAS, the tasks stall at around 90% and drag on forever, with progress increasing by only 0.001% every few seconds.
You need to limit the number of simultaneous LHC tasks. I even tried running Theory and ATLAS together, and ATLAS stalled out every time.

Now, if you run ATLAS alone and set your CPU count to 4 in the preferences, it will run only one ATLAS task at a time. It should take under 8 hours to complete a task.
If you run any other LHC sub-projects, then you will have to force BOINC to run only 1 LHC task at a time; see the app_config.xml sketch below.
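One way to do that without a script, assuming a reasonably recent BOINC client, is an app_config.xml in the LHC project folder (a minimal sketch):

    <app_config>
       <project_max_concurrent>1</project_max_concurrent>
    </app_config>

After saving it, use Options > Read config files in the BOINC Manager's advanced view so the client picks it up without a restart.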


I have traced this behavior to the ATLAS job not saving its state properly. (There may be other causes, but this is certainly one.)

In BOINC's advanced view, under Options > Computing preferences, on the Computing tab, set "switch between tasks every" to 9999 minutes so that ATLAS is never suspended when BOINC decides to swap tasks to accommodate the resource shares of multiple projects. (Or isolate your other projects from ATLAS.)
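For reference, the same knob can also be pinned in global_prefs_override.xml in the BOINC data directory (a sketch; the GUI path above does exactly the same thing, so use whichever you prefer):

    <global_preferences>
       <cpu_scheduling_period_minutes>9999</cpu_scheduling_period_minutes>
    </global_preferences>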

If you need to shut down BOINC, suspend your ATLAS WU's one at a time so their VMs get saved properly by VirtualBox.
4) Message boards : ATLAS application : Atlas Simulation tasks stuck (Message 44829)
Posted 27 Apr 2021 by marmot
Post:
I have a couple of these also.

Going on the 5th day and always approaching 100% but never reaching it.
One of them shows flashing NumLock/ScrollLock LEDs when I connect to the VM through VirtualBox Manager.
The other just shows the login screen and is still using 1 core in the client.

I remember the tasks eventually ending and giving credit for the days of time consumed.
Has something changed?

I seriously do not want to abort 10 days of core usage for no credit if these will eventually finish and receive credit.
5) Message boards : ATLAS application : Avoid ATLAS task with app_config (Message 44568)
Posted 26 Mar 2021 by marmot
Post:
It's strange, yes. Looking at it, it says it successfully ran the WU but then couldn't successfully power off the VM, so you did successfully do work for CMS; the project just didn't think so.

Sometimes I get junk VMs left over that can cause problems, so I clear them out. How is your storage? It could be too slow cleaning up after the WU is done, so it times out before BOINC detects it. The last thing I'll note is that if I run 1 WU per core it can really slow down my computer, so I use at most 80%; it still normally loads my CPU to 100%, but it's not red-lined.


I've had to go into the media manager and delete hundreds of failed attached disks when something like this happened before.

IIRC, I ended up performing a complete uninstall of Oracle VirtualBox and then a fresh install to make the problem disappear.

(But this is off topic from the original poster's question about how to stop ATLAS from even running)
6) Message boards : ATLAS application : Avoid ATLAS task with app_config (Message 44512)
Posted 18 Mar 2021 by marmot
Post:
If you are creative there should be a way to prevent ATLAS from running locally.
I stopped WU's locally on various projects about 3 years back with a few techniques, but the details are fading from my memory.

The 1st technique was discovered accidentally while attempting to fit the most Theory WU's into the limited RAM on my new server. I set the RAM size for the LHC WU's too small to succeed (there are other posts in the forums about how to use app_config.xml to adjust the RAM used by Theory or ATLAS; a sketch is also shown below).
However, this leads to a long list of failed WU's, but the server should eventually stop sending work (for the day, maybe longer; I don't remember).
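For reference, the RAM knob those forum posts describe is the vboxwrapper command line in app_config.xml; a sketch along these lines (the plan class is the one ATLAS tasks report, and the deliberately tiny memory value is what makes the tasks fail):

    <app_config>
       <app_version>
          <app_name>ATLAS</app_name>
          <plan_class>vbox64_mt_mcore_atlas</plan_class>
          <avg_ncpus>1</avg_ncpus>
          <cmdline>--memory_size_mb 256</cmdline>
       </app_version>
    </app_config>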

The more subtle method I had to invent was because LHC@Home would continue to d/l the ATLAS virtual disk even though I had refused ATLAS WU's in the preferences and had set BOINC to leave at least 3GB free.
LHC@Home ignored all my limits on its behavior and was running my laptop hard drive out of space, causing me data losses.

This procedure is for the Windows OS:
1) Shut down BOINC.
2) Enter the LHC project data folder.
3) Right-click the ATLAS_vbox_2.00_image.vdi file and select Rename.
4) Copy the name to the clipboard.
5) Delete ATLAS_vbox_2.00_image.vdi.
6) Right-click in the empty folder space and choose New.
7) Create a new text document.
8) Right-click that document and rename it to ATLAS_vbox_2.00_image.vdi by pasting the name from the clipboard (or from memory).
9) Right-click the new file and select Properties.
10) Set the file to Read-only.
11) You'll have a 0-byte, invalid ATLAS image that can't be modified by BOINC.

Restart BOINC.
Now ATLAS should never manage to start: it fails to re-download its image, and the 0-byte file marked Read-only fails the checksum and obviously can't start a VM.
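The same steps as a command-prompt sketch, assuming the default BOINC data directory (adjust the path to wherever your LHC project folder actually lives):

    REM run these with BOINC shut down
    cd /d "C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome"
    del ATLAS_vbox_2.00_image.vdi
    REM create an empty placeholder with the same name and lock it read-only
    type nul > ATLAS_vbox_2.00_image.vdi
    attrib +R ATLAS_vbox_2.00_image.vdi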
7) Message boards : Number crunching : VM image ready to crunch (Message 44193)
Posted 24 Jan 2021 by marmot
Post:
Morning,

This is exactly what I meant; I didn't want to talk about what my host is or isn't running :)

I understood there is no "official" image built with such a configuration... maybe it's time for the community to spend some effort on such an image?



I'm working on my 1st version with antiX but am looking at Puppy's newest Ubuntu-based version, as antiX broke the upgrade path from Debian 9 to 10 and will need a full rebuild. I have so many labor hours in this antiX 17.2 BOINC VM now that I'll finish it and put it into production before developing another VM.

I'll check into Puppy's upgrade path continuity.

Thanks for bringing up the topic; hopefully some other people will help with a BOINCix project.
I still think it will be more useful for managing workloads than a native Linux install in the Windows 10 WSL subsystem.
8) Message boards : Number crunching : VM image ready to crunch (Message 44192)
Posted 24 Jan 2021 by marmot
Post:
I am running QuChemPedIA@home tasks on a Windows 10 PC using VirtualBox. QuChem is a Linux project and 94% of its users run a native Linux client, most on Debian. Strangely enough, when my PC completes a task, 90% of the time it is faster than its Linux wingman, even if the Linux host has a much more powerful CPU, mostly AMD Ryzen Threadripper. My CPU is an Intel i5 9400F, with 6 processors, that is 3 cores. Some of my Linux wingmen reach 128 processors. I am an old UNIX and Linux user, but at least in this case Windows is faster.
Tullio

How did you build your Linux VM?
Did you use the Cern product?

No, I am using the wrapper QuChem provided me. They complain that they don't have a developer, and I don't know where they got it. I am 50th in their RAC ranking list.
Tullio
In stderr.txt the vboxwrapper is indicated as 7.9.26200
I've completed a number of BOINC@TACC tasks on the same Windows 10 PC. They seem to use a docker 4.1.19-boot2 docker. Frankly, I don't know what it is.


Same BOINC turnkey solution that Kryptos@Home used to get their project up and sending work quickly.
Here's their GitHub page:
https://github.com/boot2docker/boot2docker
9) Message boards : Number crunching : VM image ready to crunch (Message 44191)
Posted 24 Jan 2021 by marmot
Post:
Success?
No, since the task did not create a HITS file.
[2021-01-21 10:38:47] No HITS result produced


According to the WU database, the WU is valid and received credit.

That is the only criterion by which a user can judge the success or failure of BOINC WU's, and it has been this way for decades.

If a missing HITS file is an error state, then do not give credit, and mark the WU as one ending in an error.


In addition, the CVMFS client is installed but not configured to use openhtc.io.
This disrespects the project's requirement not to swamp the CVMFS stratum-one servers.
[2021-01-21 09:33:23] 2.6.0.0 14739 0 66852 77616 3 1 74634 4096001 0 65024 0 0 n/a 525 196 http://cvmfs-s1fnal.opensciencegrid.org/cvmfs/atlas.cern.ch DIRECT 1
[2021-01-21 09:33:23] CVMFS is ok
[2021-01-21 09:33:23] Efficiency of ATLAS tasks can be improved by the following measure(s):
[2021-01-21 09:33:23] The CVMFS client on this computer should be configured to use Cloudflare's openhtc.io.
[2021-01-21 09:33:23] Small home clusters do not require a local http proxy but it is suggested if
[2021-01-21 09:33:23] more than 10 cores throughout the same LAN segment are regularly running ATLAS like tasks.
[2021-01-21 09:33:23] Further information can be found at the LHC@home message board.


Again, the WU succeeded and is reported valid.
If the HITS file is not being created properly, then that is an issue that needs correcting.
Cloudflare is a third party and has never been necessary for performing BOINC work.

This VM is a work in progress. Correcting the keys error file, getting updates, installing Singularity, and then figuring out why the HITS file is not being created (which might be corrected once Singularity is installed) take precedence over the Cloudflare connection, which I do not even like to use: it is a for-profit company that has not respected data privacy. It appears they have improved their privacy policies, so I will need to revise my opinion here; I need a few days to read more about their changes, and it takes time to change your feelings towards a company you held personal animosity towards.
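For anyone who does want the openhtc.io setup inside their own VM, the gist is just two small CVMFS config files (a sketch only; the mirror hostnames below are illustrative, so take the actual stratum-one list from the LHC@home CVMFS guide):

    # /etc/cvmfs/default.local -- a single home host can talk directly, no local proxy
    CVMFS_HTTP_PROXY="DIRECT"

    # /etc/cvmfs/domain.d/cern.ch.local -- point at the Cloudflare-fronted mirrors
    CVMFS_SERVER_URL="http://s1cern-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1fnal-cvmfs.openhtc.io/cvmfs/@fqrn@"

Then run cvmfs_config reload and re-check with cvmfs_config probe.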
10) Message boards : Number crunching : VM image ready to crunch (Message 44173)
Posted 21 Jan 2021 by marmot
Post:
Tested my first ATLAS task in my antiX VM after increasing its RAM to 2560MB and using BleachBit to purge extraneous files to free enough disk space on the meager 16GB VDI (I need to grow it to 24-32GB).
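Growing the VDI itself is the easy half (a sketch; the file name is just a placeholder, the size is in MB, and the partition/filesystem inside the guest still has to be expanded afterwards with something like GParted):

    VBoxManage modifymedium disk "antiX_BOINC.vdi" --resize 32768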

I forgot to install Singularity, but CVMFS is installed and the WU completed by grabbing what it needed of Singularity over the shared filesystem.
The 3-core WU barely used any CPU, and the OS showed less than 1300MB of RAM usage the entire time.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=296573069


BTW, Microsoft has built a Linux subsystem (WSL) for Windows 10. Some Linux distros are then available from the Microsoft Store (gotta grow that store).
https://subhankarsarkar.com/run-native-linux-on-windows/

I wonder how many users have successfully run native Linux BOINC WU's with that system installed.
BOINC would probably need two installs, and I don't think BOINC WU reports would be able to discern that the Linux app was run inside a WSL subsystem. Maybe WUProps could detect Linux apps run on WSL?
11) Message boards : Number crunching : VM image ready to crunch (Message 44172)
Posted 21 Jan 2021 by marmot
Post:
I am running QuChemPedIA@home tasks on a Windows 10 PC using VirtualBox. QuChem is a Linux project and 94% of its users run a native Linux client, most on Debian. Strangely enough, when my PC completes a task, 90% of the time it is faster than its Linux wingman, even if the Linux host has a much more powerful CPU, mostly AMD Ryzen Threadripper. My CPU is an Intel i5 9400F, with 6 processors, that is 3 cores. Some of my Linux wingmen reach 128 processors. I am an old UNIX and Linux user, but at least in this case Windows is faster.
Tullio

How did you build your Linux VM?
Did you use the Cern product?
12) Message boards : Number crunching : VM image ready to crunch (Message 44167)
Posted 21 Jan 2021 by marmot
Post:
Hi team,

I was just wondering: isn't there an image with all the stuff already done, ready to crunch? I mean, I download it, use VirtualBox and run it, with the native apps inside.

At the end of the day we are running a VM anyway, so that seems "easier", doesn't it? At least we would not be loading and unloading one image per task.

On the other hand, it is a million times easier for a less experienced user to just download a Linux VM, load it and crunch than to follow the instructions (which are quite long...).

Thanks!
Javi


This discussion has all gone off track.

It's the same question I and others have had:

'Is there a downloadable, prebuilt, Linux VDI that is ready to run LHC@Home native apps as soon as you create your own VM using that VDI drive and then change the machine name?'

I built an antiX (Debian 9 based) VM that runs on as little as 128MB of RAM and has been running various Linux-only WU's for BOINC projects on my Windows test laptop.
This VM was going to be set up for LHC WU's as well, but it now has key-download issues and no upgrade path to Debian 10, so I'm considering developing a new version of my LinuxForBOINC VM.

This seems like a GitHub project waiting to be developed.

Maybe it could be called "BoincedLinuxVM"?

I'm sure many Windows users would love a downloadable Linux VM for running Linux-only BOINC apps, without the headaches of managing Linux or the 10 to 40 hours of creation time, when the wheel has already been invented by a GitHub team.
It would also save on RAM, as native ATLAS only needs the VM to have 2560MB for 4 cores.
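For anyone wondering what "create your own VM using that VDI drive" would look like in practice, a rough VBoxManage sketch (the VM name, VDI file name, memory and core counts are just placeholders):

    VBoxManage createvm --name "BoincedLinuxVM" --ostype Debian_64 --register
    VBoxManage modifyvm "BoincedLinuxVM" --memory 2560 --cpus 4
    VBoxManage storagectl "BoincedLinuxVM" --name "SATA" --add sata --controller IntelAhci
    VBoxManage storageattach "BoincedLinuxVM" --storagectl "SATA" --port 0 --device 0 --type hdd --medium "LinuxForBOINC.vdi"
    VBoxManage startvm "BoincedLinuxVM" --type headless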
13) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 44135)
Posted 18 Jan 2021 by marmot
Post:



Hence, it is not a nice behaviour to launch a new project now that forces already outdated software components.


I can agree with that.

I can also see how this happened: when you search for Boinc2Docker (which uses Boot2Docker), you find a startup solution that saves a new project from reinventing the wheel, but it's not obvious that the project it is based on is now deprecated.

Within the first 2 days of releasing these WU's and seeing the issues, Kryptos@Home recognized their mistake, apologized, and are planning to create native apps.

I do not see intent to harm the user base here.
The project is also short-term, existing to solve a single cryptographic message, and the solution might be found tomorrow or 6 months from now. I'm sure they do not want to sink thousands of hours of development time into something that might be finished any day.


BTW, if you know of a more up-to-date, turnkey solution for new BOINC projects, let them know.
If you are developing a turnkey BOINC project product, here's your chance for a sale.
14) Message boards : Number crunching : Not getting any tasks, though many are available (Message 44128)
Posted 17 Jan 2021 by marmot
Post:
I do have "Run native if available" checked.

Sure.
Disable it.
And please, don't ask: "Why?"



This solved my problem and saved me diagnostics time.

So my 2700X now has its 1st ATLAS WU since sometime in 2018.

Back to doing ATLAS until the summer months bring back the heat and it's time to shut down the heat generators
(home heating with science computing... everyone should dump their furnaces and put a heating computer in each room!).
15) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 44127)
Posted 17 Jan 2021 by marmot
Post:

I suggest to contact the admins of the "new" project and ask them to update their project.
Older versions may run with LHC but there is no guarantee that they are supported after OS/BOINC updates.

Cosmology@Home is running with Virtualbox 5.2.44 only.
For me LHC@Home in Windows run also with 5.2.44, but also with 6.1.12 from Boinc-Mainpage. (Boinc 7.16.11).


IIRC, Cosmology@Home is also using Boot2Docker inside their VMs.
Reviewing the GitHub page gives some insight into why there might be issues under VBox 6.x (such as the project having been developed against VBox 5.x and deprecated before being properly tested under VBox 6).
https://github.com/boot2docker/boot2docker
16) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 44125)
Posted 16 Jan 2021 by marmot
Post:
Can you tell me what version of VB is required for LHC?

I returned work two days ago on VBox 5.2.42 (Ubuntu 18.04.5).
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10671473


Thank you, Jim.
Question answered.
17) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 44123)
Posted 16 Jan 2021 by marmot
Post:
Can you tell me what version of VB is required for LHC? Because I've just joined another project that doesn't like later versions. Will it break LHC if I downgrade?

According to the VirtualBox download page all versions prior to 6.1 are out of support since July 2020:
https://www.virtualbox.org/wiki/Downloads

I suggest to contact the admins of the "new" project and ask them to update their project.
Older versions may run with LHC but there is no guarantee that they are supported after OS/BOINC updates.


@computezrmle, you avoided answering the question and just threw shade at the other project.
Their project is in development and, for this beta period, is unable to use VBox 6.1.x and later (this applies under Windows only).

Do LHC WU's work properly under VBox ver 5.2.44?

Because I am going back to 5.2.44 and will run ATLAS on that version.
LHC doesn't appear to be calling on any VBox features that haven't been supported since version 4.x.
18) Message boards : Number crunching : Android x86 app (Message 37909)
Posted 2 Feb 2019 by marmot
Post:
Hi. I have an ASUS ZenFone Zoom smartphone (Intel Z3580 CPU quad-core 2.3GHz) but at the moment I can only help SETI research. Can you tell me when you will add an app for x86 Android devices?


We can run on android devices, see sixtracktest here: https://lhcathome.cern.ch/lhcathome/apps.php

Currently there are two issues:

1: The ram usage of sixtrack is relatively high - almost all internal arrays are allocated with a fixed size at compile time, and set to a "worst case" capacity. A lot of android devices just do not have sufficient spare ram. Work is in progress to re-structure the internals to reduce the memory usage and only allocate what is needed.

2: The security model in the current android release has restricted the usage of some syscalls, and we seem to use some of the restricted ones. This needs to be changed.

Once both of these issues have been resolved, we can push the android version out to production.



Can we get an update on the ARM Sixtrack application?

My tablet has been waiting since I got it...
19) Message boards : Sixtrack Application : multi threading six track possible in any way? (Message 37908)
Posted 2 Feb 2019 by marmot
Post:
Hi,

SixTrack is not a multi threaded application, and due to the nature of the simulations we run on SixTrack, there is no real need for parallelising it.


Thanks for the response and your work on the project.
20) Message boards : Sixtrack Application : multi threading six track possible in any way? (Message 37884)
Posted 1 Feb 2019 by marmot
Post:
Hello,
Why not use the LHC@home preferences?
You can select the number of CPUs!
If you want to run 4 CPUs on one host and 8 CPUs on another host,
add a separate computer location with the preferences you want (CPU/GPU/application/resources).


That's running 8 single-core WU's at once, not running one 8-thread WU, and not what I'm looking for.

