1) Message boards : Number crunching : Gridcoin / Stats? (Message 46436)
Posted 13 Mar 2022 by Greger
Post:
Why don't people just join a REAL team instead of Gridcoin!


[This is off topic, sorry about that.]
You are free to do what you want and what you think is best for you, and I would support that. But here is another view on it:
Why do people join a team under a country name? I don't get why they should support a country when it has no relation to BOINC or the project.
I am an individual, a true volunteer with regard to BOINC and its projects, and it will remain so. Especially at this time, with Russia's invasion of Ukraine. Would you support a country given what is happening or what its government chooses to do? I would not support any country or tie my actions to one.
Why not make a team for yourself and show that you support only science and the project?
I would support a team for an open, community-driven system where the blockchain is put to its best use, more than just credit (which is shared evenly today).

I have no sympathy for anyone who's a part of Gridcoin and it isn't working. BOINC should be "volunteer computing", not paid :/


A reward back for what I put into power, hardware, and time for science is the way to expand and continue. Today everything is kept alive by donations or by companies and organisations, and volunteers keep dropping out, largely due to high power bills and slower development. Many projects struggle to keep servers online and may not be able to upgrade or replace failing hardware.
So you could even put the rewards you earned back into supporting them.

I started my journey as a volunteer and will keep supporting projects that I like, with or without a team or Gridcoin. To them I am a user, but to myself I am a contributor directly to science, with no organisation, company, country, government, or anything else between me and the project. Any "they", any middleman, had better be someone or something that truly does not gain from me or from the science I do for the project. I choose what is best for me and what I support, and I accept that others do the same.
As an individual user I have the option to back off from groups and teams in some cases so that I am not associated with things I would not support. So in my view it is fine that people do the same with Gridcoin.
I try to avoid mentioning Gridcoin on project or other BOINC forums, but in this thread it came up because it was discussed inside the Gridcoin community, and I am sorry for that. If other people do not see us as true volunteers, I think they are wrong, but I accept their view. This is why it could harm the project or offend people who do not want to be part of it, and I will respect that. But also look at it from the other side and try to understand how they see it.

As for challenges, teams, and other groups that support or involve politics, religion, or ties to governments: that would not be good and could create unnecessary conflict that BOINC and its projects do not need. But that is only my view; I respect it for what it is, and I respect the users who support them.

I started with Folding@home and moved to BOINC before Gridcoin, and I will end my journey with the same goal in mind, never letting politics, religion, groups, or anything else become a part of it. I will keep walking as long as I can, and if I find some unused shoes along the way, I will be happy to use them to help myself on the journey.

You choose what to do, but keep your head up so you can be proud of your actions when you look back in time or in the mirror. It is all up to you, and don't let anyone take that from you.

[This will be my last post on this; let's keep it on topic.]
2) Message boards : Number crunching : Gridcoin / Stats? (Message 46424)
Posted 4 Mar 2022 by Greger
Post:
Correct, we have an issue, but it is a general issue with the server option that serves "CreditNew". Overall this is good for BOINC's purpose of distributing credit based on the effort and time put into a task.

I would like to refer to my post on GitHub (https://github.com/gridcoin-community/Gridcoin-Tasks/issues/245). I admit that the additional issues are not stated up front or clearly there, but users on the projects would be fully aware that it is not hard to exploit the credit system as it is at LHC today (if it has not been changed).

I have been with ATLAS since the start, back when vLHC was in beta. I have followed the project closely, been happy with it, and used native CVMFS along with Squid for many years to avoid VirtualBox. The issue started when Theory allowed 32-threaded tasks while VirtualBox only supported 16 threads. The way to trick the system was simply to lock the thread count and keep a high memory setting for the VM, as memory was a big multiplier in the calculation of credit. Part of this still applies to ATLAS today, and the issue appears when these tasks are locked to one thread.

Each VM application has its own way of operating, and long runners make it hard to change and to reward fairly; it is not a clear or easy issue to solve. Users who put in a large effort, not just runtime but also CPU time, should be rewarded for it. This is the part CreditNew does well for this project, but it is easy for more experienced users to trick. There are a few key downsides, and a big factor in the abuse is fpops_est, which lets users boost credit on 10-20 tasks before it is adjusted for the host.

When I monitored hosts before the project was greylisted on Gridcoin, one account was up to over 1000 burned hosts, created by a script that spun up new instances of boinc-client. I monitored the jobs done by the VMs on these hosts, and they reported having done 3-5 jobs instead of the roughly 100 jobs possible for Theory. Overall the VMs continued without issues, and there was no validation of the jobs for the work done. This is the main issue for LHC, and it could be prevented on their side, where the data is collected.

On the BOINC GitHub there is a great suggestion for the VirtualBox wrapper, and the community has also responded by contributing to a better solution for validating work. Please review the work from ILikeChocolate; the paper is linked at https://boinc.berkeley.edu/forum_thread.php?id=14538#106759

Overall, an effort could be made to solve the issue at LHC, either by working around the credit system they use today or by validating work better. This may be in great demand from Gridcoin, but it affects ALL BOINC users, with or without any reward on top.
Gridcoin only reads the RAC stats (average work done); it does not affect BOINC or the project directly, but it does affect a wide range of users who could help, with or without GRC. When colocations/datacenters take over a project, the opportunity for others is lost; it is of high value for projects to stay open to people and scientists so they can contribute as they like.

I would like to give special thanks to you, computezrmle, for helping users and committing to GitHub and the LHC backend. It's been a great time following the progress, and I hope the admins can do the same for the project as you have done.
3) Message boards : ATLAS application : credits for runtime, not for cputime ? (Message 44798)
Posted 23 Apr 2021 by Greger
Post:
That is a good point, and I mostly like that it would make things simpler and less complex for developers. It would make it easier to measure hardware and to focus on running tasks efficiently rather than keeping them running longer on slow CPUs.

The only downside is that a project like this would be a target for cherry-picking if tasks have too wide a runtime spectrum and users find a way to filter tasks/jobs. In already complex applications it might be possible to avoid that, but I don't know.
In the end, one way to solve it might be to not accept aborted tasks and give the host a penalty if it does so, or to build a workaround into the system.

Maybe a solution is to use both: SixTrack could work as it is, while the other applications could use fixed credit.
4) Message boards : ATLAS application : credits for runtime, not for cputime ? (Message 44796)
Posted 23 Apr 2021 by Greger
Post:
I have stepped out until there has been some rework on this. I would like the admins to take a look at it, and the project could be greylisted until it has been fixed.

I think there will be no re-work of this matter ever. Here is why:

First of all the amount of work done by BOINC-Users is just a small fraction of all work being done by WLCG internally (although it's said to be comparable to a TIER-2 site). I think they just keep up the infrastructure for BOINC because it's free computing power to do simulations. And BOINC-Crunchers do nothing else but simulations for CERN (besides Sixtrack).

Secondly the credit issue just affects Gridcoin participants, which again are only a small fraction of BOINC-Crunchers. Normal BOINC-Crunchers don't really care about credits.

So why should they waste time (and therefore money) to fix something that only matters to a few cryptocurrency nerds? Don't get me wrong, I'm one of those nerds, but even I think this matter is not really important. I'm in for the science, not for the Gridcoins!


Maybe not, and I do agree with that. BOINC may not be a big part of LHC's simulation work overall, but the work they put into building it for BOINC and maintaining it, opening it up for people to volunteer support big or small, inspired me. Being able to support complex workloads with my systems and to use the native apps is a good move on their part. If we got GPU support it would be even better, and BOINC could add great power to the project. It is good that work has been under way to add ATLAS long and now a GPU application.

It is sad that contribution is valued by credit and that credit is not shared fairly according to the real jobs done. It is just that the system is open for users to trick and gain 10x more. Credit is somewhat useful for non-Gridcoin users too, for comparing contributions, and that would stay. Some are badge hunters, and others focus on a low user ID and on beta testing. In the end you would do good and have the honor of being a pure volunteer.

I like the Gridcoin community as it is, and the good feeling of contributing to a project while also getting something back. I would still encourage users to contribute to the project, and I would probably do so too if a GPU application is added, though more to test it and run it at small scale, not 24/7 on all systems.

I hope the BOINC server devs or the project admins can improve the BOINC credit options and focus on this issue. There have been several unsolved issues with the native application, but more with VirtualBox. The long-runner jobs have improved for Theory but still remain, and each new user will run into them and have to troubleshoot.
5) Message boards : ATLAS application : credits for runtime, not for cputime ? (Message 44795)
Posted 23 Apr 2021 by Greger
Post:
Unfortunately there is no mechanism to check whether a task operates as it should. The only thing I could come up with would be trickle messages from the wrapper, possibly letting the client cancel the task if any job issues occur. It would probably not be easy to code a perfect system, but it could surely be better.

I would like to announce that this project has been greylisted on Gridcoin until this is solved.

It is a sad but necessary action for now. I have spent many CPU hours on the project and had a good time contributing to it. I hope we can get it solved and get back soon.

https://github.com/gridcoin-community/Gridcoin-Tasks/issues/218
6) Message boards : ATLAS application : credits for runtime, not for cputime ? (Message 44653)
Posted 4 Apr 2021 by Greger
Post:
I need to bring up this old thread, as I have noticed that this issue remains unsolved and hosts abuse it more each day. I have sent PMs to the admins on the site asking them to look into the abnormal behavior of these hosts and the accounts attached to them.

As you say, the credit adjusts over time per host, but we see hosts being generated and burned out after fetching some work and sending the results back.
This may be part of the abuse of "CreditNew"; the project may have additional measures on top of it, but it looks like it can be abused as it is today.

So the issue remains that tasks can still be tricked today, and a system can make 10 times what it should get, or even more, by generating several BOINC instances and getting high runtime credit for both Theory and ATLAS. And it does not stop there: there are also additional ways of fiddling with runtime and CPU time while the host is offline, before the results are sent to the server.
So far I have seen 2 accounts here that abused it hard, both using the same method of opening up many hosts, all with the same result: high runtime but low CPU time. The tasks and workunits look clean and generate a HITS file, but the workunits have no wingman to compare with, and the way credit is generated is wrong.
Theory tasks remain fixed at 2 days on these hosts, but last week they managed to go from CPU time at 1/10 of runtime to CPU time equal to runtime, even though the job count remains low.
ATLAS is a multi-threaded task; it is tricked into 8 threads, with the device peak FLOPS set accordingly, while actually running with only 1 thread.

This is negative for the project in the long run and should be investigated. We need to change how credit is counted; factoring CPU usage into the count would be a step forward.
I have sent messages to the admins here and to other affected projects, and I have also talked to the devs and the Gridcoin community about reviewing credit systems. This traces back to "CreditNew" and possible issues with that system. A rework, or a new or custom-made credit solution, may be needed.
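To make "factoring CPU usage into the count" concrete, here is a minimal sketch of one possible policy, assuming a result report carries wall-clock runtime, total CPU time, and the claimed thread count. The names TaskReport, cpu_efficiency, weighted_credit, and looks_suspicious, and the 0.25 threshold, are all hypothetical; this is not how CreditNew or LHC@home computes credit.

```python
from dataclasses import dataclass


@dataclass
class TaskReport:
    # Hypothetical fields, roughly what a result report carries.
    elapsed_s: float  # wall-clock runtime in seconds
    cpu_s: float      # total CPU time across all threads, in seconds
    threads: int      # number of threads the task claims to use


def cpu_efficiency(t: TaskReport) -> float:
    """Fraction of the claimed thread-seconds actually spent on the CPU (capped at 1.0)."""
    claimed = t.elapsed_s * t.threads
    if claimed <= 0:
        return 0.0
    return min(t.cpu_s / claimed, 1.0)


def weighted_credit(runtime_credit: float, t: TaskReport) -> float:
    """Scale a runtime-based credit by CPU efficiency (a hypothetical policy)."""
    return runtime_credit * cpu_efficiency(t)


def looks_suspicious(t: TaskReport, threshold: float = 0.25) -> bool:
    """Flag tasks whose CPU time is far below what the claimed threads imply."""
    return cpu_efficiency(t) < threshold


if __name__ == "__main__":
    # ATLAS-style case from the post: claims 8 threads but runs on 1.
    atlas = TaskReport(elapsed_s=36_000, cpu_s=36_000, threads=8)
    print(cpu_efficiency(atlas), weighted_credit(800.0, atlas), looks_suspicious(atlas))
```

Running it prints `0.125 100.0 True`: the 8-thread claim earns only an eighth of the runtime credit. This is one signal, not a validator. It would not catch a single-threaded Theory task whose CPU time has been pushed up to equal its runtime, so validating the work actually done would still be needed.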

I have stepped out until there has been some rework on this. I would like the admins to take a look at it, and the project could be greylisted until it has been fixed.
7) Questions and Answers : Unix/Linux : Unable to connect to server since Jan 2021 (Message 44563)
Posted 26 Mar 2021 by Greger
Post:
You might need to turn on http_debug, or have a look in journalctl for boinc.

BOINC is blocking the project somehow, and the host is not attached at all.
8) Message boards : ATLAS application : ATLAS long simulation 1.00 (Message 44555)
Posted 26 Mar 2021 by Greger
Post:
Worked just fine with 48 threads.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=307102515
9) Message boards : Number crunching : WUs with Pending Validation (Message 44307)
Posted 13 Feb 2021 by Greger
Post:
Tasks in progress 176513
Workunits waiting for validation 101240
Workunits waiting for assimilation 492
Workunits waiting for file deletion 113890
Tasks waiting for file deletion 200536


and growing... it will probably be fixed on Monday.
10) Questions and Answers : Windows : Major memory leak in LHC@home (Message 44183)
Posted 23 Jan 2021 by Greger
Post:
So what you're saying is that it allocates the RAM and marks it used when it is in fact not using that RAM? That's not a very good way of doing that. It caused my computer to start paging RAM to my hard drive. Is there any way to make it NOT allocate 10 GB of RAM and only use what it needs?


Task Manager does not show the VBoxSVC process tree with VBoxHeadless, which should match the RAM the VM needs and uses. Windows simply does not show it clearly.
11) Message boards : Theory Application : Dumpinh historograms (Message 44122)
Posted 16 Jan 2021 by Greger
Post:
This is not graphics for a task running in boinc-client, but I found this site a few years ago and it is interesting to watch when the LHC is in action:
http://meltronx.com/ [*Link is not official; treat it as untrusted.]
Check it out during daytime in UTC+1.

*Checked with WOT and VirusTotal.
12) Message boards : CMS Application : Please check your task times and your IPv6 connectivity (Message 44035)
Posted 1 Jan 2021 by Greger
Post:
According to the Grafana-Monitoring each CMS job runs a bit longer than 3 h.
At the end it uploads a result file of about 120 MB.

These numbers can be used to estimate how many CMS jobs can run concurrently before the upload reaches 100 % saturation (upload bandwidth: concurrent jobs):

1 Mbit/s: 11
5 Mbit/s: 56
10 Mbit/s: 112
20 Mbit/s: 225
50 Mbit/s: 562
250 Mbit/s: 2812
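The table follows from simple arithmetic: during one job's runtime (taken here as exactly 3 h, i.e. 10800 s) an uplink of B Mbit/s can move B × 10800 / 8 MB, and dividing that by 120 MB per result gives the number of jobs. A minimal sketch that reproduces the figures, assuming uploads are spread evenly over time and ignoring downloads and protocol overhead:

```python
# Assumptions from the quoted post: each CMS job runs about 3 h and
# uploads one result file of about 120 MB at the end.
JOB_DURATION_S = 3 * 3600
RESULT_SIZE_MB = 120


def max_concurrent_jobs(upload_mbit_s: float) -> int:
    """Jobs whose results the uplink can carry in one job's runtime, rounded down."""
    # Megabits uploadable during one job duration, divided by 8 to get megabytes.
    uploadable_mb = upload_mbit_s * JOB_DURATION_S / 8
    return int(uploadable_mb // RESULT_SIZE_MB)


if __name__ == "__main__":
    for mbit in (1, 5, 10, 20, 50, 250):
        print(f"{mbit} Mbit/s: {max_concurrent_jobs(mbit)}")
```

Since the jobs actually run a bit longer than 3 h, the real limits would be slightly higher than these rounded-down values.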


@Gunde
Your computer list shows that you might have had more than 3000 active cores during the last 2 days.
If this is correct and all of them ran CMS this may have saturated your upload.


Thanks for the info.
I limit most of the hosts in app_config: small hosts to around 10 and big hosts to 20 concurrently running tasks, since cores sit idle while waiting for free memory. I would estimate that all hosts combined could run 140 tasks concurrently. When I checked the Manager it rarely hit this limit, as I also run other projects. Traffic in and out was very low before ATLAS came back.
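Per-application limits like these are normally set in BOINC's app_config.xml in the project's directory, using `<max_concurrent>` per app and optionally `<project_max_concurrent>` for the whole project. The sketch below only generates such a file with Python's standard xml.etree.ElementTree. The app name "CMS" and the limit of 10 (small host) are assumptions taken from this post; check the exact app names on the project's applications page before using it, and use Options > Read config files in the Manager after saving.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_app_config(app_limits: dict[str, int], project_max: int | None = None) -> str:
    """Return app_config.xml content limiting concurrent tasks per app (and optionally per project)."""
    root = ET.Element("app_config")
    for name, limit in app_limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name  # app short name as the project defines it
        ET.SubElement(app, "max_concurrent").text = str(limit)
    if project_max is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(project_max)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Small host from the post: about 10 concurrent CMS tasks ("CMS" is an assumed app name).
    print(build_app_config({"CMS": 10}))
```

For a big host the same call with a limit of 20 would apply; `project_max` caps all of the project's apps together.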
13) Message boards : CMS Application : Please check your task times and your IPv6 connectivity (Message 44027)
Posted 30 Dec 2020 by Greger
Post:
Yes, that is the one. When I look at the VM, it reaches the section where it checks NTP time and fails before it starts to boot. Some of them error out with UROOT having no rw access for input/output; others pass, boot up successfully, and run as they should.
When I start one, wait 20-30 min, and then start the next, it works, but when I let BOINC handle it, it seems to affect the others and keep them busy. Somehow this system does not allow more than one boot-up session at a time.

I will need to wipe this system when the other tasks are done. I have another host that is identical in OS and hardware but uses 4.18.0-147.3.1.el8_1 instead of 4.18.0-147.5.1.el8_1. When I make changes or do a new setup, I do exactly the same on all hosts, so they should be identical, with the same VirtualBox version on all. Host ID 10629847 has 18 CMS tasks running concurrently, while this host could not handle 8.

Yes, I think 10629638 was the one that worried me. I did have a thought; do you have enough bandwidth to support as many jobs as you are running? There is a large amount of data downloaded at the start of each job (conditions database, etc.) and of course there is 70 MB or so of results returned at the end of the job. If you check through the message board, there are instructions on how to set up a caching proxy on a local machine, which greatly reduces the amount of initial downloads that must come through the external network.


I have had Squid running for around 2 years now, and it sure helps a lot with files and latency. I have a 1 Gbit link to 10 hosts on the LAN, but the WAN is limited to 250/250 Mbit. I saw a spike from Squid to the hosts when they fetched the master files and the .vdi file; I hit the limit of the local link, with the spike close to 1 Gbit/s to the host at that time.

Inside the VM it takes 1-2 s for small files, but there are more HTTP.HTTP_Proxy flows to vocms and s1ral than when I run Theory or ATLAS. I reach 200-300 flows in total right now with only CMS active.

There are no errors at all on the other hosts, while this host has less than a 1/4 success rate starting CMS. I would not think it is the network; I believe it is corruption or permissions on this host only.

I have set No New Tasks so it will not bother CMS any more.
14) Message boards : CMS Application : Please check your task times and your IPv6 connectivity (Message 44024)
Posted 30 Dec 2020 by Greger
Post:
I have one host that bothers me (host ID 10629638). I have monitored it since yesterday, and it works if I start the tasks slowly and limit the maximum number of concurrently running tasks. I have now installed the Extension Pack on this one.

The other hosts look to be running fine since I installed it, but they do not have the Extension Pack.
15) Message boards : CMS Application : Please check your task times and your IPv6 connectivity (Message 44020)
Posted 30 Dec 2020 by Greger
Post:
ATLAS is completely dry and Theory is heavily loaded. Hosts are fetching a lot of CMS now; I hope it can hold until ATLAS is back. This is the first time I have seen ATLAS shut down.

I have installed VirtualBox on my CentOS machines to help out with the high job failure rate. Strangely, it is not detected by all the boinc-clients, but I have 8 out of 11 hosts running.
16) Questions and Answers : Windows : All vBox WU in error (Message 43985)
Posted 22 Dec 2020 by Greger
Post:
Thanks for changing it back to the default; that solves this issue regarding:
2020-12-20 09:40:29 (48180): VM state change detected. (old = 'Running', new = 'Paused')
2020-12-20 09:40:39 (48180): VM state change detected. (old = 'Paused', new = 'Running')
2020-12-20 09:51:31 (48180): VM state change detected. (old = 'Running', new = 'Paused')
2020-12-20 09:51:41 (48180): VM state change detected. (old = 'Paused', new = 'Running')


I restarted this VM several times, and as you can see there hasn't been a single computation stop => which proves that starting a VM doesn't need 65% of a 3700X AMD CPU. Which in itself means that gunde was wrong by saying, yeah it's certainly the cpu cap blahblahblah. No it is not. And the computation stops at the start of CMS tasks (some of them, not all of them) are not normal either, especially since those VMs aren't as power hungry as an Ubuntu 20.10 with the GUI activated..


I disagree. I never said it was the main cause; computezrmle mentioned it in the post before, so I did not mention it in mine. To start troubleshooting, I asked you to set it to the default in the settings. It now runs without the VM being interrupted. That is all about throttling.
That it manages to start/resume is not the same as throttling a process. When a call is made to the task, it is passed on to the VM, which goes into a paused state and creates a snapshot of its current state.

Putting a 100% cpu on project like that clogged the system

This is why we recommend reducing cores/threads [Use at most X% of the CPUs] instead of using [Use at most X% of CPU time].
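One way to check whether these pauses look like periodic throttling is to measure them from the wrapper log. The sketch below is a hypothetical helper, not part of BOINC; it assumes only the line format of the four "VM state change detected" lines quoted earlier in this post. In that excerpt both pauses last 10 s and start about 11 minutes apart.

```python
from __future__ import annotations

import re
from datetime import datetime

# Matches wrapper lines like the ones quoted in this post, e.g.
# 2020-12-20 09:40:29 (48180): VM state change detected. (old = 'Running', new = 'Paused')
STATE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): "
    r"VM state change detected\. \(old = '(?P<old>\w+)', new = '(?P<new>\w+)'\)"
)


def pause_durations(lines: list[str]) -> list[float]:
    """Return the length in seconds of each Paused -> resumed interval."""
    paused_at = None
    durations = []
    for line in lines:
        m = STATE_RE.match(line.strip())
        if not m:
            continue  # ignore unrelated log lines
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if m["new"] == "Paused":
            paused_at = ts
        elif m["old"] == "Paused" and paused_at is not None:
            durations.append((ts - paused_at).total_seconds())
            paused_at = None
    return durations


if __name__ == "__main__":
    log = [
        "2020-12-20 09:40:29 (48180): VM state change detected. (old = 'Running', new = 'Paused')",
        "2020-12-20 09:40:39 (48180): VM state change detected. (old = 'Paused', new = 'Running')",
        "2020-12-20 09:51:31 (48180): VM state change detected. (old = 'Running', new = 'Paused')",
        "2020-12-20 09:51:41 (48180): VM state change detected. (old = 'Paused', new = 'Running')",
    ]
    print(pause_durations(log))  # [10.0, 10.0]
```

Regular, short, evenly spaced pauses like these point to a CPU-time limit rather than to a VM that cannot start.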

I restarted this VM several times, and as you can see there hasn't been a single computation stop


Then we can focus on the other issues from the log:
VM Heartbeat file specified, but missing


I would leave it to Crystal and computezrmle, but regarding this issue:

2020-12-22 11:33:20 (56104): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2020-12-22 11:53:34 (56104): VM Heartbeat file specified, but missing.


In my experience this can be related to the network losing its connection, or to the server it connects to.

BOINC needs a quick and easy way to know if the project's app is still running so the app periodically touches a disk file in .../../slots/shared. The period is ~60 seconds and the file is named heartbeat. Touching a file either creates the file or, if the file already exists, updates the 'last accessed' datetime.

heartbeat is zero-length (i.e. it's empty). You should be able to see heartbeat in a file manager. If not, then either your username doesn't have the required permissions or it doesn't exist. Watch its last accessed datetime and notice that it increases by 60 secs every 60 secs.

So the VMwrapper (or possibly the VM itself?) touches heartbeat every 60 secs. BOINC periodically looks at heartbeat. At that point the possible scenarios go something like this:

1. If BOINC cannot see heartbeat then it can reasonably assume either the app/VM never started or the app/VM deleted heartbeat then died.

2. If BOINC can see heartbeat and its last accessed datetime has incremented from the previous time it looked at heartbeat then BOINC can be reasonably sure the VM still lives.

3. If heartbeat exists but its last accessed datetime has not incremented then BOINC could assume the VM lives but it's hung or it could assume it's dead but it didn't delete heartbeat before it died.

In your case it sounds like BOINC is terminating the task because it has no heartbeat and appears to be dead.
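The three scenarios above map onto a small decision function. This is a minimal sketch for illustration only, not the code BOINC or vboxwrapper actually runs. It assumes the checker compares the file's modification time (st_mtime, which touching a file also updates; access-time updates can be disabled by mount options such as noatime) between two looks, and it treats the first sighting as alive. check_heartbeat and HeartbeatState are hypothetical names.

```python
from __future__ import annotations

import os
import tempfile
from enum import Enum


class HeartbeatState(Enum):
    MISSING = "missing"  # scenario 1: file not there, app/VM never started or died
    ALIVE = "alive"      # scenario 2: timestamp advanced since the last look
    STALE = "stale"      # scenario 3: file present but timestamp unchanged (hung or dead)


def check_heartbeat(path: str, last_mtime: float | None) -> tuple[HeartbeatState, float | None]:
    """Classify one look at the heartbeat file; returns the state and the mtime to remember."""
    if not os.path.exists(path):
        return HeartbeatState.MISSING, None
    mtime = os.stat(path).st_mtime
    if last_mtime is None or mtime > last_mtime:
        return HeartbeatState.ALIVE, mtime  # first sighting is treated as alive
    return HeartbeatState.STALE, mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        hb = os.path.join(d, "heartbeat")
        print(check_heartbeat(hb, None)[0])   # HeartbeatState.MISSING
        open(hb, "w").close()                 # "touch" creates the empty file
        state, seen = check_heartbeat(hb, None)
        print(state)                          # HeartbeatState.ALIVE
        print(check_heartbeat(hb, seen)[0])   # HeartbeatState.STALE
        os.utime(hb, (seen + 60, seen + 60))  # simulate the next touch 60 s later
        print(check_heartbeat(hb, seen)[0])   # HeartbeatState.ALIVE
```

The caller keeps the returned mtime between looks; with the wrapper log's check interval (1200 s in the lines quoted above), a STALE or MISSING result at a check is what ends the task.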

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4700&postid=35222
17) Message boards : CMS Application : Ubuntu 20.04.1 and Hypervisor failed (Message 43978)
Posted 21 Dec 2020 by Greger
Post:
Good, thanks for trying it.
18) Message boards : CMS Application : Ubuntu 20.04.1 and Hypervisor failed (Message 43976)
Posted 21 Dec 2020 by Greger
Post:
The old BOINC needs to be removed and the Costa BOINC PPA repo made inactive. Go to Software & Updates, uncheck the line for that PPA, then:

sudo apt-get remove boinc-client boinc-manager

Then try again with the same command, but with install instead of remove, to get the package from the apt repo.
19) Message boards : CMS Application : Ubuntu 20.04.1 and Hypervisor failed (Message 43966)
Posted 21 Dec 2020 by Greger
Post:
Sorry, I need to add install to it:
sudo apt-get install boinc-client boinc-manager


The thing is that I would like to see whether it has any effect on how the Costa package handles VirtualBox; 6.1.16 has been solid on Ubuntu 20.04. The only problem I have experienced is that it is not always detected by boinc-client after a reboot.

On start, make sure it detects and uses the correct version; if not, restart the boinc-client service:
sudo service boinc-client restart
Same as for 18.04.
20) Message boards : CMS Application : Ubuntu 20.04.1 and Hypervisor failed (Message 43963)
Posted 21 Dec 2020 by Greger
Post:
You mentioned Ubuntu 20.04.1 in your last post. If you still have it, could you try the Ubuntu apt packages boinc-client and boinc-manager with VirtualBox 6.1.16?
Run
sudo apt-get install boinc-client boinc-manager
to get them directly from the apt repository.
These have been stable for LHC and Cosmology for me.

On 18.04 I don't know which release would be best, but 6.1 is possible. You might use the PPA from costamagnagianfranco or https://launchpad.net/ubuntu/+source/boinc
Could there be issues using them with VirtualBox?
Updating the kernel could be an issue; try adding DKMS for VirtualBox:
sudo apt-get install virtualbox-dkms 


Then use the VirtualBox package from https://www.virtualbox.org/wiki/Linux_Downloads




©2024 CERN