Message boards : ATLAS application : queue is empty

hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,961,959
RAC: 23,708
Message 48633 - Posted: 23 Sep 2023, 3:28:37 UTC - in response to Message 48626.  

Furthermore, it helps if one's contribution to the effort is appreciated by the people who run the projects. The only way I can see for such appreciation to be seen on an ongoing basis is through the credit system.
Cash would be a show of appreciation. Then again there are those who ran Collatz for credit and didn't believe there was a point to the maths. "It takes all sorts to make a world" as my gran used to say.

Well, since it's difficult to transfer cash in a TCP packet....
And I really don't care what the credit wh*res are doing. I run only those projects that are of personal interest to me, and I'm not doing any of this for any bragging rights. If I wanted that, I would have kitted myself out with a Threadripper 5995WX on something like an ASUS Pro WS WRX80E packing 128GB of RAM and a pair of video cards each with 24 GB of memory and something like a Radeon RX 7900 XTX GPU. But even with a system like that, I probably still wouldn't have even half of what some of those RAC wh*res have in their computing arsenal.
As I said, a steady RAC curve in the boinc manager statistics graphs shows everything is probably working according to plan -- while a sudden decline in the RAC can indicate some problem(s) at one quick glance. That is the only reason I need to see the RAC at all, and the only reason I would like the credit per task to remain at a constant value per CPU hour -- something which, at present, is certainly not the case with Atlas tasks.
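The "one quick glance" check described above can also be sketched in code. This is only an illustration: the function name, the 5% threshold, and the daily readings are all made up for the example, and nothing here reads real BOINC statistics files.

```python
def find_sudden_drops(daily_rac, threshold=0.05):
    """Return the indices of days where RAC fell by more than
    `threshold` (as a fraction) compared with the previous day."""
    drops = []
    for day in range(1, len(daily_rac)):
        previous, current = daily_rac[day - 1], daily_rac[day]
        if previous > 0 and (previous - current) / previous > threshold:
            drops.append(day)
    return drops


# Hypothetical daily RAC readings: normal jitter, then a steady decline.
history = [23000, 23400, 23100, 23700, 21600, 19700]
print(find_sudden_drops(history))
```

Small day-to-day fluctuations stay under the threshold, so only the sustained decline at the end is flagged for a closer look.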
ID: 48633
Mr P Hucker
Joined: 12 Aug 06
Posts: 429
Credit: 6,612,148
RAC: 42,166
Message 48636 - Posted: 23 Sep 2023, 3:43:04 UTC - in response to Message 48633.  
Last modified: 23 Sep 2023, 3:46:37 UTC

Well, since it's difficult to transfer cash in a TCP packet....
Gridcoin is a nice idea, if it paid anything like the electric cost.

And it would be easy enough to allow anyone over a certain RAC to apply for an account, give paypal address, and get regular payments. But apparently a huge place like CERN can't afford it.

probably still wouldn't have even half of what some of those RAC wh*res have in their computing arsenal.
ROFL! And you can write hoars, that's lots of frost :-)

As for looking for errors, usually a computation error pops up in yellow on my list. Or I see the CPU% bar drop down and turn white. Or something runs a longer time than expected. I also look at the MSI afterburner graph on every machine each morning, using remote desktop, to make sure nothing is overheating, the CPU isn't throttling the GPU, and all the chips are running flat out.

RAC wouldn't work for me, I keep switching projects, and run projects with a non-constant supply of work.

I do hate acronyms: "1907: Given royal approval by Edward VII becoming the Royal Automobile Club", no wonder they're so expensive, probably still paying tax to the king.
ID: 48636
Erich56

Joined: 18 Dec 15
Posts: 1725
Credit: 108,408,581
RAC: 90,279
Message 48637 - Posted: 23 Sep 2023, 4:33:08 UTC - in response to Message 48621.  

The credit system that it AFAIK uses is a bit weird. https://boinc.berkeley.edu/trac/wiki/CreditNew
If I remember right, the credit per workunit moved around quite a bit when a new version was released. I guess the change is bigger right now, with more users coming back because of the new ATLAS work and the different task lengths (the long 2000-event tasks, and it sounds like the new ones are shorter). But AFAIK it smoothed out after a few days of ATLAS running with a constant flow of work.
This time, the credit seems to be eroding much faster than it has so far.
Yesterday evening I got 517 points for a given task; this morning it was 204 points. Of course: same host, same number of cores, and even exactly the same runtime. If it goes on like this, it will soon end up at zero points :-)
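To put a number on that erosion, here is the arithmetic for the two credit values quoted above. The helper name is just for illustration.

```python
def percent_drop(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# 517 points yesterday evening versus 204 points this morning.
drop = percent_drop(517, 204)
print(f"{drop:.1f}% less credit for the same runtime")
```

That works out to roughly a 60% cut in credit for identical work on the same host.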
ID: 48637
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,961,959
RAC: 23,708
Message 48646 - Posted: 23 Sep 2023, 19:28:47 UTC - in response to Message 48636.  

As for looking for errors, usually a computation error pops up in yellow on my list. Or I see the CPU% bar drop down and turn white. Or something runs a longer time than expected. I also look at the MSI afterburner graph on every machine each morning, using remote desktop, to make sure nothing is overheating, the CPU isn't throttling the GPU, and all the chips are running flat out.

RAC wouldn't work for me, I keep switching projects, and run projects with a non-constant supply of work.

OK, so you do a detailed check first thing every morning. I check the statistics graphs.
I don't know what you're using where those yellow flags happen -- VirtualBox perhaps? That is my last line of "defence" if I find a problematic task in the boinc manager.
I don't go to that depth unless I see something that suggests a problem somewhere. That, as I said, often begins with a glance at the statistics graphs. Then, I check the tasks for any project where the RAC has been dropping.
I do not switch projects. I run those which are doing things of interest to me -- LHC, Einstein, Cosmology and Rosetta. I have settled on those, and they are not likely to change, at least not in the near future.
Oh, if I had one of those 64-core Threadrippers I mentioned earlier (yeah, I wish!!) then I would also add in a project dealing with climate change. But that isn't going to be happening any time soon -- the CPU alone costs around $10K here in Canada.
ID: 48646
Mr P Hucker
Joined: 12 Aug 06
Posts: 429
Credit: 6,612,148
RAC: 42,166
Message 48656 - Posted: 24 Sep 2023, 15:44:32 UTC - in response to Message 48646.  
Last modified: 24 Sep 2023, 16:35:39 UTC

OK, so you do a detailed check first thing every morning. I check the statistics graphs.
That doesn't show things up accurately. You're just looking at how well you've done over the last month.

And it can't work with Rosetta, as their work is on and off. So your RAC there will jump about randomly, and it will affect your RAC here when your computers go and do some Rosetta instead of LHC.

I don't know what you're using where those yellow flags happen -- VirtualBox perhaps? That is my last line of "defence" if I find a problematic task in the boinc manager.
Yellow is the colour I chose to show tasks saying "computation error". Lots of projects do that. It's usually the project's fault, but it can be a dodgy or overheating CPU/GPU.

I don't go to that depth unless I see something that suggests a problem somewhere. That, as I said, often begins with a glance at the statistics graphs. Then, I check the tasks for any project where the RAC has been dropping.
I do not switch projects. I run those which are doing things of interest to me -- LHC, Einstein, Cosmology and Rosetta.
I get bored on one, or miss doing another. I want 100 machines. Too much to get done!

Cosmology is causing me a problem. They only work with VB 5. Uspex needs VB 6 or later. LHC will run on anything.

I have settled on those, and they are not likely to change, at least not in the near future.
Oh, if I had one of those 64-core Threadrippers I mentioned earlier (yeah, I wish!!) then I would also add in a project dealing with climate change. But that isn't going to be happening any time soon -- the CPU alone costs around $10K here in Canada.
I've got two 24 core Ryzens, that will do for now. Money is saving up for a new house.

The only climate change project I know of is CPDN. Their tasks are so rare you might as well join it and get them when they are available. You're best joining with both Windows and Linux, as one subproject is on each (I've got Boinc running on Linux in a VB inside Windows on the two fastest machines, so I can get tasks from either of their subprojects).
ID: 48656
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,961,959
RAC: 23,708
Message 48661 - Posted: 25 Sep 2023, 19:04:32 UTC - in response to Message 48656.  

@Mr P Hucker
I only allow 2 concurrent Rosetta tasks, so it doesn't affect the other projects all that much. I also guess I didn't fully explain my rationale for checking the graphs as a first indication of problems. Of course, there are constant fluctuations in the RAC from day to day -- it is the very large drops in RAC that catch my attention. These almost always suggest something that needs further checking. For example, Cosmology tasks sometimes become "stuck"; the VM becomes unmanageable, so boinc postpones further calculations for a full day, 86400 seconds.
During that time, boinc will run any Cosmology tasks that are in queue, but will not request any new ones. In my experience, it also appears that it does not report completed tasks. Given the short run times of these tasks, the queue is quickly emptied of Cosmology tasks. When this happens, the Cosmology RAC will quickly decline by a significant amount. Of course, this issue is also immediately apparent if I just look at the tasks list, but this is not the only problem I've encountered.
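How quickly RAC falls once credit stops arriving can be estimated. The sketch below assumes that RAC is an exponentially decaying average with a half-life of one week, which is how BOINC's recent average credit is usually described. Treat the constant and the function name as assumptions rather than anything taken from this project's server.

```python
RAC_HALF_LIFE_DAYS = 7.0  # assumed half-life of BOINC's recent average credit


def rac_after_idle(rac, idle_days, half_life_days=RAC_HALF_LIFE_DAYS):
    """Estimate RAC after `idle_days` with no new credit granted."""
    return rac * 0.5 ** (idle_days / half_life_days)


# A hypothetical project RAC of 1000 with nothing reported for 1, 3 and 7 days.
for days in (1, 3, 7):
    print(days, round(rac_after_idle(1000, days), 1))
```

Under that assumption, a single postponed day with nothing reported costs a little under 10% of the project's RAC, which is large enough to stand out on the statistics graph.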
Recently, I got quite a few Theory tasks that were all running for just over 9 days, with quite some time showing for the estimated time of completion. With no Atlas tasks available at the time to take up the slack, the LHC RAC was dropping dramatically. In boinc, all looked to be OK -- CPU time was increasing along with run time, checkpoints appeared to be happening regularly, but the time to completion just kept creeping up. Now it was clearly time to check VirtualBox, where I saw the VM was still running, but the guest CPU usage was 0. Time to abort them all, which I did. Did I leave them too long before going to VB? Probably. However, Alt-F2 has never worked here. To gain direct VB access, I had to hack the boinc account to make it a login account, and run VirtualBox in a kdesu shell in my personal account:
kdesu -u boinc VirtualBox.
Once there, moving around is not all that easy -- so I tend to use it as a method of last resort.
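For a quick look without opening the VirtualBox GUI, something along these lines might work. It is a sketch under several assumptions: that `VBoxManage` is on the path, that `sudo` is allowed to run commands as the `boinc` user, and that `VBoxManage list runningvms` prints one `"name" {uuid}` pair per line. It only shows which VMs are running, not the guest CPU usage, and the sample VM name at the end is invented.

```python
import re
import subprocess

# One line of `VBoxManage list runningvms` output looks like: "name" {uuid}
VM_LINE = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]+)\}$')


def parse_running_vms(output):
    """Turn the text output into a list of (name, uuid) tuples."""
    vms = []
    for line in output.splitlines():
        match = VM_LINE.match(line.strip())
        if match:
            vms.append((match.group("name"), match.group("uuid")))
    return vms


def list_running_vms(user="boinc"):
    """Ask VirtualBox, as `user`, which VMs are currently running.

    Assumes sudo is configured to permit this; not called automatically.
    """
    result = subprocess.run(
        ["sudo", "-u", user, "VBoxManage", "list", "runningvms"],
        capture_output=True, text=True, check=True,
    )
    return parse_running_vms(result.stdout)


# Parsing a made-up sample line, without touching VirtualBox at all.
sample = '"boinc_1a2b3c" {01234567-89ab-cdef-0123-456789abcdef}\n'
print(parse_running_vms(sample))
```

Keeping the parsing separate from the `subprocess` call means the parsing can be checked on any machine, even one without VirtualBox installed.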

The only 24-core Ryzen CPUs I am aware of are a couple of ThreadRippers. I'm happy for you that you can afford not just one, but two, CPUs that combined must have set you back nearly US$3500 :D
ID: 48661
Mr P Hucker
Joined: 12 Aug 06
Posts: 429
Credit: 6,612,148
RAC: 42,166
Message 48667 - Posted: 27 Sep 2023, 7:48:19 UTC - in response to Message 48661.  

For example, Cosmology tasks sometimes become "stuck"; the VM becomes unmanageable, so boinc postpones further calculations for a full day, 86400 seconds.
You're using VB 7. Cosmology hates anything newer than 5. LHC works fine on 5. Only Uspex requires 6 or later. Or you can use the legacy Cosmology tasks, but they're not very fast.

During that time, boinc will run any Cosmology tasks that are in queue, but will not request any new ones. In my experience, it also appears that it does not report completed tasks. Given the short run times of these tasks, the queue is quickly emptied of Cosmology tasks. When this happens, the Cosmology RAC will quickly decline by a significant amount. Of course, this issue is also immediately apparent if I just look at the tasks list, but this is not the only problem I've encountered.
If you use Boinctasks, there's a % CPU usage column. I can see immediately something isn't processing.

The only 24-core Ryzen CPUs I am aware of are a couple of ThreadRippers. I'm happy for you that you can afford not just one, but two, CPUs that combined must have set you back nearly US$3500 :D
I have Ryzen 9 3900X and Ryzen 9 3900XT. When I say core I mean thread. Yes I know technically they're not cores, but they behave as such. Hell Boinc calls them CPUs!
ID: 48667
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,961,959
RAC: 23,708
Message 48669 - Posted: 27 Sep 2023, 18:31:08 UTC - in response to Message 48667.  

The problem with Cosmology tasks is so rare it's not worth my effort to change VB versions; ver 7 is what comes with the distro, so I use it. The biggest problem is that one-day delay in restarting the postponed tasks. I haven't found a way to change that. I could probably ask in the forums on the Cosmology or Boinc websites, but it's just as easy to abort the problem task and let things carry on.

There is no boinctasks anywhere in the packages available from my distro, nor any package by that name. Besides, if it is a command-line util, I prefer to use a gui when one is available.

AFAIK, for quite some time now, each CPU core has its own math unit, which is shared between the threads the core includes. I guess I'm using the most restrictive possible definition of "core" -- if it includes a math unit, it is worthy of being called a CPU :D
Recently, I've seen the odd report of upcoming processors that may feature one math unit per thread. If that comes about, then your definition of a CPU and mine will coincide :D
ID: 48669
Mr P Hucker
Joined: 12 Aug 06
Posts: 429
Credit: 6,612,148
RAC: 42,166
Message 48670 - Posted: 27 Sep 2023, 22:41:50 UTC - in response to Message 48669.  

The problem with Cosmology tasks is so rare it's not worth my effort to change VB versions
Odd, with me and others complaining in the forum, they break very often. Every single day I had to clear out jammed ones. Problem went away when I went to v5. Maybe only the Windows VB has this problem?

ver 7 is what comes with the distro, so I use it. The biggest problem is that one-day delay in restarting the postponed tasks.
Yes, that was my problem. Once there are loads of them, the queue runs out and I have to nudge them. So if a computer is unattended too long, no Cosmology gets done. V5 just works. Oracle broke it after that. When creating a new version of something, always make it compatible with older things.

I haven't found a way to change that. I could probably ask in the forums on the Cosmology or Boinc websites, but it's just as easy to abort the problem task and let things carry on.
Restarting Boinc resets the 1 day timer. There must be a way to make it shorter, but I think I asked and it's hard coded.

There is no boinctasks anywhere in the packages available from my distro, nor any package by that name. Besides, if it is a command-line util, I prefer to use a gui when one is available.
Google is your friend.
https://efmer.com/boinctasks/boinctasks-flavours/ - it's a GUI. Don't expect everything to be in the holy repository.

AFAIK, for quite some time now, each CPU core has its own math unit, which is shared between the threads the core includes. I guess I'm using the most restrictive possible definition of "core" -- if it includes a math unit, it is worthy of being called a CPU :D
If you call a core a CPU, what do you call the whole CPU?

You can often use all the threads, since the maths unit isn't used 100% of the time; there are memory accesses etc. going on as well. So I treat each thread as a core, and so does Boinc.

Recently, I've seen the odd report of upcoming processors that may feature one math unit per thread. If that comes about, then your definition of a CPU and mine will coincide :D
If they have a mathSSS unit per thread, what will they continue to share?
ID: 48670
hadron

Joined: 4 Sep 22
Posts: 74
Credit: 9,961,959
RAC: 23,708
Message 48671 - Posted: 27 Sep 2023, 23:45:33 UTC - in response to Message 48670.  

The problem with Cosmology tasks is so rare it's not worth my effort to change VB versions
Odd, with me and others complaining in the forum, they break very often. Every single day I had to clear out jammed ones. Problem went away when I went to v5. Maybe only the Windows VB has this problem?

ver 7 is what comes with the distro, so I use it. The biggest problem is that one-day delay in restarting the postponed tasks.
Yes, that was my problem. Once there are loads of them, the queue runs out and I have to nudge them. So if a computer is unattended too long, no Cosmology gets done. V5 just works. Oracle broke it after that. When creating a new version of something, always make it compatible with older things.

Or the problem simply isn't so prevalent with the Linux version. However, I don't know about distros other than openSUSE.

I haven't found a way to change that. I could probably ask in the forums on the Cosmology or Boinc websites, but it's just as easy to abort the problem task and let things carry on.
Restarting Boinc resets the 1 day timer. There must be a way to make it shorter, but I think I asked and it's hard coded.
If I restart boinc-client, the stalled task always restarts OK, and things return to normal. I have never had one stall a second time. However, does that not open up the possibility that some LHC tasks will then fail with a computation error? At least I have seen a few fail after a restart.
All in all, it seems to me to be best just to abort the errant Cosmology task and carry on.

There is no boinctasks anywhere in the packages available from my distro, nor any package by that name. Besides, if it is a command-line util, I prefer to use a gui when one is available.
Google is your friend.
https://efmer.com/boinctasks/boinctasks-flavours/ - it's a GUI. Don't expect everything to be in the holy repository.

Normally I don't go looking for things I have not heard of ;)
Anyway, boincmgr does all that I need to see right now, but thanks for the info about boinctasks. It definitely looks interesting; I'll be sure to keep it in mind if I want to start looking deeper than boincmgr allows.

AFAIK, for quite some time now, each CPU core has its own math unit, which is shared between the threads the core includes. I guess I'm using the most restrictive possible definition of "core" -- if it includes a math unit, it is worthy of being called a CPU :D
If you call a core a CPU, what do you call the whole CPU?

You can often use all the threads, since the maths unit isn't used 100% of the time; there are memory accesses etc. going on as well. So I treat each thread as a core, and so does Boinc.

Recently, I've seen the odd report of upcoming processors that may feature one math unit per thread. If that comes about, then your definition of a CPU and mine will coincide :D
If they have a mathSSS unit per thread, what will they continue to share?

From Intel (see ref 2 at https://en.wikipedia.org/wiki/Central_processing_unit#cite_note-intel-pcm-2):
A thread is a logical, or virtual, CPU
A core is a (possibly multithreaded) CPU
The big chip that holds them all is a multi-core processor.

"one math unit per thread"
I have no idea what such a processor would be like. Maybe (big guess on my part) they are working on multi-threaded FPUs?
ID: 48671
maeax

Joined: 2 May 07
Posts: 2159
Credit: 164,500,349
RAC: 168,704
Message 48676 - Posted: 28 Sep 2023, 15:51:37 UTC

New tasks have dropped to zero over the last hour or two.
ID: 48676
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1143
Credit: 50,032,883
RAC: 6,260
Message 48679 - Posted: 29 Sep 2023, 2:18:35 UTC - in response to Message 48678.  
Last modified: 29 Sep 2023, 2:23:35 UTC

lol I see nothing.....nothing
ID: 48679
maeax

Joined: 2 May 07
Posts: 2159
Credit: 164,500,349
RAC: 168,704
Message 48680 - Posted: 29 Sep 2023, 4:02:43 UTC - in response to Message 48678.  

I responded to all this, and somebody silently deleted it, fuck LHC. I'm off to do work on a respectable project with a programmer without OCD and a modicum of common sense.

queue is empty
That is the title of this thread.
Mr. Hucker,
there are rules and you have to respect them!!!
ID: 48680
swiftmallard

Joined: 30 Dec 13
Posts: 1
Credit: 451,563
RAC: 0
Message 48686 - Posted: 29 Sep 2023, 11:55:51 UTC - in response to Message 48669.  

The biggest problem is that one-day delay in restarting the postponed tasks. I haven't found a way to change that. I could probably ask in the forums on the Cosmology or Boinc websites, but it's just as easy to abort the problem task and let things carry on.


Error messages can show a task has been postponed for a number of reasons. In my experience from crunching QuChemPedIA, the way to restart postponed tasks is to shut down Boinc, open VirtualBox and make certain all VMs have closed (that can take a few seconds), and then restart Boinc.
I have only just started crunching LHC, but that same procedure has worked for me here when I was crunching Theory tasks.
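The "make certain all VMs close" step in that procedure could be automated with a simple wait loop. This is a sketch only: `count_running_vms` is a hypothetical callable you would supply yourself (for example one that counts the lines of `VBoxManage list runningvms`), and the timeout values are arbitrary.

```python
import time


def wait_for_vms_to_close(count_running_vms, timeout_s=60.0, poll_s=1.0,
                          sleep=time.sleep):
    """Poll until no VMs are running.

    Returns True once `count_running_vms()` reports zero, or False if
    VMs are still running after roughly `timeout_s` seconds.
    """
    waited = 0.0
    while count_running_vms() > 0:
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
    return True


# Demonstration with a fake counter: two VMs, then one, then none.
fake_counts = iter([2, 1, 0])
all_closed = wait_for_vms_to_close(lambda: next(fake_counts),
                                   sleep=lambda seconds: None)
print(all_closed)
```

The idea is to stop the Boinc client first, call this, and only restart the client if it returns True. If it returns False, a VM is probably stuck and needs a manual look in VirtualBox.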
ID: 48686
maeax

Joined: 2 May 07
Posts: 2159
Credit: 164,500,349
RAC: 168,704
Message 48753 - Posted: 4 Oct 2023, 17:00:19 UTC - in response to Message 48680.  
Last modified: 4 Oct 2023, 17:27:44 UTC

queue is empty.
Is back again, thank you.
ID: 48753
maeax

Joined: 2 May 07
Posts: 2159
Credit: 164,500,349
RAC: 168,704
Message 48754 - Posted: 4 Oct 2023, 18:16:57 UTC - in response to Message 48753.  

Got only three Rescheduler. Queue is empty.
ID: 48754
maeax

Joined: 2 May 07
Posts: 2159
Credit: 164,500,349
RAC: 168,704
Message 48756 - Posted: 5 Oct 2023, 15:30:52 UTC - in response to Message 48754.  

The new Atlas vdi is 670 MByte instead of 1.5 GByte.
Thank you for the investigation.
Magic, you can now start tasks.
ID: 48756
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1143
Credit: 50,032,883
RAC: 6,260
Message 48757 - Posted: 5 Oct 2023, 20:31:55 UTC - in response to Message 48756.  

The new Atlas vdi is 670 MByte instead of 1.5 GByte.
Thank you for the investigation.
Magic, you can now start tasks.

ID: 48757
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1143
Credit: 50,032,883
RAC: 6,260
Message 48758 - Posted: 5 Oct 2023, 21:58:17 UTC - in response to Message 48757.  



Well that should take 90 minutes but it is 3pm and I have nothing better to do here right now.
ID: 48758
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1333
Credit: 8,810,875
RAC: 5,554
Message 48759 - Posted: 6 Oct 2023, 6:08:56 UTC - in response to Message 48756.  

The new Atlas vdi is 670 MByte instead of 1.5 GByte.
Thank you for the investigation.
Magic, you can now start tasks.

The vdi file is not that size; it is much bigger, about 4 GB unzipped.

The 670MB file is the pool.root file coming with every new task.
ID: 48759
©2024 CERN