Message boards : Number crunching : Server problem?

Harri Liljeroos
Joined: 28 Sep 04
Posts: 802
Credit: 65,422,197
RAC: 24,359
Message 52660 - Posted: 14 Nov 2025, 10:03:10 UTC - in response to Message 52659.  

Now I see on one host that it is getting only 1 Theory task at a time. And there is only 1 task waiting for crunching. So the number of tasks in progress is max_concurrent (from app_config) + 1.
Have you set Max # CPUs to 1 in the prefs?

No, I have both hosts set to 4 max CPUs.

My assessment of the situation turned out to be wrong. The win10 host (BOINC 7.16.5) is still doing what I said above, but the win11 host (BOINC 8.0.2) didn't get new Theory tasks and was left with free CPU cores. So I enabled CMS work for that host and just got 8 of those.
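
For anyone unsure what the max_concurrent setting mentioned above looks like, here is a minimal sketch that writes out an app_config.xml body. The app short name "Theory" and the limit of 4 are assumptions for illustration; check the project's application names before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that limits how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Assumed values: "Theory" as the app short name, 4 as the concurrency limit.
config_text = build_app_config("Theory", 4)
print(config_text)
```

The resulting text would go into app_config.xml in the project's folder inside the BOINC data directory. Note that max_concurrent only limits how many tasks run at once on the client; it is not the server-side cap discussed in this thread.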
ID: 52660
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 931
Credit: 781,055,498
RAC: 84,602
Message 52666 - Posted: 16 Nov 2025, 17:10:59 UTC
Last modified: 16 Nov 2025, 17:11:30 UTC

There does seem to be something different with the number of WUs.

I have unlimited set for both Max # CPU & Max # jobs, with run all applications.

My higher-end computers have about 40-50 WUs and are requesting work but are not given more, so they are not fully utilised.

The cap seems to be 8 for CMS and 10 for Theory; I'm not sure about ATLAS as I don't have many of those at the moment. So it seems like unlimited is no longer unlimited.

I don't have max_concurrent set, just that Theory and ATLAS should only use 1 core per job/task.
ID: 52666
Erich56

Joined: 18 Dec 15
Posts: 1978
Credit: 160,478,250
RAC: 49,897
Message 52667 - Posted: 16 Nov 2025, 17:34:28 UTC

I am having the same experience: for one of my PCs (16 cores) I have set the max. number of Theory tasks to "unlimited", but I get only 10. How come?
ID: 52667
Garrulus glandarius

Joined: 5 Apr 25
Posts: 82
Credit: 2,392,565
RAC: 7,877
Message 52668 - Posted: 16 Nov 2025, 17:52:27 UTC - in response to Message 52667.  

I am having the same experience: for one of my PCs (16 cores) I have set the max. number of Theory tasks to "unlimited", but I get only 10. How come?


Might indeed be a hard cap on the server side. I noticed one such cap at TN-Grid where each host can get at most 6 tasks/thread (not core), regardless of any other settings.
ID: 52668
pututu

Joined: 13 May 17
Posts: 2
Credit: 29,309,391
RAC: 35
Message 52669 - Posted: 16 Nov 2025, 18:18:27 UTC - in response to Message 52666.  
Last modified: 16 Nov 2025, 18:37:45 UTC

I'm also seeing a cap of 8 CMS tasks per PC, irrespective of the CPU core count or the max work cache set. The only way to feed your PC to 100% utilization is to run multiple BOINC clients, if you have a high-core-count setup with sufficient RAM and want to run CMS tasks only.

With multiple BOINC clients, on Linux machines I'm not seeing this VBoxManage.exe error about registering/attaching the CMS_2025_04_08_prod.vdi virtual hard disk (maybe on the first task, but OK subsequently), but not on Windows machines.

I prefer to run CMS because its run times don't fluctuate as much as those of Theory tasks; CMS tasks seem to have a run-time cap of 64,800 seconds. I have had a few Theory tasks that ran for days.
ID: 52669
Harri Liljeroos
Joined: 28 Sep 04
Posts: 802
Credit: 65,422,197
RAC: 24,359
Message 52670 - Posted: 16 Nov 2025, 19:01:27 UTC

I'm seeing the same caps for Theory and CMS here as well. But Atlas seems different. The last new tasks from Atlas were on 14th of November, and I received 37 for an 8/16 core CPU and 22 for a 16/32 core CPU. The limit on the server side used to be 16 for both of those CPUs. There weren't enough tasks available to actually see what the limit is now, but the limits have definitely changed for Atlas too.
ID: 52670
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2739
Credit: 301,817,012
RAC: 83,107
Message 52671 - Posted: 16 Nov 2025, 19:44:08 UTC - in response to Message 52669.  

With multiple BOINC clients, on Linux machines I'm not seeing this VBoxManage.exe error about registering/attaching the CMS_2025_04_08_prod.vdi virtual hard disk (maybe on the first task, but OK subsequently), but not on Windows machines.

This was not related to the number of BOINC instances running on the same host.
Instead, it came from a possible race condition involving vboxwrapper.

Vboxwrapper 26210 used here for ATLAS/CMS/Theory mitigates/recovers from those errors on Apple/Linux/Windows.
ID: 52671
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 931
Credit: 781,055,498
RAC: 84,602
Message 52673 - Posted: 18 Nov 2025, 8:13:49 UTC

ATLAS got a bit more work so I have an update:

Host   Theory   CMS   ATLAS
----   ------   ---   -----
1      19       6     0
2      7        3     1
3      10       8     40
4      16       8     9
5      10       8     2
6      10       8     40


Seems like the ATLAS cap may be 40.
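
As a rough way to read the numbers above, the sketch below reports the largest count per application and how many hosts sit exactly at it. Several hosts piling up at the same maximum is only a hint of a cap, not proof; the values are copied from the table in this post and the function name is made up for illustration.

```python
# Tasks in progress per host, copied from the table above (six hosts).
in_progress = {
    "Theory": [19, 7, 10, 16, 10, 10],
    "CMS": [6, 3, 8, 8, 8, 8],
    "ATLAS": [0, 1, 40, 9, 2, 40],
}


def summarize(counts):
    """Return (largest count, number of hosts sitting exactly at that count)."""
    top = max(counts)
    return top, counts.count(top)


summary = {app: summarize(counts) for app, counts in in_progress.items()}
for app, (top, hosts) in summary.items():
    print(f"{app}: max {top} on {hosts} host(s)")
```

For this snapshot it reports 8 on four hosts for CMS and 40 on two hosts for ATLAS. Theory peaks at 19 on a single host, so the snapshot alone does not pin down a Theory cap.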
ID: 52673
Harri Liljeroos
Joined: 28 Sep 04
Posts: 802
Credit: 65,422,197
RAC: 24,359
Message 52674 - Posted: 18 Nov 2025, 8:38:29 UTC - in response to Message 52673.  

For my liking, 40 is too high, especially with these 1000-event tasks.
ID: 52674
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 931
Credit: 781,055,498
RAC: 84,602
Message 52677 - Posted: 18 Nov 2025, 17:06:51 UTC - in response to Message 52674.  
Last modified: 19 Nov 2025, 20:18:42 UTC

In theory, BOINC will send out tasks based upon your preferences, e.g. if you store 0.1 day of work and the tasks take 10 days, then you would not be storing many, so a cap of 40 would not be reached easily.

e.g.

LHC@home	11/19/2025 9:17:41 PM	Not requesting tasks: don't need (CPU: job cache full; Intel GPU: no applications)
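
To put rough numbers on the cache example above, here is a back-of-the-envelope sketch. It is a deliberate simplification and not BOINC's actual work-fetch logic: it assumes single-core tasks, at least one task per core, and a 16-core host (an assumed value).

```python
DAY = 86_400  # seconds per day


def tasks_to_fill_cache(cores: int, cache_seconds: int, task_seconds: int) -> int:
    """Rough count of single-core tasks needed to keep `cores` busy for the cache period."""
    if task_seconds <= 0:
        raise ValueError("task_seconds must be positive")
    # Ceiling division using integers only, to avoid floating-point surprises.
    needed = -(-cores * cache_seconds // task_seconds)
    # At least one task per core is needed to keep every core busy at all.
    return max(cores, needed)


# 0.1 day of cache with 10-day tasks, as in the example above (16 cores assumed).
long_tasks = tasks_to_fill_cache(16, DAY // 10, 10 * DAY)
# Same cache with 1-hour tasks (an assumed run time) for comparison.
short_tasks = tasks_to_fill_cache(16, DAY // 10, 3_600)
print(long_tasks, short_tasks)
```

Under these assumptions, long tasks need no more than one task per core, while short tasks push the count up towards the cap much sooner.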
ID: 52677
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 931
Credit: 781,055,498
RAC: 84,602
Message 52685 - Posted: 21 Nov 2025, 21:55:05 UTC

Things seem to have gone back to normal.

For Theory, 4/6 have more than 10; the ones with fewer than 10 are probably, as Harri and I were discussing, limited by their compute speed and small cache size.

For CMS, 1/6 has more than 8, so it could be back to normal as well.

For ATLAS, none have more than 40 so this could still be the cap.
ID: 52685
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 931
Credit: 781,055,498
RAC: 84,602
Message 52749 - Posted: 14 Dec 2025, 13:57:54 UTC

Caps are back to 10, 40, 8 (Theory, ATLAS, CMS).
ID: 52749
Erich56

Joined: 18 Dec 15
Posts: 1978
Credit: 160,478,250
RAC: 49,897
Message 52750 - Posted: 14 Dec 2025, 18:26:12 UTC - in response to Message 52749.  

caps are back to 10,40,8
I don't understand the cap of 10 for Theory at all. Why? One of my rigs has 2 CPUs with 8 cores each, so I could crunch 16 Theory tasks concurrently. Okay, I switched to running 4 CMS tasks concurrently.
ID: 52750
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 931
Credit: 781,055,498
RAC: 84,602
Message 52805 - Posted: 29 Dec 2025, 18:26:48 UTC

If it were multicore, it would make a little more sense, but still.

The top 5 computers on BOINCStats by RAC have at least 16 cores/32 threads, so it's not even enough to max these out if a user just wanted to run Theory.

The top computers are running 256 threads, so 10 is only a few percent of the total capacity of these machines.
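
A quick sketch of that arithmetic, assuming each Theory task occupies exactly one thread (the function name is made up for illustration):

```python
def cap_share(cap: int, threads: int) -> float:
    """Percentage of a host's threads that `cap` single-threaded tasks can occupy."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return 100.0 * min(cap, threads) / threads


# Thread counts mentioned above: 32 and 256, with the Theory cap of 10.
for threads in (32, 256):
    print(f"{threads} threads: {cap_share(10, threads):.1f}% busy with 10 Theory tasks")
```

That works out to roughly 31% of a 32-thread machine and about 4% of a 256-thread one.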
ID: 52805
Harri Liljeroos
Joined: 28 Sep 04
Posts: 802
Credit: 65,422,197
RAC: 24,359
Message 53027 - Posted: 13 Feb 2026, 11:26:30 UTC

Getting a lot of Atlas download errors at the moment. First the server aborted all my downloaded Atlas tasks and now new downloads are failing.
ID: 53027
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 931
Credit: 781,055,498
RAC: 84,602
Message 53030 - Posted: 13 Feb 2026, 18:01:26 UTC

Yes, seems so.
ID: 53030
CloverField

Joined: 17 Oct 06
Posts: 99
Credit: 65,247,244
RAC: 13,298
Message 53147 - Posted: 10 Mar 2026, 12:20:28 UTC

Are any of you guys getting gateway timeouts around this time every day? They clear up after a bit, but I think it's been happening every day for me for about the past week.
ID: 53147
Harri Liljeroos
Joined: 28 Sep 04
Posts: 802
Credit: 65,422,197
RAC: 24,359
Message 53148 - Posted: 10 Mar 2026, 12:29:08 UTC

https://grafana.kiska.pw/d/boinc/boinc?orgId=1&var-project=lhc@home&from=now-7d&to=now&refresh=30m
See the Status and Response time graphs on that page. There are clearly times when it takes the project a long time to respond, but I don't know if those times correlate with your observations.
ID: 53148
CloverField

Joined: 17 Oct 06
Posts: 99
Credit: 65,247,244
RAC: 13,298
Message 53149 - Posted: 10 Mar 2026, 14:05:25 UTC - in response to Message 53148.  

So I think they line up with when I'm seeing the timeouts, but it's happening overnight, so I just wake up to a backed-up upload queue. However, it looks like there's a new problem.
I'm seeing "Server can't open database" in the event logs.
ID: 53149
Erich56

Joined: 18 Dec 15
Posts: 1978
Credit: 160,478,250
RAC: 49,897
Message 53151 - Posted: 10 Mar 2026, 16:53:52 UTC - in response to Message 53148.  

In reply to Harri Liljeroos's message of 10 Mar 2026:
https://grafana.kiska.pw/d/boinc/boinc?orgId=1&var-project=lhc@home&from=now-7d&to=now&refresh=30m
See the Status and Response time graphs on that page. There are clearly times when it takes the project a long time to respond, but I don't know if those times correlate with your observations.
exactly - response times have again been bad lately :-(
ID: 53151


©2026 CERN