1) Message boards : ATLAS application : No HITS File But Still Granted Credit? (Message 47567)
Posted 11 days ago by computezrmle
Post:
Is this correct?

Yes.
Under certain circumstances the scripts can't reliably detect where an error comes from.
In those cases the project grants credit although it doesn't get its own reward (the HITS file).

In the past, a few users complained about that and suggested not rewarding the user in any case of an error.
But the project team decided to keep it as it is now.


What can be seen from one of your logs is a long break that started during the task's setup phase.
This might have played a role (just a guess, without clear evidence).
You may try to avoid such breaks, especially the long ones.
2) Message boards : Theory Application : Madgraph fails due to compilation errors (Message 47558)
Posted 15 days ago by computezrmle
Post:
Noticed a madgraph task (native Theory) that fails with a series of compilation errors.
The task then remains idle until the 10-day limit is reached.
The first of those errors looks like this:

WARNING: fct <function compile_dir at 0x7fbefe4c3758> does not return 0. Stopping the code in a clean way. The error was:
A compilation Error occurs when trying to compile /shared/tmp/tmp.Rx4cI30Y1M/MG5RUN/SubProcesses/P0_uux_epem.
The compilation fails with the following output message:
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ parton_lum_1.f
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ parton_lum_2.f
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ parton_lum_3.f
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ parton_lum_chooser.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ matrix_1.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ matrix_2.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ matrix_3.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ real_me_chooser.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ leshouche_inc_chooser.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ fks_inc_chooser.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ setcuts.f
    setcuts.f:599:124:
    
                    write (*,'(a7,x,i3,x,i5,x,a1,3(e12.5,x)))') 'tau_min'
                                                                                                                                1
    Warning: Extraneous characters in format at (1)
    setcuts.f:603:199:
    
                    write (*,'(a7,x,i3,x,i5,x,a1,e12.5,x,a13,e12.5,x))')
                                                                                                                                                                                                           1
    Warning: Extraneous characters in format at (1)
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ setscales.f
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    setscales.f:411:27:
    
           common/c_FxFx_scales/FxFx_ren_scales,nFxFx_ren_scales
                               1
    Warning: Padding of 4 bytes required before 'fxfx_fac_scale' in COMMON 'c_fxfx_scales' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    setscales.f:236:27:
    
           common/c_FxFx_scales/FxFx_ren_scales,nFxFx_ren_scales
                               1
    Warning: Padding of 4 bytes required before 'fxfx_fac_scale' in COMMON 'c_fxfx_scales' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    run.inc:74:21:
    
           common/to_rwgt/ do_rwgt_scale, rw_Fscale_down, rw_Fscale_up, rw_Rscale_down, rw_Rscale_up,
                         1
    Warning: Padding of 4 bytes required before 'rw_fscale_down' in COMMON 'to_rwgt' at (1); reorder elements or use -fno-align-commons [-Walign-commons]
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ born.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ sborn_sf.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ b_sf_001.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/gfortran -O -fno-automatic -ffixed-line-length-132 -c -I. -I../../lib/ fks_Sij.f
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/bin/g++ -O   -c -I. fastjetfortran_madfks_core.cc
    In file included from fjcore.hh:476,
                     from fastjetfortran_madfks_core.cc:30:
    /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/include/c++/8.3.0/valarray:89:10: fatal error: /cvmfs/sft.cern.ch/lcg/releases/gcc/8.3.0-cebb0/x86_64-slc6/include/c++/8.3.0/bits/valarray_array.h: Input/output error
     #include <bits/valarray_array.h>
              ^~~~~~~~~~~~~~~~~~~~~~~
    compilation terminated.
    make: *** [fastjetfortran_madfks_core.o] Error 1
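
The fatal error at the very end is an Input/output error while reading a header from /cvmfs, which points to a CVMFS read problem on the client rather than a bug in the generated code. A quick health check could look like this (a sketch, assuming the standard cvmfs_config tool is installed):

    cvmfs_config probe sft.cern.ch     # verify the repository mounts and answers
    cvmfs_config stat -v sft.cern.ch   # show cache, proxy and revision details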
3) Questions and Answers : Windows : "Work done" and "Avg. work done" not updating (Message 47555)
Posted 17 days ago by computezrmle
Post:
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10815829

This computer reports only 7.9 GB RAM.
Even if VirtualBox is correctly installed, it will always suffer from overload when ATLAS runs on it, especially in a multicore setup.
CMS may also be too heavy.

Best would be to enable only SixTrack and Theory.
4) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47549)
Posted 20 days ago by computezrmle
Post:
I did a tail -f to see access.log in real time.
...
While uploading the screen scrolls slower than normal.

You may try to prefix the tail command with "stdbuf -oL " to get line-buffered output.
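
For example, assuming the usual log location (line buffering matters most when the output is piped further; on a plain terminal tail is already line buffered):

    stdbuf -oL tail -f /var/log/squid/access.log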



At the end of an ATLAS WU there are a lot of GET requests to atlascern-frontier.openhtc.io.

They also appear at the beginning of each ATLAS task.
This huge number of requests is among the reasons why a local proxy is recommended.
Unlike CVMFS, Frontier does not have its own local cache.



I hate software that takes full CPU doing about nothing.

Well, it processes the data transfer.
That's far from doing nothing.
:-)

Without a Squid those CPU seconds would be included in other processes' accounting, e.g. BOINC, Vbox or ATLAS itself.

Large downloads are also affected.
Even the internet router's monitoring shows a significant increase in CPU load.
It's mostly not at 100%, but I would guess this is caused by the sampling interval plus the built-in averaging method.
5) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47535)
Posted 25 days ago by computezrmle
Post:
The main problem is, this slows down all other requests to Squid.

How do you monitor this slowdown?
I also notice a 100% load on one core while an upload is in progress, but other transfers seem to be unaffected.

This is a typical ATLAS upload from my access.log:
[14/Nov/2022:02:57:46 +0100] "POST http://lhcathome-upload.cern.ch/lhcathome_cgi/file_upload_handler HTTP/1.1" 200 186172617 "-" "BOINC client (x86_64-suse-linux-gnu 7.21.0)" TCP_MISS:HIER_DIRECT

size: 186 MB
typical upload speed: 4 MB/s
time used: ~45 s

Between the timestamps 02:57:00 (estimated start) and 02:57:46 (logged end) my log lists roughly 60 other transfers - various clients, various servers, various methods.
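
A way to measure this more precisely: Squid's logformat can include %tr, the response time in milliseconds, so each log line carries its own duration instead of an estimated start. A sketch using standard logformat codes (the layout itself is illustrative, not the stock configuration):

    logformat withtime %tl "%rm %ru HTTP/%rv" %>Hs %<st %tr "%{User-Agent}>h" %Ss:%Sh
    access_log /var/log/squid/access.log withtime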

Guess some kind of misconfiguration

As for Squid, I'm not aware of an option that could be responsible for this.
6) Message boards : Number crunching : Current work - run times (Message 47534)
Posted 25 days ago by computezrmle
Post:
Some links that may give you an impression of how complex the interaction of BOINC client, BOINC Manager and VirtualBox can be and how long it takes between the first comment and a solution.
https://github.com/BOINC/boinc/issues/3105
https://github.com/BOINC/boinc/issues/3355
7) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47530)
Posted 26 days ago by computezrmle
Post:
No matter what the real name is.
Feel free to add an alias to your hosts file (or your name resolution service) like:
10.116.178.201 proxy.home.arpa


Then tell BOINC to use the proxy "proxy.home.arpa" and finally that name will appear in the logs.

"home.arpa" is a standard domain for public use similar to the standard IP ranges (10.a.b.c, 192.168...).
It can be used by everyone and guarantees there won't be conflicts with existing domains.
Don't us domains like "example.com" for this.
8) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47529)
Posted 26 days ago by computezrmle
Post:
...you have same troubles.

What trouble?


I restarted my squid box last night after a kernel update.
Hence, the UP Time (24673 s) from the statistics page was a bit too short for a "good quality" answer.
It will be more relevant after a week or so.

Nonetheless, the average (total) CPU usage of 8.49% needs to be divided by the number of concurrently running tasks attached to this Squid.
If you assume (based on your observation) that this is mainly caused by uploads, my numbers would be:
25 ATLAS
44 CMS (I force them through Squid just to have the requests in the statistics tool)

Total: 69
Now divide 8.49% by 69 and you get 0.12%.
This is the average single-core CPU usage Squid spends per task.

Peak performance for uploads only matters if Squid can't saturate the upload direction of the internet line any more. I have never noticed this.
9) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47526)
Posted 26 days ago by computezrmle
Post:
Within the Squid package there should be a "cachemgr.cgi".
If you run your own webserver, configure it to use that cgi.

See:
https://wiki.squid-cache.org/Features/CacheManager


Without a webserver "squidclient" can be used to get the same information.
squidclient mgr:info

The output from this example includes a statistics section like this:
Resource usage for squid:
        UP Time:        24673.527 seconds
        CPU Time:       2094.236 seconds
        CPU Usage:      8.49%
        CPU Usage, 5 minute avg:        0.35%
        CPU Usage, 60 minute avg:       9.49%
        Maximum Resident Size: 834032 KB
        Page faults with physical i/o: 0

On a multi-CPU computer, 'CPU Usage = 100%' would mean one full core, since Squid is a single-core application.
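
If squidclient is not installed, recent Squid versions expose the same report through the cache manager URL (a sketch, assuming the default port 3128 and manager access allowed from localhost):

    curl -s http://127.0.0.1:3128/squid-internal-mgr/info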
10) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47525)
Posted 26 days ago by computezrmle
Post:
Those entries are useful to identify a misconfigured proxy setting.
It does no harm to have them in the log.


This topic has been discussed many times, even in this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5474&postid=44230
11) Message boards : Number crunching : Current work - run times (Message 47523)
Posted 26 days ago by computezrmle
Post:
Looks like you know exactly why a login doesn't work on your system.
You also describe what would be necessary to solve it.
It's your system, just do it.

If you prefer another method that would require changes to the boinc service or elsewhere, just do it.
12) Message boards : Number crunching : Current work - run times (Message 47520)
Posted 27 days ago by computezrmle
Post:
One of a couple of methods.
This one works without Vbox extensions being installed.


From your personal account run
su boinc (then enter the boinc user's password)

then
cd ~; VirtualBox >/dev/null 2>&1

Click on the VM you want to examine, then on "Show".
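
An alternative that avoids entering the boinc user's password (a sketch, assuming sudo rights and that your X server accepts local connections from that user, e.g. after "xhost +SI:localuser:boinc"):

    sudo -u boinc -H VirtualBox >/dev/null 2>&1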
13) Message boards : Theory Application : Extreme Overload caused by a Theory Task (Message 47517)
Posted 28 days ago by computezrmle
Post:
It turns out MadGraph tasks are more common than I realized.

Theory's backend is mcplots.
We are currently processing "runs" from mcplots revision 2390, which has a total of 70981 rows.
192 of them are "madgraph", which is 0.27%.

Here's a snippet from the complete list:

run                                                              events   attempts success failure unknown
pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.4.3.atlas lo      7500000     83       75      3       5
pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.4.3.atlas lo1jet  7000000     84       70      6       8
pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.4.3.atlas lo2jet  6932000     83       70      3      10
pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.5.5.atlas lo      7600000     83       76      3       4



That last one was an odd one in that it prevented anything else from running. Others haven't done that.

No idea why.


They do use more than one CPU thread...

Yes.
And because of this their CPU time is greater than their run time.


and they're long-runners, taking over 24 hrs. to complete.

Not really long.
There are Theory tasks (very few) that run for a couple of days, sometimes a week.
The BOINC server has no influence on the mcplots backend queue.
It has to send out whatever it gets from there: short tasks, long tasks, sherpas, madgraphs ...
14) Message boards : Number crunching : VirtualBox -~-~ 7.0.2 ~-~- (released October 20 2022) (Message 47514)
Posted 9 Nov 2022 by computezrmle
Post:
What makes me wonder are these log file lines pointing to a BOINC client crash during the first run of that task:
10:31:20 (5956): BOINC client no longer exists - exiting
10:31:20 (5956): timer handler: client dead, exiting

'client dead' means the BOINC client.
The 2nd run may have crashed due to trash left behind by the 1st run.
15) Message boards : CMS Application : CMS application can't start computing. (Message 47510)
Posted 8 Nov 2022 by computezrmle
Post:
https://cms-frontier.openhtc.io/

Frontier never makes HTTPS requests.
Instead it uses HTTP to benefit from caching.
Is it a typo or did you patch something?

Does it affect all of your tasks or just a few?

Does it affect all of your computers or just a few?
16) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47506)
Posted 7 Nov 2022 by computezrmle
Post:
A recent example related to the dev project's vdi files.

This morning I downloaded CMS_2022_09_07.vdi.gz (1.6 GB) from the dev server to test a new app version.
Since the same *.gz file had also been used for the previous app version a few days ago, Squid still had a fresh copy in its cache.
As a result the BOINC client completed the download within 16 seconds.

Mo 07 Nov 2022 09:10:35 CET | lhcathome-dev | Started download of CMS_2022_09_07.vdi
Mo 07 Nov 2022 09:10:51 CET | lhcathome-dev | Finished download of CMS_2022_09_07.vdi

[07/Nov/2022:09:10:50 +0100] "GET http://lhcathome-test.cern.ch/lhcathome-dev/download/CMS_2022_09_07.vdi.gz HTTP/1.1" 200 1607884608 "-" "BOINC client (x86_64-suse-linux-gnu 7.21.0)" TCP_REFRESH_UNMODIFIED:HIER_DIRECT
17) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47504)
Posted 6 Nov 2022 by computezrmle
Post:
It depends on the scenario the Squid is part of.

In your case you allow (BOINC-)clients to bypass the proxy.
This works fine as long as your firewall policy allows direct HTTP traffic between the clients and external servers.

Other scenarios may configure Squid as part of the firewall and force all HTTP traffic through it.
Clients configured not to use a proxy may not even notice the redirection.
Here, Squid must reliably handle the traffic.


Another point is that the project's vdi files are distributed via 'lhcathome-upload.cern.ch' (although they are downloads).
The suggested squid.conf allows those large files to be stored in the cache for repeated reuse.
This does not work if you bypass Squid.
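
For reference, the squid.conf knobs that decide whether such large files fit into the cache look like this (values illustrative, not the project's recommended settings):

    maximum_object_size 4 GB
    cache_dir ufs /var/cache/squid 30000 16 256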
18) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47502)
Posted 6 Nov 2022 by computezrmle
Post:
... what was different with prior Squids that they didn't need this flag?

It might be related to this:
https://bugs.squid-cache.org/show_bug.cgi?id=5214

Setting "client_request_buffer_max_size" should be seen as a workaround.
Just think about what would happen if a project tries to upload a file that is larger than the configured buffer.
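
For reference, the workaround is a single directive in squid.conf; the value is illustrative and would have to exceed the largest expected upload:

    client_request_buffer_max_size 512 MB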
19) Message boards : CMS Application : New Version 70.00 (Message 47492)
Posted 4 Nov 2022 by computezrmle
Post:
The x509 issue has been solved today.
It is OK to run CMS.
20) Message boards : Theory Application : Extreme Overload caused by a Theory Task (Message 47487)
Posted 4 Nov 2022 by computezrmle
Post:
As far as I understood, the complaint was that madgraph doesn't respect the 2-core limit any more.
The output shows that it respects the limit.

You may compare your BOINC RAM limits with the RAM usage BOINC reports for the madgraph task.
Those tasks are known to require lots of RAM.
They may also run a couple of days.

