Questions and Answers : Unix/Linux : I got a lot of errors. Is my system correctly set up?

Alexandre_Phan

Joined: 19 Sep 20
Posts: 5
Credit: 72,745
RAC: 816
Message 49624 - Posted: 23 Feb 2024, 20:26:27 UTC
Last modified: 23 Feb 2024, 20:28:17 UTC

Hi,
I recently installed BOINC on a Linux desktop. I installed CVMFS and all the probes come back as OK; I also installed VirtualBox via boinc-virtualbox. But I still get a lot of errors: out of 104 tasks, 28 are valid, 20 invalid and 46 errors (some of them came from the faulty Theory tasks).
LHC@Home is the only project that gives me that many errors. Did I do something wrong?
I don't know if it helps, but sometimes when starting the computer Ubuntu gives me a message that cvmfs2 stopped working.

Thanks for any information about what could be wrong with my setup.
ID: 49624
maeax

Joined: 2 May 07
Posts: 2125
Credit: 159,933,138
RAC: 41,632
Message 49628 - Posted: 24 Feb 2024, 5:39:26 UTC - in response to Message 49624.  

Do you have WiFi or LAN for your network?
LAN is better.
ID: 49628
Alexandre_Phan

Joined: 19 Sep 20
Posts: 5
Credit: 72,745
RAC: 816
Message 49630 - Posted: 24 Feb 2024, 8:30:40 UTC - in response to Message 49628.  

Currently I only have access to WiFi. I should be able to use LAN within the next 2 months.
I also have a VPN that is always active. Maybe it can also create problems?
ID: 49630
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2433
Credit: 228,041,616
RAC: 124,822
Message 49633 - Posted: 24 Feb 2024, 9:13:47 UTC - in response to Message 49630.  

I also have a VPN that is always active. Maybe it can also create problems?

Yes.
ATLAS/CMS/Theory download thousands of files per task, mostly small ones but sometimes huge ones.
The project does everything it can to distribute those files as efficiently as possible via servers/proxies as close to your home location as possible.

If you use a VPN (or, much worse, a network like Tor), those files are forced through detours and bottlenecks.
This makes all the efficiency efforts on the project's side useless.
ID: 49633
Alexandre_Phan

Joined: 19 Sep 20
Posts: 5
Credit: 72,745
RAC: 816
Message 49635 - Posted: 24 Feb 2024, 10:43:33 UTC - in response to Message 49633.  

OK, thank you for the answer. I have deactivated the VPN and will let my computer crunch for a week. I will come back if the number of errors is still too high.
ID: 49635
Alexandre_Phan

Joined: 19 Sep 20
Posts: 5
Credit: 72,745
RAC: 816
Message 49642 - Posted: 24 Feb 2024, 22:35:23 UTC - in response to Message 49633.  

I switched off the VPN but I still have a lot of "Validate error" results, and they all contain warnings about missing dbData and dbTime keys.

/.../

running command: /cvmfs/atlas.cern.ch/repo/containers/sw/apptainer/x86_64-el7/current/bin/apptainer exec -B /cvmfs,/var/lib/boinc-client/slots/3 /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 sh start_atlas.sh
[2024-02-24 20:49:24] *** The last 200 lines of the pilot log: ***
[2024-02-24 20:49:24] 2024-02-24 19:49:08,324 | INFO | using path: /var/lib/boinc-client/slots/3/PanDA_Pilot-6120516034/memory_monitor_summary.json (trf name=prmon)
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | INFO | extracted standard info from prmon json
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | INFO | extracted standard memory fields from prmon json
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | WARNING | format EVNTtoHITS has no such key: dbData
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | WARNING | format EVNTtoHITS has no such key: dbTime
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | WARNING | found no stored workdir sizes
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | INFO | will not add max space = 0 B to job metrics
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | WARNING | wrong length of table data, x=[1708804001.0], y=[1432.0] (must be same and length>=4)
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | INFO | ..............................
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . Timing measurements:
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . get job = 0 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . initial setup = 1 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . payload setup = 2 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . stage-in = 0 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . payload execution = 25 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . stage-out = 0 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . log creation = 0 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | ..............................
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | building log extracts (sent to the server as 'pilotLog')
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | executing command: tail -n 20 /var/lib/boinc-client/slots/3/PanDA_Pilot-6120516034/pilotlog.txt
[2024-02-24 20:49:24] 2024-02-24 19:49:08,333 | WARNING | detected the following tail of warning/fatal messages in the pilot log:
[2024-02-24 20:49:24] - Log from pilotlog.txt -
[2024-02-24 20:49:24] 2024-02-24 19:49:08,324 | INFO | using path: /var/lib/boinc-client/slots/3/PanDA_Pilot-6120516034/memory_monitor_summary.json (trf name=prmon)
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | INFO | extracted standard info from prmon json
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | INFO | extracted standard memory fields from prmon json
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | WARNING | format EVNTtoHITS has no such key: dbData
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | WARNING | format EVNTtoHITS has no such key: dbTime
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | WARNING | found no stored workdir sizes
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | INFO | will not add max space = 0 B to job metrics
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | WARNING | wrong length of table data, x=[1708804001.0], y=[1432.0] (must be same and length>=4)
[2024-02-24 20:49:24] 2024-02-24 19:49:08,325 | INFO | ..............................
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . Timing measurements:
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . get job = 0 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . initial setup = 1 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . payload setup = 2 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . stage-in = 0 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . payload execution = 25 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . stage-out = 0 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | . log creation = 0 s
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | ..............................
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | building log extracts (sent to the server as 'pilotLog')
[2024-02-24 20:49:24] 2024-02-24 19:49:08,326 | INFO | executing command: tail -n 20 /var/lib/boinc-client/slots/3/PanDA_Pilot-6120516034/pilotlog.txt

/..../


[2024-02-24 20:49:24] 2024-02-24 19:49:08,333 | WARNING |
[2024-02-24 20:49:24] [begin log extracts]
[2024-02-24 20:49:24] - Log from pilotlog.txt -
[2024-02-24 20:49:24] *** Error codes and diagnostics ***
[2024-02-24 20:49:24] "exeErrorCode": 65,
[2024-02-24 20:49:24] "exeErrorDiag": "Non-zero return code from EVNTtoHITS (8); Logfile error in log.EVNTtoHITS: \"IOError: [Errno 2] No such file or directory: 'PDGTABLE.MeV'\"",
[2024-02-24 20:49:24] "pilotErrorCode": 1305,
[2024-02-24 20:49:24] "pilotErrorDiag": "Failed to execute payload:PyJobTransforms.transform.execute 2024-02-24 20:47:01,429 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (8); Logfile error in log.EVNTtoHITS: \"IOError: [Errno 2] No such file or directory: 'PDGTABLE.MeV'\"",
[2024-02-24 20:49:24] *** Listing of results directory ***


/.../
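
For anyone reading a log like the one above, the lines under "Error codes and diagnostics" are usually the quickest to check. The following is only a minimal sketch of pulling those fields out of a pasted log. The function name `extract_error_fields` is made up for this illustration and is not part of BOINC or the pilot; the field names are simply the ones visible in the excerpt.

```python
import json
import re

# Field names are the ones that appear in the log excerpt above.
FIELD_RE = re.compile(
    r'"(exeErrorCode|exeErrorDiag|pilotErrorCode|pilotErrorDiag)":\s*(.+?),?\s*$'
)


def extract_error_fields(log_text):
    """Collect the error code and diagnostic fields found in a pilot log."""
    fields = {}
    for line in log_text.splitlines():
        match = FIELD_RE.search(line)
        if match is None:
            continue
        key, raw_value = match.groups()
        try:
            # Values are JSON fragments: a number or a quoted string.
            fields[key] = json.loads(raw_value)
        except ValueError:
            # Keep the raw text if the fragment is truncated or malformed.
            fields[key] = raw_value
    return fields


# Three lines copied from the excerpt above (raw string keeps the backslashes).
SAMPLE_LOG = r'''
[2024-02-24 20:49:24] "exeErrorCode": 65,
[2024-02-24 20:49:24] "exeErrorDiag": "Non-zero return code from EVNTtoHITS (8); Logfile error in log.EVNTtoHITS: \"IOError: [Errno 2] No such file or directory: 'PDGTABLE.MeV'\"",
[2024-02-24 20:49:24] "pilotErrorCode": 1305,
'''

for key, value in extract_error_fields(SAMPLE_LOG).items():
    print(key, "=", value)
```

In this excerpt the diagnostic points at a missing file ('PDGTABLE.MeV') inside the payload, while the dbData/dbTime lines are only warnings.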
ID: 49642
Pascal

Joined: 13 May 20
Posts: 33
Credit: 1,162,931
RAC: 3,574
Message 49646 - Posted: 25 Feb 2024, 18:35:57 UTC - in response to Message 49642.  

Good evening,
Personally I work with VirtualBox because I can't manage to install CVMFS, and it is the same for me: I get nothing but errors on Theory Simulation.
So I stopped computing for Theory by unchecking it in my account.
I am on Linux Mint 21.3.
LHC@home is becoming like GPUGRID, that is to say, a mess!
ID: 49646
Jonathan

Joined: 25 Sep 17
Posts: 99
Credit: 3,303,203
RAC: 3,741
Message 49666 - Posted: 28 Feb 2024, 2:41:59 UTC - in response to Message 49642.  
Last modified: 28 Feb 2024, 2:42:16 UTC

Alexandre_Phan,
I took a look at a bunch of your errors, and the other computers those tasks were sent to also show errors. Work units that fail on several different computers usually have a problem themselves.

I don't think your computer is the problem. You can check a few more by selecting 'Work Unit' on the tasks page for your computer. This link is an example of one you completed.

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=220275461
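
The rule of thumb above (a work unit that fails on several different computers is probably bad itself) can be written down as a small check. This is only an illustrative sketch: the function name and the host labels are hypothetical, and the outcomes would have to be read by hand from the work unit page.

```python
def failure_looks_like_bad_work_unit(results_by_host, min_hosts=2):
    """results_by_host maps a host label to True (valid) or False (error/invalid)."""
    if len(results_by_host) < min_hosts:
        # A single host's failure cannot tell us whether the work unit is at fault.
        return False
    # Every host that tried the work unit failed: suspect the work unit.
    return not any(results_by_host.values())


# Hypothetical outcomes, not taken from a real work unit page.
print(failure_looks_like_bad_work_unit({"host_a": False, "host_b": False, "host_c": False}))
print(failure_looks_like_bad_work_unit({"host_a": False, "host_b": True}))
```

If another host did validate the same work unit, the local setup is the more likely suspect.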
ID: 49666
Pascal

Joined: 13 May 20
Posts: 33
Credit: 1,162,931
RAC: 3,574
Message 49667 - Posted: 28 Feb 2024, 10:14:30 UTC - in response to Message 49666.  

The problem is not the computers.
I had Theory Simulation computation errors on a Windows 10 PC and on a Linux Mint PC, so I stopped Theory Simulation.
It is the Theory Simulation work units that are rotten.
LHC@home is becoming like GPUGRID: we put computing power at their disposal and they do whatever they like with it.
If this continues I will stop these 2 projects, even though they are among the most interesting ones.
ID: 49667
Jonathan

Joined: 25 Sep 17
Posts: 99
Credit: 3,303,203
RAC: 3,741
Message 49668 - Posted: 29 Feb 2024, 17:24:10 UTC - in response to Message 49667.  

Question for native ATLAS users: how many cores or threads do you assign to an ATLAS task? I noticed Alexandre_Phan's computer is an AMD Ryzen 7 5800X with 8 cores, 16 threads. The successfully completed work units show 'running run_atlas (--nthreads 12)'.

What do you native ATLAS users assign for those, and how do you control that setting? The user has native ATLAS and Theory tasks that have completed. I only use the VBox version of ATLAS, with '--nthreads 4' in my app_config.xml to control the processor cores per work unit. I also have Max # CPUs set to 4 in my LHC@home project preferences on this website.

Alexandre_Phan, do you have LHC project preferences set? You might try setting Max # CPUs to 4 to limit the cores per ATLAS work unit, and/or temporarily unselecting ATLAS or Theory, to see whether each one runs successfully on its own and whether mixing the two work types is the issue.
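
For reference, an app_config.xml that passes '--nthreads 4' could look roughly like what the sketch below prints. The element names (app_version, app_name, plan_class, avg_ncpus, cmdline) follow the BOINC client configuration format, but the app name "ATLAS" and the plan class "vbox64_mt_mcore_atlas" are assumptions here, and the helper `build_app_config` is made up for this illustration; check the names your own client reports before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, ncpus):
    """Return app_config.xml text limiting one app version to ncpus threads."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # avg_ncpus tells the BOINC scheduler how many cores the task occupies.
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # cmdline is passed to the application; --nthreads is the option quoted above.
    ET.SubElement(version, "cmdline").text = "--nthreads {}".format(ncpus)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or newer
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "ATLAS" and "vbox64_mt_mcore_atlas" are assumed names, not confirmed in this thread.
print(build_app_config("ATLAS", "vbox64_mt_mcore_atlas", 4))
```

The printed text would go into a file named app_config.xml in the project's folder under the BOINC data directory, after which the client has to re-read its config files or be restarted.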
ID: 49668
Alexandre_Phan

Joined: 19 Sep 20
Posts: 5
Credit: 72,745
RAC: 816
Message 49669 - Posted: 29 Feb 2024, 18:56:03 UTC - in response to Message 49668.  

I had the max CPUs set to no limit; I have just set the limit to 4 CPUs to see if that works.
From time to time I do get WUs that work for LHC@Home. But lots of them fail, and not only for me.
ID: 49669
Jonathan

Joined: 25 Sep 17
Posts: 99
Credit: 3,303,203
RAC: 3,741
Message 49670 - Posted: 29 Feb 2024, 19:53:01 UTC - in response to Message 49669.  

You can also try posting in the Theory and ATLAS forum sections with a link to this original thread. You might get more people commenting on whether you have an issue or not, or some more feedback. It helps to keep the conversation in the original thread, which makes it easier to follow what has been suggested and tried.
ID: 49670



©2024 CERN