Message boards : ATLAS application : Error with 2 CPUs

PhilTheNet
Joined: 21 Sep 14
Posts: 25
Credit: 723,818
RAC: 0
Message 46354 - Posted: 25 Feb 2022, 6:07:58 UTC

Hi there,

On two different machines, ATLAS tasks launched with 2 CPUs fail after about 20 minutes.
See: https://lhcathome.cern.ch/lhcathome/result.php?resultid=345659467

On the same machines, ATLAS tasks run with 1 CPU do not fail.

Is there a known problem or a specific setting?

Thanks
ID: 46354
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,949,569
RAC: 124,837
Message 46355 - Posted: 25 Feb 2022, 7:17:15 UTC - in response to Message 46354.  
Last modified: 25 Feb 2022, 7:23:18 UTC

Your finished task produced no HITS file:
[2022-02-24 18:01:35] No HITS result produced
Errors are hard to track down on WSL2 at the moment.
Ubuntu 20.04.4 LTS [5.10.60.1-microsoft-standard-WSL2|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.2)]

Guide to Getting Quickly Started Running Native ATLAS (Ubuntu 20.04 on WSL2)
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5782
ID: 46355
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,574,389
RAC: 120,862
Message 46356 - Posted: 25 Feb 2022, 7:24:21 UTC - in response to Message 46354.  

https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10771719
Looks like this Linux runs as a WSL guest on a Windows host.

AFAIK ATLAS native on WSL doesn't succeed if it is configured to use more than 1 core.
See:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5777
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5782
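
A quick way to check from inside the guest whether it is running under WSL is to look at the kernel release string, as in the host details quoted earlier in this thread (5.10.60.1-microsoft-standard-WSL2). The sketch below is only a heuristic: it assumes the release string contains "microsoft", and the helper names `is_wsl` and `max_atlas_cores` are made up for illustration and are not part of BOINC or ATLAS.

```python
import platform


def is_wsl(kernel_release: str) -> bool:
    """Heuristic: WSL kernels carry 'microsoft' in their release string."""
    return "microsoft" in kernel_release.lower()


def max_atlas_cores(kernel_release: str, requested: int) -> int:
    """Cap native ATLAS at 1 core on WSL, as reported in this thread.

    On any other kernel the requested core count is returned unchanged.
    """
    return 1 if is_wsl(kernel_release) else requested


if __name__ == "__main__":
    release = platform.release()  # kernel release of the running system
    print(release, "-> WSL detected:", is_wsl(release))
```

This only tells you which core count to configure; the setting itself still has to be made in the project preferences or the client configuration.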



Besides that, be aware that ATLAS does not support snapshots.
Hence, even the task marked as valid didn't produce any usable result:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=345564531
The log shows 9 restarts:
This job has been restarted, cleaning up previous attempt

and finally:
No HITS result produced
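
If you save a task's log to a text file, both symptoms can be checked mechanically. This is a minimal sketch that only searches for the two messages quoted above; the function name `summarize_task_log` is hypothetical, and the sample text is a made-up stand-in, not a real ATLAS log.

```python
RESTART_MARKER = "This job has been restarted, cleaning up previous attempt"
NO_HITS_MARKER = "No HITS result produced"


def summarize_task_log(log_text: str) -> dict:
    """Count restart messages and report whether the 'No HITS' line appears."""
    lines = log_text.splitlines()
    restarts = sum(1 for line in lines if RESTART_MARKER in line)
    no_hits = any(NO_HITS_MARKER in line for line in lines)
    return {"restarts": restarts, "no_hits": no_hits}


if __name__ == "__main__":
    # Made-up sample: two restarts followed by the failure message.
    sample = "\n".join([RESTART_MARKER, RESTART_MARKER, NO_HITS_MARKER])
    print(summarize_task_log(sample))
```

A restart count above zero together with `no_hits` being true matches the pattern described above: the task was interrupted and never produced a result.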
ID: 46356
PhilTheNet
Joined: 21 Sep 14
Posts: 25
Credit: 723,818
RAC: 0
Message 46357 - Posted: 25 Feb 2022, 7:37:04 UTC - in response to Message 46356.  

I don't understand why the log contains "This job has been restarted, cleaning up previous attempt". The computer has not been shut down.

Thank you for the answers
ID: 46357
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,574,389
RAC: 120,862
Message 46358 - Posted: 25 Feb 2022, 7:48:14 UTC - in response to Message 46357.  

This happens when the BOINC client switches from ATLAS to another task and back.
You may check if "leave non-GPU apps in RAM when paused" from the preferences menu solves the issue.
If not, you may need a separate client that doesn't run any other project.
Even then, stopping the client or a shutdown/reboot will cause ATLAS to start from scratch.
ID: 46358
AndreyOR

Joined: 8 Dec 19
Posts: 37
Credit: 7,579,770
RAC: 2,548
Message 46359 - Posted: 25 Feb 2022, 7:54:00 UTC - in response to Message 46354.  

The reason is that you're running it on WSL2. WSL2 is not exactly the same as regular Linux (Ubuntu in your case) because it has a custom kernel and uses init.d, not systemd.

It's good to see others use WSL2 for BOINC projects, but it does have its quirks in LHC. One of them is that native ATLAS can only be run single-core in WSL2 and fails when you try to run it multi-core. I've tried to figure out a solution to that a few times in the past, but with no success so far.

Another quirk is that native Theory doesn't run on WSL2 without a modification (it's an easy one though), which took me a while to figure out. If you're thinking of running Theory, check out this post: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5777&postid=46031 (it should be at the bottom of the thread).

I like WSL2 because you can run Linux projects on Windows machines with minimal resources compared to regular virtual machines. I used to use Hyper-V before learning about WSL2 and now very rarely use Hyper-V.
ID: 46359
PhilTheNet
Joined: 21 Sep 14
Posts: 25
Credit: 723,818
RAC: 0
Message 46360 - Posted: 25 Feb 2022, 8:01:07 UTC - in response to Message 46358.  
Last modified: 25 Feb 2022, 8:04:34 UTC

<leave_apps_in_memory>1</leave_apps_in_memory> is already active
ID: 46360
PhilTheNet
Joined: 21 Sep 14
Posts: 25
Credit: 723,818
RAC: 0
Message 46361 - Posted: 25 Feb 2022, 8:04:12 UTC - in response to Message 46359.  

Thanks.

Note: I get the same errors with Ubuntu Server on VMware Fusion 12.
ID: 46361
PhilTheNet
Joined: 21 Sep 14
Posts: 25
Credit: 723,818
RAC: 0
Message 46363 - Posted: 25 Feb 2022, 8:05:45 UTC - in response to Message 46358.  

<leave_apps_in_memory>1</leave_apps_in_memory> is already active
ID: 46363
AndreyOR

Joined: 8 Dec 19
Posts: 37
Credit: 7,579,770
RAC: 2,548
Message 46364 - Posted: 25 Feb 2022, 8:20:10 UTC - in response to Message 46360.  

To prevent BOINC from switching between tasks, set the following line in global_prefs_override.xml to a high number. If the line is not there, just add it:
<cpu_scheduling_period_minutes>10080.000000</cpu_scheduling_period_minutes>

I have mine set to 10080 minutes (1 week), and I use the same setting on my Windows and Linux BOINC setups. I don't see a good reason to switch between tasks mid-task; just let a task finish before moving to the next one. For a project like ATLAS, and maybe Theory, this setting is necessary.
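
If you'd rather not edit the file by hand, the "change it, or add it if missing" step can be scripted. This is a rough sketch, assuming the file's root element is `<global_preferences>` and that the file sits in your BOINC data directory (the path in the comment is an assumption; it differs between installs). The function name `set_scheduling_period` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def set_scheduling_period(path, minutes=10080.0):
    """Set <cpu_scheduling_period_minutes> in a global_prefs_override.xml file.

    Updates the element if it exists, adds it if it does not, and creates
    a new file with a <global_preferences> root if the file is missing.
    Other settings already in the file are left untouched.
    """
    path = Path(path)
    if path.exists():
        tree = ET.parse(str(path))
        root = tree.getroot()
    else:
        root = ET.Element("global_preferences")
        tree = ET.ElementTree(root)

    elem = root.find("cpu_scheduling_period_minutes")
    if elem is None:
        elem = ET.SubElement(root, "cpu_scheduling_period_minutes")
    elem.text = f"{minutes:.6f}"  # same format as the line above

    tree.write(str(path), encoding="unicode")


# Example call (path is an assumption, adjust to your BOINC data directory):
# set_scheduling_period("/var/lib/boinc-client/global_prefs_override.xml")
```

The client has to re-read the file before the change applies, for example with `boinccmd --read_global_prefs_override` or by restarting the client.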
ID: 46364
PhilTheNet
Joined: 21 Sep 14
Posts: 25
Credit: 723,818
RAC: 0
Message 46365 - Posted: 25 Feb 2022, 8:36:52 UTC - in response to Message 46364.  

OK, thanks.
ID: 46365



©2024 CERN