Message boards : Theory Application : Problem of the day

Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,861,807
RAC: 219
Message 46767 - Posted: 11 May 2022, 21:21:06 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=354189019
===> [runRivet] Wed May 11 07:14:58 UTC 2022 [boinc pp mb-inelastic 2760 - - sherpa 1.2.3 default 100000 254]

It ran OK until 22,400 events, then did nothing more for 12+ hours until I killed it. It used a full core, but nothing was actually being done.
I had another Sherpa 1.x.x task a week or so ago that behaved similarly.
I thought the early Sherpa versions were deprecated ages ago?

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 46885 - Posted: 14 Jun 2022, 7:51:36 UTC
Last modified: 14 Jun 2022, 8:32:47 UTC

Today, running Theory for the first time on a Threadripper under Win11pro, I saw a download that took hours!
Running Atlas AND Theory together stopped new Theory tasks because of a task limit of 8, even though my LHC@home preferences are set to No limit.
I deleted my app_config for Atlas. Now Theory tasks download fine.
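The app_config.xml mechanism discussed here can be sketched as below, assuming the file sits in the LHC@home project folder of the BOINC data directory. Each `<max_concurrent>` inside an `<app>` block is meant to cap how many tasks of that one application run at once; a `<project_max_concurrent>` element would instead count all LHC@home tasks together. The short app name `ATLAS` and both values are hypothetical placeholders, not the settings from this post.

```xml
<app_config>
    <!-- hypothetical: "ATLAS" is an assumed short app name; caps running Atlas tasks only -->
    <app>
        <name>ATLAS</name>
        <max_concurrent>4</max_concurrent>
    </app>
    <!-- hypothetical value: caps running Theory tasks only -->
    <app>
        <name>Theory</name>
        <max_concurrent>8</max_concurrent>
    </app>
</app_config>
```

After editing the file, the client has to re-read its config files (or be restarted) before the new limits apply.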

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 46923 - Posted: 27 Jun 2022, 4:58:42 UTC - in response to Message 46885.  
Last modified: 27 Jun 2022, 5:00:08 UTC

A 17-byte output.tgz is not deleted from the Theory slot folder after a Theory task finishes.
BOINC is 7.16.11.
I am running Atlas and Theory natively in a CentOS 8 VM.
Atlas has no problems: its slot folder is empty after the task closes.
Once 400 slots are reached, no more tasks run, neither Atlas nor Theory.

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 46943 - Posted: 27 Jun 2022, 17:58:06 UTC - in response to Message 46923.  
Last modified: 27 Jun 2022, 17:59:03 UTC

<app_config>
    <app>
        <name>Theory</name>
        <max_concurrent>40</max_concurrent>
    </app>
</app_config>
On Win11pro I only get 20 Theory tasks.

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 46955 - Posted: 28 Jun 2022, 9:07:37 UTC - in response to Message 46943.  
Last modified: 28 Jun 2022, 9:14:30 UTC

I'm seeing a 2 GB .vdi for Theory? (It is 781 MB for most Theory tasks.)

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 46956 - Posted: 28 Jun 2022, 9:28:41 UTC - in response to Message 46955.  

Setting environment...
grep: /etc/redhat-release: No such file or directory
I see this in all Theory tasks on Win11pro and Win10pro.

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 48474 - Posted: 26 Aug 2023, 8:53:17 UTC

Columns: Task, Work unit, Sent, Time reported or deadline, Status, Run time (sec), CPU time (sec), Credit, Application
396590990 10797673 23 Jul 2023, 4:08:11 UTC 24 Jul 2023, 5:20:31 UTC Error while computing 31,383.27 56.78 --- Theory Simulation v300.07 (vbox64_theory) windows_x86_64
396640378 10766984 24 Jul 2023, 7:53:29 UTC 4 Aug 2023, 16:01:35 UTC Error while computing 979,582.08 976,997.30 --- Theory Simulation v300.07 (vbox64_theory) windows_x86_64
397097333 10815119 4 Aug 2023, 10:33:23 UTC 18 Aug 2023, 7:07:57 UTC Aborted 1,191,012.29 1,184,630.00 --- Theory Simulation v300.07 (native_theory) x86_64-pc-linux-gnu
397672641 10797673 13 Aug 2023, 0:47:01 UTC 14 Aug 2023, 8:09:40 UTC Aborted 82,726.89 81,947.66 --- Theory Simulation v300.07 (vbox64_theory) windows_x86_64
397724905 9995505 14 Aug 2023, 10:24:48 UTC 24 Aug 2023, 12:59:12 UTC Error while computing 864,760.23 853,816.60 --- Theory Simulation v300.07 (vbox64_theory) windows_x86_64
398041850 10812982 24 Aug 2023, 17:29:15 UTC 4 Sep 2023, 17:29:15 UTC In progress --- --- --- Theory Simulation v300.07 (vbox64_theory) x86_64-pc-linux-gnu

These Theory tasks need some checking by CERN IT.

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 48545 - Posted: 14 Sep 2023, 16:42:57 UTC

2023-09-14 08:42:50 (54888): Guest Log: 2.5.2.0 4077 3 28120 27003 3 1 244127 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/sft.cern.ch http://10.116.178.201:3128 1
2023-09-14 08:42:50 (54888): Guest Log: Probing /cvmfs/grid.cern.ch... Failed!
2023-09-14 08:42:50 (54888): Guest Log: 08:42:48 CEST +02:00 2023-09-14: cranky: [ERROR] 'cvmfs_config probe grid.cern.ch' failed.
80 hours for nothing! 8 tasks at 10 hours each.

Harri Liljeroos
Joined: 28 Sep 04
Posts: 690
Credit: 45,110,504
RAC: 25,241
Message 48927 - Posted: 16 Nov 2023, 9:07:43 UTC

Several tasks have failed lately with "Probing /cvmfs/... Failed!". Unfortunately, the task doesn't detect this and keeps running, although nothing happens in the VM anymore.

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 48928 - Posted: 16 Nov 2023, 9:46:04 UTC - in response to Message 48927.  
Last modified: 16 Nov 2023, 9:48:11 UTC

I saw such a faulty task yesterday as well: a bootstrap problem.
Computer ID 10795955
Run time 15 hours 4 min 20 sec
CPU time 2 min 5 sec

Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1143
Credit: 50,090,615
RAC: 3,327
Message 48932 - Posted: 18 Nov 2023, 1:06:17 UTC

I don't run as many at the same time as before, but I like these tasks because we can check the log and see that they are actually running, so long run times are fine as long as they are really working.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=401733280

Funny: Theory works fine on the same host that suddenly had problems with CMS last week.
For that CMS problem, a clean install of the newest VirtualBox and the newest BOINC made no difference, so maybe the latest Windows update is to blame. I may check later whether the CMS vdi needs reinstalling.

Darrell
Joined: 8 Jul 08
Posts: 20
Credit: 28,690,196
RAC: 29,841
Message 48939 - Posted: 19 Nov 2023, 4:33:28 UTC

I'm having a problem with WUs not being downloaded. The log says "11/19/2023 8:40:03 AM | LHC@home | Not requesting tasks: don't need (CPU: ; NVIDIA GPU: )", even though I set "<project_max_concurrent>18</project_max_concurrent>" in app_config.xml and use 100% for both the number of CPUs and CPU time. This is computer ASUS570 (ID 10837826), 32 GB, AMD 5950X. I only get a new task when the previous WU ends, so just one runs at a time. I only run Theory VBOX64 on this computer.

My other three computers are working fine and work is flowing. What else do I need to check?
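For reference, a complete app_config.xml that contains only the project-wide limit mentioned above could look like this (a sketch, assuming it is placed in the LHC@home project folder of the BOINC data directory). `<project_max_concurrent>` is an upper bound on how many LHC@home tasks run at the same time; it does not by itself make the client request more work.

```xml
<app_config>
    <!-- upper bound on LHC@home tasks running at the same time (value from the post) -->
    <project_max_concurrent>18</project_max_concurrent>
</app_config>
```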

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 48940 - Posted: 19 Nov 2023, 5:03:49 UTC - in response to Message 48939.  

Do you have different preference sets (home, school, ...)?

Darrell
Joined: 8 Jul 08
Posts: 20
Credit: 28,690,196
RAC: 29,841
Message 48942 - Posted: 19 Nov 2023, 9:51:46 UTC - in response to Message 48940.  

No. All are set the same: home, school, work, default. Something strange is going on today, though. Only a few tasks are actually making progress; the others seem stalled. Of the 17 active on my A3900X1, only 3 show more than a fraction of a percent of CPU time in my monitor, and those three are each using 100% of a CPU. Peculiar, but I have a meeting to run in a few minutes, so I can't pursue it right now. I haven't looked at the other computers yet.

maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,492,697
RAC: 82,704
Message 48943 - Posted: 19 Nov 2023, 11:14:04 UTC - in response to Message 48942.  

Your tasks are finishing now. Was there an ISP problem?

Darrell
Joined: 8 Jul 08
Posts: 20
Credit: 28,690,196
RAC: 29,841
Message 48944 - Posted: 19 Nov 2023, 12:46:31 UTC - in response to Message 48943.  

Perhaps you looked at one of my other computers.

401889030 217094866 19 Nov 2023, 6:11:52 UTC 30 Nov 2023, 6:11:52 UTC In progress --- --- --- Theory Simulation v300.07 (vbox64_theory) windows_x86_64

is the ONLY WU in progress on the computer where I can't get/keep more than 1 WU running. Please take another look, and I have had NO ISP problems. Thanks.

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1334
Credit: 8,834,586
RAC: 1,562
Message 48945 - Posted: 19 Nov 2023, 14:15:44 UTC - in response to Message 48944.  

is the ONLY WU in progress on the computer where I can't get/keep more than 1 WU running. Please take another look, and I have had NO ISP problems. Thanks.
Set Max # CPUs to No limit in your preferences.

Darrell
Joined: 8 Jul 08
Posts: 20
Credit: 28,690,196
RAC: 29,841
Message 48947 - Posted: 20 Nov 2023, 1:26:30 UTC - in response to Message 48945.  

ALL of my preferences have NO LIMIT for number of jobs and number of CPUs. My other computers are downloading and running 15-20 WUs each, so I know the preferences work.

I did a "diff" between this computer's app_config and another computer's. They were identical, except that the other (running) computer did NOT have the entry below:


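<app>
    <!-- Theory app block with no options set -->
    <name>Theory</name>
</app>
<app_version>
    <!-- schedule each vbox64_theory task as using 1.0 CPU -->
    <app_name>Theory</app_name>
    <plan_class>vbox64_theory</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
</app_version>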

After removing this, and having BOINC reload configs, I got 4 new tasks. Is this mis-coded in some way? I am curious!

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1334
Credit: 8,834,586
RAC: 1,562
Message 48949 - Posted: 20 Nov 2023, 9:03:21 UTC - in response to Message 48947.  

After removing this, and having BOINC reload configs, I got 4 new tasks. Is this mis-coded in some way? I am curious!
After I posted yesterday, I saw that the machine you mentioned sometimes had 2 tasks in progress for a very short period.
I also noticed that this machine has never run the BOINC benchmarks. You could try running them to improve the situation.
The server currently shows for it:

Measured floating point speed 1 billion ops/sec
Measured integer speed 1 billion ops/sec

Another machine of yours shows:

Measured floating point speed 4.6 billion ops/sec
Measured integer speed 18.89 billion ops/sec

Darrell
Joined: 8 Jul 08
Posts: 20
Credit: 28,690,196
RAC: 29,841
Message 48952 - Posted: 21 Nov 2023, 5:57:24 UTC - in response to Message 48949.  

I tried running the benchmarks, but it made no difference. Besides, another of my computers had not had the benchmarks run and it was processing just fine.

I did try another thing, though. I drained my (one) WU, cancelled a replacement that had started, and reset the project. Then I removed app_config, set the preferences to use only 50% of the CPUs, and restarted. This has worked so far, but I have NO idea why.

I plan to follow up on this approach to see what I can see.
©2024 CERN