Message boards : Theory Application : Problem of the day

Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1144
Credit: 50,109,480
RAC: 4,430
Message 32376 - Posted: 10 Sep 2017, 3:57:39 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=155279014

I got a couple of these tonight, along with others that do start up and run.

So is the Condor asleep again?
The run time was only 13 min 58 sec so it wasn't past the 20 minute wall.

(this is not an internet speed problem on my end)
Volunteer Mad Scientist For Life
ID: 32376
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 32396 - Posted: 12 Sep 2017, 13:14:31 UTC - in response to Message 32376.  

Yes, it could have been that the server was temporarily unavailable.
ID: 32396
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1144
Credit: 50,109,480
RAC: 4,430
Message 32696 - Posted: 8 Oct 2017, 15:49:01 UTC
Last modified: 8 Oct 2017, 15:49:44 UTC

For the last few days I have had this happen once in a while: a task errors out after 4 seconds when the other tasks are suspended.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=158796702

It will have a task running as this happens and starts tasks after this.

This is with new versions of BOINC and VirtualBox.
ID: 32696
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1144
Credit: 50,109,480
RAC: 4,430
Message 32769 - Posted: 10 Oct 2017, 22:19:46 UTC

https://lhcathome.cern.ch/lhcathome/results.php?userid=5472

Too many of these errors lately and there would be even more if I didn't have most of the new ones suspended.
ID: 32769
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1334
Credit: 8,834,586
RAC: 1,392
Message 32790 - Posted: 11 Oct 2017, 13:57:44 UTC

No or not enough Theory jobs available.

Some BOINC-tasks fail after about 13 minutes run time: EXIT_NO_SUB_TASKS
https://lhcathome.cern.ch/lhcathome/result.php?resultid=158871945

Or, if you're lucky and get a first job, there is a good chance you won't get a second one and the task ends early.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=158871885
ID: 32790
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1144
Credit: 50,109,480
RAC: 4,430
Message 32791 - Posted: 11 Oct 2017, 14:49:23 UTC

I am having better luck with the -dev multi-core Theory tasks.

I would switch to the Atlas tasks here except for those huge VDIs that take half a day to download on each host.

I would rather get some SixTrack tasks here now (and let me know when we have those, before they are all loaded on computers that don't do these VB tasks at all).
ID: 32791
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1334
Credit: 8,834,586
RAC: 1,392
Message 32796 - Posted: 11 Oct 2017, 15:48:15 UTC - in response to Message 32790.  

No or not enough Theory jobs available.

It looks like a temporary hiccup. Normal job delivery now.
ID: 32796
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1334
Credit: 8,834,586
RAC: 1,392
Message 32797 - Posted: 11 Oct 2017, 16:10:09 UTC - in response to Message 32796.  

No or not enough Theory jobs available.

It looks like a temporary hiccup. Normal job delivery now.

Sorry, false hope.

The Theory VMs are being killed one after another due to a lack of sub-jobs.
ID: 32797
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1144
Credit: 50,109,480
RAC: 4,430
Message 33213 - Posted: 6 Dec 2017, 0:31:40 UTC
Last modified: 6 Dec 2017, 0:32:52 UTC

(just thought I would bring this up since staring at 9 computers and trying to get 50 tasks running with VB can make a person......)

https://lhcathome.cern.ch/lhcathome/result.php?resultid=168740585

OK I just happened to go back to this pc to make sure the tasks were running.

The LHC-dev Theory was running with no problem but the Theory task had the good old "Could not connect to vccs1.cern.ch on port 443" after doing the *Testing VCCS connection to vccs1.cern.ch on port 443*

With the *nc: getaddrinfo: Name or service not known* between those two.

It did have no problem with Testing CVMFS connection to lhchomeproxy.cern.ch on port 3125
Connection to lhchomeproxy.cern.ch 3125 port [tcp/a13-an] succeeded!
: nc: connect to lhchomeproxy.cern.ch port 3125 (tcp) timed out: Operation now in progress
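The log lines above mix two different kinds of failure: a name lookup that fails (the *getaddrinfo* message) and a TCP connection that times out. A minimal Python sketch that tells them apart might look like the following. The function name `check_endpoint` and its return labels are made up for illustration; this is not what the VM's own bootstrap script runs.

```python
import socket


def check_endpoint(host, port, timeout=5.0):
    """Return 'dns_failure', 'timeout', 'connect_error', or 'ok' (hypothetical labels)."""
    try:
        # Name resolution. A failure here corresponds to what nc reports as
        # "getaddrinfo: Name or service not known".
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Try the first resolved address only, to keep the sketch short.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except socket.timeout:
        # The host resolved but did not answer within the timeout.
        return "timeout"
    except OSError:
        # Refused, unreachable, or another socket-level error.
        return "connect_error"
    return "ok"
```

Calling `check_endpoint("vccs1.cern.ch", 443)` and `check_endpoint("lhchomeproxy.cern.ch", 3125)` from the host would show whether the problem is DNS or the connection itself, with the caveat that the host and the VM may not use the same resolver.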


But as usual, the internet speed is not fast enough to make the simple connection to the CERN server and get the credentials so that it can then get to the HTCondor ping. So the second try is just a 25-minute computer error, and it is a good thing I always try to set the BOINC Manager to *no new tasks* so it doesn't keep doing that until it gets lucky and a task starts (no, a Squid proxy will not help me get credentials).

So along with the ones that do that *VM Completion Message: Condor exited after 774s without running a job.* after 36 minutes running......well it is always a good thing to NOT let them auto-run with BM set to get new tasks all the time.
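For anyone who prefers to script the *no new tasks* setting instead of clicking it in the BOINC Manager, here is a rough Python sketch built around `boinccmd --project URL nomorework`. The project URL below is an assumption taken from the result links in this thread, the helper names are made up, and the call only works if `boinccmd` is on the PATH and can reach a running client.

```python
import subprocess

# Assumption: project URL as it appears in the result links in this thread.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def no_new_tasks_command(project_url, allow=False):
    """Build the boinccmd argument list (hypothetical helper)."""
    # 'nomorework' stops new task requests; 'allowmorework' re-enables them.
    op = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, op]


def set_no_new_tasks(project_url, allow=False):
    """Run boinccmd; raises CalledProcessError if it exits non-zero."""
    subprocess.run(no_new_tasks_command(project_url, allow), check=True)
```

`set_no_new_tasks(PROJECT_URL)` would stop new work requests for the project, and `set_no_new_tasks(PROJECT_URL, allow=True)` would turn them back on.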

(of course if it was just running SixTrack tasks non-stop there would not be this problem.....but....)
Volunteer Mad Scientist For Life
ID: 33213
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1144
Credit: 50,109,480
RAC: 4,430
Message 33218 - Posted: 6 Dec 2017, 17:03:14 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=168746258

I just saw this one happen and wondered why it had all of a sudden finished after almost 7 hours, so I checked the stderr here and once again found "VM Heartbeat file specified, but missing heartbeat".
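Since the same handful of stderr messages keeps coming up in this thread, a small Python sketch for sorting saved stderr text by failure type could look like this. The patterns are copied from messages quoted in the posts above; the labels and the function name `classify_stderr` are made up for illustration, and the list is certainly not complete.

```python
import re

# Patterns taken from messages quoted in this thread; labels are hypothetical.
SIGNATURES = [
    ("no_jobs", re.compile(r"Condor exited after \d+s without running a job")),
    ("vccs_unreachable", re.compile(r"Could not connect to \S+ on port \d+")),
    ("dns_failure", re.compile(r"getaddrinfo: Name or service not known")),
    ("missing_heartbeat", re.compile(r"VM Heartbeat file specified, but missing heartbeat")),
    ("no_sub_tasks", re.compile(r"EXIT_NO_SUB_TASKS")),
]


def classify_stderr(text):
    """Return the labels of all known signatures found in the stderr text."""
    return [label for label, pattern in SIGNATURES if pattern.search(text)]
```

An empty list just means none of the known messages matched, not that the task was fine.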

Of course I am just pointing this out and know there is no actual reason or fix and I have been running these VB tasks daily 24/7 longer than anyone.

(and probably spend too many hours with a monitor in front of me)

.....just talking to myself.....no big deal and nothing new
Volunteer Mad Scientist For Life
ID: 33218
maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,921,302
RAC: 102,850
Message 46199 - Posted: 8 Feb 2022, 16:58:28 UTC

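Task: 343600482
Work unit: 182231829
Computer: 10548292
Sent: 8 Feb 2022, 3:39:48 UTC
Time reported: 8 Feb 2022, 4:05:42 UTC
Status: Completed, waiting for validation
Run time (sec): 1,379.91
CPU time (sec): 898.31
Credit: pending
Application: Theory Simulation v300.06 (vbox64_theory) windows_x86_64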
ID: 46199
Harri Liljeroos
Joined: 28 Sep 04
Posts: 690
Credit: 45,247,731
RAC: 31,691
Message 46217 - Posted: 10 Feb 2022, 18:50:02 UTC

Today we have been granted a lot more tasks for our hosts to download. For the past couple of years the number of tasks we could get was 8 per host, at least on my two computers with 4/8 and 8/16 CPU cores. The new limit seems to be 63 per host; maybe this value comes from my cache size (1 day + 0.1 days). Let's hope I can crunch all of them before the deadline.
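As a rough back-of-the-envelope check on whether a number like 63 could come from the cache size, the Python sketch below estimates how many single-core tasks fit into a work buffer. It assumes the buffer is simply CPUs times days, and the 7200 s runtime in the usage note is a hypothetical value; the real BOINC scheduler uses its own runtime estimates and limits, so treat this only as an approximation.

```python
import math

SECONDS_PER_DAY = 86400


def tasks_to_fill_buffer(cpus, buffer_days, est_runtime_s):
    """Estimate how many single-core tasks fill a work buffer (rough sketch)."""
    if est_runtime_s <= 0:
        raise ValueError("est_runtime_s must be positive")
    # Total CPU-seconds of work the client would want to hold.
    buffer_seconds = cpus * buffer_days * SECONDS_PER_DAY
    return math.ceil(buffer_seconds / est_runtime_s)
```

With 4 CPUs, a 1.1 day buffer and an assumed 7200 s per task, `tasks_to_fill_buffer(4, 1.1, 7200)` gives 53, which is at least in the same range as the 63 tasks reported above.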
ID: 46217
Harri Liljeroos
Joined: 28 Sep 04
Posts: 690
Credit: 45,247,731
RAC: 31,691
Message 46218 - Posted: 10 Feb 2022, 19:18:42 UTC - in response to Message 46217.  

Today we have been granted a lot more tasks for our hosts to download. For the past couple of years the number of tasks we could get was 8 per host, at least on my two computers with 4/8 and 8/16 CPU cores. The new limit seems to be 63 per host; maybe this value comes from my cache size (1 day + 0.1 days). Let's hope I can crunch all of them before the deadline.

This may have been a temporary setting, since on a more careful look it seems that I haven't downloaded any new Theory tasks since about 11 local time (9 UTC).
ID: 46218
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 817
Credit: 667,999,990
RAC: 180,765
Message 46219 - Posted: 10 Feb 2022, 20:24:38 UTC - in response to Message 46218.  
Last modified: 10 Feb 2022, 20:25:08 UTC

The cache size settings are only applied if you have set both Max # jobs & Max # CPUs to no limit, in my observations.

However, this has other consequences that you need to manage.
ID: 46219
Harri Liljeroos
Joined: 28 Sep 04
Posts: 690
Credit: 45,247,731
RAC: 31,691
Message 46221 - Posted: 10 Feb 2022, 22:12:59 UTC - in response to Message 46219.  

The cache size settings are only applied if you have set both Max # jobs & Max # CPUs to no limit, in my observations.

However, this has other consequences that you need to manage.

OK, my settings are 4 CPUs and 'No limit' for the number of tasks, and that's how they have been for ages. So something changed server-side today that made my Theory download limit jump. Anyway, my task buffer is now diminishing (currently at 50 and 58 Theory tasks), while new Atlas tasks are being replenished when old ones finish.
ID: 46221
Harri Liljeroos
Joined: 28 Sep 04
Posts: 690
Credit: 45,247,731
RAC: 31,691
Message 46231 - Posted: 11 Feb 2022, 15:09:10 UTC

It happened again today. Both hosts now have 80 Theory tasks downloaded. So is this the new normal?
ID: 46231
maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,921,302
RAC: 102,850
Message 46232 - Posted: 11 Feb 2022, 16:11:52 UTC - in response to Message 46231.  

Theory has the most unsent tasks, and the number of running tasks grew from 2k yesterday to 4k today.
I am also seeing some more tasks downloading (only 3-5 per BOINC Manager).
ID: 46232
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 817
Credit: 667,999,990
RAC: 180,765
Message 46234 - Posted: 11 Feb 2022, 17:50:31 UTC - in response to Message 46231.  

Could be. I saw something like what you described: the limit on tasks was something like 48, or 4x the number of cores. This is why I went to no limit for both, since for my 56-core Xeon it didn't actually send enough tasks to fill it. But it could be doable now, based on your observations.
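The arithmetic behind the 56-core case can be sketched in a few lines of Python. This assumes the cap counts every task on the host and that each task uses one core by default; both are assumptions for illustration, not confirmed server behaviour, and `idle_cores` is a made-up helper name.

```python
def idle_cores(cores, task_cap, cores_per_task=1):
    """Cores left without work when the host may hold at most task_cap tasks."""
    # Cores that can be kept busy by the capped number of tasks.
    busy = min(cores, task_cap * cores_per_task)
    return cores - busy
```

With a cap of 48 single-core tasks, `idle_cores(56, 48)` comes out at 8 cores with nothing to run, while an 8-core host under the same cap has none idle.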
ID: 46234
Evangelos Katikos
Joined: 4 Oct 21
Posts: 10
Credit: 40,205,665
RAC: 38,746
Message 46247 - Posted: 14 Feb 2022, 12:41:57 UTC

They increased the limit to the level of SixTrack, even though SixTrack has a lower average computation time. It seems that, after CMS, I'll have to abort Theory work units en masse as well.
ID: 46247
maeax
Joined: 2 May 07
Posts: 2167
Credit: 165,921,302
RAC: 102,850
Message 46251 - Posted: 16 Feb 2022, 0:46:33 UTC

After a Theory-native task finishes in a CentOS 8 VM, the file output.tgz is not deleted from the slot directory (BOINC 7.16.11),
so the slot is not freed and the slot numbers keep growing.
I have to delete it manually!
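A small Python sketch for spotting such leftovers, rather than hunting through the slots by hand, might look like this. The slots path has to be passed in because it depends on the installation, the function name is made up, and the sketch only lists files; it deliberately deletes nothing, since a slot may still belong to a running task.

```python
from pathlib import Path


def find_leftover_outputs(slots_dir, filename="output.tgz"):
    """List <slots_dir>/<n>/output.tgz files that are still present (read-only)."""
    root = Path(slots_dir)
    if not root.is_dir():
        return []
    # One subdirectory per slot; report those still holding the file.
    return sorted(
        slot / filename
        for slot in root.iterdir()
        if slot.is_dir() and (slot / filename).is_file()
    )
```

Anything it reports should be checked against the tasks currently running before it is removed.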
ID: 46251


©2024 CERN