Message boards : Theory Application : Problem of the day
Author | Message |
---|---|
Joined: 24 Oct 04 Posts: 1163 Credit: 53,679,698 RAC: 52,047 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=155279014 I got a couple of these tonight, along with others that do start up and run. So is Condor asleep again? The run time was only 13 min 58 sec, so it wasn't past the 20-minute wall. (This is not an internet speed problem on my end.) Volunteer Mad Scientist For Life |
Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
Yes, it could be that the server was temporarily unavailable. |
Joined: 24 Oct 04 Posts: 1163 Credit: 53,679,698 RAC: 52,047 |
For the last few days I have had this happen once in a while: a task errors out after 4 seconds while it is suspended. https://lhcathome.cern.ch/lhcathome/result.php?resultid=158796702 The host has another task running when this happens, and it starts new tasks afterwards. This is with new versions of BOINC and VirtualBox (VB). |
Joined: 24 Oct 04 Posts: 1163 Credit: 53,679,698 RAC: 52,047 |
https://lhcathome.cern.ch/lhcathome/results.php?userid=5472 Too many of these errors lately, and there would be even more if I didn't have most of the new ones suspended. |
Joined: 14 Jan 10 Posts: 1409 Credit: 9,325,730 RAC: 9,392 |
No or not enough Theory jobs available. Some BOINC tasks fail after about 13 minutes of run time with EXIT_NO_SUB_TASKS: https://lhcathome.cern.ch/lhcathome/result.php?resultid=158871945 Or, if you're lucky and get a first sub-job, there is a good chance you won't get a second one and the task ends early: https://lhcathome.cern.ch/lhcathome/result.php?resultid=158871885 |
Joined: 24 Oct 04 Posts: 1163 Credit: 53,679,698 RAC: 52,047 |
I am having better luck with the -dev multi-core Theory tasks. I would switch to the ATLAS tasks here, except for those huge .vdi files that take half a day to download on each host. I would rather get some SixTrack tasks here now (and let me know when we have those, before they are all loaded onto computers that don't do these VB tasks at all). |
Joined: 14 Jan 10 Posts: 1409 Credit: 9,325,730 RAC: 9,392 |
No or not enough Theory jobs available. It looks like it was a temporary hiccup; jobs are being delivered normally now. |
Joined: 14 Jan 10 Posts: 1409 Credit: 9,325,730 RAC: 9,392 |
No or not enough Theory jobs available. Sorry, false hope. The Theory VMs are being killed one after the other due to a lack of sub-jobs. |
Joined: 24 Oct 04 Posts: 1163 Credit: 53,679,698 RAC: 52,047 |
(Just thought I would bring this up, since staring at 9 computers and trying to get 50 tasks running with VB can make a person......) https://lhcathome.cern.ch/lhcathome/result.php?resultid=168740585 OK, I just happened to go back to this PC to make sure the tasks were running. The LHC-dev Theory task was running with no problem, but the Theory task had the good old "Could not connect to vccs1.cern.ch on port 443" after doing the *Testing VCCS connection to vccs1.cern.ch on port 443*, with *nc: getaddrinfo: Name or service not known* between those two. It had no problem with Testing CVMFS connection to lhchomeproxy.cern.ch on port 3125: Connection to lhchomeproxy.cern.ch 3125 port [tcp/a13-an] succeeded! : nc: connect to lhchomeproxy.cern.ch port 3125 (tcp) timed out: Operation now in progress But as usual, the internet speed is not fast enough to make the simple connection to the CERN server and get the credentials so it can then get to HTCondor ping.........so the second try is just a 25-minute computation error. It is a good thing I always try to set the BOINC Manager to *no new tasks* so it doesn't keep doing that until it gets lucky and starts. (No, a Squid will not help me get credentials.) So along with the ones that give *VM Completion Message: Condor exited after 774s without running a job.* after 36 minutes of running......well, it is always a good thing to NOT let them auto-run with BM set to get new tasks all the time. (Of course, if it was just running SixTrack tasks non-stop there would not be this problem.....but....) Volunteer Mad Scientist For Life |
Joined: 24 Oct 04 Posts: 1163 Credit: 53,679,698 RAC: 52,047 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=168746258 I just saw this one happen and wondered why it was all of a sudden finished after almost 7 hours.....so I checked the stderr here and once again it shows "VM Heartbeat file specified, but missing heartbeat". Of course I am just pointing this out; I know there is no actual reason or fix, and I have been running these VB tasks daily 24/7 longer than anyone (and probably spend too many hours with a monitor in front of me). .....just talking to myself.....no big deal and nothing new. Volunteer Mad Scientist For Life |
Joined: 2 May 07 Posts: 2220 Credit: 173,696,209 RAC: 24,770 |
Task 343600482; Work unit 182231829; Computer 10548292; Sent 8 Feb 2022, 3:39:48 UTC; Reported 8 Feb 2022, 4:05:42 UTC; Status: Completed, waiting for validation; Run time 1,379.91 sec; CPU time 898.31 sec; Credit: pending; Application: Theory Simulation v300.06 (vbox64_theory) windows_x86_64 |
Joined: 28 Sep 04 Posts: 719 Credit: 48,122,289 RAC: 32,152 |
Today our hosts have been allowed to download a lot more tasks. For the past couple of years the number of tasks we could get was 8 per host, at least on my two computers with 4/8 and 8/16 CPU cores. The new limit seems to be 63 per host; maybe this value comes from my cache size (1 day + 0.1 days). Let's hope I can crunch all of them before the deadline. |
Joined: 28 Sep 04 Posts: 719 Credit: 48,122,289 RAC: 32,152 |
This may have been a temporary setting: on a more careful look, it seems that I haven't downloaded any new Theory tasks since about 11:00 local time (09:00 UTC). |
Joined: 27 Sep 08 Posts: 829 Credit: 687,480,613 RAC: 179,904 |
The cache size settings are only applied if you have set both Max # jobs & Max # CPUs to no limit, in my observation. However, this has other consequences that you need to manage. |
Joined: 28 Sep 04 Posts: 719 Credit: 48,122,289 RAC: 32,152 |
"The cache size settings are only applied if you have set both Max # jobs & Max # CPUs to no limit, in my observation." OK, my settings are 4 CPUs and 'No limit' for the number of tasks, and that's how they have been for ages. So something changed server-side today that made my Theory download limits jump. Anyway, my task buffer is now diminishing (currently at 50 & 58 Theory tasks), while new ATLAS tasks are being replenished as old ones finish. |
Joined: 28 Sep 04 Posts: 719 Credit: 48,122,289 RAC: 32,152 |
It happened again today. Both hosts now have 80 Theory tasks downloaded. So is this the new normal? |
Joined: 2 May 07 Posts: 2220 Credit: 173,696,209 RAC: 24,770 |
Theory has the most unsent tasks, and the running tasks have grown from 2k yesterday to 4k today. I am also seeing some more tasks downloading (only 3-5 per BOINC Manager). |
Joined: 27 Sep 08 Posts: 829 Credit: 687,480,613 RAC: 179,904 |
Could be. I saw something like what you described: the task limit was something like 48, or 4x the number of cores. This is why I went to no limit for both, since for my 56-core Xeon it didn't actually send enough to fill it. It could be doable now, based on your observations. |
Joined: 4 Oct 21 Posts: 10 Credit: 43,152,980 RAC: 21,128 |
They increased the limit to the level of SixTrack, even though SixTrack has a lower average computation time. It seems that, after CMS, I'll have to abort Theory work units en masse as well. |
Joined: 2 May 07 Posts: 2220 Credit: 173,696,209 RAC: 24,770 |
After a Theory native task finishes in a CentOS 8 VM, the file output.tgz is not deleted from the slot (BOINC 7.16.11), so the slot is not freed and the slot number keeps growing to the next free one. I have to delete it manually! |
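
For the leftover output.tgz files described in the last post, a read-only check can at least show which slots still hold one before anything is deleted by hand. The following is a minimal Python sketch, not an official BOINC tool. The function name `find_leftover_outputs` is made up for illustration, and the sketch assumes the usual layout of a BOINC data directory with numbered subdirectories under `slots/`. It only lists files and cannot tell whether a slot is still in use by a running task, so that has to be checked in BOINC Manager first.

```python
from pathlib import Path


def find_leftover_outputs(data_dir, filename="output.tgz"):
    """List `filename` found directly inside numbered slot directories.

    Assumption: `data_dir` is the BOINC data directory and contains a
    `slots/` folder with subdirectories named 0, 1, 2, ...
    Nothing is deleted; the caller decides what to do with the result.
    """
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []

    # Only numbered directories count as slots.
    slot_dirs = [p for p in slots.iterdir() if p.is_dir() and p.name.isdigit()]

    found = []
    # Sort numerically so that slot 10 comes after slot 2.
    for slot in sorted(slot_dirs, key=lambda p: int(p.name)):
        candidate = slot / filename
        if candidate.is_file():
            found.append(candidate)
    return found
```

Calling `find_leftover_outputs("/var/lib/boinc")` would return the matching paths in slot order. The path here is only an example: the data directory differs between installations and has to be adjusted.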