Message boards : Theory Application : Another day of Server problems
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1115
Credit: 49,721,096
RAC: 14,340
Message 34962 - Posted: 11 Apr 2018, 18:32:53 UTC
Last modified: 11 Apr 2018, 19:17:06 UTC

https://lhcathome.cern.ch/lhcathome/results.php?userid=5472&offset=0&show_names=0&state=6&appid=13
[ERROR] Could not connect to vccs1.cern.ch on port 443

This is the second time lately, and it only happens while I am asleep, so the hosts I leave running and able to ask for more work will just keep running Invalids over and over until the server maybe stops sending.

I leave all my 8-core hosts loaded with tasks and then not able to get more, since when this happens there could be thousands of these Invalids, and that is something I never like to see here.

Of course, as usual, only VB tasks do this, and since I am only running Theory and CMS, those are the only ones I have running Invalids.

......and now I go upstairs to see what all my other computers are doing (probably nothing if they lost all those tasks)

Of course if I was just running Sixtracks this would not be happening.

Edit: as I expected.......this is just the first one I have checked so far and it is a good thing I have the four 8-cores set to not get new tasks so they only lost about 30 tasks each......and of course this on each one.



(it should say *Server Error* instead of computer error)

Edit2: well I have them all up and running again and it looks like the server is doing its part this time so I only got 90 Server Error tasks this time.........I will be watching them all day as usual.....so about the next 16 hours.
Volunteer Mad Scientist For Life
maeax
Joined: 2 May 07
Posts: 2090
Credit: 158,804,831
RAC: 127,553
Message 34963 - Posted: 11 Apr 2018, 19:49:24 UTC

Hi Magic,
I have a HomeServer2011 with Theory and CMS available.
The upside for that server is that if one project has a problem, the other project still gets tasks.
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10474793.
BUT the yellow triangle is also there, only for Theory.
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1115
Credit: 49,721,096
RAC: 14,340
Message 34967 - Posted: 12 Apr 2018, 5:54:11 UTC - in response to Message 34963.  

Well, I do have CMS running on one 8-core but did lose 25 of them earlier today, and once I reloaded and started those they have been working for over 10 hours now.

But there is still trouble with the Theory tasks, so I have been losing them, and right now, watching the VM Console, they are taking longer than usual to get past HTCondor Ping.

I just got one of the 8-cores running them all again, but I have all the other computers' ethernet unplugged and am going to just try one at a time.

Hate to have to wait until after 2am to get them all running.

I haven't had the internet-Cern server problem for a couple of months and I sure hope this isn't going to happen again, since it seemed like it was fixed on the Cern end.

OK I'm switching over to the next one to see if it will get back running again and maybe restart those CMS after I get all these running again.

(or end up running Sixtrack tasks since they don't need to be online once they are started)
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 373
Credit: 238,712
RAC: 0
Message 34969 - Posted: 12 Apr 2018, 7:50:10 UTC - in response to Message 34962.  

The machine vccs1.cern.ch no longer exists and in the latest code the test is using vccs.cern.ch.
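For anyone who wants to check from their own host which of the two names is reachable, here is a minimal Python sketch. It only tries to open a plain TCP connection on port 443, which is an assumption about what the connectivity test inside the VM does; the helper name `can_connect` is made up for this example and is not part of the project code.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the name and opens the socket in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == "__main__":
    # Both host names come from this thread; the first one no longer exists.
    for name in ("vccs1.cern.ch", "vccs.cern.ch"):
        status = "reachable" if can_connect(name) else "NOT reachable"
        print(f"{name}:443 {status}")
```

A failure for the new name on the host itself would point at the local network rather than at a stale image, but a success does not prove that the VM is using the new name.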

My guess is that CVMFS is not updating properly, maybe due to the network. If the problem persists, I will release a new image that contains these files. The image was last updated a year ago so the cache may be a little stale by now.

btw are you still running your 32bit machine?
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1115
Credit: 49,721,096
RAC: 14,340
Message 34971 - Posted: 12 Apr 2018, 8:29:19 UTC - in response to Message 34969.  
Last modified: 12 Apr 2018, 8:31:16 UTC

The machine vccs1.cern.ch no longer exists and in the latest code the test is using vccs.cern.ch.

My guess is that CVMFS is not updating properly, maybe due to the network. If the problem persists, I will release a new image that contains these files. The image was last updated a year ago so the cache may be a little stale by now.

btw are you still running your 32bit machine?


Yeah that may be the problem Laurence

I still have the X86 but it has been taking a break for a while (just to save a couple of dollars). I was running the tasks that still had 32bit versions (that was until March 21st, and I think I ran Sixtracks and some Theory tasks for a month or so), but if it is ever needed to test or run some 32bit I will fire it back up again.
maeax
Joined: 2 May 07
Posts: 2090
Credit: 158,804,831
RAC: 127,553
Message 34972 - Posted: 12 Apr 2018, 8:59:37 UTC - in response to Message 34969.  
Last modified: 12 Apr 2018, 9:05:23 UTC

My guess is that CVMFS is not updating properly, maybe due to the network. If the problem persists, I will release a new image that contains these files. The image was last updated a year ago so the cache may be a little stale by now.

-dev is using Multicore for Theory and CMS.
Is it possible to use this in production too?

These are the stats for x86 in production at the moment:
Microsoft Windows (98 or later) running on an Intel x86-compatible CPU 263.50 (vbox32) 8 Oct 2017, 10:47:42 UTC 37 GigaFLOPS
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1273
Credit: 8,480,147
RAC: 2,155
Message 35029 - Posted: 18 Apr 2018, 5:56:01 UTC - in response to Message 34969.  

...
btw are you still running your 32bit machine?

I'm also running Win32 with Theory tasks on a quad core with OS Win10, but mostly only running 1 Theory at a time.

They're running fine. Only problem is that very often vm_image.vdi is left behind in VirtualBox Media Manager.
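A rough way to spot those leftovers without opening the Media Manager is to parse the text that `VBoxManage list hdds` prints. The Python sketch below assumes that output is a series of blank-line-separated blocks of `Key: value` lines, and that a disk still attached to a VM carries an `In use by VMs` line; both are assumptions about the output format, and the function names are made up for this example.

```python
def parse_hdds(listing):
    """Split `VBoxManage list hdds` style text into one dict per disk."""
    disks, current = [], {}
    for line in listing.splitlines():
        if not line.strip():
            # A blank line closes the current disk block.
            if current:
                disks.append(current)
                current = {}
            continue
        # Split on the first colon only, so Windows paths like C:\... survive.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        disks.append(current)
    return disks


def leftover_images(disks, filename="vm_image.vdi"):
    """Return disks named `filename` that no VM appears to be using."""
    found = []
    for disk in disks:
        location = disk.get("Location", "").replace("\\", "/")
        if location.endswith("/" + filename) and "In use by VMs" not in disk:
            found.append(disk)
    return found
```

On a real host the listing could come from `subprocess.run(["VBoxManage", "list", "hdds"], capture_output=True, text=True).stdout`. This sketch only reports candidates; removing an entry is left as a manual step (VBoxManage has a `closemedium` subcommand for that), since a task that is just starting may not have attached its disk yet.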
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1115
Credit: 49,721,096
RAC: 14,340
Message 35031 - Posted: 18 Apr 2018, 7:38:48 UTC - in response to Message 35029.  

...
btw are you still running your 32bit machine?

I'm also running Win32 with Theory tasks on a quad core with OS Win10, but mostly only running 1 Theory at a time.

They're running fine. Only problem is that very often vm_image.vdi is left behind in VirtualBox Media Manager.


Mine isn't running right now, but it is the X86 XP Pro on the old 3-core Phenom, and since we can only use 4GB RAM on an X86 I could only run 2 Theory tasks at a time.
It would run 3 for a while until it ran out of memory, so I just set it to run only 2 Theory tasks.

BUT I have that vdi problem on the Win7's and Win10's so I check mine twice a day and remove them myself (they are all running the newest versions of VB and Boinc) .....so that is what I am doing right now since it is after midnight here.

I could change the X86 OS to an X64 and then add more RAM, but I just want to see how many decades this 32bit XP Pro will run with no problems ever.
I have its twin computer here; a few years ago I changed it to X64 Win7 and added memory so it has 8GB now (DDR2), and since then I updated it to Win10 back when they first started testing it.
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1273
Credit: 8,480,147
RAC: 2,155
Message 35033 - Posted: 18 Apr 2018, 8:09:32 UTC - in response to Message 35031.  

Mine isn't running right now, but it is the X86 XP Pro on the old 3-core Phenom, and since we can only use 4GB RAM on an X86 I could only run 2 Theory tasks at a time.
It would run 3 for a while until it ran out of memory, so I just set it to run only 2 Theory tasks.
...

My x86 machine is a rather newer but slow tablet with only 2GB RAM. Win10 steals more than 1GB of it :-(
I was able to run 3 Theory tasks with 256MB, but unless I rebooted every few days the system froze.
So I decided to run only 1 Theory task with 384MB configured in app_config.xml.
With 4GB RAM and VM memory reduced via app_config.xml, you should be able to run 3 Theory tasks.
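For reference, a small Python sketch that writes out such an app_config.xml. The app name `Theory`, the plan class `vbox32` and the `--memory_size_mb` wrapper option are assumptions for this example, so check them against the names your own client shows before using the result; `build_app_config` is just a made-up helper.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, memory_mb, max_concurrent):
    """Build the text of an app_config.xml limiting task count and VM memory."""
    root = ET.Element("app_config")

    # <app> block: how many tasks of this app may run at the same time.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # <app_version> block: extra command line handed to the VM wrapper.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = "1"
    # Assumed wrapper option for the VM memory size in MB.
    ET.SubElement(version, "cmdline").text = f"--memory_size_mb {memory_mb}"

    # Pretty-print where available (ET.indent needs Python 3.9 or newer).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Values from the post above: 1 Theory task with 384MB.
    print(build_app_config("Theory", "vbox32", memory_mb=384, max_concurrent=1))
```

The printed text would go into app_config.xml in the project's folder under the BOINC data directory, and the client has to re-read its config files (or be restarted) before it takes effect.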