Grid Node Server on Azure VM: Start-up Issues
SethO
#1 Posted : Saturday, February 8, 2014 4:16:07 AM(UTC)
Rank: Newbie

Groups: Registered
Joined: 12/19/2013(UTC)
Posts: 4
Location: United States of America

My team runs an NCrunch Grid Node Server on a 4-core Azure VM. Once we connect, it works fantastically.

However, each night we shut down our VMs to keep costs down. In the morning, when they start up, we cannot connect. We have seen this issue for every version of the beta. To "fix" it, we log on to the box, go to the Grid Server configuration, and click "OK". This restarts the service and all our clients immediately connect successfully.

I'm not sure if anyone else has seen this problem. It does not happen on a couple of Grid Servers running on spare desktop workstations, even when we restart them. Those work as expected.

Thanks for any input/suggestions. I can provide additional details if needed.

-Seth
Remco
#2 Posted : Saturday, February 8, 2014 7:50:04 AM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 7,123

Thanks: 957 times
Was thanked: 1286 time(s) in 1193 post(s)
Hi Seth,

Thanks for sharing this issue. Making sure that the grid node service works correctly in Azure is fairly important as much of the intention behind the distributed processing is to make it work well with the cloud, so I'm very keen to help get this issue resolved.

First, can you confirm that the machines are completely shut down overnight and need a full software reboot in the morning, and that they aren't just suspended? I'd think that this is the case, but it's worth checking to be sure.

The next thing of interest would be to see what's happening in the log file. If you enable logging in the grid node configuration, then inspect the log file after the service has 'ghosted', is there any interesting information showing? Usually the service will report when it starts listening for connections. Does it show any binding errors? How do those first few steps in the log file compare with when the service is manually restarted?

I wonder if it's possible that the startup sequence on the VM is such that the service tries to bind to the network before the network has been fully initialised. A workaround to this could be to try and find a way to start the service later during the boot to ensure everything is ready for it to start listening. Changing the 'Startup type' to 'Automatic (Delayed Start)' for the service inside the Windows Service Configuration may very well do the trick here.
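
To see from a client machine when the node actually becomes reachable after the VM boots, a small TCP probe can help. This is a minimal Python sketch, not part of NCrunch: the function name `wait_for_port` and its parameters are made up for illustration, and it only tests whether something accepts TCP connections on the port. It does not say why a connection fails.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, interval_s=5.0):
    """Poll host:port until a TCP connect succeeds.

    Returns the seconds elapsed before the first successful connect,
    or None if timeout_s passes without one. (Hypothetical helper.)
    """
    start = time.monotonic()
    while True:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=interval_s):
                return time.monotonic() - start
        except OSError:
            if time.monotonic() - start >= timeout_s:
                return None
            time.sleep(interval_s)
```

For example, `wait_for_port("my-grid-node.example", 41141)` (host name is a placeholder, and the port should be whatever the node log reports) returns once the port accepts a connection. If it returns None after a boot but succeeds right after a manual service restart, that would be consistent with the service having started before the network was ready.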


Cheers,

Remco
SethO
#3 Posted : Monday, February 10, 2014 4:45:43 PM(UTC)

Each night, we issue a Stop-AzureVM PowerShell command, and each morning we start the VMs with Start-AzureVM. When stopping, we pass the -Force flag and, more importantly, we do not pass the -StayProvisioned flag, so the VM is completely deallocated. This releases the IP address and we are no longer charged. As a side note, I can replicate this behavior using just the "Stop" and "Start" controls in the Azure Portal (web) Virtual Machine management console, so PowerShell is not a likely culprit.

I enabled logging on the Grid server, then stopped and started the VM. The only entry in the log is:
Quote:
[16:31:07.8722-?-4] Node server started - listening on port 41141

If I restart the Grid service from inside the VM, the log (with my IP addresses removed) looks like:
Quote:
[16:24:32.2103-?-4] Node server started - listening on port 41141
[16:24:33.226-Core-4] Client connection accepted from [ip removed]:39794
[16:24:33.226-Core-4] Creating server-side handler for [ip removed]:39794
[16:24:33.226-Core-4] Describing self to [ip removed]:39794
[16:25:14.6009-Core-13] Client connection accepted from [ip removed]:65096
[16:25:14.6009-Core-13] Creating server-side handler for [ip removed]:65096
[16:25:14.6009-Core-13] Describing self to [ip removed]:65096
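
The difference between the two logs (a "listening" line with no accepted connections versus one followed by client connections) can be checked mechanically. Below is a rough Python sketch. The line format is only inferred from the excerpts above, the meaning of the bracketed fields is an assumption, and `summarize_log` is a hypothetical helper, not an NCrunch tool.

```python
import re

# Assumed line shape, inferred from the excerpts above:
#   [HH:MM:SS.ffff-<source>-<number>] <message>
LINE_RE = re.compile(
    r"^\[(?P<time>[\d:.]+)-(?P<source>[^\]-]+)-(?P<num>\d+)\]\s*(?P<message>.*)$"
)
PORT_RE = re.compile(r"listening on port (\d+)")


def summarize_log(log_text):
    """Report the listening port and how many client connections were accepted."""
    listening_port = None
    accepted = 0
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip blank lines and anything in an unexpected format
        message = match.group("message")
        port_match = PORT_RE.search(message)
        if port_match:
            listening_port = int(port_match.group(1))
        elif message.startswith("Client connection accepted from"):
            accepted += 1
    return {
        "listening_port": listening_port,
        "accepted": accepted,
        # Only a hint: zero connections may also mean no client has tried yet.
        "possibly_unreachable": listening_port is not None and accepted == 0,
    }
```

Run against the first log, this reports port 41141 with zero accepted connections. Against the second, it reports two accepted connections.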


We have had similar issues with IIS availability immediately after a VM start-up. Azure will report that the VM is in a "Started" state, but the internal components haven't settled quite yet. I would not be surprised if something similar is happening here.

Let me know if I can test anything out for you. Thanks for your help.

-Seth
Remco
#4 Posted : Tuesday, February 11, 2014 2:47:35 AM(UTC)
Hi Seth -

I've just done some extra testing around VM instances in Azure. The VM I've been testing with seems to be working OK, so I have a feeling this is something specific to your instance's configuration. Probably the NCrunch service is probing for network interfaces before all of these interfaces are fully available.

In theory, changing the service startup type of the 'NCrunch Grid Node Service' to 'Automatic (Delayed Start)' should cause the service to start 2 minutes after the machine has finished booting. I hope this will work around the problem. Can you confirm whether it does the trick?


Thanks!

Remco
SethO
#5 Posted : Tuesday, February 11, 2014 4:04:05 PM(UTC)

Last night, I went to my Services Manager, set the NCrunch Grid Node Service to "Automatic (Delayed Start)", and shut everything down. This morning, after the VMs woke up, I connected without having to manually restart the service. This is good!

I'll keep monitoring the issue and will post back if I have any other problems.

Thanks for the help,

-Seth
Remco
#6 Posted : Tuesday, February 11, 2014 10:04:25 PM(UTC)
Great! Thanks for taking the time to share the results :)
GreenMoose
#7 Posted : Wednesday, April 30, 2014 7:19:43 AM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 6/17/2012(UTC)
Posts: 507

Thanks: 145 times
Was thanked: 66 time(s) in 64 post(s)
FYI: I had the same issue; changing the service to delayed start on all Azure machines solved it for me as well.

Related uservoice request to avoid having to redo it after each NCrunch upgrade - https://ncrunch.uservoic...rvice-settings-for-grid

(To set delayed start from the command line: sc config NCrunchGridNode start= delayed-auto)
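
Since the setting has to be reapplied after each upgrade, the sc command can be wrapped in a small script. This is a minimal Python sketch under some assumptions: the helper names are made up, the service name is taken from the command in this post, the optional `server` argument relies on sc's `\\ServerName` form, and it has to run from an elevated prompt on Windows.

```python
import subprocess


def delayed_start_command(service_name="NCrunchGridNode", server=None):
    """Build the sc.exe argument list for 'Automatic (Delayed Start)'.

    sc expects 'start=' and its value as separate tokens, hence two items.
    """
    cmd = ["sc"]
    if server:
        cmd.append(rf"\\{server}")  # optional remote machine, e.g. \\node1
    cmd += ["config", service_name, "start=", "delayed-auto"]
    return cmd


def apply_delayed_start(service_name="NCrunchGridNode", server=None):
    """Run the command; raises CalledProcessError if sc reports failure."""
    return subprocess.run(delayed_start_command(service_name, server), check=True)
```

Calling `apply_delayed_start()` after an upgrade reapplies the setting on the local machine.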