Setting password in registry breaks automated NCrunch GridNode installation
josh-grant
#1 Posted : Thursday, May 13, 2021 7:54:18 AM(UTC)
Rank: Newbie

Groups: Registered
Joined: 5/13/2021(UTC)
Posts: 7
Location: Australia

Thanks: 2 times
Hi all!

I'm an engineer who's been working on an automated NCrunch solution that runs in AWS on a Windows EC2 instance. I've had a few challenges automating the setup and service start of NCrunch, specifically when setting the Password parameter in the registry after the NCrunch MSI installation finishes. If I leave the password blank then it "just works". Bear in mind I'm trying to create an immutable solution where every night the EC2 instance that is running NCrunch terminates and in the morning a new EC2 instance spins up (Scheduled Autoscaling).

Unfortunately, a lot of the NCrunch documentation on distributed processing assumes that the server/instance/VM is long-lived or persistent, whereas I'm trying to do a completely silent installation. That assumption doesn't work nicely for an automated solution in the cloud. I have tried a number of different ways to get NCrunch to work WITH a password, but I feel there's an issue with how I set it (I first hash and encrypt it locally before setting it in the registry so that it's not in plain text). When the NCrunch service starts up and I try to connect to it from Visual Studio, the connection gets stuck on "Negotiating" because it can't complete the initial handshake.

When I set the password as empty, and I connect, it has no issues.

Some more info on my setup:

* Running latest GridNode server on a Windows server 2019 EC2 instance in AWS
* As part of the boot script (userdata) I run a PowerShell script that does the following:
  - Runs msiexec to install the grid node server MSI silently
  - Sets all the required registry keys for NCrunch (Listening Port, Password, Processors etc)
  - Sleeps 15 seconds
  - Sets the grid node server Windows service to Automatic
  - Starts/restarts the service
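
The boot sequence above can be sketched as an ordered command plan. This is only an illustration in Python (the actual script is PowerShell and isn't shown in this thread). The registry key path is a placeholder, the service name, the value names and the REG_SZ value type are assumptions, and the commands are only built, not executed.

```python
# Placeholders and assumptions: none of these names are confirmed by the thread.
REGISTRY_KEY = r"HKLM\SOFTWARE\<grid-node-settings-key>"  # placeholder, not the real path
SERVICE_NAME = "NCrunch Grid Node Service"  # assumed name of the installed service
SETTLE_SECONDS = 15  # the pause mentioned in the steps above


def build_boot_plan(msi_path, settings):
    """Return the boot steps, in order, as argument lists. Nothing is executed."""
    # 1. Silent MSI install (/qn = no UI, /norestart = suppress reboot).
    plan = [["msiexec.exe", "/i", msi_path, "/qn", "/norestart"]]

    # 2. One registry write per setting. REG_SZ is an assumption; some values
    #    may need a different type.
    for name, value in settings.items():
        plan.append(["reg.exe", "add", REGISTRY_KEY, "/v", name,
                     "/t", "REG_SZ", "/d", str(value), "/f"])

    # 3. Short pause before touching the service.
    plan.append(["powershell.exe", "-Command",
                 f"Start-Sleep -Seconds {SETTLE_SECONDS}"])

    # 4. Set the service to start automatically, then start it.
    plan.append(["sc.exe", "config", SERVICE_NAME, "start=", "auto"])
    plan.append(["sc.exe", "start", SERVICE_NAME])
    return plan
```

On the instance, each step could be passed to `subprocess.run(step, check=True)` so that the script stops at the first failing command. Keeping the registry writes ahead of the service start matches the order listed above.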

Once all the above finishes I then try to connect via my Visual Studio. With a Password set I just get stuck on "Negotiating". Without a Password set I get connected no problems.

Is there something I'm doing wrong or has anyone else had experience with trying to create a fully automated solution on an immutable server and getting stuck on the "Negotiating" step when trying to connect?

Any help would be greatly appreciated as I've exhausted all the NCrunch documentation :(

FYI: there are no issues with versions or version conflicts. I've finally boiled my issue down to this Password parameter.

Regards
Josh
Remco
#2 Posted : Thursday, May 13, 2021 10:08:44 AM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 6,334

Thanks: 822 times
Was thanked: 1099 time(s) in 1042 post(s)
Hi Josh,

Thanks for posting. Can you share more detail about how you're obtaining the string value of the Password key that you're setting in the registry? Unless this is done in a manner that is identical to the Grid Node configuration tool, the encryption will be seeded differently on the grid node vs the client, and the connection will fail to negotiate. The best way to be sure would be to use the configuration tool on the Grid Node to set the password, then copy the string value out of the password key in the registry and place this in your script.
josh-grant
#3 Posted : Thursday, May 13, 2021 12:11:19 PM(UTC)
Thank you for the quick reply :)

Here's what I've tried so far with regards to obtaining and setting the password within the registry:

1. When the server first spins up, the boot script (userdata) retrieves an encrypted password from AWS Secrets Manager and stores the value in a variable.
2. That variable is then written to the registry as the Password value for the NCrunch grid node server.
3. Once the registry is set, the NCrunch grid node service is started and the script ends.
4. Some time later I attempt to connect via Visual Studio (which, yeah, gets stuck on "Negotiating").

I've also tried starting from a plain-text variable, hashing (SHA256) and encrypting it locally, and then following steps 2, 3 and 4, with the same result.

That does make a lot of sense if NCrunch is encrypting the Password in a particular way. Tomorrow when I'm back at work, I'll try doing what you suggested:

Quote:
The best way to be sure would be to use the configuration tool on the Grid Node to set the password, then copy the string value out of the password key in the registry and place this in your script.


Thank you again for assisting, will update once I've tried the above :)

josh-grant
#4 Posted : Thursday, May 13, 2021 1:39:16 PM(UTC)
I just tried the above but unfortunately I still get the same issue, stuck on "Negotiating". This is the log output that's consistent with the issue:

Logs from Grid Node Server:
Code:

[PID:6908 23:48:27.2497 ?-4] Node server started - listening on port 8888
[PID:6908 23:48:30.7967 ?-10] Unable to complete opening a server-side socket connection because the socket was closed before a connection could be fully established

What's strange is that when I RDP onto the Grid Node server and open the NCrunch configuration tool, if I update the Listening Port, I can then connect via Visual Studio successfully. By "updating" I mean that I change the port, press OK, then change the port back to the original and press OK again.

Logs from Grid Node Server:
Code:

[PID:7336 23:52:34.7116 ?-4] Node server started - listening on port 8880
[PID:7336 23:52:48.0706 ?-10] Sending 16 bytes of authentication data
[PID:7336 23:52:48.0706 ?-10] Server-side connection established on ****
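
The two log excerpts above differ only in their last lines, so an automated health check could classify a node by scanning its log. Here is a minimal Python sketch, assuming the log lines keep the format shown in this post. The function name and the state labels are made up for illustration.

```python
import re

# Matches lines such as:
# [PID:6908 23:48:27.2497 ?-4] Node server started - listening on port 8888
LOG_LINE = re.compile(
    r"^\[PID:(?P<pid>\d+) (?P<time>[\d:.]+) (?P<thread>\S+)\] (?P<message>.*)$"
)

FAILURE_TEXT = "socket was closed before a connection could be fully established"
SUCCESS_TEXT = "Server-side connection established"


def classify_log(lines):
    """Return 'connected', 'handshake_failed', 'listening' or 'unknown',
    based on the last recognisable event in the log."""
    state = "unknown"
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a grid node log line
        message = match.group("message")
        if SUCCESS_TEXT in message:
            state = "connected"
        elif FAILURE_TEXT in message:
            state = "handshake_failed"
        elif message.startswith("Node server started"):
            state = "listening"
    return state
```

Fed the first excerpt, the function reports `handshake_failed`. Fed the second, it reports `connected`.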

Is there another way of setting the Listening Port, Password and other configuration parameters during the initial .msi installation, instead of having to set the registry keys?
Remco
#5 Posted : Thursday, May 13, 2021 11:24:39 PM(UTC)
I suspect this may be caused by a hung socket. We get these in our own dev environments, and they tend to be caused by how the O/S handles sockets left open by terminated processes.

Basically, you can have a top-level .EXE (or service) that opens up a TCP listening socket on a specific port. That EXE then spawns a sub-process (i.e. NCrunch.BuildHost), which spawns other sub-processes (VBCSCompiler.exe). Let's say the top level EXE hosting the socket gets suddenly terminated and doesn't have a chance to properly close the socket. Normally, you'd expect the O/S to release the socket so that a newly spawned EXE could then make use of it ... but actually this doesn't happen, because somehow existence of the sub-processes seems to cause the O/S to keep the sockets open. Probably there is some kind of inheritance happening in there.

In short, make sure you terminate ALL sub-processes launched by the grid node if you terminate the grid node service. This includes the platform-related processes like the Roslyn compiler, .NET Host, etc.
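
As a rough illustration of "terminate ALL sub-processes", the Python sketch below walks a snapshot of (PID, parent PID) pairs and builds one `taskkill` command per descendant. How the snapshot is obtained is left out, and the function names and the PIDs in the usage note are hypothetical.

```python
from collections import defaultdict, deque


def descendant_pids(root_pid, process_table):
    """Return every PID below root_pid, given (pid, parent_pid) pairs."""
    children = defaultdict(list)
    for pid, parent_pid in process_table:
        children[parent_pid].append(pid)

    found = []
    seen = {root_pid}
    queue = deque([root_pid])
    while queue:  # breadth-first walk down the process tree
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


def taskkill_commands(pids):
    """Build one forced taskkill command per PID (not executed here)."""
    return [["taskkill", "/PID", str(pid), "/F"] for pid in pids]
```

For example, with a snapshot of `[(100, 1), (200, 100), (300, 200)]` and the grid node at PID 100, the descendants are 200 and 300. While the parent is still running, `taskkill /PID 100 /T /F` does the same job in a single command.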

josh-grant;15463 wrote:

Is there another way of setting the Listening Port, Password and other configuration parameters during the initial .msi installation, instead of having to set the registry keys?


The MSI is not a very flexible way to go when customising things. You might want to look at just extracting the ZIP file of the Grid Node instead and using NCrunch.GridNode.Console.EXE instead of the service. As long as you're still able to set up the registry keys, this will give you more direct control over the installation and life-cycle of the node. It looks to me like you've managed to resolve the password issue now, so I think it's just about getting a clean socket from here.
josh-grant
#6 Posted : Friday, May 14, 2021 12:21:17 AM(UTC)
Oh, that definitely sounds like what's happening. I'll try the .zip installation + console.exe instead and see if I can get past the hung sockets issue.

Thank you again for the quick responses and help with this! This is the final 1% to get this all fully automated, which will be a huge benefit to the teams I work with :)

Will update with my findings!
josh-grant
#7 Posted : Monday, May 17, 2021 5:11:25 AM(UTC)
Hi there,

Thanks again for the troubleshooting suggestions, I did get some success but only temporarily :(

Quote:
The MSI is not a very flexible way to go when customising things. You might want to look at just extracting the ZIP file of the Grid Node instead and using NCrunch.GridNode.Console.EXE instead of the service. As long as you're still able to set up the registry keys, this will give you more direct control over the installation and life-cycle of the node. It looks to me like you've managed to resolve the password issue now, so I think it's just about getting a clean socket from here.


What I tried next was the following:

* Unzipped the gridnode server package to C:\Ncrunch Grid Node Server
* Ran the following:
Code:
sc create "NCrunch Grid Node Service" binPath= "C:\NCrunch Grid Node Server\NCrunch.GridNode.Service.exe" start= auto

(FYI: you might want to update the manual installation docs to use "sc.exe" instead of "sc", as I got Set-Content issues in newer versions of PowerShell)
* I then tried running the console.exe but it wasn't listening on the correct port.
* I then hesitantly used the wizard (as my solution can't use wizards, I just wanted to get something working again) to force a port change
* I then started the console.exe again and it had the correct port.
* I tried to connect via my Visual Studio and it just got stuck on "Connecting" :(

Today I also added a "restart-computer" at the end of my original boot script to see if that would help with clearing sockets. It seemed to work the first 2 times I terminated the EC2 instance and waited for a new one to pop up: I could connect successfully both times. On the 3rd termination it failed (I have never managed to get a stable NCrunch server), and now I'm back to the "Negotiating" issue :(

I gotta admit, this is really frustrating and I may have to ditch this effort if I can't get this automated. It's especially frustrating because it sometimes works and sometimes doesn't :(
Remco
#8 Posted : Monday, May 17, 2021 8:08:31 AM(UTC)
josh-grant;15466 wrote:

* Ran the following:
Code:
sc create "NCrunch Grid Node Service" binPath= "C:\NCrunch Grid Node Server\NCrunch.GridNode.Service.exe" start= auto

(FYI: you might want to update the manual installation docs to use "sc.exe" instead of "sc", as I got Set-Content issues in newer versions of PowerShell)


When using NCrunch.GridNode.Console.exe, you won't need to register the service. Actually, registering the service could give you trouble, as you might end up with two different processes fighting over the same listening port. Basically, you can skip this step.

That's a good tip about sc.exe. I've updated the instructions.

josh-grant;15466 wrote:

* I then tried running the console.exe but it wasn't listening on the correct port.


All settings for the node (console or service) go via the registry. You'll want to make sure this is updated with your chosen settings and password well before you start the application.

josh-grant;15466 wrote:

* I then hesitantly used the wizard (as my solution can't use wizards, I just wanted to get something working again) to force a port change


I agree the wizard doesn't make sense in your scenario. It's intended for streamlining configuration of a manually installed node with some basic settings, but isn't suitable for specific configuration changes. For this one, it's better to use the grid node's configuration tool instead (with your registry automation for the proper boot sequence).

josh-grant;15466 wrote:

* I tried to connect via my Visual Studio and it just got stuck on "Connecting" :(


I know this probably seems like I'm reading off a script, but check the firewalls. If the client is stuck on 'Connecting...' it means no TCP connection could be established at all. This usually points to a network issue of some kind.
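
A quick way to rule the network in or out is to try opening a plain TCP connection to the node's listening port from the client machine. The Python sketch below does only that. The function name is made up, and the host and port in the usage note are placeholders. A successful result says nothing about whether the NCrunch handshake will then complete.

```python
import socket


def can_open_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection resolves the host and performs the TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False
```

For example, `can_open_tcp("10.0.0.12", 8888)` returning False would point at a firewall, security group or routing problem. If it returns True while the client still fails to connect, the problem probably lies beyond basic TCP reachability.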

josh-grant;15466 wrote:

Today I also added a "restart-computer" at the end of my original boot script to see if that would help with clearing sockets. It seemed to work the first 2 times I terminated the EC2 instance and waited for a new one to pop up: I could connect successfully both times. On the 3rd termination it failed (I have never managed to get a stable NCrunch server), and now I'm back to the "Negotiating" issue :(


Not sure about this pattern. Does your script change anything in the settings on boot? Is there perhaps something in the boot sequence that might mess with the node? Try turning on detailed logging in the node settings and see if you get any indication in the logs of a connection trying to happen.

josh-grant;15466 wrote:

I gotta admit, this is really frustrating and I may have to ditch this effort if I can't get this automated. It's especially frustrating because it sometimes works and sometimes doesn't :(


I think it's fair to be frustrated on this one.

We didn't design the grid node with this use case in mind. Generally, when people set up grid nodes in cloud based instances, they'll install the node and configure the machine, then either clone the pre-configured instance as needed or just leave it asleep to be woken as required. One of the biggest problems of the MS tool stack IMHO is that it is heavily reliant on system install state which makes fully programmable instances difficult to automate. At the time the grid nodes were originally developed, everyone was using VS2012, which needed to be present on the grid node in one form or another (i.e. build tools). Having an easy to automate installation of the NCrunch grid node seemed of little use when the node would likely require hours of manual configuration and setup anyway.

I would very much like the toolset to be a lighter, simpler thing to install. It's definitely getting better, so that's positive at least.

If setting up nodes in this way becomes more popular, we could look at making things simpler when it comes to settings. Like letting the node use an XML file instead of the registry, or letting you pass in configuration overrides via command-line.
josh-grant
#9 Posted : Monday, May 17, 2021 8:34:10 AM(UTC)
Honestly solid replies :D

Yeah, I quadruple-checked the network settings and made sure that was eliminated before starting the original thread :) I totally understand how difficult Microsoft made things for themselves over the years haha, but it is getting a lot better these days :D

You've already been a huge help so far and I've gotten some really good insight. I definitely think if some parameters could be passed in as arguments during installation, like $password and $port etc, it would be a huge win and would mean I could also scale this solution to other teams a lot more easily too. I did actually try creating an image of a working configuration and then running it fresh on a new EC2 instance, but ended up with the same "Negotiating" issue:

(Unable to complete opening a server-side socket connection because the socket was closed before a connection could be fully established)

And this doesn't always happen either which kinda annoys me more haha, it sometimes works and sometimes doesn't :/

I haven't totally given up yet on trying to 100% automate this haha - it's just this one last issue with not completing a server-side socket connection...

Thank you again so far for the assistance on this too, really appreciated
Remco
#10 Posted : Wednesday, May 19, 2021 12:44:12 AM(UTC)
josh-grant;15468 wrote:

You've already been a huge help so far and I've gotten some really good insight. I definitely think if some parameters could be passed in as arguments during installation, like $password and $port etc, it would be a huge win and would mean I could also scale this solution to other teams a lot more easily too. I did actually try creating an image of a working configuration and then running it fresh on a new EC2 instance, but ended up with the same "Negotiating" issue:

(Unable to complete opening a server-side socket connection because the socket was closed before a connection could be fully established)


Do the logs on the server provide any more information on this?
josh-grant
#11 Posted : Wednesday, May 19, 2021 2:38:05 AM(UTC)
Unfortunately with Logging set to "DETAILED", I only get:

Quote:
(Unable to complete opening a server-side socket connection because the socket was closed before a connection could be fully established)


as the last entry in logs (No log output after this point)


** Update:

I've managed to kind of work around this issue by setting the service to "DelayedStartAutomatic" and then waiting a while (tests suggest around 30-35 minutes) before trying to connect to NCrunch via Visual Studio. This "delay" has worked reliably over the last few days, and it isn't a huge issue: I schedule a new server to spin up at 7am every morning and devs don't start connecting till around 8:30-9am, so the NCrunch configuration and service get some time to get ready.

I've also dug into Process Monitor and Event Viewer events to try to understand a bit more about what's happening under the hood, but haven't found anything obvious. I'm a bit pedantic, but I'm happy with achieving a relatively automated solution to spin up NCrunch in AWS :) If you're interested, here's the general idea of the specs:

- Using the c4.2xlarge EC2 instance type (CPU intensive)
- Windows Server 2019 Base
- Running in an internal network / VPC
- Security Group / Firewall only allows NCrunch port ingress for connectivity, and egress to the internet and the internal network
- This solution runs inside an Autoscaling Group so that I can schedule terminations and new instances every morning (off at 7pm, new instance at 7am, weekdays)
- This solution also uses a "launch template" or bootstrap script that automates a silent install of vs_buildtools.msi and ncrunch_grid_node_server.msi, modifies registry values and then finally waits a few seconds before starting up the NCrunch service

Thanks again for the help!
Remco
#12 Posted : Wednesday, May 19, 2021 7:03:33 AM(UTC)
josh-grant;15470 wrote:

I've managed to kind of work around this issue by setting the service to "DelayedStartAutomatic" and then waiting a while (tests suggest around 30-35 minutes) before trying to connect to NCrunch via Visual Studio. This "delay" has worked reliably over the last few days, and it isn't a huge issue: I schedule a new server to spin up at 7am every morning and devs don't start connecting till around 8:30-9am, so the NCrunch configuration and service get some time to get ready.


I have a theory that this might be due to a network initialisation delay related to the AWS system. I vaguely remember having had similar problems when working with AWS in years prior. It may be a driver issue of some kind. It's probably a common problem, and there are likely to be some tips around on how to solve it or reduce the delay.

josh-grant;15470 wrote:

- Using the c4.2xlarge EC2 instance type (CPU intensive)
- Windows Server 2019 Base
- Running in an internal network / VPC
- Security Group / Firewall only allows NCrunch port ingress for connectivity, and egress to the internet and the internal network
- This solution runs inside an Autoscaling Group so that I can schedule terminations and new instances every morning (off at 7pm, new instance at 7am, weekdays)
- This solution also uses a "launch template" or bootstrap script that automates a silent install of vs_buildtools.msi and ncrunch_grid_node_server.msi, modifies registry values and then finally waits a few seconds before starting up the NCrunch service


Great choice of configuration. Nice to be able to spin them up as you need them. I bet you're noticing the improvements in testing responsiveness.