Thanks, Remco, for the continued top-notch support. I really appreciate your prompt replies, and I'm sorry for sending so many questions your way.
Remco;10894 wrote:The problems you've described I think are quite different in nature to Matthew's. The exception you've provided looks to be thrown in the server side processing code. I don't think that this error is network related. Probably you've managed to find a flaw in this code that may be surfaced by a certain sequence of actions or possibly the structure of your solution. How often do you see this error? Has it started occurring recently, or has it been going on for some time?
We are just getting started with trying out the node farm, and I've been seeing it since we started. It does seem to happen more often after we've first had a few successful test runs. We're not yet at the point of continuous testing, but have instead been mostly kicking off SpecFlow test suite trials manually and using NCrunch to parallelize the load. In terms of possibly unusual usage, those tests run for a minute or longer and use Selenium to drive Firefox. This does all seem to work, though.
Part of why I thought it might be network related is that today I've been working via VPN, and my connection has seemed slow and slightly unstable. At the same time, I've been seeing that Object Not Set error pop up on all of the nodes (but not simultaneously), even early on while they were still first "Initializing". The prior occurrences, however, were while I was on the office LAN.
Remco;10894 wrote:I may have a more concrete explanation for the problem of nodes being stuck in 'Negotiating'. When NCrunch runs tests on the grid node, it spawns a number of child processes (i.e. nCrunch.TestHost, nCrunch.BuildHost, etc). Under later versions of VS, these child processes can indirectly spawn new child processes through the tool stack (for example, building a project always starts up VBCSCompiler.exe). These third-tier child processes don't have a lifespan under NCrunch's control, but are often still considered to be children of the grid node root process in the Windows process hierarchy. If a process is suddenly terminated or restarted without completely closing down open sockets, Windows will usually close the sockets itself when the process terminates. However, there is an exception to this if the process being terminated has children that are still active, in which case Windows keeps the sockets reserved for some reason and anything remotely connecting to them gets a 'ghost' connection.
I've yet to find a clean way to handle the above scenario. The grid node does clean up its socket connections on termination, but sometimes a forceful restart may kick in before this happens. Keep an eye out for this and see if it might be the cause of some of your problems.
Ah, this makes some sense. Indeed we're on VS 2017. I don't mind some kind of additional workaround process to mitigate that issue, but what should we do? Is there anything better than restarting the VM when this happens? Is there any way to restrict our usage to lessen the chance we trigger it? It would be much better if we didn't have to restart the NCrunch service, but could instead manually kill the 'ghost' somehow.
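To make the question concrete, here is a rough sketch of the kind of manual check I have in mind: take a snapshot of the process table and list everything that still sits underneath the grid node's root process, so that leftovers such as VBCSCompiler.exe could be spotted and closed by hand. Everything specific in it is an assumption on my part: the root process name `GridNodeRoot.exe` is a placeholder, the PIDs are made up, and I don't know whether closing these processes actually releases the reserved sockets.

```python
from collections import defaultdict


def find_descendants(processes, root_pid):
    """Return (pid, name) for every process below root_pid in the parent/child tree.

    `processes` is an iterable of (pid, parent_pid, name) tuples.
    """
    # Index the snapshot by parent PID so children can be looked up directly.
    children = defaultdict(list)
    for pid, parent_pid, name in processes:
        children[parent_pid].append((pid, name))

    found = []
    seen = {root_pid}  # guards against cycles caused by stale or reused PIDs
    stack = [root_pid]
    while stack:
        parent = stack.pop()
        for pid, name in children.get(parent, []):
            if pid not in seen:
                seen.add(pid)
                found.append((pid, name))
                stack.append(pid)
    return found


# Hypothetical snapshot: the PIDs and the root process name are invented.
snapshot = [
    (100, 4, "GridNodeRoot.exe"),        # placeholder for the grid node root process
    (200, 100, "nCrunch.BuildHost.exe"),
    (300, 200, "VBCSCompiler.exe"),      # third-tier child started by the build
    (400, 4, "explorer.exe"),            # unrelated process
]

lingering = find_descendants(snapshot, 100)
for pid, name in lingering:
    print(pid, name)
```

On a real node the snapshot would have to come from Windows itself, for example from the `ProcessId`, `ParentProcessId` and `Name` properties of `Win32_Process`. One caveat I'm aware of: Windows keeps the recorded parent PID even after the parent has exited, and PIDs can be reused, so a match on parent PID alone is only a hint and not proof of a real parent/child relationship.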