Great, this is really useful information.
With the node at 100% CPU, the problem is almost certainly related to its ability to keep up with the number of execution threads assigned to it. If you drop the max number of processing threads on the VM node to something lower (e.g. 2 or 3), you may see an improvement in response times.
As a general target you'll want to see the node's CPU usage fluctuating between 70% and 100% while it's under maximum load. If it stays stuck at 100%, then most likely it's unable to keep up with the demand. This might be a good thing for overall throughput (i.e. you're not wasting any CPU), but it's bad for response times, as tests and builds will take longer to finish. The node also needs a bit of CPU to be able to manage the connections and exchange data with clients.
It is sometimes useful to write a series of very CPU-intensive tests (just a big for-loop that concatenates strings is often enough), then set these to work on the grid node. Compare the execution time of these tests with what you see when running them locally on your workstation. If there is a big difference in the processing times, then it's a strong sign that the node is either underpowered relative to the workstation, or it's overloaded.
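As a rough illustration of the kind of workload I mean, here is a minimal sketch. It's written in Python purely for brevity; in your solution it would be an ordinary test in whatever framework you already use. The function names and the iteration count are made up for the example, so tune the count until a run takes long enough to measure reliably.

```python
import time


def cpu_heavy_concat(iterations: int = 200_000) -> int:
    """Keep the CPU busy by growing a string one piece at a time.

    Returns the final length so the work has an observable result.
    """
    text = ""
    for i in range(iterations):
        text += str(i % 10)  # appends exactly one character per iteration
    return len(text)


def time_workload(iterations: int = 200_000) -> float:
    """Return the wall-clock seconds taken by one run of the workload."""
    start = time.perf_counter()
    cpu_heavy_concat(iterations)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Run this once on the workstation and once on the node, then compare.
    print(f"elapsed: {time_workload():.3f}s")
```

Run the same workload in both places with the same iteration count. A small difference is expected; a large one points to the node being underpowered or overloaded.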
What is the nature of the virtualisation environment you're using to run the node VMs? Do you have any opportunity to consolidate resources into fewer but larger VMs? Doing so will reduce the overhead of the engine and it may give you a bit more power to work with.
Considering the size of your solution and the specification of your workstations, the benefits of using the NCrunch grid for you may be limited to very specific scenarios that would depend upon what you are trying to test. Your workstations will easily outperform the node VMs in their current spec, and they aren't shared resources, so you may want them to be handling the bulk of your processing while delegating very specific tests to the nodes. Underpowered nodes can sometimes be useful for running tests that are problematic or inconvenient to run on workstations. For example, if you have tests that need to interact with the UI, having them popping up windows all over a workstation's desktop can be a problem for continuous testing, but not when they are run on a remote server. It may also be useful to use the nodes to test the performance of code that may behave differently in environments with constrained resources (e.g. testing for race conditions in multi-threaded code). You can use capabilities to determine where the tests should be run on the grid.
Something else to consider is that for some solutions, additional processing power can have limited (or even negative) benefit. For example, if you have a solution with 20,000 tests with a total execution time of 20 seconds to run synchronously end-to-end, the overhead of splitting up and managing all of these tests across multiple machines would likely increase the overall processing time well beyond the normal 20 seconds. This is in contrast to a solution with 20,000 tests where each test takes around 5 seconds to run (total end-to-end time of 28 hours), where splitting the tests up across multiple machines could achieve a very significant reduction in overall processing time.
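To put rough numbers on this, here is a back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions: the per-test overhead figure (0.01 seconds), the machine count (4), and the idea that coordination overhead is paid per test and not spread across machines are all hypothetical values chosen for illustration, not measurements of how NCrunch behaves.

```python
def sequential_time(test_count: int, seconds_per_test: float) -> float:
    """Total time to run every test one after another on a single machine."""
    return test_count * seconds_per_test


def distributed_time(test_count: int, seconds_per_test: float,
                     machines: int, overhead_per_test: float) -> float:
    """Estimated time when the tests are split evenly across machines.

    Assumption: execution divides perfectly between machines, while the
    per-test coordination overhead does not.
    """
    run_time = test_count * seconds_per_test / machines
    overhead = test_count * overhead_per_test
    return run_time + overhead


if __name__ == "__main__":
    tests = 20_000
    # (label, seconds per test): 0.001s each gives about 20s end-to-end,
    # 5s each gives 100,000s (roughly 28 hours).
    for label, per_test in [("fast tests", 0.001), ("slow tests", 5.0)]:
        local = sequential_time(tests, per_test)
        grid = distributed_time(tests, per_test, machines=4,
                                overhead_per_test=0.01)
        print(f"{label}: local {local:,.0f}s, distributed {grid:,.0f}s")
```

With these assumed figures the fast suite gets slower when distributed, because the overhead dwarfs the work being shared out, while the slow suite drops to roughly a quarter of its original time.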