Hi, thanks for posting.
Regarding the long startup times that have been causing issues for you, can you provide any more information on which activities during startup seem to be taking a long time? Broadly, there are 3 different things the engine needs to do before it can run tests:
1. Load the projects that are open in VS (progress is described by the 'Loading projects' indication shown on any of the major tool windows)
2. Build the projects and analyse their assemblies for tests (this happens once the engine has loaded everything, but must be done before tests can be executed; it is shown by the build/analysis tasks in the processing queue).
3. Bootstrap the engine and load the cache file (described by all the tasks displayed on the loading indicator except for the loading of projects)
Possibly it's a combination of all 3, but knowing whether any of the above is more significantly contributing to the problem will help us with our own performance optimisation.
Regarding the nodes not taking much of the load off local, do you notice this more on the initial run-through of the engine, or does it seem to be a pattern across your entire NCrunch session?
The engine doesn't usually initiate connections to the nodes until it has completed its bootstrap sequence. Because there is some synchronisation that needs to happen before a node can service any tasks, this can mean that the nodes will lag a bit behind local in their initial responsiveness.
The integration with grid nodes is structured as a pull-based system. Once a node is synchronised and ready for work, it will send a request to a connected client to provide it with work. On receiving this request, the client examines the contents of its processing queue and will send through a set of tasks that the node is able to process. Because this transfer of tasks involves network overhead, offloading work onto a node is always slower and less responsive than the client's local processor (which is able to pull tasks straight out of memory). To be able to complete a task, the node also needs to be able to transfer all results (including coverage and trace data) over the network. The faster the connection, the less of an issue this generally is.
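To make the pull-based flow a bit more concrete, here is a minimal Python sketch of the idea. It is not NCrunch's actual implementation: the Task and Client classes, the handle_work_request method, the capability strings and the batch size are all hypothetical names made up for illustration. It only shows a node asking for work and the client handing over the queued tasks that node is able to process.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    required_capability: str  # hypothetical: what a node must support to run this task


@dataclass
class Client:
    """Hypothetical stand-in for the client that owns the processing queue."""
    queue: deque = field(default_factory=deque)

    def handle_work_request(self, node_capabilities, batch_size):
        """Answer a node's request with up to batch_size tasks it can process."""
        batch, skipped = [], deque()
        while self.queue and len(batch) < batch_size:
            task = self.queue.popleft()
            if task.required_capability in node_capabilities:
                batch.append(task)
            else:
                skipped.append(task)
        # Tasks the node could not take go back to the front, in their original order.
        self.queue.extendleft(reversed(skipped))
        return batch


client = Client(deque([
    Task("build A", "build"),
    Task("test A1", "test"),
    Task("test A2", "test"),
]))

# A synchronised node that can only run tests asks for up to two tasks.
batch = client.handle_work_request({"test"}, batch_size=2)
print([t.name for t in batch])         # ['test A1', 'test A2']
print([t.name for t in client.queue])  # ['build A']
```

In a real grid the request and the returned batch would each cross the network, which is where the extra latency compared with the local processor comes from.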
One potential issue that can arise is related to the capacity of the engine itself to coordinate and process tasks. In your use case, the engine needs to coordinate and process the results of 56k tests over the space of a single pass. That's a lot of tests, and each one has its own set of coverage data that needs to be merged and mapped into a local database. Because all this data is generated using background runners (and grid nodes), it's possible for the background runners to outrun the engine's ability to coordinate the work and process results in a timely way.
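As a rough illustration of what "outrunning the engine" means, the sketch below models a backlog of unprocessed results. The rates are invented numbers, not measurements from NCrunch, and backlog_over_time is a hypothetical helper. If runners and nodes produce results faster than the engine can merge them, the backlog grows every second no matter how many more runners are added.

```python
def backlog_over_time(produced_per_sec, processed_per_sec, seconds):
    """Return the number of unprocessed results at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # The backlog grows by the surplus of results produced over results processed.
        backlog = max(0, backlog + produced_per_sec - processed_per_sec)
        history.append(backlog)
    return history


# Illustrative figures only: runners deliver 500 results/s, the engine merges 400/s.
print(backlog_over_time(500, 400, 3))  # [100, 200, 300]
# With the engine faster than the runners, no backlog builds up.
print(backlog_over_time(300, 400, 3))  # [0, 0, 0]
```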
It's possible to track this by keeping your cursor over the NCrunch spinner in the corner of your IDE, where you'll be able to see the core engine load and the number of tasks being processed. If the core load is sitting at or near 100% but the number of tasks being processed is not near the max of the bar, this means the engine is overloaded and adding more capacity (in terms of grid nodes) is not likely to be beneficial. In this situation, because the nodes are sitting at the far end of a network connection, they tend to have less priority than the local processor and are more likely to be underutilised. Unfortunately, we don't have any firm guidelines on how much load the engine can handle, as this is extremely variable depending on the environment and the characteristics of your solution. If you have a vast number of fast-executing tests, each covering a reasonable amount of code, you are likely to hit the limits of the engine sooner than if your tests are chunkier and more isolated.
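If it helps, the rule of thumb above can be written down as a small check. This is only a sketch: the numbers would be read by hand from the spinner tooltip (it does not call any NCrunch API), and the 95% and 80% thresholds are my own assumptions for "at or near 100%" and "near the max of the bar".

```python
def engine_looks_overloaded(core_load_pct, tasks_in_progress, max_tasks,
                            load_threshold=95.0, utilisation_threshold=0.8):
    """True when core load is near 100% while the task bar is well short of its maximum.

    Both thresholds are assumed values, not figures published by NCrunch.
    """
    if max_tasks <= 0:
        raise ValueError("max_tasks must be positive")
    utilisation = tasks_in_progress / max_tasks
    return core_load_pct >= load_threshold and utilisation < utilisation_threshold


# Core load pinned near 100% with only 4 of 16 task slots busy: engine is the bottleneck.
print(engine_looks_overloaded(99, 4, 16))   # True
# Moderate core load with a nearly full task bar: more capacity may still help.
print(engine_looks_overloaded(60, 15, 16))  # False
```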
In terms of getting a general overview of your run, I highly recommend doing an export of the Timeline report using the export button on the Tests Window after you've completed a full pass of all your tests.