Hi Sheryl -
If the grid nodes have their capacity configured correctly, there shouldn't be a significant increase in test execution time, unless we're measuring very short periods of time (milliseconds).
You'll notice that in the Processing Queue, NCrunch will tend to group tests together in batches. This is done because there is overhead involved in calling into the test framework (e.g. NUnit), so NCrunch will create small groups of tests that can run sequentially in a batch, based on their execution time. If the 8 tests you've mentioned each take 50ms to execute, they'll likely be placed in the same batch and run sequentially, as this is much more efficient than trying to separate them and call into the test framework 8 times. The situation changes when the tests have a longer execution time (e.g. 4 seconds). In this case, you'll likely see more batches with fewer tests in each, allowing more concurrency. There are other variables here, such as the duration of any test fixture setup routine, but this is the general idea.
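Just to make the trade-off concrete, here's a rough sketch of the kind of grouping involved. This is only an illustration in Python with made-up numbers (the overhead and target durations are hypothetical), not NCrunch's actual batching code:

```python
# Illustrative only: a simplified batching heuristic with made-up numbers,
# not NCrunch's actual algorithm.
FRAMEWORK_CALL_OVERHEAD_MS = 200   # hypothetical cost of one call into NUnit
TARGET_BATCH_DURATION_MS = 500     # hypothetical target size for one batch

def build_batches(test_durations_ms):
    """Group short tests together so they share one framework call."""
    batches, current, current_total = [], [], 0
    for duration in sorted(test_durations_ms):
        if current and current_total + duration > TARGET_BATCH_DURATION_MS:
            batches.append(current)
            current, current_total = [], 0
        current.append(duration)
        current_total += duration
    if current:
        batches.append(current)
    return batches

# Eight 50ms tests fit in one batch, so the framework is only called once.
print(build_batches([50] * 8))    # [[50, 50, 50, 50, 50, 50, 50, 50]]

# Eight 4-second tests each exceed the target, so they land in separate
# batches that can then be spread across grid nodes and run concurrently.
print(build_batches([4000] * 8))  # [[4000], [4000], [4000], ...]

# With one batch, the (hypothetical) framework start-up cost is paid once
# instead of 8 times:
print(FRAMEWORK_CALL_OVERHEAD_MS * (8 - len(build_batches([50] * 8))))  # 1400
```

The point is simply that batching 8 fast tests pays the framework start-up cost once rather than 8 times; the real heuristic considers more than this.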
So if you're looking at one batch of 8 fast tests in the Processing Queue, that batch will always be executed on a single grid node, as it's more efficient to just run the tests together.
If you're looking at, say, 4 batches of 2 slower tests each in the Processing Queue, and the first grid node has a capacity of 4 processing threads, then the first node to request work from the client will receive all 4 batches and will run them concurrently. Because the node runs the tests concurrently, there should be little overall difference compared with other nodes picking up the other batches.
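As a rough back-of-envelope for that scenario (the 8-second figure below is just an assumption of two ~4s tests running sequentially per batch):

```python
# Illustrative wall-clock arithmetic for the 4-batches scenario above.
batch_duration_s = 8        # assumed: two ~4s tests run sequentially per batch
batches = 4

# One node with 4 free processing threads takes all 4 batches and runs them
# side by side:
threads_on_one_node = 4
rounds = -(-batches // threads_on_one_node)            # ceiling division -> 1
wall_clock_one_node_s = rounds * batch_duration_s      # ~8s

# Four nodes picking up one batch each also finish in about one batch length:
wall_clock_four_nodes_s = batch_duration_s             # ~8s

print(wall_clock_one_node_s, wall_clock_four_nodes_s)  # 8 8
```

Either way the wall-clock time is roughly one batch length, which is why there's little overall difference.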
The only time you may see a significant increase in execution time with this setup is where the grid nodes have been configured with a maximum capacity well beyond their actual capability. For example, if you have a low-spec single-CPU VM configured with 8 max processing threads, then this node will try to concurrently execute more tasks than it has the CPU to efficiently process. This results in a significantly higher overall execution time, because the node doesn't have enough power to meet the demand while the other nodes sit idle.
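The same sort of arithmetic shows why oversubscription hurts (again, illustrative numbers only, and it assumes the tests are CPU-bound):

```python
# Illustrative: a single-core VM advertising 8 processing threads picks up
# 8 batches of ~8s of CPU work each.
cpu_cores = 1
batches = 8
batch_cpu_seconds = 8

# The one core just time-slices between the concurrent batches, so they all
# finish around the same, late time:
wall_clock_overloaded_s = batches * batch_cpu_seconds / cpu_cores   # ~64s

# If the batches had instead been spread over 4 nodes with 2 real cores each:
wall_clock_spread_s = batches * batch_cpu_seconds / (4 * 2)         # ~8s

print(wall_clock_overloaded_s, wall_clock_spread_s)
```

In practice there's also context-switching overhead on the overloaded node, so the real figure tends to be worse than the simple division suggests.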