Hi,
The situation you've described is one of the primary problem areas NCrunch's distributed processing is intended to address, so I think it would be very much worth your time investigating this to see if you can make it work for you.
The NCrunch solution itself has a total test running time of over 3 hours. This is cut down to about 30 minutes using 38 processors over two machines, with over 99% of tests run in the first 5 minutes and a near-certainty of a green build in even less time. The impact of this on development speed is very significant - it saves hours and hours of time every day.
The benefits you'll receive from distributed processing will depend very much on:
1. The nature of the tests (particularly whether they can be run in parallel on the same machine)
2. The speed of the available hardware
3. The speed of the network connection between the NCrunch client and the available grid nodes
4. The method of allocation of grid nodes between different clients
A critical consideration is the number of nodes in your grid. Virtualization essentially gives you two options, and a whole lot in between:
Option 1: Lots of small nodes (with maybe only a single virtual CPU, or very few CPUs)
Option 2: A few big nodes (with lots of virtual CPUs + fast I/O)
Because the NCrunch engine needs to replicate your solution across the grid and coordinate work between the nodes, having fewer nodes means less overhead and faster response times. In other words, you'd be better off with one giant 32-core Opteron with plenty of RAM and a good SSD than with 50 low-spec VMs each running a single CPU.
If your tests are able to run in parallel on the same machine without tripping each other up, then definitely go for the big box. If the tests require exclusive use of system resources and can't share a single machine well, then go for a greater number of smaller machines. If you have some tests that don't share well but others that do, then try something in between. Once you have a VM image set up for this, you can freely spin up different-sized instances to experiment.
In answer to your last question, NCrunch IS able to share nodes between clients. The engine works using a rotating pull system where each node is responsible for pulling work round-robin from every connected client. This means the resources are split quite fairly, but it does increase the amount of coordination necessary, especially if you have a high-latency connection between your grid and the connected clients. To overcome this, you can partition the grid so that each client has its own set of dedicated nodes. It's possible to balance this against sharing nodes between clients to make more efficient use of resources; for example, you may have 5 devs sharing 10 nodes, while another 5 devs share another 10.
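To make the rotating pull idea concrete, here's a minimal Python sketch of a node pulling work round-robin from several connected clients. This is purely illustrative (the class and method names are my own invention, not NCrunch's internals), but it shows why the scheme is fair: each pull advances the rotation, so no single client can monopolise a node while others have work queued.

```python
from collections import deque

class Client:
    """A connected client holding a queue of pending test tasks.
    (Hypothetical name for illustration, not an NCrunch type.)"""
    def __init__(self, name, tasks):
        self.name = name
        self.pending = deque(tasks)

class GridNode:
    """Illustrative grid node that pulls work round-robin from its clients."""
    def __init__(self, clients):
        self.clients = deque(clients)

    def pull_next_task(self):
        # Check each client at most once per pull, starting from the
        # front of the rotation; skip clients with nothing queued.
        for _ in range(len(self.clients)):
            client = self.clients[0]
            self.clients.rotate(-1)  # next pull starts at the next client
            if client.pending:
                return client.pending.popleft()
        return None  # no connected client has work

# Two clients sharing one node: tasks are interleaved fairly.
node = GridNode([Client("A", ["a1", "a2"]), Client("B", ["b1"])])
print(node.pull_next_task())  # a1
print(node.pull_next_task())  # b1 (B gets a turn before A's second task)
print(node.pull_next_task())  # a2
```

Partitioning the grid, as described above, simply amounts to giving each node a smaller `clients` list, which shortens the rotation and cuts the per-pull coordination cost.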
I've tested the NCrunch engine with up to 100 task processors operating over a moderately sized (200k lines of code) solution on high-end hardware, and found that it was able to keep up after learning the dimensions of the tests. I expect that in a real-world scenario the limits will be set much more by how well the tests can be parallelized and how much good hardware is available.