Wow, you've been missing out :)
Generally you are better off going for bigger, chunkier nodes rather than many smaller nodes, unless you are doing something niche (like lots of exclusively used resources or Serial testing). This is because of the build cost each machine needs to pay - every node has to build its own copy of the solution in its workspace before it can run any tests, so that overhead multiplies with the number of machines.
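In case it helps, a rough sketch of what I mean by those niche cases - tests marked up with NCrunch's concurrency attributes. The test framework (NUnit here) and the resource name are just placeholders:

using NCrunch.Framework;
using NUnit.Framework;

public class SharedResourceTests
{
    // Tests sharing the "TestDatabase" resource are never run at the
    // same time as each other ("TestDatabase" is a placeholder name).
    [Test, ExclusivelyUses("TestDatabase")]
    public void WritesCustomerRecord()
    {
        // ... exercise the shared database here ...
    }

    // Serial tests are never run concurrently with any other test.
    [Test, Serial]
    public void RebuildsGlobalCache()
    {
        // ... code that can't tolerate concurrency ...
    }
}

Suites dominated by tests like these don't parallelise the same way ordinary self-contained unit tests do, which is what can shift the balance away from the big-node advice above.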
We've experienced better performance using non-virtualised servers rather than cloud-hosted ones, though this has some clear downsides in terms of scalability, ease of setup, etc.
In terms of the actual specification, this depends entirely on your budget and the size of the test suites you're running. Note that your codebase will expand over time and you'll need to account for this. Choosing more cores will reduce your end-to-end testing time for the entire suite, while choosing higher clock speeds will give you faster inline results while working in your code. If you can, try to get a server with a large amount of RAM - you can then use this to attach a RAM drive to hold the NCrunch workspaces (this gives a significant boost in performance).
Note that if you choose a node with a lower specification than your client machine, it may actually give worse performance on the inline test results when you have it connected (though the overall throughput will definitely be higher).
An interesting thing to try would be to set yourself up with some temporary cloud-based machines of varying capability to see how the capacity affects your experience. In this way you can get an idea of what will work best for your situation before committing to any hardware.