I'm using NUnit 3.13.2.
I can confirm that the tests that seem to cause the biggest problems don't have very expensive setup (relative to other tests that run fine). I just did a test run (the tests ran automatically as they were impacted by code changes), and three different fixtures ran as three different tasks:

Tests    Expected Processing Time    Actual Processing Time
285      00:08:09.666                00:40:24.300
291      00:08:23.093                00:41:36.638
286      00:14:57.182                00:41:34.934

Given expected times that long, I would have expected the engine to break these fixtures up into many parallel tasks on that basis alone. The interesting thing about these tasks is that the Actual Processing Times seem to be WAY off. I wasn't keeping a sharp eye on the clock, but I think they must have finished in something much closer to the expected times. Also, the Expected vs. Actual times for the individual tests within those tasks matched for most if not all of them.
I guess I should mention that we've built up a somewhat complex test infrastructure over the years, with TestBase<T>, UnitTest<T>, and IntegrationTest<T> abstract base classes. These define abstract/virtual methods for overriding and carry the NUnit attribute decorations, the goal being to centralize structure and logic and to share behavior easily. Nearly every test fixture we have inherits from one of these base classes. The point is that most of our tests run a lot of the same setup logic, yet this problem only seems to affect a few fixtures.
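To make the shape concrete, here's a minimal sketch; the class names are ours, but the members are hypothetical simplifications:

```csharp
using NUnit.Framework;

// TestBase<T> centralizes setup; derived fixtures fill in the blanks.
public abstract class TestBase<T> where T : class
{
    protected T Subject { get; private set; }

    // Overridden by concrete fixtures to construct the system under test.
    protected abstract T CreateSubject();

    [SetUp]
    public virtual void BaseSetUp() => Subject = CreateSubject();
}

// Unit-level conventions (fakes, mocks, etc.) hang off this one...
public abstract class UnitTest<T> : TestBase<T> where T : class { }

// ...and integration-level conventions (database, services) off this one.
public abstract class IntegrationTest<T> : TestBase<T> where T : class { }
```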
The particular fixtures exhibiting this problem are themselves derived classes of one particular abstract base class that defines most of the actual test methods, which is why the test counts are so similar. This structure mirrors the structure of the production code classes, which have very similar structures, functions, and constraints. So every test defined in the base class is performed three times, once for each derived class (see the sketch below). I wonder if this structure is somehow affecting how NCrunch decides to batch the tests.
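Continuing the sketch above, the pattern looks roughly like this (all the widget names are made up):

```csharp
using NUnit.Framework;

public interface IWidget { bool DoTheThing(); }
public class AlphaWidget : IWidget { public bool DoTheThing() => true; }
public class BetaWidget : IWidget { public bool DoTheThing() => true; }
public class GammaWidget : IWidget { public bool DoTheThing() => true; }

// All of the actual [Test] methods are declared once, in the abstract base...
public abstract class WidgetBehaviorTests<T> : UnitTest<T> where T : class, IWidget, new()
{
    protected override T CreateSubject() => new T();

    [Test]
    public void DoesTheExpectedThing() => Assert.That(Subject.DoTheThing(), Is.True);
    // ...plus a couple hundred more like it.
}

// ...so each concrete fixture inherits and re-runs the full set of tests.
[TestFixture] public class AlphaWidgetTests : WidgetBehaviorTests<AlphaWidget> { }
[TestFixture] public class BetaWidgetTests : WidgetBehaviorTests<BetaWidget> { }
[TestFixture] public class GammaWidgetTests : WidgetBehaviorTests<GammaWidget> { }
```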
Another potential complication is that many of these tests are parameterized with a ValueSourceAttribute. These tests check, for example, whether the expected behavior is exhibited for the current user when he has a specific role in the system, so some of these tests can run upwards of 30 times apiece. (This accounts for much of those large total test counts.) I do expect that NCrunch may have to run a single parameterized test method up to X times in a single task; I definitely don't expect them all to run in a single task, though.
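Here's a hypothetical example of that shape, again building on the sketches above (the role names and member names are invented):

```csharp
using System.Collections.Generic;
using NUnit.Framework;

public abstract class RoleBehaviorTests<T> : UnitTest<T> where T : class, IWidget, new()
{
    protected override T CreateSubject() => new T();

    // Hypothetical role source -- the real one enumerates ~30 roles.
    public static IEnumerable<string> AllRoles =>
        new[] { "Admin", "Editor", "Viewer" /* ...up to ~30 */ };

    // One [Test] method, expanded by NUnit into one test case per role.
    [Test]
    public void BehavesAsExpectedForRole([ValueSource(nameof(AllRoles))] string role)
    {
        // Hypothetical: impersonate `role` here, then exercise the subject.
        Assert.That(Subject.DoTheThing(), Is.True, $"Unexpected result for role '{role}'");
    }
}
```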
So there are some extra details to chew on. BTW, these are the same test fixtures I mentioned in my last post as having failed quickly in a prior run. This time, all of these tests had succeeded in their prior run, so that's not the problem here (and may not be a problem at all).