Optimize reuse of TestHost*.exe processes
GreenMoose
#1 Posted : Friday, June 15, 2018 2:37:20 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 6/17/2012(UTC)
Posts: 390

Thanks: 85 times
Was thanked: 40 time(s) in 39 post(s)
[v3.17.0.2]
I have recently come back to working in a large solution (with 14k tests), and I notice that whenever I run a non-trivial number of tests, the throughput with NCrunch is dreadfully slow (compared to how I remember it).

I noticed that TestHost processes are constantly being created and terminated. If I recall correctly, NCrunch used to reuse processes more often back in the day?

See screencast video example http://recordit.co/xA0AKihvAB
("Rows" at the top right = SQL statements sent to the DB. The tests themselves should be fairly light on CPU, since SQL Server should do most of the work, so the CPU spikes should be when the NHibernate factory is initialized.)
[Screenshot from the screencast]

It seems like pretty much every "Run X Tests" batch will spawn a new TestHost process (which in turn takes about 8s to initialize due to NHibernate config spin-up).
I have tried to set the settings "as high as I can" (8-core machine), but it does not seem to matter:
[Screenshot: NCrunch performance settings]


Core load is pretty much idle at 0%. I've disabled all grid nodes, and the solution has 15 test projects (to avoid having one giant test project that takes forever to analyze/compile).

Any ideas?

Thanks.
Remco
#2 Posted : Saturday, June 16, 2018 12:54:59 AM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 5,277

Thanks: 695 times
Was thanked: 856 time(s) in 814 post(s)
Thanks for sharing this problem. Is there any chance you could submit a bug report? The volume of log data here might reduce the effectiveness of the bug report system, but there's still a chance I might be able to find useful information in there.

Are you making use of any NCrunch.Framework attributes?
GreenMoose
#3 Posted : Monday, June 18, 2018 7:07:07 AM(UTC)
Remco;12345 wrote:
Thanks for sharing this problem. Is there any chance you could submit a bug report? The volume of log data here might reduce the effectiveness of the bug report system, but there's still a chance I might be able to find useful information in there.

Bug report submitted.

Remco;12345 wrote:
Are you making use of any NCrunch.Framework attributes?

Yes, it seems we use Explicit, ExclusivelyUses, RequiresCapability and Serial.
Remco
#4 Posted : Tuesday, June 19, 2018 1:32:52 AM(UTC)
Thanks for sharing the report.

In the approximately 2 minute capture included in the report, the engine successfully re-used test processes 11 times, and in 4 cases needed to start new ones. I couldn't find anything unusual about the process signatures that would force the engine to unnecessarily restart them.

How have you configured NHibernate to start up? Is this being done once per process using static state?

If you have the resources, it might be worth further increasing the number of processes you have in your process pool. It's quite possible that the distribution of tests between your multiple test projects is creating a scenario where the engine is pushing back and forward between different types of process to support its concurrency demands, and that the process pool isn't able to store enough of them to work efficiently.
GreenMoose
#5 Posted : Tuesday, June 19, 2018 6:48:36 AM(UTC)
Remco;12351 wrote:
How have you configured NHibernate to start up? Is this being done once per process using static state?

Yeah, once per process using static state.

When I reran it this morning to verify (although I had rebooted my computer yesterday), I could not reproduce it either (it reuses the processes and does not look anything like the screencast in my initial post).
I tried wiping the NCrunch cache as well, but it didn't change the outcome. Maybe overall memory usage in the OS was the culprit.

Anyhow, I'll leave this for now since it seems to be working. Thanks.
GreenMoose
#6 Posted : Tuesday, June 19, 2018 1:42:52 PM(UTC)
A follow-up from a test I did with grid nodes connected, just FYI :)

I created a simple counter in a shared file for the tests that explicitly use the database (which requires NHibernate spin-up). This file was not touched by non-DB tests, and the number in it was bumped on each new process.
I had 1 local runner and 2 grid node runners, i.e. total 3 test hosts.

The overall duration for about 6 000 database tests was about 15 minutes, and the counter ended up at 72, meaning that at least 72*8 seconds = ~10 minutes in total (roughly 3 minutes per host, if we assume the DB tests were distributed evenly across the 3 hosts) was spent in static process initialization.
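The overhead estimate in this post can be checked with a few lines of Python. The inputs are the figures from the post (72 process starts, ~8 s of spin-up each, 3 test hosts, ~15 minute run). The even split across hosts is the same assumption the post makes, and since the counter only records process starts, the result is a lower bound.

```python
# Figures from the post; the even split across hosts is an assumption.
process_starts = 72      # final value of the shared counter
init_seconds = 8         # approximate NHibernate spin-up per new process
hosts = 3                # 1 local runner + 2 grid node runners
run_minutes = 15         # overall duration of the ~6 000 DB tests

total_init_minutes = process_starts * init_seconds / 60   # across all hosts
per_host_minutes = total_init_minutes / hosts              # wall-clock per host
share_of_run = per_host_minutes / run_minutes              # fraction of the run

print(f"total init: {total_init_minutes:.1f} min")
print(f"per host:   {per_host_minutes:.1f} min ({share_of_run:.0%} of the run)")
```

Under these assumptions each host spends about a fifth of the 15-minute run just initialising new processes.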
Remco
#7 Posted : Tuesday, June 19, 2018 11:34:50 PM(UTC)
When using grid nodes, the 'Max test runners to pool' setting is controlled by the configuration on the grid node itself (the client setting won't affect the remote machines). So unless you have this setting up quite high on your grid nodes, the pool probably isn't large enough to accommodate 15 test projects.

The logic of NCrunch regarding the re-use of processes is as follows:

When NCrunch needs a test process, it constructs a signature containing critical information about the test process, including the test project being tested, the version of the source code, the project/assembly references included, environment variables, target platform, etc.

NCrunch first checks the process pool to see if there is any process with a matching signature. If there is, this process is retrieved from the pool and is used for the test run.

If not, NCrunch will start a new process with the specified signature.

When the test run completes, the process is stored in the process pool. If the process pool has exceeded the maximum size, the oldest process in the pool will be terminated and removed from the pool.

The maximum size of the process pool is determined by the max number of processing threads, plus the 'Max test runners to pool' setting. NCrunch has two process pools, one for build tasks and one for test tasks. This means you'll likely see quite a few NCrunch processes running at any one time, especially with lots of execution threads.
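The reuse rules listed in this post can be sketched as a small Python class. This is an illustration under stated assumptions, not NCrunch's implementation: the `Signature` fields are a hypothetical subset of what the post says goes into a signature, the capacity is applied to idle processes only, and "oldest" is read as "idle in the pool the longest".

```python
from collections import OrderedDict
from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    """Hypothetical subset of the fields a process signature might contain."""
    test_project: str
    source_version: int
    platform: str = "x64"

class ProcessPool:
    """Sketch of the reuse/eviction rules described above (not NCrunch's code)."""

    def __init__(self, max_threads: int, max_runners_to_pool: int):
        # Capacity = processing threads + 'Max test runners to pool'.
        self.capacity = max_threads + max_runners_to_pool
        self._idle = OrderedDict()  # process id -> Signature, longest-idle first
        self._next_id = 0
        self.started = 0
        self.terminated = 0

    def acquire(self, sig: Signature) -> int:
        # Reuse an idle process whose signature matches exactly, if any.
        for pid, idle_sig in self._idle.items():
            if idle_sig == sig:
                del self._idle[pid]  # safe: we return before iterating further
                return pid
        # No match: start a new process with the requested signature.
        self._next_id += 1
        self.started += 1
        return self._next_id

    def release(self, pid: int, sig: Signature) -> None:
        # After the test run, the process goes back into the pool.
        self._idle[pid] = sig
        # Over capacity: terminate the process that has been idle the longest.
        while len(self._idle) > self.capacity:
            self._idle.popitem(last=False)
            self.terminated += 1
```

Any change to a signature field (for example a new source version) makes an otherwise identical pooled process unusable, so a new one is started.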

In your case, you have 15 test projects, all with their own tests being executed and likely interspersed throughout the test pipeline. This means the 'Max test runners to pool' setting would need to be set to 15*6-6 = 84 to avoid any chance of a process being discarded during the run. In practice, it should be possible to get a clean run with a much lower value, because it's extremely unlikely that the test pipeline would be arranged in a way that swaps all 6 execution threads between every test project in sequence. It's something you'll need to feel out, as the optimal value depends entirely on the execution sequence of your tests, which is determined by the running time of your tests and your patterns of change throughout the solution.
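Evaluating the worst-case expression 15*6-6 for this thread's numbers (15 test projects, 6 execution threads), keeping in mind that the pool already gets one slot per processing thread on top of the setting. The function name is just for illustration.

```python
def worst_case_runners_to_pool(test_projects: int, threads: int) -> int:
    """Worst case: every thread may need a process for every test project.
    The pool already gets `threads` slots, so the setting covers the rest."""
    total_processes_needed = test_projects * threads
    return total_processes_needed - threads

setting = worst_case_runners_to_pool(test_projects=15, threads=6)
print(setting)      # 84
print(setting + 6)  # 90 pooled processes in total (threads + setting)
```

As noted above for grid nodes, this value would have to be configured on each node itself, since the client setting does not apply to remote machines.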
GreenMoose
#8 Posted : Thursday, June 21, 2018 7:01:49 AM(UTC)
Thanks. Yeah, it's a pity that the tradeoff of having manageable test projects (i.e. reasonable compile/processing times) has the side effect of making large batch test runs slower due to static one-time initializations.

I guess the basic restriction here is that one cannot load multiple test projects into one process, for technical reasons? (e.g. relying on the NUnit test runner etc.)
(Otherwise I guess a "burst mode" setting, used for e.g. "run all tests" or when using ncrunch.exe, could have one process with all test projects?)

Another option could be an attribute ("burst mode" or similar) that schedules tests prioritized per test project. E.g. if I put that attribute on all my DB tests (which all explicitly use the same "database" resource and thus cannot run in parallel, but all share the same static spin-up cost due to DB access), and these tests live in 6 test assemblies, then testProcess1 would be dedicated to the project1 tests until they are done, then continue with the queue of project2 tests, etc.?
Remco
#9 Posted : Thursday, June 21, 2018 7:12:25 AM(UTC)
It's only really possible to have one test project per process. This is because each test project can have an entirely different dependency structure or even target platform, so it wouldn't be reliable for us to try and re-use the same process to load multiple test projects. Test runners also don't do this outside of NCrunch, so we'd be opening the door to some really random compatibility issues.

The root issue here, as you've identified, is with the prioritisation of tests. In an ideal world, NCrunch would be smart enough to know that recycling test processes in your situation is very expensive, and it would feed metrics from this into the test pipeline to balance the recycling of processes with the need to return relevant results quickly. In practice, this is extremely challenging to do because NCrunch has no reliable way of knowing that your test processes are expensive, since the spin-up time is inside the execution of the first test in the batch. NCrunch's test pipeline builder is also one of the most complex areas of the product and it's also the hardest to squeeze performance out of. It would be counterproductive for the pipeline builder to be brilliantly intelligent with its creation of the pipeline, but be so slow that it would be faster to run the tests end-to-end than actually arrange them in optimal sequence.

Something I could suggest is to try and organise your tests between projects so that tests with a similar running time will be inside the same project. In this way, NCrunch will tend to cluster the execution tasks for these tests together in the processing queue and you'll see less recycling of processes. I guess the other option might be to purchase more RAM or boost the size of your page file so that you can afford to keep a very large process pool.
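To illustrate why clustering helps, here is a deliberately simplified Python simulation: one execution thread, one process per test project, and an idle pool that evicts the longest-idle process when full. It does not model how NCrunch actually builds its pipeline, and the project names and batch counts are made up for illustration.

```python
from collections import OrderedDict

def count_process_starts(order, pool_capacity):
    """Count new processes started when batches run in `order` on one thread.
    Simplified model: one process per test project, no source changes."""
    idle = OrderedDict()  # project -> None, longest-idle first
    starts = 0
    for project in order:
        if project in idle:
            del idle[project]  # reuse the pooled process for this project
        else:
            starts += 1        # no match in the pool: start a new process
        idle[project] = None   # the process returns to the pool afterwards
        while len(idle) > pool_capacity:
            idle.popitem(last=False)  # terminate the longest-idle process
    return starts

projects = [f"Proj{i}" for i in range(1, 7)]  # hypothetical 6 test projects
batches_per_project = 5

interleaved = projects * batches_per_project  # Proj1..Proj6, Proj1..Proj6, ...
clustered = [p for p in projects for _ in range(batches_per_project)]

print(count_process_starts(interleaved, pool_capacity=3))  # 30
print(count_process_starts(clustered, pool_capacity=3))    # 6
print(count_process_starts(interleaved, pool_capacity=6))  # 6
```

With the interleaved order and a pool of 3, every batch starts a fresh process. Clustering the same batches by project, or enlarging the pool to hold all 6 projects, brings this down to one start per project.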
GreenMoose
#10 Posted : Thursday, June 21, 2018 7:50:20 AM(UTC)
Remco;12369 wrote:
Something I could suggest is to try and organise your tests between projects so that tests with a similar running time will be inside the same project.

We are pretty heavy on integration testing (~6 000 DB tests), so this is almost what we started out with, but it makes TDDing with NCrunch pretty much impossible due to the compilation/processing times of such a large test project (it took over 1 minute when I started splitting it up into multiple test assemblies).

Remco;12369 wrote:
It would be counterproductive for the pipeline builder to be brilliantly intelligent with its creation of the pipeline, but be so slow that it would be faster to run the tests end-to-end than actually arrange them in optimal sequence.

But wouldn't it make sense to have 2 different "pipeline builder" modes? One that focuses on feedback time when TDDing in Visual Studio, and another that focuses on burst mode for "run all tests" via e.g. ncrunch.exe?
Assuming, of course, that you have a fairly complicated test bed with static test assembly initialization costs.

I think such a "simple pipeline mode" with ncrunch.exe would be awesome on a CI system with connected grid nodes, since I could basically add many low-cost single-task grid nodes (e.g. Docker containers) and they would only speed things up thanks to process reuse, rather than draining performance :)

(For instance, a test run of only my 6k DB tests with NCrunch on the CI system currently takes 45 minutes, while a complete test run (12 500 tests) with TeamCity's built-in test runner takes 47 minutes. That is without code coverage, but on TeamCity this isn't that important since we don't have any coverage integration with TC.)

It would be awesome if I could add 1 test node and practically get 22 minutes instead, or 4 and get 11 minutes, etc. But as long as the test nodes are busy with process initializations, I guess this remains a distant dream :)
(I have not yet tested grid node usage in practice due to the node-config issue, though; maybe it already gives impressive throughput differences.)
Remco
#11 Posted : Thursday, June 21, 2018 10:57:48 PM(UTC)
What you're describing is essentially this setting - Pipeline Optimisation Priority.

Though the setting value of 'Throughput' is currently inadequate for you because it doesn't factor in the cost of reinitialising test processes.

Every solution is different. In a scenario where a solution contains many long-running tests without expensive process initialisation, the current system works very well because all available capacity is spent on useful execution rather than waste. If we were to revise the approach to assume that all process initialisation is expensive, such a scenario wouldn't function as well, because longer tests could be run later in the pipeline and the overall run time could be much longer. So what we really want is an adaptive system.

Even though we've made big improvements to the console tool over the last few months, it is still technically an adapted headless continuous engine. When working continuously under VS, the pipelining of tests is much simpler because you can just prioritise the ones that are important. This is because it's usually safe to assume that the user doesn't want fast end-to-end times; they would prefer to get the most relevant results earlier instead. In the console tool scenario, the problem becomes much harder. Suddenly we need to find ways to compress the pipeline to maximise concurrency and reduce waste as much as possible while early reporting of relevance is much less important.

So realistically, we're staring at a mountain of work here. And probably quite a bit of redesign. This will require much thought and long term planning. I don't think a magic configuration setting is the answer to this problem.