Is "out of process" host impacted on grid nodes?
GreenMoose
#1 Posted : Tuesday, January 20, 2015 7:14:13 AM(UTC)
Rank: Advanced Member

Just upgraded to 2.11.0, and while I'm seeing an incredible boost in Visual Studio performance and memory consumption, my grid nodes have slowed to a crawl.

On Azure nodes with 7.5GB RAM I previously used up to 20 task runners, but after the upgrade they all seemed unwilling to process anything, and memory usage was > 90%. I decreased them to 5 task runners and now they are fast again.
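A rough way to sanity-check a task runner count is simple memory arithmetic. The sketch below assumes 1 GB = 1024 MB and, as a simplification, attributes all used memory to the task runners (in practice the OS and the grid node service take a share too). With the figures above, 90% of a 7.5GB node spread across 20 runners works out to roughly 345 MB per runner. `average_mb_per_runner` and `max_runners` are hypothetical helpers, not part of NCrunch, and the reserved and per-runner values in the last line are made-up placeholders.

```python
def average_mb_per_runner(total_ram_mb: int, used_percent: int, runners: int) -> float:
    """Average memory per task runner if ALL used RAM is attributed to runners (a simplification)."""
    if runners <= 0:
        raise ValueError("runners must be positive")
    return total_ram_mb * used_percent / 100 / runners


def max_runners(total_ram_mb: int, reserved_mb: int, per_runner_mb: int) -> int:
    """Number of runners that fit after reserving memory for the OS and the grid node service."""
    if per_runner_mb <= 0:
        raise ValueError("per_runner_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    return max(0, available_mb // per_runner_mb)


if __name__ == "__main__":
    node_ram_mb = 7680  # 7.5 GB node from the post, assuming 1 GB = 1024 MB
    print(average_mb_per_runner(node_ram_mb, 90, 20))  # 345.6
    # Hypothetical placeholders: 1536 MB reserved, 300 MB per warmed-up runner.
    print(max_runners(node_ram_mb, 1536, 300))  # 20
```

Plugging in a per-runner figure actually measured (e.g. in Task Manager on a warmed-up node) would make the estimate meaningful; the placeholders above only show the shape of the calculation.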

Could this be an issue related to the "out of process" design change or am I experiencing something else?

(Since we have many projects, you previously suggested using a lot of task runners so they don't have to start up and do the 10s NHibernate initialization every time, depending on which test project etc. is being executed. But I will try 5 and see how it goes.)

Thanks.
Remco
#2 Posted : Tuesday, January 20, 2015 7:33:08 AM(UTC)
Rank: NCrunch Developer

The 2.11 upgrade didn't really target the grid code, but the overall scope of change has been very large, and the grid nodes run on much of the same code as the client. So although I think it's unlikely that 2.11 is responsible for this, I'm not going to rule it out just yet.

It's great to hear that the engine separation greatly improved your VS performance. I had a feeling this would work well in your environment :)

Something to watch out for is the 'spin up' time after a node is first initialised. When the node service starts, it needs to re-index and hash all the files in its snapshot storage. Depending upon the amount of solution data stored on this node, this can take quite a while (15-30 minutes is not unusual for an overloaded node). If you've recently upgraded the node, it might be worth trying another test an hour later to see if you get the same result.
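To give a feel for why spin-up time scales with the amount of stored solution data, here is a minimal sketch of re-indexing a directory by hashing every file in it. NCrunch's actual snapshot layout and hash algorithm aren't described in this thread; SHA-256 and the `hash_snapshot` helper are assumptions for illustration only. The point is that every stored byte has to be read once, so a node holding many snapshots pays proportionally more before it is ready.

```python
from __future__ import annotations

import hashlib
import os


def hash_snapshot(root: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Walk every file under root and return {relative_path: sha256_hex}.

    Illustrative only: assumes SHA-256 and a plain directory tree, which may
    not match how NCrunch actually stores or indexes snapshots.
    """
    digests: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            hasher = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in chunks so large files don't have to fit in memory.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    hasher.update(chunk)
            digests[os.path.relpath(path, root)] = hasher.hexdigest()
    return digests
```

Timing this with `time.perf_counter()` over a folder of comparable size gives a rough sense of the I/O cost alone; the real node service may do more work per file.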

I wonder if it may be worth loading 2.10 back onto the nodes for a side-by-side test with 2.11 to confirm if the performance problem was caused by the upgrade. In theory it should be possible to manually unzip 2.10 to a different location and fire it up using the grid node console runner - if this works, it will save you needing to uninstall 2.11 on the node. We'll need to narrow this problem down through the process of elimination until I can come up with a theory on why the nodes are crawling ...

A few more questions that may help to narrow this down (assuming you can still recreate the problem):
- When the nodes start to crawl, what do you observe for the grid node process itself on the crawling nodes? Does it seem to be using the CPU very heavily? Does the CPU utilisation indicate a single thread overworking?
- By saying memory usage was > 90%, did this seem higher than what you normally observed for the grid node process? Is the memory utilisation concentrated in the grid node process, or distributed throughout the task runner EXEs?
- When the nodes are crawling, try switching the filters on and off in the Tests Window. Does the NCrunch UI seem responsive? Or do you need to wait forever for any filter changes to take effect?
- What do you observe in the processing queue while the grid is crawling? Are the nodes attempting to process a large amount of work, or do they have an unusually minimal number of tasks in progress?
GreenMoose
#3 Posted : Tuesday, January 20, 2015 8:13:21 AM(UTC)
Rank: Advanced Member

Groups: Registered
Remco;6800 wrote:
It's great to hear that the engine separation greatly improved your VS performance. I had a feeling this would work well in your environment :)

Yeah, it feels almost magical :). Before, together with RS 9 (I miss the time when they were focused on bug fixes instead of continually pushing in new features), Visual Studio 2013 grew to 2GB+ pretty rapidly, and adding 1 file took 35 seconds with NCrunch enabled and ~15s with NCrunch disabled.
Now it seems to hold pretty steady at < 1GB so far. Great work here!

Remco;6800 wrote:
Something to watch out for is the 'spin up' time after a node is first initialised. When the node service starts, it needs to re-index and hash all the files in its snapshot storage. Depending upon the amount of solution data stored on this node, this can take quite a while (15-30 minutes is not unusual for an overloaded node). If you've recently upgraded the node, it might be worth trying another test an hour later to see if you get the same result.


Hrm, that might be it. I just recalled that the other local grid nodes, which weren't using 20 task runners, were "crawling" as well. I've now bumped the Azure grid nodes back up to 20 task runners: memory consumption seems normal (~80%) and running tests actually "feels" faster than before the upgrade :).

So I'll keep an eye out in case this happens again, and give the next upgrade its "spin up" time before reporting crawling issues.

Thanks.