Exit Code 2 issue when connecting to grid nodes
Phonesis
#1 Posted : Friday, July 22, 2016 8:59:54 AM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 4/14/2016(UTC)
Posts: 32
Location: United Kingdom

Was thanked: 3 time(s) in 3 post(s)
Hi Remco,

We are suddenly getting Exit Code 2 failures on some of our NCrunch grid runs in TeamCity. We have a single controller machine which calls 3 grid nodes. In the build log we are seeing:

[05:05:18][Step 5/5] [05:05:19.0355-Core-5] All projects have been loaded
[05:05:19][Step 5/5] [05:05:19.6056-Core-5] Queuing 37 tests for passive execution
[05:05:19][Step 5/5] [05:05:19.9796-?-22] Ceasing to send messages because of an error (was the connection closed?): System.NullReferenceException: Object reference not set to an instance of an object.
[05:05:19][Step 5/5]
[05:05:19][Step 5/5] Server stack trace:
[05:05:19][Step 5/5] at .Write(Byte[] , Int32 , Int32 )
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.Zlib.DeflateStream.Write(Byte[] buffer, Int32 offset, Int32 count)
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.BidirectionalStream.Write(Byte[] buffer, Int32 offset, Int32 count)
[05:05:19][Step 5/5] at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs)
[05:05:19][Step 5/5] at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink)
[05:05:19][Step 5/5]
[05:05:19][Step 5/5] Exception rethrown at [0]:
[05:05:19][Step 5/5] at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase)
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.BidirectionalStream. .EndInvoke(IAsyncResult )
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.BidirectionalStream.EndWrite(IAsyncResult asyncResult)
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.GridMessageSender. (IAsyncResult )
[05:05:19][Step 5/5] [05:05:20.0046-Core-5] Connection established with remote grid node at UK-DEVBA05
[05:05:19][Step 5/5] [05:05:20.0306-?-22] Ceasing to send messages because of an error (was the connection closed?): System.NullReferenceException: Object reference not set to an instance of an object.
[05:05:19][Step 5/5]
[05:05:19][Step 5/5] Server stack trace:
[05:05:19][Step 5/5] at .Write(Byte[] , Int32 , Int32 )
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.Zlib.DeflateStream.Write(Byte[] buffer, Int32 offset, Int32 count)
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.BidirectionalStream.Write(Byte[] buffer, Int32 offset, Int32 count)
[05:05:19][Step 5/5] at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs)
[05:05:19][Step 5/5] at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink)
[05:05:19][Step 5/5]
[05:05:19][Step 5/5] Exception rethrown at [0]:
[05:05:19][Step 5/5] at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase)
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.BidirectionalStream. .EndInvoke(IAsyncResult )
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.BidirectionalStream.EndWrite(IAsyncResult asyncResult)
[05:05:19][Step 5/5] at nCrunch.Core.Grid.Connectivity.GridMessageSender. (IAsyncResult )
[05:05:19][Step 5/5] [05:05:20.0826-Core-5] Connection established with remote grid node at UK-DEVBA04
[05:05:19][Step 5/5] [05:05:20.0946-Core-5] Connection established with remote grid node at UK-DEVBA06
[05:05:21][Step 5/5] [05:05:21.9839-?-1] Reporting engine execution results
[05:05:24][Step 5/5] [05:05:24.4353-?-1] Shutting down engine
[05:05:41][Step 5/5] [05:05:41.97-?-1] Returning result: TestFailure
[05:05:42][Step 5/5] Process exited with code 2
[05:05:43][Step 5/5] Step Rerun failed tests using NCrunch grid (Command Line) failed


The issue is intermittent: right now it happens in roughly 1 in 3 runs. Any ideas?
Remco
#2 Posted : Friday, July 22, 2016 9:16:39 AM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 7,145

Thanks: 959 times
Was thanked: 1290 time(s) in 1196 post(s)
Hi, thanks for sharing this issue.

The grid connectivity exception you're seeing above is a red herring. NCrunch will report grid connectivity issues into the log but it won't consider them to be errors for the purposes of the console tool exit codes.

Connectivity issues with grid nodes can happen in many different areas of the protocol (depending on where the connection was cut). When these occur, the engine will simply recover by attempting to reconnect while continuing to run tests on other available nodes.

I suspect that this is being caused by an intermittent test failure. Do you have any tests showing in the report as failed?
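
To tell the two apart for a given run, the build log can be scanned for the console tool's own result line separately from the connectivity noise. A minimal Python sketch, assuming the log lines look like the excerpt in the first post; `summarise_log` and `SAMPLE_LOG` are illustrative names, not part of NCrunch:

```python
import re

# Patterns taken from the wording of the log excerpt in the first post.
RESULT_RE = re.compile(r"Returning result: (\w+)")
CONNECTED_RE = re.compile(r"Connection established with remote grid node at (\S+)")
SEND_ERROR_RE = re.compile(r"Ceasing to send messages because of an error")

# A few lines copied from that excerpt, used here only as sample input.
SAMPLE_LOG = """\
[05:05:19][Step 5/5] [05:05:19.9796-?-22] Ceasing to send messages because of an error (was the connection closed?): System.NullReferenceException: Object reference not set to an instance of an object.
[05:05:19][Step 5/5] [05:05:20.0046-Core-5] Connection established with remote grid node at UK-DEVBA05
[05:05:19][Step 5/5] [05:05:20.0826-Core-5] Connection established with remote grid node at UK-DEVBA04
[05:05:41][Step 5/5] [05:05:41.97-?-1] Returning result: TestFailure
[05:05:42][Step 5/5] Process exited with code 2
"""


def summarise_log(text):
    """Collect the final result, connected nodes and send-error count."""
    result = None
    nodes = []
    send_errors = 0
    for line in text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            result = match.group(1)  # the last result line wins
        match = CONNECTED_RE.search(line)
        if match:
            nodes.append(match.group(1))
        if SEND_ERROR_RE.search(line):
            send_errors += 1
    return {"result": result, "connected_nodes": nodes, "send_errors": send_errors}


if __name__ == "__main__":
    print(summarise_log(SAMPLE_LOG))
```

If `result` is `TestFailure` while `send_errors` is non-zero, the two are worth investigating separately, since the connectivity errors on their own should not change the exit code.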
Phonesis
#3 Posted : Friday, July 22, 2016 10:28:50 AM(UTC)
Failures are reported in the HTML output but they are from previous runs so hard to determine if a particular test is the cause. No reference to a particular test failing in the build log either. It seems to load the files up ok, connect to grid nodes, then bomb out. But will work fine if I try to re-run it again. It just seems random/intermittent.
Remco
#4 Posted : Friday, July 22, 2016 11:32:11 AM(UTC)
Phonesis;9026 wrote:

Failures are reported in the HTML output but they are from previous runs so hard to determine if a particular test is the cause. No reference to a particular test failing in the build log either. It seems to load the files up ok, connect to grid nodes, then bomb out. But will work fine if I try to re-run it again. It just seems random/intermittent.


The reporting of data from previous test runs in the HTML report shouldn't normally happen. The tool should run through all tests selected by the chosen engine mode, and any test that isn't run should have a status of 'Not Run'. Are you certain the failures you're seeing on the report are old ones? If this is indeed the case then we should investigate this.
Phonesis
#5 Posted : Friday, July 22, 2016 11:51:21 AM(UTC)
In the AllResults.Html report the Build Results tab shows that no project was run and every project in solution is yellow / not run.

In the Test Results tab, it shows results but they are from a previous run. We cache the output so it is able to know what tests previously failed etc.
Remco
#6 Posted : Friday, July 22, 2016 12:21:43 PM(UTC)
Phonesis;9031 wrote:

In the AllResults.Html report the Build Results tab shows that no project was run and every project in solution is yellow / not run.


This seems like it may be the source of all the problems. Is your end-to-end run entirely dependent on remote grid nodes? If so, a serious connection issue may very well make the engine decide that there is no way to run any more tests or builds, so it just stops early.
Phonesis
#7 Posted : Friday, July 22, 2016 12:58:15 PM(UTC)
It is yeah. Doesn't run locally, just on grid machines. Interesting.
Remco
#8 Posted : Friday, July 22, 2016 11:43:32 PM(UTC)
I've been having a bit of a think about this issue ... I think it highlights a hole in the processing behaviour in the console tool.

If the console tool is fully dependent on grid nodes for its processing, and there is a connection problem that results in these nodes being disconnected, the tool will assume that processing cannot continue and it will terminate the run. The intention here was to prevent the tool from hanging indefinitely if it had tasks queued that couldn't be processed on any available node (which it actually did do in early iterations of the design).

Of course, in a situation where grid connections could soon be re-established, it would make sense for the tool to wait for a fixed period of time to see if it could re-establish a connection before terminating its run. This would require a code change to the tool. I've noted it down for review.

You might find that you can reduce the chances of this happening in your build by adding additional grid nodes, or turning on local processing (even 1 processing thread would be enough). This would keep the testing session alive until it reconnects to your main node.
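
The wait-before-terminating idea can be sketched outside the tool. This is a generic Python illustration, not NCrunch's implementation; `wait_for_reconnect`, `try_connect`, `grace_seconds` and `poll_seconds` are hypothetical names chosen for the example:

```python
import time


def wait_for_reconnect(try_connect, grace_seconds, poll_seconds=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll try_connect() until it succeeds or the grace period runs out.

    Returns True if a connection was re-established in time, False if the
    caller should give up and terminate the run. The clock and sleep
    functions are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + grace_seconds
    while True:
        if try_connect():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


if __name__ == "__main__":
    # Stand-in for a real connection attempt: succeeds on the third try.
    attempts = {"count": 0}

    def fake_connect():
        attempts["count"] += 1
        return attempts["count"] >= 3

    print(wait_for_reconnect(fake_connect, grace_seconds=5, poll_seconds=0.1))
```

A fixed deadline keeps the original goal of never hanging indefinitely, while still giving a briefly dropped node a chance to come back.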
Phonesis
#9 Posted : Saturday, July 23, 2016 12:20:16 PM(UTC)
Ah ok that would make sense as we only recently turned the local processing thread count down to 0 in the config file. So it seems to tally.

I'll update it to 1 thread as per your suggestion. Shouldn't be a problem for us.

Many thanks.
Phonesis
#10 Posted : Monday, July 25, 2016 8:59:20 AM(UTC)
I've set the following config file values on the controller machine:

<MaxNumberOfProcessingThreads>1</MaxNumberOfProcessingThreads>
<MaxTestRunnerProcessesToPool>1</MaxTestRunnerProcessesToPool>


Is the MaxTestRunnerProcessesToPool one required for this too?


One other query. We use a shared drive to store screenshots of failing tests (used by Selenium WebDriver). It seems that the controller machine (which is also a TeamCity build agent) is able to do this and use the shared drive to store screenshots when a test is being run by it. However, the grid machines do not appear to be doing this even though they have access to the shared drive. Am I right in saying that NCrunch should in theory be logged into the grid machines as the same user as the TeamCity build agent process? Or is there more to it? Perhaps some NCrunch specific user or something?
Remco
#11 Posted : Monday, July 25, 2016 10:52:21 AM(UTC)
There isn't really a need to adjust MaxTestRunnerProcessesToPool - the default here is probably OK. If you set this to more than 1, the engine may use slightly more memory when executing and the performance may be a bit better if you have multiple test projects in your solution. Setting it to zero will seriously degrade performance.

The grid nodes will be logged into the user account specified in the Windows services settings - which you can freely change. By default, I think it is normally set to the SYSTEM account, which probably doesn't have access to network shares. Changing this logged-in user may help with accessing the shared drive.
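
One way to confirm this from the grid node's side is a small probe, run by the same process that runs the tests, which reports the current account and whether it can write to the share. A minimal Python sketch under that assumption; `check_share_writable` is a hypothetical helper and the UNC path in the comment is a placeholder, not the real share:

```python
import getpass
import os
import tempfile


def check_share_writable(directory):
    """Try to create and delete a temp file in directory.

    Returns (user, ok, error): the account name the process runs as,
    whether the write succeeded, and the error text if it did not.
    """
    try:
        user = getpass.getuser()
    except Exception:
        # Some service contexts expose no user name via the environment.
        user = "unknown"
    try:
        fd, path = tempfile.mkstemp(prefix="share_probe_", dir=directory)
        os.close(fd)
        os.remove(path)
        return user, True, None
    except OSError as exc:
        return user, False, str(exc)


# Example call with a placeholder path:
# print(check_share_writable(r"\\fileserver\screenshots"))
```

Running the probe interactively on the node is not equivalent, because it would then run as the logged-on user rather than as the service account.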
Phonesis
#12 Posted : Monday, July 25, 2016 3:11:34 PM(UTC)
Nice one.