Test run stops midway with "No data received... Forcing disconnection"
applieddev
#1 Posted : Wednesday, October 14, 2020 8:34:50 AM(UTC)
Rank: Newbie

Groups: Registered
Joined: 11/13/2019(UTC)
Posts: 8
Location: United Kingdom

We are seeing some test builds stop abruptly on TeamCity with "No data received from MACHINENAME within connection timeout period. Forcing disconnection."

We queued 29 tests for parallel execution on 1 NCrunch server node:
"Queuing 29 tests for passive execution"
Several tests run and pass, then:
"Sending processing instructions to node MACHINENAME for 1 tasks"
Then, 1 minute later, we get:
"No data received from MACHINENAME within connection timeout period. Forcing disconnection."

No other test builds are trying to run on that NCrunch node.

The test results do not show the tests that were requested but failed to run; instead, they are added to the number of Ignored tests.

When we re-run the same batch of tests on the same NCrunch node later, they usually work as expected.

This issue does not happen frequently, but it does cause problems, as only a few tests are run and no failures are shown.
We would want the tests that were requested but failed to run to be counted as failures, so we know to rerun those ones.

How best can we debug the cause of the issue?
Where can we adjust the connection timeout period?
Is it possible to mark the tests as failed (rather than ignored) when they have been queued for execution but failed to execute?

thanks
Remco
#2 Posted : Wednesday, October 14, 2020 1:41:42 PM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 6,155

Thanks: 795 times
Was thanked: 1053 time(s) in 1002 post(s)
Hi, thanks for sharing this issue.

NCrunch has a hard-coded timeout period of 60 seconds. If there is no data sent down the line for 30 seconds, a ping will be sent to ensure the connection stays open. A node disconnecting in this manner feels like a network issue of some kind.
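The timing scheme above can be sketched as follows. The thresholds (30s ping, 60s disconnect) are from this post; the decision function itself is purely an illustration, not NCrunch's actual code:

```shell
#!/bin/sh
# Illustration of the keepalive scheme described above (not real NCrunch code).
# Given the seconds elapsed since data was last received, decide what happens.
action_for_silence() {
    elapsed=$1
    if [ "$elapsed" -ge 60 ]; then
        echo "disconnect"      # hard-coded timeout reached: force disconnection
    elif [ "$elapsed" -ge 30 ]; then
        echo "ping"            # send a ping to keep the connection open
    else
        echo "wait"            # data is still flowing normally
    fi
}

action_for_silence 10
action_for_silence 45
action_for_silence 61
```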

What kind of connection do you have to the grid node server? Is this on a local network? Or perhaps a VPN? Do you have a way of testing your network?

The way we handle grid node disconnections is by waiting 60 seconds for a disconnected node to reconnect before declaring it dead and ending the run with unexecuted tests marked as unexecuted. Establishing a completion status is tremendously challenging when dealing with an unreliable network and testing conditions and filters that may be different depending on which nodes are connected. For your run to be completing early like this, the node would need to be unreachable for at least 2 minutes.

When run without the /TeamCityDisableTestNotRunFailureReporting parameter, the console tool will return an exit code of 5 when it detects tests that were not executed as part of the run. This gives a way to identify your failure case without NCrunch needing to artificially force the tests to fail (which we can't do). You may need to combine this with the /IgnoredFilteredTestsInRunFailureReporting parameter so that the tool doesn't consider tests as ignored if they were excluded using a filter.
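A minimal CI wrapper around that exit-code behaviour could look like the sketch below. The run_ncrunch function is a stand-in for the real console tool invocation (which here simulates the exit code of 5 mentioned above); only the exit code handling is the point:

```shell
#!/bin/sh
# Sketch: reacting to the console tool's exit code in a CI build step.
# run_ncrunch stands in for the real console tool call against your solution,
# run without /TeamCityDisableTestNotRunFailureReporting.
run_ncrunch() {
    return 5   # simulate "tests were detected as not executed"
}

run_ncrunch
code=$?
case "$code" in
    0) echo "run completed" ;;
    5) echo "tests not run: treat as failure and re-queue the batch" ;;
    *) echo "run failed with exit code $code" ;;
esac
```

In TeamCity, a non-zero exit code from a command-line build step will fail the build by default, so the distinct code 5 is enough to catch this case.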
applieddev
#3 Posted : Thursday, October 15, 2020 10:02:24 AM(UTC)
Rank: Newbie

Groups: Registered
Joined: 11/13/2019(UTC)
Posts: 8
Location: United Kingdom

Thanks Remco,
Our network engineers assure me that there are no network issues.

I have removed "/TeamCityDisableTestNotRunFailureReporting" and added "/IgnoredFilteredTestsInRunFailureReporting".
We just need to wait for the issue to resurface to see.

We have found that after restarting the NCrunch server node, the issue doesn't reoccur for a while.
Is it possible for a long-running NCrunch server node to run out of memory after many test runs?
Remco
#4 Posted : Thursday, October 15, 2020 10:30:50 AM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 6,155

Thanks: 795 times
Was thanked: 1053 time(s) in 1002 post(s)
applieddev;14998 wrote:

we have found that after restarting NCrunch server node, the issue doesnt reoccur for awhile
is it possible for a long running NCrunch server node to run out of memory after many test runs?


The core node server is designed with the intention to run long term without a restart, but the tech stack that sits under it is massive and very complex. The nodes also need to run unstable code with high frequency, so a regular restart is highly recommended. It might be worth setting up a small script on the machine to restart the service every night, or perhaps restarting the machine itself for best results.
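A nightly restart could be as simple as a scheduled script along these lines. The service name below is an assumption (check the Services console on the node for the real name), and the script only issues the real restart when the Windows `net` tool environment is detected, printing a dry run otherwise:

```shell
#!/bin/sh
# Sketch: nightly restart of the NCrunch grid node service.
# The service name "NCrunch Grid Node Service" is an ASSUMPTION; verify it
# in the Services console on your node before scheduling this.
SVC="NCrunch Grid Node Service"

restart_service() {
    case "$(uname -s 2>/dev/null)" in
        CYGWIN*|MINGW*|MSYS*|Windows*)
            # On Windows: stop and start the service
            net stop "$SVC" && net start "$SVC" ;;
        *)
            # Dry run on non-Windows machines
            echo "would restart: $SVC" ;;
    esac
}

restart_service
```

Registering this with Windows Task Scheduler to run overnight (or simply rebooting the machine nightly, as suggested above) keeps the node fresh without manual intervention.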
applieddev
#5 Posted : Friday, October 16, 2020 2:55:28 PM(UTC)
Rank: Newbie

Groups: Registered
Joined: 11/13/2019(UTC)
Posts: 8
Location: United Kingdom

We finally managed to reproduce the issue again:

Test results: Tests passed: 0, ignored: 30;
where the 30 "ignored" tests were requested, but failed to run due to "No data received from MACHINENAME within connection timeout period. Forcing disconnection."

Is that the expected outcome?

I guess we would need to treat these "ignored" tests the same as failures and require them to be re-run.
Remco
#6 Posted : Friday, October 16, 2020 11:24:54 PM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 6,155

Thanks: 795 times
Was thanked: 1053 time(s) in 1002 post(s)
Do you have the logs and exported reports from this run? I'd really like to take a closer look at how these tests are being reported. My understanding is that they should be reported with a 'Not run' status rather than ignored. If you could zip up the reports and logs and send them through the NCrunch contact form, I'll take a closer look at what's going on here.