Test run stops midway with "No data received... Forcing disconnection"
applieddev
#1 Posted : Wednesday, October 14, 2020 8:34:50 AM(UTC)
Rank: Newbie

Groups: Registered
Joined: 11/13/2019(UTC)
Posts: 8
Location: United Kingdom

We are seeing some test builds stop abruptly on TeamCity with "No data received from MACHINENAME within connection timeout period. Forcing disconnection."

We queued 29 tests for parallel execution on 1 NCrunch server node:
"Queuing 29 tests for passive execution"
Several tests run and pass, then:
"Sending processing instructions to node MACHINENAME for 1 tasks"
Then, 1 minute later, we get:
"No data received from MACHINENAME within connection timeout period. Forcing disconnection."

No other test builds are trying to run on that NCrunch node.

The test results do not show the tests that were requested but failed to run; instead, they are added to the number of Ignored tests.

When we re-run the same batch of tests on the same NCrunch node later, they usually work as expected.

This issue does not happen frequently, but it does cause problems, as only a few tests are run and no failures are shown.
We would want the tests that were requested but failed to run to be counted as failures, so we know to rerun those ones.

How best can we debug the cause of the issue?
Where can we adjust the connection timeout period?
Is it possible to mark the tests as failed (rather than ignored) when they have been queued for execution but failed to execute?

thanks
Remco
#2 Posted : Wednesday, October 14, 2020 1:41:42 PM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 6,155

Thanks: 795 times
Was thanked: 1053 time(s) in 1002 post(s)
Hi, thanks for sharing this issue.

NCrunch has a hard-coded timeout period of 60 seconds. If there is no data sent down the line for 30 seconds, a ping will be sent to ensure the connection stays open. A node disconnecting in this manner feels like a network issue of some kind.
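The timing scheme above can be sketched as follows. The thresholds (30s ping, 60s disconnect) are from this post; the decision function itself is purely an illustration, not NCrunch's actual code:

```shell
#!/bin/sh
# Illustration of the keepalive scheme described above (not real NCrunch code).
# Given the seconds elapsed since data was last received, decide what happens.
action_for_silence() {
    elapsed=$1
    if [ "$elapsed" -ge 60 ]; then
        echo "disconnect"      # hard-coded timeout reached: force disconnection
    elif [ "$elapsed" -ge 30 ]; then
        echo "ping"            # send a ping to keep the connection open
    else
        echo "wait"            # data is still flowing normally
    fi
}

action_for_silence 10
action_for_silence 45
action_for_silence 61
```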

What kind of connection do you have to the grid node server? Is this on a local network? Or perhaps a VPN? Do you have a way of testing your network?

The way we handle grid node disconnections is by waiting 60 seconds for a disconnected node to reconnect before declaring it dead and ending the run with unexecuted tests marked as unexecuted. Establishing a completion status is tremendously challenging when dealing with an unreliable network and testing conditions and filters that may be different depending on which nodes are connected. For your run to be completing early like this, the node would need to be unreachable for at least 2 minutes.

When run without the /TeamCityDisableTestNotRunFailureReporting parameter, the console tool will return an exit code of 5 when it detects tests that were not executed as part of the run. This gives a way to identify your failure case without NCrunch needing to artificially force the tests to fail (which we can't do). You may need to combine this with the /IgnoredFilteredTestsInRunFailureReporting parameter so that the tool doesn't consider tests as ignored if they were excluded using a filter.
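A minimal CI wrapper around that exit-code behaviour could look like the sketch below. The run_ncrunch function is a stand-in for the real console tool invocation (which here simulates the exit code of 5 mentioned above); only the exit code handling is the point:

```shell
#!/bin/sh
# Sketch: reacting to the console tool's exit code in a CI build step.
# run_ncrunch stands in for the real console tool call against your solution,
# run without /TeamCityDisableTestNotRunFailureReporting.
run_ncrunch() {
    return 5   # simulate "tests were detected as not executed"
}

run_ncrunch
code=$?
case "$code" in
    0) echo "run completed" ;;
    5) echo "tests not run: treat as failure and re-queue the batch" ;;
    *) echo "run failed with exit code $code" ;;
esac
```

In TeamCity, a non-zero exit code from a command-line build step will fail the build by default, so the distinct code 5 is enough to catch this case.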
applieddev
#3 Posted : Thursday, October 15, 2020 10:02:24 AM(UTC)
Rank: Newbie

Groups: Registered
Joined: 11/13/2019(UTC)
Posts: 8
Location: United Kingdom

Thanks Remco,
Our network engineers assure me that there are no network issues.

I have removed "/TeamCityDisableTestNotRunFailureReporting" and added "/IgnoredFilteredTestsInRunFailureReporting".
We just need to wait for the issue to resurface to see.

We have found that after restarting the NCrunch server node, the issue doesn't reoccur for a while.
Is it possible for a long-running NCrunch server node to run out of memory after many test runs?
Remco
#4 Posted : Thursday, October 15, 2020 10:30:50 AM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 6,155

Thanks: 795 times
Was thanked: 1053 time(s) in 1002 post(s)
applieddev;14998 wrote:

we have found that after restarting NCrunch server node, the issue doesnt reoccur for awhile
is it possible for a long running NCrunch server node to run out of memory after many test runs?


The core node server is designed with the intention to run long term without a restart, but the tech stack that sits under it is massive and very complex. The nodes also need to run unstable code with high frequency, so a regular restart is highly recommended. It might be worth setting up a small script on the machine to restart the service every night, or perhaps restarting the machine itself for best results.
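A nightly restart could be as simple as a scheduled script along these lines. The service name below is an assumption (check the Services console on the node for the real name), and the script only issues the real restart when the Windows `net` tool environment is detected, printing a dry run otherwise:

```shell
#!/bin/sh
# Sketch: nightly restart of the NCrunch grid node service.
# The service name "NCrunch Grid Node Service" is an ASSUMPTION; verify it
# in the Services console on your node before scheduling this.
SVC="NCrunch Grid Node Service"

restart_service() {
    case "$(uname -s 2>/dev/null)" in
        CYGWIN*|MINGW*|MSYS*|Windows*)
            # On Windows: stop and start the service
            net stop "$SVC" && net start "$SVC" ;;
        *)
            # Dry run on non-Windows machines
            echo "would restart: $SVC" ;;
    esac
}

restart_service
```

Registering this with Windows Task Scheduler to run overnight (or simply rebooting the machine nightly, as suggested above) keeps the node fresh without manual intervention.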
applieddev
#5 Posted : Friday, October 16, 2020 2:55:28 PM(UTC)
Rank: Newbie

Groups: Registered
Joined: 11/13/2019(UTC)
Posts: 8
Location: United Kingdom

We finally managed to reproduce the issue again:

Test results: Tests passed: 0, ignored: 30;
where the 30 "ignored" tests were requested, but failed to run due to "No data received from MACHINENAME within connection timeout period. Forcing disconnection."

Is that the expected outcome?

I guess we would need to treat these "ignored" tests the same as failures and require them to be re-run.
Remco
#6 Posted : Friday, October 16, 2020 11:24:54 PM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 6,155

Thanks: 795 times
Was thanked: 1053 time(s) in 1002 post(s)
Do you have the logs and exported reports from this run? I'd really like to take a closer look at how these tests are being reported. My understanding is that they should be reported with a 'Not run' status rather than ignored. If you could zip up the reports and logs and send them through the NCrunch contact form, I'll take a closer look at what's going on here.