Regular disconnects from the grid node clients
samholder
#1 Posted : Thursday, April 3, 2025 12:55:56 PM(UTC)
Rank: Advanced Member
Hi,

We are seeing regular forced disconnects from our grid node clients in the logs. We would like to understand what these disconnects mean so that we can start doing something about them.

The message we get logged is:

[PID:7888 08:57:37.0204 ?-20] No data received from MACHINE_1 within connection timeout period. Forcing disconnection.
[PID:7888 08:57:37.0354 GridMessageReceiver-30] Connection to MACHINE_1 closed by remote

We would like to understand whether this happens because of long-running tests or whether there is some other cause. For example, if a test is taking its time to do something, would the server think the client has died, or does the server continue to talk to the client even while the tests are waiting on something to happen?

We are looking for options to understand why this might be happening as it seems to be making our tests take a lot longer to run.

Cheers

Sam


Remco
#2 Posted : Thursday, April 3, 2025 1:34:11 PM(UTC)
Rank: NCrunch Developer
Hi, thanks for sharing this issue.

The connection to grid nodes is essentially a barebones TCP connection, with some encryption over the top if a password is set on the node.

Both the client and the server use dedicated threads to read from and write to the socket. The system is designed such that if no data is sent for 30 seconds, a keepalive packet is automatically sent by the dedicated writer thread. This cannot be interfered with by other activity on the system, unless it's REALLY heavily over capacity to the point where nothing on the system is effectively running. NCrunch can cause this if you run too much at once; extreme thread starvation is bad for stability.

On the receiver side, if no data is received from the socket for 60 seconds, an automatic disconnection is triggered as it's assumed that the socket is dead.
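To illustrate how the two timers described above interact, here is a minimal Python sketch. It is not NCrunch's code: the class names, the `simulate` function and the one-second stepping are all made up for illustration, and only the 30-second and 60-second figures come from the description above. It models a connection that carries no test traffic at all, which is roughly what a long wait inside a test would look like on the wire.

```python
from dataclasses import dataclass
from typing import Optional

KEEPALIVE_INTERVAL_S = 30.0  # sender side: idle time before a keepalive is sent
RECEIVE_TIMEOUT_S = 60.0     # receiver side: silence before a forced disconnect


@dataclass
class SenderState:
    """Hypothetical model of the dedicated writer thread's bookkeeping."""
    last_sent_at: float

    def keepalive_due(self, now: float) -> bool:
        # A keepalive is due once nothing has been written for the full interval.
        return now - self.last_sent_at >= KEEPALIVE_INTERVAL_S


@dataclass
class ReceiverState:
    """Hypothetical model of the dedicated reader thread's bookkeeping."""
    last_received_at: float

    def timed_out(self, now: float) -> bool:
        # The socket is assumed dead after a full timeout period of silence.
        return now - self.last_received_at >= RECEIVE_TIMEOUT_S


def simulate(duration_s: int, network_up_until: float) -> Optional[float]:
    """Simulate an otherwise idle connection in one-second steps.

    Packets sent after `network_up_until` are silently lost.
    Returns the time of the forced disconnect, or None if none occurs.
    """
    sender = SenderState(last_sent_at=0.0)
    receiver = ReceiverState(last_received_at=0.0)
    for now in range(1, duration_s + 1):
        if sender.keepalive_due(now):
            sender.last_sent_at = now
            if now <= network_up_until:
                receiver.last_received_at = now
        if receiver.timed_out(now):
            return float(now)
    return None


# Idle but healthy link: keepalives arrive every 30 s, so no disconnect.
print(simulate(600, float("inf")))
# Link silently drops packets after t=100: the last keepalive arrived at t=90.
print(simulate(600, 100))
```

Under these assumptions, an idle connection on a healthy network never reaches the 60-second limit, because a keepalive arrives every 30 seconds. A forced disconnect only appears once keepalives stop getting through, whether because the network drops them or because the sending machine is too starved to send them.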

Network issues like this are often challenging to troubleshoot, since there are so many ways connectivity can go wrong and usually you only get one result (no data received). The grid protocol used by NCrunch has been in place for over a decade now and it should be reliable. It may be worth doing some deductive testing over your network to see if anything is interfering with the connection.
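As one possible starting point for that kind of deductive testing, the forced disconnects can be pulled out of the logs and counted per machine. The counts and timestamps can then be lined up against network events or periods of heavy load. The Python sketch below assumes the log lines look exactly like the two quoted in the first post; the function and class names are hypothetical, and the pattern may need adjusting for other NCrunch versions or log settings.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

# Pattern based only on the sample log line quoted in this thread.
FORCED_DISCONNECT = re.compile(
    r"\[PID:(?P<pid>\d+) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) (?P<thread>[^\]]+)\] "
    r"No data received from (?P<machine>\S+) within connection timeout period"
)


@dataclass
class ForcedDisconnect:
    pid: int
    timestamp: str  # kept as text, e.g. "08:57:37.0204"
    thread: str
    machine: str


def find_forced_disconnects(lines: Iterable[str]) -> List[ForcedDisconnect]:
    """Return one record per 'Forcing disconnection' line found."""
    events = []
    for line in lines:
        match = FORCED_DISCONNECT.search(line)
        if match:
            events.append(
                ForcedDisconnect(
                    pid=int(match["pid"]),
                    timestamp=match["time"],
                    thread=match["thread"],
                    machine=match["machine"],
                )
            )
    return events


sample = [
    "[PID:7888 08:57:37.0204 ?-20] No data received from MACHINE_1 within "
    "connection timeout period. Forcing disconnection.",
    "[PID:7888 08:57:37.0354 GridMessageReceiver-30] Connection to MACHINE_1 "
    "closed by remote",
]
events = find_forced_disconnects(sample)
per_machine = Counter(event.machine for event in events)
print(events)
print(per_machine)
```

If the disconnects cluster on one machine or at particular times of day, that points toward something specific on the network path or toward load on that machine, rather than toward the tests themselves.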