Why isn't NCrunch aborting build/analysis on pending grid nodes when no work remains?
GreenMoose
#1 Posted : Friday, December 21, 2018 2:17:12 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 6/17/2012(UTC)
Posts: 461

Thanks: 126 times
Was thanked: 58 time(s) in 56 post(s)
[v3.23.0.9]

I have some cases where the local test runner has completed, but NCrunch still seems to wait for slow grid nodes to finish building/analyzing projects.
Is there any reason for this even though "(local)" is done executing? I guess it must know somehow that there is no work remaining, so it should, at least in theory, be able to tell the grid nodes to stop building/analyzing?

localOnLeft

Thanks.
Remco
#2 Posted : Friday, December 21, 2018 10:56:10 PM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 5,818

Thanks: 752 times
Was thanked: 972 time(s) in 926 post(s)
Technically, no. There is no good reason for this.

There are also other things that the grid is doing that can be further optimised. For example, right now we run an analysis task for a built project on every machine in the grid. Really, we only need to do this once, as we only take the result from the first analysis task and discard the others. These are things that I hope we'll have a chance to optimise in future.

The reason this optimisation hasn't happened yet is because there's been other lower hanging fruit that is just so much sweeter. For example, right now on many projects we're taking about 3 times as long to perform post-build processing as the projects actually take to compile. Improving this could result in cutting NCrunch build times in half or even better. So this is where we've been more focused lately :)

Anyway, truncating the console tool execution is something we have on our list. I can't promise when we'll get to it, but it's a known issue.
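
To illustrate the idea discussed above (keep only the first result and stop the redundant work instead of waiting for it), here is a minimal sketch in Python. It is not NCrunch code and says nothing about how the grid is actually implemented: `analyse_on`, the node names `node-a` and `node-b`, and the delays are all hypothetical stand-ins.

```python
import asyncio


async def analyse_on(node, delay):
    # Hypothetical stand-in for an analysis task running on one machine.
    await asyncio.sleep(delay)
    return f"analysis result from {node}"


async def first_result(nodes):
    """Run the same task on every node, keep the first result, cancel the rest."""
    tasks = {asyncio.create_task(analyse_on(node, delay)): node
             for node, delay in nodes.items()}
    done, pending = await asyncio.wait(list(tasks),
                                       return_when=asyncio.FIRST_COMPLETED)
    # Instead of waiting for slower nodes and discarding their results,
    # tell them to stop as soon as one result is available.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    winner = next(iter(done))
    return winner.result(), sorted(tasks[task] for task in pending)


if __name__ == "__main__":
    # Delays are invented: "(local)" is fast, the two grid nodes are slow.
    result, cancelled = asyncio.run(
        first_result({"(local)": 0.01, "node-a": 0.5, "node-b": 1.0}))
    print(result)
    print("cancelled:", cancelled)
```

In a real grid the cancel would have to be a message to a remote machine rather than a local task cancellation, which is presumably where most of the actual work lies.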
1 user thanked Remco for this useful post.
GreenMoose on 1/7/2019(UTC)
GreenMoose
#3 Posted : Tuesday, January 15, 2019 4:12:56 PM(UTC)
FWIW, I just had an issue where a grid node had some performance problems and held up the TC build agent for 23 minutes, when the 2 impacted tests were completed in 3. That sort of defeats the purpose of having the "impacted tests only" feature on the CI server, so I created a feature request for aborting grid node processing and hope for an upcoming vote flood for it :)

https://ncrunch.uservoic...-in-ncrunch-console-if-n
1 user thanked GreenMoose for this useful post.
Remco on 1/15/2019(UTC)
GreenMoose
#4 Posted : Thursday, March 21, 2019 7:18:09 AM(UTC)
Just a follow-up related to this. Yesterday a build took 1 hour, and TC said it was hanging because the build agent had not received any build progress for a long time.
This time, 32 minutes of it was spent waiting for a grid node to complete initialization (see below for example).
The grid node having issues also spent 13 minutes in "Grid node synchronization".

So maybe there are some places where NCrunch Console is missing a "TC progress report", causing TC to think the build is hanging when slow grid nodes are in use.

Anyhow, my workaround will be to disable the slower grid nodes that cause these issues from time to time (both share the same VM host, which is likely why their resources get exhausted).
tests completed

slow grid node completed
Remco
#5 Posted : Thursday, March 21, 2019 10:01:11 PM(UTC)
GreenMoose;13237 wrote:

So maybe there are some places where NCrunch Console is missing a "TC progress report", causing TC to think the build is hanging when slow grid nodes are in use.


Was there any other activity during the hang-up that NCrunch was reporting internally but was not being reported to TC? I'm just trying to identify whether this is a genuine hang (i.e. we got stuck on a task for a while and had nothing to report), or whether we're neglecting to keep TC informed.
GreenMoose
#6 Posted : Friday, March 22, 2019 6:30:46 AM(UTC)
Remco;13244 wrote:
Was there any other activity during the hang-up that NCrunch was reporting internally but was not being reported to TC? I'm just trying to identify whether this is a genuine hang (i.e. we got stuck on a task for a while and had nothing to report), or whether we're neglecting to keep TC informed.

According to the build log, there was complete silence for 23 minutes after 13:06:


Code:

...
[12:46:25] :	 [Step 14/20] [Core-16] Max number of test runner processes to pool = '1'
[12:46:25] :	 [Step 14/20] [Core-16] Log verbosity = 'Low'
...
[12:46:47] :	 [Step 14/20] [Core-16] Connection established with remote grid node at gridnode1
[12:46:47] :	 [Step 14/20] [Core-16] Connection established with remote grid node at gridnode2

< a bunch of tests executing, no logs at all from gridnode2, only gridnode1>


[13:06:43] :	 [Step 14/20] ProjName.DataRemover.Tests : ProjName.DataRemover.Tests.MiscCustomersRemoverFixture._Fixture_
[13:06:43] :		 [ProjName.DataRemover.Tests : ProjName.DataRemover.Tests.MiscCustomersRemoverFixture._Fixture_] [Test Output]

<some test output from above test, but no more timestamps until 13:29:14 below>

[13:29:14] :	 [Step 14/20] [Core-219] Sending processing instructions to node gridnode2 for 1 tasks
[13:29:36] :	 [Step 14/20] [Core-309] Grid node gridnode2 reports task completed: [LocalBuildTask: [SnapshotComponent: ProjName.Application.WebProxyModels, 35, 42991853], ProcessingSucceeded, gridnode2, 9c74511f-3095-4471-8e09-4b6af3d4b4f7]
<gridnode2 build/analyzing steps>
[13:39:48] :	 [Step 14/20] [Core-343] Cleaning up workspace: D:\TmpTcWd\_tmp\NcWc\9672\28
[13:39:51] :	 [Step 14/20] [?-1] Returning result: OK
[13:39:51] :	 [Step 14/20] Exit code: #0


Remco
#7 Posted : Friday, March 22, 2019 10:19:21 PM(UTC)
Ok, although I can agree that the experience here is certainly suboptimal, I think there is an argument for saying that this is probably behaviour as designed.

TC is reporting this build as likely hung, because in a manner of speaking, it probably is. NCrunch is stuck on a restricted resource on the machine so that a specific task it's performing is taking longer than expected. Unless there is some kind of progress information to report on this task (and it's a task that regularly blocks up builds), there probably isn't a more sensible way for us to report this.
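
For reference, "keeping TC informed" while a long task runs could look roughly like the sketch below, which prints a TeamCity `progressMessage` service message at a fixed interval until the task returns. This is not how NCrunch Console works and it does not add any real progress information, which is the point made above: it only shows the mechanics. The helper names (`tc_escape`, `progress_message`, `run_with_heartbeat`) are hypothetical, and it assumes that such messages count as build activity for TC's hang detection, which I have not verified.

```python
import sys
import threading
import time


def tc_escape(text):
    # TeamCity service message values escape special characters with '|'.
    # '|' itself must be handled first so later escapes are not doubled.
    for raw, escaped in (("|", "||"), ("'", "|'"), ("\n", "|n"),
                         ("\r", "|r"), ("[", "|["), ("]", "|]")):
        text = text.replace(raw, escaped)
    return text


def progress_message(text):
    return "##teamcity[progressMessage '%s']" % tc_escape(text)


def run_with_heartbeat(task, description, interval=60.0, out=sys.stdout):
    """Run task() while printing a progress message every `interval` seconds."""
    done = threading.Event()

    def beat():
        started = time.monotonic()
        # wait() returns False on timeout, True once the task has finished.
        while not done.wait(interval):
            elapsed = int(time.monotonic() - started)
            print(progress_message(f"{description} ({elapsed}s elapsed)"),
                  file=out, flush=True)

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        return task()
    finally:
        done.set()
        thread.join()


if __name__ == "__main__":
    # Hypothetical slow task standing in for "waiting on a grid node".
    run_with_heartbeat(lambda: time.sleep(3), "Waiting for grid node", interval=1.0)
```

A heartbeat like this would hide a genuine hang just as well as a slow task, so it is a trade-off rather than a fix.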