Welcome Guest! To enable all features please Login or Register.

Notification

Icon
Error

Tests don't run on the grid from TC after upgrade
samholder
#1 Posted : Tuesday, September 11, 2018 8:09:37 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 5/11/2012(UTC)
Posts: 94

Thanks: 28 times
Was thanked: 12 time(s) in 12 post(s)
I've just upgraded all our grid nodes from 3.09 to 3.20. I also updated all corresponding console tools on the TC build agents. I updated the licence on the console tools.

When I run the builds which should use the grid nodes I get this output in the TC build log:

[20:53:18][Step 16/25] [InitialisationQueueTask-27] Not restoring NuGet packages for 'D:\BuildAgent-02\work\ff7c024d120413d4\MyProj\proj1.csproj' because package restore mode is disabled
[20:53:18][Step 16/25] [InitialisationQueueTask-27] Loading project: D:\BuildAgent-02\work\ff7c024d120413d4\MyProj\proj1.csproj
[20:53:19][Step 16/25] [?-1] Shutting down engine
[20:53:22][Step 16/25] [InitialisationQueueTask-152] Not restoring NuGet packages for 'D:\BuildAgent-02\work\ff7c024d120413d4\MyProj\proj2.csproj' because package restore mode is disabled
[20:53:22][Step 16/25] [InitialisationQueueTask-152] Loading project: D:\BuildAgent-02\work\ff7c024d120413d4\MyProj\proj2.csproj
...similar messages for the other projects....
[20:53:34][Step 16/25] [Core-15] All projects have been loaded
[20:53:36][Step 16/25] [GridMessageReceiver-115] Connection to gridnode1 closed by remote
[20:53:36][Step 16/25] [GridMessageReceiver-17] Connection to gridnode2 closed by remote
[20:53:36][Step 16/25] [GridMessageReceiver-35] Connection to gridnode3 closed by remote
[20:53:38][Step 16/25] [?-1] Returning result: OK
[20:53:38][Step 16/25] Process exited with code 0
[20:53:38][Step 16/25] Step Run tests with NCrunch (Command Line) failed

obviously its not actually running any of the tests...

Have I missed some step? Is there some config I need to change? How can I diagnose what this issue might be?
samholder
#2 Posted : Tuesday, September 11, 2018 8:53:17 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 5/11/2012(UTC)
Posts: 94

Thanks: 28 times
Was thanked: 12 time(s) in 12 post(s)
Ah, think I figured this out. when I looked at the overall build (and not just focussing on the build log), it said

> The NCrunchCacheStoragePath must be set when using the console tool.

and adding this to the config seems to have made it happy again.
samholder
#3 Posted : Tuesday, September 11, 2018 9:52:33 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 5/11/2012(UTC)
Posts: 94

Thanks: 28 times
Was thanked: 12 time(s) in 12 post(s)
So there seems to be some issue still.

Firstly although it claimed to have connected to 3 grid nodes, it only ever gave tests to one of them.

secondly although it ran all the tests successfully it reported the build as failed.

This was in the logs at the start:

[21:50:39][Step 16/25] [Core-13] All projects have been loaded
[21:50:41][Step 16/25] [Core-13] Connection established with remote grid node at gridnode1
[21:50:41][Step 16/25] [Core-13] Connection established with remote grid node at gridnode2
[21:50:41][Step 16/25] [Core-13] Connection established with remote grid node at gridnode3
[21:51:21][Step 16/25] [Core-58] Sending processing instructions to node gridnode3 for 3 tasks
[21:51:23][Step 16/25] [Core-164] Grid node gridnode3 reports task completed: [LocalBuildTask: [SnapshotComponent: Quoting.DbUp, 6, 42935185], ProcessingFailed, gridnode3, f16b7080-36e1-46ca-9935-ce373a873293]
[21:51:25][Step 16/25] [Core-212] Grid node gridnode3 reports task completed: [LocalBuildTask: [SnapshotComponent: Quoting.EsUp, 3, 13253221], ProcessingFailed, gridnode3, cba1bbda-5966-4281-a873-498589925dde]
[21:51:25][Step 16/25] [Core-64] Grid node gridnode3 reports task completed: [LocalBuildTask: [SnapshotComponent: Quoting.Infrastructure, 5, 66052903], ProcessingFailed, gridnode3, 82ad25ee-0fdf-4aba-b00f-19e41596d02e]
[21:51:45][Step 16/25] [Core-130] Sending processing instructions to node gridnode2 for 3 tasks
[21:51:47][Step 16/25] [Core-170] Grid node gridnode2 reports task completed: [LocalBuildTask: [SnapshotComponent: Quoting.DbUp, 6, 42935185], ProcessingFailed, gridnode2, c7c591c3-efa9-49d7-888f-929cc497f58c]
[21:51:47][Step 16/25] [Core-188] Sending processing instructions to node gridnode1 for 3 tasks
[21:51:49][Step 16/25] [Core-173] Grid node gridnode2 reports task completed: [LocalBuildTask: [SnapshotComponent: Quoting.EsUp, 3, 13253221], ProcessingFailed, gridnode2, 2e3d6a6d-befb-4bab-8661-56e236e29a79]
[21:51:49][Step 16/25] [Core-65] Grid node gridnode2 reports task completed: [LocalBuildTask: [SnapshotComponent: Quoting.Infrastructure, 5, 66052903], ProcessingFailed, gridnode2, 50d37dc9-3ce2-439d-8300-db17d0c24561]

after the test run details:

[22:28:52][Step 16/25] [?-1] Reporting engine execution results
[22:28:57][Step 16/25] ##teamcity[buildProblem description='NCrunch result: BuildFailure' timestamp='2018-09-11T22:28:57.762+0100']
[22:28:57][Step 16/25] [?-1] Shutting down engine
[22:28:58][Step 16/25] [GridMessageReceiver-68] Connection to gridnode1 closed by remote
[22:28:58][Step 16/25] [GridMessageReceiver-216] Connection to gridnode2 closed by remote
[22:28:58][Step 16/25] [GridMessageReceiver-219] Connection to gridnode3 closed by remote
[22:29:01][Step 16/25] [?-1] Returning result: OK
[22:29:01][Step 16/25] Process exited with code 0
[22:29:01][Step 16/25] Step Run tests with NCrunch (Command Line) failed

so it looks like some processing failed for some of the projects on two of the grid nodes. How can I diagnose what this issue is? Everything worked on all the grid nodes before the upgrade...
Remco
#4 Posted : Tuesday, September 11, 2018 11:35:26 PM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 6,976

Thanks: 931 times
Was thanked: 1257 time(s) in 1170 post(s)
Hi, thanks for sharing this issue.

The console tool has seen major development since the v3.9 release. The UX has changed completely and the handling of many of the conditions inside the run has also changed to better reflect the needs of CI systems.

The first thing I would do is increase your log verbosity so you can see more about what the tool is doing. You can do this just by passing '-LogVerbosity Detailed' into the tool through the command line.

The second thing is to take a look at the HTML reports written by the tool when it completes the run. You can find these in the tool's output directory usually situated under your solution directory. The reports are now very extensive and will give you a much more legible overview of the run than you'll get from the trace logs.

Finally, if you're using the tool under Team City, there is now integration with TC that should give you extra information. You can also add some tabs into the TC configuration to easily access more information about the run.

Once you get the build failure message, it should be possible to assess the situation further. Something to watch for is that the tool no longer restores packages for you - it's assumed that you do this as an extra step within your CI, since that's the much faster way to do it.
samholder
#5 Posted : Wednesday, September 12, 2018 1:11:34 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 5/11/2012(UTC)
Posts: 94

Thanks: 28 times
Was thanked: 12 time(s) in 12 post(s)
Enabling verbose logging doesn't tell me what the build issue is on the failing servers.

The HTML reports say that building was successful (it only seems to report from the one successful node). The errors/warnings tab doesn't have anything in it. it looks like ss far as its concerned everything worked, but build status is failed.

Funnily one of the failing grid nodes just started working for no apparent reason. No changes. Makes me suspicious that its something to do with the nuget packages.

I've tried restricting the run to only use one grid node that doesn't work so it'll report the build error, and I just get a report that says every build is 'not run' in the AllResults.html and the log just has same as above:

[21:51:49][Step 16/25] [Core-173] Grid node gridnode2 reports task completed: [LocalBuildTask: [SnapshotComponent: Quoting.EsUp, 3, 13253221], ProcessingFailed, gridnode2, 2e3d6a6d-befb-4bab-8661-56e236e29a79]

We are restoring the packages (and its still building with MSBuild first and thats working), but all tests processing is on the grid, the build machines do no processing, but I'm still seeing some stuff that seems to indicate the NCrunch is doing some build analysis on the build agent. I see some warnings like:

[12:14:14] : [Step 16/25] [InitialisationQueueTask-46] Build analysis output: Dependency "Something.Core, Version=3.0.339.0, Culture=neutral, PublicKeyToken=null".
[12:14:14] : [Step 16/25] [InitialisationQueueTask-46] Build analysis output: Could not resolve this reference. Could not locate the assembly "Something.Core, Version=3.0.339.0, Culture=neutral, PublicKeyToken=null". Check to make sure the assembly exists on disk. If this reference is required by your code, you may get compilation errors.

This is ok I think as some components are built with this version but generally parts reference the next version and the build works in ms build.

I'm coming up short being able to establish what the issue could be and there don't seem to be any error messages to help.

I've tried using the grid node from VS on my local machine (disabling all other processing options including local). In the tests window I just get '2 projects failed to build' in the window and nothing else. They don't get red 'X's and if I click on them there is nothing reported in the output pane. In the processing queue I see tasks which say that the projects failed to build, which have red 'X's and if I select these rown the output pane has this:

[15:14:27.7795-NodeProcessor-109] ERROR (Internal): nCrunch.TaskRunner.Ipc.IpcConnectionClosedException: The connection has been closed
at nCrunch.TaskRunner.Ipc.Fast.IpcReader.Initialise()
at nCrunch.TaskRunner.Ipc.Fast.IpcStream.Connect(String ipcName, Boolean isHost)
at nCrunch.TaskRunner.Ipc.Fast.FastIpcClient.Connect(String ipcName)
at nCrunch.Core.ProcessManagement.DefaultProcessLoader.ConnectToProcess(ExternalProcess externalProcess, ProcessLoadParameters parameters, Action`1 outOfBandMessageHandler)
at nCrunch.Core.ProcessManagement.ExternalProcessManager..()
at nCrunch.Common.PerformanceTracking.PerfTracker.TrackActivity(String name, Action activity)
at nCrunch.Core.ProcessManagement.ExternalProcessManager.(ProcessorArchitecture , ProcessLoadParameters )

[15:14:27.7795-NodeProcessor-109] ERROR (Internal): nCrunch.TaskRunner.Ipc.IpcConnectionClosedException: The connection has been closed
at nCrunch.Core.ProcessManagement.ExternalProcessManager.(ProcessorArchitecture , ProcessLoadParameters )
at nCrunch.Core.ProcessManagement.ExternalProcessManager.LoadExternalProcess(ProcessLoadParameters parameters, GridClientId client)
at nCrunch.Core.BuildManagement.BuildProcessLauncher..()
at nCrunch.Common.PerformanceTracking.PerfTracker.TrackActivity(String name, Action activity)
at nCrunch.Core.BuildManagement.BuildProcessLauncher.(Action`1 , ProcessorArchitecture , GridClientId , BuildSystemParameters , IList`1 , Nullable`1 , GridAddress )
at nCrunch.Core.BuildManagement.BuildProcessLauncher.BuildComponentInExternalProcess(ComponentBuildParameters parameters, VisualStudioVersion vsVersion, GridClientId client, IList`1 customEnvironmentVariables, Guid taskId, GridAddress clientAddress)
at nCrunch.Core.BuildManagement.BuildEnvironment.Build(SnapshotComponent snapshotComponentToBuild, IList`1 referencedComponents, GridClientId gridClientId, IList`1 customEnvironmentVariables, IPlatformBuildExtender extender, Guid taskId, GridAddress clientAddress)
at nCrunch.Core.Processing.BuildTaskLogic.DoProcessTaskAndReturnSuccessFlag()
at nCrunch.GridNode.NodeTaskProcessor..()
at nCrunch.Common.ErrorHandler.DoWithErrorHandling(Action action, Object context)

but no actual build error message


Any ideas?
samholder
#6 Posted : Wednesday, September 12, 2018 2:55:54 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 5/11/2012(UTC)
Posts: 94

Thanks: 28 times
Was thanked: 12 time(s) in 12 post(s)
So rebooting the machine seems to cured the ills.

Not sure what was going on
Remco
#7 Posted : Thursday, September 13, 2018 1:27:39 AM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 6,976

Thanks: 931 times
Was thanked: 1257 time(s) in 1170 post(s)
There's something very suspicious happening here.

We've just in the last 24 hours experienced the exact same issue in our own environment, shortly after installing the latest version of .NET Core on our grid nodes. It only affected our .NET Core test projects.

And just like that, it suddenly vanished without apparent reason.

The error described here is caused by a task runner process failing to initialise. There's a huge range of platform issues that can prevent this, much of which is completely external to NCrunch and outside of its control.

This looks to me like either a platform-level stability issue that has been newly introduced (and resolves itself after a reboot), or it's an intermittent issue that we're likely to see again.

In the case of the former, there's no path for us to investigate this, and even if we understand it fully, we likely won't be able to fix it.

In the case of the latter, we'll need to wait for it to show up again so we can try and narrow it down and get more detailed information.
Users browsing this topic
Guest
Forum Jump  
You cannot post new topics in this forum.
You cannot reply to topics in this forum.
You cannot delete your posts in this forum.
You cannot edit your posts in this forum.
You cannot create polls in this forum.
You cannot vote in polls in this forum.

YAF | YAF © 2003-2011, Yet Another Forum.NET
This page was generated in 0.068 seconds.
Trial NCrunch
Take NCrunch for a spin
Do your fingers a favour and supercharge your testing workflow
Free Download