Magnus Lidbom;18422 wrote:
I would love to give you the information you are asking for, but for a week trying to get clear information from NCrunch about which tests are actually the problem has eluded me. Run after run with no problem, then suddenly one failure, or a torrent with hundreds of them.
There is precious little information in the NCrunch logs in the UI. And if I turn on logging to file on a grid node with anything above summary, there is such a torrent of logging that I struggle mightily to find anything in the deluge.
I spent a week believing that the problem was in our code, but I honestly very much doubt that now.
Tests ranging from using my custom attributes to tests using another custom attribute to tests using plain Fact or Theory have been crashing right and left since I upgraded with little to no discernible pattern other than that as long as I only use one grid node everything is rock solid. But goodness the frustration it took before I realized that as long as there was no distribution there was no problem....
Understood. Problems of this nature have sadly formed quite a large part of my life for the last 17 years, and I understand that to call them _infuriating_ is an understatement. Because we're exchanging quite often on this forum, I feel it's best I level with you a bit here.
Imagine having the above thrown at you on a Sunday, with a week of frustration behind it, along with no clear steps to reproduce and a lot of implied blame. I do my best to help people with issues like this here, but it's never nice to be treated like a target.
With that aside, I've done my best to analyse this issue and understand it. The solution you've shared didn't manage to pass its DB tests for me, but fortunately that wasn't necessary, as the problem is around test discovery and execution, not the results.
I wasn't able to reproduce the stack trace you've shared, but I COULD reproduce the other stability related errors when discovering the AccountManagement tests on one machine and executing them on another. I've had a look through the custom attributes in your code and through the related Xunit serialization logic and I've managed to draw a few conclusions about what is happening.
When Xunit discovers tests, there are a couple of very important things that it does:
1. It generates unique identifiers for each test case. These aren't just method names, they also include parameters details for theory cases and other information used to identify a test no matter which domain it is in.
2. It serializes details about the test, capturing it in such a state that the same test can be deserialized in a different process and executed consistently
This is a process that is generally transparent when all is well, but introduces a large amount of very necessary complexity, because we need to be able to find a test in one process/environment, then transfer it to be run in another one, and have all the results make sense. For tests with simple scalar parameters, it's not so bad, but when you start deconstructing the sorts of types that people pass into test parameters, it gets easier to understand why there is a higher chance for things to go wrong here. Serialization of arbitrary types is complex. In this situation, when it fails, it's hard to get reliable information that can help with solving the problem.
It's for this reason that I am a strong advocate of keeping your test parameters as simple as possible. If you need to build complicated types to represent your tests, it's better to use strings instead and instantiate the types inside the test code itself.
Anyway, the root cause of these problems seems to be that the serialized data for your tests is including the assembly path as it exists on the machine responsible for discovering the tests. As you've identified, this path is not necessarily the same on the machine executing them, and this causes stability problems such as the inability to identify the tests being run, and the potential for type resolution issues during deserialization.
Xunit.Sdk.SerializationHelper contains logic that includes the assembly file path in the serialized data. So this problem is not specific to your code - it's a
broader issue that I've now raised with the Xunit team to see if they can suggest a way to solve it. I've managed to reproduce the issue myself with a small code sample making use of an IXunitSerializable parameter in a theory testcase.
In Compze.Tests.Infrastructure.XUnit.TestFrameworkExtensions.XFactAttributeTestCaseDiscoverer I've tried removing the logic that manually constructs test IDs, and I found that this significantly helped with stability to the point where I was no longer able to immediately reproduce these issues in the Compze solution. This doesn't mean that adjusting this logic makes everything OK, because it's not. Right now any Xunit3 test metastructure that drags in a System.Type can hit the above problem with distributed processing. I also have serious concerns about these custom attributes and the way that they work.
It's a general policy in NCrunch that we don't officially support people implementing custom test attributes. This is regardless of framework (NUnit, MSTest, Xunit), and it's for a very good reason: When you're doing this, you are fundamentally changing the way the toolset works, and it's utterly impossible for NCrunch (or its developers) to predict exactly how everything will behave. Put simply, we can't warrant that NCrunch will work correctly when you do this. There's no amount of testing we can do or support we can offer that will provide any guarantee of stability with a modified test framework.
That doesn't mean that it can't work. It just means that if you do this, please don't expect help from us. I really can't stress this enough. When you implement your own test framework attributes with the intention of changing framework behaviour, you are playing with fire. It can result in you living in a horrible hellish nightmare of weird intermittent problems that are nearly impossible to analyse and can appear or disappear seemly at will. The past week of your life will likely have been very instructive in this. With the weight of all my experience in this area, I will tell you that whatever syntactic benefits you hope to achieve with this approach, it's never worth it.
I'm hopeful that we'll hear back from the Xunit team on this issue. Right now I can't find a way to fix this in NCrunch. For the time being, for anyone affected by this, I recommend one of the following:
- Avoid using custom types on test parameter lines
OR
- Avoid multi-node execution (distributed with a single node and no local should be fine)
OR
- Rollback to Xunit v2