Is there a way to visualize the test pipeline?
Ralf Koban
#1 Posted : Monday, September 14, 2015 3:08:39 PM(UTC)
Hi Remco,

Is there a way to get the test pipeline visualized?

I would like to see which tests are run in parallel (even across the grid), when they started/finished, and how long they took. I would also like to see the compile/build times.

The reason behind this is that I have some longer-running integration tests (compared to the unit tests), and I would like to see whether it's feasible to modify the test pipeline so that I can reserve some grid nodes exclusively for those longer-running integration tests.

Best regards,
Ralf
Remco
#2 Posted : Monday, September 14, 2015 9:11:55 PM(UTC)
Hi Ralf,

The problem with this approach is that the pipeline is very dynamic. NCrunch can't accurately predict how long some tasks will take, so it relies on an approach where grid nodes and the local engine each 'pull' tasks out of the queue as their resources become available.
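To illustrate the idea (this is just a toy sketch of a pull-based queue, not how the NCrunch engine is actually implemented), each worker simply takes the next task the moment it has capacity, so the schedule emerges at runtime rather than being planned up front:

import queue
import threading
import time

# A handful of fake tasks with made-up durations.
tasks = queue.Queue()
for name, duration in [("unit A", 0.1), ("unit B", 0.1), ("integration X", 0.5)]:
    tasks.put((name, duration))

def worker(node):
    # Each node pulls work whenever it is free, until the queue is empty.
    while True:
        try:
            name, duration = tasks.get_nowait()
        except queue.Empty:
            return
        time.sleep(duration)  # stand-in for building or running tests
        print(f"{node} finished {name}")

threads = [threading.Thread(target=worker, args=(n,)) for n in ("node-1", "node-2")]
for t in threads:
    t.start()
for t in threads:
    t.join()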

At the moment the only way to get a picture of what is happening inside NCrunch is to examine the processing queue while the engine is working. If you set the filters to only show tasks that are being executed, this will let you see which tasks are being sent to the nodes and which tasks are holding up the queue.
Ralf Koban
#3 Posted : Tuesday, September 15, 2015 6:55:00 AM(UTC)
Hi Remco,

Thanks for the explanation.

Would it be possible to get the start time of each task in the test processing window? And would it be possible to export the content of the Processing tab into a CSV file?

Actually, I don't care how accurate the information really is; I would just like to get a more general picture of what happened (how the different tasks were distributed over the network, and whether I can use some slower machines as grid nodes to run only specific tests without "slowing down" the overall test run). So I don't need any kind of real-time feedback from the pipeline; it's perfectly fine for me to evaluate the data after a complete build/test run.

Best regards,
Ralf
Remco
#4 Posted : Tuesday, September 15, 2015 9:42:11 PM(UTC)
Hi Ralf,

I wonder if the best way to handle this may be to use the NCrunch logs.

If you set Log Verbosity to Medium (or perhaps High, depending on the detail you need) and set Log To Output Window to True, NCrunch will dump a ton of data into the VS output window.

Sifting through this manually would be very painful for a big grid run, but you could use a regex to slice up the data and retrieve the start and end points of each of the execution tasks. Because each of the messages is time-stamped, it is theoretically possible to extract this data ... though it might require some effort.
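As a very rough sketch (the timestamp layout and the "Starting"/"Completed" wording below are pure assumptions rather than the real NCrunch log format, so the patterns would need adjusting against your actual output), something like this could pull the start and end times out of a saved copy of the output window and write them to the CSV you mentioned:

import csv
import re

# Assumed line format: "[HH:MM:SS.fff] message" - adjust TIMESTAMP to whatever
# the real output looks like; the task start/end wording is likewise a guess.
TIMESTAMP = re.compile(r"^\[(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\]\s*(?P<message>.*)$")
TASK_START = re.compile(r"Starting (?P<task>.+)", re.IGNORECASE)
TASK_END = re.compile(r"Completed (?P<task>.+)", re.IGNORECASE)

def extract_tasks(log_path, csv_path):
    # Pair each task's start message with its end message and record both times.
    starts, rows = {}, []
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            stamped = TIMESTAMP.match(line)
            if not stamped:
                continue
            time, message = stamped.group("time"), stamped.group("message")
            started = TASK_START.search(message)
            if started:
                starts[started.group("task")] = time
                continue
            ended = TASK_END.search(message)
            if ended and ended.group("task") in starts:
                task = ended.group("task")
                rows.append((task, starts.pop(task), time))
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["task", "started", "finished"])
        writer.writerows(rows)

extract_tasks("ncrunch-output.txt", "pipeline.csv")

From the CSV you could then sort by start time (or group by node, if you capture that too) to see how the tasks overlapped.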

There is actually a work in progress in the NCrunch engine to perform detailed performance monitoring over individual test runs, with the intention of enabling this sort of tuning. With instability in the ecosystem (new VS, new NUnit, new xUnit) it's tough to say when I'll be able to release this ... but we are moving in this direction.
Ralf Koban
#5 Posted : Thursday, January 14, 2016 3:15:43 PM(UTC)
Hi Remco,

I just updated to v2.19.0.4, and now I get the detailed performance information I was looking for. Very nice, it really helps me a lot. :)

Nevertheless, I have a small question about it. Some steps show "(not specified)" text, and I'm not sure what that actually means. Could you provide me with some information about it?


Best regards,
Ralf
Remco
#6 Posted : Thursday, January 14, 2016 9:50:13 PM(UTC)
Hi Ralf,

I was waiting for someone to ask this question, and I'm glad you did, as it means someone is getting some good use of this feature :)

(not specified) basically means 'I don't know'. When NCrunch constructs the performance tracking chart, it correlates every task performed by the engine in pursuit of the end goal (the reporting of the test result). An effort was made to make sure that every significant task is correctly labelled and tracked, but there are always cracks. It wasn't possible to include much of the internal plumbing of the engine (the impact of which is usually negligible).

Also, because of the level of concurrency, the engine can actually be tied up doing other work for tasks completely unrelated to the end goal. For example, the engine may be busy merging code coverage data from a different test run and is therefore unable to allocate the resources to perform other tasks. These situations then show up as '(not specified)' gaps.

I'm hoping that the gaps you're observing aren't big ones. If they are, we should investigate further.
Ralf Koban
#10 Posted : Friday, January 15, 2016 1:23:21 PM(UTC)
Hi Remco,

Thanks for the explanation. I was just curious about these steps as they showed up fairly regularly. For the moment, I don't have the feeling that those gaps are big ones.
GreenMoose
#7 Posted : Friday, February 5, 2016 9:43:19 PM(UTC)
Remco;8232 wrote:
...

I'm hoping that the gaps you're observing aren't big ones. If they are, we should investigate further.

Hi, what would qualify as "big" here? Is 14 seconds, like this one, a non-big one? :)
[Screenshot]


Thanks.
Remco
#8 Posted : Friday, February 5, 2016 9:55:23 PM(UTC)
GreenMoose;8306 wrote:

Hi, what would qualify as "big" here? Is 14 seconds, like this one, a non-big one? :)


That's very big.

Looking at its position, I would say that the engine was overloaded when it was switching the data out of the task thread and back to the core engine thread.

Are you seeing this regularly?
Ralf Koban
#11 Posted : Saturday, February 6, 2016 8:41:13 AM(UTC)
Hi GreenMoose,

I had similar gaps all over my tests and saw them regularly. I came to the conclusion that I had completely overloaded my engine.

My grid nodes have approx. 8 logical CPU cores, and I had set the task capacity to 16 (so for each grid node I doubled the task capacity based on its logical CPU cores), mainly to reduce the build times. As a result, my engine was overloaded, but given the nature of my tests, that gave me the maximum throughput.

When I saw those gaps, I thought they were probably due to garbage collection, thread scheduling, context switching, etc.
So I reduced the task capacity for each grid node back to the number of logical CPU cores. In some situations that's kind of "suboptimal", but the overall throughput loss is approx. 5 seconds at most, so it's acceptable for me.
The reason is that my long-running tests no longer run as long (e.g. a test that used to run for 20 seconds now takes only 10 seconds), so the reduction in task capacity sped up the individual test durations and the overall duration is almost the same.

However, the large gaps are gone now.
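Purely as an illustration of the sizing rule I ended up with (the actual value is configured per grid node in NCrunch, not via a script; the helper function here is just made up):

import os

def task_capacity(logical_cores, overallocate=False):
    # One task per logical core; doubling it was what overloaded the engine on my nodes.
    return logical_cores * 2 if overallocate else logical_cores

print(task_capacity(os.cpu_count() or 8))                     # e.g. 8 - my current setting
print(task_capacity(os.cpu_count() or 8, overallocate=True))  # 16 - my old, overloaded setting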

I'm not sure how you use the grid or how you configured your grid nodes, but maybe that's worth looking at. Actually, when I did that, I was surprised by the results.


Best regards,
Ralf
GreenMoose
#9 Posted : Monday, February 8, 2016 9:32:13 AM(UTC)
Remco;8307 wrote:
GreenMoose;8306 wrote:

Hi, what would qualify as "big" here? Is 14 seconds, like this one, a non-big one? :)


That's very big.

Looking at its position, I would say that the engine was overloaded when it was switching the data out of the task thread and back to the core engine thread.

Are you seeing this regularly?

No, but I had some issues generally that day, since the entire test suite took an extremely long time, so something was wrong somewhere. I don't see that today when I re-run it and the tests execute "normally". I wonder if it was related to the Azure node in some way, which had a "waiting in processing queue" time of over 20 minutes back then.

Ralf Koban wrote:
I'm not sure how you use the grid or how you configured your grid nodes, but maybe that's worth looking at

Yes, that certainly seems interesting. I also "over-allocate" (in terms of 5 tasks on a 4-core computer), e.g. on the Azure node I use.
And I just noticed that on the "local" node, which is where the above test was run, I have assigned 7 of 16 cores to NCrunch and have set the max number of processing tasks to 8, thus also over-allocating that one. I will keep this in mind and see if I can repro/fix it without over-allocating.
Thanks for the tip.
Remco
#12 Posted : Monday, February 8, 2016 10:52:10 AM(UTC)
The 14 second delay in this situation would likely have been caused by an internal task (or series of tasks) tying up the engine's main thread over an extended period of time.

In my understanding of the design of the engine, it's unlikely that this could be the result of over-allocating; it's more likely to be due to something internal taking much longer than it should. So basically, I think it's a defect. The hard part will be identifying steps to reproduce it.

I currently am aware of three ways in which the engine can be overloaded:
1. During initialisation (unavoidable and always temporary)
2. When running a large number of tests with high coverage spread that have never been executed before (i.e. first time run with no cache file)
3. When using the risk/progress indicator with a high number of tests and processing threads (this feature doesn't scale well).
GreenMoose
#13 Posted : Monday, February 8, 2016 11:13:24 AM(UTC)
Remco;8316 wrote:
...
I currently am aware of three ways in which the engine can be overloaded:
1. During initialisation (unavoidable and always temporary)
2. When running a large number of tests with high coverage spread that have never been executed before (i.e. first time run with no cache file)
3. When using the risk/progress indicator with a high number of tests and processing threads (this feature doesn't scale well).

Right, this scenario was probably case 2. I had just enabled the perf. tracking feature and restarted, and everything was seemingly "like the first time".

Although item 3 might also be a factor, since I tend to use that just to see the progress of what's left and to clearly see the number of failing tests / the build result.

(I am not interested in the "risk" graph itself though, and I don't really get the "remaining time" shown on mouse hover either; it always shows an extreme number, which I guess is as if I executed everything in serial?)

I'll move that window out of my way for now then.

Thanks.