Welcome Guest! To enable all features please Login or Register.

Notification

Icon
Error

2 Pages<12
Setting instrumentation mode to Optimised fails to take project references in account
abelb
#21 Posted : Sunday, March 15, 2020 11:24:06 AM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 9/12/2014(UTC)
Posts: 155
Location: Netherlands

Thanks: 19 times
Was thanked: 11 time(s) in 11 post(s)
Quote:
The new instrumentation system shouldn't have much effect on the actual running times of your tests ... but builds and discovery will be much faster :)

Yes, I read that in your extensive blog post as well.

I did some timings, and it seems that the build times, at least for those large F# projects, don't really increase, nor decrease. Discovery is, however, indeed faster. The following timings were taken with a stopwatch and are not very precise. Obviously, there are many circumstances that can influence these numbers. I ran the timings a few times, but a better test would be some way to run it from a command-line and find the timings of each step, I don't know if that is (easily) possible.

Legacy:
Reload, indexing: 0:59
Build (until building msg disappears): 5.44
Start tests: 6:05
Finished: 7:40
Only (re)running all tests in existing threads: 1:12

Optimized
Reload, indexing: 0:58
Build (until building msg disappears): 4:37
Start tests: 4:45
Finished: 6:52
Only (re)running all tests in existing threads: 1:12

Reset, Rebuild & run all (optimized)
Reload, indexing: 0:54
Build (until building msg disappears): 4:31
Start tests: 4:43
Finished: 6:32
Only (re)running all tests in existing threads: 1:09

There are some peculiarities in this solution that may influence these timings:

  • When I remove instrumentation, the two biggest projects go from ~4 min to ~1 min, matching build times of VS
  • When I change something in the big Common F# prj (the one I sent you), build time is around 2 min, rebuild w/o change between 80-90sec
  • This sln has about 7500 tests, each logging about 1-2kb, which probably influences test times (and seems to slow down VS as well, but not sure that's due to logging or something else)
  • I noticed that a lot of non-user code gets instrumented. F# projects typically have a large number of very small methods (like fun a -> a + 10) that get inlined by JIT (if under 16 IL bytes or so), and the result of adding instrumentation to those methods has a detrimental effect to compile+instrument times, and to running times. A suggestion would be to not instrument tiny methods. This would have an advantage outside of F# as well: these timings won't add up anyway and it may (significantly) speed up instrumenting. Downside is that it will also limit code-coverage of such (tiny) methods, but perhaps timings can be removed, and coverage markers remain?
abelb
#22 Posted : Sunday, March 15, 2020 11:25:13 AM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 9/12/2014(UTC)
Posts: 155
Location: Netherlands

Thanks: 19 times
Was thanked: 11 time(s) in 11 post(s)
And another thing: when I remove instrumentation of half of the projects and do a full rebuild of the whole solution, I get the System.InvalidProgramException : Common Language Runtime detected an invalid program. back. Still researching that one. It happens in a tail-recursive rec function of one of the assemblies that is still instrumented. Surprisingly, that same code doesn't fail when instrumenting all projects, even though the failing code has no dependency on a project that wasn't instrumented (afaict).

Setting the offending prj to Legacy solves it atm.
Remco
#23 Posted : Sunday, March 15, 2020 10:51:29 PM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 7,177

Thanks: 968 times
Was thanked: 1298 time(s) in 1203 post(s)
It's interesting that you aren't noticing much of an improvement in the build times. It's possible that this is due to the nature of FSharp. I've noticed that FSharp assemblies seem to have a huge amount of generated code. Probably, the mechanics of this generation is bogging down the instrumentor. Do you notice much difference with the 'Detect stack overflow' setting disabled or the 'Analyse line execution times' disabled?

You'll find it much easier to extract timing data using the 'Execution Steps' pane in the Tests Window. You may need to have a test automatically run to be able to extract the build timings.

abelb;14524 wrote:
And another thing: when I remove instrumentation of half of the projects and do a full rebuild of the whole solution, I get the System.InvalidProgramException : Common Language Runtime detected an invalid program. back. Still researching that one. It happens in a tail-recursive rec function of one of the assemblies that is still instrumented. Surprisingly, that same code doesn't fail when instrumenting all projects, even though the failing code has no dependency on a project that wasn't instrumented (afaict).

Setting the offending prj to Legacy solves it atm.


Are you able to isolate this? We do have some special handling around tail methods ... maybe we're getting something wrong..
abelb
#24 Posted : Monday, March 16, 2020 12:04:51 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 9/12/2014(UTC)
Posts: 155
Location: Netherlands

Thanks: 19 times
Was thanked: 11 time(s) in 11 post(s)
Quote:
It's possible that this is due to the nature of FSharp. I've noticed that FSharp assemblies seem to have a huge amount of generated code.

This is correct. Any let-bound function in a module, nested, curried becomes an auto-generated class that essentially captures the closure as its state. These classes have one or move Invoke methods that execute the captured function. F# goes through great lengths to try to minimize the size of these functions, and programmers tend to be very aware of whether or not a closure needs to be captured.

For instance, [1;2;3] |> List.map (fun x -> x + 1) will create an anonymous function, i.e. a class with Invoke for the function with the body x + 1 (this isn't 100% true, and technically such an anonymous function is also created for |> (piping), but that gets source-inlined, so it isn't visible in the IL).

Such coding is idiomatic in F#. The result is that there are a lot of very small classes with very small Invoke methods. That isn't much of a problem, because JIT will inline most of them anyway, so after JIT'ing they disappear. If I inspect the code after instrumentation, I can see that instrumentation is added to all methods. For analyzing execution times, I doubt you can capture any reasonable timings from pure IL (without calls into other methods). In fact, if I inspect "Hot spots", even in tests that run long (1 sec or so), all but a few "outer" methods report 0ms (that's another point for debate, why ms? For line-level timings that seems rather course). Any other line always starts at 15 or 16ms, then 31 or 32, then around 45-48, so there seems to be some logic that only aggregates a certain minimum of execution times.

I know that this is very complex machinery and my guess is that, considering it is a new feature, you are looking into ways to make it even more useful. My 2p would be: stop collecting data from micro-methods that are pure. Even with line-level timings on, that won't make sense in whatever language anyway.

Quote:
Do you notice much difference with the 'Detect stack overflow' setting disabled or the 'Analyse line execution times' disabled?

I did notice in the added instrumentation that indeed the stack-overflow stubs are also added to the tiniest (pure) methods. One optimization would be to somehow stop doing that when a method is pure (that is: when it doesn't call into any other method), or when a method is merely a wrapper, I mean, its arguments are passed immediately to the next method unchanged (JIT detects this and lifts that method, so it doesn't add to the stack, hence "detect SO" is redundant in that case, but this may be much trickier than detecting a pure method).

I will experiment a bit more with the settings.

Quote:
Are you able to isolate this? We do have some special handling around tail methods ... maybe we're getting something wrong..

It seems to be flaky, sometimes it happens, sometimes it doesn't, regardless the settings. When I isolate the project where it happens, I can't repro it. The method where it goes wrong has multiple exit points, but all are in tail position (i.e., at ends of if, elif, else and switch-case statements).

Quote:
You'll find it much easier to extract timing data using the 'Execution Steps' pane in the Tests Window

I thought these timings were off, but now I see that "32.547 ms" means 32.5s, not 32.5ms. The dot is confusing and is a European dot, apparently. I've misinterpreted that in the past... my bad.
abelb
#25 Posted : Monday, March 16, 2020 12:10:32 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 9/12/2014(UTC)
Posts: 155
Location: Netherlands

Thanks: 19 times
Was thanked: 11 time(s) in 11 post(s)
(edited), the line-level timings are shown in ms, and I understand that they don't add up (they are cumulative for the line and everything after), but that would mean that the largest timing should be the entry point, and that it ought to be somewhat similar to the total execution time of the whole test (at least when run sequentially, I'm not talking multi-threaded code here).

However, here's a test that runs in 7s, but the highest listed Hot Spot shows 431.214ms, which is 431s, right? Or does ms stand for micro-seconds? But then it would be way too short. Also, this line is not the entry-point, so I admit I don't fully understand how the timings are accumulated.



If I had any guess about it, the timings are accumulated, but if a certain line gets called many times, and execution time is small on each call, the rounding error becomes huge.

Note: if I exclude JIT (by running in an existing task runner process), the timings are consistently off like above.
abelb
#26 Posted : Monday, March 16, 2020 1:52:35 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 9/12/2014(UTC)
Posts: 155
Location: Netherlands

Thanks: 19 times
Was thanked: 11 time(s) in 11 post(s)
Interestingly, when I set the instrumentation settings courser, only on the outer projects, the timings are even more off, with the top method being close to 800.000ms, and total running time down to 5s (in itself logical, because there's now less overhead, in fact, without any overhead, it runs in ~2s).

The "big question" ;) is: how to interpret a hot spot showing ~800s when the whole test runs in 5s? What is the metric used for the first column in the hot spots?
Remco
#27 Posted : Tuesday, March 17, 2020 8:38:49 AM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 7,177

Thanks: 968 times
Was thanked: 1298 time(s) in 1203 post(s)
I'll try to explain in detail how NCrunch captures these times, in the hope that a deeper understanding of this may allow the data you're examining to make more sense in its context.

NCrunch captures line-level timings by measuring the wall-time of each line of code, in milliseconds (ms, 1000 in a second). This is done using Win32 GetTickCount which is called between every line of code.

This structure has no special handling for things like recursion, deep nested calls and tight loops of user code. As such, it's subject to a certain level of distortion depending on what you're measuring. Because GetTickCount and the instrumentation around it does have a performance cost, code that is of very low weight can become much heavier when instrumented, sometimes by as much as 10-1000x. For this reason, it can be worth using coverage suppression comments on performance critical algorithms that are relying heavily on your own code for their speed.

Recursion naturally increases the measurement geometrically because deeper nest levels are measured multiple times through the recursive stack. The deeper the recursion, the higher the distortion.

Another area where you're likely to see considerably distortion with F# is that due to the lack of granularity of the coverage capture (we measure at line level, not statement level), you're going to see distortion when multiple statements are presented on a single line of code. When those statements call one another (i.e. generated nested functions, lambdas), this can actually cause a very similar effect to the recursion distortion, as we end up capturing the wall-time of the top level caller and stacking this with the call time of the nested function.

It's likely that these issues aren't going to make a great deal of sense when you're looking at performance times, and I will accept that they could be considered buggy. The issue is that we have some serious limitations to deal with when passively capturing performance data at IL-level. Every consideration we code into the instrumentation has a direct and measurable impact on the performance of either the instrumentation system or the code executing under test. For example, identifying a 'special' method that should be excluded from instrumentation is a very expensive thing to achieve, because usually this involves deconstructing additional areas of the assembly, such as method attributes. This involves extra indexing, additional memory allocation, and in some cases expensive text matching on type names and attribute names.

An example of the above is ExcludeFromCodeCoverageAttribute. To know that a method is marked by this attribute, we need to completely index all attributes in the assembly, map these to their applicable methods, then perform an optimised comparison to find when a method implements the attribute, just on the off chance that it is there. Then we also need to do this at type level. For this reason, any assembly that contains a reference to this attribute is going to be instrumented slower than it would otherwise be. That's just the reality of IL processing. Everything has a cost. I would expect that trying to find ways to exclude specific F# methods from instrumentation may end up being more expensive than just instrumenting them in the first place, if we could even find way to do it reliably.

It's very plain that our handling for performance measurement in F# could be better. I would like it to be better. But it's not going to come cheap, and sadly the number of NCrunch users that use F# is very small compared to those using C#. I would instead recommend disabling the metrics in the context where they don't make sense, or perhaps take them with a grain of salt.
Remco
#28 Posted : Tuesday, March 17, 2020 8:39:58 AM(UTC)
Rank: NCrunch Developer

Groups: Administrators
Joined: 4/16/2011(UTC)
Posts: 7,177

Thanks: 968 times
Was thanked: 1298 time(s) in 1203 post(s)
Regarding the issue you've been experiencing with the CLR exception, I'm very interested in examining this if you can isolate it. It might be related to the execution order of your code. It's possible that churn mode could surface this, as it cycles the test runners using different execution sequences.
abelb
#29 Posted : Monday, March 30, 2020 1:23:17 PM(UTC)
Rank: Advanced Member

Groups: Registered
Joined: 9/12/2014(UTC)
Posts: 155
Location: Netherlands

Thanks: 19 times
Was thanked: 11 time(s) in 11 post(s)
I've meanwhile tried to isolate it a few times, but can't (yet) find a continuously reproducible way. Somehow it seems to matter whether I do a full reset with certain settings or not (i.e., with one set of settings, it raises the CLR exception always, with another it still raises, until I do a full reset and then it disappears. The inverse is also true, it remains absent if I change settings, until I do a full reset. Sometimes it disappears after appearing in one run, even if the offending dll is not touched).

Quote:
This is done using Win32 GetTickCount which is called between every line of code.

This is a very rough measurement, limited to 10-16ms precision (looking at the timings, it seems to be 16ms on my system). Wouldn't it be better to use something similar to System.Diagnostics.Stopwatch, which uses a high-resolution timer internally (unless not available)? The Stopwatch.ElapsedTicks is more precise than the Win32 GetTickCount. On many systems that gives a far higher accuracy (on my system, 0.3ms, or 307nanoseconds).

The very odd timings I see sometimes may well be due to what you are writing, though I would have expected some way of "startTicks - endTicks" (your wall-clock time), which, if proper rounding is applied, shouldn't ever give a total that's larger than 10x - 100x the overall running time of the whole test. If anything, the start-method should closely equal the running time of the test, and anything below that should yield lower times. Much like when running it through a profiler.

Quote:
An example of the above is ExcludeFromCodeCoverageAttribute.

Unf. in F# this is not a way to go, since until now you cannot apply attributes to let-bindings or local functions. But I understand your reasoning that it may end up being slower instead of faster.

Quote:
That's just the reality of IL processing. Everything has a cost. I would expect that trying to find ways to exclude specific F# methods from instrumentation may end up being more expensive than just instrumenting them in the first place, if we could even find way to do it reliably.

Also very understandable. But my idea was more along these lines, for both F# and C#:

  • Given a source-code line with IL that is pure (no method calls) and has a certain max IL size (say, 16 bytes)
  • Mark the line as visited as usual
  • Do not mark the line for timings, default any timings to zero, and/or mark them as undefined
  • That way, the very valuable "this line was visited markers" remain, but the larger injections for timings can be skipped

I'd argue that this is a less-hard task to do than querying a more complex set of rules. It is somewhat similar to the JIT-inlining rules, but without its complexity. After all, the number of IL instructions that are expensive and/or that are calls (call, callvirt etc) is relatively small.

But I sympathise with this not being your highest priority right now.

On a related note, I noticed something odd when I checked the Execution Steps. Sometimes, it appears to build a project multiple times, which, if true, would explain at least in part why I see a performance degradation. However, this doesn't always happen, only sometimes:

Users browsing this topic
Guest
2 Pages<12
Forum Jump  
You cannot post new topics in this forum.
You cannot reply to topics in this forum.
You cannot delete your posts in this forum.
You cannot edit your posts in this forum.
You cannot create polls in this forum.
You cannot vote in polls in this forum.

YAF | YAF © 2003-2011, Yet Another Forum.NET
This page was generated in 0.088 seconds.
Trial NCrunch
Take NCrunch for a spin
Do your fingers a favour and supercharge your testing workflow
Free Download