There will certainly be situations where such a filter would give inconsistent results. Another case could be tests where the engine is not yet aware of their normal execution time (i.e. they've never been run before).
I'd expect whoever uses this to be very broad in how they apply it. For example, they should define a long running test in terms of minutes rather than seconds, as small variations in run time might otherwise result in the same test being categorised differently from one run to the next.
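To illustrate what I mean by being broad, here is a rough sketch in Python. It is not how NCrunch works internally; the names `categorise` and `LONG_RUNNING_THRESHOLD`, the two minute cut-off, and the "unknown" category for tests that have never been run are all assumptions made up for the example.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical cut-off: deliberately coarse (minutes, not seconds) so that
# normal run-to-run jitter does not move a test across the boundary.
LONG_RUNNING_THRESHOLD = timedelta(minutes=2)


def categorise(last_run_time: Optional[timedelta],
               threshold: timedelta = LONG_RUNNING_THRESHOLD) -> str:
    """Place a test in a coarse bucket based on its last known run time."""
    if last_run_time is None:
        # The test has never been run, so there is nothing to compare against.
        return "unknown"
    return "long" if last_run_time >= threshold else "short"
```

With a cut-off of one second, a test that takes 0.9s on one run and 1.1s on the next would flip between categories. With a cut-off of two minutes, both runs are "short", which is the consistency I'd be looking for.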
NCrunch does already do some manipulation of test run times around debugging, as tests being stopped by a debugger will have an abnormal execution time.
There will always be situations where variable test run times would make this feature completely impractical. I think it should be left to the user to decide if their solution is suitable for such a filter. In the worst case scenario, they may need to occasionally run some tests manually.