This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Streamline MemoryExtension Trim and Trim(char) by removing calls to TrimStart/End and avoiding unnecessary Slice. #19959

Merged
merged 2 commits into dotnet:master
Sep 22, 2018

Conversation

ahsonkhan

@ahsonkhan ahsonkhan commented Sep 14, 2018

BenchmarkDotNet=v0.11.1.739-nightly, OS=Windows 10.0.17744
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
  [Host] : .NET Core ? (CoreCLR 4.6.26913.07, CoreFX 4.6.26913.0), 64bit RyuJIT

Job=InProcess  Toolchain=InProcessToolchain  

Before:

Method Data Mean Error StdDev Allocated
Trim "" 13.138 ns 0.3064 ns 0.5904 ns 0 B
TrimStartEnd "" 12.525 ns 0.2815 ns 0.3351 ns 0 B
TrimChar "" 10.401 ns 0.2290 ns 0.2638 ns 0 B
TrimStartEndChar "" 9.809 ns 0.2176 ns 0.2138 ns 0 B
Trim "   abcdefg   " 22.113 ns 0.4640 ns 0.4340 ns 0 B
TrimStartEnd "   abcdefg   " 21.659 ns 0.3144 ns 0.2626 ns 0 B
TrimChar "   abcdefg   " 15.874 ns 0.1309 ns 0.1160 ns 0 B
TrimStartEndChar "   abcdefg   " 15.644 ns 0.3407 ns 0.2845 ns 0 B
Trim "  abcdefg  " 19.968 ns 0.0545 ns 0.0425 ns 0 B
TrimStartEnd "  abcdefg  " 20.067 ns 0.1905 ns 0.1689 ns 0 B
TrimChar "  abcdefg  " 14.405 ns 0.1835 ns 0.1717 ns 0 B
TrimStartEndChar "  abcdefg  " 14.458 ns 0.1644 ns 0.1457 ns 0 B
Trim " abcdefg " 18.606 ns 0.3105 ns 0.2752 ns 0 B
TrimStartEnd " abcdefg " 18.802 ns 0.2130 ns 0.1888 ns 0 B
TrimChar " abcdefg " 12.374 ns 0.1894 ns 0.1772 ns 0 B
TrimStartEndChar " abcdefg " 12.842 ns 0.2829 ns 0.3966 ns 0 B
Trim "abcdefg" 15.090 ns 0.3258 ns 0.3200 ns 0 B
TrimStartEnd "abcdefg" 14.909 ns 0.3049 ns 0.2852 ns 0 B
TrimChar "abcdefg" 10.989 ns 0.2348 ns 0.2197 ns 0 B
TrimStartEndChar "abcdefg" 11.225 ns 0.1960 ns 0.1834 ns 0 B

After:

Method Data Mean Error StdDev Allocated
Trim "" 1.592 ns 0.0404 ns 0.0337 ns 0 B
TrimStartEnd "" 1.642 ns 0.0098 ns 0.0082 ns 0 B
TrimChar "" 1.120 ns 0.0394 ns 0.0308 ns 0 B
TrimStartEndChar "" 1.093 ns 0.0511 ns 0.0502 ns 0 B
Trim "   abcdefg   " 11.772 ns 0.0684 ns 0.0639 ns 0 B
TrimStartEnd "   abcdefg   " 12.366 ns 0.3084 ns 0.3428 ns 0 B
TrimChar "   abcdefg   " 5.810 ns 0.0442 ns 0.0413 ns 0 B
TrimStartEndChar "   abcdefg   " 5.961 ns 0.0849 ns 0.0753 ns 0 B
Trim "  abcdefg  " 9.552 ns 0.1697 ns 0.1417 ns 0 B
TrimStartEnd "  abcdefg  " 10.650 ns 0.1403 ns 0.1312 ns 0 B
TrimChar "  abcdefg  " 4.984 ns 0.0738 ns 0.0654 ns 0 B
TrimStartEndChar "  abcdefg  " 4.467 ns 0.1236 ns 0.1850 ns 0 B
Trim " abcdefg " 7.277 ns 0.0279 ns 0.0233 ns 0 B
TrimStartEnd " abcdefg " 7.771 ns 0.1833 ns 0.1882 ns 0 B
TrimChar " abcdefg " 4.007 ns 0.0726 ns 0.0679 ns 0 B
TrimStartEndChar " abcdefg " 3.283 ns 0.0475 ns 0.0421 ns 0 B
Trim "abcdefg" 4.133 ns 0.0746 ns 0.0661 ns 0 B
TrimStartEnd "abcdefg" 3.947 ns 0.0812 ns 0.0678 ns 0 B
TrimChar "abcdefg" 1.896 ns 0.0609 ns 0.0570 ns 0 B
TrimStartEndChar "abcdefg" 1.649 ns 0.0409 ns 0.0341 ns 0 B

cc @jkotas, @AndyAyersMS

@adamsitnik, can we do something about how string data shows up in the BDN output for GitHub so it's clearly visible, especially if it contains whitespace? Surrounding it with quotes might help (I manually formatted the output in the "Before" case above).

[MemoryDiagnoser]
[InProcess]
public class Benchmark
{
    private static void Main(string[] args)
    {
        Summary summary = BenchmarkRunner.Run<Benchmark>();
    }

    [Params("", " abcdefg ", "  abcdefg  ", "   abcdefg   ", "abcdefg")]
    public string Data;

    [Benchmark]
    public int Trim() => Data.AsSpan().Trim().Length;

    [Benchmark]
    public int TrimStartEnd() => Data.AsSpan().TrimStart().TrimEnd().Length;

    [Benchmark]
    public int TrimChar() => Data.AsSpan().Trim(' ').Length;

    [Benchmark]
    public int TrimStartEndChar() => Data.AsSpan().TrimStart(' ').TrimEnd(' ').Length;
}

@ahsonkhan
Author

@dotnet-bot test Ubuntu x64 Checked CoreFX Tests

@jkotas
Member

jkotas commented Sep 14, 2018

You can take almost any method in the framework, mark it with AggressiveInlining, and show that it improves perf using a microbenchmark. I do not think we should be doing that - the microbenchmarks do not capture the code bloat that this produces, which will slow down real-world scenarios.

@ahsonkhan
Author

You can take almost any method in the framework, mark it with AggressiveInlining, and show that it improves perf using a microbenchmark.

Good point. I have generally seen diminishing returns (even in microbenchmarks) when adding this attribute unnecessarily. However, the perf delta seemed significant here.

show that it improves perf using a microbenchmark

I tried benchmarking a slightly more advanced scenario (returning the average line length of all source files within corefx - FYI, it's 32):
104 ms vs 80 ms

Similarly, I benchmarked parsing a large list of strings as double (mostly infinity/-infinity/NaN) and saw a 15% improvement there as well (66 vs 57 ms).

I do not think we should be doing that - the microbenchmarks do not capture the code bloat that this produces, which will slow down real-world scenarios.

Do you have some real-world scenarios in mind (rather than the ones I made up) that heavily use Trim, so I can measure performance while taking the code bloat into account? Otherwise, I will wait for ML.NET to consume this API and measure then:
https://github.com/dotnet/machinelearning/pull/863/files#r217577338

@jkotas
Member

jkotas commented Sep 14, 2018

I will wait for ML.NET to consume this API and measure

This is a good plan. If you can demonstrate that applying AggressiveInlining here materially improves ML.NET performance, we can reconsider.

I see ML.NET using the Trim methods in pretty big and complex methods (for example: https://github.com/dotnet/machinelearning/pull/863/files#diff-4f49b4fddbc210a67cbeb1678bbaf21cR775). Big methods like this will suffer from register pressure. I can bet money that force-inlining more non-trivial methods like TrimWhiteSpace into already big methods is going to regress performance, not improve it.

@adamsitnik
Member

@adamsitnik, can we do something about how string data shows up in the BDN output for GitHub so it's clearly visible, especially if it contains whitespace? Surrounding it with quotes might help (I manually formatted the output in the "Before" case above).

@ahsonkhan Sure, I would be happy to approve such PR ;)

@ahsonkhan
Author

This is a good plan. If you can demonstrate that applying AggressiveInlining here materially improves ML.NET performance, we can reconsider.

I wasn't able to observe any significant improvement or regression, since a lot of their workloads run on the order of seconds and the cost/savings of trimming isn't material (it hides in the noise). Isolating the methods that use Trim isn't easy.

I see ML.NET using the Trim methods in pretty big and complex methods (for example: https://github.com/dotnet/machinelearning/pull/863/files#diff-4f49b4fddbc210a67cbeb1678bbaf21cR775). Big methods like this will suffer from register pressure. I can bet money that force-inlining more non-trivial methods like TrimWhiteSpace into already big methods is going to regress performance, not improve it.

Note, however, that I am not suggesting we mark Trim itself as AggressiveInlining - just the helper methods TrimStart/End (the implementation details of Trim). It just happens that those APIs are public as well.
Since Trim isn't marked as AI, large/complex methods calling Trim shouldn't be negatively affected.

I couldn't find example complex methods that use Trim heavily elsewhere. The samples I wrote to try to mimic use of this API benefited from this change.

I compared the following method to one that uses TrimStart/TrimEnd and to one that doesn't trim at all. We go from a 23% trimming overhead down to 12%; the TrimStart variant's overhead goes from 4% to 6%.

private static (double, long) ReturnAvgLengthTrim(List<string> allFiles)
{
    long length = 0;
    long trimAmount = 0;
    foreach (string file in allFiles)
    {
        ReadOnlySpan<char> span = file.AsSpan();
        int prevLength = span.Length;
        span = span.Trim();
        trimAmount += prevLength - span.Length;
        int index = span.IndexOf("\\\\".AsSpan());
        if (index != -1)
        {
            span = span.Slice(0, index);
            prevLength = span.Length;
            span = span.Trim();
            trimAmount += prevLength - span.Length;
        }

        if (!span.IsWhiteSpace())
            length += span.Length;

        if (span.Length > 0)
        {
            char first = span[0];
            switch (first)
            {
                case 'f':
                    trimAmount++;
                    break;
                case 'i':
                    trimAmount--;
                    break;
                case 'v':
                    length++;
                    break;
                case 'r':
                    length--;
                    break;
            }
        }
    }
    // Cast before dividing to avoid integer division when computing the average.
    return ((double)length / allFiles.Count, trimAmount);
}

To your point, we do slightly regress performance when inlining TrimStart. We could create internal copies of these methods, mark them as AI, as implementation details of Trim, and remove the AI attribute from the public TrimStart/End APIs.
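One way to sketch that split (the class and helper names here, like TrimStartCore, are hypothetical, invented purely for illustration):

```csharp
using System;
using System.Runtime.CompilerServices;

public static class MemoryExtensionsSketch
{
    // Public API: no inlining attribute, so large/complex callers are unaffected.
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span)
        => TrimStartCore(span);

    // Internal implementation detail: force-inlined into Trim's own body.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static ReadOnlySpan<char> TrimStartCore(ReadOnlySpan<char> span)
    {
        // Scan forward past leading whitespace, then slice once.
        int start = 0;
        while (start < span.Length && char.IsWhiteSpace(span[start]))
            start++;
        return span.Slice(start);
    }
}
```

The public surface keeps its normal inlining heuristics while Trim still gets the force-inlined helper internally.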

@jkotas, thoughts? Should we close this PR, regardless?

@jkotas
Member

jkotas commented Sep 20, 2018

Trim isn't marked as AI, large/complex methods calling Trim shouldn't be negatively affected.

Trim is a trivial method, so the JIT will likely inline it most of the time. Once it inlines it, it will see the TrimStart/End methods marked with AggressiveInlining and it will try hard to inline those as well, because the AggressiveInlining flag tells it to do so.

If you just want to make the non-inlined Trim better, you should change it to a streamlined implementation. Something like this:

static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar)
{
    int start = 0;
    for (; start < span.Length; start++)
    {
        if (span[start] != trimChar)
            break;
    }
    int end = span.Length - 1;
    for (; end >= start; end--)
    {
        if (span[end] != trimChar)
            break;
    }
    return span.Slice(start, end - start + 1);
}

It will avoid the extra call, and it will also avoid other minor inefficiencies from passing the arguments around and from calling Slice twice instead of just once.
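The whitespace overload can be streamlined the same way. A minimal sketch along those lines (the wrapping class is illustrative, and this is not necessarily the exact corefx implementation):

```csharp
using System;

public static class TrimSketch
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span)
    {
        // Scan forward past leading whitespace.
        int start = 0;
        for (; start < span.Length; start++)
        {
            if (!char.IsWhiteSpace(span[start]))
                break;
        }
        // Scan backward past trailing whitespace, stopping at 'start' so
        // an empty or all-whitespace input yields a zero-length slice.
        int end = span.Length - 1;
        for (; end >= start; end--)
        {
            if (!char.IsWhiteSpace(span[end]))
                break;
        }
        // A single Slice covers both ends.
        return span.Slice(start, end - start + 1);
    }
}
```

Calling it statically, `TrimSketch.Trim("  abc  ".AsSpan())`, avoids any extension-method ambiguity with the real `MemoryExtensions.Trim`.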

@ahsonkhan
Author

Trim is a trivial method, so the JIT will likely inline it most of the time.

Good point.

If you just want to make the non-inlined Trim better, you should change it to a streamlined implementation.

Made this change.

Thanks for the feedback.

@ahsonkhan ahsonkhan changed the title Mark TrimStart and TrimEnd as Aggressively Inline to improve perf Streamline MemoryExtension Trim and Trim(char) by removing calls to TrimStart/End and avoiding unnecessary Slice. Sep 20, 2018
@ahsonkhan ahsonkhan merged commit 1fd5cae into dotnet:master Sep 22, 2018
@ahsonkhan ahsonkhan deleted the InlineTrim branch September 22, 2018 00:47