This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Streamline MemoryExtension Trim and Trim(char) by removing calls to TrimStart/End and avoiding unnecessary Slice. #19959

Merged
merged 2 commits into dotnet:master
Sep 22, 2018

Conversation

ahsonkhan

@ahsonkhan ahsonkhan commented Sep 14, 2018

BenchmarkDotNet=v0.11.1.739-nightly, OS=Windows 10.0.17744
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
  [Host] : .NET Core ? (CoreCLR 4.6.26913.07, CoreFX 4.6.26913.0), 64bit RyuJIT

Job=InProcess  Toolchain=InProcessToolchain  

Before:

Method Data Mean Error StdDev Allocated
Trim "" 13.138 ns 0.3064 ns 0.5904 ns 0 B
TrimStartEnd "" 12.525 ns 0.2815 ns 0.3351 ns 0 B
TrimChar "" 10.401 ns 0.2290 ns 0.2638 ns 0 B
TrimStartEndChar "" 9.809 ns 0.2176 ns 0.2138 ns 0 B
Trim "   abcdefg   " 22.113 ns 0.4640 ns 0.4340 ns 0 B
TrimStartEnd "   abcdefg   " 21.659 ns 0.3144 ns 0.2626 ns 0 B
TrimChar "   abcdefg   " 15.874 ns 0.1309 ns 0.1160 ns 0 B
TrimStartEndChar "   abcdefg   " 15.644 ns 0.3407 ns 0.2845 ns 0 B
Trim "  abcdefg  " 19.968 ns 0.0545 ns 0.0425 ns 0 B
TrimStartEnd "  abcdefg  " 20.067 ns 0.1905 ns 0.1689 ns 0 B
TrimChar "  abcdefg  " 14.405 ns 0.1835 ns 0.1717 ns 0 B
TrimStartEndChar "  abcdefg  " 14.458 ns 0.1644 ns 0.1457 ns 0 B
Trim " abcdefg " 18.606 ns 0.3105 ns 0.2752 ns 0 B
TrimStartEnd " abcdefg " 18.802 ns 0.2130 ns 0.1888 ns 0 B
TrimChar " abcdefg " 12.374 ns 0.1894 ns 0.1772 ns 0 B
TrimStartEndChar " abcdefg " 12.842 ns 0.2829 ns 0.3966 ns 0 B
Trim "abcdefg" 15.090 ns 0.3258 ns 0.3200 ns 0 B
TrimStartEnd "abcdefg" 14.909 ns 0.3049 ns 0.2852 ns 0 B
TrimChar "abcdefg" 10.989 ns 0.2348 ns 0.2197 ns 0 B
TrimStartEndChar "abcdefg" 11.225 ns 0.1960 ns 0.1834 ns 0 B

After:

Method Data Mean Error StdDev Allocated
Trim "" 1.592 ns 0.0404 ns 0.0337 ns 0 B
TrimStartEnd "" 1.642 ns 0.0098 ns 0.0082 ns 0 B
TrimChar "" 1.120 ns 0.0394 ns 0.0308 ns 0 B
TrimStartEndChar "" 1.093 ns 0.0511 ns 0.0502 ns 0 B
Trim "   abcdefg   " 11.772 ns 0.0684 ns 0.0639 ns 0 B
TrimStartEnd "   abcdefg   " 12.366 ns 0.3084 ns 0.3428 ns 0 B
TrimChar "   abcdefg   " 5.810 ns 0.0442 ns 0.0413 ns 0 B
TrimStartEndChar "   abcdefg   " 5.961 ns 0.0849 ns 0.0753 ns 0 B
Trim "  abcdefg  " 9.552 ns 0.1697 ns 0.1417 ns 0 B
TrimStartEnd "  abcdefg  " 10.650 ns 0.1403 ns 0.1312 ns 0 B
TrimChar "  abcdefg  " 4.984 ns 0.0738 ns 0.0654 ns 0 B
TrimStartEndChar "  abcdefg  " 4.467 ns 0.1236 ns 0.1850 ns 0 B
Trim " abcdefg " 7.277 ns 0.0279 ns 0.0233 ns 0 B
TrimStartEnd " abcdefg " 7.771 ns 0.1833 ns 0.1882 ns 0 B
TrimChar " abcdefg " 4.007 ns 0.0726 ns 0.0679 ns 0 B
TrimStartEndChar " abcdefg " 3.283 ns 0.0475 ns 0.0421 ns 0 B
Trim "abcdefg" 4.133 ns 0.0746 ns 0.0661 ns 0 B
TrimStartEnd "abcdefg" 3.947 ns 0.0812 ns 0.0678 ns 0 B
TrimChar "abcdefg" 1.896 ns 0.0609 ns 0.0570 ns 0 B
TrimStartEndChar "abcdefg" 1.649 ns 0.0409 ns 0.0341 ns 0 B

cc @jkotas, @AndyAyersMS

@adamsitnik, can we do something about how string data shows up in the BDN output for GitHub so it's clearly visible, especially if it contains whitespace? Surrounding it with quotes might help (I manually formatted the output in the "Before" case above).

[MemoryDiagnoser]
[InProcess]
public class Benchmark
{
    private static void Main(string[] args)
    {
        Summary summary = BenchmarkRunner.Run<Benchmark>();
    }

    [Params("", " abcdefg ", "  abcdefg  ", "   abcdefg   ", "abcdefg")]
    public string Data;

    [Benchmark]
    public int Trim() => Data.AsSpan().Trim().Length;

    [Benchmark]
    public int TrimStartEnd() => Data.AsSpan().TrimStart().TrimEnd().Length;

    [Benchmark]
    public int TrimChar() => Data.AsSpan().Trim(' ').Length;

    [Benchmark]
    public int TrimStartEndChar() => Data.AsSpan().TrimStart(' ').TrimEnd(' ').Length;
}

@ahsonkhan
Author

@dotnet-bot test Ubuntu x64 Checked CoreFX Tests

@jkotas
Member

jkotas commented Sep 14, 2018

You can take almost any method in the framework, mark it with AggressiveInlining, and show that it improves perf using a microbenchmark. I do not think we should be doing that - the microbenchmarks do not capture the code bloat that this produces, which will slow down real-world scenarios.

@ahsonkhan
Author

You can take almost any method in the framework, mark it with AggressiveInlining, and show that it improves perf using a microbenchmark.

Good point. I have generally seen diminishing returns (even in microbenchmarks) when adding this attribute unnecessarily. However, the perf delta seemed significant here.

show that it improves perf using a microbenchmark

I tried benchmarking a slightly more advanced scenario (returning the average line length of all source files within corefx - FYI, it's 32):
104 ms vs 80 ms

Similarly, I benchmarked parsing a large list of strings as double (mostly infinity/-infinity/NaN) and saw a 15% improvement there as well (66 vs 57 ms).

I do not think we should be doing that - the microbenchmarks do not capture the code bloat that this produces, which will slow down real-world scenarios.

Do you have some real-world scenarios in mind (rather than the ones I made up) that heavily use Trim, so I can measure performance while taking the code bloat into account? Otherwise, I will wait for ML.NET to consume this API and measure then:
https://github.com/dotnet/machinelearning/pull/863/files#r217577338

@jkotas
Member

jkotas commented Sep 14, 2018

I will wait for ML.NET to consume this API and measure

This is a good plan. If you can demonstrate that applying AggressiveInlining here materially improves ML.NET performance, we can reconsider.

I see ML.NET using the Trim methods in pretty big and complex methods (for example: https://github.com/dotnet/machinelearning/pull/863/files#diff-4f49b4fddbc210a67cbeb1678bbaf21cR775). Big methods like this will suffer from register pressure. I can bet money that force-inlining more non-trivial methods like TrimWhiteSpace into already big methods is going to regress performance, not improve it.

@adamsitnik
Member

@adamsitnik, can we do something about how string data shows up in the BDN output for GitHub so it's clearly visible, especially if it contains whitespace? Surrounding it with quotes might help (I manually formatted the output in the "Before" case above).

@ahsonkhan Sure, I would be happy to approve such PR ;)

@ahsonkhan
Author

This is a good plan. If you can demonstrate that applying AggressiveInlining here materially improves ML.NET performance, we can reconsider.

I wasn't able to observe any significant improvement or regression, since a lot of their workloads run on the order of seconds and the cost/savings of trimming isn't material (it hides in the noise). Isolating the methods that use Trim isn't easy.

I see ML.NET using the Trim methods in pretty big and complex methods (for example: https://github.com/dotnet/machinelearning/pull/863/files#diff-4f49b4fddbc210a67cbeb1678bbaf21cR775). Big methods like this will suffer from register pressure. I can bet money that force-inlining more non-trivial methods like TrimWhiteSpace into already big methods is going to regress performance, not improve it.

Note, however, that I am not suggesting we mark Trim itself as AggressiveInlining - just the helper methods TrimStart/End (the implementation details of Trim). It just happens that those APIs are public as well.
Since Trim isn't marked as AI, large/complex methods calling Trim shouldn't be negatively affected.

I couldn't find example complex methods that use Trim heavily elsewhere. The samples I wrote to try to mimic use of this API benefited from this change.

I compared the following method to one that uses TrimStart/TrimEnd and to one that doesn't trim at all. We go from a 23% trimming overhead down to 12%; the TrimStart variant's overhead goes from 4% to 6%.

private static (double, long) ReturnAvgLengthTrim(List<string> allFiles)
{
    long length = 0;
    long trimAmount = 0;
    foreach (string file in allFiles)
    {
        ReadOnlySpan<char> span = file.AsSpan();
        int prevLength = span.Length;
        span = span.Trim();
        trimAmount += prevLength - span.Length;
        int index = span.IndexOf("\\\\".AsSpan());
        if (index != -1)
        {
            span = span.Slice(0, index);
            prevLength = span.Length;
            span = span.Trim();
            trimAmount += prevLength - span.Length;
        }

        if (!span.IsWhiteSpace())
            length += span.Length;

        if (span.Length > 0)
        {
            char first = span[0];
            switch (first)
            {
                case 'f':
                    trimAmount++;
                    break;
                case 'i':
                    trimAmount--;
                    break;
                case 'v':
                    length++;
                    break;
                case 'r':
                    length--;
                    break;
            }
        }
    }
    // Cast before dividing to avoid integer division when computing the average.
    return ((double)length / allFiles.Count, trimAmount);
}

To your point, we do slightly regress performance when inlining TrimStart. We could create internal copies of these methods, mark them as AI, as implementation details of Trim, and remove the AI attribute from the public TrimStart/End APIs.
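One way to sketch that split (the class and helper names here, like TrimStartCore, are hypothetical, invented purely for illustration):

```csharp
using System;
using System.Runtime.CompilerServices;

public static class MemoryExtensionsSketch
{
    // Public API: no inlining attribute, so large/complex callers are unaffected.
    public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span)
        => TrimStartCore(span);

    // Internal implementation detail: force-inlined into Trim's own body.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static ReadOnlySpan<char> TrimStartCore(ReadOnlySpan<char> span)
    {
        // Scan forward past leading whitespace, then slice once.
        int start = 0;
        while (start < span.Length && char.IsWhiteSpace(span[start]))
            start++;
        return span.Slice(start);
    }
}
```

The public surface keeps its normal inlining heuristics while Trim still gets the force-inlined helper internally.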

@jkotas, thoughts? Should we close this PR, regardless?

@jkotas
Member

jkotas commented Sep 20, 2018

Trim isn't marked as AI, large/complex methods calling Trim shouldn't be negatively affected.

Trim is a trivial method, so the JIT will likely inline it most of the time. Once it inlines it, it will see the TrimStart/End methods marked with AggressiveInlining and it will try hard to inline those as well, because the AggressiveInlining flag tells it to do so.

If you just want to make the non-inlined Trim better, you should change it to a streamlined implementation. Something like this:

static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar)
{
    int start = 0;
    for (; start < span.Length; start++)
    {
        if (span[start] != trimChar)
            break;
    }
    int end = span.Length - 1;
    for (; end >= start; end--)
    {
        if (span[end] != trimChar)
            break;
    }
    return span.Slice(start, end - start + 1);
}

It will avoid the extra call, and it will also avoid other minor inefficiencies from passing the arguments around and from calling Slice twice instead of just once.
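The whitespace overload can be streamlined the same way. A minimal sketch along those lines (the wrapping class is illustrative, and this is not necessarily the exact corefx implementation):

```csharp
using System;

public static class TrimSketch
{
    public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span)
    {
        // Scan forward past leading whitespace.
        int start = 0;
        for (; start < span.Length; start++)
        {
            if (!char.IsWhiteSpace(span[start]))
                break;
        }
        // Scan backward past trailing whitespace, stopping at 'start' so
        // an empty or all-whitespace input yields a zero-length slice.
        int end = span.Length - 1;
        for (; end >= start; end--)
        {
            if (!char.IsWhiteSpace(span[end]))
                break;
        }
        // A single Slice covers both ends.
        return span.Slice(start, end - start + 1);
    }
}
```

Calling it statically, `TrimSketch.Trim("  abc  ".AsSpan())`, avoids any extension-method ambiguity with the real `MemoryExtensions.Trim`.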

@ahsonkhan
Author

Trim is a trivial method, so the JIT will likely inline it most of the time.

Good point.

If you just want to make the non-inlined Trim better, you should change it to a streamlined implementation.

Made this change.

Thanks for the feedback.

@ahsonkhan ahsonkhan changed the title Mark TrimStart and TrimEnd as Aggressively Inline to improve perf Streamline MemoryExtension Trim and Trim(char) by removing calls to TrimStart/End and avoiding unnecessary Slice. Sep 20, 2018
@ahsonkhan ahsonkhan merged commit 1fd5cae into dotnet:master Sep 22, 2018
@ahsonkhan ahsonkhan deleted the InlineTrim branch September 22, 2018 00:47