Streamline MemoryExtension Trim and Trim(char) by removing calls to TrimStart/End and avoiding unnecessary Slice. #19959
Conversation
@dotnet-bot test Ubuntu x64 Checked CoreFX Tests |
You can take every other method in the framework, mark it with AggressiveInlining, and show that it improves perf using a microbenchmark. I do not think we should be doing that - the microbenchmarks do not capture the code bloat that this produces, and that will slow down real world scenarios. |
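For context, this is the kind of annotation under discussion; a minimal sketch with a made-up helper name, not code from this PR:

using System;
using System.Runtime.CompilerServices;

internal static class InliningExample
{
    // Hypothetical helper: AggressiveInlining asks the JIT to inline this
    // method at every call site instead of relying on its usual size-based heuristics.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal static bool StartsWithChar(ReadOnlySpan<char> span, char c)
        => span.Length > 0 && span[0] == c;
}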
Good point. I have generally seen diminishing returns (even in microbenchmarks) when adding this attribute unnecessarily. However, the perf delta seemed significant here.
I tried benchmarking a slightly more advanced scenario (returning the average line length of all source files within corefx - FYI, it's 32). Similarly, I benchmarked parsing a large list of strings as double (mostly infinity/-infinity/NaN) and saw a 15% improvement there as well (66 ms vs 57 ms).
Do you have some real world scenarios in mind (rather than the ones I made up) that heavily use Trim, so I can measure performance while taking the code bloat into account? Otherwise, I will wait for ML.NET to consume this API and measure then. |
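As a rough sketch of the double-parsing scenario described above (the type name, method name, and input list are made up for illustration; the actual measurement was a benchmark run):

using System;
using System.Collections.Generic;
using System.Globalization;

internal static class TrimParseSketch
{
    // Trims each entry and counts how many parse as double
    // (entries like "  Infinity ", " -Infinity", " NaN " in the scenario above).
    internal static int CountParsable(List<string> inputs)
    {
        int parsed = 0;
        foreach (string s in inputs)
        {
            ReadOnlySpan<char> span = s.AsSpan().Trim();
            if (double.TryParse(span, NumberStyles.Float, CultureInfo.InvariantCulture, out double _))
            {
                parsed++;
            }
        }
        return parsed;
    }
}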
This is a good plan. If you can demonstrate that applying AggressiveInlining here materially improves ML.NET performance, we can reconsider. I see ML.NET using the Trim methods in pretty big and complex methods (for example: https://github.com/dotnet/machinelearning/pull/863/files#diff-4f49b4fddbc210a67cbeb1678bbaf21cR775). Big methods like this will suffer from register pressure. I can bet money that force-inlining more non-trivial methods like Trim into them is not going to help. |
@ahsonkhan Sure, I would be happy to approve such a PR ;) |
I wasn't able to observe any significant improvement or regression since a lot of their workloads are on the order of seconds and the cost/savings of trimming isn't material (hides in the noise). Isolating the methods that use Trim isn't easy.
Note, however, I am not suggesting we mark Trim itself as AggressiveInlining, just the helper methods TrimStart/End (the implementation detail of Trim). It just happens that those APIs are public as well. I couldn't find complex methods that use Trim heavily elsewhere. The samples I wrote to mimic use of this API benefited from this change. I compared the following method to one that uses TrimStart and one that doesn't trim at all. We go from a 23% trimming overhead down to 12%, while the TrimStart variant's overhead jumps from 4% to 6%.

private static (double, long) ReturnAvgLengthTrim(List<string> allFiles)
{
    long length = 0;
    long trimAmount = 0;
    foreach (string file in allFiles)
    {
        ReadOnlySpan<char> span = file.AsSpan();
        int prevLength = span.Length;
        span = span.Trim();
        trimAmount += prevLength - span.Length;
        int index = span.IndexOf("\\\\".AsSpan());
        if (index != -1)
        {
            span = span.Slice(0, index);
            prevLength = span.Length;
            span = span.Trim();
            trimAmount += prevLength - span.Length;
        }
        if (!span.IsWhiteSpace())
            length += span.Length;
        if (span.Length > 0)
        {
            char first = span[0];
            switch (first)
            {
                case 'f':
                    trimAmount++;
                    break;
                case 'i':
                    trimAmount--;
                    break;
                case 'v':
                    length++;
                    break;
                case 'r':
                    length--;
                    break;
            }
        }
    }
    return ((double)length / allFiles.Count, trimAmount);
}

To your point, we do slightly regress performance when inlining TrimStart. We could create internal copies of these methods, mark them with AggressiveInlining as an implementation detail of Trim, and remove the attribute from the public TrimStart/End APIs. @jkotas, thoughts? Should we close this PR regardless? |
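A minimal sketch of that alternative, using hypothetical helper names rather than the actual corefx implementation: the public TrimStart/End keep the JIT's default heuristics, while Trim calls small, force-inlined private helpers.

using System;
using System.Runtime.CompilerServices;

internal static class TrimSketch
{
    // Hypothetical internal helper: force-inlined into Trim only.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static int ClampStart(ReadOnlySpan<char> span)
    {
        int start = 0;
        for (; start < span.Length; start++)
        {
            if (!char.IsWhiteSpace(span[start]))
                break;
        }
        return start;
    }

    // Hypothetical counterpart returning the length of the trimmed slice.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static int ClampEnd(ReadOnlySpan<char> span, int start)
    {
        int end = span.Length - 1;
        for (; end >= start; end--)
        {
            if (!char.IsWhiteSpace(span[end]))
                break;
        }
        return end - start + 1;
    }

    // Trim keeps its public shape; only the private helpers carry the attribute.
    internal static ReadOnlySpan<char> Trim(ReadOnlySpan<char> span)
    {
        int start = ClampStart(span);
        return span.Slice(start, ClampEnd(span, start));
    }
}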
Trim is a trivial method, so the JIT will likely inline it most of the time. Once it inlines it, it will see the TrimStart/End methods marked with AggressiveInlining and it will try hard to inline those as well, because the AggressiveInlining flag told it to do so. If you just want to make the non-inlined Trim better, you should change it to a streamlined implementation. Something like this:

ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar)
{
    int start = 0;
    for (; start < span.Length; start++)
    {
        if (span[start] != trimChar)
            break;
    }
    int end = span.Length - 1;
    for (; end >= start; end--)
    {
        if (span[end] != trimChar)
            break;
    }
    return span.Slice(start, end - start + 1);
}

It will avoid the extra call, and it will also avoid other minor inefficiencies from passing the arguments around and calling Slice twice instead of just once. |
Good point.
Made this change. Thanks for the feedback. |
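For reference, the same streamlining presumably carries over to the parameterless whitespace Trim mentioned in the title; a sketch of what that overload might look like (shown as a bare extension method in the same style as the snippet above, not necessarily the exact committed code):

static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span)
{
    // Scan past leading whitespace.
    int start = 0;
    for (; start < span.Length; start++)
    {
        if (!char.IsWhiteSpace(span[start]))
            break;
    }

    // Scan past trailing whitespace, never crossing the start index.
    int end = span.Length - 1;
    for (; end >= start; end--)
    {
        if (!char.IsWhiteSpace(span[end]))
            break;
    }

    // A single Slice covering the retained middle portion.
    return span.Slice(start, end - start + 1);
}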
Before:
After:
cc @jkotas, @AndyAyersMS
@adamsitnik, can we do something about how string data shows up in the BDN output for GitHub so it's clearly visible, especially if it contains white space? Surrounding it with quotes might help (I manually formatted the output in the "Before" case above).
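One possible workaround until then, sketched with made-up type names rather than a built-in BenchmarkDotNet feature: wrap the string arguments in a small type whose ToString adds the quotes, so leading/trailing whitespace stays visible in the results table.

using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class TrimDisplayBenchmarks
{
    // Hypothetical wrapper: ToString surrounds the value with quotes so
    // whitespace-heavy inputs render visibly in the results table.
    public class QuotedString
    {
        public string Value;
        public QuotedString(string value) => Value = value;
        public override string ToString() => "\"" + Value + "\"";
    }

    public IEnumerable<QuotedString> Inputs()
    {
        yield return new QuotedString("  Hello  World  ");
        yield return new QuotedString(" \t NaN \t ");
    }

    [ParamsSource(nameof(Inputs))]
    public QuotedString Input;

    [Benchmark]
    public int TrimLength() => Input.Value.AsSpan().Trim().Length;
}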