-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Add optimized String.Join for IList<string> #44032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor nits, but logic overall LGTM.
@@ -629,6 +629,10 @@ public static string Join(string? separator, IEnumerable<string?> values) | |||
{ | |||
throw new ArgumentNullException(nameof(values)); | |||
} | |||
if (values is IList<string?> valuesIList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider instead checking IReadOnlyList<T>
. (List<T>
and T[]
both implement IReadOnlyList<T>
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the other thread I am now checking for both List and T[] so we can convert to ROS.
@@ -761,7 +777,7 @@ private static unsafe string JoinCore<T>(char* separator, int separatorLength, I | |||
} | |||
} | |||
|
|||
private static unsafe string JoinCore(char* separator, int separatorLength, string?[] value, int startIndex, int count) | |||
private static unsafe string JoinCore(char* separator, int separatorLength, IList<string?> value, int startIndex, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you benchmarked this? In theory this could regress existing usage with arrays, as now every access is via an interface dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pinged John offline with tips on using our performance repo's benchmarking suite. John - if you have any other questions re: benchmarking feel free to drop them in the conversation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does appear to be slower, do you see a way to maintain the performance without duplicating this method?
Method | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ArrayTest | new | 85.41 ns | 5.792 ns | 6.438 ns | 85.05 ns | 74.32 ns | 96.83 ns | 1.00 | 0.00 | 0.0185 | - | - | 80 B |
ArrayTest | old | 49.66 ns | 1.659 ns | 1.775 ns | 49.87 ns | 47.21 ns | 52.95 ns | 0.58 | 0.04 | 0.0185 | - | - | 80 B |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted to special-case string[]
and List<string>
we could convert both to ReadOnlySpan<string>
, which should have array-equivalent performance characteristics. That doesn't help other concrete types that implement IList<string>
, though. (Is this common?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like Levi's suggestion (and probably just for List<string>
, since I think it's unlikely you're in a situation where you're getting a string[]
strongly typed as an IEnumerable<string>
and care about this level of optimization). Is there another input that's inspiring you here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the ROS<string?>
solution would look something like this:
private static void JoinCore(..., ReadOnlySpan<string?> values, ...) { /* ... */ }
IEnumerable<string?> enumerable = ...;
if (enumerable is string?[] arr) { return JoinCore(..., arr, 0, arr.Length); }
else if (enumerable is List<string?> list) { return JoinCore(..., CollectionsMarshal.AsSpan(list), 0, list.Count); }
else { /* fall back to existing enum-based logic */ }
This takes advantage of the fact that T[]
is implicitly convertible to ROS<T>
, and List<T>
is convertible to ROS<T>
via the CollectionsMarshal.AsSpan
API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (enumerable is string?[] arr) { return JoinCore(..., arr, 0, arr.Length); }
Is this check even necessary? As noted, I think it's unlikely you're in a situation where you're getting a string[]
strongly typed as an IEnumerable<string>
and care about this level of optimization. If you have a strongly-typed string[]
, you'll bind to the existing string[]
-based overload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in my teams case we often have both arrays and lists passed as IList. I'm now realizing that this isn't as performant as having it kept as T[] all the way back. I agree that having a string[] typed as a IEnumerable is cases it's types as anything other than string[] there will be a significant perf improvement. I'm happy to remove the enumerable is string?[] arr
case, let me know what you'd like to do.
Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD |
---|---|---|---|---|---|---|---|---|---|---|
ArrayTest | Job-EOKGEP | old | 49.50 ns | 2.105 ns | 2.252 ns | 49.43 ns | 45.51 ns | 53.95 ns | 1.00 | 0.00 |
ArrayTest | Job-TTJIGM | new | 49.90 ns | 1.592 ns | 1.833 ns | 50.21 ns | 46.79 ns | 52.93 ns | 1.01 | 0.06 |
List | Job-EOKGEP | old | 125.01 ns | 4.675 ns | 5.196 ns | 123.35 ns | 118.42 ns | 137.19 ns | 1.00 | 0.00 |
List | Job-TTJIGM | new | 65.14 ns | 2.381 ns | 2.742 ns | 65.01 ns | 60.65 ns | 70.58 ns | 0.52 | 0.02 |
ArrayAsIList | Job-EOKGEP | old | 88.90 ns | 2.787 ns | 2.983 ns | 88.37 ns | 84.83 ns | 95.60 ns | 1.00 | 0.00 |
ArrayAsIList | Job-TTJIGM | new | 54.17 ns | 1.419 ns | 1.577 ns | 54.04 ns | 51.30 ns | 57.32 ns | 0.61 | 0.03 |
IEnumerable | Job-EOKGEP | old | 206.12 ns | 17.513 ns | 20.168 ns | 202.55 ns | 177.56 ns | 240.59 ns | 1.00 | 0.00 |
IEnumerable | Job-TTJIGM | new | 190.81 ns | 6.470 ns | 7.451 ns | 192.75 ns | 177.33 ns | 201.28 ns | 0.93 | 0.09 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These numbers look good to me. I don't really have any other feedback on the code. @stephentoub - did the latest iteration address your concerns?
(There's a little more cleanup we can do, such as nixing the startIndex and count parameters and removing some uses of unsafe, but that's best left for a followup PR.)
src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs
Outdated
Show resolved
Hide resolved
@@ -761,7 +784,7 @@ private static unsafe string JoinCore<T>(char* separator, int separatorLength, I | |||
} | |||
} | |||
|
|||
private static unsafe string JoinCore(char* separator, int separatorLength, string?[] value, int startIndex, int count) | |||
private static unsafe string JoinCore(char* separator, int separatorLength, ReadOnlySpan<string?> value, int startIndex, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this takes a span, are startIndex and count needed? Callers can slice if necessary. And this implementation will then be faster as it'll be able to iterate over the full size of the span and bounds checks will be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In public static unsafe string Join(char separator, string?[] value, int startIndex, int count)
would i do value.AsSpan(startIndex, count)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
And now that I look at the implementation more carefully to answer your question, I'm realizing the change still needs further work, anyway. Spans can never be null, so the change to the type of value
is removing a null check that should have been there for the public static unsafe string Join(string? separator, string?[] value, int startIndex, int count)
overload. By doing the AsSpan, we'd similarly need to move the argument validation for startIndex and count to the relevant entry point(s) that need the arguments validated.
(Seems then as well like we're missing a test that would have caught that null check?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move this and other public overloads to pass down the value.AsSpan, cleanup the JoinCore implementation, and benchmark again to verify there are no regressions. Do we want to move the argument validation of startIndex and count to the public overloads even though value.ToSpan(startIndex, count)
is doing the same validations (with less helpful error messages)?
Regarding the missing test there is already AssertExtensions.Throws<ArgumentNullException>("value", () => string.Join('|', (string[])null, 0, 0));
I'll RC why this wasn't failing.
I suppose I'll dissent a little bit. Per my earlier comment, I don't think this PR is the right place for the That said, the PR as it stands now looks fine, but if we want to address concerns like bounds check elision and removal of unsafe code we'll still need to submit a follow-up. |
I wasn't suggesting it for perf gains; bounds checking elisions is a bonus. Rather, than change is adding complexity, and it's the antithesis of span to pass one and also pass bounds. In switching to span, we shouldn't incur debt to be cleaned up later. It also highlighted that the change to use span was erroneous without moving additional argument validation, which is part of the refactoring to which you refer. |
Yes. Though I'd be fine if we did it as part of this change, too. It's all touching the same code; a split is a bit artificial. But either way. |
Thanks, @johnthcall. I'll follow-up on my comments. |
src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs
Outdated
Show resolved
Hide resolved
@@ -625,6 +642,16 @@ public static unsafe string Join<T>(string? separator, IEnumerable<T> values) | |||
|
|||
public static string Join(string? separator, IEnumerable<string?> values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we consider also doing these checks in the:
public static string Join(char separator, IEnumerable<T> values)
overload? I'm a little concerned now without that we've created a cliff. Imagine someone writes:
string result = string.Join(",", strings);
and then realizes they can reduce that to:
string result = string.Join(',', strings);
They just lost this optimization.
When joining IEnumerable check whether it is a IList and if so go through the path used for string[].
Fixes #44027