-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Improve Guid parsing performance #20183
Conversation
7ed133e
to
ee3bb66
Compare
@@ -22,17 +23,17 @@ public partial struct Guid : IFormattable, IComparable, IComparable<Guid>, IEqua | |||
//////////////////////////////////////////////////////////////////////////////// | |||
// Member variables | |||
//////////////////////////////////////////////////////////////////////////////// | |||
private int _a; // Do not rename (binary serialization) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change affect binary serialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, it does. I didn't think it did, but alas. Will fix...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
internal void Init(GuidParseThrowStyle canThrow) | ||
public GuidResult(GuidParseThrowStyle canThrow) : this() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The GuidResult constructor is public, but all other GuidResult methods are internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will fix.
|
||
internal void Init(GuidParseThrowStyle canThrow) | ||
public GuidResult(GuidParseThrowStyle canThrow) : this() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: this()
is unnecessary and unusual for structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative would be to initialize all of the other fields manually; would you prefer that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It is fine then. Ignore my comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Thanks.
@@ -1186,7 +1186,7 @@ internal static bool TryParseUInt64(ReadOnlySpan<char> value, NumberStyles style | |||
if ((styles & NumberStyles.AllowHexSpecifier) != 0) | |||
{ | |||
bool overflow = false; | |||
return TryParseUInt64HexNumberStyle(value, styles, info, out result, ref overflow); | |||
return TryParseUInt64HexNumberStyle(value, styles, out result, ref overflow); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: extra space
return false; | ||
} | ||
bytes[i] = (byte)number; | ||
guidByteRef = (byte)byteVal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a tiny bit faster/smaller to just do
Unsafe.Add(ref result._parsedGuid._d, i) = (byte)byteVal;
and not cache it in guidByteRef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will change. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed. I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange). However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice. For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so. Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.” Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.”
c2180fb
to
0574c3c
Compare
* Improve Guid parsing performance Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed. I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange). However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice. For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so. Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.” Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.” * Undo int->uint / short->ushort field changes * Address PR feedback Signed-off-by: dotnet-bot <[email protected]>
* Improve Guid parsing performance Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed. I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange). However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice. For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so. Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.” Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.” * Undo int->uint / short->ushort field changes * Address PR feedback Signed-off-by: dotnet-bot <[email protected]>
* Improve Guid parsing performance Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed. I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange). However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice. For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so. Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.” Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.” * Undo int->uint / short->ushort field changes * Address PR feedback Signed-off-by: dotnet-bot <[email protected]>
* Improve Guid parsing performance Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed. I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange). However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice. For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so. Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.” Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.” * Undo int->uint / short->ushort field changes * Address PR feedback Signed-off-by: dotnet-bot <[email protected]>
Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt helper that was used previously to parse each of the constituent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed. Deleted a bunch of code along the way.
I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange). However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice. For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so. Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.” Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.”
Benchmark:
Contributes to https://github.com/dotnet/corefx/issues/30612
cc: @jkotas, @ahsonkhan, @pjanotti, @joperezr, @Tornhoof