Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Improve Guid parsing performance #20183

Merged
merged 3 commits into from
Sep 29, 2018
Merged

Conversation

stephentoub
Copy link
Member

Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt helper that was used previously to parse each of the constituent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed. Deleted a bunch of code along the way.

I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange). However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice. For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so. Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.” Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.”

Benchmark:

Benchmark Before (ns) After (ns) Before / After
ParseExactB 207.1 125.1 1.66x
ParseExactD 213.3 126.1 1.69x
ParseExactN 334.9 167.9 1.99x
ParseExactP 216.3 124.8 1.73x
ParseExactX 521.1 335.3 1.55x
ParseB 207.6 120.4 1.72x
ParseD 206.2 118.4 1.74x
ParseN 325.1 150 2.17x
ParseP 204.3 118.1 1.73x
ParseX 522.5 341.8 1.53x
TryParseDInvalidHex 228.4 127.2 1.80x
TryParseXInvalidOverflow 28,883.20 188.7 153.06x
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Attributes.Jobs;
using BenchmarkDotNet.Running;

[InProcess]
public class Test
{
    [Benchmark] public static Guid ParseExactB() => Guid.ParseExact("{329cf900-b138-45a9-a027-353c52eb96d0}", "B");
    [Benchmark] public static Guid ParseExactD() => Guid.ParseExact("329cf900-b138-45a9-a027-353c52eb96d0", "D");
    [Benchmark] public static Guid ParseExactN() => Guid.ParseExact("329cf900b13845a9a027353c52eb96d0", "N");
    [Benchmark] public static Guid ParseExactP() => Guid.ParseExact("(329cf900-b138-45a9-a027-353c52eb96d0)", "P");
    [Benchmark] public static Guid ParseExactX() => Guid.ParseExact("{0x329cf900,0xb138,0x45a9,{0xa0,0x27,0x35,0x3c,0x52,0xeb,0x96,0xd0}}", "X");

    [Benchmark] public static Guid ParseB() => Guid.Parse("{329cf900-b138-45a9-a027-353c52eb96d0}");
    [Benchmark] public static Guid ParseD() => Guid.Parse("329cf900-b138-45a9-a027-353c52eb96d0");
    [Benchmark] public static Guid ParseN() => Guid.Parse("329cf900b13845a9a027353c52eb96d0");
    [Benchmark] public static Guid ParseP() => Guid.Parse("(329cf900-b138-45a9-a027-353c52eb96d0)");
    [Benchmark] public static Guid ParseX() => Guid.Parse("{0x329cf900,0xb138,0x45a9,{0xa0,0x27,0x35,0x3c,0x52,0xeb,0x96,0xd0}}");

    [Benchmark] public static bool TryParseDInvalidHex() => Guid.TryParse("329cf900-b138-45a9-a027-353c52eb96dZ", out Guid g);
    [Benchmark] public static bool TryParseXInvalidOverflow() => Guid.TryParse("{0x329cf9001,0xb138,0x45a9,{0xa0,0x27,0x35,0x3c,0x52,0xeb,0x96,0xd0}}", out Guid g);

    public static void Main() => BenchmarkRunner.Run<Test>();
}

Contributes to https://github.com/dotnet/corefx/issues/30612
cc: @jkotas, @ahsonkhan, @pjanotti, @joperezr, @Tornhoof

@@ -22,17 +23,17 @@ public partial struct Guid : IFormattable, IComparable, IComparable<Guid>, IEqua
////////////////////////////////////////////////////////////////////////////////
// Member variables
////////////////////////////////////////////////////////////////////////////////
private int _a; // Do not rename (binary serialization)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change affect binary serialization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, it does. I didn't think it did, but alas. Will fix...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


internal void Init(GuidParseThrowStyle canThrow)
public GuidResult(GuidParseThrowStyle canThrow) : this()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The GuidResult constructor is public, but all other GuidResult methods are internal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will fix.


internal void Init(GuidParseThrowStyle canThrow)
public GuidResult(GuidParseThrowStyle canThrow) : this()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

: this() is unnecessary and unusual for structs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative would be to initialize all of the other fields manually; would you prefer that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. It is fine then. Ignore my comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Thanks.

@@ -1186,7 +1186,7 @@ internal static bool TryParseUInt64(ReadOnlySpan<char> value, NumberStyles style
if ((styles & NumberStyles.AllowHexSpecifier) != 0)
{
bool overflow = false;
return TryParseUInt64HexNumberStyle(value, styles, info, out result, ref overflow);
return TryParseUInt64HexNumberStyle(value, styles, out result, ref overflow);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra space

return false;
}
bytes[i] = (byte)number;
guidByteRef = (byte)byteVal;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a tiny bit faster/smaller to just do
Unsafe.Add(ref result._parsedGuid._d, i) = (byte)byteVal; and not cache it in guidByteRef.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will change. Thanks.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed.

I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange).  However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice.  For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so.  Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.”  Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.”
@stephentoub stephentoub merged commit c04fee1 into dotnet:master Sep 29, 2018
@stephentoub stephentoub deleted the guidparse branch September 29, 2018 23:12
dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corefx that referenced this pull request Sep 29, 2018
* Improve Guid parsing performance

Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed.

I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange).  However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice.  For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so.  Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.”  Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.”

* Undo int->uint / short->ushort field changes

* Address PR feedback

Signed-off-by: dotnet-bot <[email protected]>
dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corert that referenced this pull request Sep 29, 2018
* Improve Guid parsing performance

Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed.

I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange).  However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice.  For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so.  Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.”  Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.”

* Undo int->uint / short->ushort field changes

* Address PR feedback

Signed-off-by: dotnet-bot <[email protected]>
stephentoub added a commit to dotnet/corefx that referenced this pull request Oct 1, 2018
* Improve Guid parsing performance

Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed.

I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange).  However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice.  For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so.  Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.”  Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.”

* Undo int->uint / short->ushort field changes

* Address PR feedback

Signed-off-by: dotnet-bot <[email protected]>
jkotas pushed a commit to dotnet/corert that referenced this pull request Oct 3, 2018
* Improve Guid parsing performance

Significantly improves the performance of parsing all Guid number styles, primarily by avoiding using the StringToInt core helper that was used previously to parse each of the consistuent components, as well as avoiding some unnecessary searches of the strings in order to determine which format is employed.

I kept strong compatibility with the existing implementation, down to what exceptions are thrown when (even when they’re a bit strange).  However, there are a few cases where the error messages in those exceptions differ from what they previously were, due to ambiguities, and IMO it not being worth making the implementation slower to try to maintain the exact same choice.  For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable, because it’s missing a comma between the second and third components, and with whitespace removed the parser will try to parse “0xdddd0xdddd” and fail to do so.  Previously that would result in an error message “Additional non-parsable characters are at the end of the string”, and now it’ll result in an error message “Guid string should only contain hexadecimal characters.”  Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” would previously fail with “Unrecognized Guid format”, and now it’ll also fail with “Guid string should only contain hexadecimal characters.”

* Undo int->uint / short->ushort field changes

* Address PR feedback

Signed-off-by: dotnet-bot <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants