Improve Guid parsing performance #20183

stephentoub · 2018-09-28T17:07:23Z

Significantly improves the performance of parsing all Guid formats, primarily by no longer using the StringToInt helper to parse each of the constituent components, and by avoiding some unnecessary searches of the string to determine which format is in use. Deleted a bunch of code along the way.

I kept strong compatibility with the existing implementation, down to which exceptions are thrown and when (even where that behavior is a bit strange). However, in a few cases the exception messages now differ from what they were before. These cases are ambiguous, and IMO it isn't worth making the implementation slower just to preserve the exact same choice of message. For example, the string “{0xdddddddd, 0xdddd 0xdddd,{0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd,0xdd}}” isn’t parsable because it’s missing a comma between the second and third components; with whitespace removed, the parser tries to parse “0xdddd0xdddd” and fails. Previously that resulted in the error message “Additional non-parsable characters are at the end of the string”; now it results in “Guid string should only contain hexadecimal characters.” Similarly, “(-ddddddd-dddd-dddd-dddd-dddddddddddd)” previously failed with “Unrecognized Guid format”, and now it also fails with “Guid string should only contain hexadecimal characters.”
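As a rough illustration of the main idea above (decoding each component's hex digits in place rather than handing each substring to a general-purpose helper such as StringToInt), here is a minimal sketch for the “D” format only. GuidDParseSketch, TryParseD, and TryHex are hypothetical names, not the actual CoreLib code; the sketch skips whitespace trimming and the other formats, and it returns false instead of reproducing the real exception messages.

```csharp
using System;

public static class GuidDParseSketch
{
    // Hypothetical helper: parses exactly "dddddddd-dddd-dddd-dddd-dddddddddddd".
    // Separator positions are fixed, so the string is never searched for hyphens,
    // and each component's digits are decoded inline rather than via a general
    // number-parsing routine.
    public static bool TryParseD(ReadOnlySpan<char> s, out Guid result)
    {
        result = default;
        if (s.Length != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-')
            return false;

        if (!TryHex(s.Slice(0, 8), out ulong a) ||
            !TryHex(s.Slice(9, 4), out ulong b) ||
            !TryHex(s.Slice(14, 4), out ulong c) ||
            !TryHex(s.Slice(19, 4), out ulong de) ||
            !TryHex(s.Slice(24, 12), out ulong rest))
            return false;

        // Guid(uint, ushort, ushort, byte d, ..., byte k): the 4th group supplies
        // d and e, the final 12 digits supply f through k, most significant first.
        result = new Guid((uint)a, (ushort)b, (ushort)c,
            (byte)(de >> 8), (byte)de,
            (byte)(rest >> 40), (byte)(rest >> 32), (byte)(rest >> 24),
            (byte)(rest >> 16), (byte)(rest >> 8), (byte)rest);
        return true;
    }

    // Accumulates up to 16 hex digits into a ulong; rejects any non-hex character.
    private static bool TryHex(ReadOnlySpan<char> digits, out ulong value)
    {
        value = 0;
        foreach (char ch in digits)
        {
            int d = ch >= '0' && ch <= '9' ? ch - '0'
                  : ch >= 'a' && ch <= 'f' ? ch - 'a' + 10
                  : ch >= 'A' && ch <= 'F' ? ch - 'A' + 10
                  : -1;
            if (d < 0) return false;
            value = (value << 4) | (uint)d;
        }
        return true;
    }
}
```

Because every component sits at a known offset in “D”, the only per-character work is the hex-digit check, which is the kind of overhead reduction the description is after; the real implementation handles all five formats and their error reporting.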

Benchmark:

Benchmark                 Before (ns)   After (ns)   Before / After
ParseExactB                     207.1        125.1            1.66x
ParseExactD                     213.3        126.1            1.69x
ParseExactN                     334.9        167.9            1.99x
ParseExactP                     216.3        124.8            1.73x
ParseExactX                     521.1        335.3            1.55x
ParseB                          207.6        120.4            1.72x
ParseD                          206.2        118.4            1.74x
ParseN                          325.1        150.0            2.17x
ParseP                          204.3        118.1            1.73x
ParseX                          522.5        341.8            1.53x
TryParseDInvalidHex             228.4        127.2            1.80x
TryParseXInvalidOverflow    28,883.20        188.7          153.06x

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[InProcess]
public class Test
{
    // Guid.ParseExact with each explicit format specifier ("B", "D", "N", "P", "X").
    [Benchmark] public static Guid ParseExactB() => Guid.ParseExact("{329cf900-b138-45a9-a027-353c52eb96d0}", "B");
    [Benchmark] public static Guid ParseExactD() => Guid.ParseExact("329cf900-b138-45a9-a027-353c52eb96d0", "D");
    [Benchmark] public static Guid ParseExactN() => Guid.ParseExact("329cf900b13845a9a027353c52eb96d0", "N");
    [Benchmark] public static Guid ParseExactP() => Guid.ParseExact("(329cf900-b138-45a9-a027-353c52eb96d0)", "P");
    [Benchmark] public static Guid ParseExactX() => Guid.ParseExact("{0x329cf900,0xb138,0x45a9,{0xa0,0x27,0x35,0x3c,0x52,0xeb,0x96,0xd0}}", "X");

    // Guid.Parse, which must also determine which format the input uses.
    [Benchmark] public static Guid ParseB() => Guid.Parse("{329cf900-b138-45a9-a027-353c52eb96d0}");
    [Benchmark] public static Guid ParseD() => Guid.Parse("329cf900-b138-45a9-a027-353c52eb96d0");
    [Benchmark] public static Guid ParseN() => Guid.Parse("329cf900b13845a9a027353c52eb96d0");
    [Benchmark] public static Guid ParseP() => Guid.Parse("(329cf900-b138-45a9-a027-353c52eb96d0)");
    [Benchmark] public static Guid ParseX() => Guid.Parse("{0x329cf900,0xb138,0x45a9,{0xa0,0x27,0x35,0x3c,0x52,0xeb,0x96,0xd0}}");

    // Failure paths: an invalid hex character, and a first "X" component too large for 32 bits.
    [Benchmark] public static bool TryParseDInvalidHex() => Guid.TryParse("329cf900-b138-45a9-a027-353c52eb96dZ", out _);
    [Benchmark] public static bool TryParseXInvalidOverflow() => Guid.TryParse("{0x329cf9001,0xb138,0x45a9,{0xa0,0x27,0x35,0x3c,0x52,0xeb,0x96,0xd0}}", out _);

    public static void Main() => BenchmarkRunner.Run<Test>();
}

Contributes to https://github.com/dotnet/corefx/issues/30612
cc: @jkotas, @ahsonkhan, @pjanotti, @joperezr, @Tornhoof

jkotas · 2018-09-28T17:51:47Z

src/System.Private.CoreLib/shared/System/Guid.cs

@@ -22,17 +23,17 @@ public partial struct Guid : IFormattable, IComparable, IComparable<Guid>, IEqua
        ////////////////////////////////////////////////////////////////////////////////
        //  Member variables
        ////////////////////////////////////////////////////////////////////////////////
-        private int _a; // Do not rename (binary serialization)


Does this change affect binary serialization?

Ugh, it does. I didn't think it did, but alas. Will fix...

jkotas · 2018-09-29T14:59:28Z

src/System.Private.CoreLib/shared/System/Guid.cs


-            internal void Init(GuidParseThrowStyle canThrow)
+            public GuidResult(GuidParseThrowStyle canThrow) : this()


Nit: The GuidResult constructor is public, but all other GuidResult methods are internal.

Ok, will fix.

jkotas · 2018-09-29T14:59:55Z

src/System.Private.CoreLib/shared/System/Guid.cs


-            internal void Init(GuidParseThrowStyle canThrow)
+            public GuidResult(GuidParseThrowStyle canThrow) : this()


: this() is unnecessary and unusual for structs.

The alternative would be to initialize all of the other fields manually; would you prefer that?

I see. It is fine then. Ignore my comment.

Ok. Thanks.
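For readers unfamiliar with the : this() idiom discussed above: in C# versions before 11, a struct constructor must definitely assign every field before it returns (otherwise error CS0171). Chaining to : this() zero-initializes all fields first, so the body only needs to set the ones it cares about. A minimal sketch with a hypothetical struct; ResultSketch and ThrowStyle are illustrative stand-ins, not the real GuidResult and GuidParseThrowStyle.

```csharp
using System;

// Hypothetical stand-in for GuidParseThrowStyle.
public enum ThrowStyle { None, All }

// Hypothetical stand-in for GuidResult.
public struct ResultSketch
{
    private readonly ThrowStyle _throwStyle;
    private int _a;
    private short _b;
    private string _errorMessage;

    // ": this()" zeroes _a, _b and _errorMessage before the body runs, so only
    // _throwStyle is assigned explicitly. Without the chain, pre-C# 11 compilers
    // report CS0171 for each field left unassigned.
    internal ResultSketch(ThrowStyle throwStyle) : this()
    {
        _throwStyle = throwStyle;
    }

    internal ThrowStyle Style => _throwStyle;
    internal int A => _a;
    internal short B => _b;
    internal bool HasError => _errorMessage != null;

    internal void SetParsed(int a, short b) { _a = a; _b = b; }
    internal void SetFailure(string message) => _errorMessage = message;
}
```

The alternative mentioned in the thread is to drop : this() and assign every field by hand in the constructor body; both produce the same initial state.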

jkotas · 2018-09-29T15:03:48Z

src/System.Private.CoreLib/shared/System/Number.Parsing.cs

@@ -1186,7 +1186,7 @@ internal static bool TryParseUInt64(ReadOnlySpan<char> value, NumberStyles style
            if ((styles & NumberStyles.AllowHexSpecifier) != 0)
            {
                bool overflow = false;
-                return TryParseUInt64HexNumberStyle(value, styles, info, out result, ref overflow);
+                return TryParseUInt64HexNumberStyle(value, styles,  out result, ref overflow);


Nit: extra space

jkotas · 2018-09-29T15:16:34Z

src/System.Private.CoreLib/shared/System/Guid.cs

-                    return false;
-                }
-                bytes[i] = (byte)number;
+                guidByteRef = (byte)byteVal;


It should be a tiny bit faster/smaller to just do
Unsafe.Add(ref result._parsedGuid._d, i) = (byte)byteVal; and not cache it in guidByteRef.

Ok, will change. Thanks.
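The reviewer's suggestion relies on the eight trailing byte fields of Guid (_d through _k) being laid out contiguously, so Unsafe.Add can step from a ref to the first field to the i-th one. A minimal sketch of that pattern, assuming a hypothetical EightBytes struct with sequential layout and a hypothetical TryFill helper (this is not the actual Guid parsing code), and a runtime where System.Runtime.CompilerServices.Unsafe is available in-box (.NET 5 or later):

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the trailing byte fields of Guid (_d.._k).
[StructLayout(LayoutKind.Sequential)]
public struct EightBytes
{
    public byte _d, _e, _f, _g, _h, _i, _j, _k;
}

public static class UnsafeAddSketch
{
    // Decodes 16 hex digits into the eight fields, writing field i directly
    // through Unsafe.Add(ref dest._d, i) instead of caching a ref in a local.
    // On failure, fields before the bad digit may already have been written.
    public static bool TryFill(ReadOnlySpan<char> hex, ref EightBytes dest)
    {
        if (hex.Length != 16) return false;
        for (int i = 0; i < 8; i++)
        {
            int hi = HexValue(hex[2 * i]);
            int lo = HexValue(hex[2 * i + 1]);
            if ((hi | lo) < 0) return false; // either digit was invalid (-1)
            Unsafe.Add(ref dest._d, i) = (byte)((hi << 4) | lo);
        }
        return true;
    }

    private static int HexValue(char c) =>
        c >= '0' && c <= '9' ? c - '0' :
        c >= 'a' && c <= 'f' ? c - 'a' + 10 :
        c >= 'A' && c <= 'F' ? c - 'A' + 10 : -1;
}
```

Whether skipping the cached ref is actually smaller or faster depends on the JIT; the sketch only shows the shape of the suggested write.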

jkotas

Nice!

stephentoub added the tenet-performance and area-System.Runtime labels Sep 28, 2018

stephentoub added this to the 3.0 milestone Sep 28, 2018

stephentoub force-pushed the guidparse branch from 7ed133e to ee3bb66 September 28, 2018 17:34

jkotas reviewed Sep 28, 2018

jkotas reviewed Sep 29, 2018

jkotas approved these changes Sep 29, 2018

stephentoub added 3 commits September 29, 2018 16:56

Undo int->uint / short->ushort field changes (20d0fb0)

Address PR feedback (0574c3c)

stephentoub force-pushed the guidparse branch from c2180fb to 0574c3c September 29, 2018 21:01

stephentoub merged commit c04fee1 into dotnet:master Sep 29, 2018

stephentoub deleted the guidparse branch September 29, 2018 23:12

stephentoub mentioned this pull request Nov 25, 2018: Remove dead Guid parsing code #21123 (Merged)