Improve performance of WebUtility.Decode for non-escaped strings #7671
Conversation
- Cuts allocations down to 1/3 of the original
- Doubles performance (in time)
- We still allocate a little bit, as removing all allocations would harm the performance of strings that actually need escaping

Fixes #6542
@@ -610,6 +612,9 @@ private static bool StringRequiresHtmlDecoding(string s)
        // Internal struct to facilitate URL decoding -- keeps char buffer and byte buffer, allows appending of either chars or bytes
        private struct UrlDecoder
        {
            public bool _containsUnsafe;
            public bool _containsSpaces;
Why store these as fields on the UrlDecoder? Seems like they could just be a single local `needsDecoding`, and then instead of doing `return helper.GetString();`, you'd do `return needsDecoding ? helper.GetString() : value;`.
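The fast-path idea being suggested here can be sketched as follows. This is a minimal Python illustration of the concept only, not the actual corefx C# code: scan the input once, and if nothing needs decoding, return the original string instead of building a new one.

```python
def url_decode(value: str) -> str:
    """Illustrative sketch of the no-escaping fast path (not the corefx code).

    Handles only single-byte %XX escapes and '+'; error handling for
    malformed sequences and multi-byte UTF-8 escapes is omitted for brevity.
    """
    # Fast path: nothing to decode, so hand back the input itself
    # and allocate nothing.
    if '%' not in value and '+' not in value:
        return value

    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == '+':
            out.append(' ')          # '+' decodes to a space
            i += 1
        elif ch == '%' and i + 2 < len(value):
            out.append(chr(int(value[i + 1:i + 3], 16)))  # %XX escape
            i += 3
        else:
            out.append(ch)
            i += 1
    return ''.join(out)
```

Note that in the fast-path case the function returns the very same string instance it was given, which is where the allocation savings come from.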
I agree with moving them out of UrlDecoder.
I would like to keep the fields `containsUnsafe` and `containsSpaces` as separate variables, as I'm working on a PR that's going to heavily optimize `UrlEncode` and `UrlDecode` for strings that only need space encoding/decoding. Is that OK?
I'd prefer it if you kept them as a single local now. If a subsequent change needs them separated, then that change can do it. Otherwise, if for example that change never materializes or gets merged, we're left with debt.
Sure thing!
     _bufferSize = bufferSize;
     _encoding = encoding;

-    _charBuffer = new char[bufferSize];
+    _charBuffer = null; // char buffer created on demand
Is this really useful? Empty string is already special cased to avoid decoding. So isn't this only valuable in the case where all of the inputs are bytes rather than chars, and to enable that you're then doing an extra null check branch on every AddChar? And even then all of the bytes except for those at the end will end up getting dumped into the char[], forcing it to be allocated. Doesn't seem like a good tradeoff.
Well, I found that this halved allocations for the `UrlDecode("noescaping")` case, as it seems that ASCII chars are added to the decoder via `AddByte`, not `AddChar`. I found this behaviour weird, but lazily instantiating `_charBuffer` does help in the no-escaping case.
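The lazy-allocation pattern under discussion can be sketched like this. This is a hypothetical Python illustration of the trade-off, not the corefx C# struct: the buffer is only created on first use, at the cost of an extra null check on every append.

```python
class LazyCharBuffer:
    """Illustrative sketch (not the corefx UrlDecoder): the char buffer is
    created on demand, so inputs whose content arrives purely through the
    byte path never pay for the allocation."""

    def __init__(self, buffer_size: int):
        self._buffer_size = buffer_size
        self._char_buffer = None   # char buffer created on demand
        self._num_chars = 0

    def add_char(self, ch: str) -> None:
        # The cost of laziness: one extra branch on every add_char.
        if self._char_buffer is None:
            self._char_buffer = [''] * self._buffer_size
        self._char_buffer[self._num_chars] = ch
        self._num_chars += 1

    @property
    def allocated(self) -> bool:
        return self._char_buffer is not None
```

This matches the trade-off Stephen raises above: the branch runs on every append, so laziness only wins when the char path is rarely or never taken.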
Just a follow-up: it's due to this line of code.
I don't really understand why "// 7 bit have to go as bytes because of Unicode", but hey.
I see, ok. (Seems strange then that we're lazily allocating the byte[] array; doing so suggests we don't expect any ASCII chars in common input, but let's not change it.)
Understood - obviously I'm not planning to change any of that behaviour in this PR, but do you have a theory as to why we add ASCII chars as bytes, not chars?
Right, but they allocate the array only when needed.
No. In the code before this PR, the char[] is always allocated (unless the string is empty).
Right. Looking more, it appears that ASCII characters need to go through the encoder, but if a character is not ASCII it is stored as-is without encoding. So I believe what they are doing is collecting ASCII characters in the array, and every time it needs to be flushed, running the encoder just once on the array before the flush. So it is just an optimization to avoid calling the encoder for every ASCII character.
> so it is just an optimization to avoid calling the encoder for every ASCII character

But doesn't the implementation check whether there are any bytes in the buffer, and skip the flush if there are none? So that means each ASCII character wouldn't flush any bytes; only the first char after a run of ASCII chars would.
UrlDecoder collects the ASCII bytes until either `AddChar` or `GetString` gets called; at that point it flushes the bytes after running them through the encoder, then starts collecting newly encountered bytes again, and so on.
Imagine you have a string like `aaaaaNaaaaNaaaaNaaa`, where `a` is an ASCII character and `N` is not ASCII: the byte array gets filled 4 times and encoded 4 times. If you didn't collect the ASCII bytes in a byte array as the code does, you'd need to call the encoder once for every 'a' character in the string.
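The batching behaviour described above can be sketched as follows. This is a hypothetical Python illustration of the run-flushing idea (not the corefx UrlDecoder), using `é` as the stand-in for the non-ASCII `N` and counting how often the encoding step runs.

```python
def decode_runs(s: str):
    """Illustrative sketch: buffer consecutive ASCII bytes and run the
    encoding step once per run, instead of once per byte. Returns the
    reassembled string and the number of flushes performed."""
    out = []
    byte_run = bytearray()
    flushes = 0
    for ch in s:
        if ord(ch) < 128:
            byte_run.append(ord(ch))      # ASCII: just buffer the byte
        else:
            if byte_run:
                # one encoder call covers the whole pending run
                out.append(byte_run.decode('ascii'))
                byte_run.clear()
                flushes += 1
            out.append(ch)                # non-ASCII: stored as-is
    if byte_run:                          # final flush (GetString analogue)
        out.append(byte_run.decode('ascii'))
        flushes += 1
    return ''.join(out), flushes
```

For the `aaaaaNaaaaNaaaaNaaa` shape from the comment, this performs 4 flushes rather than one encoder call per `a`.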
Understood, thanks.
Test Innerloop CentOS7.1 Release Build and Test
@dotnet-bot test code coverage please
@stephentoub is this new/can non-Microsoft people use that code coverage feature?
nope
yup
That's cool, I've never noticed that before. Is the idea that it fails if there is an overall decrease in coverage or does it just publish a report? (I guess we'll find out!)
It doesn't perform any comparison, but it archives the report so you can click through to view it. |
{
    // No decoding needed
    return value;
}
If it were me, I'd have written this like:
return needsDecoding ? helper.GetString() : value;
Thanks, I agree it's a good idea to add some more tests to cover all the branches I've changed. I also added some more tests for invalid percent encoding, lowercase hex, space escaping and non-ASCII chars.
LGTM
LGTM. |
@dotnet-bot Test Innerloop Windows_NT Debug Build and Test
@dotnet-bot Test Innerloop CentOS7.1 Release Build and Test
Glad to see this merged. Should I send over a similar PR to coreclr, or will this be automatically handled by the netfx-port-consider label?
Improve performance of WebUtility.Decode for non-escaped strings Commit migrated from dotnet/corefx@be2360b
Benchmark - escaping needed
Benchmark - no escaping needed
Benchmark Code
Click here
/cc @stephentoub @jamesqo @davidsh
Fixes #6542 together with #7546