-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Path
/OsStr
incorrectly handle Formatter
flags
#136617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@jieyouxu I notice you added a O-windows tag. Although this affects WTF-8 for Windows, it also affects the non-Windows version which has a more severe bug. |
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 7, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8 version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an unpaired surrogate. Implement its Display with the same logic as that of `str`. Fixes rust-lang#136617 for Windows.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 7, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems (the byte-oriented version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an invalid UTF-8 sequence. Implement its Display with the same logic as that of `str`. Additionally, it forwards flags when formatting the last chunk of valid UTF-8, so it erroneously inserts padding within the string when it has invalid UTF-8. Fix this. Fixes rust-lang#136617 for non-Windows systems.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 7, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems (the byte-oriented version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an invalid UTF-8 sequence. Implement its Display with the same logic as that of `str`. Additionally, it forwards flags when formatting the last chunk of valid UTF-8, so it erroneously inserts padding within the string when it has invalid UTF-8. Fix this. Fixes rust-lang#136617 for non-Windows systems.
@rustbot claim |
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 16, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8 version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an unpaired surrogate. Implement its Display with the same logic as that of `str`. Fixes rust-lang#136617 for Windows.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 16, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems (the byte-oriented version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an invalid UTF-8 sequence. Implement its Display with the same logic as that of `str`. Additionally, it forwards flags when formatting the last chunk of valid UTF-8, so it erroneously inserts padding within the string when it has invalid UTF-8. Fix this. Fixes rust-lang#136617 for non-Windows systems.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 17, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8 version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an unpaired surrogate. Implement its Display with the same logic as that of `str`. Fixes rust-lang#136617 for Windows.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 17, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems (the byte-oriented version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an invalid UTF-8 sequence. Implement its Display with the same logic as that of `str`. Additionally, it forwards flags when formatting the last chunk of valid UTF-8, so it erroneously inserts padding within the string when it has invalid UTF-8. Fix this. Fixes rust-lang#136617 for non-Windows systems.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 20, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8 version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an unpaired surrogate. Implement its Display with the same logic as that of `str`. Fixes rust-lang#136617 for Windows.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 20, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems (the byte-oriented version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an invalid UTF-8 sequence. Implement its Display with the same logic as that of `str`. Additionally, it forwards flags when formatting the last chunk of valid UTF-8, so it erroneously inserts padding within the string when it has invalid UTF-8. Fix this. Fixes rust-lang#136617 for non-Windows systems.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 26, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8 version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an unpaired surrogate. Implement its Display with the same logic as that of `str`. Fixes rust-lang#136617 for Windows.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Feb 26, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems (the byte-oriented version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an invalid UTF-8 sequence. Implement its Display with the same logic as that of `str`. Additionally, it forwards flags when formatting the last chunk of valid UTF-8, so it erroneously inserts padding within the string when it has invalid UTF-8. Fix this. Fixes rust-lang#136617 for non-Windows systems.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Mar 12, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8 version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an unpaired surrogate. Implement its Display with the same logic as that of `str`. Fixes rust-lang#136617 for Windows.
thaliaarchi
added a commit
to thaliaarchi/rust
that referenced
this issue
Mar 12, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems (the byte-oriented version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an invalid UTF-8 sequence. Implement its Display with the same logic as that of `str`. Additionally, it forwards flags when formatting the last chunk of valid UTF-8, so it erroneously inserts padding within the string when it has invalid UTF-8. Fix this. Fixes rust-lang#136617 for non-Windows systems.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
The
Display
implementations forPath
,PathBuf
,OsStr
, andOsString
incorrectly handleFormatter<'_>
flags. This is visible on stable (for paths) or with#![feature(os_str_display)]
(for OS strings).Background
On Windows, these types convert from wide strings (
u16
) to WTF-8 and the only invalid sequences are unpaired surrogate halves. On other systems, these types use conventional UTF-8 and may have any kind of invalid sequence. TheDisplay
implementations replace such invalid sequences with the replacement character U+FFFD. This is done by iterating over valid chunks, where each is followed by an invalid sequence.Bugs
In the Windows implementation,
Wtf8
, when the full string is valid UTF-8, it forwards the formatter args by callingDisplay::fmt(s, f)
, so it is padded as requested in most cases. However, when it is invalid, the formatter args are ignored. In the non-Windows implementation,Bytes
, the last valid chunk is formatted withDisplay::fmt(last_chunk, f)
, but all other chunks withf.write_str(s)
. This is not only inconsistent, but it can display padding inside the string data.This illustrates the buggy
Display
implementations in (AFAIK) all the APIs they're exposed in:History
Utf8Chunks
-style iterationString::from_utf8_lossy
Fix
We cannot simply ignore the formatter args, because that would break backwards compatibility, as the common case of fully valid UTF-8 respects padding, matching
str
andString
.The simplest fix would be to bring the behavior of non-Windows in line with Windows, i.e., fix the erroneous padding for the last chunk by only forwarding formatter args if the full string is valid.
A simple fix would be to format
self.to_string_lossy()
instead of doing a manual iteration. However, this incurs an allocation when invalid.I propose doing the formatting by scanning in two passes. The first pass counts the number of Unicode scalars to render. Since there's no escaping, this is relatively inexpensive. Then the left padding is written. Then the second pass writes the string. Finally, the right padding is written. This could be abstracted by making a method on the formatter, which takes a length in Unicode scalars and a callback which writes the data: It would pad left according to the length, write via the callback, then pad right.
I would like to work on fixing this, pending discussion on which direction to take it :).
Related?
I suspect that other types could have similar issues with inconsistent or misplaced formatter flag handling, but have only surveyed those detailed here.
The text was updated successfully, but these errors were encountered: