Skip to content

Path/OsStr incorrectly handle Formatter flags #136617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
thaliaarchi opened this issue Feb 6, 2025 · 2 comments · May be fixed by #136677
Open

Path/OsStr incorrectly handle Formatter flags #136617

thaliaarchi opened this issue Feb 6, 2025 · 2 comments · May be fixed by #136677
Assignees
Labels
A-fmt Area: `core::fmt` C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@thaliaarchi
Copy link
Contributor

thaliaarchi commented Feb 6, 2025

The Display implementations for Path, PathBuf, OsStr, and OsString incorrectly handle Formatter<'_> flags. This is visible on stable (for paths) or with #![feature(os_str_display)] (for OS strings).

Background

On Windows, these types convert from wide strings (u16) to WTF-8 and the only invalid sequences are unpaired surrogate halves. On other systems, these types use conventional UTF-8 and may have any kind of invalid sequence. The Display implementations replace such invalid sequences with the replacement character U+FFFD. This is done by iterating over valid chunks, where each is followed by an invalid sequence.

Bugs

In the Windows implementation, Wtf8, when the full string is valid UTF-8, it forwards the formatter args by calling Display::fmt(s, f), so it is padded as requested in most cases. However, when it is invalid, the formatter args are ignored. In the non-Windows implementation, Bytes, the last valid chunk is formatted with Display::fmt(last_chunk, f), but all other chunks with f.write_str(s). This is not only inconsistent, but it can display padding inside the string data.

This illustrates the buggy Display implementations in (AFAIK) all the APIs they're exposed in:

#![feature(os_str_display)]
use std::{ffi::OsString, path::PathBuf};

#[cfg(unix)]
{
    use std::os::unix::ffi::OsStringExt;
    display(OsString::from_vec(b"b\xFFd".to_vec()), "ab�    d     e"); // BUG
    display(OsString::from_vec(b"bcd".to_vec()), "a   bcd    e");
}
#[cfg(windows)]
{
    use std::os::windows::ffi::OsStringExt;
    macro_rules! wtf8[($($c:literal),*) => { &[$($c as u16),*] }];
    display(OsString::from_wide(wtf8![b'b', 0xD800, b'd']), "ab�de"); // BUG
    display(OsString::from_wide(wtf8![b'b', b'c', b'd']), "a   bcd    e");
}
assert_eq!(format!("a{:^10}e", "bcd"), "a   bcd    e");

fn display(s: OsString, expect: &str) {
    let p = PathBuf::from(s.clone());
    assert_eq!(format!("a{:^10}e", s.display()), expect);
    assert_eq!(format!("a{:^10}e", s.as_os_str().display()), expect);
    assert_eq!(format!("a{:^10}e", p.display()), expect);
    assert_eq!(format!("a{:^10}e", p.as_path().display()), expect);
}

History

  • 742ca0caf25 (std: Respect formatting flags for str-like OsStr, 2017-08-12): Forward formatter flags for fully valid strings
  • ac96fd77874 (Avoid allocations in Debug for os_str, 2017-06-12): Display with Utf8Chunks-style iteration
  • 45ddf50cebd (Add new path module, 2015-01-29): Display with String::from_utf8_lossy

Fix

We cannot simply ignore the formatter args, because that would break backwards compatibility, as the common case of fully valid UTF-8 respects padding, matching str and String.

The simplest fix would be to bring the behavior of non-Windows in line with Windows, i.e., fix the erroneous padding for the last chunk by only forwarding formatter args if the full string is valid.

A simple fix would be to format self.to_string_lossy() instead of doing a manual iteration. However, this incurs an allocation when invalid.

I propose doing the formatting by scanning in two passes. The first pass counts the number of Unicode scalars to render. Since there's no escaping, this is relatively inexpensive. Then the left padding is written. Then the second pass writes the string. Finally, the right padding is written. This could be abstracted by making a method on the formatter, which takes a length in Unicode scalars and a callback which writes the data: It would pad left according to the length, write via the callback, then pad right.

I would like to work on fixing this, pending discussion on which direction to take it :).

Related?

I suspect that other types could have similar issues with inconsistent or misplaced formatter flag handling, but have only surveyed those detailed here.

@thaliaarchi thaliaarchi added the C-bug Category: This is a bug. label Feb 6, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 6, 2025
@jieyouxu jieyouxu added T-libs Relevant to the library team, which will review and decide on the PR/issue. O-windows Operating system: Windows and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Feb 6, 2025
@thaliaarchi
Copy link
Contributor Author

@jieyouxu I notice you added a O-windows tag. Although this affects WTF-8 for Windows, it also affects the non-Windows version which has a more severe bug.

@jieyouxu jieyouxu removed the O-windows Operating system: Windows label Feb 6, 2025
@workingjubilee workingjubilee added the A-fmt Area: `core::fmt` label Feb 6, 2025
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 7, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8
version) only handles formatter flags when the entire string is valid
UTF-8. As most paths are valid UTF-8, the common case is formatted like
`str`; however, flags are ignored when they contain an unpaired
surrogate. Implement its Display with the same logic as that of `str`.

Fixes rust-lang#136617 for Windows.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 7, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems
(the byte-oriented version) only handles formatter flags when the entire
string is valid UTF-8. As most paths are valid UTF-8, the common case is
formatted like `str`; however, flags are ignored when they contain an
invalid UTF-8 sequence. Implement its Display with the same logic as
that of `str`.

Additionally, it forwards flags when formatting the last chunk of valid
UTF-8, so it erroneously inserts padding within the string when it has
invalid UTF-8. Fix this.

Fixes rust-lang#136617 for non-Windows systems.
@thaliaarchi thaliaarchi linked a pull request Feb 7, 2025 that will close this issue
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 7, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems
(the byte-oriented version) only handles formatter flags when the entire
string is valid UTF-8. As most paths are valid UTF-8, the common case is
formatted like `str`; however, flags are ignored when they contain an
invalid UTF-8 sequence. Implement its Display with the same logic as
that of `str`.

Additionally, it forwards flags when formatting the last chunk of valid
UTF-8, so it erroneously inserts padding within the string when it has
invalid UTF-8. Fix this.

Fixes rust-lang#136617 for non-Windows systems.
@thaliaarchi
Copy link
Contributor Author

@rustbot claim

thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 16, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8
version) only handles formatter flags when the entire string is valid
UTF-8. As most paths are valid UTF-8, the common case is formatted like
`str`; however, flags are ignored when they contain an unpaired
surrogate. Implement its Display with the same logic as that of `str`.

Fixes rust-lang#136617 for Windows.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 16, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems
(the byte-oriented version) only handles formatter flags when the entire
string is valid UTF-8. As most paths are valid UTF-8, the common case is
formatted like `str`; however, flags are ignored when they contain an
invalid UTF-8 sequence. Implement its Display with the same logic as
that of `str`.

Additionally, it forwards flags when formatting the last chunk of valid
UTF-8, so it erroneously inserts padding within the string when it has
invalid UTF-8. Fix this.

Fixes rust-lang#136617 for non-Windows systems.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 17, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8
version) only handles formatter flags when the entire string is valid
UTF-8. As most paths are valid UTF-8, the common case is formatted like
`str`; however, flags are ignored when they contain an unpaired
surrogate. Implement its Display with the same logic as that of `str`.

Fixes rust-lang#136617 for Windows.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 17, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems
(the byte-oriented version) only handles formatter flags when the entire
string is valid UTF-8. As most paths are valid UTF-8, the common case is
formatted like `str`; however, flags are ignored when they contain an
invalid UTF-8 sequence. Implement its Display with the same logic as
that of `str`.

Additionally, it forwards flags when formatting the last chunk of valid
UTF-8, so it erroneously inserts padding within the string when it has
invalid UTF-8. Fix this.

Fixes rust-lang#136617 for non-Windows systems.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 20, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8
version) only handles formatter flags when the entire string is valid
UTF-8. As most paths are valid UTF-8, the common case is formatted like
`str`; however, flags are ignored when they contain an unpaired
surrogate. Implement its Display with the same logic as that of `str`.

Fixes rust-lang#136617 for Windows.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 20, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems
(the byte-oriented version) only handles formatter flags when the entire
string is valid UTF-8. As most paths are valid UTF-8, the common case is
formatted like `str`; however, flags are ignored when they contain an
invalid UTF-8 sequence. Implement its Display with the same logic as
that of `str`.

Additionally, it forwards flags when formatting the last chunk of valid
UTF-8, so it erroneously inserts padding within the string when it has
invalid UTF-8. Fix this.

Fixes rust-lang#136617 for non-Windows systems.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 26, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8
version) only handles formatter flags when the entire string is valid
UTF-8. As most paths are valid UTF-8, the common case is formatted like
`str`; however, flags are ignored when they contain an unpaired
surrogate. Implement its Display with the same logic as that of `str`.

Fixes rust-lang#136617 for Windows.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Feb 26, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems
(the byte-oriented version) only handles formatter flags when the entire
string is valid UTF-8. As most paths are valid UTF-8, the common case is
formatted like `str`; however, flags are ignored when they contain an
invalid UTF-8 sequence. Implement its Display with the same logic as
that of `str`.

Additionally, it forwards flags when formatting the last chunk of valid
UTF-8, so it erroneously inserts padding within the string when it has
invalid UTF-8. Fix this.

Fixes rust-lang#136617 for non-Windows systems.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Mar 12, 2025
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8
version) only handles formatter flags when the entire string is valid
UTF-8. As most paths are valid UTF-8, the common case is formatted like
`str`; however, flags are ignored when they contain an unpaired
surrogate. Implement its Display with the same logic as that of `str`.

Fixes rust-lang#136617 for Windows.
thaliaarchi added a commit to thaliaarchi/rust that referenced this issue Mar 12, 2025
The Display implementation for `OsStr` and `Path` on non-Windows systems
(the byte-oriented version) only handles formatter flags when the entire
string is valid UTF-8. As most paths are valid UTF-8, the common case is
formatted like `str`; however, flags are ignored when they contain an
invalid UTF-8 sequence. Implement its Display with the same logic as
that of `str`.

Additionally, it forwards flags when formatting the last chunk of valid
UTF-8, so it erroneously inserts padding within the string when it has
invalid UTF-8. Fix this.

Fixes rust-lang#136617 for non-Windows systems.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-fmt Area: `core::fmt` C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants