@@ -31,6 +31,12 @@ Unicode Type
 These are the basic Unicode object types used for the Unicode implementation in
 Python:
 
+.. c:var:: PyTypeObject PyUnicode_Type
+
+   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
+   is exposed to Python code as :py:class:`str`.
+
+
 .. c:type:: Py_UCS4
             Py_UCS2
             Py_UCS1
@@ -42,19 +48,6 @@ Python:
    .. versionadded:: 3.3
 
 
-.. c:type:: Py_UNICODE
-
-   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
-   depending on the platform.
-
-   .. versionchanged:: 3.3
-      In previous versions, this was a 16-bit type or a 32-bit type depending on
-      whether you selected a "narrow" or "wide" Unicode version of Python at
-      build time.
-
-   .. deprecated-removed:: 3.13 3.15
-
-
 .. c:type:: PyASCIIObject
             PyCompactUnicodeObject
             PyUnicodeObject
@@ -66,12 +59,6 @@ Python:
    .. versionadded:: 3.3
 
 
-.. c:var:: PyTypeObject PyUnicode_Type
-
-   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
-   is exposed to Python code as ``str``.
-
-
 The following APIs are C macros and static inlined functions for fast checks and
 access to internal read-only data of Unicode objects:
 
@@ -87,16 +74,6 @@ access to internal read-only data of Unicode objects:
    subtype. This function always succeeds.
 
 
-.. c:function:: int PyUnicode_READY(PyObject *unicode)
-
-   Returns ``0``. This API is kept only for backward compatibility.
-
-   .. versionadded:: 3.3
-
-   .. deprecated:: 3.10
-      This API does nothing since Python 3.12.
-
-
 .. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *unicode)
 
    Return the length of the Unicode string, in code points. *unicode* has to be a
@@ -149,12 +126,16 @@ access to internal read-only data of Unicode objects:
 .. c:function:: void PyUnicode_WRITE(int kind, void *data, \
                                      Py_ssize_t index, Py_UCS4 value)
 
-   Write into a canonical representation *data* (as obtained with
-   :c:func:`PyUnicode_DATA`). This function performs no sanity checks, and is
-   intended for usage in loops. The caller should cache the *kind* value and
-   *data* pointer as obtained from other calls. *index* is the index in
-   the string (starts at 0) and *value* is the new code point value which should
-   be written to that location.
+   Write the code point *value* to the given zero-based *index* in a string.
+
+   The *kind* value and *data* pointer must have been obtained from a
+   string using :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA`
+   respectively. You must hold a reference to that string while calling
+   :c:func:`!PyUnicode_WRITE`. All requirements of
+   :c:func:`PyUnicode_WriteChar` also apply.
+
+   The function performs no checks for any of its requirements,
+   and is intended for usage in loops.
 
    .. versionadded:: 3.3
 
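The contract above can be illustrated without CPython: cache *kind* and *data* once, then write in a tight loop that performs no checks. What follows is a minimal standalone sketch, not CPython's implementation. ``ucs_write``, ``ucs_read``, ``fill_ascending`` and the ``UCS*_KIND`` constants are hypothetical names. The sketch assumes the 1-, 2- and 4-byte storage widths of the ``Py_UCS1``, ``Py_UCS2`` and ``Py_UCS4`` kinds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three storage kinds (bytes per code point). */
enum ucs_kind { UCS1_KIND = 1, UCS2_KIND = 2, UCS4_KIND = 4 };

/* Mimics the PyUnicode_WRITE contract: no bounds or range checks at all.
   The caller guarantees that index is in range and value fits the kind. */
static inline void
ucs_write(int kind, void *data, size_t index, uint32_t value)
{
    switch (kind) {
    case UCS1_KIND: ((uint8_t *)data)[index] = (uint8_t)value; break;
    case UCS2_KIND: ((uint16_t *)data)[index] = (uint16_t)value; break;
    default:        ((uint32_t *)data)[index] = value; break;
    }
}

/* Matching unchecked read, used here only to inspect the result. */
static inline uint32_t
ucs_read(int kind, const void *data, size_t index)
{
    switch (kind) {
    case UCS1_KIND: return ((const uint8_t *)data)[index];
    case UCS2_KIND: return ((const uint16_t *)data)[index];
    default:        return ((const uint32_t *)data)[index];
    }
}

/* Fill a buffer in a loop; kind and data are looked up once, by the caller. */
static void
fill_ascending(int kind, void *data, size_t length, uint32_t first)
{
    for (size_t i = 0; i < length; i++) {
        ucs_write(kind, data, i, first + (uint32_t)i);
    }
}
```

In real code, *kind* and *data* would come from :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA` of a string you hold a reference to, typically one just created with :c:func:`PyUnicode_New`.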
@@ -196,6 +177,14 @@ access to internal read-only data of Unicode objects:
    is not ready.
 
 
+.. c:function:: unsigned int PyUnicode_IS_ASCII(PyObject *unicode)
+
+   Return true if the string only contains ASCII characters.
+   Equivalent to :py:meth:`str.isascii`.
+
+   .. versionadded:: 3.2
+
+
 Unicode Character Properties
 """"""""""""""""""""""""""""
 
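The property being tested is the one :py:meth:`str.isascii` exposes: every code point lies in U+0000 to U+007F, and an empty string counts as ASCII. CPython keeps this as a flag in the string object, so :c:func:`PyUnicode_IS_ASCII` does not scan the data. The loop below, built around the hypothetical helper ``all_ascii``, only spells out what that flag means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if every code point is in U+0000..U+007F,
   mirroring the semantics of str.isascii(); an empty sequence is ASCII. */
static bool
all_ascii(const uint32_t *cps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0x7F) {
            return false;
        }
    }
    return true;
}
```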
@@ -330,11 +319,29 @@ APIs:
    to be placed in the string. As an approximation, it can be rounded up to the
    nearest value in the sequence 127, 255, 65535, 1114111.
 
-   This is the recommended way to allocate a new Unicode object. Objects
-   created using this function are not resizable.
-
    On error, set an exception and return ``NULL``.
 
+   After creation, the string can be filled by :c:func:`PyUnicode_WriteChar`,
+   :c:func:`PyUnicode_CopyCharacters`, :c:func:`PyUnicode_Fill`,
+   :c:func:`PyUnicode_WRITE` or similar.
+   Since strings are supposed to be immutable, take care to not “use” the
+   result while it is being modified. In particular, before it's filled
+   with its final contents, a string:
+
+   - must not be hashed,
+   - must not be :c:func:`converted to UTF-8 <PyUnicode_AsUTF8AndSize>`,
+     or another non-"canonical" representation,
+   - must not have its reference count changed,
+   - must not be shared with code that might do one of the above.
+
+   This list is not exhaustive. Avoiding these uses is your responsibility;
+   Python does not always check these requirements.
+
+   To avoid accidentally exposing a partially-written string object, prefer
+   using the :c:type:`PyUnicodeWriter` API, or one of the ``PyUnicode_From*``
+   functions below.
+
+
    .. versionadded:: 3.3
 
 
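The *maxchar* approximation described above can be computed from the code points before the string is allocated. This is a sketch using a hypothetical helper, ``rounded_maxchar``. Its result is what would be passed as the *maxchar* argument of :c:func:`PyUnicode_New`. Passing the true maximum instead is also valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round the largest code point in cps up to the
   nearest value in the sequence 127, 255, 65535, 1114111.
   An empty input yields 127. */
static uint32_t
rounded_maxchar(const uint32_t *cps, size_t n)
{
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > max) {
            max = cps[i];
        }
    }
    if (max <= 127) return 127;
    if (max <= 255) return 255;
    if (max <= 65535) return 65535;
    return 1114111;
}
```

Each bucket corresponds to a storage width. 127 and 255 fit in one byte per code point, 65535 needs two bytes, and 1114111 needs four.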
@@ -636,6 +643,9 @@ APIs:
    possible. Returns ``-1`` and sets an exception on error, otherwise returns
    the number of copied characters.
 
+   The string must not have been “used” yet.
+   See :c:func:`PyUnicode_New` for details.
+
    .. versionadded:: 3.3
 
 
@@ -648,6 +658,9 @@ APIs:
    Fail if *fill_char* is bigger than the string maximum character, or if the
    string has more than 1 reference.
 
+   The string must not have been “used” yet.
+   See :c:func:`PyUnicode_New` for details.
+
    Return the number of written characters, or return ``-1`` and raise an
    exception on error.
 
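The documented failure conditions of :c:func:`PyUnicode_Fill` can be modelled with a toy structure. Everything here is hypothetical (``struct toy_str``, ``toy_fill``), and it is not how CPython lays out strings. As simplifying assumptions, every string is stored as 32-bit code points, and a fill that runs past the end is clamped rather than rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified string object used only for this sketch. */
struct toy_str {
    size_t refcnt;     /* stand-in for the reference count */
    uint32_t maxchar;  /* largest code point the storage can hold */
    size_t length;     /* number of code points */
    uint32_t *data;    /* storage, one uint32_t per code point */
};

/* Mimics the documented checks: fail if fill_char exceeds maxchar or the
   object has more than one reference; otherwise write fill_char into
   [start, start + length) and return the number of characters written. */
static long
toy_fill(struct toy_str *s, size_t start, size_t length, uint32_t fill_char)
{
    if (fill_char > s->maxchar || s->refcnt > 1) {
        return -1;  /* the real function also sets an exception */
    }
    if (start > s->length) {
        return 0;   /* assumption: nothing to fill past the end */
    }
    if (length > s->length - start) {
        length = s->length - start;  /* assumption: clamp to the end */
    }
    for (size_t i = 0; i < length; i++) {
        s->data[start + i] = fill_char;
    }
    return (long)length;
}
```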
@@ -657,15 +670,16 @@ APIs:
 .. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
                                         Py_UCS4 character)
 
-   Write a character to a string. The string must have been created through
-   :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
-   the string must not be shared, or have been hashed yet.
+   Write a *character* to the string *unicode* at the zero-based *index*.
+   Return ``0`` on success, ``-1`` on error with an exception set.
 
    This function checks that *unicode* is a Unicode object, that the index is
-   not out of bounds, and that the object can be modified safely (i.e. that it
-   its reference count is one).
+   not out of bounds, and that the object's reference count is one.
+   See :c:func:`PyUnicode_WRITE` for a version that skips these checks,
+   making them your responsibility.
 
-   Return ``0`` on success, ``-1`` on error with an exception set.
+   The string must not have been “used” yet.
+   See :c:func:`PyUnicode_New` for details.
 
    .. versionadded:: 3.3
 
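Responsibility is split between the two functions. :c:func:`PyUnicode_WriteChar` verifies its preconditions and reports failure, while :c:func:`PyUnicode_WRITE` trusts the caller. A toy model of that split, with hypothetical names (``struct toy_str``, ``toy_write_char``, ``toy_write_unchecked``), might look like this. It does not model the "not used yet" rule from :c:func:`PyUnicode_New`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy object: just enough fields to express the checks. */
struct toy_str {
    int is_unicode;    /* stand-in for the "is a Unicode object" test */
    size_t refcnt;     /* stand-in for the reference count */
    size_t length;     /* number of code points */
    uint32_t *data;    /* one uint32_t per code point */
};

/* Unchecked write, in the spirit of PyUnicode_WRITE:
   the caller owns every precondition. */
static void
toy_write_unchecked(struct toy_str *s, size_t index, uint32_t ch)
{
    s->data[index] = ch;
}

/* Checked write, in the spirit of PyUnicode_WriteChar: verify the
   documented preconditions, then delegate. Returns 0 or -1. */
static int
toy_write_char(struct toy_str *s, size_t index, uint32_t ch)
{
    if (!s->is_unicode || index >= s->length || s->refcnt != 1) {
        return -1;  /* the real function also sets an exception */
    }
    toy_write_unchecked(s, index, ch);
    return 0;
}
```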
@@ -1649,6 +1663,20 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
    Strings interned this way are made :term:`immortal`.
 
 
+.. c:function:: unsigned int PyUnicode_CHECK_INTERNED(PyObject *str)
+
+   Return a non-zero value if *str* is interned, zero if not.
+   The *str* argument must be a string; this is not checked.
+   This function always succeeds.
+
+   .. impl-detail::
+
+      A non-zero return value may carry additional information
+      about *how* the string is interned.
+      The meaning of such non-zero values, as well as each specific string's
+      intern-related details, may change between CPython versions.
+
+
 PyUnicodeWriter
 ^^^^^^^^^^^^^^^
 
@@ -1769,8 +1797,8 @@ object.
    *size* is the string length in bytes. If *size* is equal to ``-1``, call
    ``strlen(str)`` to get the string length.
 
-   *errors* is an error handler name, such as ``"replace"``. If *errors* is
-   ``NULL``, use the strict error handler.
+   *errors* is an :ref:`error handler <error-handlers>` name, such as
+   ``"replace"``. If *errors* is ``NULL``, use the strict error handler.
 
    If *consumed* is not ``NULL``, set *\*consumed* to the number of decoded
    bytes on success.
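With a stateful decoder such as this one, a UTF-8 sequence cut off at the end of the input is expected not to be an error when *consumed* is non-``NULL``. This follows the same convention as :c:func:`PyUnicode_DecodeUTF8Stateful`. The trailing bytes are left undecoded, and *\*consumed* tells the caller how many bytes were used, so the remainder can be prepended to the next chunk. The hypothetical helper ``complete_utf8_prefix`` below computes that boundary. It is only a sketch: it does not validate the bytes, which the real decoder does according to *errors*.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return how many leading bytes of buf end on a UTF-8
   sequence boundary, treating a truncated multi-byte sequence at the very
   end as not yet consumed. Byte validation is deliberately skipped. */
static size_t
complete_utf8_prefix(const unsigned char *buf, size_t size)
{
    size_t i = size;
    size_t back = 0;
    /* Walk back over at most 3 continuation bytes to find the lead byte. */
    while (i > 0 && back < 4) {
        unsigned char b = buf[i - 1];
        if ((b & 0xC0) != 0x80) {
            size_t need;  /* total length of the sequence this byte starts */
            if (b < 0x80) need = 1;
            else if ((b & 0xE0) == 0xC0) need = 2;
            else if ((b & 0xF0) == 0xE0) need = 3;
            else if ((b & 0xF8) == 0xF0) need = 4;
            else return size;  /* invalid lead byte: leave it to the decoder */
            size_t have = size - (i - 1);
            return have < need ? i - 1 : size;
        }
        i--;
        back++;
    }
    return size;
}
```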
@@ -1781,3 +1809,49 @@ object.
    On error, set an exception, leave the writer unchanged, and return ``-1``.
 
    See also :c:func:`PyUnicodeWriter_WriteUTF8`.
+
+Deprecated API
+^^^^^^^^^^^^^^
+
+The following API is deprecated.
+
+.. c:type:: Py_UNICODE
+
+   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
+   depending on the platform.
+   Please use :c:type:`wchar_t` directly instead.
+
+   .. versionchanged:: 3.3
+      In previous versions, this was a 16-bit type or a 32-bit type depending on
+      whether you selected a "narrow" or "wide" Unicode version of Python at
+      build time.
+
+   .. deprecated-removed:: 3.13 3.15
+
+
+.. c:function:: int PyUnicode_READY(PyObject *unicode)
+
+   Do nothing and return ``0``.
+   This API is kept only for backward compatibility, but there are no plans
+   to remove it.
+
+   .. versionadded:: 3.3
+
+   .. deprecated:: 3.10
+      This API does nothing since Python 3.12.
+      Previously, this needed to be called for each string created using
+      the old API (:c:func:`!PyUnicode_FromUnicode` or similar).
+
+
+.. c:function:: unsigned int PyUnicode_IS_READY(PyObject *unicode)
+
+   Do nothing and return ``1``.
+   This API is kept only for backward compatibility, but there are no plans
+   to remove it.
+
+   .. versionadded:: 3.3
+
+   .. deprecated:: next
+      This API does nothing since Python 3.12.
+      Previously, this could be called to check if
+      :c:func:`PyUnicode_READY` is necessary.
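:c:type:`Py_UNICODE` is only a typedef of :c:type:`wchar_t`, so migrating usually means replacing the type name. What the typedef hides is that the width varies by platform. It is commonly 2 bytes (UTF-16, with surrogate pairs) on Windows and 4 bytes (UTF-32) on Linux and macOS. The sketch below uses only the C standard library and hypothetical helper names (``wide_units``, ``wchar_needs_surrogates``). It shows the consequence: a character outside the Basic Multilingual Plane occupies one or two ``wchar_t`` units.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Former Py_UNICODE declarations can name wchar_t directly.
   Counts wchar_t units (not code points) in a NUL-terminated string. */
static size_t
wide_units(const wchar_t *s)
{
    return wcslen(s);
}

/* Non-zero when a single wchar_t cannot hold every code point, i.e. when
   characters outside the BMP need surrogate pairs. */
static int
wchar_needs_surrogates(void)
{
    return sizeof(wchar_t) < 4;
}
```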