source: webkit/trunk/JavaScriptCore/kjs/ustring.h@34861

Last change on this file since 34861 was 34821, checked in by Darin Adler, 17 years ago

2008-06-26 Darin Adler <Darin Adler>

Reviewed by Geoff.

  • optimize UString append and the replace function a bit

SunSpider says 1.8% faster.

  • VM/JSPropertyNameIterator.cpp: Added include of JSString.h, now needed because jsString returns a JSString*.
  • VM/Machine.cpp: (KJS::Machine::privateExecute): Removed the toObject call from native function calls. Also removed code to put the this value into a register.
  • kjs/BooleanObject.cpp: (KJS::booleanProtoFuncToString): Rewrite to handle false and true separately.
  • kjs/FunctionPrototype.cpp: (KJS::constructFunction): Use single-character append rather than building a string for each character.
  • kjs/JSFunction.cpp: (KJS::globalFuncUnescape): Ditto.
  • kjs/JSImmediate.cpp: (KJS::JSImmediate::prototype): Added. Gets the appropriate prototype for use with an immediate value. To be used instead of toObject when doing a get on an immediate value.
  • kjs/JSImmediate.h: Added prototype.
  • kjs/JSObject.cpp: (KJS::JSObject::toString): Tweaked formatting.
  • kjs/JSObject.h: (KJS::JSValue::get): Use prototype instead of toObject to avoid creating an object wrapper just to search for properties. This also saves an unnecessary hash table lookup since the object wrappers themselves don't have any properties.
  • kjs/JSString.h: Added toThisString and toThisJSString.
  • kjs/JSValue.cpp: (KJS::JSCell::toThisString): Added. (KJS::JSCell::toThisJSString): Added. (KJS::JSCell::getJSNumber): Added. (KJS::jsString): Changed return type to JSString*. (KJS::jsOwnedString): Ditto.
  • kjs/JSValue.h: (KJS::JSValue::toThisString): Added. (KJS::JSValue::toThisJSString): Added. (KJS::JSValue::getJSNumber): Added.
  • kjs/NumberObject.cpp: (KJS::NumberObject::getJSNumber): Added. (KJS::integer_part_noexp): Append C string directly rather than first turning it into a UString. (KJS::numberProtoFuncToString): Use getJSNumber to check if the value is a number rather than isObject(&NumberObject::info). This works for immediate numbers, number cells, and NumberObject instances. (KJS::numberProtoFuncToLocaleString): Ditto. (KJS::numberProtoFuncValueOf): Ditto. (KJS::numberProtoFuncToFixed): Ditto. (KJS::numberProtoFuncToExponential): Ditto. (KJS::numberProtoFuncToPrecision): Ditto.
  • kjs/NumberObject.h: Added getJSNumber.
  • kjs/PropertySlot.cpp: Tweaked comment.
  • kjs/internal.cpp: (KJS::JSString::toThisString): Added. (KJS::JSString::toThisJSString): Added. (KJS::JSString::getOwnPropertySlot): Changed code that searches the prototype chain to start with the string prototype and not create a string object. (KJS::JSNumberCell::toThisString): Added. (KJS::JSNumberCell::getJSNumber): Added.
  • kjs/lookup.cpp: (KJS::staticFunctionGetter): Moved here, because there's no point in having a function that's only used for a function pointer be inline. (KJS::setUpStaticFunctionSlot): New function for getStaticFunctionSlot.
  • kjs/lookup.h: (KJS::staticValueGetter): Don't mark this inline. It doesn't make sense to have a function that's only used for a function pointer be inline. (KJS::getStaticFunctionSlot): Changed to get properties from the parent first before doing any handling of functions. This is the fastest way to return the function once the initial setup is done.
  • kjs/string_object.cpp: (KJS::StringObject::getPropertyNames): Call value() instead of getString(), avoiding an unnecessary virtual function call (the call to the type() function in the implementation of the isString() function). (KJS::StringObject::toString): Added. (KJS::StringObject::toThisString): Added. (KJS::StringObject::toThisJSString): Added. (KJS::substituteBackreferences): Rewrote to use an appending algorithm instead of the old one that tried to replace in place. (KJS::stringProtoFuncReplace): Merged this function and the replace function. Replaced the hand-rolled dynamic arrays for source ranges and replacements with Vector. (KJS::stringProtoFuncToString): Handle JSString as well as StringObject. Removed the separate valueOf implementation, since it can just share this. (KJS::stringProtoFuncCharAt): Use toThisString, which handles JSString as well as StringObject, and is slightly more efficient than the old code too. (KJS::stringProtoFuncCharCodeAt): Ditto. (KJS::stringProtoFuncConcat): Ditto. (KJS::stringProtoFuncIndexOf): Ditto. (KJS::stringProtoFuncLastIndexOf): Ditto. (KJS::stringProtoFuncMatch): Ditto. (KJS::stringProtoFuncSearch): Ditto. (KJS::stringProtoFuncSlice): Ditto. (KJS::stringProtoFuncSplit): Ditto. (KJS::stringProtoFuncSubstr): Ditto. (KJS::stringProtoFuncSubstring): Ditto. (KJS::stringProtoFuncToLowerCase): Use toThisJSString. (KJS::stringProtoFuncToUpperCase): Ditto. (KJS::stringProtoFuncToLocaleLowerCase): Ditto. (KJS::stringProtoFuncToLocaleUpperCase): Ditto. (KJS::stringProtoFuncLocaleCompare): Ditto. (KJS::stringProtoFuncBig): Use toThisString. (KJS::stringProtoFuncSmall): Ditto. (KJS::stringProtoFuncBlink): Ditto. (KJS::stringProtoFuncBold): Ditto. (KJS::stringProtoFuncFixed): Ditto. (KJS::stringProtoFuncItalics): Ditto. (KJS::stringProtoFuncStrike): Ditto. (KJS::stringProtoFuncSub): Ditto. (KJS::stringProtoFuncSup): Ditto. (KJS::stringProtoFuncFontcolor): Ditto. (KJS::stringProtoFuncFontsize): Ditto. (KJS::stringProtoFuncAnchor): Ditto. (KJS::stringProtoFuncLink): Ditto.
  • kjs/string_object.h: Added toString, toThisString, and toThisJSString.
  • kjs/ustring.cpp: (KJS::UString::append): Added a version that takes a character pointer and size, so we don't have to create a UString just to append to another UString.
  • kjs/ustring.h:
  • Property svn:eol-style set to native
File size: 14.8 KB
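The change log above mentions two appending optimizations: single-character appends in callers, and a UString::append overload that takes a character pointer and a size so no temporary UString is needed. The following is a minimal sketch of that idea only. SketchString is a hypothetical stand-in built on std::u16string, not the KJS::UString class, and it says nothing about how the real implementation manages its buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a UTF-16 string; NOT the KJS::UString class.
struct SketchString {
    std::u16string chars;

    // Append one code unit directly, without building a one-character string first.
    SketchString& append(char16_t c)
    {
        chars.push_back(c);
        return *this;
    }

    // Append `size` code units starting at `data`, without creating a
    // temporary string object just to hold them.
    SketchString& append(const char16_t* data, std::size_t size)
    {
        assert(data || size == 0); // a null pointer is only acceptable for an empty range
        if (size)
            chars.append(data, size);
        return *this;
    }
};
```

Both overloads return a reference to the string, so calls can be chained in the same way as the append declarations in the header below.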
// -*- c-basic-offset: 2 -*-
/*
 * Copyright (C) 1999-2000 Harri Porten ([email protected])
 * Copyright (C) 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved.
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Library General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Library General Public License for more details.
 *
 * You should have received a copy of the GNU Library General Public License
 * along with this library; see the file COPYING.LIB. If not, write to
 * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
 * Boston, MA 02110-1301, USA.
 *
 */

#ifndef _KJS_USTRING_H_
#define _KJS_USTRING_H_

#include "JSLock.h"
#include "collector.h"
#include <stdint.h>
#include <wtf/Assertions.h>
#include <wtf/FastMalloc.h>
#include <wtf/PassRefPtr.h>
#include <wtf/RefPtr.h>
#include <wtf/unicode/Unicode.h>
#include <wtf/Vector.h>

/**
 * @internal
 */
namespace DOM {
  class DOMString;
  class AtomicString;
}
class KJScript;

namespace KJS {

  using WTF::PlacementNewAdoptType;
  using WTF::PlacementNewAdopt;

  class IdentifierTable;
  class UString;

  /**
   * @short 8 bit char based string class
   */
  class CString {
  public:
    CString() : data(0), length(0) { }
    CString(const char *c);
    CString(const char *c, size_t len);
    CString(const CString &);

    ~CString();

    static CString adopt(char* c, size_t len); // c should be allocated with new[].

    CString &append(const CString &);
    CString &operator=(const char *c);
    CString &operator=(const CString &);
    CString &operator+=(const CString &c) { return append(c); }

    size_t size() const { return length; }
    const char *c_str() const { return data; }
  private:
    char *data;
    size_t length;
  };

  typedef Vector<char, 32> CStringBuffer;

  /**
   * @short Unicode string class
   */
  class UString {
    friend bool operator==(const UString&, const UString&);

  public:
    /**
     * @internal
     */
    struct Rep {

      static PassRefPtr<Rep> create(UChar *d, int l);
      static PassRefPtr<Rep> createCopying(const UChar *d, int l);
      static PassRefPtr<Rep> create(PassRefPtr<Rep> base, int offset, int length);

      // Constructs a string from a UTF-8 string, using strict conversion (see comments in UTF8.h).
      // Returns UString::Rep::null for null input or conversion failure.
      static PassRefPtr<Rep> createFromUTF8(const char*);

      void destroy();

      bool baseIsSelf() const { return baseString == this; }
      UChar* data() const { return baseString->buf + baseString->preCapacity + offset; }
      int size() const { return len; }

      unsigned hash() const { if (_hash == 0) _hash = computeHash(data(), len); return _hash; }
      unsigned computedHash() const { ASSERT(_hash); return _hash; } // fast path for Identifiers

      static unsigned computeHash(const UChar *, int length);
      static unsigned computeHash(const char *);

      Rep* ref() { ++rc; return this; }
      ALWAYS_INLINE void deref() { if (--rc == 0) destroy(); }

      // unshared data
      int offset;
      int len;
      int rc; // For null and empty static strings, this field does not reflect a correct count, because ref/deref are not thread-safe. A special case in destroy() guarantees that these do not get deleted.
      mutable unsigned _hash;
      IdentifierTable* identifierTable; // 0 if not an identifier. Since garbage collection can happen on a different thread, there is no other way to get to the table during destruction.
      UString::Rep* baseString;
      bool isStatic : 1;
      size_t reportedCost : 31;

      // potentially shared data
      UChar *buf;
      int usedCapacity;
      int capacity;
      int usedPreCapacity;
      int preCapacity;

      static Rep null;
      static Rep empty;
    };

  public:

    /**
     * Constructs a null string.
     */
    UString();
    /**
     * Constructs a string from a classical zero-terminated char string.
     */
    UString(const char *c);
    /**
     * Constructs a string from an array of Unicode characters of the specified
     * length.
     */
    UString(const UChar *c, int length);
    /**
     * If copy is false the string data will be adopted.
     * That means that the data will NOT be copied and the pointer will
     * be deleted when the UString object is modified or destroyed.
     * Behaviour defaults to a deep copy if copy is true.
     */
    UString(UChar *c, int length, bool copy);
    /**
     * Copy constructor. Makes a shallow copy only.
     */
    UString(const UString &s) : m_rep(s.m_rep) {}

    UString(const Vector<UChar>& buffer);

    /**
     * Convenience declaration only ! You'll be on your own to write the
     * implementation for a construction from DOM::DOMString.
     *
     * Note: feel free to contact me if you want to see a dummy header for
     * your favorite FooString class here !
     */
    UString(const DOM::DOMString&);
    /**
     * Convenience declaration only ! See UString(const DOM::DOMString&).
     */
    UString(const DOM::AtomicString&);

    /**
     * Concatenation constructor. Makes operator+ more efficient.
     */
    UString(const UString &, const UString &);
    /**
     * Destructor.
     */
    ~UString() {}

    // Special constructor for cases where we overwrite an object in place.
    UString(PlacementNewAdoptType) : m_rep(PlacementNewAdopt) { }

    /**
     * Constructs a string from an int.
     */
    static UString from(int i);
    /**
     * Constructs a string from an unsigned int.
     */
    static UString from(unsigned int u);
    /**
     * Constructs a string from a long int.
     */
    static UString from(long u);
    /**
     * Constructs a string from a double.
     */
    static UString from(double d);

    struct Range {
    public:
      Range(int pos, int len) : position(pos), length(len) {}
      Range() {}
      int position;
      int length;
    };

    UString spliceSubstringsWithSeparators(const Range* substringRanges, int rangeCount, const UString* separators, int separatorCount) const;

    /**
     * Append another string.
     */
    UString& append(const UString&);
    UString& append(const char*);
    UString& append(UChar);
    UString& append(char c) { return append(static_cast<UChar>(static_cast<unsigned char>(c))); }
    UString& append(const UChar*, int size);

    /**
     * @return The string converted to the 8-bit string type CString().
     * Returns false if any character is non-ASCII.
     */
    bool getCString(CStringBuffer&) const;

    /**
     * Convert the Unicode string to plain ASCII chars chopping off any higher
     * bytes. This method should only be used for *debugging* purposes as it
     * is neither Unicode safe nor free from side effects nor thread-safe.
     * In order not to waste any memory the char buffer is static and *shared*
     * by all UString instances.
     */
    char* ascii() const;

    /**
     * Convert the string to UTF-8, assuming it is UTF-16 encoded.
     * In non-strict mode, this function is tolerant of badly formed UTF-16; it
     * can create UTF-8 strings that are invalid because they have characters in
     * the range U+D800-U+DFFF, U+FFFE, or U+FFFF, but the UTF-8 string is
     * guaranteed to be otherwise valid.
     * In strict mode, error is returned as null CString.
     */
    CString UTF8String(bool strict = false) const;

    /**
     * @see UString(const DOM::DOMString&).
     */
    DOM::DOMString domString() const;

    /**
     * Assignment operator.
     */
    UString &operator=(const char *c);
    /**
     * Appends the specified string.
     */
    UString &operator+=(const UString &s) { return append(s); }
    UString &operator+=(const char *s) { return append(s); }

    /**
     * @return A pointer to the internal Unicode data.
     */
    const UChar* data() const { return m_rep->data(); }
    /**
     * @return True if null.
     */
    bool isNull() const { return (m_rep == &Rep::null); }
    /**
     * @return True if null or zero length.
     */
    bool isEmpty() const { return (!m_rep->len); }
    /**
     * Use this if you want to make sure that this string is a plain ASCII
     * string. For example, if you don't want to lose any information when
     * using cstring() or ascii().
     *
     * @return True if the string doesn't contain any non-ASCII characters.
     */
    bool is8Bit() const;
    /**
     * @return The length of the string.
     */
    int size() const { return m_rep->size(); }
    /**
     * Const character at specified position.
     */
    UChar operator[](int pos) const;

    /**
     * Attempts a conversion to a number. Apart from floating point numbers,
     * the algorithm will recognize hexadecimal representations (as
     * indicated by a 0x or 0X prefix) and +/- Infinity.
     * Returns NaN if the conversion failed.
     * @param tolerateTrailingJunk if true, toDouble can tolerate garbage after the number.
     * @param tolerateEmptyString if false, toDouble will turn an empty string into NaN rather than 0.
     */
    double toDouble(bool tolerateTrailingJunk, bool tolerateEmptyString) const;
    double toDouble(bool tolerateTrailingJunk) const;
    double toDouble() const;

    /**
     * Attempts a conversion to a 32-bit integer. ok will be set
     * according to the success.
     * @param tolerateEmptyString if false, toUInt32 will return false for *ok for an empty string.
     */
    uint32_t toUInt32(bool *ok = 0) const;
    uint32_t toUInt32(bool *ok, bool tolerateEmptyString) const;
    uint32_t toStrictUInt32(bool *ok = 0) const;

    /**
     * Attempts a conversion to an array index. The "ok" boolean will be set
     * to true if it is a valid array index according to the rule from
     * ECMA 15.2 about what an array index is. It must exactly match the string
     * form of an unsigned integer, and be less than 2^32 - 1.
     */
    unsigned toArrayIndex(bool *ok = 0) const;

    /**
     * @return Position of first occurrence of f starting at position pos.
     * -1 if the search was not successful.
     */
    int find(const UString &f, int pos = 0) const;
    int find(UChar, int pos = 0) const;
    /**
     * @return Position of first occurrence of f searching backwards from
     * position pos.
     * -1 if the search was not successful.
     */
    int rfind(const UString &f, int pos) const;
    int rfind(UChar, int pos) const;
    /**
     * @return The sub string starting at position pos and length len.
     */
    UString substr(int pos = 0, int len = -1) const;
    /**
     * Static instance of a null string.
     */
    static const UString &null();

    Rep* rep() const { return m_rep.get(); }
    UString(PassRefPtr<Rep> r) : m_rep(r) { ASSERT(m_rep); }

    size_t cost() const;

  private:
    size_t expandedSize(size_t size, size_t otherSize) const;
    int usedCapacity() const;
    int usedPreCapacity() const;
    void expandCapacity(int requiredLength);
    void expandPreCapacity(int requiredPreCap);

    RefPtr<Rep> m_rep;
  };

  bool operator==(const UString& s1, const UString& s2);
  inline bool operator!=(const UString& s1, const UString& s2) {
    return !KJS::operator==(s1, s2);
  }
  bool operator<(const UString& s1, const UString& s2);
  bool operator>(const UString& s1, const UString& s2);
  bool operator==(const UString& s1, const char *s2);
  inline bool operator!=(const UString& s1, const char *s2) {
    return !KJS::operator==(s1, s2);
  }
  inline bool operator==(const char *s1, const UString& s2) {
    return operator==(s2, s1);
  }
  inline bool operator!=(const char *s1, const UString& s2) {
    return !KJS::operator==(s1, s2);
  }
  bool operator==(const CString& s1, const CString& s2);
  inline UString operator+(const UString& s1, const UString& s2) {
    return UString(s1, s2);
  }

  int compare(const UString &, const UString &);

  bool equal(const UString::Rep*, const UString::Rep*);


inline UString::UString()
  : m_rep(&Rep::null)
{
}

// Rule from ECMA 15.2 about what an array index is.
// Must exactly match string form of an unsigned integer, and be less than 2^32 - 1.
inline unsigned UString::toArrayIndex(bool *ok) const
{
    unsigned i = toStrictUInt32(ok);
    if (ok && i >= 0xFFFFFFFFU)
        *ok = false;
    return i;
}

// We'd rather not do shared substring append for small strings, since
// this runs too much risk of a tiny initial string holding down a
// huge buffer.
// FIXME: this should be size_t but that would cause warnings until we
// fix UString sizes to be size_t instead of int
static const int minShareSize = Heap::minExtraCostSize / sizeof(UChar);

inline size_t UString::cost() const
{
    size_t capacity = (m_rep->baseString->capacity + m_rep->baseString->preCapacity) * sizeof(UChar);
    size_t reportedCost = m_rep->baseString->reportedCost;
    ASSERT(capacity >= reportedCost);

    size_t capacityDelta = capacity - reportedCost;

    if (capacityDelta < static_cast<size_t>(minShareSize))
        return 0;

#if COMPILER(MSVC)
// MSVC complains about this assignment, since reportedCost is a 31-bit size_t.
#pragma warning(push)
#pragma warning(disable: 4267)
#endif

    m_rep->baseString->reportedCost = capacity;

#if COMPILER(MSVC)
#pragma warning(pop)
#endif

    return capacityDelta;
}

} // namespace KJS


namespace WTF {

    template<typename T> struct DefaultHash;
    template<typename T> struct StrHash;

    template<> struct StrHash<KJS::UString::Rep*> {
        static unsigned hash(const KJS::UString::Rep* key) { return key->hash(); }
        static bool equal(const KJS::UString::Rep* a, const KJS::UString::Rep* b) { return KJS::equal(a, b); }
        static const bool safeToCompareToEmptyOrDeleted = false;
    };

    template<> struct StrHash<RefPtr<KJS::UString::Rep> > : public StrHash<KJS::UString::Rep*> {
        using StrHash<KJS::UString::Rep*>::hash;
        static unsigned hash(const RefPtr<KJS::UString::Rep>& key) { return key->hash(); }
        using StrHash<KJS::UString::Rep*>::equal;
        static bool equal(const RefPtr<KJS::UString::Rep>& a, const RefPtr<KJS::UString::Rep>& b) { return KJS::equal(a.get(), b.get()); }
        static bool equal(const KJS::UString::Rep* a, const RefPtr<KJS::UString::Rep>& b) { return KJS::equal(a, b.get()); }
        static bool equal(const RefPtr<KJS::UString::Rep>& a, const KJS::UString::Rep* b) { return KJS::equal(a.get(), b); }

        static const bool safeToCompareToEmptyOrDeleted = false;
    };

    template<> struct DefaultHash<KJS::UString::Rep*> {
        typedef StrHash<KJS::UString::Rep*> Hash;
    };

    template<> struct DefaultHash<RefPtr<KJS::UString::Rep> > {
        typedef StrHash<RefPtr<KJS::UString::Rep> > Hash;
    };
} // namespace WTF

#endif
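
The comments on toArrayIndex state the rule for an array index: the string must exactly match the decimal form of an unsigned integer and be less than 2^32 - 1. A minimal standalone sketch of that rule follows. parseArrayIndex is a hypothetical helper written for illustration; it works on std::string rather than UString and is not the toStrictUInt32-based code in the header.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of KJS. Returns true and stores the value in
// `index` only if `s` is the canonical decimal form of an integer below 2^32 - 1.
inline bool parseArrayIndex(const std::string& s, std::uint32_t& index)
{
    if (s.empty())
        return false;
    // "Exactly match the string form": no leading zeros except for "0" itself.
    if (s.size() > 1 && s[0] == '0')
        return false;

    std::uint64_t value = 0;
    for (char c : s) {
        if (c < '0' || c > '9')
            return false; // signs, whitespace, and fractions are all rejected
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
        // 2^32 - 1 itself is excluded, so stop as soon as the value reaches it.
        if (value >= 0xFFFFFFFFULL)
            return false;
    }
    index = static_cast<std::uint32_t>(value);
    return true;
}
```

Accumulating in a 64-bit integer and checking the bound after every digit keeps the sketch free of overflow for inputs of any length.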
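
UString::cost() in the header reports only the capacity that has not been reported before, and reports nothing when the growth is below a threshold. A minimal sketch of that bookkeeping pattern follows, under stated assumptions: SketchBuffer and unreportedCost are hypothetical names, and the threshold is passed in as a byte count rather than taken from minShareSize.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record of one buffer; the field names are illustrative only.
struct SketchBuffer {
    std::size_t capacityBytes = 0; // bytes currently allocated for the buffer
    std::size_t reportedCost = 0;  // bytes already reported to the collector
};

// Returns the number of bytes to report now, or 0 if the growth since the
// last report is below `threshold`. A non-zero result is recorded as reported.
inline std::size_t unreportedCost(SketchBuffer& buffer, std::size_t threshold)
{
    assert(buffer.capacityBytes >= buffer.reportedCost);
    std::size_t delta = buffer.capacityBytes - buffer.reportedCost;
    if (delta < threshold)
        return 0; // too small to be worth reporting yet
    buffer.reportedCost = buffer.capacityBytes;
    return delta;
}
```

Small growth that is skipped is not lost: it stays in the difference between the two fields and is included in the next report that crosses the threshold.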