JavaScript Check Unicode Character: Regex & codePointAt

To check whether a JavaScript string contains Unicode characters beyond Latin-1, the cleanest one-liner is a regex against the Latin-1 range: /[^\u0000-\u00ff]/.test(str). It returns true if the string contains any character outside plain ASCII plus the Latin-1 Supplement: emoji, CJK, Cyrillic, accented letters from Latin Extended, and so on. This guide also covers codePointAt (a per-character check that reads surrogate pairs correctly) and the \p{Emoji} property escape for emoji-specific detection.

Contents

TL;DR
The regex one-liner
Per-character check with codePointAt
Detecting emoji specifically
Frequently asked questions
Related guides
References

Last verified: 2026-05-17 in Chromium, Firefox, Safari, and Node.js 22. Originally published 2022-12-19, rewritten and updated 2026-05-17.

TL;DR

// Any character outside Latin-1?
/[^\u0000-\u00ff]/.test("Hello");      // false
/[^\u0000-\u00ff]/.test("Héllo");      // false (é is Latin-1)
/[^\u0000-\u00ff]/.test("Hello 世界");  // true
/[^\u0000-\u00ff]/.test("Hi 👋");       // true

// Per-character (surrogate-pair safe)
"😀".codePointAt(0);                    // 128512 (0x1F600)
"😀".codePointAt(0) > 0xFF;             // true

// Emoji specifically
/\p{Emoji}/u.test("Hello 👋");          // true
/\p{Emoji}/u.test("Hello 世界");         // false

The regex one-liner

The character class [^\u0000-\u00ff] matches any character whose code point is outside the Latin-1 range (U+0000 through U+00FF). If the regex matches anywhere in the string, you have at least one non-Latin-1 character — what most people mean when they say “Unicode.”

function hasUnicode(str) {
  return /[^\u0000-\u00ff]/.test(str);
}

hasUnicode("Hello, world!");      // false
hasUnicode("Hello, 世界!");        // true
hasUnicode("café résumé");        // false (Latin-1 covers these)
hasUnicode("Hi 👋");              // true
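
When the check returns true, it often helps to know which characters triggered it. The sketch below reuses the same character class with the g and u flags; findNonLatin1 is a hypothetical helper name, not something from a library.

```javascript
// Hypothetical helper: list each distinct character outside Latin-1.
function findNonLatin1(str) {
  // The u flag makes the class match whole code points, so an emoji
  // comes back as one match rather than two surrogate halves.
  const matches = str.match(/[^\u0000-\u00ff]/gu) ?? [];
  return [...new Set(matches)];
}

findNonLatin1("Hello, world!");   // []
findNonLatin1("Hi 👋, 世界 👋");   // ["👋", "世", "界"]
```

Without the u flag the boolean test still works, because both surrogate halves are above U+00FF, but the matches would be split halves rather than whole characters.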

[Figure: regex vs. codePointAt vs. Unicode property escape for checking Unicode characters in JavaScript]

Per-character check with codePointAt

To check a specific character (rather than scan the whole string), codePointAt returns the Unicode code point as a number:

const str = "Hello, world!";
const cp = str.codePointAt(0);     // 72 (H)

if (cp > 0xFF) {
  console.log("First character is above Latin-1.");
}

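Prefer codePointAt over the older charCodeAt. charCodeAt returns a single UTF-16 code unit, so for emoji and other supplementary-plane characters it yields only one half of a surrogate pair. codePointAt returns the full code point when the index lands on the first (high) surrogate. Its index still counts UTF-16 code units, though, so pointing it at the second half of a pair returns the lone low surrogate.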
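
A small side-by-side comparison may make the difference concrete. The string and the codePoints helper below are illustrative, not part of any API; the helper relies on for...of, which iterates strings by code point and avoids index arithmetic.

```javascript
const s = "a😀"; // "a" is 1 code unit, "😀" is 2 (a surrogate pair)

s.charCodeAt(1);   // 55357  (0xD83D, the high surrogate only)
s.codePointAt(1);  // 128512 (0x1F600, the full code point)
s.codePointAt(2);  // 56832  (0xDE00, index 2 lands on the low surrogate)

// Hypothetical helper: collect the code point of every character.
function codePoints(str) {
  const out = [];
  for (const ch of str) {
    out.push(ch.codePointAt(0)); // ch is a whole code point here
  }
  return out;
}

codePoints(s);     // [97, 128512]
```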

Detecting emoji specifically

“Has any non-Latin-1 character” is broader than “has emoji.” If you specifically want emoji detection, use a Unicode property escape with the u flag:

/\p{Emoji}/u.test("Hi 👋");        // true
/\p{Emoji}/u.test("Hi 世界");       // false (CJK, not emoji)
/\p{Emoji_Presentation}/u.test("Hi 👋"); // true (emoji that render as emoji by default)
/\p{Emoji}/u.test("555");          // true (digits, # and * carry the Emoji property)

Property escapes are supported in modern browsers and Node.js 12+. They are backed by the Unicode property data built into the JavaScript engine, so new emoji are recognized once the engine ships the corresponding Unicode version.
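
One caveat: the Emoji property also covers the digits 0-9, # and *, because they can form keycap emoji, so /\p{Emoji}/u reports true for plain numeric text. The sketch below swaps in \p{Extended_Pictographic}, a different Unicode property that excludes those characters. That property is not mentioned above, and hasPictographic is a hypothetical helper name. Note that Extended_Pictographic still includes some symbols such as ©, so treat this as a starting point rather than a definitive emoji detector.

```javascript
// Hypothetical helper: true if the string contains a pictographic character.
function hasPictographic(str) {
  return /\p{Extended_Pictographic}/u.test(str);
}

/\p{Emoji}/u.test("Call 555");   // true  (digits have the Emoji property)
hasPictographic("Call 555");     // false
hasPictographic("Hi 👋");        // true
hasPictographic("Hi 世界");       // false
```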

Frequently asked questions

What counts as a ‘Unicode character’ here?

In this post, a Unicode character means any code point outside the Latin-1 range (U+0000 through U+00FF) — anything beyond plain ASCII plus the Latin-1 Supplement. Technically every JavaScript string character is Unicode (strings are UTF-16), but in practice you usually want to know whether the string contains characters that won’t fit in a one-byte encoding: accented Latin Extended, Cyrillic, CJK, emoji, etc.

Why is codePointAt safer than charCodeAt?

charCodeAt returns a UTF-16 code unit — for characters above U+FFFF (most emoji, supplementary CJK), it returns half of a surrogate pair, not the real code point. codePointAt returns the actual Unicode code point, correctly handling surrogate pairs. For modern Unicode work — especially anything involving emoji or rare scripts — codePointAt is the right default.

Does the regex /[^\u0000-\u00ff]/ match emoji?

Yes — emoji are code points well above U+00FF, so they match the non-Latin-1 character class. If you want to detect emoji specifically (rather than any non-Latin-1 character), use a Unicode property escape like /\p{Emoji}/u, which requires the u flag and modern browsers / Node 12+.

How do I count Unicode characters correctly?

str.length returns the UTF-16 code unit count, not the user-perceived character count. '😀'.length is 2, not 1. To count code points use [...str].length (the spread iterator yields code points), and for grapheme clusters (the actual visual characters) use Intl.Segmenter: [...new Intl.Segmenter().segment(str)].length.
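
The three counts can be compared on one string. This is a minimal sketch: the family emoji is written with escapes so the invisible zero-width joiners are explicit, and countGraphemes is a hypothetical helper around Intl.Segmenter.

```javascript
// Man + ZWJ + woman + ZWJ + girl: renders as a single family emoji.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// Hypothetical helper: count user-perceived characters (grapheme clusters).
function countGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(str)].length;
}

family.length;           // 8 UTF-16 code units
[...family].length;      // 5 code points (3 emoji + 2 joiners)
countGraphemes(family);  // 1 grapheme cluster
```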

Can I use this to validate input is ASCII-only?

Yes — flip the check. /^[\x00-\x7f]*$/.test(str) returns true only for pure-ASCII strings (no Latin-1 Supplement either). Useful for fields that must be transmitted over ASCII-only protocols (some SMS gateways, certain legacy database columns).
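
Wrapped as a function, the ASCII-only check might look like the sketch below. isAscii is a hypothetical name; note that the empty string passes because of the * quantifier, so add a separate length check if empty input should be rejected.

```javascript
// Hypothetical helper: true only if every character is in U+0000 to U+007F.
function isAscii(str) {
  return /^[\x00-\x7f]*$/.test(str);
}

isAscii("Hello, world!");  // true
isAscii("café");           // false (é is Latin-1, not ASCII)
isAscii("Hi 👋");          // false
isAscii("");               // true (nothing to reject)
```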