Get the First Character of a String in PHP (UTF-8 safe)

To get the first character of a string in PHP, use $str[0] for ASCII (or bytes), mb_substr($str, 0, 1) for UTF-8 and other multibyte encodings, or grapheme_substr($str, 0, 1) for grapheme-aware extraction (combining marks, emoji clusters).

Contents

ASCII / single-byte strings
UTF-8 / multibyte strings
Grapheme-aware (emoji clusters, combining marks)
Reading character by character (the source’s use case)
Frequently asked questions
Related guides
References

Last verified: 2026-05-17 on PHP 8.3. Originally published 2022-10-18, rewritten and updated 2026-05-17.

ASCII / single-byte strings

$input = 'ab12476541';

// Bracket notation
$first = $input[0] ?? null;   // 'a'

// substr() — same result, no warning on empty input
$first = substr($input, 0, 1); // 'a'

Both read one byte. For ASCII (digits, basic Latin letters, punctuation), a byte is the same as a character — both methods work.

PHP first character — bracket and substr for ASCII, mb_substr for UTF-8, grapheme_substr for clusters

UTF-8 / multibyte strings

$input = 'éclair';

$input[0];                          // 'Ã'   — wrong: one byte of a 2-byte char
substr($input, 0, 1);               // 'Ã'   — same bug, bytes again
mb_substr($input, 0, 1);            // 'é'   — correct

UTF-8 stores each non-ASCII character as 2–4 bytes. Byte-based methods break those characters in half. mb_substr() is character-aware and handles the encoding correctly.

Grapheme-aware (emoji clusters, combining marks)

$input = '👍🏽 ok';

mb_substr($input, 0, 1);             // '👍'   — drops the skin-tone modifier
grapheme_substr($input, 0, 1);       // '👍🏽'  — keeps the whole cluster

Some emoji and accented characters are clusters — one user-visible character made of several code points. grapheme_substr() (from the intl extension) treats clusters as single units. Use it for usernames, social-media text, or anywhere user-perceived characters matter.

Reading character by character (the source’s use case)

$input = 'ab12476541';

// ASCII — iterate bytes
for ($i = 0, $n = strlen($input); $i < $n; $i++) {
    echo $input[$i];
}

// UTF-8 — iterate characters
$n = mb_strlen($input);
for ($i = 0; $i < $n; $i++) {
    echo mb_substr($input, $i, 1);
}

// Convert to an array of characters in one call
$chars = mb_str_split($input);   // PHP 7.4+
print_r($chars);

mb_str_split() is the one-liner: it splits a UTF-8 string into an array of characters, each entry being one character regardless of byte width. Cleaner than indexing in a loop when you need every character.

Frequently asked questions

Why does $str[0] sometimes return garbage on UTF-8 strings?

$str[0] reads a single byte, not a character. ASCII characters are one byte each, so it works fine for plain English. Multibyte characters (anything past basic ASCII — accented letters, CJK, emoji) are 2–4 bytes in UTF-8, so reading one byte gives you a fragment that’s not a valid character. Use mb_substr() for any string that might contain non-ASCII text.

Is $str[0] safe when the string might be empty?

In modern PHP it raises a warning when the offset doesn’t exist, then returns empty. The null-coalescing pattern $str[0] ?? null gives a clean default. substr($str, 0, 1) returns the empty string on empty input without raising — slightly more forgiving if you don’t want warnings.

What’s the difference between substr() and mb_substr()?

substr() works in bytes — fine for ASCII, wrong for multibyte. mb_substr() takes an encoding (defaults to mb_internal_encoding(), usually UTF-8) and operates on characters. If your data could include user names, addresses, or any free-form text, mb_substr() is the safe default. The mbstring extension must be enabled (it is on almost every modern PHP install).

How do I get the first grapheme (user-perceived character, not byte or code point)?

Use the intl extension’s grapheme_substr(): grapheme_substr($str, 0, 1). This handles combining marks (é as e + ́), regional-indicator emoji (flags like 🇺🇸), and skin-tone modifiers correctly. mb_substr is enough for most text; grapheme_substr matters when emoji or combining-character handling is critical.