Channel: User Schwern - Stack Overflow

Answer by Schwern for Regular expression for letters of any alphabet


> That is, need something like /[[:alpha:]]+/g which is only for Latin...

That is mistaken.

[:alpha:] is a POSIX character class. Depending on your regex implementation, what [:alpha:] matches either depends on your locale or covers all Unicode "letters".

If your regex implementation follows the Unicode conventions, then the alpha character class is upper+lower: upper matches all Unicode uppercase characters and lower matches all Unicode lowercase characters.
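As a rough illustration in Ruby (one Unicode-aware implementation; others may behave differently), the sketch below checks single characters against the upper, lower, and alpha classes. The helper name posix_classes is made up for this example.

```ruby
# Hypothetical helper (not part of Ruby): reports which POSIX bracket
# classes a single character matches in Ruby's regex engine.
def posix_classes(char)
  {
    upper: char.match?(/[[:upper:]]/),
    lower: char.match?(/[[:lower:]]/),
    alpha: char.match?(/[[:alpha:]]/)
  }
end

p posix_classes("A")   # upper and alpha are true, lower is false
p posix_classes("д")   # Cyrillic small letter: lower and alpha are true
p posix_classes("ふ")  # Hiragana has no case: only alpha is true
```

The last call shows why upper+lower alone is not enough for scripts without case.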

POSIX character classes pre-date Unicode and their implementation can be inconsistent. If you want to be more explicit, some regex implementations provide Unicode character classes, usually as \p{xx}. You want \p{L} for all Unicode letters.
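For the original goal of matching runs of letters in any alphabet, a minimal Ruby sketch using \p{L} could look like this. The helper name letter_runs and the sample string are assumptions for illustration only.

```ruby
# Hypothetical helper: returns every run of Unicode letters in the text,
# regardless of script. Digits, punctuation, and spaces act as separators.
def letter_runs(text)
  text.scan(/\p{L}+/)
end

p letter_runs("Hello, мир! こんにちは 123 Ⳃ")
# => ["Hello", "мир", "こんにちは", "Ⳃ"]
```

String#scan plays the role of the /g flag from the question: it collects all matches rather than stopping at the first.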

Because some languages, such as Japanese, have no concept of case, some implementations, such as Ruby, also include "other letters" (Unicode category Lo) in [:alpha:]. For letters, that makes [:alpha:] and \p{L} equivalent.

"ふ".match?(/[[:alpha:]]/)  # true
"ふ".match?(/\p{L}/)        # true
"a".match?(/[[:alpha:]]/)   # true
"a".match?(/\p{L}/)         # true
"Ⳃ".match?(/[[:alpha:]]/)   # true
"Ⳃ".match?(/\p{L}/)         # true
"1".match?(/[[:alpha:]]/)   # false
"1".match?(/\p{L}/)         # false

See regular-expressions.info's articles on POSIX Bracket Expressions and Unicode.

