RegEx XXII : Unicode property escapes

JavaScript, Hard, regex, formatting

Instructions

Unicode property escapes match characters based on their Unicode properties, which are either binary ("boolean-like") or non-binary. They can be used to match emoji, punctuation, letters (even letters from specific languages or scripts), and so on.

const sentence = "A ticket to 大阪 costs ¥2000 👌."

sentence.match(/\p{Emoji_Presentation}/gu) ➞ ["👌"]
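To make the binary versus non-binary distinction concrete, here is a minimal sketch that runs a few other property escapes against the same sentence. The variable names are illustrative only, and the properties were picked for this sentence rather than taken from the challenge.

```javascript
// Same sentence as in the example above.
const sentence = "A ticket to 大阪 costs ¥2000 👌.";

// Binary property: a character either has it or does not.
const emoji = sentence.match(/\p{Emoji_Presentation}/gu);

// Non-binary property General_Category: the value can be written on its own.
const uppercase = sentence.match(/\p{Lu}/gu);   // Lu = uppercase letter
const currency = sentence.match(/\p{Sc}/gu);    // Sc = currency symbol
const punctuation = sentence.match(/\p{P}/gu);  // P  = any punctuation

// Non-binary property Script: written in the name=value form.
const han = sentence.match(/\p{Script=Han}+/gu);

console.log(emoji, uppercase, currency, punctuation, han);
// [ '👌' ] [ 'A' ] [ '¥' ] [ '.' ] [ '大阪' ]
```

The capitalized form \P{…} negates the property, so /\P{P}/u matches any character that is not punctuation.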

Note: For Unicode property escapes to work, the regular expression must use the u flag, which makes the pattern treat the string as a sequence of Unicode code points. See also RegExp.prototype.unicode.
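The effect of the u flag can be checked directly. The sketch below (hypothetical variable names) shows that without the flag, \p is read as a plain escaped "p", so the pattern looks for the literal text p{L}. It also shows the code-point behaviour on an emoji stored as two UTF-16 code units.

```javascript
const withoutFlag = /\p{L}/;   // no u flag: matches the literal text "p{L}"
const withFlag = /\p{L}/u;     // u flag: matches any Unicode letter

const literalMatch = withoutFlag.test("p{L}");  // true
const letterNoFlag = withoutFlag.test("я");     // false
const letterWithFlag = withFlag.test("я");      // true

// Without u, "." steps through UTF-16 code units; with u, through code points.
const unitsNoFlag = "👌".match(/./g).length;     // 2
const unitsWithFlag = "👌".match(/./gu).length;  // 1

console.log(literalMatch, letterNoFlag, letterWithFlag, unitsNoFlag, unitsWithFlag);
// true false true 2 1
```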

Note: Some Unicode properties cover far more characters than the traditional character classes (such as \w, which matches only the ASCII letters a to z and A to Z, the digits 0 to 9, and the underscore), but the latter are better supported among browsers (as of January 2020).
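As a rough illustration of that difference, the sketch below tests single characters against \w and against the general category L (letter). The helper names are hypothetical, and \p{L} is only one of several properties that could be compared here.

```javascript
// Hypothetical helpers: does a single character match each class?
const isWordChar = (ch) => /^\w$/.test(ch);     // ASCII letters, digits, underscore
const isLetter = (ch) => /^\p{L}$/u.test(ch);   // any Unicode letter

const results = ["a", "7", "_", "я", "阪"].map((ch) => ({
  ch,
  wordChar: isWordChar(ch),
  letter: isLetter(ch),
}));

console.table(results);
// "a": \w true,  \p{L} true
// "7": \w true,  \p{L} false
// "_": \w true,  \p{L} false
// "я": \w false, \p{L} true
// "阪": \w false, \p{L} true
```

So \w is not a subset of the letter category: it also accepts digits and the underscore, which \p{L} rejects.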

Match all words in nonEnglishText using a Unicode property escape.

const nonEnglishText = "Приключения Алисы в Стране чудес"

const regex = /\w+/gu
nonEnglishText.match(regex) ➞ null, \w doesn't match non-ASCII letters

const regexpBMPWord = /([\u0000-\u0019\u0021-\uFFFF])+/gu
nonEnglishText.match(regexpBMPWord) ➞ [ 'Приключения', 'Алисы', 'в', 'Стране', 'чудес' ], this works

const regexpUPE = /YOUR SOLUTION HERE/gu
nonEnglishText.match(regexpUPE) ➞ [ 'Приключения', 'Алисы', 'в', 'Стране', 'чудес' ], an easier way

Notes

You will most likely need to check the references in the Resources tab to solve this challenge.
