JavaScript uses Unicode encoding (UTF-16) for strings. Most characters are encoded with 2 bytes, but 2 bytes can represent at most 65,536 characters.
That range is not large enough to encode every possible character, so some rare characters, such as 𝒳 (mathematical X) or 😄 (a smile), are encoded with 4 bytes.
Here are the Unicode values of some characters for comparison:
| Character | Unicode | Bytes |
|---|---|---|
| a | 0x0061 | 2 |
| ≈ | 0x2248 | 2 |
| 𝒳 | 0x1d4b3 | 4 |
| 𝒴 | 0x1d4b4 | 4 |
| 😄 | 0x1f604 | 4 |
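So characters like a and ≈ occupy 2 bytes, while the rarer ones (𝒳, 𝒴, 😄) take 4 bytes.
When JavaScript was created, Unicode encoding was simpler: there were no 4-byte characters. As a result, some language features still handle them incorrectly. For instance, length reports two characters here: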
alert('😄'.length); // 2
alert('𝒳'.length); // 2
…But we can see there is only one character, right? The point is that length treats the 4 bytes as two 2-byte characters. That is incorrect, because the two halves have no meaning unless they are taken together (a so-called "surrogate pair"). This is covered in the article Strings.
By default, regular expressions also treat 4-byte "long characters" as a pair of 2-byte ones. And, as with strings, that may lead to odd results. We'll see that a bit later, in the article Sets and ranges [...].
Unlike strings, regular expressions have the flag u that fixes such problems. With this flag, a regexp handles 4-byte characters correctly. Unicode property search also becomes available; we'll get to it next.
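As a minimal sketch of the difference described above, the snippet below compares how a single 4-byte character behaves with and without the u flag. The variable names are illustrative only, and console.log is used instead of alert so the snippet also runs outside a browser.

```javascript
const smile = '😄'; // U+1F604, stored as a surrogate pair (two 16-bit units)

// length counts 16-bit code units, so one visible symbol reports 2
const unitCount = smile.length;

// Without "u", the dot matches a single code unit, so ^.$ cannot cover the symbol
const matchesWithoutU = /^.$/.test(smile);

// With "u", the dot matches a whole code point
const matchesWithU = /^.$/u.test(smile);

console.log(unitCount, matchesWithoutU, matchesWithU); // 2 false true
```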
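Unicode properties \p{…}
Unicode properties have been part of the standard since 2018, but at the time of writing they were not supported in Firefox (bug) and Edge (bug). Current releases of both browsers support them.
The XRegExp library provides "extended" regular expressions with cross-browser support for Unicode properties.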
Every character in Unicode has many properties. They describe which "category" the character belongs to and carry miscellaneous other information about it.
For instance, if a character has the Letter property, it belongs to an alphabet (of any language). The Number property means it is a digit: Arabic, Chinese, and so on.
We can search for characters by property using \p{…}. To use \p{…}, a regular expression must have the flag u.
For instance, \p{Letter} denotes a letter in any language. We can also write \p{L}, because L is an alias of Letter. Almost every property has a shorter alias.
The example below finds three kinds of letters: English, Georgian and Korean.
let str = "A ბ ㄱ";
alert( str.match(/\p{L}/gu) ); // A,ბ,ㄱ
alert( str.match(/\p{L}/g) ); // null (no matches, because there is no "u" flag)
Here are the main character categories and their subcategories:
- Letter L:
  - lowercase Ll
  - modifier Lm
  - titlecase Lt
  - uppercase Lu
  - other Lo
- Number N:
  - decimal digit Nd
  - letter number Nl
  - other No
- Punctuation P:
  - connector Pc
  - dash Pd
  - initial quote Pi
  - final quote Pf
  - open Ps
  - close Pe
  - other Po
- Mark M (accents etc):
  - spacing combining Mc
  - enclosing Me
  - non-spacing Mn
- Symbol S:
  - currency Sc
  - modifier Sk
  - math Sm
  - other So
- Separator Z:
  - line Zl
  - paragraph Zp
  - space Zs
- Other C:
  - control Cc
  - format Cf
  - not assigned Cn
  - private use Co
  - surrogate Cs
So, for example, if we need lowercase letters we can write \p{Ll}, for punctuation \p{P}, and so on.
There are also derived categories, such as:
- Alphabetic (Alpha) includes letters L, plus letter numbers Nl (e.g. Ⅻ, the character for the Roman numeral 12), plus some other symbols Other_Alphabetic (OAlpha).
- Hex_Digit includes hexadecimal digits: 0-9, a-f.
- …And so on.
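To make the subcategories and derived categories above more concrete, here is a small sketch. The sample string and variable names are made up for illustration; the property names (Lu, Ll, P, L, Alphabetic) are the ones listed above.

```javascript
const sample = "Hi, Привет!";

// Subcategories work across scripts: Latin and Cyrillic letters are both covered
const upper = sample.match(/\p{Lu}/gu).join(""); // "HП"
const lower = sample.match(/\p{Ll}/gu).join(""); // "iривет"
const punct = sample.match(/\p{P}/gu).join("");  // ",!"

// Ⅻ (U+216B) is a letter number (Nl): not a Letter, but it is Alphabetic
const roman12 = "Ⅻ";
const isLetter = /\p{L}/u.test(roman12);              // false
const isAlphabetic = /\p{Alphabetic}/u.test(roman12); // true

console.log(upper, lower, punct, isLetter, isAlphabetic);
```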
Unicode supports many different properties, and the full list would take too much space here, so here are the references:
- List all properties by character: https://unicode.org/cldr/utility/character.jsp.
- List all characters by property: https://unicode.org/cldr/utility/list-unicodeset.jsp.
- Short aliases for properties: https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt.
- The full base of Unicode characters in text format, with all properties, is here: https://www.unicode.org/Public/UCD/latest/ucd/.
Example: hexadecimal numbers
For instance, let's look for hexadecimal numbers written as xFF, where F is a hex digit (0…9 or A…F).
A hex digit can be denoted as \p{Hex_Digit}:
let regexp = /x\p{Hex_Digit}\p{Hex_Digit}/u;
alert("number: xAF".match(regexp)); // xAF
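The same pattern can be extended with the g flag to collect every match and convert it to a number. This is only a sketch: the input string and the names hexByte, found and values are assumptions made for the example.

```javascript
// "g" collects all matches; "u" is required for \p{…}
const hexByte = /x\p{Hex_Digit}\p{Hex_Digit}/gu;

const input = "colors: xFF x0a xZZ";

// "xZZ" is skipped because Z is not a hex digit
const found = input.match(hexByte);

// Drop the leading "x" and parse the remaining two digits as base 16
const values = found.map(token => parseInt(token.slice(1), 16));

console.log(found);  // [ 'xFF', 'x0a' ]
console.log(values); // [ 255, 10 ]
```

Note that Hex_Digit also covers fullwidth forms of these digits and letters, which parseInt does not understand. If only ASCII input should match, \p{ASCII_Hex_Digit} is the narrower property.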
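Example: Chinese hieroglyphs
Let's look for Chinese hieroglyphs.
There is a Unicode property Script (a writing system) that can take values such as Cyrillic, Greek, Arabic, Han (Chinese) and so on. The full list is given in the Unicode documentation for the Script property.
To look for characters in a given writing system, use Script=<value>. For example, for Cyrillic letters: \p{sc=Cyrillic}, for Chinese hieroglyphs: \p{sc=Han}, and so on: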
let regexp = /\p{sc=Han}/gu; // returns Chinese hieroglyphs
let str = `Hello Привет 你好 123_456`;
alert( str.match(regexp) ); // 你,好
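The same string can be split by other writing systems too. The sketch below uses the long form Script= and the short form sc= side by side; the variable names are illustrative.

```javascript
const mixed = "Hello Привет 你好 123_456";

// Long form of the property name
const cyrillic = mixed.match(/\p{Script=Cyrillic}/gu).join("");

// Short alias; digits and the underscore belong to no specific script, so they are skipped
const latin = mixed.match(/\p{sc=Latin}/gu).join("");

console.log(cyrillic); // Привет
console.log(latin);    // Hello
```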
Example: currency
Characters that denote a currency, such as $, €, ¥, have the Unicode property \p{Currency_Symbol}, with the short alias \p{Sc}.
Let's use it to look for prices in the format "currency, followed by a digit":
let regexp = /\p{Sc}\d/gu;
let str = `Prices: $2, €1, ¥9`;
alert( str.match(regexp) ); // $2,€1,¥9
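As a further sketch, the same pattern picks up other currency signs and ignores a look-alike prefix that is not a currency symbol. The sample string and the names priceLike, listing and prices are made up for the example.

```javascript
const priceLike = /\p{Sc}\d/gu;

// £ and ₹ are currency symbols; # is punctuation, so "#3" is not matched
const listing = "Menu: £5, ₹7, #3";
const prices = listing.match(priceLike);

console.log(prices); // [ '£5', '₹7' ]
```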
Later, in the article Quantifiers +, *, ? and {n}, we'll see how to look for numbers that contain many digits.
Summary
The flag u enables Unicode support in regular expressions.
That means two things:
- 4-byte characters are handled correctly: as a single character, not as two 2-byte characters.
- Unicode properties can be used in the search: \p{…}.
With Unicode properties we can look for words in a given language, special characters (quotes, currencies) and so on.