commit | 1e0c73d5b643707335b06abd2546a83d9439d14c | |
---|---|---|
author | Roberto Ierusalimschy <roberto@inf.puc-rio.br> | Fri Mar 15 13:14:17 2019 -0300 |
committer | Roberto Ierusalimschy <roberto@inf.puc-rio.br> | Fri Mar 15 13:14:17 2019 -0300 |
tree | b80b7d5e2cfeeef888ddf98fcc6276832134c1bf | |
parent | 8fa4f1380b9a203bfdf002c2e9e9e13ebb8384c1 | |
Changes in the validation of UTF-8

All UTF-8 encoding functionality (including the escape sequence '\u') accepts all values from the original UTF-8 specification (with sequences of up to six bytes).

By default, the decoding functions in the UTF-8 library do not accept invalid Unicode code points, such as surrogates. A new parameter 'nonstrict' makes them accept all code points up to (2^31)-1, as in the original UTF-8 specification.
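
The distinction between the default (strict) and 'nonstrict' decoding described above can be illustrated with a minimal C sketch. This is not the code from this commit: the function name `decode_utf8`, its signature, and the constant `MAX_UNICODE` are hypothetical, chosen only for illustration. It assumes the six-byte layout of the original UTF-8 specification and rejects overlong encodings in both modes.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_UNICODE 0x10FFFFu  /* largest valid Unicode code point */

/* Hypothetical helper (not part of Lua): decode one UTF-8 sequence from
   's', which has 'len' bytes available. On success, store the code point
   in '*out' and return the number of bytes consumed; return 0 if the
   sequence is invalid. With 'nonstrict' set, accept every code point up
   to 2^31 - 1; otherwise reject surrogates and values above U+10FFFF. */
static size_t decode_utf8(const unsigned char *s, size_t len,
                          uint32_t *out, int nonstrict) {
  /* smallest code point that needs n bytes, indexed by n (rejects overlongs) */
  static const uint32_t min_for_len[] =
      {0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
  uint32_t cp;
  size_t n, i;
  if (len == 0) return 0;
  if (s[0] < 0x80)                { n = 1; cp = s[0]; }
  else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
  else if ((s[0] & 0xFC) == 0xF8) { n = 5; cp = s[0] & 0x03; }
  else if ((s[0] & 0xFE) == 0xFC) { n = 6; cp = s[0] & 0x01; }
  else return 0;  /* stray continuation byte, or 0xFE/0xFF */
  if (n > len) return 0;  /* truncated sequence */
  for (i = 1; i < n; i++) {
    if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
    cp = (cp << 6) | (s[i] & 0x3F);       /* append its low 6 bits */
  }
  if (cp < min_for_len[n]) return 0;  /* overlong encoding */
  if (!nonstrict &&
      (cp > MAX_UNICODE || (cp >= 0xD800 && cp <= 0xDFFF)))
    return 0;  /* strict mode: too large, or a surrogate */
  *out = cp;
  return n;
}
```

In this sketch, the bytes ED A0 80 (the surrogate U+D800) and the six-byte sequence FD BF BF BF BF BF (the value (2^31)-1) are rejected in the default mode and accepted when `nonstrict` is nonzero, while an ordinary sequence such as E2 82 AC (U+20AC) decodes the same way in both modes.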