Added ARM-Neon support for SIMD.SkipWhitespace*
Change-Id: Iaf210d029758723a7eeb7f28fc10cab7467889a9 Signed-off-by: Jun He <jun.he@arm.com>
This commit is contained in:
parent
e6d7247ed9
commit
2291258bb8
@ -256,7 +256,7 @@ Alternatively, if we don't want to explicitly refer to the root value of `addres
|
|||||||
|
|
||||||
3. What is SIMD? How it is applied in RapidJSON?
|
3. What is SIMD? How it is applied in RapidJSON?
|
||||||
|
|
||||||
[SIMD](http://en.wikipedia.org/wiki/SIMD) instructions can perform parallel computation in modern CPUs. RapidJSON support Intel's SSE2/SSE4.2 to accelerate whitespace skipping. This improves performance of parsing indent formatted JSON. Define `RAPIDJSON_SSE2` or `RAPIDJSON_SSE42` macro to enable this feature. However, running the executable on a machine without such instruction set support will make it crash.
|
[SIMD](http://en.wikipedia.org/wiki/SIMD) instructions can perform parallel computation in modern CPUs. RapidJSON support Intel's SSE2/SSE4.2 and ARM's Neon to accelerate whitespace/tabspace/carriage-return/line-feed skipping. This improves performance of parsing indent formatted JSON. Define `RAPIDJSON_SSE2`, `RAPIDJSON_SSE42` or `RAPIDJSON_NEON` macro to enable this feature. However, running the executable on a machine without such instruction set support will make it crash.
|
||||||
|
|
||||||
4. Does it consume a lot of memory?
|
4. Does it consume a lot of memory?
|
||||||
|
|
||||||
|
@ -257,7 +257,7 @@
|
|||||||
|
|
||||||
3. 什是是 SIMD?它如何用于 RapidJSON?
|
3. 什是是 SIMD?它如何用于 RapidJSON?
|
||||||
|
|
||||||
[SIMD](http://en.wikipedia.org/wiki/SIMD) 指令可以在现代 CPU 中执行并行运算。RapidJSON 支持了 Intel 的 SSE2/SSE4.2 去加速跳过空白字符。在解析含缩进的 JSON 时,这能提升性能。只要定义名为 `RAPIDJSON_SSE2` 或 `RAPIDJSON_SSE42` 的宏,就能启动这个功能。然而,若在不支持这些指令集的机器上执行这些可执行文件,会导致崩溃。
|
[SIMD](http://en.wikipedia.org/wiki/SIMD) 指令可以在现代 CPU 中执行并行运算。RapidJSON 支持使用 Intel 的 SSE2/SSE4.2 和 ARM 的 Neon 来加速对空白符、制表符、回车符和换行符的过滤处理。在解析含缩进的 JSON 时,这能提升性能。只要定义名为 `RAPIDJSON_SSE2` ,`RAPIDJSON_SSE42` 或 `RAPIDJSON_NEON` 的宏,就能启动这个功能。然而,若在不支持这些指令集的机器上执行这些可执行文件,会导致崩溃。
|
||||||
|
|
||||||
4. 它会消耗许多内存么?
|
4. 它会消耗许多内存么?
|
||||||
|
|
||||||
|
@ -183,17 +183,20 @@ void SkipWhitespace(InputStream& s) {
|
|||||||
|
|
||||||
However, this requires 4 comparisons and a few branching for each character. This was found to be a hot spot.
|
However, this requires 4 comparisons and a few branching for each character. This was found to be a hot spot.
|
||||||
|
|
||||||
To accelerate this process, SIMD was applied to compare 16 characters with 4 white spaces for each iteration. Currently RapidJSON only supports SSE2 and SSE4.2 instructions for this. And it is only activated for UTF-8 memory streams, including string stream or *in situ* parsing.
|
To accelerate this process, SIMD was applied to compare 16 characters with 4 white spaces for each iteration. Currently RapidJSON supports SSE2, SSE4.2 and ARM Neon instructions for this. And it is only activated for UTF-8 memory streams, including string stream or *in situ* parsing.
|
||||||
|
|
||||||
To enable this optimization, need to define `RAPIDJSON_SSE2` or `RAPIDJSON_SSE42` before including `rapidjson.h`. Some compilers can detect the setting, as in `perftest.h`:
|
To enable this optimization, need to define `RAPIDJSON_SSE2`, `RAPIDJSON_SSE42` or `RAPIDJSON_NEON` before including `rapidjson.h`. Some compilers can detect the setting, as in `perftest.h`:
|
||||||
|
|
||||||
~~~cpp
|
~~~cpp
|
||||||
// __SSE2__ and __SSE4_2__ are recognized by gcc, clang, and the Intel compiler.
|
// __SSE2__ and __SSE4_2__ are recognized by gcc, clang, and the Intel compiler.
|
||||||
// We use -march=native with gmake to enable -msse2 and -msse4.2, if supported.
|
// We use -march=native with gmake to enable -msse2 and -msse4.2, if supported.
|
||||||
|
// Likewise, __ARM_NEON is used to detect Neon.
|
||||||
#if defined(__SSE4_2__)
|
#if defined(__SSE4_2__)
|
||||||
# define RAPIDJSON_SSE42
|
# define RAPIDJSON_SSE42
|
||||||
#elif defined(__SSE2__)
|
#elif defined(__SSE2__)
|
||||||
# define RAPIDJSON_SSE2
|
# define RAPIDJSON_SSE2
|
||||||
|
#elif defined(__ARM_NEON)
|
||||||
|
# define RAPIDJSON_NEON
|
||||||
#endif
|
#endif
|
||||||
~~~
|
~~~
|
||||||
|
|
||||||
|
@ -183,17 +183,20 @@ void SkipWhitespace(InputStream& s) {
|
|||||||
|
|
||||||
但是,这需要对每个字符进行4次比较以及一些分支。这被发现是一个热点。
|
但是,这需要对每个字符进行4次比较以及一些分支。这被发现是一个热点。
|
||||||
|
|
||||||
为了加速这一处理,RapidJSON 使用 SIMD 来在一次迭代中比较16个字符和4个空格。目前 RapidJSON 只支持 SSE2 和 SSE4.2 指令。同时它也只会对 UTF-8 内存流启用,包括字符串流或 *原位* 解析。
|
为了加速这一处理,RapidJSON 使用 SIMD 来在一次迭代中比较16个字符和4个空格。目前 RapidJSON 支持 SSE2 , SSE4.2 和 ARM Neon 指令。同时它也只会对 UTF-8 内存流启用,包括字符串流或 *原位* 解析。
|
||||||
|
|
||||||
你可以通过在包含 `rapidjson.h` 之前定义 `RAPIDJSON_SSE2` 或 `RAPIDJSON_SSE42` 来启用这个优化。一些编译器可以检测这个设置,如 `perftest.h`:
|
你可以通过在包含 `rapidjson.h` 之前定义 `RAPIDJSON_SSE2` , `RAPIDJSON_SSE42` 或 `RAPIDJSON_NEON` 来启用这个优化。一些编译器可以检测这个设置,如 `perftest.h`:
|
||||||
|
|
||||||
~~~cpp
|
~~~cpp
|
||||||
// __SSE2__ 和 __SSE4_2__ 可被 gcc、clang 和 Intel 编译器识别:
|
// __SSE2__ 和 __SSE4_2__ 可被 gcc、clang 和 Intel 编译器识别:
|
||||||
// 如果支持的话,我们在 gmake 中使用了 -march=native 来启用 -msse2 和 -msse4.2
|
// 如果支持的话,我们在 gmake 中使用了 -march=native 来启用 -msse2 和 -msse4.2
|
||||||
|
// 同样的, __ARM_NEON 被用于识别Neon
|
||||||
#if defined(__SSE4_2__)
|
#if defined(__SSE4_2__)
|
||||||
# define RAPIDJSON_SSE42
|
# define RAPIDJSON_SSE42
|
||||||
#elif defined(__SSE2__)
|
#elif defined(__SSE2__)
|
||||||
# define RAPIDJSON_SSE2
|
# define RAPIDJSON_SSE2
|
||||||
|
#elif defined(__ARM_NEON)
|
||||||
|
# define RAPIDJSON_NEON
|
||||||
#endif
|
#endif
|
||||||
~~~
|
~~~
|
||||||
|
|
||||||
|
@ -325,17 +325,17 @@
|
|||||||
#endif
|
#endif
|
||||||
|
|
||||||
///////////////////////////////////////////////////////////////////////////////
|
///////////////////////////////////////////////////////////////////////////////
|
||||||
// RAPIDJSON_SSE2/RAPIDJSON_SSE42/RAPIDJSON_SIMD
|
// RAPIDJSON_SSE2/RAPIDJSON_SSE42/RAPIDJSON_NEON/RAPIDJSON_SIMD
|
||||||
|
|
||||||
/*! \def RAPIDJSON_SIMD
|
/*! \def RAPIDJSON_SIMD
|
||||||
\ingroup RAPIDJSON_CONFIG
|
\ingroup RAPIDJSON_CONFIG
|
||||||
\brief Enable SSE2/SSE4.2 optimization.
|
\brief Enable SSE2/SSE4.2/Neon optimization.
|
||||||
|
|
||||||
RapidJSON supports optimized implementations for some parsing operations
|
RapidJSON supports optimized implementations for some parsing operations
|
||||||
based on the SSE2 or SSE4.2 SIMD extensions on modern Intel-compatible
|
based on the SSE2, SSE4.2 or NEon SIMD extensions on modern Intel
|
||||||
processors.
|
or ARM compatible processors.
|
||||||
|
|
||||||
To enable these optimizations, two different symbols can be defined;
|
To enable these optimizations, three different symbols can be defined;
|
||||||
\code
|
\code
|
||||||
// Enable SSE2 optimization.
|
// Enable SSE2 optimization.
|
||||||
#define RAPIDJSON_SSE2
|
#define RAPIDJSON_SSE2
|
||||||
@ -344,13 +344,17 @@
|
|||||||
#define RAPIDJSON_SSE42
|
#define RAPIDJSON_SSE42
|
||||||
\endcode
|
\endcode
|
||||||
|
|
||||||
\c RAPIDJSON_SSE42 takes precedence, if both are defined.
|
// Enable ARM Neon optimization.
|
||||||
|
#define RAPIDJSON_NEON
|
||||||
|
\endcode
|
||||||
|
|
||||||
|
\c RAPIDJSON_SSE42 takes precedence over SSE2, if both are defined.
|
||||||
|
|
||||||
If any of these symbols is defined, RapidJSON defines the macro
|
If any of these symbols is defined, RapidJSON defines the macro
|
||||||
\c RAPIDJSON_SIMD to indicate the availability of the optimized code.
|
\c RAPIDJSON_SIMD to indicate the availability of the optimized code.
|
||||||
*/
|
*/
|
||||||
#if defined(RAPIDJSON_SSE2) || defined(RAPIDJSON_SSE42) \
|
#if defined(RAPIDJSON_SSE2) || defined(RAPIDJSON_SSE42) \
|
||||||
|| defined(RAPIDJSON_DOXYGEN_RUNNING)
|
|| defined(RAPIDJSON_NEON) || defined(RAPIDJSON_DOXYGEN_RUNNING)
|
||||||
#define RAPIDJSON_SIMD
|
#define RAPIDJSON_SIMD
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
@ -33,6 +33,8 @@
|
|||||||
#include <nmmintrin.h>
|
#include <nmmintrin.h>
|
||||||
#elif defined(RAPIDJSON_SSE2)
|
#elif defined(RAPIDJSON_SSE2)
|
||||||
#include <emmintrin.h>
|
#include <emmintrin.h>
|
||||||
|
#elif defined(RAPIDJSON_NEON)
|
||||||
|
#include <arm_neon.h>
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#ifdef _MSC_VER
|
#ifdef _MSC_VER
|
||||||
@ -411,7 +413,92 @@ inline const char *SkipWhitespace_SIMD(const char* p, const char* end) {
|
|||||||
return SkipWhitespace(p, end);
|
return SkipWhitespace(p, end);
|
||||||
}
|
}
|
||||||
|
|
||||||
#endif // RAPIDJSON_SSE2
|
#elif defined(RAPIDJSON_NEON)
|
||||||
|
|
||||||
|
//! Skip whitespace with ARM Neon instructions, testing 16 8-byte characters at once.
|
||||||
|
inline const char *SkipWhitespace_SIMD(const char* p) {
|
||||||
|
// Fast return for single non-whitespace
|
||||||
|
if (*p == ' ' || *p == '\n' || *p == '\r' || *p == '\t')
|
||||||
|
++p;
|
||||||
|
else
|
||||||
|
return p;
|
||||||
|
|
||||||
|
// 16-byte align to the next boundary
|
||||||
|
const char* nextAligned = reinterpret_cast<const char*>((reinterpret_cast<size_t>(p) + 15) & static_cast<size_t>(~15));
|
||||||
|
while (p != nextAligned)
|
||||||
|
if (*p == ' ' || *p == '\n' || *p == '\r' || *p == '\t')
|
||||||
|
++p;
|
||||||
|
else
|
||||||
|
return p;
|
||||||
|
|
||||||
|
const uint8x16_t w0 = vmovq_n_u8(' ');
|
||||||
|
const uint8x16_t w1 = vmovq_n_u8('\n');
|
||||||
|
const uint8x16_t w2 = vmovq_n_u8('\r');
|
||||||
|
const uint8x16_t w3 = vmovq_n_u8('\t');
|
||||||
|
|
||||||
|
for (;; p += 16) {
|
||||||
|
const uint8x16_t s = vld1q_u8(reinterpret_cast<const uint8_t *>(p));
|
||||||
|
uint8x16_t x = vceqq_u8(s, w0);
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, w1));
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, w2));
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, w3));
|
||||||
|
|
||||||
|
x = vmvnq_u8(x); // Negate
|
||||||
|
x = vrev64q_u8(x); // Rev in 64
|
||||||
|
uint64_t low = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 0); // extract
|
||||||
|
uint64_t high = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 1); // extract
|
||||||
|
|
||||||
|
if (low == 0) {
|
||||||
|
if (high != 0) {
|
||||||
|
int lz =__builtin_clzll(high);;
|
||||||
|
return p + 8 + (lz >> 3);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
int lz = __builtin_clzll(low);;
|
||||||
|
return p + (lz >> 3);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
inline const char *SkipWhitespace_SIMD(const char* p, const char* end) {
|
||||||
|
// Fast return for single non-whitespace
|
||||||
|
if (p != end && (*p == ' ' || *p == '\n' || *p == '\r' || *p == '\t'))
|
||||||
|
++p;
|
||||||
|
else
|
||||||
|
return p;
|
||||||
|
|
||||||
|
const uint8x16_t w0 = vmovq_n_u8(' ');
|
||||||
|
const uint8x16_t w1 = vmovq_n_u8('\n');
|
||||||
|
const uint8x16_t w2 = vmovq_n_u8('\r');
|
||||||
|
const uint8x16_t w3 = vmovq_n_u8('\t');
|
||||||
|
|
||||||
|
for (; p <= end - 16; p += 16) {
|
||||||
|
const uint8x16_t s = vld1q_u8(reinterpret_cast<const uint8_t *>(p));
|
||||||
|
uint8x16_t x = vceqq_u8(s, w0);
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, w1));
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, w2));
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, w3));
|
||||||
|
|
||||||
|
x = vmvnq_u8(x); // Negate
|
||||||
|
x = vrev64q_u8(x); // Rev in 64
|
||||||
|
uint64_t low = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 0); // extract
|
||||||
|
uint64_t high = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 1); // extract
|
||||||
|
|
||||||
|
if (low == 0) {
|
||||||
|
if (high != 0) {
|
||||||
|
int lz = __builtin_clzll(high);
|
||||||
|
return p + 8 + (lz >> 3);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
int lz = __builtin_clzll(low);
|
||||||
|
return p + (lz >> 3);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return SkipWhitespace(p, end);
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif // RAPIDJSON_NEON
|
||||||
|
|
||||||
#ifdef RAPIDJSON_SIMD
|
#ifdef RAPIDJSON_SIMD
|
||||||
//! Template function specialization for InsituStringStream
|
//! Template function specialization for InsituStringStream
|
||||||
@ -1129,7 +1216,180 @@ private:
|
|||||||
|
|
||||||
is.src_ = is.dst_ = p;
|
is.src_ = is.dst_ = p;
|
||||||
}
|
}
|
||||||
#endif
|
#elif defined(RAPIDJSON_NEON)
|
||||||
|
// StringStream -> StackStream<char>
|
||||||
|
static RAPIDJSON_FORCEINLINE void ScanCopyUnescapedString(StringStream& is, StackStream<char>& os) {
|
||||||
|
const char* p = is.src_;
|
||||||
|
|
||||||
|
// Scan one by one until alignment (unaligned load may cross page boundary and cause crash)
|
||||||
|
const char* nextAligned = reinterpret_cast<const char*>((reinterpret_cast<size_t>(p) + 15) & static_cast<size_t>(~15));
|
||||||
|
while (p != nextAligned)
|
||||||
|
if (RAPIDJSON_UNLIKELY(*p == '\"') || RAPIDJSON_UNLIKELY(*p == '\\') || RAPIDJSON_UNLIKELY(static_cast<unsigned>(*p) < 0x20)) {
|
||||||
|
is.src_ = p;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
os.Put(*p++);
|
||||||
|
|
||||||
|
// The rest of string using SIMD
|
||||||
|
const uint8x16_t s0 = vmovq_n_u8('"');
|
||||||
|
const uint8x16_t s1 = vmovq_n_u8('\\');
|
||||||
|
const uint8x16_t s2 = vmovq_n_u8('\b');
|
||||||
|
const uint8x16_t s3 = vmovq_n_u8(32);
|
||||||
|
|
||||||
|
for (;; p += 16) {
|
||||||
|
const uint8x16_t s = vld1q_u8(reinterpret_cast<const uint8_t *>(p));
|
||||||
|
uint8x16_t x = vceqq_u8(s, s0);
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, s1));
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, s2));
|
||||||
|
x = vorrq_u8(x, vcltq_u8(s, s3));
|
||||||
|
|
||||||
|
x = vrev64q_u8(x); // Rev in 64
|
||||||
|
uint64_t low = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 0); // extract
|
||||||
|
uint64_t high = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 1); // extract
|
||||||
|
|
||||||
|
SizeType length = 0;
|
||||||
|
bool escaped = false;
|
||||||
|
if (low == 0) {
|
||||||
|
if (high != 0) {
|
||||||
|
unsigned lz = (unsigned)__builtin_clzll(high);;
|
||||||
|
length = 8 + (lz >> 3);
|
||||||
|
escaped = true;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
unsigned lz = (unsigned)__builtin_clzll(low);;
|
||||||
|
length = lz >> 3;
|
||||||
|
escaped = true;
|
||||||
|
}
|
||||||
|
if (RAPIDJSON_UNLIKELY(escaped)) { // some of characters is escaped
|
||||||
|
if (length != 0) {
|
||||||
|
char* q = reinterpret_cast<char*>(os.Push(length));
|
||||||
|
for (size_t i = 0; i < length; i++)
|
||||||
|
q[i] = p[i];
|
||||||
|
|
||||||
|
p += length;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
vst1q_u8(reinterpret_cast<uint8_t *>(os.Push(16)), s);
|
||||||
|
}
|
||||||
|
|
||||||
|
is.src_ = p;
|
||||||
|
}
|
||||||
|
|
||||||
|
// InsituStringStream -> InsituStringStream
|
||||||
|
static RAPIDJSON_FORCEINLINE void ScanCopyUnescapedString(InsituStringStream& is, InsituStringStream& os) {
|
||||||
|
RAPIDJSON_ASSERT(&is == &os);
|
||||||
|
(void)os;
|
||||||
|
|
||||||
|
if (is.src_ == is.dst_) {
|
||||||
|
SkipUnescapedString(is);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
char* p = is.src_;
|
||||||
|
char *q = is.dst_;
|
||||||
|
|
||||||
|
// Scan one by one until alignment (unaligned load may cross page boundary and cause crash)
|
||||||
|
const char* nextAligned = reinterpret_cast<const char*>((reinterpret_cast<size_t>(p) + 15) & static_cast<size_t>(~15));
|
||||||
|
while (p != nextAligned)
|
||||||
|
if (RAPIDJSON_UNLIKELY(*p == '\"') || RAPIDJSON_UNLIKELY(*p == '\\') || RAPIDJSON_UNLIKELY(static_cast<unsigned>(*p) < 0x20)) {
|
||||||
|
is.src_ = p;
|
||||||
|
is.dst_ = q;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
*q++ = *p++;
|
||||||
|
|
||||||
|
// The rest of string using SIMD
|
||||||
|
const uint8x16_t s0 = vmovq_n_u8('"');
|
||||||
|
const uint8x16_t s1 = vmovq_n_u8('\\');
|
||||||
|
const uint8x16_t s2 = vmovq_n_u8('\b');
|
||||||
|
const uint8x16_t s3 = vmovq_n_u8(32);
|
||||||
|
|
||||||
|
for (;; p += 16, q += 16) {
|
||||||
|
const uint8x16_t s = vld1q_u8(reinterpret_cast<uint8_t *>(p));
|
||||||
|
uint8x16_t x = vceqq_u8(s, s0);
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, s1));
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, s2));
|
||||||
|
x = vorrq_u8(x, vcltq_u8(s, s3));
|
||||||
|
|
||||||
|
x = vrev64q_u8(x); // Rev in 64
|
||||||
|
uint64_t low = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 0); // extract
|
||||||
|
uint64_t high = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 1); // extract
|
||||||
|
|
||||||
|
SizeType length = 0;
|
||||||
|
bool escaped = false;
|
||||||
|
if (low == 0) {
|
||||||
|
if (high != 0) {
|
||||||
|
unsigned lz = (unsigned)__builtin_clzll(high);
|
||||||
|
length = 8 + (lz >> 3);
|
||||||
|
escaped = true;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
unsigned lz = (unsigned)__builtin_clzll(low);
|
||||||
|
length = lz >> 3;
|
||||||
|
escaped = true;
|
||||||
|
}
|
||||||
|
if (RAPIDJSON_UNLIKELY(escaped)) { // some of characters is escaped
|
||||||
|
for (const char* pend = p + length; p != pend; ) {
|
||||||
|
*q++ = *p++;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
vst1q_u8(reinterpret_cast<uint8_t *>(q), s);
|
||||||
|
}
|
||||||
|
|
||||||
|
is.src_ = p;
|
||||||
|
is.dst_ = q;
|
||||||
|
}
|
||||||
|
|
||||||
|
// When read/write pointers are the same for insitu stream, just skip unescaped characters
|
||||||
|
static RAPIDJSON_FORCEINLINE void SkipUnescapedString(InsituStringStream& is) {
|
||||||
|
RAPIDJSON_ASSERT(is.src_ == is.dst_);
|
||||||
|
char* p = is.src_;
|
||||||
|
|
||||||
|
// Scan one by one until alignment (unaligned load may cross page boundary and cause crash)
|
||||||
|
const char* nextAligned = reinterpret_cast<const char*>((reinterpret_cast<size_t>(p) + 15) & static_cast<size_t>(~15));
|
||||||
|
for (; p != nextAligned; p++)
|
||||||
|
if (RAPIDJSON_UNLIKELY(*p == '\"') || RAPIDJSON_UNLIKELY(*p == '\\') || RAPIDJSON_UNLIKELY(static_cast<unsigned>(*p) < 0x20)) {
|
||||||
|
is.src_ = is.dst_ = p;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// The rest of string using SIMD
|
||||||
|
const uint8x16_t s0 = vmovq_n_u8('"');
|
||||||
|
const uint8x16_t s1 = vmovq_n_u8('\\');
|
||||||
|
const uint8x16_t s2 = vmovq_n_u8('\b');
|
||||||
|
const uint8x16_t s3 = vmovq_n_u8(32);
|
||||||
|
|
||||||
|
for (;; p += 16) {
|
||||||
|
const uint8x16_t s = vld1q_u8(reinterpret_cast<uint8_t *>(p));
|
||||||
|
uint8x16_t x = vceqq_u8(s, s0);
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, s1));
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, s2));
|
||||||
|
x = vorrq_u8(x, vcltq_u8(s, s3));
|
||||||
|
|
||||||
|
x = vrev64q_u8(x); // Rev in 64
|
||||||
|
uint64_t low = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 0); // extract
|
||||||
|
uint64_t high = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 1); // extract
|
||||||
|
|
||||||
|
if (low == 0) {
|
||||||
|
if (high != 0) {
|
||||||
|
int lz = __builtin_clzll(high);
|
||||||
|
p += 8 + (lz >> 3);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
int lz = __builtin_clzll(low);
|
||||||
|
p += lz >> 3;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
is.src_ = is.dst_ = p;
|
||||||
|
}
|
||||||
|
#endif // RAPIDJSON_NEON
|
||||||
|
|
||||||
template<typename InputStream, bool backup, bool pushOnTake>
|
template<typename InputStream, bool backup, bool pushOnTake>
|
||||||
class NumberStream;
|
class NumberStream;
|
||||||
|
@ -32,6 +32,8 @@
|
|||||||
#include <nmmintrin.h>
|
#include <nmmintrin.h>
|
||||||
#elif defined(RAPIDJSON_SSE2)
|
#elif defined(RAPIDJSON_SSE2)
|
||||||
#include <emmintrin.h>
|
#include <emmintrin.h>
|
||||||
|
#elif defined(RAPIDJSON_NEON)
|
||||||
|
#include <arm_neon.h>
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#ifdef _MSC_VER
|
#ifdef _MSC_VER
|
||||||
@ -619,7 +621,75 @@ inline bool Writer<StringBuffer>::ScanWriteUnescapedString(StringStream& is, siz
|
|||||||
is.src_ = p;
|
is.src_ = p;
|
||||||
return RAPIDJSON_LIKELY(is.Tell() < length);
|
return RAPIDJSON_LIKELY(is.Tell() < length);
|
||||||
}
|
}
|
||||||
#endif // defined(RAPIDJSON_SSE2) || defined(RAPIDJSON_SSE42)
|
#elif defined(RAPIDJSON_NEON)
|
||||||
|
template<>
|
||||||
|
inline bool Writer<StringBuffer>::ScanWriteUnescapedString(StringStream& is, size_t length) {
|
||||||
|
if (length < 16)
|
||||||
|
return RAPIDJSON_LIKELY(is.Tell() < length);
|
||||||
|
|
||||||
|
if (!RAPIDJSON_LIKELY(is.Tell() < length))
|
||||||
|
return false;
|
||||||
|
|
||||||
|
const char* p = is.src_;
|
||||||
|
const char* end = is.head_ + length;
|
||||||
|
const char* nextAligned = reinterpret_cast<const char*>((reinterpret_cast<size_t>(p) + 15) & static_cast<size_t>(~15));
|
||||||
|
const char* endAligned = reinterpret_cast<const char*>(reinterpret_cast<size_t>(end) & static_cast<size_t>(~15));
|
||||||
|
if (nextAligned > end)
|
||||||
|
return true;
|
||||||
|
|
||||||
|
while (p != nextAligned)
|
||||||
|
if (*p < 0x20 || *p == '\"' || *p == '\\') {
|
||||||
|
is.src_ = p;
|
||||||
|
return RAPIDJSON_LIKELY(is.Tell() < length);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
os_->PutUnsafe(*p++);
|
||||||
|
|
||||||
|
// The rest of string using SIMD
|
||||||
|
const uint8x16_t s0 = vmovq_n_u8('"');
|
||||||
|
const uint8x16_t s1 = vmovq_n_u8('\\');
|
||||||
|
const uint8x16_t s2 = vmovq_n_u8('\b');
|
||||||
|
const uint8x16_t s3 = vmovq_n_u8(32);
|
||||||
|
|
||||||
|
for (; p != endAligned; p += 16) {
|
||||||
|
const uint8x16_t s = vld1q_u8(reinterpret_cast<const uint8_t *>(p));
|
||||||
|
uint8x16_t x = vceqq_u8(s, s0);
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, s1));
|
||||||
|
x = vorrq_u8(x, vceqq_u8(s, s2));
|
||||||
|
x = vorrq_u8(x, vcltq_u8(s, s3));
|
||||||
|
|
||||||
|
x = vrev64q_u8(x); // Rev in 64
|
||||||
|
uint64_t low = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 0); // extract
|
||||||
|
uint64_t high = vgetq_lane_u64(reinterpret_cast<uint64x2_t>(x), 1); // extract
|
||||||
|
|
||||||
|
SizeType len = 0;
|
||||||
|
bool escaped = false;
|
||||||
|
if (low == 0) {
|
||||||
|
if (high != 0) {
|
||||||
|
unsigned lz = (unsigned)__builtin_clzll(high);
|
||||||
|
len = 8 + (lz >> 3);
|
||||||
|
escaped = true;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
unsigned lz = (unsigned)__builtin_clzll(low);
|
||||||
|
len = lz >> 3;
|
||||||
|
escaped = true;
|
||||||
|
}
|
||||||
|
if (RAPIDJSON_UNLIKELY(escaped)) { // some of characters is escaped
|
||||||
|
char* q = reinterpret_cast<char*>(os_->PushUnsafe(len));
|
||||||
|
for (size_t i = 0; i < len; i++)
|
||||||
|
q[i] = p[i];
|
||||||
|
|
||||||
|
p += len;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
vst1q_u8(reinterpret_cast<uint8_t *>(os_->PushUnsafe(16)), s);
|
||||||
|
}
|
||||||
|
|
||||||
|
is.src_ = p;
|
||||||
|
return RAPIDJSON_LIKELY(is.Tell() < length);
|
||||||
|
}
|
||||||
|
#endif // RAPIDJSON_NEON
|
||||||
|
|
||||||
RAPIDJSON_NAMESPACE_END
|
RAPIDJSON_NAMESPACE_END
|
||||||
|
|
||||||
|
@ -24,10 +24,13 @@
|
|||||||
|
|
||||||
// __SSE2__ and __SSE4_2__ are recognized by gcc, clang, and the Intel compiler.
|
// __SSE2__ and __SSE4_2__ are recognized by gcc, clang, and the Intel compiler.
|
||||||
// We use -march=native with gmake to enable -msse2 and -msse4.2, if supported.
|
// We use -march=native with gmake to enable -msse2 and -msse4.2, if supported.
|
||||||
|
// Likewise, __ARM_NEON is used to detect Neon.
|
||||||
#if defined(__SSE4_2__)
|
#if defined(__SSE4_2__)
|
||||||
# define RAPIDJSON_SSE42
|
# define RAPIDJSON_SSE42
|
||||||
#elif defined(__SSE2__)
|
#elif defined(__SSE2__)
|
||||||
# define RAPIDJSON_SSE2
|
# define RAPIDJSON_SSE2
|
||||||
|
#elif defined(__ARM_NEON)
|
||||||
|
# define RAPIDJSON_NEON
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#define RAPIDJSON_HAS_STDSTRING 1
|
#define RAPIDJSON_HAS_STDSTRING 1
|
||||||
|
@ -28,6 +28,8 @@
|
|||||||
#define SIMD_SUFFIX(name) name##_SSE2
|
#define SIMD_SUFFIX(name) name##_SSE2
|
||||||
#elif defined(RAPIDJSON_SSE42)
|
#elif defined(RAPIDJSON_SSE42)
|
||||||
#define SIMD_SUFFIX(name) name##_SSE42
|
#define SIMD_SUFFIX(name) name##_SSE42
|
||||||
|
#elif defined(RAPIDJSON_NEON)
|
||||||
|
#define SIMD_SUFFIX(name) name##_NEON
|
||||||
#else
|
#else
|
||||||
#define SIMD_SUFFIX(name) name
|
#define SIMD_SUFFIX(name) name
|
||||||
#endif
|
#endif
|
||||||
|
@ -21,6 +21,8 @@
|
|||||||
# define RAPIDJSON_SSE42
|
# define RAPIDJSON_SSE42
|
||||||
#elif defined(__SSE2__)
|
#elif defined(__SSE2__)
|
||||||
# define RAPIDJSON_SSE2
|
# define RAPIDJSON_SSE2
|
||||||
|
#elif defined(__ARM_NEON)
|
||||||
|
# define RAPIDJSON_NEON
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#define RAPIDJSON_NAMESPACE rapidjson_simd
|
#define RAPIDJSON_NAMESPACE rapidjson_simd
|
||||||
@ -41,6 +43,8 @@ using namespace rapidjson_simd;
|
|||||||
#define SIMD_SUFFIX(name) name##_SSE2
|
#define SIMD_SUFFIX(name) name##_SSE2
|
||||||
#elif defined(RAPIDJSON_SSE42)
|
#elif defined(RAPIDJSON_SSE42)
|
||||||
#define SIMD_SUFFIX(name) name##_SSE42
|
#define SIMD_SUFFIX(name) name##_SSE42
|
||||||
|
#elif defined(RAPIDJSON_NEON)
|
||||||
|
#define SIMD_SUFFIX(name) name##_NEON
|
||||||
#else
|
#else
|
||||||
#define SIMD_SUFFIX(name) name
|
#define SIMD_SUFFIX(name) name
|
||||||
#endif
|
#endif
|
||||||
|
Loading…
x
Reference in New Issue
Block a user