diff --git a/doc/diagram/architecture.dot b/doc/diagram/architecture.dot new file mode 100644 index 0000000..c816c87 --- /dev/null +++ b/doc/diagram/architecture.dot @@ -0,0 +1,50 @@ +digraph { + compound=true + fontname="Inconsolata, Consolas" + fontsize=10 + margin="0,0" + ranksep=0.2 + nodesep=0.5 + penwidth=0.5 + colorscheme=spectral7 + + node [shape=box, fontname="Inconsolata, Consolas", fontsize=10, penwidth=0.5, style=filled, fillcolor=white] + edge [fontname="Inconsolata, Consolas", fontsize=10, penwidth=0.5] + + subgraph cluster1 { + margin="10,10" + labeljust="left" + label = "SAX" + style=filled + fillcolor=6 + + Reader -> Writer [style=invis] + } + + subgraph cluster2 { + margin="10,10" + labeljust="left" + label = "DOM" + style=filled + fillcolor=7 + + Value + Document + } + + Handler [label="<>\nHandler"] + + { + edge [arrowtail=onormal, dir=back] + Value -> Document + Handler -> Document + Handler -> Writer + } + + { + edge [arrowhead=vee, style=dashed, constraint=false] + Reader -> Handler [label="calls"] + Value -> Handler [label="calls"] + Document -> Reader [label="uses"] + } +} \ No newline at end of file diff --git a/doc/diagram/architecture.png b/doc/diagram/architecture.png new file mode 100644 index 0000000..05336df Binary files /dev/null and b/doc/diagram/architecture.png differ diff --git a/doc/diagram/utilityclass.dot b/doc/diagram/utilityclass.dot new file mode 100644 index 0000000..1492a8a --- /dev/null +++ b/doc/diagram/utilityclass.dot @@ -0,0 +1,73 @@ +digraph { + rankdir=LR + compound=true + fontname="Inconsolata, Consolas" + fontsize=10 + margin="0,0" + ranksep=0.3 + nodesep=0.15 + penwidth=0.5 + colorscheme=spectral7 + + node [shape=box, fontname="Inconsolata, Consolas", fontsize=10, penwidth=0.5, style=filled, fillcolor=white] + edge [fontname="Inconsolata, Consolas", fontsize=10, penwidth=0.5] + + subgraph cluster0 { + style=filled + fillcolor=4 + + Encoding [label="<>\nEncoding"] + + edge [arrowtail=onormal, dir=back] + Encoding -> { UTF8; UTF16; UTF32; ASCII; AutoUTF } + UTF16 -> { UTF16LE; UTF16BE } + UTF32 -> { UTF32LE; UTF32BE } + } + + subgraph cluster1 { + style=filled + fillcolor=5 + + Stream [label="<>\nStream"] + InputByteStream [label="<>\nInputByteStream"] + OutputByteStream [label="<>\nOutputByteStream"] + + edge [arrowtail=onormal, dir=back] + Stream -> { + StringStream; InsituStringStream; StringBuffer; + EncodedInputStream; EncodedOutputStream; + AutoUTFInputStream; AutoUTFOutputStream + InputByteStream; OutputByteStream + } + + InputByteStream -> { MemoryStream; FlieReadStream } + OutputByteStream -> { MemoryBuffer; FileWriteStream } + } + + subgraph cluster2 { + style=filled + fillcolor=3 + + Allocator [label="<>\nAllocator"] + + edge [arrowtail=onormal, dir=back] + Allocator -> { CrtAllocator; MemoryPoolAllocator } + } + + { + edge [arrowtail=odiamond, arrowhead=vee, dir=both] + EncodedInputStream -> InputByteStream + EncodedOutputStream -> OutputByteStream + AutoUTFInputStream -> InputByteStream + AutoUTFOutputStream -> OutputByteStream + MemoryPoolAllocator -> Allocator [label="base", tailport=s] + } + + { + edge [arrowhead=vee, style=dashed] + AutoUTFInputStream -> AutoUTF + AutoUTFOutputStream -> AutoUTF + } + + //UTF32LE -> Stream [style=invis] +} \ No newline at end of file diff --git a/doc/diagram/utilityclass.png b/doc/diagram/utilityclass.png new file mode 100644 index 0000000..0088bb1 Binary files /dev/null and b/doc/diagram/utilityclass.png differ diff --git a/doc/internals.md b/doc/internals.md index da60825..a20c014 100644 --- a/doc/internals.md +++ b/doc/internals.md @@ -4,24 +4,234 @@ This section records some design and implementation details. [TOC] +# Architecture {#Architecture} + +## SAX and DOM + +The basic relationships of SAX and DOM is shown in the following UML diagram. + +![Architecture UML class diagram](diagram/architecture.png) + +The core of the relationship is the `Handler` concept. From the SAX side, `Reader` parses a JSON from a stream and publish events to a `Handler`. `Writer` implements the `Handler` concept to handle the same set of events. From the DOM side, `Document` implements the `Handler` concept to build a DOM according to the events. `Value` supports a `Value::Accept(Handler&)` function, which traverses the DOM to publish events. + +With this design, SAX is not dependent on DOM. Even `Reader` and `Writer` have no dependencies between them. This provides flexibility to chain event publisher and handlers. Besides, `Value` does not depends on SAX as well. So, in addition to stringify a DOM to JSON, user may also stringify it to a XML writer, or do anything else. + +## Utility Classes + +Both SAX and DOM APIs depends on 3 additional concepts: `Allocator`, `Encoding` and `Stream`. Their inheritance hierarchy is shown as below. + +![Utility classes UML class diagram](diagram/utilityclass.png) + # Value {#Value} +`Value` (actually a typedef of `GenericValue>`) is the core of DOM API. This section describes the design of it. + ## Data Layout {#DataLayout} +`Value` is a [variant type](http://en.wikipedia.org/wiki/Variant_type). In RapidJSON's context, an instance of `Value` can contain 1 of 6 JSON value types. This is possble by using `union`. Each `Value` contains two members: `union Data data_` and a`unsigned flags_`. The `flags_` indiciates the JSON type, and also additional information. + +The following tables show the data layout of each type. The 32-bit/64-bit columns indicates the size of the field in bytes. + +| Null | |32-bit|64-bit| +|-------------------|----------------------------------|:----:|:----:| +| (unused) | |4 |8 | +| (unused) | |4 |4 | +| (unused) | |4 |4 | +| `unsigned flags_` | `kNullType | kNullFlag` |4 |4 | + +| Bool | |32-bit|64-bit| +|-------------------|----------------------------------------------------|:----:|:----:| +| (unused) | |4 |8 | +| (unused) | |4 |4 | +| (unused) | |4 |4 | +| `unsigned flags_` | `kBoolType | `(either `kTrueFlag` or `kFalseFlag`) |4 |4 | + +| String | |32-bit|64-bit| +|---------------------|-------------------------------------|:----:|:----:| +| `Ch* str` | Pointer to the string (may own) |4 |8 | +| `SizeType length` | Length of string |4 |4 | +| (unused) | |4 |4 | +| `unsigned flags_` | `kStringType | kStringFlag | ...` |4 |4 | + +| Object | |32-bit|64-bit| +|---------------------|-------------------------------------|:----:|:----:| +| `Member* members` | Pointer to array of members (owned) |4 |8 | +| `SizeType size` | Number of members |4 |4 | +| `SizeType capacity` | Capacity of members |4 |4 | +| `unsigned flags_` | `kObjectType | kObjectFlag` |4 |4 | + +| Array | |32-bit|64-bit| +|---------------------|-------------------------------------|:----:|:----:| +| `Value* values` | Pointer to array of values (owned) |4 |8 | +| `SizeType size` | Number of values |4 |4 | +| `SizeType capacity` | Capacity of values |4 |4 | +| `unsigned flags_` | `kArrayType | kArrayFlag` |4 |4 | + +| Number (Int) | |32-bit|64-bit| +|---------------------|-------------------------------------|:----:|:----:| +| `int i` | 32-bit signed integer |4 |4 | +| (zero padding) | 0 |4 |4 | +| (unused) | |4 |8 | +| `unsigned flags_` | `kNumberType | kNumberFlag | kIntFlag | kInt64Flag | ...` |4 |4 | + +| Number (UInt) | |32-bit|64-bit| +|---------------------|-------------------------------------|:----:|:----:| +| `unsigned u` | 32-bit unsigned integer |4 |4 | +| (zero padding) | 0 |4 |4 | +| (unused) | |4 |8 | +| `unsigned flags_` | `kNumberType | kNumberFlag | kUIntFlag | kUInt64Flag | ...` |4 |4 | + +| Number (Int64) | |32-bit|64-bit| +|---------------------|-------------------------------------|:----:|:----:| +| `int64_t i64` | 64-bit signed integer |8 |8 | +| (unused) | |4 |8 | +| `unsigned flags_` | `kNumberType | kNumberFlag | kInt64Flag | ...` |4 |4 | + +| Number (Uint64) | |32-bit|64-bit| +|---------------------|-------------------------------------|:----:|:----:| +| `uint64_t i64` | 64-bit unsigned integer |8 |8 | +| (unused) | |4 |8 | +| `unsigned flags_` | `kNumberType | kNumberFlag | kInt64Flag | ...` |4 |4 | + +| Number (Double) | |32-bit|64-bit| +|---------------------|-------------------------------------|:----:|:----:| +| `uint64_t i64` | Double precision floating-point |8 |8 | +| (unused) | |4 |8 | +| `unsigned flags_` | `kNumberType | kNumberFlag | kDoubleFlag` |4 |4 | + +Here are some notes: +* To reduce memory consumption for 64-bit architecture, `SizeType` is typedef as `unsigned` instead of `size_t`. +* Zero padding for 32-bit number may be placed after or before the actual type, according to the endianess. This makes possible for interpreting a 32-bit integer as a 64-bit integer, without any conversion. +* An `Int` is always an `Int64`, but the converse is not always true. + ## Flags {#Flags} +The 32-bit `flags_` contains both JSON type and other additional information. As shown in the above tables, each JSON type contains redundant `kXXXType` and `kXXXFlag`. This design is for optimizing the operation of testing bit-flags (`IsNumber()`) and obtaining a sequental number for each type (`GetType()`). + +String has two optional flags. `kCopyFlag` means that the string owns a copy of the string. `kInlineStrFlag` means using [Short-String Optimizatoin](#ShortString). + +Number is a bit more complicated. For normal integer values, it can contains `kIntFlag`, `kUintFlag`, `kInt64Flag` and/or `kUint64Flag`, according to the range of the integer. For numbers with fraction, and integers larger than 64-bit range, they will be stored as `double` with `kDoubleFlag`. + +## Short-String Optimization {#ShortString} + + Kosta (@Kosta-Github) provided a very neat short-string optimization. The optimization idea is given as follow. Excluding the `flags_`, a `Value` has 12 or 16 bytes (32-bit or 64-bit) for storing actual data. Instead of storing a pointer to a string, it is possible to store short strings in these space internally. For encoding with 1-byte character type (e.g. `char`), it can store maxium 11 or 15 characters string inside the `Value` type. + +| ShortString (Ch=char) | |32-bit|64-bit| +|---------------------|-------------------------------------|:----:|:----:| +| `Ch str[MaxChars]` | String buffer |11 |15 | +| `Ch invLength` | MaxChars - Length |1 |1 | +| `unsigned flags_` | `kStringType | kStringFlag | ...` |4 |4 | + +A special technique is applied. Instead of storing the length of string directly, it stores (MaxChars - length). This make it possible to store 11 characters with trailing `\0`. + +This optimization can reduce memory usage for copy-string. It can also improve cache-coherence thus improve runtime performance. + # Allocator {#Allocator} +`Allocator` is a concept in RapidJSON: +~~~cpp +concept Allocator { + static const bool kNeedFree; //!< Whether this allocator needs to call Free(). + + // Allocate a memory block. + // \param size of the memory block in bytes. + // \returns pointer to the memory block. + void* Malloc(size_t size); + + // Resize a memory block. + // \param originalPtr The pointer to current memory block. Null pointer is permitted. + // \param originalSize The current size in bytes. (Design issue: since some allocator may not book-keep this, explicitly pass to it can save memory.) + // \param newSize the new size in bytes. + void* Realloc(void* originalPtr, size_t originalSize, size_t newSize); + + // Free a memory block. + // \param pointer to the memory block. Null pointer is permitted. + static void Free(void *ptr); +}; +~~~ + +Note that `Malloc()` and `Realloc()` are member functions but `Free()` is static member function. + ## MemoryPoolAllocator {#MemoryPoolAllocator} +`MemoryPoolAllocator` is the default allocator for DOM. It allocate but do not free memory. This is suitable for building a DOM tree. + +Internally, it allocates chunks of memory from the base allocator (by default `CrtAllocator`) and stores the chunks as a singly linked list. When user requests an allocation, it allocates memory from the following order: + +1. User supplied buffer if it is available. (See [User Buffer section in DOM](dom.md)) +2. If user supplied buffer is full, use the current memory chunk. +3. If the current block is full, allocate a new block of memory. + # Parsing Optimization {#ParsingOptimization} -## Skip Whitespace with SIMD {#SkipwhitespaceWithSIMD} +## Skip Whitespaces with SIMD {#SkipwhitespaceWithSIMD} -## Pow10() {#Pow10} +When parsing JSON from a stream, the parser need to skip 4 whitespace characters: + +1. Space (`U+0020`) +2. Character Tabulation (`U+000B`) +3. Line Feed (`U+000A`) +4. Carriage Return (`U+000D`) + +A simple implementation will be simply: +~~~cpp +void SkipWhitespace(InputStream& s) { + while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t') + s.Take(); +} +~~~ + +However, this requires 4 comparisons and a few branching for each character. This was found to be a hot spot. + +To accelerate this process, SIMD was applied to compare 16 characters with 4 white spaces for each iteration. Currently RapidJSON only supports SSE2 and SSE4.1 instructions for this. And it is only activated for UTF-8 memory streams, including string stream or *in situ* parsing. ## Local Stream Copy {#LocalStreamCopy} +During optimization, it is found that some compilers cannot localize some member data access of streams into local variables or registers. Experimental results show that for some stream types, making a copy of the stream and used it in inner-loop can improve performance. For example, the actual (non-SIMD) implementation of `SkipWhitespace()` is implemented as: + +~~~cpp +template +void SkipWhitespace(InputStream& is) { + internal::StreamLocalCopy copy(is); + InputStream& s(copy.s); + + while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t') + s.Take(); +} +~~~ + +Depending on the traits of stream, `StreamLocalCopy` will make (or not make) a copy of the stream object, use it locally and copy the states of stream back to the original stream. + +## Parsing to Double {#ParsingDouble} + +Parsing string into `double` is difficult. The standard library function `strtod()` can do the job but it is slow. By default, the parsers use normal precision setting. This has has maximum 3 [ULP](http://en.wikipedia.org/wiki/Unit_in_the_last_place) error and implemented in `internal::StrtodNormalPrecision()`. + +When using `kParseFullPrecisionFlag`, the parsers calls `internal::StrtodFullPrecision()` instead, and this function actually implemented 3 versions of conversion methods. +1. [Fast-Path](http://www.exploringbinary.com/fast-path-decimal-to-floating-point-conversion/). +2. Custom DIY-FP implementation as in [double-conversion](https://github.com/floitsch/double-conversion). +3. Big Integer Method as in (Clinger, William D. How to read floating point numbers accurately. Vol. 25. No. 6. ACM, 1990). + +If the first conversion methods fail, it will try the second, and so on. + +# Generation Optimizatoin {#GenerationOptimization} + +## Integer-to-String conversion {#itoa} + +The naive algorithm for integer-to-string conversion involves division per each decimal digit. We have implemented various implementations and evaluated them in [itoa-benchmark](https://github.com/miloyip/itoa-benchmark). + +Although SSE2 version is the fastest but the difference is minor by comparing to the first running-up `branchlut`. And `branchlut` is pure C++ implementation so we adopt `branchlut` in RapidJSON. + +## Double-to-String conversion {#dtoa} + +Originally RapidJSON uses `snprintf(..., ..., "%g")` to achieve double-to-string conversion. This is not accurate as the default precision is 6. Later we also find that this is slow and there is an alternative. + +Google's V8 [double-conversion](https://github.com/floitsch/double-conversion +) implemented a newer, fast algorithm called Grisu3 (Loitsch, Florian. "Printing floating-point numbers quickly and accurately with integers." ACM Sigplan Notices 45.6 (2010): 233-243.). + +However, since it is not header-only so that we implemented a header-only version of Grisu2. This algorithm guarantees that the result is always accurate. And in most of cases it produces the shortest (optimal) string representation. + +The header-only conversion function has been evaluated in [dtoa-benchmark](https://github.com/miloyip/dtoa-benchmark). + # Parser {#Parser} ## Iterative Parser {#IterativeParser}