From bdffa9a5b32bda155d9384b91148383e19879ac6 Mon Sep 17 00:00:00 2001
From: Julio Biason <julio.biason@pm.me>
Date: Wed, 5 Jan 2022 21:30:32 -0300
Subject: [PATCH] FAST explanation

---
 content/thoughts/decoding-fast.md | 500 ++++++++++++++++++++++++++++++
 1 file changed, 500 insertions(+)
 create mode 100644 content/thoughts/decoding-fast.md

diff --git a/content/thoughts/decoding-fast.md b/content/thoughts/decoding-fast.md
new file mode 100644
index 0000000..7a53298
--- /dev/null
+++ b/content/thoughts/decoding-fast.md
@@ -0,0 +1,500 @@
++++
+title = "Decoding the FAST Protocol"
+date = 2022-01-05
+
+[taxonomies]
+tags = ["finance", "binary", "protocol", "fix", "fast"]
++++
+
+Recently I had to work with FAST (FIX Adapted for Streaming) and, because the
+documentation is scattered around, I decided to put the things I discovered in a
+single place for (my own) future reference.
+
+<!-- more -->
+
+{% note() %}
+Because this is based on my personal experience and I had contact with a single
+instance of this so far, there are some things that are incomplete and/or
+wrong. I'll keep updating this post as I figure out new things.
+
+The changelog is at the end of the post.
+{% end %}
+
+# What is FAST
+
+[FAST](https://en.wikipedia.org/wiki/FAST_protocol) is, basically, a compression
+method for the FIX protocol.
+
+# And What is FIX?
+
+[FIX](https://en.wikipedia.org/wiki/Financial_Information_eXchange) is a
+protocol created for financial institutions to exchange information. Although
+there is nothing "financially" related to it -- you could use the protocol for
+anything, basically -- most financial companies use it.
+
+FIX is a very simple protocol: You have pairs of "field ID" and "value"
+separated by a "`=`" (equal sign) and the pairs are separated by the ASCII char
+with code 1 (which is represented by `^A` in some editors). Some field IDs are
+defined by the protocol (there is a whole list
+[here](https://www.inforeachinc.com/fix-dictionary/index.html)) but each
+exchange can create their own IDs.
+
+For example, if you have MsgType (ID 35) with value "`y`" and Security ID (ID
+48) with value "`123456`", you'd get the message:
+
+```
+35=y^A48=123456
+```
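+
+Just to make the format concrete, here is a minimal sketch of splitting such a
+message back into pairs. The `parse_fix` name is something I made up for the
+example, and it assumes no field ID repeats in the message (real FIX messages
+with repeating groups would need more care):
+
+```python
+SOH = "\x01"  # ASCII code 1, shown as ^A in some editors
+
+
+def parse_fix(message: str) -> dict[int, str]:
+    """Split a FIX message into a {field_id: value} dictionary."""
+    fields = {}
+    for pair in message.split(SOH):
+        if not pair:
+            continue  # skip the empty chunk left by a trailing separator
+        field_id, _, value = pair.partition("=")
+        fields[int(field_id)] = value
+    return fields
+
+
+print(parse_fix("35=y" + SOH + "48=123456"))  # {35: 'y', 48: '123456'}
+```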
+
+# And Back to FAST
+
+One of the things FAST is designed for is removing duplicate and/or constant
+content. For example, MsgType (ID 35) with value "`y`" is the "SecurityList"
+message, which contains information about all the symbols (and their security
+IDs) handled by the exchange. Because the exchange is the same for all the
+symbols, FAST allows defining the fields related to it (Source, field ID 22, and
+Exchange, field ID 207) as constant values, so they don't need to be transmitted
+and, when decoding FAST back to FIX, the decoder simply adds the constant value.
+
+To know which fields are constant and which are not (and some other information),
+the protocol defines a template, which has a well defined schema, to report
+that information.
+
+# The Template
+
+The template is an XML file which describes field types, names, IDs and
+operators. The protocol definition doesn't provide any default way to actually
+receive that file, and thus it is left for each exchange to find their own way
+of distributing it.
+
+Note that the template describes the field IDs and their types, while the
+incoming data has only the values. If we use the FIX description above, the
+template defines the left side of the pair, while the incoming data has only
+the right side.
+
+## Field Types
+
+The protocol has a few field types: Unsigned Ints of 32 and 64 bits, Signed
+Ints of 32 and 64 bits, ASCII strings, UTF-8 strings, sequences, decimals and a
+type called "presence map".
+
+One thing to note is that all fields use a "stop bit" format. This is loosely
+similar to UTF8 and even closer to variable-length integers like LEB128, which
+use a "continuation bit" instead of a "stop bit", but the process of reading is
+the same:
+
+- Read a byte;
+- Does it have the high order bit set to 0?
+  - Yes: Keep reading;
+  - No: Stop reading and conclude the field value.
+
+## Field definitions
+
+On the template, the fields have their type, name (optional), ID, a presence
+indicator and an operator (optional).
+
+For example, if you have an unsigned int of 32 bits, named "MsgType" with ID
+"35", that would be described in the template as
+
+```xml
+<uInt32 name="MsgType" id="35"/>
+```
+
+Because there is no indication of presence, it is assumed that the field is
+"mandatory" and should always have a value. On the other hand, if you have a
+field defined as
+
+```xml
+<int32 name="ImpliedMarketIndicator" id="1144" presence="optional"/>
+```
+
+... then the field may not have a value. This is also referred to as a
+"nullable" field.
+
+### Types: Ints
+
+To read an Int, you pick the 7 low order bits (everything except the high order
+one) and move to the resulting variable. If the stop bit is there, you're done;
+if it is not, you shift the result by 7 bits and add the 7 bits from the next
+byte and so on, till you find a byte with the stop bit set.
+
+The 32 and 64 bits only define the maximum value of the field and should not be
+used as "number of bits to be read" -- because of the stop bit. If the value
+exceeds 32 or 64 bits, that is considered an error and the processing should be
+aborted.
+
+Signed Ints work exactly the same, but as 2's complement.
+
+For example, if the incoming data has the following bytes (in binary, to make
+it easier to read; I also added an underscore between each group of 4 bits,
+again to make it easier to read):
+
+```
+0000_0001 1001_0010
+```
+
+... the decoder will read the first byte and see that it doesn't have the high
+order bit set, so it keeps just the "1" for the value and shifts everything by 7
+bits. Then the second byte is read; this one has the high order bit set, so
+the remaining bits (in this case "001_0010") are added to the resulting value,
+which gives `1001_0010` -- or 146.
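+
+The steps above can be sketched in a few lines of Python. This is just my own
+take on it (`read_uint` is a made-up name) and it does not check for the 32/64
+bit overflow mentioned before:
+
+```python
+def read_uint(data: bytes, pos: int = 0) -> tuple[int, int]:
+    """Decode one stop-bit encoded unsigned int.
+
+    Returns the value and the position of the next unread byte.
+    """
+    value = 0
+    while True:
+        byte = data[pos]
+        pos += 1
+        value = (value << 7) | (byte & 0x7F)  # keep the 7 low order bits
+        if byte & 0x80:  # stop bit set: this was the last byte
+            return value, pos
+
+
+value, pos = read_uint(bytes([0b0000_0001, 0b1001_0010]))
+print(value)  # 146
+```
+
+Returning the next position along with the value makes it easy to chain reads
+over the same buffer.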
+
+Negative numbers are represented using 2's complement, and the sign comes from
+the first data bit (the one right after the stop bit position) of the first
+byte. So you'd get, for example:
+
+```
+0111_1110 1110_1110
+```
+
+... which, when you remove the high order bits and join the remaining 7-bit
+groups, gives "`11_1111_0110_1110`"; extending the sign, you get
+"`1111_1111 0110_1110`", which is -146 (in 16 bits, just to make it shorter).
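+
+A minimal sketch of reading a Signed Int, assuming (from my understanding of the
+FAST specification) that the sign bit is the first data bit of the first byte,
+which means -146 fits in the two bytes `0111_1110 1110_1110`. The `read_int`
+name is mine:
+
+```python
+def read_int(data: bytes, pos: int = 0) -> tuple[int, int]:
+    """Decode one stop-bit encoded signed int (2's complement).
+
+    Returns the value and the position of the next unread byte.
+    """
+    # Assumption: bit 6 of the first byte is the sign bit. Starting from -1
+    # (all bits set) does the sign extension for negative numbers.
+    value = -1 if data[pos] & 0x40 else 0
+    while True:
+        byte = data[pos]
+        pos += 1
+        value = (value << 7) | (byte & 0x7F)
+        if byte & 0x80:  # stop bit
+            return value, pos
+
+
+print(read_int(bytes([0b0111_1110, 0b1110_1110]))[0])  # -146
+print(read_int(bytes([0b0000_0001, 0b1001_0010]))[0])  # 146
+```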
+
+When an integer field is optional, a positive result must be decremented by 1.
+The reason for this is that, when the field is marked as optional -- also called
+"nullable" -- we need something to differentiate 0 from Null. So, an optional
+integer read as 0 is, actually, Null; if the actual value is 0, the incoming
+data will have the value 1, which we decrement by 1 to get 0. Negative values
+of Signed Ints are not shifted, so they are used as they were read.
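+
+As a sketch, the conversion applied after reading an optional Int could look
+like this (`from_nullable` is a name I made up; leaving negative values alone is
+my understanding of the specification, not something I saw in real data):
+
+```python
+from typing import Optional
+
+
+def from_nullable(raw: int) -> Optional[int]:
+    """Convert the raw value read for an optional Int into its real value."""
+    if raw == 0:
+        return None  # 0 on the wire means Null
+    if raw > 0:
+        return raw - 1  # non-negative values travel incremented by 1
+    return raw  # assumption: negative values are not shifted
+
+
+print(from_nullable(0))  # None
+print(from_nullable(1))  # 0
+```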
+
+### Types: Strings
+
+ASCII strings are pretty simple to read: Again, you keep reading the incoming
+data till you find a byte with the high order bit set (again, the stop bit) and
+just convert to their respective ASCII character.
+
+For example
+
+```
+0100_1000 0110_0101 0110_1100 0110_1100 1110_1111
+```
+
+Would generate the bytes 72, 101, 108, 108 and 111, which using the values as
+ASCII codes would result in "Hello". Note that the stop bit here represents "end
+of string" and the bytes should not be grouped like in Ints.
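+
+A small sketch of that reading loop (the `read_ascii` name is mine, and it
+ignores the Null case for optional strings):
+
+```python
+def read_ascii(data: bytes, pos: int = 0) -> tuple[str, int]:
+    """Decode one stop-bit terminated ASCII string.
+
+    Returns the string and the position of the next unread byte.
+    """
+    chars = []
+    while True:
+        byte = data[pos]
+        pos += 1
+        chars.append(chr(byte & 0x7F))  # each byte is one character
+        if byte & 0x80:  # stop bit: end of string
+            return "".join(chars), pos
+
+
+raw = bytes([0b0100_1000, 0b0110_0101, 0b0110_1100, 0b0110_1100, 0b1110_1111])
+print(read_ascii(raw)[0])  # Hello
+```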
+
+{% note() %}
+So far, I didn't find any UTF8 strings, so I'm not quite sure how to process
+them yet. Surely there is documentation around on how to read those, but since
+this is my personal experience with the protocol, I decided to not mention it
+here.
+{% end %}
+
+Optional strings are Null when the first byte has the stop bit set and every
+other bit is zero.
+
+### Types: Sequences
+
+Sequences are basically arrays. The first field of a sequence is the "length"
+(with the type "`<length>`" in the template) with the number of records
+present. Inside the sequence, you have a list of field definitions, which may
+even include more sequences.
+
+Optional Sequences follow the same idea of optional Ints: You read the length
+and, if it is Null, there is nothing in the sequence -- while mandatory
+Sequences can have a length of zero.
+
+### Types: Decimals
+
+Decimals are formed by two fields: Exponent and Mantissa. The way it works is
+that if you have an Exponent of "-2" and a Mantissa of "1020", you'd do `1020 *
+10 ^ -2` ("1020 times 10 to the power of -2"), and the actual value is "10.20".
+
+Both Exponent and Mantissa are read as Signed Ints.
+
+An Optional Decimal means the Exponent is optional. The documentation says that
+the Mantissa is always mandatory, but there is a catch: If the Exponent is null,
+then the Mantissa is not present and shouldn't be read; otherwise, you read the
+Mantissa and apply the conversion.
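+
+A sketch of the conversion, using Python's `decimal` module so "10.20" doesn't
+become a float. The `to_decimal` name is mine, and it assumes the Exponent and
+Mantissa were already read (and the Exponent already converted to `None` when it
+is Null):
+
+```python
+from decimal import Decimal
+from typing import Optional
+
+
+def to_decimal(exponent: Optional[int], mantissa: int = 0) -> Optional[Decimal]:
+    """Combine Exponent and Mantissa; a Null Exponent means a Null Decimal."""
+    if exponent is None:
+        return None  # no Mantissa was sent, so there is nothing to combine
+    # scaleb() computes mantissa * 10 ** exponent without losing precision
+    return Decimal(mantissa).scaleb(exponent)
+
+
+print(to_decimal(-2, 1020))  # 10.20
+print(to_decimal(None))  # None
+```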
+
+Also, because Exponent and Mantissa are two fields, they can have different
+operators. I'll show some examples after the Operators section, mostly because
+I've seen both with different operators and they make a mess to read.
+
+### Type: Presence Map
+
+Presence Maps are used in conjunction with operators. They are read basically
+like you'd read an unsigned int (read bytes till you find the one with the high
+order bit set) but do not have any conversion in themselves. Every time you need
+to check if a field is present by checking the presence map, you consume the
+highest order bit still left in it, so it is never used again.
+
+Presence Maps are not present in the template and their presence is implied if
+there is the need for one. For example, in a pure mandatory sequence of fields,
+there will be no presence map at all.
+
+The bits in the Presence Map are in the order of the fields that require it. For
+example, if you have a template with:
+
+1. A mandatory field;
+2. A field with an operator that requires the presence map (I'll mention those
+   later);
+3. Another mandatory field;
+4. And, finally, another field with operator.
+
+You may receive a Presence Map as `1110_0000`, in which:
+
+1. The first bit is the stop bit, so the decoder assumes this is the last byte
+   of the presence map.
+2. The second bit indicates that the first field with operator is present. It
+   does *not* represent the mandatory field, 'cause, well, it is mandatory and,
+   thus, is always present.
+3. The third bit indicates the second field with an operator.
+
+Again, I'll mention later which operators require the decoder to check the
+Presence Map.
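+
+A sketch of reading a Presence Map into a list of bits, in the order the decoder
+should consume them (the `read_presence_map` name is something I made up):
+
+```python
+def read_presence_map(data: bytes, pos: int = 0) -> tuple[list[bool], int]:
+    """Read a stop-bit terminated Presence Map.
+
+    Returns the data bits (high order first, stop bits removed) and the
+    position of the next unread byte.
+    """
+    bits = []
+    while True:
+        byte = data[pos]
+        pos += 1
+        for shift in range(6, -1, -1):  # the 7 data bits, high order first
+            bits.append(bool((byte >> shift) & 1))
+        if byte & 0x80:  # stop bit: last byte of the map
+            return bits, pos
+
+
+bits, pos = read_presence_map(bytes([0b1110_0000]))
+print(bits[:2])  # [True, True]
+```
+
+The decoder can then take one bit from the front of the list every time it finds
+a field that needs the Presence Map.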
+
+## Operators
+
+Operators define a way to deal with some fields. I've seen 6 different types of
+operators:
+
+- No Operator;
+- Constant;
+- Default;
+- Copy;
+- Delta;
+- Increment.
+
+### Operator: No Operator
+
+When there is no operator defined, you have a "no operator" operator. It means
+there is no special way of dealing with the incoming value: You just capture it
+and use it.
+
+When a field has No Operator, there will be no bit in the Presence Map.
+
+### Operator: Constant
+
+A field with the Constant operator will not appear in the incoming data and you
+should assume that its value is the value in the constant. Previously I
+mentioned that a list of securities may have the field 22, "Source", and field
+207, "Exchange", with constant values; they would be defined as
+
+```xml
+<string name="Source" id="22">
+    <constant value="123"/>
+</string>
+<string name="Exchange" id="207">
+    <constant value="EXCHANGE"/>
+</string>
+```
+
+There is a catch, though: When a constant can be Null (`presence="optional"`),
+then the decoder needs to use the Presence Map bit; if it is set, the constant
+value should be used; if it is not set, then the field value is Null.
+
+For fields with the Constant operator, the Presence Map should be used only if
+the field is optional.
+
+### Operator: Default
+
+The Default operator is similar to the Constant operator, but the decoder needs
+to check the Presence Map; if the bit for the field is set, then you read the
+value from the incoming data; if it is not set, then you use the default value
+(and, for an optional field without a default value, the field is Null).
+
+### Operator: Copy
+
+The Copy operator indicates that the value for this record is the same as the
+value of the previous record; if it is the first record, then the incoming value
+should be used. If the Presence Map bit is set for the field, then the decoder
+must read the value in the incoming data; if it is not set, then the previous
+value should be used. In the data I saw, every first record has the bit set, so
+you get the initial/previous value.
+
+An example: You can have a template like
+
+```xml
+<string name="MDReqID" id="262">
+    <copy/>
+</string>
+```
+
+... and you have the following records and their Presence Maps:
+
+1. The first record has the bit set for this field in the Presence Map and the
+   string reads "first". This record will have this field with the value
+   "first".
+2. The second record doesn't have the bit set in the Presence Map. So the
+   decoder reuses the previous value and this record will have the field with
+   the value "first" (again).
+3. The third record has the bit set again, and the value "second". This is the
+   value for the field in this record.
+4. The fourth record doesn't have the bit set and the decoder reuses the value
+   "second" for the field.
+
+The Copy operator may have the initial value, so you don't need to read it. For
+example
+
+```xml
+<string name="MDReqID" id="262">
+    <copy value="string"/>
+</string>
+```
+
+This means that you should use "string" as previous value, even in the first
+record.
+
+As pointed out, fields with the Copy operator appear in the Presence Map.
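+
+The Copy logic can be sketched like this. To keep it short, each record is
+represented as a pair with the Presence Map bit and the value that would be read
+from the incoming data; `apply_copy` and this record layout are made up for the
+example:
+
+```python
+def apply_copy(records, initial=None):
+    """Resolve a Copy field over a list of (bit_set, incoming_value) pairs."""
+    previous = initial  # the value from <copy value="..."/>, if there is one
+    values = []
+    for bit_set, incoming in records:
+        if bit_set:
+            previous = incoming  # the value is in the incoming data
+        values.append(previous)  # otherwise, reuse the previous one
+    return values
+
+
+records = [(True, "first"), (False, None), (True, "second"), (False, None)]
+print(apply_copy(records))  # ['first', 'first', 'second', 'second']
+```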
+
+### Operator: Delta
+
+Delta is an operator similar to Copy, but instead of using the value of the
+previous record in this field, the new value must be computed using the previous
+value and the current one. Again, if you have no previous value, then there is
+no operation to be done and the incoming value is the current one.
+
+An example:
+
+```xml
+<uInt32 name="NumberOfOrders" id="346">
+    <delta/>
+</uInt32>
+```
+
+1. The first record comes with the value of "300". That's the value for the
+   field.
+2. The second record comes with the value "2". That should be added to the
+   previous value and used, so the field for the second record is "302".
+3. The third record comes with the value "3". Again, you reuse the previous
+   value and add the current one. So the field for the third record has the
+   value "305".
+
+Fields with the Delta operator do not appear in the Presence Map.
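+
+A sketch of the Delta math for integer fields (`apply_delta` is a made-up name;
+it assumes that, with no previous value, the base is 0, which is what makes the
+first incoming value be used as is):
+
+```python
+def apply_delta(incoming_values, initial=0):
+    """Resolve a Delta integer field from the values read in each record."""
+    previous = initial
+    values = []
+    for delta in incoming_values:
+        previous += delta  # deltas can be negative, even for unsigned fields
+        values.append(previous)
+    return values
+
+
+print(apply_delta([300, 2, 3]))  # [300, 302, 305]
+```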
+
+### Operator: Increment
+
+Increment is another operator that works similar to the Copy operator, but if
+its bit is set in the Presence Map, the decoder reads the field value from the
+incoming data; if it is not set, the decoder does not read any data, but reuses
+the previous value with an increment of 1.
+
+Example:
+
+```xml
+<uInt32 name="RptSeq" id="83">
+    <increment/>
+</uInt32>
+```
+
+1. The first record has the bit set in the Presence Map, so the decoder reads
+   the value "100". That's the field value for the record.
+2. The second record doesn't have the bit set, so nothing should be read from
+   the incoming data, but the field value should be "101" for this record.
+
+Fields with the Increment operator appear in the presence map.
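+
+A sketch of the Increment logic, again representing each record as a pair with
+the Presence Map bit and the value that would be read (`apply_increment` and the
+record layout are made up for the example, and it assumes the first record
+always has the bit set):
+
+```python
+def apply_increment(records):
+    """Resolve an Increment field over (bit_set, incoming_value) pairs."""
+    previous = None
+    values = []
+    for bit_set, incoming in records:
+        if bit_set:
+            previous = incoming  # read from the incoming data
+        else:
+            previous = previous + 1  # nothing to read: previous value plus 1
+        values.append(previous)
+    return values
+
+
+print(apply_increment([(True, 100), (False, None), (False, None)]))
+# [100, 101, 102]
+```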
+
+## Presence Map Map
+
+There is a simple map that indicates if a field appears or not in the Presence
+Map, [according to JetTek](https://jettekfix.com/education/fix-fast-tutorial/):
+
+<table>
+    <tr>
+        <th>Operator</th>
+        <th>Appears for Mandatory Fields?</th>
+        <th>Appears for Optional Fields?</th>
+    </tr>
+    <tr>
+        <td>No Operator</td>
+        <td>No</td>
+        <td>No</td>
+    </tr>
+    <tr>
+        <td>Constant</td>
+        <td>No, the Constant value should be used</td>
+        <td>Yes; if set, use the Constant value; otherwise the field is Null</td>
+    </tr>
+    <tr>
+        <td>Copy</td>
+        <td>Yes; if set, the incoming value is the current value;
+        otherwise, use the previous value</td>
+        <td>Yes; same as above, but the value can be Null (e.g., it was read as
+        0 for Ints or a single Null byte for Strings).</td>
+    </tr>
+    <tr>
+        <td>Default</td>
+        <td>Yes; if set, read the value from the incoming data; otherwise, use
+        the default value.</td>
+        <td>Yes; same as above</td>
+    </tr>
+    <tr>
+        <td>Delta</td>
+        <td>No; the value should always be added to the previous one.</td>
+        <td>No; same as above</td>
+    </tr>
+    <tr>
+        <td>Increment</td>
+        <td>Yes; if set, read the value from the incoming data; otherwise, add
+        1 to the previous value.</td>
+        <td>Yes; same as above</td>
+    </tr>
+</table>
+
+
+# Anomalies
+
+I call "anomaly" anything that I had to spend way too much time to understand.
+
+## Decimals With Different Operators
+
+This is one thing that made things pretty hard to grasp at first. For example:
+
+```xml
+<decimal name="MDEntryPX" id="270" presence="optional">
+    <exponent>
+        <default value="0"/>
+    </exponent>
+    <mantissa>
+        <delta/>
+    </mantissa>
+</decimal>
+```
+
+That seems simple. But there are a lot of moving pieces here:
+
+1. The `presence="optional"` in the decimal means that the `exponent` can be
+   Null and only that.
+2. The `default` operator in the Exponent means the decoder must check if the
+   Exponent has a value or should use the default value of "0".
+
+   There is another issue here: If the Presence Map indicates that the value is
+   present and the read value is 0, because the Exponent is optional, it should
+   be considered Null and, thus, there is no Mantissa and everything is Null.
+3. The `delta` operator in the Mantissa should be used applying the incoming
+   value to the previous one. But, if the Exponent is Null, then there is no
+   Mantissa, but the previous value is kept.
+
+This causes a bunch of weird, "exception to the rule" dealings:
+
+1. The first record has the field set in the Presence Map and it is read as
+   "-2". That's the Exponent; reading the Mantissa gives the value "1020", so
+   the whole decimal is "10.20";
+2. The second record has the field set in the Presence Map and it is read as
+   "0". Because the decimal is optional, the Exponent is optional, and because
+   0 is Null, there is no Exponent, and the next value is *not* the Mantissa.
+3. The third record has the field set in the Presence Map and it is, again,
+   "-2" for the Exponent and we read the Mantissa. The value read for the
+   Mantissa is "-20", but instead of assuming that the Mantissa was Null in the
+   previous record, it uses the first record value, so the Mantissa for this
+   record is "1000" and the value for the decimal is "10.00".
+
+Another weird thing I saw was related to the way the exchange was ordering the
+results. It had a sequence of sell and buy orders in which
+
+1. The first record was the sell order, with an Exponent of 0 and a Mantissa of
+   "5410". That meant the value is "5410" (pretty straightforward).
+2. The second record was the buy order. It had an Exponent of "-2" and the
+   Mantissa had an incoming value of 526604. That gives the value of "532014",
+   but because the Exponent is "-2", the actual value is "5320.14".
+3. The weird thing happened in the third record, which was again a sell order.
+   The value should be exactly the same as the first, but the exchange sent an
+   Exponent of 0 and a Mantissa of "-526604". With the delta, that would bring
+   the value back to "5410".
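+
+To check the math of those three records, here is a small sketch that applies
+the delta only to the Mantissa and uses the Exponent of each record as it comes
+(`decode_prices` is a made-up name, and the records are just the pairs of
+Exponent and incoming Mantissa described above):
+
+```python
+from decimal import Decimal
+
+
+def decode_prices(records):
+    """Decode (exponent, mantissa_delta) pairs into decimal values."""
+    mantissa = 0  # no previous value yet, so the first delta is the value
+    prices = []
+    for exponent, mantissa_delta in records:
+        mantissa += mantissa_delta  # the delta ignores the Exponent completely
+        prices.append(Decimal(mantissa).scaleb(exponent))
+    return prices
+
+
+records = [(0, 5410), (-2, 526604), (0, -526604)]
+print(" ".join(str(price) for price in decode_prices(records)))
+# 5410 5320.14 5410
+```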
+
+I found it weird that they kept jumping between two different Exponents instead
+of using a single one, and at the time I had some issues with the delta math in
+my code, so...
+
+---
+
+### Changelog:
+
+2022-01-05: First release.