FAST explanation

Julio Biason 3 years ago
+++
title = "Decoding the FAST Protocol"
date = 2022-01-05
[taxonomies]
tags = ["finance", "binary", "protocol", "fix", "fast"]
+++
Recently I had to work with FAST (FIX Adapted for STreaming) and, because the
documentation is scattered around, I decided to put the things I discovered in a
single place for (my own) future reference.
<!-- more -->
{% note() %}
Because this is based on my personal experience and I had contact with a single
instance of this so far, there are some things that are incomplete and/or
wrong. I'll keep updating this post as I figure out new things.
The changelog is at the end of the post.
{% end %}
# What is FAST
[FAST](https://en.wikipedia.org/wiki/FAST_protocol) is, basically, a compression
method for the FIX protocol.
# And What is FIX?
[FIX](https://en.wikipedia.org/wiki/Financial_Information_eXchange) is a
protocol created for financial institutions to exchange information. Although
there is nothing "financially" related to it -- you could use the protocol for
anything, basically -- most financial companies use it.
FIX is a very simple protocol: You have pairs of "field ID" and "value"
separated by a "`=`" (equal sign) and the pairs are separated by the ASCII char
with code 1 (which is represented by `^A` in some editors). Some field IDs are
defined by the protocol (there is a whole list
[here](https://www.inforeachinc.com/fix-dictionary/index.html)) but each
exchange can create their own IDs.
For example, if you have MsgType (ID 35) with value "`y`" and Security ID (ID
48) with value "`123456`", you'd get the message:
```
35=y^A48=123456
```
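A minimal sketch of splitting such a message back into pairs, in Python. The
function name `parse_fix` is mine (it is not part of any FIX library), and
storing the result in a dict assumes no field ID repeats in the message, which
is not true for every FIX message:

```python
def parse_fix(message: str, separator: str = "\x01") -> dict:
    """Split a FIX message into a {field ID: value} mapping.

    The separator is the ASCII char with code 1 (shown as ^A in some editors).
    """
    fields = {}
    for pair in message.split(separator):
        if not pair:  # tolerate a trailing separator
            continue
        field_id, _, value = pair.partition("=")
        fields[int(field_id)] = value
    return fields


print(parse_fix("35=y\x0148=123456"))
```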
# And Back to FAST
One of the things FAST is designed for is removing duplicate and/or constant
content. For example, MsgType (ID 35) with value "`y`" is the "SecurityList"
message, which contains information about all the symbols (and their security
IDs) handled by the exchange. Because the exchange is the same for all the
symbols, FAST allows defining the fields related to it (Source, field ID 22, and
Exchange, field ID 207) as constant values, so they don't need to be transmitted
and, when decoding FAST back to FIX, the decoder simply adds the constant value.
To know which fields are constant and which are not (and some other information),
the protocol defines a template, which has a well defined schema, to report
that information.
# The Template
The template is an XML file which describes field types, names, IDs and
operators. The protocol definition doesn't provide any default way to actually
receive that file, so it is left for each exchange to find its own way to
distribute it.
Note that the template describes the field IDs and their types, while the
incoming data has only the values. If we use the FIX description above, the
template defines the left side of the pair, while the incoming data has only
the right side.
## Field Types
The protocol has a few field types: Unsigned Ints of 32 and 64 bits, Signed
Ints of 32 and 64 bits, ASCII strings, UTF-8 strings, sequences, decimals and a
type called "presence map".
One thing to note is that all fields use a "stop bit" format. This is quite
similar to UTF-8, although UTF-8 uses a "continuation bit" instead of a "stop
bit", but the process of reading is the same:
- Read a byte;
- Does it have the high order bit set to 0?
  - Yes: Keep reading;
  - No: Stop reading and conclude the field value.
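The loop above can be sketched in Python like this. `read_field` is a name I
made up; it only finds where a field ends and leaves the interpretation of the
bytes to whoever knows the field type. It assumes the buffer holds the whole
field (otherwise it fails with an `IndexError`):

```python
def read_field(data: bytes, pos: int = 0):
    """Return the raw bytes of one field and the position right after it."""
    end = pos
    while not data[end] & 0x80:  # high order bit is 0: keep reading
        end += 1
    # data[end] has the stop bit set, so it is the last byte of the field
    return data[pos:end + 1], end + 1


raw, next_pos = read_field(bytes([0x01, 0x92, 0x48]))
print(raw.hex(), next_pos)
```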
## Field definitions
In the template, the fields have their type, name (optional), ID, a presence
indicator and an operator (optional).
For example, if you have an unsigned int of 32 bits, named "MsgType" with ID
"35", that would be described in the template as
```xml
<uInt32 name="MsgType" id="35"/>
```
Because there is no indication of presence, it is assumed that the field is
"mandatory" and should always have a value. On the other hand, if you have a
field defined as
```xml
<int32 name="ImpliedMarketIndicator" id="1144" presence="optional"/>
```
... then the field may not have a value. This is also referred to as a
"nullable" field.
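As a rough sketch, those attributes can be pulled out of the template with
Python's standard `xml.etree.ElementTree`. The `<template>` wrapper element and
the `list_fields` helper are assumptions of mine for the example; real templates
may also carry XML namespaces, and this only handles simple fields (not
decimals or sequences, whose child elements are not operators):

```python
import xml.etree.ElementTree as ET

# Hypothetical template fragment, wrapping the two fields shown above.
TEMPLATE = """
<template name="Example">
  <uInt32 name="MsgType" id="35"/>
  <int32 name="ImpliedMarketIndicator" id="1144" presence="optional"/>
</template>
"""


def list_fields(xml_text):
    """Describe each simple field found directly under the root element."""
    fields = []
    for element in ET.fromstring(xml_text):
        fields.append({
            "type": element.tag,
            "name": element.get("name"),
            "id": element.get("id"),
            # no presence attribute means the field is mandatory
            "optional": element.get("presence", "mandatory") == "optional",
            # for simple fields, the first child (if any) is the operator
            "operator": element[0].tag if len(element) else None,
        })
    return fields


for field in list_fields(TEMPLATE):
    print(field)
```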
### Types: Ints
To read an Int, you pick the 7 low order bits (everything except the high order
one) and move them to the resulting variable. If the stop bit is there, you're
done; if it is not, you shift the result left by 7 bits and add the 7 bits from
the next byte and so on, till you find a byte with the stop bit set.
The 32 and 64 bits only define the maximum value of the field and should not be
used as "number of bits to be read" -- because of the stop bit. If the value
exceeds 32 or 64 bits, that is considered an error and the processing should be
aborted.
Signed Ints work exactly the same, but the resulting bits are read as 2's
complement: the sign comes from the first of the 7 data bits in the first byte.
For example, if the incoming data has the following bytes (in binary, to make it
easier to read; I also added an underscore between every 4 bits, for the same
reason):
```
0000_0001 1001_0010
```
... the decoder will read the first byte and see that it doesn't have the high
order bit set, so it keeps just the "1" for the value and shifts everything left
by 7 bits. Then the second byte is read; this one has the high order bit set, so
the remaining bits (in this case "001_0010") are added to the resulting value,
giving `1001_0010` -- or 146.
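A small Python sketch of that process for unsigned ints. `decode_uint` and its
`bits` parameter are names I picked for the example; the overflow check is my
reading of the "exceeds 32 or 64 bits is an error" rule:

```python
def decode_uint(data: bytes, pos: int = 0, bits: int = 32):
    """Decode one stop-bit encoded unsigned int; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)  # append the 7 low order bits
        if value >= 1 << bits:
            raise OverflowError(f"value does not fit in {bits} bits")
        if byte & 0x80:  # stop bit: this was the last byte
            return value, pos


print(decode_uint(bytes([0b0000_0001, 0b1001_0010])))
```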
Negative numbers are represented using 2's complement, so you'd get, for example:
```
0111_1110 1110_1110
```
... which, when you remove the high order bits and join the remaining 7-bit
groups, gives "`11_1111_0110_1110`". The first data bit is 1, so the value is
negative; extending that sign to 16 bits gives "`1111_1111 0110_1110`", which
is -146.
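A sketch of a signed decoder in Python, under the assumption (my reading of the
FAST specification) that the sign is the highest of the 7 data bits of the
first byte. Under that assumption, -146 fits in the two bytes `0x7E 0xEE`.
`decode_int` is a made-up name, and the 32/64 bit range check is left out to
keep it short:

```python
def decode_int(data: bytes, pos: int = 0):
    """Decode one stop-bit encoded signed int; return (value, next position)."""
    # Start from -1 (all ones) when the sign bit is set, so the shifts below
    # keep extending the sign; Python ints have arbitrary size.
    value = -1 if data[pos] & 0x40 else 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:  # stop bit
            return value, pos


print(decode_int(bytes([0b0111_1110, 0b1110_1110])))
print(decode_int(bytes([0b0000_0001, 0b1001_0010])))
```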
When an integer field is optional, a non-negative result must be decremented by
1. The reason for this is that, when the field is marked as optional -- also
called "nullable" -- we need something to differentiate 0 from Null. So, an
optional integer read as 0 is, actually, Null; if the real value is 0, the
incoming data will have the value 1, which we'll decrement by 1 to get 0.
Negative values are not shifted.
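In Python, the conversion from the decoded number to the nullable value could
look like the sketch below. `from_nullable` is a hypothetical helper, and
leaving negative values untouched is my understanding of the specification, not
something I saw in real data:

```python
from typing import Optional


def from_nullable(raw: int) -> Optional[int]:
    """Convert an already decoded optional int into its real value."""
    if raw == 0:
        return None  # 0 on the wire means Null
    if raw > 0:
        return raw - 1  # non-negative values travel incremented by 1
    return raw  # assumption: negative values are sent as they are


print(from_nullable(0), from_nullable(1), from_nullable(147))
```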
### Types: Strings
ASCII strings are pretty simple to read: Again, you keep reading the incoming
data till you find a byte with the high order bit set (again, the stop bit) and
just convert each byte (without the stop bit) to its respective ASCII character.
For example
```
0100_1000 0110_0101 0110_1100 0110_1100 1110_1111
```
... would generate the bytes 72, 101, 108, 108 and 111 (after dropping the stop
bit from the last one), which using the values as ASCII codes would result in
"Hello". Note that the stop bit here represents "end of string" and the bytes
should not be grouped like in Ints.
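The same reading, as a Python sketch (`decode_ascii` is just a name for the
example; it handles mandatory strings only):

```python
def decode_ascii(data: bytes, pos: int = 0):
    """Decode one stop-bit terminated ASCII string; return (text, next position)."""
    chars = []
    while True:
        byte = data[pos]
        pos += 1
        chars.append(chr(byte & 0x7F))  # each byte is one character
        if byte & 0x80:  # stop bit marks the last character
            return "".join(chars), pos


print(decode_ascii(bytes([0b0100_1000, 0b0110_0101, 0b0110_1100,
                          0b0110_1100, 0b1110_1111])))
```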
{% note() %}
So far, I didn't find any UTF8 strings, so I'm not quite sure how to process
them yet. Surely there is documentation around on how to read those, but since
this is my personal experience with the protocol, I decided to not mention it
here.
{% end %}
Optional strings are Null when the first byte has the stop bit set and every
other bit is zero.
### Types: Sequences
Sequences are basically arrays. The first field of a sequence is the "length"
(with the type "`<length>`" in the template) with the number of records
present. Inside the sequence, you have a list of field definitions, which may
even include more sequences.
Optional Sequences follow the same idea of optional Ints: You read the length
and, if it is Null, there is nothing in the sequence -- while mandatory
Sequences can still have a length of zero.
### Types: Decimals
Decimals are formed by two fields: Exponent and Mantissa. The way it works is
that if you have an Exponent of "-2" and a Mantissa of "1020", you'd do `1020 *
10 ^ -2` ("1020 times 10 to the power of -2"), and the actual value is "10.20".
Both Exponent and Mantissa are read as Signed Ints.
An Optional Decimal means the Exponent is optional. The documentation says that
the Mantissa is always mandatory, but there is a catch: If the Exponent is null,
then the Mantissa is not present and shouldn't be read; otherwise, you read the
Mantissa and apply the conversion.
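A sketch of that conversion using Python's `decimal` module, so the trailing
zero in "10.20" is not lost to floating point. `to_decimal` is a hypothetical
helper that takes the already decoded Exponent and Mantissa:

```python
from decimal import Decimal
from typing import Optional


def to_decimal(exponent: Optional[int], mantissa: Optional[int]) -> Optional[Decimal]:
    """Combine Exponent and Mantissa; a Null Exponent means a Null decimal."""
    if exponent is None:
        return None  # the Mantissa was not even read in this case
    # scaleb() shifts the decimal exponent: mantissa * 10 ** exponent
    return Decimal(mantissa).scaleb(exponent)


print(to_decimal(-2, 1020))
print(to_decimal(None, None))
```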
Also, because Exponent and Mantissa are two fields, they can have different
operators. I'll show some examples after the Operators section, mostly because
I've seen both with different operators and they make a mess to read.
### Type: Presence Map
Presence Maps are used in conjunction with operators. They are read basically
like you'd read an unsigned int (read bytes till you find the one with the high
order bit set) but do not have any conversion in themselves. Every time you need
to check if a field is present by checking the presence map, you consume the
high order bit from it, so it is never used again.
Presence Maps are not present in the template and their presence is implied if
there is the need for one. For example, in a pure mandatory sequence of fields,
there will be no presence map at all.
The bits in the Presence Map are in the order of the fields that require it. For
example, if you have a template with:
1. A mandatory field;
2. A field with an operator that requires the presence map (I'll mention those
later);
3. Another mandatory field;
4. And, finally, another field with operator.
You may receive a Presence Map as `1110_0000`, in which:
1. The first bit is the stop bit, so the decoder assumes this is the last byte
of the presence map.
2. The second bit indicates that the first field with operator is present. It
does *not* represent the mandatory field, 'cause, well, it is mandatory and,
thus, is always present.
3. The third bit indicates the second field with an operator.
Again, I'll mention later which fields the decoder should check in the presence
map.
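One way to model this "consume a bit every time" behaviour in Python. The
`PresenceMap` class and its `take` method are names of my own; treating bits
past the end of the map as "not set" is an assumption on my part:

```python
class PresenceMap:
    """Reads a stop-bit encoded presence map and hands out one bit at a time."""

    def __init__(self, data: bytes, pos: int = 0):
        self.bits = []
        while True:
            byte = data[pos]
            pos += 1
            # 7 data bits per byte, from the highest to the lowest
            self.bits.extend(bool(byte & (0x40 >> i)) for i in range(7))
            if byte & 0x80:  # stop bit: last byte of the map
                break
        self.end = pos  # position of the first byte after the map
        self.next = 0

    def take(self) -> bool:
        """Consume the next bit; it is never used again."""
        bit = self.bits[self.next] if self.next < len(self.bits) else False
        self.next += 1
        return bit


pmap = PresenceMap(bytes([0b1110_0000]))
print(pmap.take(), pmap.take(), pmap.take())
```

Only the fields that need the Presence Map call `take()`; mandatory fields
without an operator skip it and read straight from the incoming data.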
## Operators
Operators define a way to deal with some fields. I've seen 6 different types of
operators:
- No Operator;
- Constant;
- Default;
- Copy;
- Delta;
- Increment.
### Operator: No Operator
When there is no operator defined, you have a "no operator" operator. It means
there is no special way of dealing with the incoming value: You just capture it
and use it.
When a field has No Operator, there will be no bit in the Presence Map.
### Operator: Constant
A field with the Constant operator will not appear in the incoming data and you
should assume that its value is the value in the constant. Previously I
mentioned that a list of securities may have the field 22, "Source", and field
207, "Exchange", with constant values. They would be defined as
```xml
<string name="Source" id="22">
<constant value="123"/>
</string>
<string name="Exchange" id="207">
<constant value="EXCHANGE"/>
</string>
```
There is a catch, though: When a constant can be Null (`presence="optional"`),
then the decoder needs to use the Presence Map bit; if it is set, the constant
value should be used; if it is not set, then the field value is Null.
For Constant fields, the Presence Map should be used only when the field is
optional.
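That rule is small enough to fit in a few lines of Python. `decode_constant` is
a hypothetical helper; note that nothing is read from the incoming data in
either case:

```python
def decode_constant(constant, optional=False, pmap_bit=False):
    """Value of a Constant field; pmap_bit is only meaningful when optional."""
    if optional and not pmap_bit:
        return None  # optional constant with its bit not set: Null
    return constant


print(decode_constant("EXCHANGE"))
print(decode_constant("EXCHANGE", optional=True, pmap_bit=False))
```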
### Operator: Default
The Default operator is similar to the Constant operator, but the decoder always
needs to check the Presence Map; if the bit for the field is set, then the value
is read from the incoming data; if it is not set, then you use the default
value.
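A tiny sketch of the Default operator in Python, following the behaviour listed
in the Presence Map table of this post (bit set: read the value from the
incoming data; bit not set: use the default). `decode_default` and the
`read_value` callback are assumptions for the example:

```python
def decode_default(default, pmap_bit, read_value):
    """read_value is any callable that decodes the field from the incoming data."""
    if pmap_bit:
        return read_value()  # the value is in the incoming data
    return default  # nothing to read, use the template default


print(decode_default("0", False, lambda: "-2"))
print(decode_default("0", True, lambda: "-2"))
```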
### Operator: Copy
The Copy operator indicates that the value for this record may be the same as
the value of the previous record. If the Presence Map bit is set for the field,
then the decoder must read the value in the incoming data; if it is not set,
then the previous value should be used. If it is the first record, there is no
previous value, so the incoming value should be used; in the data I saw, every
first record has the bit set, so you get the initial/previous value.
An example: You can have a template like
```xml
<string name="MDReqID" id="262">
<copy/>
</string>
```
... and you have the following records and their Presence Maps:
1. The first record has the bit set for this field in the Presence Map and the
string reads "first". This record will have this field with the value
"first".
2. The second record doesn't have the bit set in the Presence Map. So the
decoder reuses the previous value and this record will have the field with
the value "first" (again).
3. The third record has the bit set again, and the value "second". This is the
value for the field in this record.
4. The fourth record doesn't have the bit set and the decoder reuses the value
"second" for the field.
The Copy operator may have the initial value, so you don't need to read it. For
example
```xml
<string name="MDReqID" id="262">
<copy value="string"/>
</string>
```
This means that you should use "string" as the previous value, even in the first
record.
As pointed out, fields with the Copy operator appear in the Presence Map.
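The four records above can be replayed with a small Python sketch. `CopyField`
is my own naming, and the incoming data is faked with an iterator of already
decoded strings:

```python
class CopyField:
    """Keeps the previous value of a field that uses the Copy operator."""

    def __init__(self, initial=None):
        self.previous = initial  # the `value` attribute of <copy/>, if any

    def decode(self, pmap_bit, incoming):
        if pmap_bit:
            self.previous = next(incoming)  # read a new value
        return self.previous  # otherwise reuse the previous one


incoming = iter(["first", "second"])
field = CopyField()
values = []
for bit in (True, False, True, False):
    values.append(field.decode(bit, incoming))
print(values)
```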
### Operator: Delta
Delta is an operator similar to Copy, but instead of reusing the value of the
previous record in this field, the new value must be computed by adding the
incoming value to the previous one. Again, if you have no previous value, then
there is no operation to be done and the incoming value is the current one.
An example:
```xml
<uInt32 name="NumberOfOrders" id="346">
<delta/>
</uInt32>
```
1. The first record comes with the value of "300". That's the value for the
field.
2. The second record comes with the value "2". That should be added to the
previous value and used, so the field for the second record is "302".
3. The third record comes with the value "3". Again, you reuse the previous
value and add the current one. So the field for the third record has the
value "305".
Fields with the Delta operator do not appear in the Presence Map.
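A sketch of the Delta operator for integers in Python. `DeltaField` is a made-up
name, and starting from 0 when there is no previous value is an assumption that
happens to give the "incoming value is the current one" behaviour for the first
record:

```python
class DeltaField:
    """Keeps the previous value of an integer field that uses the Delta operator."""

    def __init__(self, initial=0):
        self.previous = initial

    def decode(self, incoming_delta):
        # there is no Presence Map bit: a delta is always in the incoming data
        self.previous += incoming_delta
        return self.previous


field = DeltaField()
values = []
for delta in (300, 2, 3):
    values.append(field.decode(delta))
print(values)
```

Since a delta can be negative, the incoming value has to be read as a signed
int even when the field itself is a `uInt32`.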
### Operator: Increment
Increment is another operator that works similarly to the Copy operator: if
its bit is set in the Presence Map, the decoder reads the field value from the
incoming data. The difference is that, if it is not set, the decoder does not
read any data, but reuses the previous value with an increment of 1.
Example:
```xml
<uInt32 name="RptSeq" id="83">
<increment/>
</uInt32>
```
1. The first record has the bit set in the Presence Map, so the decoder reads
the value "100". That's the field value for the record.
2. The second record doesn't have the bit set, so nothing should be read from
the incoming data, but the field value should be "101" for this record.
Fields with the Increment operator appear in the presence map.
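And the Increment operator, in the same style of Python sketch
(`IncrementField` is a hypothetical name; the incoming data is again an
iterator of already decoded ints, and the first record is assumed to have its
bit set):

```python
class IncrementField:
    """Keeps the previous value of a field that uses the Increment operator."""

    def __init__(self, initial=None):
        self.previous = initial

    def decode(self, pmap_bit, incoming):
        if pmap_bit:
            self.previous = next(incoming)  # read a new value
        else:
            self.previous += 1  # nothing is read, just count up
        return self.previous


incoming = iter([100])
field = IncrementField()
values = []
for bit in (True, False, False):
    values.append(field.decode(bit, incoming))
print(values)
```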
## Presence Map Map
There is a simple map that indicates if a field appears or not in the Presence
Map, [according to JetTek](https://jettekfix.com/education/fix-fast-tutorial/):
<table>
<tr>
<th>Operator</th>
<th>Appears for Mandatory Fields?</th>
<th>Appears for Optional Fields?</th>
</tr>
<tr>
<td>No Operator</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td>Constant</td>
<td>No, the Constant value should be used</td>
<td>Yes; if set, use the Constant value; otherwise the field is Null</td>
</tr>
<tr>
<td>Copy</td>
<td>Yes; if set, the incoming value is the current value;
otherwise, use the previous value</td>
<td>Yes; same as above, but the value can be Null (e.g., it was read as
0 for Ints or a single Null byte for Strings).</td>
</tr>
<tr>
<td>Default</td>
<td>Yes; if set, read the value from the incoming data; otherwise, use
the default value.</td>
<td>Yes; same as above</td>
</tr>
<tr>
<td>Delta</td>
<td>No; the value should always be added to the previous one.</td>
<td>No; same as above</td>
</tr>
<tr>
<td>Increment</td>
<td>Yes; if set, read the value from the incoming data; otherwise, add
1 to the previous value.</td>
<td>Yes; same as above</td>
</tr>
</table>
# Anomalies
I call "anomaly" anything that I had to spend way too much time to understand.
## Decimals With Different Operators
This is one thing that made things pretty hard to grasp at first. For example:
```xml
<decimal name="MDEntryPX" id="270" presence="optional">
<exponent>
<default value="0"/>
</exponent>
<mantissa>
<delta/>
</mantissa>
</decimal>
```
That seems simple. But there are a lot of moving pieces here:
1. The `presence="optional"` in the decimal means that the `exponent` can be
Null and only that.
2. The `default` operator in the Exponent means the decoder must check, in the
Presence Map, if the Exponent has a value in the incoming data or if it
should use the default value of "0".
There is another issue here: If the Presence Map indicates that the value is
present and the read value is 0, because the Exponent is optional, it should
be considered Null and, thus, there is no Mantissa and everything is Null.
3. The `delta` operator in the Mantissa means the incoming value is applied to
the previous one. But, if the Exponent is Null, then there is no Mantissa, and
the previous value is kept.
This causes a bunch of weird, "exception to the rule" dealings:
1. The first record has the field set in the Presence Map and it is read as
"-2". That's the Exponent; reading the Mantissa gives the value "1020", so
the whole decimal is "10.20";
2. The second record has the field set in the Presence Map and it is read as
"0". Because the decimal is optional, the Exponent is optional, and because
0 is Null, there is no Exponent, and the next value is *not* the Mantissa.
3. The third record has the field set in the Presence Map and it is, again,
"-2" for the Exponent and we read the Mantissa. The value read for the
Mantissa is "-20" but, instead of assuming that the Mantissa was Null in the
previous record, the decoder uses the first record value, so the Mantissa for
this record is "1000" and the value for the decimal is "10.00".
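Those three records can be replayed with the Python sketch below. It is only my
interpretation of the rules described here: `OptionalDecimal` is a made-up
class, the incoming data is an iterator of already decoded signed ints, and the
starting Mantissa of 0 is an assumption:

```python
from decimal import Decimal


class OptionalDecimal:
    """Optional decimal: Exponent with Default operator, Mantissa with Delta."""

    def __init__(self, default_exponent=0):
        self.default_exponent = default_exponent
        self.mantissa = 0  # previous Mantissa, used by the delta

    def decode(self, pmap_bit, incoming):
        if pmap_bit:
            raw = next(incoming)
            if raw == 0:
                # Null Exponent: no Mantissa follows, previous Mantissa is kept
                return None
            # nullable int: non-negative values travel incremented by 1
            exponent = raw - 1 if raw > 0 else raw
        else:
            exponent = self.default_exponent
        self.mantissa += next(incoming)  # delta over the previous Mantissa
        return Decimal(self.mantissa).scaleb(exponent)


incoming = iter([-2, 1020, 0, -2, -20])
field = OptionalDecimal()
values = []
for bit in (True, True, True):
    values.append(field.decode(bit, incoming))
print(values)
```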
Another weird thing I saw was related to the way the exchange was ordering the
results. It had a sequence of sell and buy orders in which
1. The first record was the sell order, with an Exponent of 0 and a Mantissa of
"5410". That meant the value is "5410" (pretty straightforward).
2. The second record was the buy order. It had an Exponent of "-2" and the
Mantissa had an incoming value of "526604". With the delta over the previous
"5410", that gives a Mantissa of "532014", but because the Exponent is "-2",
the actual value is "5320.14".
3. The weird thing happened in the third record, which was again a sell order.
The value should be exactly the same as the first, but the exchange sent an
Exponent of 0 and a Mantissa of "-526604". With the delta, that would bring
the value back to "5410".
I found it weird that they kept jumping between two different Exponents instead
of using a single one, and at the time I had some issues with the delta math in
my code, so...
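For the record, the delta math of that sequence can be checked with a few lines
of Python. The values are the ones from the list above; the rest is just a
sketch that assumes the Mantissa starts at 0:

```python
from decimal import Decimal

# (Exponent, incoming Mantissa delta) for the sell, buy and sell records
records = [(0, 5410), (-2, 526604), (0, -526604)]

mantissa = 0
prices = []
for exponent, delta in records:
    mantissa += delta  # the delta ignores the Exponent completely
    prices.append(Decimal(mantissa).scaleb(exponent))

print([str(price) for price in prices])
```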
---
### Changelog:
2022-01-05: First release.