Julio Biason
3 years ago
1 changed files with 500 additions and 0 deletions
@ -0,0 +1,500 @@
|
||||
+++ |
||||
title = "Decoding the FAST Protocol" |
||||
date = 2022-01-05 |
||||
|
||||
[taxonomies] |
||||
tags = ["finance", "binary", "protocol", "fix", "fast"] |
||||
+++ |
||||
|
||||
Recently I have to work with a FAST (FIX Adapted for Streaming) and because the |
||||
documentation is scattered around, I decided to put the things I discovered in a |
||||
single place for (my own) future reference. |
||||
|
||||
<!-- more --> |
||||
|
||||
{% note() %} |
||||
Because this is based on my personal experience and I had contact with a single |
||||
instance of this so far, there are some things that are incomplete and/or |
||||
wrong. I'll keep updating this post as I figure out new things. |
||||
|
||||
The changelog is in the end of the post. |
||||
{% end %} |
||||
|
||||
# What is FAST |
||||
|
||||
[FAST](https://en.wikipedia.org/wiki/FAST_protocol) is, basically, a compression |
||||
method for the FIX protocol. |
||||
|
||||
# And What is FIX? |
||||
|
||||
[FIX](https://en.wikipedia.org/wiki/Financial_Information_eXchange) is a |
||||
protocol created for financial institutions to exchange information. Although |
||||
there is nothing "financially" related to it -- you could use the protocol for |
||||
anything, basically -- most financial companies use it. |
||||
|
||||
FIX is a very simple protocol: You have pairs of "field ID" and "value" |
||||
separated by a "`=`" (equal sign) and the pairs are separated by the ASCII char |
||||
with code 1 (which is represented by `^A` in some editors). Some field IDs are |
||||
defined by the protocol (there is a whole list |
||||
[here](https://www.inforeachinc.com/fix-dictionary/index.html)) but each |
||||
exchange can create their own IDs. |
||||
|
||||
For example, if you have MsgType (ID 35) with value "`y`" and Security ID (ID |
||||
48) with value "`123456`", you'd get the message: |
||||
|
||||
``` |
||||
35=y^A48=123456 |
||||
``` |
||||
|
||||
# And Back to FAST |
||||
|
||||
One of the things FAST is designed for is removing duplicate and/or constant |
||||
content. For example, MsgType (ID 35) is the "SecurityList" message, which |
||||
contains information about all the symbols (the their security IDs) handled by |
||||
the exchange. Because the exchange is the same in all the symbols, FAST allows |
||||
defining the fields related to it (Source, field ID 22, and Exchange, field ID |
||||
207) to constant values, so they don't need to be transmitted and, when decoding |
||||
FAST back to FIX, the decoder simply add the constant value. |
||||
|
||||
To know which fields are constant and which are not (and some other information), |
||||
the protocol defines a template, which have a well defined schema, to report |
||||
that information. |
||||
|
||||
# The Template |
||||
|
||||
The template is, as mentioned before, a XML file (which the protocol definition |
||||
doesn't provide any default way to actually receive that field, and thus is left |
||||
for the exchange to find their way) which describes field types, names, IDs and |
||||
operators. |
||||
|
||||
Note that the template describe the field IDs and their types, which the |
||||
incoming data have only the values. If we use the FIX description above, the |
||||
template defines the left side of the pair, while the incoming have have only |
||||
the right side. |
||||
|
||||
## Field Types |
||||
|
||||
The protocol have a few field types: Unsigned Ints of 32 and 64 bits, Signed |
||||
Ints of 32 and 64 bits, ASCII strings, UTF-8 strings, sequences, decimals and a |
||||
type called "presence map". |
||||
|
||||
One thing to note is that all fields use a "stop bit" format. This is quite |
||||
similar to UTF8, although UTF8 uses a "continuation bit" instead of "stop bit", |
||||
but the process of reading is the same: |
||||
|
||||
- Read a byte; |
||||
- Does it have the high order by set to 0? |
||||
- Yes: Keep reading; |
||||
- No: Stop reading the conclude the field value. |
||||
|
||||
## Field definitions |
||||
|
||||
On the template, the fields have their type, name (optional), ID, a presence |
||||
indicator and an operator (optional). |
||||
|
||||
For example, if you have an unsigned int of 32 bits, named "MsgType" with ID |
||||
"35", that would be described in the template as |
||||
|
||||
```xml |
||||
<uInt32 name="MsgType" id="35"/> |
||||
``` |
||||
|
||||
Because there is no indication of presence, it is assumed that the field is |
||||
"mandatory" and should always have a value. On the other hand, if you have a |
||||
field defined as |
||||
|
||||
```xml |
||||
<int32 name="ImpliedMarketIndicator" id="1144" presence="optional"/> |
||||
``` |
||||
|
||||
... then the field may not not have a value. This is also referred as "nullable" |
||||
field. |
||||
|
||||
### Types: Ints |
||||
|
||||
To read an Int, you pick the 7 low order bits (everything except the high order |
||||
one) and move to the resulting variable. If the stop bit is there, you're done; |
||||
if it is not, you shift the result by 7 bits and add the 7 bits from the next |
||||
byte and so on, till you find a byte with the stop bit set. |
||||
|
||||
The 32 and 64 bits only define the maximum value of the field and should not be |
||||
used as "number of bits to be read" -- because of the stop bit. If the value |
||||
exceeds 32 or 64 bits, that is considered an error and the processing should be |
||||
aborted. |
||||
|
||||
Signed Int work exactly the same, but as 2's complement. |
||||
|
||||
For example, if incoming data have the following bytes (in binary, to make it |
||||
easier to read; also, I added a single underscore between each 4 values, also |
||||
to make it easier to read): |
||||
|
||||
``` |
||||
0000_0001 1001_0010 |
||||
``` |
||||
|
||||
... the decoder will read the first byte and see that it doesn't have the high |
||||
order bit set, so it keep just the "1" for the value and shift everything by 7 |
||||
bits. Then the second byte is read; this one have the high order bit set, so |
||||
the remaining bits (in this case "001_0010") are added to the resulting value |
||||
and get `1001_0010` -- or 146. |
||||
|
||||
Negative numbers are represented using 2's so you'd get, for example: |
||||
|
||||
``` |
||||
0000_0011 0111_1110 1110_1110 |
||||
``` |
||||
|
||||
... which, when you remove the high order bits and follow the high order to find |
||||
the stop bit, you get "`1111_1111 0110_1110`", which is -146 (in 16 bits, just |
||||
to make it shorter). |
||||
|
||||
When an integer field is optional, the result must be decremented by 1. The |
||||
reason for this is that, when the field is marked as optional -- also called |
||||
"nullable" -- we need something to differentiate both 0 and Null. So, an |
||||
optional integer with value 0 is, actually, Null; if we have a value of 0, we |
||||
incoming data will have the value 1, which we'll decrement by 1 and become 0. |
||||
|
||||
### Types: Strings |
||||
|
||||
ASCII strings are pretty simple to read: Again, you keep reading the incoming |
||||
data till you find a byte with the high order bit set (again, the stop bit) and |
||||
just convert to their respective ASCII character. |
||||
|
||||
For example |
||||
|
||||
``` |
||||
0100_1000 0110_0101 0110_1100 0110_1100 1110_1111 |
||||
``` |
||||
|
||||
Would generate the bytes 72, 101, 108, 108 and 111, which using the values as |
||||
ASCII codes would result in "Hello". Note that the stop bit here represents "end |
||||
of string" and the bytes should not be grouped like in Ints. |
||||
|
||||
{% note() %} |
||||
So far, I didn't find any UTF8 strings, so I'm not quite sure how to process |
||||
them yet. Surely there is documentation around on how to read those, but since |
||||
this is my personal experience with the protocol, I decided to not mention it |
||||
here. |
||||
{% end %} |
||||
|
||||
Optional strings are Null when the first byte have a stop bit set and every |
||||
other bit is zero. |
||||
|
||||
### Types: Sequences |
||||
|
||||
Sequences are basically arrays. The first field of a sequence is the "length" |
||||
(with the type "`<length>`" in the template) with the number of records |
||||
present. Inside the sequence, you have a list of field definitions, which may |
||||
even include more sequences. |
||||
|
||||
Optional Sequences follow the same idea of optional Ints: You read the length |
||||
and, it is null, there is nothing in the sequence -- and mandatory Sequences can |
||||
be zero. |
||||
|
||||
### Types: Decimals |
||||
|
||||
Decimals are formed by two fields: Exponent and Mantissa. The way it works is |
||||
that if you have an Exponent of "-2" and a Mantissa of "1020", you'd do `1020 * |
||||
10 ^ -2` ("1020 times 10 to the power of -2"), and the actual value is "10.20". |
||||
|
||||
Both Exponent and Mantissa are read as Signed Ints. |
||||
|
||||
An Optional Decimal means the Exponent is optional. The documentation says that |
||||
the Mantissa is always mandatory, but there is a catch: If the Exponent is null, |
||||
then the Mantissa is not present and shouldn't be read; otherwise, you read the |
||||
Mantissa and apply the conversion. |
||||
|
||||
Also, because Exponent and Mantissa are two fields, they can have different |
||||
operators. I'll show some examples after the Operator, mostly because I've seen |
||||
both with different operators and they make a mess to read. |
||||
|
||||
### Type: Presence Map |
||||
|
||||
Presence Maps are used in conjunction with operators. They are read basically |
||||
like you'd read an unsigned int (read bytes till you find the one with the high |
||||
order bit) but do not have any conversion in themselves. Every time you need to |
||||
check if a field is present by checking the presence map, you consume the high |
||||
|
||||
Presence Maps are not present in the template and their presence is implied if |
||||
there is the need for one. For example, in a pure mandatory sequence of fields, |
||||
there will be no presence map at all. |
||||
order bit from it, so it is never used again. |
||||
|
||||
The bits in the Presence Map are in the order of the required fields. For |
||||
example, if a template with: |
||||
|
||||
1. A mandatory field; |
||||
2. A field with an operator that requires the presence map (I'll mention those |
||||
later); |
||||
3. Another mandatory field; |
||||
4. And, finally, another field with operator. |
||||
|
||||
You may receive a Presence Map as `1110_0000`, in which: |
||||
|
||||
1. The first bit is the stop bit, so the decoder assumes this is the last byte |
||||
of the presence map. |
||||
2. The second bit indicates that the first field with operator is present. It |
||||
does *not* represent the mandatory field, 'cause, well, it is mandatory and, |
||||
thus, is always present. |
||||
3. The second bit indicates the second field with an operator. |
||||
|
||||
Again, I'll mention which ones the decoder should be checked in the presence |
||||
map. |
||||
|
||||
## Operators |
||||
|
||||
Operators define a way to deal with some fields. I've seen 5 different types of |
||||
operators: |
||||
|
||||
- No Operator; |
||||
- Constant; |
||||
- Default; |
||||
- Copy; |
||||
- Delta; |
||||
- Increment. |
||||
|
||||
### Operator: No Operator |
||||
|
||||
When there is no operator defined, you have a "no operator" operator. It means |
||||
there is no special way of dealing with the incoming value: You just capture it |
||||
and use it. |
||||
|
||||
When a field have No Operator, there will be no bit in the Presence Map. |
||||
|
||||
### Operator: Constant |
||||
|
||||
A field with the Constant operator will not appear in the incoming data and you |
||||
should assume that its value is the value in the constant. Previously I |
||||
mentioned that a list of securities may have the field 22, "Source", and field |
||||
207, "Exchange", with constant values, they would be defined as |
||||
|
||||
```xml |
||||
<string name="Source" id="22"> |
||||
<constant value="123"/> |
||||
</string> |
||||
<string name="Exchange" id="207"> |
||||
<constant value="EXCHANGE"/> |
||||
</string> |
||||
``` |
||||
|
||||
There is a catch, though: When a constant can be Null (`presence="optional"`), |
||||
then the decoder needs to use the Presence Map bit; if it is set, the constant |
||||
value should be used; if it is not set, then the field value is Null. |
||||
|
||||
The Presence Map should be use only if there is a field with a constant value |
||||
that is optional. |
||||
|
||||
### Operator: Default |
||||
|
||||
The Default operator is similar to the Constant operator, but the decoder needs |
||||
to check the Presence Map; if the bit for the field is set, then you use the |
||||
default value; if it is not set, then the field is Null. |
||||
|
||||
### Operator: Copy |
||||
|
||||
The copy operator indicates that the value for this record have the same value |
||||
of the previous record; if it is the first record, then the value should be |
||||
used. If the Presence Map bit is set for the field, then the decoder must read |
||||
the value in the incoming data; if it is not set, then the previous value should |
||||
be used. In the data I saw, every first record have the bit set, so you get the |
||||
initial/previous value. |
||||
|
||||
An example: You can have a template like |
||||
|
||||
```xml |
||||
<string name="MDReqID" id="262"> |
||||
<copy/> |
||||
</string> |
||||
``` |
||||
|
||||
... and you have the following records and their Presence Maps: |
||||
|
||||
1. The first record have the bit set for this field in the Presence Map and the |
||||
strings reads "first". This record will have this field with the value |
||||
"first". |
||||
2. The second record doesn't have the bit set in the Presence Map. So the |
||||
decoder reuses the previous value and this record will have the field with |
||||
the value "first" (again). |
||||
3. The third record have the bit set again, and the value "second". This is the |
||||
value for the field in this record. |
||||
4. The fourth record doesn't have the bit set and the decoder reuses the value |
||||
"second" for the field. |
||||
|
||||
The Copy operator may have the initial value, so you don't need to read it. For |
||||
example |
||||
|
||||
```xml |
||||
<string name="MDReqID" id="262"> |
||||
<copy value="string"/> |
||||
</string> |
||||
``` |
||||
|
||||
This means that you should use "string" as previous value, even in the first |
||||
field. |
||||
|
||||
As pointed, fields with the Copy operator appear in the Presence Map. |
||||
|
||||
### Operator: Delta |
||||
|
||||
Delta is an operator similar to Copy, but instead of using the value of the |
||||
previous record in this field, the new value must be computed using the previous |
||||
value and the current one. Again, if you have no previous value, then there is |
||||
no operation to be done and the incoming value is the current one. |
||||
|
||||
An example: |
||||
|
||||
```xml |
||||
<uInt32 name="NumberOfOrders" id="346"> |
||||
<delta/> |
||||
</uInt32> |
||||
``` |
||||
|
||||
1. The first record comes with the value of "300". That's the value for the |
||||
field. |
||||
2. The second record comes with the value "2". That should be added in the |
||||
previous value and used, so the field for the second record is "302". |
||||
3. The third record comes with the value "3". Again, you reuse the previous |
||||
value and add the current one. So the field for the third record have the |
||||
value "305". |
||||
|
||||
Fields with the Delta operator do not appear in the Presence Map. |
||||
|
||||
### Operator: Increment |
||||
|
||||
Increment is another operator that works similar to the Copy operator, but if |
||||
its bit is set in the Presence Map, the decoder reads the field value from the |
||||
incoming data; if it is not set, the decoder does not read any data, but reuses |
||||
the previous value with an increment of 1. |
||||
|
||||
Example: |
||||
|
||||
```xml |
||||
<uInt32 name="RptSeq" id="83"> |
||||
<increment/> |
||||
</uInt32> |
||||
``` |
||||
|
||||
1. The first record have the bit set in the Presence Map, the decoder reads the |
||||
value "100". That's the field value for the record. |
||||
2. The second have doesn't have the bit set, so nothing should be read from the |
||||
incoming data, but the field value should be "101" for this record. |
||||
|
||||
Fields with the Increment operator appear in the presence map. |
||||
|
||||
## Presence Map Map |
||||
|
||||
There is a simple map that indicates if a field appears or not in the Presence |
||||
Map, [according to JetTek](https://jettekfix.com/education/fix-fast-tutorial/): |
||||
|
||||
<table> |
||||
<tr> |
||||
<th>Operator</th> |
||||
<th>Appears for Mandatory Fields?</th> |
||||
<th>Appears for Optional Fields?</th> |
||||
</tr> |
||||
<tr> |
||||
<td>No Operator</td> |
||||
<td>No</td> |
||||
<td>No</td> |
||||
</tr> |
||||
<tr> |
||||
<td>Constant</td> |
||||
<td>No, the Constant value should be used</td> |
||||
<td>Yes; if set, use the Constant value; otherwise the field is Null</td> |
||||
</tr> |
||||
<tr> |
||||
<td>Copy</td> |
||||
<td>Yes; if set, use the incoming value is the current value; |
||||
otherwise, use the previous value</td> |
||||
<td>Yes; same as above, but the value can be Null (e.g., it was read as |
||||
0 for Ints or a single Null byte for Strings.</td> |
||||
</tr> |
||||
<tr> |
||||
<td>Default</td> |
||||
<td>Yes; if set, read the value from the incoming data; otherwise, use |
||||
the default value.</td> |
||||
<td>Yes; same as above</td> |
||||
</tr> |
||||
<tr> |
||||
<td>Delta</td> |
||||
<td>No; the value should always be added to the previous one.</td> |
||||
<td>No; same as above</td> |
||||
</tr> |
||||
<tr> |
||||
<td>Increment</td> |
||||
<td>Yes; if set, read the value from the incoming data; otherwise, add |
||||
1 to the previous value.</td> |
||||
<td>Yes; same as above</td> |
||||
</tr> |
||||
</table> |
||||
|
||||
|
||||
# Anomalies |
||||
|
||||
I call "anomaly" anything that I had to spent way too much time to understand. |
||||
|
||||
## Decimals With Different Operators |
||||
|
||||
This is one thing that made things pretty hard to grasp at first. For example: |
||||
|
||||
```xml |
||||
<decimal name="MDEntryPX" id="270" presence="optional"> |
||||
<exponent> |
||||
<default value="0"/> |
||||
</exponent> |
||||
<mantissa> |
||||
<delta/> |
||||
</mantissa> |
||||
</decimal> |
||||
``` |
||||
|
||||
That seems simple. But there are a lot of moving pieces here: |
||||
|
||||
1. The `presence="optional"` in the decimal means that the `exponent` can be |
||||
Null and only that. |
||||
2. The `default` operator in the Exponent means the decoder must check if the |
||||
Exponent have a value or should use the default value of "0". |
||||
|
||||
There is another issue here: If the Presence Map indicates that the value is |
||||
present and the read value is 0, because the Exponent is optional, it should |
||||
be considered Null and, thus, there is no Mantissa and everything is Null. |
||||
3. The `delta` operator in the Mantissa should be used applying the incoming |
||||
value to the previous one. But, if the Exponent is Null, then there is no |
||||
Mantissa, but the previous value is kept. |
||||
|
||||
This causes a bunch of weird, "exception of the rule" dealings: |
||||
|
||||
1. The first record have the field set in the Presence Map and it is read as |
||||
"-2". That's the Exponent, reading the mantissa gives the value "1020", so |
||||
the whole decimal is "10.20"; |
||||
2. The second record have the field set in the Presence Map and it is read as |
||||
"0". Because the decimal is optional, the exponent is optional, and because |
||||
0 is Null, there is no Exponent, and the next value is *not* the Mantissa. |
||||
3. The third record have the field set in the Presence Map and it is again, |
||||
"-2" for the Exponent and we read the Mantissa. The value read for the |
||||
Mantissa is "-20", but instead of assuming that the Mantissa was Null in the |
||||
previous record, it uses the first record value, so the Mantissa for this |
||||
record is "1000" and the value for the decimal is "10.00". |
||||
|
||||
Another weird thing I saw was related to the way the exchange was ordering the |
||||
results. It had a sequence of sell and buy orders in which |
||||
|
||||
1. The first record was the sell order, with an Exponent of 0 and a Mantissa of |
||||
"5410". That meant the value is "5410" (pretty straight). |
||||
2. The second record was the buy order. It had an Exponent of "-2" and the |
||||
Mantissa had an incoming value of 526604. That gives the value of "532014", |
||||
but because the Exponent is "-2", the actual value is "5320.14". |
||||
3. The weird thing happened in the third record, which was again a sell order. |
||||
The value should be exactly the same as the first, but the exchange sent an |
||||
Exponent of 0 and a Mantissa of "−526604". With the delta, that would bring |
||||
the value back to "5410". |
||||
|
||||
I found it weird that they kept jumping between two different Exponents instead |
||||
of using a single one, and at the time I had some issues with the delta math in |
||||
my code, so... |
||||
|
||||
--- |
||||
|
||||
### Changelog: |
||||
|
||||
2022-01-05: First release. |
Loading…
Reference in new issue