From bdffa9a5b32bda155d9384b91148383e19879ac6 Mon Sep 17 00:00:00 2001
From: Julio Biason
Date: Wed, 5 Jan 2022 21:30:32 -0300
Subject: [PATCH] FAST explanation

---
 content/thoughts/decoding-fast.md | 500 ++++++++++++++++++++++++++++++
 1 file changed, 500 insertions(+)
 create mode 100644 content/thoughts/decoding-fast.md

diff --git a/content/thoughts/decoding-fast.md b/content/thoughts/decoding-fast.md
new file mode 100644
index 0000000..7a53298
--- /dev/null
+++ b/content/thoughts/decoding-fast.md
@@ -0,0 +1,500 @@
++++
+title = "Decoding the FAST Protocol"
+date = 2022-01-05
+
+[taxonomies]
+tags = ["finance", "binary", "protocol", "fix", "fast"]
++++
+
+Recently I had to work with FAST (FIX Adapted for Streaming) and, because the
+documentation is scattered around, I decided to put the things I discovered in a
+single place for (my own) future reference.
+
+
+
+{% note() %}
+Because this is based on my personal experience and I had contact with a single
+instance of this so far, there are some things that are incomplete and/or
+wrong. I'll keep updating this post as I figure out new things.
+
+The changelog is at the end of the post.
+{% end %}
+
+# What is FAST
+
+[FAST](https://en.wikipedia.org/wiki/FAST_protocol) is, basically, a compression
+method for the FIX protocol.
+
+# And What is FIX?
+
+[FIX](https://en.wikipedia.org/wiki/Financial_Information_eXchange) is a
+protocol created for financial institutions to exchange information. Although
+there is nothing inherently "financial" about it -- you could use the protocol
+for anything, basically -- most financial companies use it.
+
+FIX is a very simple protocol: You have pairs of "field ID" and "value"
+separated by a "`=`" (equal sign) and the pairs are separated by the ASCII char
+with code 1 (which is represented by `^A` in some editors).
+Some field IDs are
+defined by the protocol (there is a whole list
+[here](https://www.inforeachinc.com/fix-dictionary/index.html)) but each
+exchange can create their own IDs.
+
+For example, if you have MsgType (ID 35) with value "`y`" and Security ID (ID
+48) with value "`123456`", you'd get the message:
+
+```
+35=y^A48=123456
+```
+
+# And Back to FAST
+
+One of the things FAST is designed for is removing duplicate and/or constant
+content. For example, MsgType (ID 35) with value "`y`" is the "SecurityList"
+message, which contains information about all the symbols (and their security
+IDs) handled by the exchange. Because the exchange is the same for all the
+symbols, FAST allows defining the fields related to it (Source, field ID 22,
+and Exchange, field ID 207) as constant values, so they don't need to be
+transmitted and, when decoding FAST back to FIX, the decoder simply adds the
+constant value.
+
+To know which fields are constant and which are not (and some other information),
+the protocol defines a template, which has a well-defined schema, to report
+that information.
+
+# The Template
+
+The template is an XML file (the protocol definition doesn't provide any
+standard way to actually receive that file, so it is left to each exchange to
+decide how to distribute it) which describes field types, names, IDs and
+operators.
+
+Note that the template describes the field IDs and their types, while the
+incoming data has only the values. If we use the FIX description above, the
+template defines the left side of the pair, while the incoming data has only
+the right side.
+
+## Field Types
+
+The protocol has a few field types: Unsigned Ints of 32 and 64 bits, Signed
+Ints of 32 and 64 bits, ASCII strings, UTF-8 strings, sequences, decimals and a
+type called "presence map".
+
+One thing to note is that all fields use a "stop bit" format.
+This is quite
+similar to UTF8, although UTF8 uses a "continuation bit" instead of "stop bit",
+but the process of reading is the same:
+
+- Read a byte;
+- Does it have the high order bit set to 0?
+  - Yes: Keep reading;
+  - No: Stop reading and conclude the field value.
+
+## Field definitions
+
+On the template, the fields have their type, name (optional), ID, a presence
+indicator and an operator (optional).
+
+For example, if you have an unsigned int of 32 bits, named "MsgType" with ID
+"35", that would be described in the template as
+
+```xml
+<uInt32 name="MsgType" id="35"/>
+```
+
+Because there is no indication of presence, it is assumed that the field is
+"mandatory" and should always have a value. On the other hand, if you have a
+field defined as
+
+```xml
+<uInt32 name="MsgType" id="35" presence="optional"/>
+```
+
+... then the field may not have a value. This is also referred to as a
+"nullable" field.
+
+### Types: Ints
+
+To read an Int, you pick the 7 low order bits (everything except the high order
+one) and move them to the resulting variable. If the stop bit is there, you're
+done; if it is not, you shift the result left by 7 bits and add the 7 bits from
+the next byte and so on, till you find a byte with the stop bit set.
+
+The 32 and 64 bits only define the maximum value of the field and should not be
+used as "number of bits to be read" -- because of the stop bit. If the value
+exceeds 32 or 64 bits, that is considered an error and the processing should be
+aborted.
+
+Signed Ints work exactly the same, but using 2's complement.
+
+For example, if the incoming data has the following bytes (in binary, to make it
+easier to read; also, I added a single underscore between each group of 4 bits,
+also to make it easier to read):
+
+```
+0000_0001 1001_0010
+```
+
+... the decoder will read the first byte and see that it doesn't have the high
+order bit set, so it keeps just the "1" for the value and shifts everything left
+by 7 bits.
+Then the second byte is read; this one has the high order bit set, so
+the remaining bits (in this case "001_0010") are added to the resulting value
+and we get `1001_0010` -- or 146.
+
+Negative numbers are represented using 2's complement, and the sign is given by
+the first data bit of the first byte (the bit right after the stop bit). So
+you'd get, for example:
+
+```
+0111_1110 1110_1110
+```
+
+... which, when you follow the high order bits to find the stop bit and then
+remove them, gives the 14 bits "`11_1111 0110_1110`". Because the first data
+bit is 1, the number is negative, and extending the sign gives
+"`1111_1111 0110_1110`", which is -146 (in 16 bits, just to make it shorter).
+
+When an integer field is optional, the result must be decremented by 1. The
+reason for this is that, when the field is marked as optional -- also called
+"nullable" -- we need something to differentiate between 0 and Null. So, an
+optional integer with value 0 is, actually, Null; if we have a value of 0, the
+incoming data will have the value 1, which we'll decrement by 1 to get 0. (For
+Signed Ints this only applies to non-negative values; negative values are sent
+as they are.)
+
+### Types: Strings
+
+ASCII strings are pretty simple to read: Again, you keep reading the incoming
+data till you find a byte with the high order bit set (again, the stop bit) and
+just convert the 7 low order bits of each byte to the respective ASCII
+character.
+
+For example
+
+```
+0100_1000 0110_0101 0110_1100 0110_1100 1110_1111
+```
+
+would generate the bytes 72, 101, 108, 108 and 111, which using the values as
+ASCII codes would result in "Hello". Note that the stop bit here represents "end
+of string" and the bytes should not be grouped like in Ints.
+
+{% note() %}
+So far, I haven't found any UTF8 strings, so I'm not quite sure how to process
+them yet. Surely there is documentation around on how to read those, but since
+this is my personal experience with the protocol, I decided to not mention it
+here.
+{% end %}
+
+Optional strings are Null when the first byte has the stop bit set and every
+other bit is zero.
+
+### Types: Sequences
+
+Sequences are basically arrays. The first field of a sequence is the "length"
+(with the type "`<length>`" in the template) with the number of records
+present.
+Inside the sequence, you have a list of field definitions, which may
+even include more sequences.
+
+Optional Sequences follow the same idea of optional Ints: You read the length
+and, if it is Null, there is nothing in the sequence -- while mandatory
+Sequences can still have a length of zero.
+
+### Types: Decimals
+
+Decimals are formed by two fields: Exponent and Mantissa. The way it works is
+that if you have an Exponent of "-2" and a Mantissa of "1020", you'd do
+`1020 * 10 ^ -2` ("1020 times 10 to the power of -2"), and the actual value is
+"10.20".
+
+Both Exponent and Mantissa are read as Signed Ints.
+
+An Optional Decimal means the Exponent is optional. The documentation says that
+the Mantissa is always mandatory, but there is a catch: If the Exponent is Null,
+then the Mantissa is not present and shouldn't be read; otherwise, you read the
+Mantissa and apply the conversion.
+
+Also, because Exponent and Mantissa are two fields, they can have different
+operators. I'll show some examples after the Operators section, mostly because
+I've seen both with different operators and they make a mess to read.
+
+### Type: Presence Map
+
+Presence Maps are used in conjunction with operators. They are read basically
+like you'd read an unsigned int (read bytes till you find the one with the high
+order bit set) but do not have any conversion in themselves. Every time you need
+to check if a field is present by checking the Presence Map, you consume its
+highest remaining bit, so it is never used again.
+
+Presence Maps are not present in the template and their presence is implied if
+there is the need for one. For example, in a pure mandatory sequence of fields,
+there will be no Presence Map at all.
+
+The bits in the Presence Map are in the order of the fields that require them.
+For example, if you have a template with:
+
+1. A mandatory field;
+2. A field with an operator that requires the presence map (I'll mention those
+   later);
+3. Another mandatory field;
+4. And, finally, another field with operator.
+
+You may receive a Presence Map as `1110_0000`, in which:
+
+1. The first bit is the stop bit, so the decoder assumes this is the last byte
+   of the presence map.
+2. The second bit indicates that the first field with operator is present. It
+   does *not* represent the mandatory field, 'cause, well, it is mandatory and,
+   thus, is always present.
+3. The third bit indicates the second field with an operator.
+
+Again, I'll mention later which operators require the decoder to check the
+Presence Map.
+
+## Operators
+
+Operators define a way to deal with some fields. I've seen 6 different types of
+operators:
+
+- No Operator;
+- Constant;
+- Default;
+- Copy;
+- Delta;
+- Increment.
+
+### Operator: No Operator
+
+When there is no operator defined, you have a "no operator" operator. It means
+there is no special way of dealing with the incoming value: You just capture it
+and use it.
+
+When a field has No Operator, there will be no bit in the Presence Map.
+
+### Operator: Constant
+
+A field with the Constant operator will not appear in the incoming data and you
+should assume that its value is the value in the constant. Previously I
+mentioned that a list of securities may have the field 22, "Source", and field
+207, "Exchange", with constant values; they would be defined as
+
+```xml
+
+
+
+
+
+
+```
+
+There is a catch, though: When a constant can be Null (`presence="optional"`),
+then the decoder needs to use the Presence Map bit; if it is set, the constant
+value should be used; if it is not set, then the field value is Null.
+
+For the Constant operator, the Presence Map is used only if the field is
+optional.
+
+### Operator: Default
+
+The Default operator is similar to the Constant operator, but the decoder needs
+to check the Presence Map; if the bit for the field is set, then the value is
+read from the incoming data; if it is not set, then you use the default value.
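As a rough illustration of the pieces described so far, here is a minimal Python sketch of stop-bit reading, Presence Map bits, and the Constant and Default operators. It is not taken from any real decoder: every function name is made up for this example, it only handles mandatory unsigned ints, and for Default it assumes the rule given in the summary table of the "Presence Map Map" section (bit set: read the value from the data; bit not set: use the default).

```python
from typing import List, Optional, Tuple


def read_stop_bit_field(data: bytes, pos: int) -> Tuple[bytes, int]:
    """Return the raw bytes of one field and the position right after it."""
    end = pos
    while not data[end] & 0x80:  # high order bit clear: keep reading
        end += 1
    return data[pos:end + 1], end + 1


def read_uint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a mandatory unsigned int, 7 data bits per byte."""
    raw, pos = read_stop_bit_field(data, pos)
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value, pos


def read_presence_map(data: bytes, pos: int) -> Tuple[List[bool], int]:
    """Decode a Presence Map into booleans, in field order (stop bits skipped)."""
    raw, pos = read_stop_bit_field(data, pos)
    bits = [bool(byte >> shift & 1) for byte in raw for shift in range(6, -1, -1)]
    return bits, pos


def decode_constant(bits: List[bool], constant: str, optional: bool) -> Optional[str]:
    """Constant: nothing is read; only optional constants consume a bit."""
    if optional and not bits.pop(0):
        return None
    return constant


def decode_default_uint(bits: List[bool], data: bytes, pos: int,
                        default: int) -> Tuple[int, int]:
    """Default (mandatory uint): bit set reads the data, otherwise use default."""
    if bits.pop(0):
        return read_uint(data, pos)
    return default, pos


# The two bytes from the Ints example.
value, _ = read_uint(bytes([0b0000_0001, 0b1001_0010]), 0)
print(value)  # 146

# The Presence Map from the example: two fields with their bits set.
bits, _ = read_presence_map(bytes([0b1110_0000]), 0)
print(bits[:2])  # [True, True]
```

Keeping the Presence Map as a list and popping from the front mirrors the "consume the bit so it is never used again" idea; a real decoder would more likely keep a bit index instead.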
+
+### Operator: Copy
+
+The Copy operator indicates that the value for this record is the same as the
+value of the previous record; if it is the first record, then the incoming
+value should be used. If the Presence Map bit is set for the field, then the
+decoder must read the value in the incoming data; if it is not set, then the
+previous value should be used. In the data I saw, every first record has the
+bit set, so you get the initial/previous value.
+
+An example: You can have a template like
+
+```xml
+
+
+
+```
+
+... and you have the following records and their Presence Maps:
+
+1. The first record has the bit set for this field in the Presence Map and the
+   string reads "first". This record will have this field with the value
+   "first".
+2. The second record doesn't have the bit set in the Presence Map. So the
+   decoder reuses the previous value and this record will have the field with
+   the value "first" (again).
+3. The third record has the bit set again, and the value "second". This is the
+   value for the field in this record.
+4. The fourth record doesn't have the bit set and the decoder reuses the value
+   "second" for the field.
+
+The Copy operator may have an initial value, so you don't need to read it. For
+example
+
+```xml
+
+
+
+```
+
+This means that you should use "string" as previous value, even in the first
+record.
+
+As pointed out, fields with the Copy operator appear in the Presence Map.
+
+### Operator: Delta
+
+Delta is an operator similar to Copy, but instead of using the value of the
+previous record in this field, the new value must be computed using the previous
+value and the current one. Again, if you have no previous value, then there is
+no operation to be done and the incoming value is the current one.
+
+An example:
+
+```xml
+
+
+
+```
+
+1. The first record comes with the value of "300". That's the value for the
+   field.
+2. The second record comes with the value "2".
+   That should be added to the
+   previous value and used, so the field for the second record is "302".
+3. The third record comes with the value "3". Again, you reuse the previous
+   value and add the current one. So the field for the third record has the
+   value "305".
+
+Fields with the Delta operator do not appear in the Presence Map.
+
+### Operator: Increment
+
+Increment is another operator that works similarly to the Copy operator: if
+its bit is set in the Presence Map, the decoder reads the field value from the
+incoming data; if it is not set, the decoder does not read any data, but reuses
+the previous value with an increment of 1.
+
+Example:
+
+```xml
+
+
+
+```
+
+1. The first record has the bit set in the Presence Map, so the decoder reads
+   the value "100". That's the field value for the record.
+2. The second record doesn't have the bit set, so nothing should be read from
+   the incoming data, but the field value should be "101" for this record.
+
+Fields with the Increment operator appear in the Presence Map.
+
+## Presence Map Map
+
+There is a simple map that indicates if a field appears or not in the Presence
+Map, [according to JetTek](https://jettekfix.com/education/fix-fast-tutorial/):
+
+| Operator | Appears for Mandatory Fields? | Appears for Optional Fields? |
+|----------|-------------------------------|------------------------------|
+| No Operator | No | No |
+| Constant | No; the Constant value is always used | Yes; if set, use the Constant value; otherwise the field is Null |
+| Copy | Yes; if set, the incoming value is the current value; otherwise, use the previous value | Yes; same as for mandatory fields, but the value can be Null (e.g., it was read as 0 for Ints or a single Null byte for Strings) |
+| Default | Yes; if set, read the value from the incoming data; otherwise, use the default value | Yes; same as for mandatory fields |
+| Delta | No; the value should always be added to the previous one | No; same as for mandatory fields |
+| Increment | Yes; if set, read the value from the incoming data; otherwise, add 1 to the previous value | Yes; same as for mandatory fields |
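The table above, together with the Copy, Delta and Increment rules, can be sketched in a few lines of Python. This is only an illustration under assumptions: the operator names as lowercase strings and all function names are invented here, Null handling for optional fields is left out, and the decoder is assumed to keep the previous value of each field somewhere else.

```python
from typing import List, Optional


def uses_pmap_bit(operator: str, optional: bool) -> bool:
    """Tell whether a field takes a bit in the Presence Map."""
    if operator in ("none", "delta"):
        return False
    if operator == "constant":
        return optional  # only optional constants need a bit
    if operator in ("copy", "default", "increment"):
        return True
    raise ValueError(f"unknown operator: {operator}")


def apply_copy(bit_set: bool, incoming: Optional[str],
               previous: Optional[str]) -> Optional[str]:
    """Copy: take the incoming value if the bit is set, else reuse the previous."""
    return incoming if bit_set else previous


def apply_delta(incoming: int, previous: Optional[int]) -> int:
    """Delta: add the incoming value to the previous one, if there is one."""
    return incoming if previous is None else previous + incoming


def apply_increment(bit_set: bool, incoming: Optional[int],
                    previous: Optional[int]) -> Optional[int]:
    """Increment: take the incoming value if the bit is set, else previous + 1."""
    return incoming if bit_set else previous + 1


# Replaying the Delta example: incoming values 300, 2 and 3.
value: Optional[int] = None
results: List[int] = []
for incoming in (300, 2, 3):
    value = apply_delta(incoming, value)
    results.append(value)
print(results)  # [300, 302, 305]
```

Keeping the "does it use a bit?" decision in one function means the decoder can walk the template fields in order and only touch the Presence Map when that function says so.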
+
+
+# Anomalies
+
+I call "anomaly" anything that I had to spend way too much time to understand.
+
+## Decimals With Different Operators
+
+This is one thing that made things pretty hard to grasp at first. For example:
+
+```xml
+
+
+
+
+
+
+
+
+```
+
+That seems simple. But there are a lot of moving pieces here:
+
+1. The `presence="optional"` in the decimal means that the `exponent` can be
+   Null and only that.
+2. The `default` operator in the Exponent means the decoder must check if the
+   Exponent has a value in the incoming data or should use the default value
+   of "0".
+
+   There is another issue here: If the Presence Map indicates that the value is
+   present and the read value is 0, because the Exponent is optional, it should
+   be considered Null and, thus, there is no Mantissa and everything is Null.
+3. The `delta` operator in the Mantissa means the incoming value must be
+   applied to the previous one. But, if the Exponent is Null, then there is no
+   Mantissa, and the previous value is kept.
+
+This causes a bunch of weird, "exception to the rule" dealings:
+
+1. The first record has the field set in the Presence Map and it is read as
+   "-2". That's the Exponent; reading the Mantissa gives the value "1020", so
+   the whole decimal is "10.20";
+2. The second record has the field set in the Presence Map and it is read as
+   "0". Because the decimal is optional, the Exponent is optional, and because
+   0 is Null, there is no Exponent, and the next value is *not* the Mantissa.
+3. The third record has the field set in the Presence Map and it is, again,
+   "-2" for the Exponent and we read the Mantissa. The value read for the
+   Mantissa is "-20", but instead of assuming that the Mantissa was Null in the
+   previous record, the decoder uses the value from the first record, so the
+   Mantissa for this record is "1000" and the value for the decimal is "10.00".
+
+Another weird thing I saw was related to the way the exchange was ordering the
+results.
+It had a sequence of sell and buy orders in which
+
+1. The first record was the sell order, with an Exponent of 0 and a Mantissa of
+   "5410". That meant the value is "5410" (pretty straightforward).
+2. The second record was the buy order. It had an Exponent of "-2" and the
+   Mantissa had an incoming value of "526604". With the delta, that gives a
+   Mantissa of "532014", but because the Exponent is "-2", the actual value is
+   "5320.14".
+3. The weird thing happened in the third record, which was again a sell order.
+   The value should be exactly the same as the first, but the exchange sent an
+   Exponent of 0 and a Mantissa of "-526604". With the delta, that would bring
+   the value back to "5410".
+
+I found it weird that they kept jumping between two different Exponents instead
+of using a single one, and at the time I had some issues with the delta math in
+my code, so...
+
+---
+
+### Changelog:
+
+2022-01-05: First release.