blog/index.md at a0f3cfbef0db4ea85ef9120396b39840fb6946b2 - blog

The source content for blog.juliobiason.me

3.1 KiB

Raw Blame History

+++ title = "Things I Learnt The Hard Way - If Your Data Has a Schema, Use a Structure" date = 2019-06-25

[taxonomies] tags = ["books", "things i learnt", "data classes", "structs"] +++

You may be tempted to use a list (or tuple, if your language allows) to keep your data if it has, say, only 2 fields. Don't.

Some languages allow unstructured data to be kept in the format of tuples: They act like lists, but you can use to store heterogeneous data (which is a cute way of "it stores fields of different types").

This languages also allow you to "destructurize" them, so you can extract elements from them without directly accessing them by index.

For example, in Python, you can have a tuple with:

a_tuple = ('A String', 1, 7.5)

And you can destructure it with

some_string, an_integer, a_float = a_tuple

See? It's simple! You don't need to create a whole structure if you're just passing a string, an integer and a float around.

Except, you do need a structure 'cause your data has a schema.

Tuples and destructuring should be used only when you need to pass data from one function to another -- and barely that. When you have this tuple being passed around, being destructured and created again -- say, you are adding one value of the tuple to another value and producing a new tuple in the same format -- then you have a structured -- and schemaed data.

And when you have a structured data, you must use a data class or a struct (or even NamedTuples, if you're using Python).

Although it may look way more simpler to keep destructuring and building the tuple over and over, in the long run you'll end up with a mess: a simple change -- like adding a new field -- will require checking every destructuring and every creation of the tuple to make sure if will stay in the same shape every time.

So: You data has a schema? Use a Data Class or Class or Struct. Only if it is schemaless, then you can use a tuple.

I've seen this used at least once. At the very start of the project, it may seem easier to just store the data as a tuple and destructure it and build it again when needed. There was even a whole module designed to receiving tuples, destructure them and rebuild new ones (for example, a function that would receive two tuples and compute the sum of the "value" field of each, building a new tuple as a result). But because of this design, to add just a new field, I had to change 14 files and do 168 changes around -- 'cause there was a function to add two tuples, but there were points where you need just one field, and there wasn't a function for it.

It would be easier to use if there were functions to extract each field, and add two tuples, and what else was needed for managing the tuples, but then you have to ask yourself: Why not use a class for that?

{{ chapters(prev_chapter_link="/books/things-i-learnt/data-types", prev_chapter_title="Types Say What Your Data Is", next_chapter_link="/books/things-i-learnt/outside-project", next_chapter_title="Don't Mess With Things Outside Your Project") }}

3.1 KiB Raw Blame History

3.1 KiB

Raw Blame History