
Merge branch 'release/20190701'

master 20190701
Julio Biason 5 years ago
parent
commit
f58313248e
  1. 11
      content/books/things-i-learnt/_index.md
  2. 2
      content/books/things-i-learnt/cargo-cult/index.md
  3. 57
      content/books/things-i-learnt/cognitive-cost/index.md
  4. 35
      content/books/things-i-learnt/data-flow/index.md
  5. 68
      content/books/things-i-learnt/functional-programming/index.md
  6. 2
      content/books/things-i-learnt/gherkin/index.md
  7. 2
      content/books/things-i-learnt/integration-tests/index.md
  8. 2
      content/books/things-i-learnt/languages-are-more/index.md
  9. 92
      content/books/things-i-learnt/magical-number-seven/index.md
  10. 35
      content/books/things-i-learnt/outside-project/index.md
  11. 38
      content/books/things-i-learnt/patterns-not-solutions/index.md
  12. 31
      content/books/things-i-learnt/resist-easy/index.md
  13. 24
      content/books/things-i-learnt/right-tool-agenda/index.md
  14. 29
      content/books/things-i-learnt/right-tool-obvious/index.md
  15. 16
      content/books/things-i-learnt/use-structures/index.md
  16. 40
      content/books/things-i-learnt/use-timezones/index.md
  17. 55
      content/books/things-i-learnt/use-utf8/index.md

11
content/books/things-i-learnt/_index.md

@ -12,6 +12,11 @@ template = "section-contentless.html"
* [Spec First, Then Code](spec-first)
* [Write Steps as Comments](steps-as-comments)
* [Gherkin Is Your Friend to Understand Expectations](gherkin)
* [Design Patterns Are Used to Name Solutions, Not Find Them](patterns-not-solutions)
* [Thinking Data Flow Beats Patterns](data-flow)
* [The Magic Number Seven, Plus Or Minus Two](magical-number-seven)
* [Cognitive Cost Is The Readability Killer](cognitive-cost)
* [Learn The Basics of Functional Programming](functional-programming)
* Testing Software
* [Unit Tests Are Good, Integration Tests Are Gooder](integration-tests)
* [Testing Every Function Creates Dead Code](tests-dead-code)
@ -32,6 +37,12 @@ template = "section-contentless.html"
* [If You Know How To Handle It, Handle It](handle-it)
* [Types Say What Your Data Is](data-types)
* [If Your Data Has a Schema, Use a Structure](use-structures)
* [Don't Mess With Things Outside Your Project](outside-project)
* [Resist The Temptation Of Easy](resist-easy)
* [Always Use Timezones With Your Dates](use-timezones)
* [Always Use UTF-8 For Your Strings](use-utf8)
* Community/Teams
* [A Language Is Much More Than A Language](languages-are-more)
* [Understand And Stay Away From Cargo Cult](cargo-cult)
* ["Right Tool For The Job" Is Just To Push An Agenda](right-tool-agenda)
* [The Right Tool Is More Obvious Than You Think](right-tool-obvious)

2
content/books/things-i-learnt/cargo-cult/index.md

@ -38,4 +38,4 @@ ProductX good, or even if it fits your solution. And there is much more
[behind a product](/books/things-i-learnt/languages-are-more) than just its
development.
{{ chapters(prev_chapter_link="/books/things-i-learnt/languages-are-more", prev_chapter_title="A Language Is Much More Than A Language") }}
{{ chapters(prev_chapter_link="/books/things-i-learnt/languages-are-more", prev_chapter_title="A Language Is Much More Than A Language", next_chapter_link="/books/things-i-learnt/right-tool-agenda", next_chapter_title="Right Tool For The Job Is Just To Push An Agenda") }}

57
content/books/things-i-learnt/cognitive-cost/index.md

@ -0,0 +1,57 @@
+++
title = "Things I Learnt The Hard Way - Cognitive Cost Is The Readability Killer"
date = 2019-06-26
[taxonomies]
tags = ["en-au", "books", "things i learnt", "cognitive dissonance", "cognitive cost"]
+++
"[Cognitive dissonance](https://en.wikipedia.org/wiki/Cognitive_dissonance)"
is a fancy way of saying "I need to remember two (or more) different and
contradicting things at the same time to understand this." Keeping those
different things in your head creates a cost and it keeps accumulating the
more indirect the things are ('cause you'll have to keep all those in your
head).
<!-- more -->
(Disclaimer: I like to use the expression "cognitive dissonance" to make me
sound smarter. I usually explain what it means, though.)
To give you an example of a (very mild) cognitive cost, I'll show you this:
* You have a function called `sum()`. It does the sum of the numbers of a
list.
* You have another function, called `is_pred()`. It gets a value and, if it
fits the predicate -- a test, basically -- returns True; otherwise,
returns False.
So, pretty simple, right? One function sums numbers and another returns a
boolean.
Now, what would you say if I showed you this, in Python:
```python
sum(is_pred(x) for x in my_list)
```
Wait, didn't I say that `sum()` sums numbers? And that `is_pred()` returns a
boolean? How can I sum booleans? What's the expected result of True + True +
False?
Sadly, this works. Because someone, a long time ago, didn't think booleans were
worth a thing and used an integer instead. And everyone else since then made
the same stupid mistake.
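To make the "booleans are integers" rule visible, here is a small sketch. The body of `is_pred()` (an "is even" check) and the contents of `my_list` are made up for illustration; only the `sum()` line comes from the text above.

```python
def is_pred(x):
    """Hypothetical predicate: True when x is even."""
    return x % 2 == 0


my_list = [1, 2, 3, 4, 6]  # made-up data

# In Python, bool is a subclass of int: True counts as 1, False as 0.
assert issubclass(bool, int)
assert True + True + False == 2

# The terse version: you need to remember the rule above to read it.
terse = sum(is_pred(x) for x in my_list)

# A more explicit version: count the items that pass the predicate.
explicit = len([x for x in my_list if is_pred(x)])
```

Both lines compute the same count; the second one just doesn't ask the reader to remember that a boolean can be summed.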
But, for you, you'll now read a line that says "summing a boolean list returns
a number". And that's two different, disparate things that you suddenly have
to keep in mind when reading that line.
That's why [types are important](/books/things-i-learnt/data-types).
Also, this may sound a bit like [the magical number
seven](/books/things-i-learnt/magical-number-seven), 'cause you have to keep
two things in your mind at the same time but, although that's nowhere near
seven, they are not the same kind of thing, and they have opposite (for weird
meanings of "opposite", in this case) meanings.
{{ chapters(prev_chapter_link="/books/things-i-learnt/magical-number-seven", prev_chapter_title="The Magic Number Seven, Plus Or Minus Two", next_chapter_link="/books/things-i-learnt/functional-programming", next_chapter_title="Learn The Basics of Functional Programming") }}

35
content/books/things-i-learnt/data-flow/index.md

@ -0,0 +1,35 @@
+++
title = "Things I Learnt The Hard Way - Thinking Data Flow Beats Patterns"
date = 2019-06-26
[taxonomies]
tags = ["en-au", "books", "things i learnt", "data flow", "design patterns"]
+++
When you're trying to find a solution to your problem, think about the way the
data will flow through your code.
<!-- more -->
Instead of focusing on design patterns, a better way is to think about the way
the data will flow -- and be transformed -- through your code.
For example, the user will input a number. You'll get this number and find the
respective record in the database. This is a transformation -- no, it's not
"I'll get the number and receive a completely different thing based upon it",
you're actually transforming the number into a record, using the database as
the transformation.
(Yes, I know, it's not that clear at the first glance, but you have to think
that they are the same data with different representations.)
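A minimal sketch of that idea, assuming a dictionary standing in for the database. `DATABASE`, `parse_user_id()`, `find_record()` and `format_greeting()` are all hypothetical names, made up just for this example.

```python
# Hypothetical in-memory "database": user id -> record.
DATABASE = {
    42: {"id": 42, "name": "Alice"},
}


def parse_user_id(raw_input):
    """Transform the text typed by the user into a number."""
    return int(raw_input.strip())


def find_record(user_id):
    """Transform the number into its record, using the database."""
    return DATABASE[user_id]


def format_greeting(record):
    """Transform the record into the text shown to the user."""
    return f"Hello, {record['name']}!"


# The same data, one representation after another.
user_id = parse_user_id(" 42\n")
record = find_record(user_id)
greeting = format_greeting(record)
```

Each step receives one representation of the data and returns the next one, so the flow can be read from top to bottom.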
Most of the times I did that, I managed to come up with a clearer design for
my applications. I didn't even think about how many functions/classes would be
needed to do these kinds of transformations; that was something I came up with
_after_ I could see the data flow.
In a way, this way of thinking makes things clearer 'cause you have a list
of transformation steps you need to do, so you can write them one after
another, which prevents a lot of bad code in the future.
{{ chapters(prev_chapter_link="/books/things-i-learnt/patterns-not-solutions", prev_chapter_title="Design Patterns Are Used to Name Solutions, Not Find Them", next_chapter_link="/books/things-i-learnt/magical-number-seven", next_chapter_title="The Magic Number Seven, Plus Or Minus Two") }}

68
content/books/things-i-learnt/functional-programming/index.md

@ -0,0 +1,68 @@
+++
title = "Things I Learnt The Hard Way - Learn The Basics of Functional Programming"
date = 2019-06-26
[taxonomies]
tags = ["en-au", "books", "things i learnt", "functional programming"]
+++
At this point, you should at least have heard about how cool functional
programming is. There are a lot of concepts here, but you should keep at least
the very basic ones in mind.
<!-- more -->
A lot of talks about functional programming come with weird words like
"functors" and "monads". It doesn't hurt to know what they really mean
(disclaimer: I still don't). But some other stuff coming from functional
programming is actually easy to understand and grasp.
For example, immutability. This means that your data can't change once
it's created. You have a record with user information and the user changed
their password? No, do not change the password field; create a new user record
with the updated password and discard the old one. Sure, it creates a lot of
create-and-destroy sequences which seem to make absolutely no sense (why would
you allocate memory for a new user, copy everything from the old one to the
new one, update one field, and deallocate the memory from the old one? It makes
no sense!) but, in the long run, it prevents weird results, especially when
you understand and start using threads.
(Basically, you're avoiding a shared state -- the memory -- between parts of
your code.)
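One possible way to sketch the password change in Python is a frozen dataclass. The `User` class, its fields and `change_password()` are hypothetical names for this example, not something from a real project.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:
    """Hypothetical user record; frozen=True forbids changing fields."""
    name: str
    password: str


def change_password(user, new_password):
    """Return a NEW record with the new password; the old one is untouched."""
    return replace(user, password=new_password)


old_user = User(name="alice", password="hunter2")
new_user = change_password(old_user, "correct horse")
```

Trying `old_user.password = "something"` raises `dataclasses.FrozenInstanceError`, so nobody holding the old record can have it changed under their feet.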
Another useful concept is pure functions. Pure functions are functions that,
called with the same parameters, always return the same result, no matter how
many times you call them. One example of a _non_ pure function is `random()`:
each time you call `random()`, you get a different number[^1]. An example of a
pure function would be something like this in Python:
```python
def mult(x):
    return x * 4
```
No matter how many times you call `mult(2)`, it will always return 8. Another
example could be our immutable password change above: You could easily write a
function that receives a user record and returns a new user record with the
password changed. You could call with the same record over and over again and
it will always return the same resulting record.
Pure functions are useful 'cause they are, first and foremost, easy to test.
Second, they are easy to chain, especially in a [data
flow](/books/things-i-learnt/data-flow) design: Because they don't have an
internal state (which is the real reason they are called pure functions), you
can easily call one after the other and no matter how many times you pass
things around, they still produce the same result. And because each function,
given the same input, produces the same result, chaining them all _also_
produces the same result given the same inputs.
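Here is a small sketch of that chaining. `mult()` is the function from the example above; `add_one()` and the `chain()` helper are made up for illustration.

```python
from functools import reduce


def mult(x):
    return x * 4


def add_one(x):
    """Hypothetical second pure function."""
    return x + 1


def chain(*functions):
    """Build a new function that applies each function in order."""
    def chained(value):
        return reduce(lambda acc, func: func(acc), functions, value)
    return chained


mult_then_add = chain(mult, add_one)
first_call = mult_then_add(2)
second_call = mult_then_add(2)
```

Since neither function keeps any state, the chained function is also pure: calling it twice with the same input gives the same result.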
Just those two concepts can make code longer (again, you're creating a new
user record instead of simply changing one field), but the final result is
more robust code.
[^1]: Except in Haskell, but it does require sending the seed every time, so
you end up with random values based on the seed, so even there it is a pure
function.
{{ chapters(prev_chapter_link="/books/things-i-learnt/cognitive-cost", prev_chapter_title="Cognitive Cost Is The Readability Killer", next_chapter_link="/books/things-i-learnt/integration-tests", next_chapter_title="Unit Tests Are Good, Integration Tests Are Gooder") }}

2
content/books/things-i-learnt/gherkin/index.md

@ -51,4 +51,4 @@ system, you can get a better picture of the whole.
Also, you may not like to write specs. That's alright, you can replace them
with Gherkin anyway.
{{ chapters(prev_chapter_link="/books/things-i-learnt/steps-as-comments", prev_chapter_title="Write Steps as Comments", next_chapter_link="/books/things-i-learnt/integration-tests", next_chapter_title="Unit Tests Are Good, Integration Tests Are Gooder") }}
{{ chapters(prev_chapter_link="/books/things-i-learnt/steps-as-comments", prev_chapter_title="Write Steps as Comments", next_chapter_link="/books/things-i-learnt/patterns-not-solutions", next_chapter_title="Design Patterns Are Used to Name Solutions, Not Find Them") }}

2
content/books/things-i-learnt/integration-tests/index.md

@ -66,4 +66,4 @@ parts.
[^1]: There is no "unit" in "unit tests". "Unit test" means the test _is_ a
unit, indivisible and dependent only on itself.
{{ chapters(prev_chapter_link="/books/things-i-learnt/gherkin", prev_chapter_title="Gherkin Is Your Friend to Understand Expectations", next_chapter_title="Testing Every Function Creates Dead Code", next_chapter_link="/books/things-i-learnt/tests-dead-code") }}
{{ chapters(prev_chapter_link="/books/things-i-learnt/functional-programming", prev_chapter_title="Learn The Basics of Functional Programming", next_chapter_title="Testing Every Function Creates Dead Code", next_chapter_link="/books/things-i-learnt/tests-dead-code") }}

2
content/books/things-i-learnt/languages-are-more/index.md

@ -39,4 +39,4 @@ surface of what the whole of a language encapsulates and if you ignore the
other elements in it, you may find yourself with a cute language in a
community that is always fighting and never going forward.
{{ chapters(prev_chapter_link="/books/things-i-learnt/use-structures", prev_chapter_title="If Your Data Has a Schema, Use a Structure", next_chapter_link="/books/things-i-learnt/cargo-cult", next_chapter_title="Understand And Stay Away From Cargo Cult") }}
{{ chapters(prev_chapter_link="/books/things-i-learnt/use-utf8", prev_chapter_title="Always Use UTF-8 For Your Strings", next_chapter_link="/books/things-i-learnt/cargo-cult", next_chapter_title="Understand And Stay Away From Cargo Cult") }}

92
content/books/things-i-learnt/magical-number-seven/index.md

@ -0,0 +1,92 @@
+++
title = "Things I Learnt The Hard Way - The Magical Number Seven, Plus Or Minus Two"
date = 2019-06-26
[taxonomies]
tags = ["en-au", "books", "things i learnt", "complexity"]
+++
"[The magical number](https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two)"
is a psychology article about the number of things one can keep in their mind
at the same time.
<!-- more -->
Twice I've seen this weird construction where a function would do some
processing, but its return value was the return of a second function and
some bit of processing. Nothing major. But the second function would also do
some processing and call a third function. And the third function would call a
fourth. And the fourth a fifth. And the fifth, a sixth function.
Something like this
```
func_1
 +-- func_2
      +-- func_3
           +-- func_4
                +-- func_5
                     +-- func_6
```
Now, when you're trying to understand this kind of code to find a problem,
you'll have to keep in mind what the first, second, third, fourth, fifth and
sixth functions do, 'cause they are all calling each other (inside them).
This causes some serious mental overflow that shouldn't be necessary.
Not only that, but imagine that you put a log before and after `func_1`: The
log before shows the data that's being sent to `func_1`, and the log after
shows its result.
So you'd end up with the impression that `func_1` does a lot of stuff, when it
actually is passing the transformation along.
(I had a weird experience with a function called `expand`: logging before the
call would show some raw, compressed data, but the log after was not the
expanded data, but actually a list of already processed data from the
compressed data.)
What would be a better solution, you may ask?
Well, instead of making `func_1` call `func_2`, you can make it return its
result (which may not be the final result, anyway) and _then_ call `func_2`
with that result.
Something like:
```
result1 = func_1(data)
result2 = func_2(result1)
result3 = func_3(result2)
result4 = func_4(result3)
result5 = func_5(result4)
result6 = func_6(result5)
```
Now you can see _exactly_ how the data is being transformed -- and, obviously,
the functions would have better names, like `expand`, `break_lines`,
`name_fields` and so on, so you can see that the compressed data I mentioned
before is actually being decompressed, the content is being broken line by
line, the lines are getting names for their fields and so on (and one could
even claim that it would make things clearer if there was a function after
`break_lines` which would just `break_fields`, which would make `name_fields`
more obvious -- and in a construction like this it would be almost trivial to
add this additional step).
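Just to make it concrete, here is a sketch of such a sequence. The function names come from the text above, but their bodies, the `zlib` compression, the comma-separated format and the `name`/`age` fields are all assumptions made for this example.

```python
import zlib


def expand(compressed):
    """Decompress the raw bytes into text."""
    return zlib.decompress(compressed).decode("utf-8")


def break_lines(text):
    """Split the text into a list of lines."""
    return text.splitlines()


def break_fields(lines):
    """Split each line into its comma-separated fields."""
    return [line.split(",") for line in lines]


def name_fields(rows):
    """Give a name to each field of each row."""
    return [{"name": name, "age": int(age)} for name, age in rows]


# Made-up input, compressed just to have something to expand.
raw = zlib.compress("alice,30\nbob,25".encode("utf-8"))

text = expand(raw)
lines = break_lines(text)
rows = break_fields(lines)
records = name_fields(rows)
```

A log between any two steps shows exactly what that step produced, and adding or removing a step means adding or removing one line.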
"But that isn't performant!" someone may cry. Well, maybe it's just a bit less
performant than the original chained calls ('cause the original wouldn't create
and destroy a stack frame for each step, it would just pile them up and then
unstack them all in the end), but heck, optimization is for compilers, not people. Your job
is to make the code _readable_ and _understandable_. If you need performance,
you can think of a better sequence of steps, not some "let's make this a mess
to read" solution.
Just a quick note: Although the famous paper mentions that the number is
around 7, newer research actually points to a number way lower than that,
around 4. So simply making `func_1` call `func_2`, which would call
`func_3`, which would call `func_4` may be enough to overload someone and make
them lose track of what the code does.
{{ chapters(prev_chapter_link="/books/things-i-learnt/data-flow", prev_chapter_title="Thinking Data Flow Beats Patterns", next_chapter_link="/books/things-i-learnt/cognitive-cost", next_chapter_title="Cognitive Cost Is The Readability Killer") }}

35
content/books/things-i-learnt/outside-project/index.md

@ -0,0 +1,35 @@
+++
title = "Things I Learnt The Hard Way - Don't Mess With Things Outside Your Project"
date = 2019-06-25
[taxonomies]
tags = ["en-au", "books", "things i learnt", "frameworks"]
+++
Simple rule: Is the code yours or from your team? Good, go break it. Does it
come from outside? DON'T. TOUCH. IT.
<!-- more -->
Sometimes people are tempted to, instead of using the proper extension tools,
change external libraries/frameworks -- for example, making changes directly
into WordPress or Django. Believe me, I've seen my fair share of this kind of
stuff going around.
This is an easy way to turn the project -- the team project, that is --
into a huge security problem. As soon as a new version is released, you -- or,
better yet, someone who was not the person who decided to mess with outside
code -- will have to keep your changes in sync with the main project and,
pretty soon, you'll find that the changes don't apply anymore and you'll leave
the external project at an old version, full of security bugs.
Not only would you end up with something that may very soon put your whole
infrastructure at risk, you also won't get any benefits from the new versions,
'cause hey, you're stuck in the broken version!
Sometimes doing so is faster and cheaper, and doing the same thing using
extensions or actually coding around the problem, even duplicating the
framework functions, would probably take longer and make you write more
code, but in the long run, it's worth the time.
{{ chapters(prev_chapter_link="/books/things-i-learnt/use-structures", prev_chapter_title="If Your Data Has a Schema, Use a Structure", next_chapter_link="/books/things-i-learnt/resist-easy", next_chapter_title="Resist The Temptation Of Easy") }}

38
content/books/things-i-learnt/patterns-not-solutions/index.md

@ -0,0 +1,38 @@
+++
title = "Things I Learnt The Hard Way - Design Patterns Are Used to Name Solutions, Not Find Them"
date = 2019-06-25
[taxonomies]
tags = ["en-au", "books", "things i learnt", "design patterns"]
+++
Most of the times I saw design patterns being applied, they were applied as a
way to find a solution, so you end up twisting a solution -- and, sometimes,
the problem itself -- to fit the pattern.
<!-- more -->
My guess is that the heavy use of "let's apply _this_ design pattern" before
even understanding the problem -- or even trying to solve it -- comes as a
form of [cargo cult](/books/things-i-learnt/cargo-cult): I heard people used
this pattern and solved their problem, so let's use it too and it will solve
our problem. Or, worse: The design pattern was described by _Famous Person_,
so we must use it.
Here is the thing: Design patterns should _not_ be used as a way to find
solutions to problems. You may use some of them as a base for your solution,
but you must focus on the _problem_, not the _pattern_.
"Will a visitor pattern solve this?" is the wrong question. "What should we
do to solve our problem?" is the real question. Once you've gone there and
solved the problem, you may look and see if it is a visitor pattern -- or
whatever pattern. If it isn't, that's alright, 'cause you _solved the
problem_. If it is... well, congratulations, you now know how to name your
solution.
I've seen this happening a lot: People have a problem; people decide to use a
pattern; the pattern doesn't actually solve the problem (not at the 100% mark,
but above 50%); what happens then is that people start twisting the problem to
fit the pattern or, worse, add new layers to transform the problem into the
pattern.
{{ chapters(prev_chapter_link="/books/things-i-learnt/gherkin", prev_chapter_title="Gherkin Is Your Friend to Understand Expectations", next_chapter_link="/books/things-i-learnt/data-flow", next_chapter_title="Thinking Data Flow Beats Patterns") }}

31
content/books/things-i-learnt/resist-easy/index.md

@ -0,0 +1,31 @@
+++
title = "Things I Learnt The Hard Way - Resist The Temptation Of Easy"
date = 2019-07-01
[taxonomies]
tags = ["en-au", "books", "things i learnt", "ides"]
+++
Sure, that IDE will help you with a ton of autocomplete stuff and let you
easily build your project, but do you understand what's going on?
<!-- more -->
I'm not denying the fact that IDEs make things easier. The fact is, you should
not rely heavily on their features.
I mentioned before that you should at least know how to [run tests on the
command line](/books/things-i-learnt/tests-in-the-command-line) and the same
applies to everything in IDEs: how to build, how to run, how to run tests and,
let's be honest here, how to find proper names for your variables and
functions. 'Cause, sure, it's nice that the IDE can complete all the names of
the functions, but if the autocomplete feature was off, would you know which
function you need? In other words, have you thought at least 10 seconds about
a good name for your function so you _won't_ need to use autocomplete to
remember its name?
These days, IDEs can autocomplete almost everything, from function names to
even how to name your variables. But using the autocomplete is not always a
good solution. Finding better names is.
{{ chapters(prev_chapter_link="/books/things-i-learnt/outside-project", prev_chapter_title="Don't Mess With Things Outside Your Project", next_chapter_link="/books/things-i-learnt/use-timezones", next_chapter_title="Always Use Timezones With Your Dates") }}

24
content/books/things-i-learnt/right-tool-agenda/index.md

@ -0,0 +1,24 @@
+++
title = "Things I Learnt The Hard Way - \"Right Tool For The Job\" Is Just To Push An Agenda"
date = 2019-06-25
[taxonomies]
tags = ["en-au", "books", "things i learnt", "right tool", "agenda"]
+++
A lot of times I heard "We should use the right tool for the job!" Most of
those times it was just a way to push an agenda.
<!-- more -->
When someone claims we should use the "right tool", the sentence means there is
a right tool and a wrong tool to do something -- e.g., using a certain
language/framework instead of the current language/framework.
But sadly, none of those times was it really the "right tool". Most of the
time, the person saying we should use the "right tool" was trying to push
their own favourite language/framework, either because they disliked the
current language/framework or because they don't want to push the "hero
project".
{{ chapters(prev_chapter_link="/books/things-i-learnt/cargo-cult", prev_chapter_title="Understand And Stay Away From Cargo Cult", next_chapter_link="/books/things-i-learnt/right-tool-obvious", next_chapter_title="The Right Tool Is More Obvious Than You Think") }}

29
content/books/things-i-learnt/right-tool-obvious/index.md

@ -0,0 +1,29 @@
+++
title = "Things I Learnt The Hard Way - The Right Tool Is More Obvious Than You Think"
date = 2019-06-25
[taxonomies]
tags = ["en-au", "books", "things i learnt", "right tool"]
+++
Maybe you're in a project that needs to process some text. Maybe you're
tempted to say "Let's use Perl" 'cause you know that Perl is very strong in
processing text.
But that may still be not the right tool.
<!-- more -->
Although Perl is an amazing tool to process files, providing every single
switch and option you'll ever need, you're missing something: You're working
in a C shop. Everybody knows C, not Perl.
Sure, if it is a small, "on the corner" kind of project, it's fine for it to
be in Perl; if it is important for the company, it's better if it is a C
project.
One of the reasons your hero project may fail is this: You may even prove that
what you thought was a better solution is actually a better solution, but it
can't be applied 'cause nobody else can maintain it.
{{ chapters(prev_chapter_link="/books/things-i-learnt/right-tool-agenda", prev_chapter_title="Right Tool For The Job Is Just To Push An Agenda") }}

16
content/books/things-i-learnt/use-structures/index.md

@ -55,4 +55,18 @@ every time.
So: Your data has a schema? Use a Data Class or Class or Struct. Only if it is
schemaless, then you can use a tuple.
{{ chapters(prev_chapter_link="/books/things-i-learnt/data-types", prev_chapter_title="Types Say What Your Data Is", next_chapter_link="/books/things-i-learnt/languages-are-more", next_chapter_title="A Language Is Much More Than A Language") }}
I've seen this used at least once. Sure, at the very start of the project, it
may seem easier to just store the data as a tuple and destructure it and build
it again when needed. There was even a whole module designed to receive
tuples, destructure them and rebuild new ones (for example, a function that
would receive two tuples and compute the sum of the "value" field of each,
building a new tuple as a result). But because of this design, to add just one
new field, I had to change 14 files and make 168 changes around -- 'cause, sure,
there was a function to add two tuples, but there were points where you needed
just one field, and there wasn't a function for that.
It would have been easier to use if there were functions to extract each
field, and add two tuples, and whatever else was needed for managing the
tuples, but then you have to ask yourself: Why not use a class for that?
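As a rough sketch of the difference, here is the same data as a tuple and as a data class. The `Entry` class, its fields and `add_entries()` are hypothetical; the only thing taken from the story above is the "value" field being summed.

```python
from dataclasses import dataclass

# The tuple version: every user of the data must know the field positions.
first_tuple = ("rent", 10)
_, first_value = first_tuple  # breaks everywhere once a third field appears


@dataclass(frozen=True)
class Entry:
    """Hypothetical record with named fields."""
    label: str
    value: int
    currency: str = "USD"  # a field added later, with a default value


def add_entries(left, right):
    """Sum the "value" field of two entries, building a new entry."""
    return Entry(label=left.label, value=left.value + right.value)


first = Entry(label="rent", value=10)
second = Entry(label="food", value=5)
total = add_entries(first, second)
```

The code reading `first.value` doesn't care how many fields exist or in which order, so adding `currency` with a default doesn't force a change on the existing callers.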
{{ chapters(prev_chapter_link="/books/things-i-learnt/data-types", prev_chapter_title="Types Say What Your Data Is", next_chapter_link="/books/things-i-learnt/outside-project", next_chapter_title="Don't Mess With Things Outside Your Project") }}

40
content/books/things-i-learnt/use-timezones/index.md

@ -0,0 +1,40 @@
+++
title = "Things I Learnt The Hard Way - Always Use Timezones With Your Dates"
date = 2019-07-01
[taxonomies]
tags = ["en-au", "books", "things i learnt", "dates", "timezones"]
+++
It doesn't matter if the date you're receiving is in your local timezone and
you'll display it in your timezone. Sooner or later, the fact that you ignored
there was a timezone behind that date will hurt you.
<!-- more -->
(Note: For most of this post, when I say "date" you can think of "date and
time", although the date should also be timezone aware.)
At some point in my professional life, ignoring timezones was easy: You just
picked the date, threw it in the database, then read it back and everybody was
happy.
But things are not like this anymore. People will access your site from far
away locations, the source of the date may not be in the same timezone as your
system, your system may be running in a completely different timezone from your
dev machine (it's pretty common to run things in our machines in the local
timezone but the production system will run in UTC), the display may be in a
completely different timezone than your production and dev machines and so on.
So always carry the timezone with the data. Find modules/classes that support
dates with timezones (a.k.a. make things _timezone aware_), capture the
timezone as soon as possible and carry it around in all operations.
Modules/classes that don't support timezones for dates/times should, as soon
as possible, be removed from the system.
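A minimal sketch using Python's standard `datetime` module. The fixed offsets (UTC-3 for the source, UTC+10 for the display) and the date itself are assumptions picked for the example; real code would more likely use named timezones.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the input is local time at a fixed UTC-3 offset.
source_tz = timezone(timedelta(hours=-3))

naive = datetime(2019, 7, 1, 9, 30)       # no timezone: 9:30 where?
aware = naive.replace(tzinfo=source_tz)   # capture the timezone early

# Carry the aware value around; convert only where it is needed.
in_utc = aware.astimezone(timezone.utc)

# Assumption: whoever is looking at the date is at UTC+10.
display_tz = timezone(timedelta(hours=10))
for_display = in_utc.astimezone(display_tz)
```

The three aware values show different hours, but they all represent the same instant, so comparing them works as expected. The naive one can't even be compared with them for ordering: Python raises a `TypeError`.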
Developers a bit more seasoned -- and by "seasoned" I mean "had to deal with
times before" -- will probably claim "Hey, this is _obvious_!" And I'd have to
agree. But it's annoying how many times I got bitten by some stupid bug 'cause
we decided that "well, everything is in the same timezone, so it's all good".
{{ chapters(prev_chapter_link="/books/things-i-learnt/resist-easy", prev_chapter_title="Resist The Temptation Of Easy", next_chapter_link="/books/things-i-learnt/use-utf8", next_chapter_title="Always Use UTF-8 For Your Strings") }}

55
content/books/things-i-learnt/use-utf8/index.md

@ -0,0 +1,55 @@
+++
title = "Things I Learnt The Hard Way - Always Use UTF-8 For Your Strings"
date = 2019-07-01
[taxonomies]
tags = ["en-au", "books", "things i learnt", "utf-8"]
+++
Long gone are the days where [ASCII](https://en.wikipedia.org/wiki/ASCII) was
enough for everyone. Long gone are the days where you can deal with strings
with no "weird" or "funny" characters.
<!-- more -->
I was born in a time when the only encoding we had was ASCII. You could encode
all strings in sequences of bytes, 'cause all characters you could use were
encoded from 1 to 255 (well, from 32 [space] to 126 [tilde] and you
still had a few latin-accented characters in some higher positions, although
not all accents were there).
Today, accepting characters beyond that is not the exception, but the norm. To
cope with all that, we have things like
[Unicode](https://en.wikipedia.org/wiki/Unicode) and
[UTF-8](https://en.wikipedia.org/wiki/UTF-8) for encoding that in reasonable
memory space (UTF-16 is also a good option here, but that would depend on your
language).
So, as much as you want to keep your system simple, you will have to keep the
internal representation of your strings in UTF-8/UTF-16. Surely, you may not
receive the data as UTF-8/UTF-16, but you'll have to encode it and keep
transmitting it around as UTF-8/UTF-16 till you have to display it, at which
point you'll convert from UTF-8/UTF-16 to whatever your display supports
(maybe it even supports displaying in UTF-8/UTF-16, so you're good already).
At this point, I believe most languages do support UTF-8, which is great. You
may still have problems with inputs coming from other systems that are not
UTF-8 (old Windows versions, for example), but that's fairly easy to convert
-- the hard part is figuring out the input _encoding_, though. Also, most
developers tend to ignore this and only accept ASCII characters, or ignore
UTF-8/whatever-encoding and get a bunch of weird characters in their output,
'cause they completely ignored the conversion on the output point. That's why
I'm repeating the mantra of UTF-8: To remind you to always capture your input,
encode it in UTF-8 and _then_ convert on the output.
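A small sketch of that mantra in Python. The input bytes and the assumption that they came from a Latin-1 (ISO-8859-1) system are made up for the example.

```python
# Assumption: these bytes came from a system that uses Latin-1.
incoming = b"caf\xe9"

# Capture the input: decode it using the encoding of the SOURCE.
text = incoming.decode("latin-1")

# Convert on the output: here the destination wants UTF-8.
outgoing = text.encode("utf-8")

# Ignoring the conversion is how the "weird characters" show up:
# these are UTF-8 bytes being read as if they were Latin-1.
garbled = outgoing.decode("latin-1")
```

`text` holds "café", while `garbled` holds the well-known "cafÃ©" kind of mess, which is what you get when the conversion at one of the ends is skipped.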
One thing to keep in mind is that UTF-8 is not a "cost free" encoding like
ASCII: While in ASCII, to move to the 10th character, you'd just jump 10 bytes
from the start of the string, with UTF-8 you can't, due to some characters
being encoded as two or more bytes (you should read the Wikipedia page; the
encoding is pretty simple and makes a lot of sense). Because of this, you can't
simply jump 10 bytes, 'cause you may end up in the second byte of a
multi-byte character. Reaching a given position requires traversing
the string character by character, instead of simply jumping straight to the
proper position. But that's a price worth paying, in the long run.
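To see the difference between characters and bytes, here is a short sketch with a made-up four-character word ("ação", written with escapes so the example doesn't depend on the encoding of this file).

```python
text = "a\u00e7\u00e3o"          # "ação": 4 characters
encoded = text.encode("utf-8")   # "ç" and "ã" take two bytes each

char_count = len(text)
byte_count = len(encoded)

# Indexing the string gives the third character...
third_char = text[2]

# ...but indexing the bytes lands in the middle of "ç".
third_byte = encoded[2:3]
try:
    third_byte.decode("utf-8")
    lands_mid_character = False
except UnicodeDecodeError:
    lands_mid_character = True
```

The string has 4 characters but 6 bytes, and the byte at position 2 is not a character on its own, so decoding it fails.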
{{ chapters(prev_chapter_link="/books/things-i-learnt/use-timezones", prev_chapter_title="Always Use Timezones With Your Dates", next_chapter_link="/books/things-i-learnt/languages-are-more", next_chapter_title="A Language Is Much More Than A Language") }}