Posting about Rust in real life

2 years ago · a6281ac87f
2 changed files with 287 additions and 0 deletions
--- a/content/thoughts/real-life-rust.md
+++ b/content/thoughts/real-life-rust.md
@ -0,0 +1,138 @@
 +++
 title = "Rust in Real Life"
 date = 2022-07-26
 [taxonomies]
 tags = ["rust"]
 +++
 For a while, I've been talking about Rust, making presentations, going to
 meetups...
 But a few months back I finally had the opportunity to work on a real
 project in Rust.
 So, how was it?
 <!-- more -->
 ## Cargo is magic
 The first application where I used Rust was a small part of a bigger project.
 I had to capture the values coming in over a websocket and store them in a
 database.
 There were two options for languages straight away: Python and C. Python was
 being used in other parts of the company, so it would have more eyes in case
 something went wrong. C was used in another application of the same project, so
 I could keep the project itself in a single language. Both languages had a
 couple of problems: I wasn't sure if Python could handle the load of a
 continuous websocket stream, and I didn't want to write my own websocket
 client and JSON parser in C.
 And that's why I picked Rust for this application: I had the performance of C
 with a very good package manager, plus a thousand packages already available.
 So Cargo was the thing that drove the inclusion of Rust in the project. And the
 language proved quite capable, as the application kept running to the point we
 forgot it was running.
 ## `.unwrap()` is the enemy
 I point out in my presentations how you can use `.unwrap()` (and `.expect()`)
 to avoid dealing with errors, and although that would close your application,
 you have total control over *where* it can close itself (compared to a
 NullPointerException or reading NULL values or not capturing the proper
 exceptions). But, in the end, `.unwrap()` will hurt you. Badly.
 That happened in the second application I wrote: The main part of the
 application was reading a bunch of bytes, and the meaning of those bytes was
 in the bits themselves, in a combination of bitmap and UTF-8-like numbers. But
 it wasn't simply parsing that was involved: There was a socket to be read, and
 the parsed data had to be stored in a database, and there were the usual
 problems involved in that -- the socket could be closed on the server side, we
 could lose connectivity, the parser could produce weird values in case of a
 missed bit, which couldn't be stored in the database...
 For all the possible problems (which are pretty clear, as `Result` is the base
 for almost everything), and because I was in a hurry to deliver the
 application, I sprinkled a lot of `.expect()` calls around -- again, with the
 idea that, if it crashes, at least I told it it could crash, and it would give
 me a somewhat traceable message. The reality is that issues happened with such
 frequency (especially the parser receiving weird bits that would produce weird
 values) that the application would not run for very long.
 The solution to these constant crashes was quite simple, although laborious:
 replace every `.unwrap()` and `.expect()` with `if let Ok(_)` and `match`. That
 gave me total control over how to deal with unexpected values/results. The
 result was that the application ran nonstop for days, to the point that we,
 again, forgot it was running -- except when the data changed and we needed to
 update our filters.
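
The replacement described above can be sketched roughly as follows. This is only an illustration, not the code from the project: `ParseError`, `decode_varint` and `process` are hypothetical names, and the LEB128-style encoding (7 payload bits per byte, high bit meaning "more bytes follow") is an assumption standing in for the "UTF-8-like numbers" of the real protocol.

```rust
/// Hypothetical error type for this sketch; the real application's errors are not shown.
#[derive(Debug, PartialEq)]
enum ParseError {
    /// The input ended while the last byte still announced a continuation.
    Truncated,
    /// The number used more bytes than this sketch accepts.
    Overflow,
}

/// Decodes a LEB128-style number (an assumed format, see above).
/// Returns the decoded value and how many bytes were consumed.
fn decode_varint(bytes: &[u8]) -> Result<(u32, usize), ParseError> {
    let mut value: u32 = 0;
    for (i, byte) in bytes.iter().enumerate() {
        // Accept at most 4 bytes (28 bits), so the shift below never loses bits.
        if i >= 4 {
            return Err(ParseError::Overflow);
        }
        value |= u32::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
    }
    Err(ParseError::Truncated)
}

/// Processes a batch of packets, skipping the bad ones instead of panicking.
/// Returns how many packets were accepted and how many were dropped.
fn process(packets: &[&[u8]]) -> (usize, usize) {
    let mut stored = 0;
    let mut dropped = 0;
    for packet in packets {
        // With `.unwrap()` here, the first weird packet would end the program.
        match decode_varint(packet) {
            Ok((value, _consumed)) => {
                // A real application would write `value` to the database here.
                let _ = value;
                stored += 1;
            }
            Err(err) => {
                eprintln!("dropping packet {:?}: {:?}", packet, err);
                dropped += 1;
            }
        }
    }
    (stored, dropped)
}
```

The point of the `match` is that a bad packet becomes one logged line and one skipped record, while the loop keeps going.
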
 ## Cargo again
 In this second application, there were a bunch of little finicky things in the
 protocol that were really hard to grasp. Fortunately, we captured some packets
 from the service, which allowed us to test the parser locally. All I needed was
 something to give me a harness to throw those bits at and see how the code
 would process them.
 With C, this would probably mean building another executable for testing and
 running it instead of the real executable (and, to be honest, that's what Rust
 does) but Cargo hid all the complexities of getting this done. I just dropped a
 `test.rs` into my modules, marked it as `#[cfg(test)]` (meaning, build this
 only if the configuration is the test configuration), and `cargo test` would
 build the code and run the tests.
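
A minimal sketch of that setup, assuming a hypothetical `parse_length` function and a made-up three-byte packet in place of the real captures. For brevity the test module is written inline here; with a separate `test.rs` file, as described above, the parent module would instead declare it with `#[cfg(test)] mod test;`.

```rust
/// Hypothetical parser piece: reads a big-endian u16 length from the first
/// two bytes of a packet, or returns `None` if the packet is too short.
fn parse_length(packet: &[u8]) -> Option<u16> {
    match packet {
        [hi, lo, ..] => Some(u16::from_be_bytes([*hi, *lo])),
        _ => None,
    }
}

// Compiled only when building in the test configuration (`cargo test`).
#[cfg(test)]
mod tests {
    use super::*;

    // Stand-in for a packet captured from the service (invented bytes).
    const CAPTURED: &[u8] = &[0x01, 0x2C, 0xFF];

    #[test]
    fn reads_length_from_captured_packet() {
        // 0x012C is 300 in decimal.
        assert_eq!(parse_length(CAPTURED), Some(300));
    }

    #[test]
    fn rejects_truncated_packet() {
        assert_eq!(parse_length(&[0x01]), None);
    }
}
```

Running `cargo test` builds the crate with the test configuration enabled and runs every function marked `#[test]`; a regular `cargo build` leaves the whole module out.
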
 The fact that I had a testing framework and a test runner just there was a huge
 help, especially when things broke down.
 ## Should've `try`ed more
 One of the side-effects of switching every `.unwrap()` and `.expect()` for some
 explicit error management was the increase in indentation -- 'cause *all* I did
 was this replacement, without breaking things into smaller functions.
 Rust has the `try` operator -- `?` -- but that requires the function using it
 to return a `Result`, which I kinda neglected in the first pass 'cause, well,
 the only exit on all functions was success, and failure meant `panic!()` (due
 to `.unwrap()`).
 If I had been using `Result` as the return value from the start, I have the
 impression that the code would not have been a mess of 7-8 indentation levels.
 So, another thing I would have "gained" if I hadn't used `.unwrap()`.
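
To illustrate how `?` keeps the code flat, here is a small sketch under invented assumptions: the `"id,value"` record format, `RecordError` and `parse_record` are all hypothetical and only serve to show the shape of a function that returns `Result` from the start.

```rust
use std::num::ParseIntError;

/// Hypothetical error type for this sketch.
#[derive(Debug)]
enum RecordError {
    MissingField,
    BadNumber(ParseIntError),
}

// Lets `?` convert a `ParseIntError` into a `RecordError` automatically.
impl From<ParseIntError> for RecordError {
    fn from(err: ParseIntError) -> Self {
        RecordError::BadNumber(err)
    }
}

/// Parses a line in the invented `"id,value"` format.
/// Each `?` returns early with the error, so there is no nested `match`.
fn parse_record(line: &str) -> Result<(u32, i64), RecordError> {
    let mut fields = line.split(',');
    let id = fields
        .next()
        .ok_or(RecordError::MissingField)?
        .trim()
        .parse::<u32>()?;
    let value = fields
        .next()
        .ok_or(RecordError::MissingField)?
        .trim()
        .parse::<i64>()?;
    Ok((id, value))
}
```

The caller still decides what to do with the error (log it, skip the record, retry), but that decision lives in one place instead of at every indentation level.
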
 ## Async doesn't make sense till it does
 The third application in the project required a lot of I/O -- reading from
 multiple databases, sending data through a socket, writing again in the
 database... It seemed a perfect fit for an async experiment.
 In the initial version I wrote, I used tasks (async functions) the same way I
 did with threads. It initially produced a bunch of errors from the borrow
 checker that I couldn't figure out -- at this point, I could understand
 exactly why the borrow checker complained about something in an application
 using threads, but these errors were really confusing, to the point that I may
 have mentioned that "async is unnatural for Rust". And, when I did manage to
 avoid the borrow checker complaints, the performance was... abysmal. Something
 like 0.8 records processed per second, which was extremely low for what we
 expected.
 Due to this bad performance, I removed all the async things and used threads.
 That was in my ballpark -- I knew what I did wrong when the borrow checker
 complained -- and the performance did improve: Now it was processing 7 records
 per second.
 During the rewrite, I kept reading about async and how it works, till I came
 up with a mental model to work with async (more about this in a future post).
 I did manage to take some time later to actually apply this mental model --
 and then the errors from the borrow checker made sense, and I felt productive
 again. The result? 70 records per second, a whole 10x improvement over simple
 threads.
 ## Conclusion
 All that I learnt in a space of 6 months. I ended up switching jobs to a place
 that doesn't have anything in Rust (yet 😈), and although the road for Rust is
 a bit steep and with some tight corners, it is still worth going.
 (And, as far as I know, all those applications are *still* running...)
--- a/content/thoughts/real-life-rust.pt.md
+++ b/content/thoughts/real-life-rust.pt.md
@ -0,0 +1,149 @@
 +++
 title = "Rust na Vida Real"
 date = 2022-07-26
 [taxonomies]
 tags = ["rust"]
 +++
 Já faz algum tempo que eu tenho falado sobre Rust, fazendo apresentações, indo
 a meetups...
 Mas há alguns meses eu tive a oportunidade de finalmente trabalhar num projeto
 real em Rust.
 Então, como é que foi?
 <!-- more -->
 ## Cargo é mágico
 A primeira aplicação que eu usei Rust foi uma parte pequena de um grande
 projeto. Eu tinha que capturar valores vindos de um websocket e guardar os
 mesmos num banco de dados.
 Havia duas opções de linguagens que eu poderia usar: Python e C. Python já
 estava sendo usado em outras partes da empresa, e isso garantiria mais olhos
 caso algo desse errado. C estava sendo usado em outra aplicação do mesmo
 projeto, e eu poderia manter todo o projeto na mesma linguagem. Ambas
 linguagens tinham alguns problemas: Eu não tinha certeza que Python conseguiria
 lidar com a carga de dados de um stream contínuo do websocket e eu não queria
 ter que escrever meu próprio processamento de websocket e parser de JSON em C.
 E foi por isso que eu usei Rust nesta aplicação: Eu tinha a performance de C
 com um excelente gerenciador de pacotes, e mais milhares de pacotes
 disponíveis.
 Assim, Cargo foi quem decidiu o uso de Rust no projeto. E a linguagem se provou
 bem capaz, pois a aplicação ficou rodando até o ponto que acabamos esquecendo
 que ela estava rodando.
 ## `.unwrap()` é o inimigo
 Um ponto que eu faço nas minhas apresentações é que você pode usar `.unwrap()`
 (e `.expect()`) para evitar ter que lidar com erros, e embora o uso deste faça
 com que sua aplicação seja encerrada, você tem total controle sobre *onde* ela
 pode ser encerrada (comparado com um NullPointerException, ou ler valores NULL,
 ou não capturar as exceções corretas). Mas, no fim das contas, `.unwrap()` vai
 te machucar. Bastante.
 Foi o que aconteceu com a segunda aplicação que eu escrevi: A parte principal
 da aplicação era ler um conjunto de bytes, e o significado destes bytes estava
 nos bits que os compunham, em uma combinação de bitmaps e números num formato
 tipo UTF-8. Mas não era só o parsing que estava envolvido: Havia um socket a
 ser lido, e os dados parseados tinham que ser guardados num banco de dados, e
 havia os problemas usuais envolvidos nisso -- o socket poderia ser fechado
 pelo servidor, nós poderíamos perder a conexão de rede, o parser poderia
 produzir valores estranhos no caso de um bit perdido, que não poderia ser
 guardado no banco de dados...
 Para cada um dos problemas possíveis (que eram bem claros, já que `Result` é a
 base de quase tudo), e como eu estava com pressa para entregar a aplicação, eu
 usei um monte de `.unwrap()`s pelo código -- de novo, a ideia era que, se a
 aplicação crasheasse, pelo menos eu disse que ela podia crashear, e iria me dar
 uma mensagem mais ou menos indicando onde. A realidade é que problemas aconteciam
 com tal frequência (especialmente o parser recebendo bits estranhos que
 produziam valores estranhos) que a aplicação não ficava rodando por muito
 tempo.
 A solução para esses crashes constantes foi bem simples, embora trabalhosa:
 trocar todo `.unwrap()` e `.expect()` por `if let Ok(_)` e `match`. Isso me deu
 controle total do que fazer nos casos de valores não esperados. O resultado foi
 que a aplicação rodou sem problemas por dias, ao ponto que nós, mais uma vez,
 esquecemos que ela estava rodando -- exceto quando os dados de entrada mudavam
 e nós tínhamos que atualizar nossos filtros.
 ## Cargo de novo
 Nesta segunda aplicação, havia um monte de coisinhas chatas no protocolo que
 eram realmente complicadas de entender. Felizmente nós conseguimos capturar
 alguns pacotes do serviço, que permitiria testar o parser localmente. Tudo que
 eu precisava era algo que me desse uma rede de suporte para jogar esses bits e
 ver como o código processaria.
 Com C, isso normalmente significa criar outro executável para os testes e rodar
 esse executável ao invés do executável real (e, pra ser honesto, é exatamente
 isso que o Rust faz) mas o Cargo escondeu toda a complexidade de fazer isso. Eu
 só criei `test.rs` nos meus módulos, marquei o mesmo com `#[cfg(test)]`
 (indicando que o mesmo só existe na configuração de teste) e `cargo test`
 compilou o código e rodou os testes.
 O fato que eu tinha um framework de testes e um executor de testes logo ali foi
 de grande ajuda, especialmente quando um problema era encontrado.
 ## Deveria ter tentado (`try`ed) mais
 Um dos efeitos colaterais de trocar todo `.unwrap()` e `.expect()` por alguma
 forma explícita de tratamento de erro foi o aumento da indentação do código --
 porque *tudo* que eu fiz foi fazer essa alteração, mas eu não quebrei o código
 em funções menores.
 Rust tem o operador `try` -- `?` -- mas isso requer que a função com o operador
 retorne um `Result`, que eu negligenciei na primeira passagem porque, bom, a
 única saída das funções era o sucesso e falhas significavam `panic!()` (por
 causa do `.unwrap()`).
 Se eu estivesse usando `Result` como resultado desde o começo, eu tenho a
 impressão que o código não ficaria uma bagunça com 7-8 níveis de indentação. Ou
 seja, outra coisa que eu teria "ganho" se eu não tivesse usado `.unwrap()`.
 ## Async não faz menor sentido até que faz
 A terceira aplicação no projeto precisava fazer um monte de I/O -- ler de
 vários bancos de dados, enviar dados por socket, escrever de volta no banco de
 dados... Parecia o perfeito experimento para um projeto async.
 Na primeira versão que eu escrevi, eu usei tasks (funções async) da mesma forma
 que eu faço com threads. Isso gerou um monte de erros do borrow checker que eu
 não conseguia entender o porquê -- neste ponto, eu já conseguia entender
 exatamente porque o borrow checker estava reclamando de alguma coisa numa
 aplicação com threads, mas os erros eram tão confusos que eu devo ter
 mencionado algo como "async não é natural em Rust". E, quando eu finalmente
 consegui evitar todas as reclamações do borrow checker, a performance foi...
 terrível. Algo como 0.8 registros por segundo, que era extremamente baixo para
 o que estávamos esperando.
 Com essa performance horrível, eu removi todas as coisas async e usei threads.
 Isso estava mais com o que eu estava acostumado -- eu sabia exatamente o que eu
 tinha feito de errado quando o borrow checker reclamava de algo -- e a
 performance melhorou: O processamento passou a 7 registros por segundo.
 Enquanto eu estava nessa reescrita, eu fiquei lendo sobre async e como
 funciona, até que eu consegui formar um modelo mental para trabalhar desta
 forma (mais sobre isso num post futuro). Eu consegui ter algum tempo para
 efetivamente aplicar esse modelo mental -- e aí os erros do borrow checker
 começaram a fazer sentido, e eu me senti produtivo de novo. O resultado? 70
 registros por segundo, uma melhoria de 10x sobre o uso de threads.
 ## Conclusão
 Tudo isso eu aprendi num espaço de 6 meses. Eu acabei trocando de emprego para
 um lugar onde não há nada em Rust (por enquanto 😈), e embora a estrada do Rust
 seja íngreme e cheia de curvas fechadas, eu ainda acho que vale a pena.
 (E, até onde eu sei, todas as aplicações aqui *continuam* rodando...)
 <!-- 
 vim:spelllang=pt:
 -->