Julio Biason
1 year ago
4 changed files with 189 additions and 0 deletions
@@ -0,0 +1,104 @@
+++
title = "Overthinking Rust Iterators"
date = 2023-07-06

[taxonomies]
tags = ["rust", "iterators", "request", "stream"]
+++

I recently had an issue with Rust iterators, and that led me to think *a lot*
about iterators in Rust.

<!-- more -->

What I wanted to do was something not exactly straightforward in Rust:

- The issue involved an external REST API;
- The API returns the data in chunks, providing a paging mechanism;
- The API indicates that there is more data with a `next` field, which either
  has the URL for the next page or an empty string if you're on the last page
  and there is no more data;
- On my side, I wanted something akin to this (which is basically an iterator,
  anyway):

```rust
let service = Service(connection_information);
// `mut` is needed because `.next()` takes `&mut self`
let mut data = service.data(); // This provides the iterator
while let Some(record) = data.next() {
    do_something(&record);
}
```

- The `.data()` iterator would get the first page and start iterating over
  those results;
- Once the results were all consumed, if the API informed that there was more
  data, the iterator (or *something*) would request more information, adjust
  itself for the new data and just keep chugging till all the data was
  produced.

Notice that the iterator I want has two sides: One is to spew information from
the previous request out of memory/cache; the other is requesting (or
triggering a request somewhere) more data.

# Back to Iterators

Basic iterators work like this:

![](normal-iterator.png "A basic view of an iterator")

... in which you have a dataset, create an iterator over it, and each call of
`.next()` advances the iterator to the next element of the data and returns a
reference to that element; once it reaches the end of the data, it returns
`None`, indicating that there is no more data.

The fun thing about iterators is that they need to hold their own state: Which
is the current element that I'm pointing to? The `.next()` method receives a
mutable reference to `self` exactly because of this: It changes its state on
each call.
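
To make that state-keeping concrete, here is a minimal sketch of a hand-written iterator over a slice. The names `Walker` and `pos` are made up for this illustration, and this is not how the standard library implements its own slice iterator; it is just the same idea in the smallest form I could think of.

```rust
/// A hand-rolled iterator over a slice (hypothetical, for illustration only).
/// `pos` is the internal state that `next()` has to mutate.
struct Walker<'a, T> {
    data: &'a [T],
    pos: usize,
}

impl<'a, T> Iterator for Walker<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<Self::Item> {
        // `get` returns `None` once `pos` runs past the end of the data.
        let item = self.data.get(self.pos)?;
        // This is the state change that requires `&mut self`.
        self.pos += 1;
        Some(item)
    }
}
```

Creating `Walker { data: &values, pos: 0 }` over a `Vec` named `values` and calling `.next()` repeatedly walks through the elements by reference and then keeps returning `None`.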

What I need is, basically, an iterator that does that **and**, once it sees
`None`, retrieves more data and starts over. This raises the question: How does
the iterator get more data?

# The Fat Iterator Approach

The idea I had was to create a fat iterator that would "hold" its own data and
iterate over it.

![](fat-iterator.png "A fat iterator which has its own data")

Because the data is simply a `Vec<>`, I could do something like:

1. Pull data from the service;
2. Update the `data` inside the iterator;
3. Create a new iterator over said `data`;
4. Call `.next()` on the iterator till it returns `None`;
5. If there is more data, do the request and jump to 2.

If we jump back to the fact that `.next()` updates the iterator's internal
state, this means that I'd need to keep the data **and** its iterator in the
same structure. And that causes issues with the borrow checker, 'cause the
iterator would be borrowing from data owned by the very same structure (yes, it
feels like a problem with the borrow checker, but still).

The idea seems solid, except I'd be fighting the borrow checker to a degree I'm
not yet capable of.
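
For reference, the five steps above can be sketched without storing a borrowing iterator at all, assuming it is acceptable for the iterator to hand out owned records instead of references. Everything here is hypothetical: `Page`, `Paged` and the `fetch` closure are stand-ins for the real API client, there is no error handling, and I have not tried this shape against the real service.

```rust
use std::collections::VecDeque;

/// Hypothetical shape of one API response: the records plus the `next` URL
/// (an empty string on the last page).
struct Page<T> {
    records: Vec<T>,
    next: String,
}

/// A "fat" iterator that owns its buffer and refills it through `fetch`.
struct Paged<T, F>
where
    F: FnMut(&str) -> Page<T>,
{
    buffer: VecDeque<T>,
    next_url: String, // empty means "nothing left to request"
    fetch: F,
}

impl<T, F> Paged<T, F>
where
    F: FnMut(&str) -> Page<T>,
{
    fn new(first_url: &str, fetch: F) -> Self {
        Paged {
            buffer: VecDeque::new(),
            next_url: first_url.to_string(),
            fetch,
        }
    }
}

impl<T, F> Iterator for Paged<T, F>
where
    F: FnMut(&str) -> Page<T>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // Refill while the buffer is empty and the API says there is more.
        while self.buffer.is_empty() && !self.next_url.is_empty() {
            let page = (self.fetch)(&self.next_url);
            self.buffer.extend(page.records);
            self.next_url = page.next;
        }
        // Popping from an owned buffer needs no inner iterator, so the
        // struct never has to borrow from itself.
        self.buffer.pop_front()
    }
}
```

The trade-off in this sketch is that records are moved out of the buffer, so the caller gets values rather than references into cached data.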

# The "Request Someone Else" Iterator

The other idea I had (but couldn't figure out how it would work) was, instead
of having `service.data()` return an iterator, to have it return the data
holder, and *that* could create an iterator over itself. The weird thing about
this is that the iterator would need a mutable reference to the source data, so
it could call the parent when it reached the end of the data; the parent would
then get a new data source and the iterator would "reset itself" after calling
it -- which sounds more complex than it should.

(I could also make the parent holder have a `Cell<>` over data to have just
interior mutability over it, but again, sounds more complex than it should.)
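
A rough sketch of that shape, under loud assumptions: `Holder`, `HolderIter`, `refill()` and `pages_left` are invented names, and `pages_left` is just a stand-in for "ask the API again". It also copies the values out, because the standard `Iterator` trait can't hand out references into data that the iterator may replace later.

```rust
/// Hypothetical data holder: owns the current page and can fetch the next one.
struct Holder {
    data: Vec<u32>,
    pages_left: Vec<Vec<u32>>, // stand-in for the remote API; popped from the back
}

impl Holder {
    /// Replace `data` with the next page; returns `false` when there is none.
    fn refill(&mut self) -> bool {
        match self.pages_left.pop() {
            Some(page) => {
                self.data = page;
                true
            }
            None => false,
        }
    }

    /// The holder creates an iterator over itself.
    fn iter(&mut self) -> HolderIter<'_> {
        HolderIter { parent: self, pos: 0 }
    }
}

/// Iterator holding a mutable reference to its parent.
struct HolderIter<'a> {
    parent: &'a mut Holder,
    pos: usize,
}

impl<'a> Iterator for HolderIter<'a> {
    type Item = u32; // copied out of the parent's current page

    fn next(&mut self) -> Option<u32> {
        loop {
            if let Some(value) = self.parent.data.get(self.pos) {
                self.pos += 1;
                return Some(*value);
            }
            // End of the current page: call the parent, then "reset itself".
            if !self.parent.refill() {
                return None;
            }
            self.pos = 0;
        }
    }
}
```

While a `HolderIter` is alive, nothing else can touch the `Holder`, which is exactly the "two things mutably interacting" situation that makes this feel heavier than it should.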

# The Solution

Sorry, no solution (yet). I'm still tinkering with it and I'll update this
once I find something that works and doesn't require two (or more) things
(mutably) interacting with each other.
@@ -0,0 +1,85 @@
+++
title = "The Problem With Gradual Typing Examples"
date = 2023-07-12
draft = true

[taxonomies]
tags = ["programming languages", "typing", "static typing", "progressive typing"]
+++

[Gradual Typing](https://en.wikipedia.org/wiki/Gradual_typing) offers a mix of
static typing and dynamic typing: You can let the compiler/interpreter deal
with the types at runtime when you don't specify them, but you can also declare
the type of a variable and the compiler/interpreter will catch incompatible
type operations before the code runs.

But the examples I've seen always frame things, in my opinion, the wrong way.

<!-- more -->

The usual example of something bad for gradual typing is this kind of code:

```python
def add(a, b):
    return a + b
```

The issue is that there are multiple types of values that can be added. If we
consider just the primitive types of Python, we would get something like:

```python
from typing import Union

CanAdd = Union[str, float, int]

def add(a: CanAdd, b: CanAdd) -> CanAdd:
    return a + b
```

This code obviously breaks for other things that can be added. For example, you
can create a class, override its `__add__`/`+` operator and, thus, objects of
said class can be added. But because our primitives-only type doesn't list our
new class, the compiler/interpreter would never accept a call of `add()` with
those objects.

(There is another issue, sometimes cited, sometimes not, about the fact that
there is nothing saying that the type of `b` must be the same as the type of
`a` and thus one could add an integer to a string, which is... wrong.)

But I have a question:

**Is that a real world kind of function?**

Oh, don't get me wrong: Simple functions, with just as few lines, are quite
normal. Something that would be *real*[^1] would be something like:

```python
def item_total(qty: int, price: float) -> float:
    return qty * price
```

I guess this is more common than `add()`, because it is the kind of operation
that shows up all the time. And type checking it would catch a case like this:

```python
def receive_api_request(json_data):
    ...
    for item in json_data['items']:
        total += item_total(item['price'], item['qty'])
```

If you don't type check, this would produce exactly the same results, since the
multiplication doesn't care about the order. But the code is wrong: `price` and
`qty` are swapped in the call. If someone makes a small change in
`item_total()`, you'd end up with a strange situation in which only this call
produces the wrong results, while all other callers would (probably) produce
the correct result.

{% note() %}
Surely there are issues, nonetheless. Python, which I used in this example,
would still not accept something that could be automatically coerced to `int`,
even though no such interface exists, although they are working on supporting
such types with "protocols".
{% end %}

---

[^1]: For different values of "real".