The Crate of the Week is a Rust (programming language, not game) segment in the This Week in Rust newsletter that highlights a single Rust library in a given week. This helps Rust community members discover and promote crates. You can submit a crate suggestion as well as vote on crates in this forum thread by “liking” comments.
I was curious whether there were any crates that had never made it onto this-week-in-rust.org as a Crate of the Week, despite having a lot of upvotes, so I decided to take a look. I figured it would take an afternoon or so. Little did I know that it would take many afternoons for this busy stay-at-home-dad.
I downloaded all of the comments in the TWIR Crate of the Week thread as JSON via this url, which uses the ?print=true parameter. This, combined with a &page= parameter, lets you page through 1,000 posts in a topic thread at a time.
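As a rough illustration of that paging scheme, here is a minimal sketch that only builds the page URLs. The topic URL is a made-up placeholder, and appending .json to the topic path is an assumption about how the forum serves JSON; the function names are hypothetical and not taken from the repository.

```rust
/// Build the JSON URL for one page of a forum topic.
/// Assumption: the forum serves JSON when `.json` is appended to the topic path.
fn page_url(topic_url: &str, page: u32) -> String {
    format!(
        "{}.json?print=true&page={}",
        topic_url.trim_end_matches('/'),
        page
    )
}

/// Number of pages needed when each page holds up to `per_page` posts.
/// `per_page` must be greater than zero.
fn page_count(total_posts: u32, per_page: u32) -> u32 {
    (total_posts + per_page - 1) / per_page
}

fn main() {
    // Hypothetical topic URL and post count, for illustration only.
    let topic = "https://forum.example/t/crate-of-the-week/123";
    for page in 1..=page_count(2500, 1000) {
        println!("{}", page_url(topic, page));
    }
}
```

Each generated URL can then be fetched with whatever HTTP client you prefer.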
I then associated each URL with the number of likes it had received via its parent Post and sorted them by likes in descending order.
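The tallying step could look something like the sketch below. The Post struct here is a simplified stand-in with hypothetical field names, not the real forum schema, and summing likes when the same URL shows up in several posts is an assumption about how to combine them.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a forum post (hypothetical fields).
struct Post {
    likes: u32,
    links: Vec<String>,
}

/// Credit every link with the likes of its parent post, then sort by
/// likes in descending order. Ties are broken alphabetically by URL.
fn rank_links(posts: &[Post]) -> Vec<(String, u32)> {
    let mut totals: HashMap<&str, u32> = HashMap::new();
    for post in posts {
        for link in &post.links {
            // Assumption: likes are summed across posts sharing a link.
            *totals.entry(link.as_str()).or_insert(0) += post.likes;
        }
    }
    let mut ranked: Vec<(String, u32)> = totals
        .into_iter()
        .map(|(url, likes)| (url.to_string(), likes))
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}

fn main() {
    let posts = vec![
        Post { likes: 3, links: vec!["https://example.com/x".into()] },
        Post { likes: 5, links: vec!["https://example.com/y".into()] },
    ];
    for (url, likes) in rank_links(&posts) {
        println!("{},{}", likes, url);
    }
}
```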
Originally, I had meant to only compare links submitted within the same week, but eventually decided it just made more sense to look for the most popular links overall. Furthermore, I originally limited the urls to the single most clicked link in a given post, so one post would result in one link (if it had any). I regret both of those decisions, but oh well ¯\_(ツ)_/¯
I then began the arduous process of clearing out past crates of the week from the resulting list by searching the this-week-in-rust/content folder with ripgrep, then decided to automate it with a quick Ruby script, which also turned out to be more difficult than I bargained for. Trying to recreate my Ruby code in Rust only led to more hair tearing, so at this point I’m satisfied with having manually cleared out previously featured crates from the crate URLs with 5 or more likes.
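For the curious, the automated version of that clean-up might be sketched as follows. This is not the Ruby script or the Rust attempt from the repository. It assumes the issue files have already been read into strings, and the normalization rules (lowercasing, dropping the scheme, "www." and trailing slashes) are guesses at what it takes to make URLs comparable.

```rust
/// Reduce a URL to a comparable form: lowercase, no scheme, no "www.",
/// no trailing slash. These rules are assumptions, not a full URL parser.
fn normalize(url: &str) -> String {
    let lower = url.trim().to_lowercase();
    let no_scheme = lower
        .strip_prefix("https://")
        .or_else(|| lower.strip_prefix("http://"))
        .unwrap_or(lower.as_str());
    let no_www = no_scheme.strip_prefix("www.").unwrap_or(no_scheme);
    no_www.trim_end_matches('/').to_string()
}

/// Keep only the candidate URLs that appear in none of the issue texts.
/// Caveat: plain substring matching also hits longer URLs that share a prefix.
fn not_yet_featured<'a>(candidates: &[&'a str], issues: &[&str]) -> Vec<&'a str> {
    let issues: Vec<String> = issues.iter().map(|text| text.to_lowercase()).collect();
    candidates
        .iter()
        .copied()
        .filter(|url| {
            let needle = normalize(url);
            !issues.iter().any(|text| text.contains(&needle))
        })
        .collect()
}

fn main() {
    // Hypothetical data, for illustration only.
    let issues = ["This week's crate is [one](https://GitHub.com/a/one)."];
    let candidates = ["https://github.com/a/one/", "https://crates.io/crates/two"];
    for url in not_yet_featured(&candidates, &issues) {
        println!("{}", url);
    }
}
```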
I now have a much healthier respect for data munging. I might get back to it, but I need a break from this “simple” task for a little bit 😵💫
…for the top 25 crates, anyway. The full results can be found here: https://raw.githubusercontent.com/briankung/cotw_second_chance/main/output.csv
You can view my attempt to use Rust as a scripting language here: https://github.com/briankung/cotw_second_chance
I initially decided to do this entirely in Rust as much as possible because it seemed like a small task suitable for learning. For instance, instead of downloading the JSON separately, I decided to download the file in Rust using tokio and reqwest. That actually went off pretty much without a hitch.
Then I got a bit sidetracked figuring out that I didn’t have to declare every field in my JSON structure before I deserialized it with serde, though I did find the fantastic transform.tools website to help with that. Also, serde and its documentation seemed really opaque to me at first: was I supposed to implement Deserialize, and if so, then how? Finally, I pointed the right serde_json method at the JSON data and carried on. I originally wanted to make sure that the posts were in order and grouped by week, but I found the time crate to be similarly difficult to find examples for, and a bewilderingly empty book left me delaying the time-related features. Ironically, I didn’t group the posts by week in the end anyway, so I could have ignored the timestamps.
And finally, when I thought I had gotten something halfway acceptable, I still found a good number of crates that had already been CotW winners in my filtered list of crates. “How many incorrect entries could there be?” I wondered naively, and then began to search "crate_name site:this-week-in-rust.org" on Google and discovered that the answer to my question was “quite a lot, actually.” I ended up using ripgrep in the command line to search the this-week-in-rust repo, then got frustrated and jumped ship back to Ruby to automate it. After writing a script in my native tongue of Ruby, this is how I felt:
However, the biggest problem was that I underestimated how difficult it would be to beat the data I had into the proper shape. As it turns out, data munging is not a simple task. Comments aren’t uniform, URLs aren’t uniform, and even issues of This Week in Rust aren’t uniform. At one point I was searching for a markdown header like "# Crate of the Week" and discovered that one issue had it as "# Crate & Quote of the Week".
In the end, I did finally use a lot of libraries I’d been wanting to use and got the hang of a few more corners of the Rust world, so it was still a worthwhile endeavor. While there must be smarter ways to handle this problem, I’ve run out of enthusiasm for it, so I’m putting it on the back burner. You’re welcome to take a crack at it yourself! I’d love to hear about it if you do.
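If you do, the header mismatch mentioned above is one thing to plan for. Here is a minimal sketch of a more forgiving header check, assuming every variant starts with "Crate" and ends with "of the Week". Only the two header spellings quoted earlier come from real issues; the function names and the matching rule are hypothetical, and this is not the code in the repository.

```rust
/// True for markdown headers such as "# Crate of the Week" or
/// "# Crate & Quote of the Week" (any header level, case-insensitive).
fn is_cotw_header(line: &str) -> bool {
    let trimmed = line.trim();
    if !trimmed.starts_with('#') {
        return false;
    }
    let title = trimmed.trim_start_matches('#').trim().to_lowercase();
    title.starts_with("crate") && title.ends_with("of the week")
}

/// Collect the lines under the first matching header, stopping at the
/// next markdown header. Returns an empty list if no header matches.
fn cotw_section(issue: &str) -> Vec<&str> {
    issue
        .lines()
        .skip_while(|line| !is_cotw_header(line))
        .skip(1) // drop the header line itself
        .take_while(|line| !line.trim_start().starts_with('#'))
        .collect()
}

fn main() {
    // Hypothetical issue text, for illustration only.
    let issue = "# Updates\nnews\n# Crate & Quote of the Week\nThis week's crate is foo.\n# Jobs\nhiring";
    for line in cotw_section(issue) {
        println!("{}", line);
    }
}
```

Searching only inside that section, rather than the whole issue, would also avoid false matches on crates that were merely mentioned elsewhere in an issue.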
Oh, and…don’t forget to nominate crates in the Crate of the Week thread!