Blog Archives

The never-ending editor war (?)

This blog post was prompted by this tweet, asking an age-old question: @spacemacs (tweet by Bruno Rodrigues, @brodriguesco, May 16, 2019). This is actually a very important question, one that I have been asking myself for a long time. IDEs and plain text editors are very important tools for anyone writing code. Most working hours are spent within such a program,...

For posterity: install {xml2} on GNU/Linux distros

Today I removed my system’s R package and installed MRO instead. While re-installing all packages, I encountered one of the most frustrating error messages for someone installing packages from source: Error : /tmp/Rtmpw60aCp/R.INSTALL7819efef27e/xml2/man/read_xml.Rd:47: unable to load shared object '/usr/lib64/R/library/xml2/libs/xml2.so': libicui18n.so.58: cannot open shared object file: No such file or directory ERROR: installing Rd objects failed for package ‘xml2’ This library, libicui18n.so.58, is a...

Fast food, causality and R packages, part 2

I am currently working on a package for the R programming language; its initial goal was to simply distribute the data used in the Card and Krueger 1994 paper that you can read here (PDF warning). However, I decided that I would add code to perform diff-in-diff. In my previous blog post I showed how to set up the structure of your new package....

Fast food, causality and R packages, part 1

I am currently working on a package for the R programming language; its initial goal was to simply distribute the data used in the Card and Krueger 1994 paper that you can read here (PDF warning). The gist of the paper is to try to answer the following question: Do increases in minimum wages reduce employment? According to Card and Krueger’s paper from...

Historical newspaper scraping with {tesseract} and R

I have been playing around with historical newspaper data for some months now. The “obvious” type of analysis to do is NLP, but there is also a lot of numerical data inside historical newspapers. For instance, you can find tables that show the market prices of the day in L’Indépendance Luxembourgeoise. I wanted to see how easy it was to...

Get text from pdfs or images using OCR: a tutorial with {tesseract} and {magick}

In this blog post I’m going to show you how you can extract text from scanned pdf files, or pdf files where no text recognition was performed. (For pdfs where text recognition was performed, you can read my other blog post). The pdf I’m going to use can be downloaded from here. It’s a poem titled D’Léierchen (Dem Léiweckerche säi Lidd), written by Michel...

Pivoting data frames just got easier thanks to `pivot_wide()` and `pivot_long()`

There’s a lot going on in the development version of {tidyr}. New functions for pivoting data frames, pivot_wide() and pivot_long(), are coming, and will replace the current functions, spread() and gather(). spread() and gather() will remain in the package, though: You may have heard a rumour that gather/spread are going away. This is simply not true (they’ll stay around forever) but I...

Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit, part 2

In part 1 of this series I set up Vowpal Wabbit to classify newspapers content. Now, let’s use the model to make predictions and see whether, and how, we can improve it. Then, let’s train the model on the whole data. Step 1: prepare the data. The first step consists of importing the test data and preparing it. The test data need...

Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit

Can I get enough of historical newspaper data? Seems like I can’t. I already wrote four (1, 2, 3 and 4) blog posts, but there’s still a lot to explore. This blog post uses a new batch of data announced on Twitter: For all who love to analyse text, the BnL released half a million of processed newspaper articles. Historical news from 1841-1878. They directly...

Manipulating strings with the {stringr} package

This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for free here. This is taken from Chapter 4, in which I introduce the {stringr} package. Manipulate strings with {stringr} {stringr} contains functions to manipulate strings. In Chapter 10, I will teach you about regular expressions, but the functions contained in {stringr} allow you to already...
