Adventures in Learning Full Stack Web Development

Artsy Scraper CLI


For my CLI Data Gem project, I created the ArtsyScraperCli gem. This gem is a virtual art gallery that lists artwork genes and prints 10 works of art based on the user’s selection. Data for this gem is pulled from Artsy’s website using Selenium WebDriver. I chose Selenium WebDriver to scrape the data because Artsy’s website uses JavaScript to render its HTML. I also used the iterm-imgcat gem to print each artwork image in iTerm.

The greatest challenge was choosing an object-oriented design. I started out with 3 classes: Controller, Gene, and Artwork. After stubbing out some code and researching OO design patterns, I decided to add a Scraper class that would handle scraping both the genes and the artworks. Ever since I hit the OO portion of the Ruby curriculum, drawing things out has really helped me.
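To make the four-class split concrete, here is a minimal sketch of how Controller, Gene, Artwork, and a scraper could fit together. It is not the gem's actual code: the attribute names, the add_artwork and artwork_lines methods, and the FakeScraper stand-in (which returns canned data so the example runs without a browser) are all assumptions made for illustration.

```ruby
# Hypothetical sketch of the four-class design; names and attributes are assumed.
class Artwork
  attr_reader :title, :artist_name, :image_url

  def initialize(title, artist_name, image_url)
    @title = title
    @artist_name = artist_name
    @image_url = image_url
  end
end

class Gene
  attr_reader :name, :artworks

  def initialize(name)
    @name = name
    @artworks = []
  end

  # A gene collects the artworks scraped from its page.
  def add_artwork(artwork)
    @artworks << artwork
    self
  end
end

# Stand-in for the Selenium-backed Scraper: returns canned data instead of
# driving a browser, so the sketch runs without Chrome installed.
class FakeScraper
  def scrape_genes
    [Gene.new("Example Gene")]
  end

  def scrape_gene_artworks(gene)
    gene.add_artwork(
      Artwork.new("Example Title", "Example Artist", "https://example.com/image.jpg")
    )
    gene.artworks
  end
end

class Controller
  def initialize(scraper)
    @scraper = scraper
  end

  # Returns the lines the CLI would print for the gene at the given index.
  def artwork_lines(gene_index, limit: 10)
    gene = @scraper.scrape_genes.fetch(gene_index)
    @scraper.scrape_gene_artworks(gene).first(limit).map do |art|
      "#{art.title} by #{art.artist_name}"
    end
  end
end

puts Controller.new(FakeScraper.new).artwork_lines(0)
```

Passing the scraper into the Controller keeps the browser-dependent code in one place, so the rest of the program can be exercised with a stand-in like the one above.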


Debugging the Scraper

Scraping with Selenium WebDriver was a great learning experience. The initialize method starts a headless Chrome browser in the background and connects to it. The scrape_genes method loads the Artsy website and returns all the elements with the CSS class that contains the gene names.

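require 'selenium-webdriver'
class Scraper
  def initialize
    # Start headless Chrome in the background and connect to it
    options = Selenium::WebDriver::Chrome::Options.new(args: ['headless'])
    @driver = Selenium::WebDriver.for(:chrome, options: options)
  end

  def scrape_genes
    elements = @driver.find_elements(css: ".cf-categories__category")
    # Gene.new's first argument was lost from the original listing; the element text is assumed
    elements.each do |gene_element|
      Gene.new(gene_element.text, gene_element)
    end
  end
end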

While trying to drill down and scrape the artworks from each gene’s page, I ran into an error I had never seen:

stale element reference: element is not attached to the page document Selenium::WebDriver::Error::StaleElementReferenceError

It turns out that when Selenium WebDriver clicked on the gene element in order to scrape the artworks from the selected gene page, the page wasn’t loading fast enough. I added a while loop that checks for the data-loading CSS selector and lets the user know the page is still loading. In other words, as long as the .cf-artworks element has its data-loading attribute set to 'true', Ruby waits before scraping the artwork data.

  def scrape_gene_artworks(gene)
    # Wait for the page to load
    while @driver.find_elements(css: ".cf-artworks[data-loading='true']").any?
      # Wait before checking again (gene.name is assumed; the interpolated
      # expression was lost from the original listing)
      puts "The list of #{gene.name} artworks is loading..."
      sleep 0.1
    end

    art_elements = @driver.find_elements(css: ".artwork-item")
    art_elements.each do |art_element|
      title = art_element.find_element(css: ".artwork-item-title").text
      artist_name = art_element.find_element(css: ".artwork-item-artist").text
      image_url = art_element.find_element(css: "img").attribute("src")
      # Artwork.new is an assumed reconstruction of the garbled call
      Artwork.new(title, artist_name, image_url)
    end
  end
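One thing the while loop does not handle is a page that never finishes loading, in which case it would spin forever. Here is a rough sketch of a bounded version of the same polling idea. The wait_until helper, the WaitTimeoutError class, and the page_loaded lambda are hypothetical names I made up for this example, and the lambda fakes the data-loading check so the code runs without a browser.

```ruby
# Hypothetical error raised when the condition never becomes true in time.
class WaitTimeoutError < StandardError; end

# Polls the block until it returns a truthy value or the timeout elapses.
def wait_until(timeout: 10, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeoutError, "timed out after #{timeout} seconds"
    end

    sleep interval
  end
end

# Stand-in for the data-loading check: reports "still loading" on the
# first two polls and "loaded" on the third.
polls = 0
page_loaded = lambda do
  polls += 1
  polls > 2
end

wait_until(timeout: 2, interval: 0.01) { page_loaded.call }
puts "Page ready after #{polls} checks"
```

With a real driver, the block would presumably be the inverse of the loop condition, something like checking that find_elements for ".cf-artworks[data-loading='true']" comes back empty. The selenium-webdriver gem also ships a Selenium::WebDriver::Wait class with an until method that covers the same ground.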

OO Ruby really reduced my velocity when it came to finishing lessons and labs, but it taught me to slow down and let my creativity and curiosity guide my coding journey. Seeing artwork print out in iTerm after scraping it was very rewarding.