For my CLI Data Gem project, I created the ArtsyScraperCli gem. This gem is a virtual art gallery that lists artwork genes and prints 10 works of art based on the user’s selection. Data for this gem is pulled from Artsy.com using Selenium WebDriver. I chose Selenium WebDriver to scrape data because Artsy’s website uses JavaScript to render its HTML. I also used the gem iterm-imgcat to print each artwork image in iTerm.
The greatest challenge was choosing an object-oriented design. I started out with three classes: Controller, Gene, and Artwork. After stubbing out some code and researching OO design patterns, I decided to add a Scraper class that would handle scraping both the genes and the artworks. Since I hit the OO portion of the Ruby curriculum, drawing things out has really helped me.
Debugging the Scraper
Scraping with Selenium WebDriver was a great learning experience. The initialize method starts a browser in the background and connects to it. The scrape_genes method loads the Artsy collect page, finds every element with the class that holds a gene name, and returns a Gene object for each one.
require 'selenium-webdriver'

class Scraper
  def initialize
    # Start headless Chrome in the background and connect to it.
    options = Selenium::WebDriver::Chrome::Options.new(args: ['headless'])
    @driver = Selenium::WebDriver.for(:chrome, options: options)
  end

  def scrape_genes
    @driver.get('https://www.artsy.net/collect')
    # Each matching element holds one gene name.
    elements = @driver.find_elements(css: ".cf-categories__category")
    elements.map do |gene_element|
      Gene.new(gene_element.text, gene_element)
    end
  end
end
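To make the data flow easier to follow, here is a minimal sketch of what the Gene and Artwork objects could look like. It is inferred only from how they are called in the scraper (Gene.new with a name and an element, Artwork.new with a title, artist name, and image URL); the attribute names, the use of Struct, and the sample values are assumptions, and the real classes in the gem may look different.

```ruby
# Hypothetical sketch: attribute names are inferred from the calls
# Gene.new(name, element) and Artwork.new(title, artist_name, image_url).
Gene = Struct.new(:name, :element)
Artwork = Struct.new(:title, :artist_name, :image_url)

# Placeholder values; a plain string stands in for the Selenium element.
gene = Gene.new("Pop Art", "<element placeholder>")
artwork = Artwork.new("Untitled", "Unknown Artist", "https://example.com/image.jpg")

puts gene.name
puts "#{artwork.title} by #{artwork.artist_name}"
```

Keeping the Selenium element on the Gene is what lets the scraper click through to that gene's page later.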
While trying to drill down and scrape the artworks from each gene’s page, I ran into an error I had never seen:
stale element reference: element is not attached to the page document Selenium::WebDriver::Error::StaleElementReferenceError
It turns out that when Selenium WebDriver clicked on the gene element in order to scrape the artworks from the selected gene page, the page wasn’t loading fast enough, so the elements I was reading were no longer attached to the page. I added a while loop that uses a CSS selector to check for the data-loading attribute and lets the user know the page is still loading. In other words, as long as the .cf-artworks element has data-loading set to true, Ruby will wait on scraping the artwork data.
def scrape_gene_artworks(gene)
  gene.element.click
  # Wait for the page to load.
  while @driver.find_elements(css: ".cf-artworks[data-loading='true']").any?
    # Wait before checking again.
    puts "The list of #{gene.name} artworks is loading..."
    sleep 0.1
  end
  art_elements = @driver.find_elements(css: ".artwork-item")
  art_elements.map do |art_element|
    title = art_element.find_element(css: ".artwork-item-title").text
    artist_name = art_element.find_element(css: ".artwork-item-artist").text
    image_url = art_element.find_element(css: "img").attribute("src")
    Artwork.new(title, artist_name, image_url)
  end
end
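One thing to watch with the while loop above: if the data-loading attribute never clears, the loop never exits. Here is a minimal sketch of a bounded version of the same polling idea, using only the Ruby standard library. The wait_until helper and the FakePage class are hypothetical names made up for this illustration (FakePage stands in for the browser so the example runs without Selenium); they are not part of the gem.

```ruby
# Hypothetical helper: polls the block until it returns a truthy value,
# and raises if the timeout elapses first.
def wait_until(timeout: 10, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "timed out after #{timeout} seconds"
    end
    sleep interval
  end
end

# Stand-in for the page: reports "loading" for the first few checks.
class FakePage
  def initialize(checks_until_loaded)
    @remaining = checks_until_loaded
  end

  def loading?
    @remaining -= 1
    @remaining >= 0
  end
end

page = FakePage.new(3)
wait_until(timeout: 2, interval: 0.01) { !page.loading? }
puts "Artworks loaded"
```

In the scraper, the block would instead check that @driver.find_elements(css: ".cf-artworks[data-loading='true']") comes back empty. Selenium's Ruby bindings also include a Selenium::WebDriver::Wait class built for this kind of polling, which may be a better fit than a hand-rolled helper.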
OO Ruby really reduced my velocity in finishing lessons and labs, but it taught me to slow down and let my creativity and curiosity guide my coding journey. Seeing artwork print out in iTerm after scraping it was very rewarding.