Multitran Translation Scraper Already Operational!

I was able to build a translation scraper from scratch in less than a day. It uses Requests to download a URL, Beautiful Soup to parse the HTML, and Beautiful Soup's select function to find all translations on a Multitran (Мультитран) entry page. It wasn't the easiest project, because Multitran still uses '90s-style tables without much markup to hook onto. But I found a way by carefully studying the hyperlinks in the anchor tags of the translation entries. So that's our secret sauce. For now, it outputs the translations to the console; later, I'll collect them into a list and use it to replace unwanted translations of Russian terms with the desired one.
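The replacement step planned above could be sketched like this. The sample translations and the unwanted-to-preferred mapping are hypothetical placeholders, not actual Multitran output:

import requests  # not needed for this sketch, shown for context

# Hypothetical scraped translations for one Russian term
translations = ['spacecraft', 'space vehicle', 'cosmic flying apparatus']

preferred = 'spacecraft'                # the translation we want to keep
unwanted = {'cosmic flying apparatus'}  # clunky calques to swap out

# Build a cleaned list, substituting the preferred term for unwanted ones
cleaned = [preferred if t in unwanted else t for t in translations]
print(cleaned)

Running this prints the list with the calque replaced by the preferred term.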

import requests
from bs4 import BeautifulSoup

url = 'http://www.multitran.ru/c/m.exe?l1=2&l2=1&s=%EA%EE%F1%EC%E8%F7%E5%F1%EA%E8%E9%20%EB%E5%F2%E0%F2%E5%EB%FC%ED%FB%E9%20%E0%EF%EF%E0%F0%E0%F2'
# edit the url manually until an import function is developed
r = requests.get(url)
print(r.status_code)
# a status code of 200 means that everything is okay
soup = BeautifulSoup(r.content, 'html.parser')

translations = soup.select('a[href*="m.exe?t="]')
# the secret sauce: every translation link points at m.exe?t=
# (the attribute value must be quoted, since it contains ? and =)

for translation in translations:
    print(translation.text)
# prints out all translations

#That's a wrap!
# Copyright Peter Charles Gleason, 2017
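The "import function" mentioned in the comment above could be sketched like this. The helper name multitran_url and the default language codes are my assumptions; the encoding follows from the percent-escapes in the URL above, which decode as Windows-1251 (the encoding parameter of urllib.parse.quote handles this):

from urllib.parse import quote

def multitran_url(term, l1=2, l2=1):
    # Hypothetical helper: build the query URL instead of editing it by hand.
    # The Russian term is percent-encoded as Windows-1251 bytes (cp1251),
    # matching the escapes seen in Multitran's own URLs.
    return ('http://www.multitran.ru/c/m.exe?l1={}&l2={}&s={}'
            .format(l1, l2, quote(term, encoding='cp1251')))

print(multitran_url('космический летательный аппарат'))

For the term above, this reproduces the hand-pasted URL from the script.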

Onward and upward!
