I was able to build a translation scraper from scratch in less than a day. It uses Requests to download a URL and Beautiful Soup to parse the HTML, with Beautiful Soup's select function finding all translations on a Multitran (Мультитран) entry page. It wasn't the easiest project, because Multitran still uses 1990s-style tables without much markup to hook onto. But I found a way in by carefully studying the hrefs of the anchor tags on the translation entries. So that's our secret sauce.
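To show what that secret sauce looks like in isolation, here's a minimal sketch run against a tiny, made-up HTML fragment (the markup below is an assumption imitating Multitran's table style, not a real page): translation links all carry "m.exe?t=" in their href, while navigation links don't, so a substring attribute selector picks out only the translations.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating Multitran's table markup: one navigation
# link ("m.exe?a=...") and two translation links ("m.exe?t=...").
sample = """
<table>
  <tr><td><a href="m.exe?a=110">settings</a></td></tr>
  <tr><td><a href="m.exe?t=1234_2_1">spacecraft</a></td>
      <td><a href="m.exe?t=5678_2_1">space vehicle</a></td></tr>
</table>
"""

soup = BeautifulSoup(sample, 'html.parser')
# The substring selector matches only anchors whose href contains "m.exe?t="
translations = [a.text for a in soup.select('a[href*="m.exe?t="]')]
print(translations)  # → ['spacecraft', 'space vehicle']
```

Note that the attribute value is quoted inside the selector; unquoted `?` and `=` characters would trip up the CSS parser.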
For now, it outputs the translations to the console; later, I'll collect them into a list so I can replace undesired translations of Russian terms with the desired ones.
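The planned replacement step could be sketched like this (a rough idea only; the mapping and the scraped list below are hypothetical placeholders, not real Multitran data):

```python
# Hypothetical mapping from undesired renderings to preferred ones.
preferred = {'space vehicle': 'spacecraft'}

# Stand-in for the list of translations the scraper would collect.
scraped = ['spacecraft', 'space vehicle', 'cosmic aircraft']

# Swap each undesired translation for its preferred form, leaving others as-is.
cleaned = [preferred.get(t, t) for t in scraped]
print(cleaned)  # → ['spacecraft', 'spacecraft', 'cosmic aircraft']
```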
import requests
from bs4 import BeautifulSoup

# Edit the URL manually until an import function is developed.
url = 'http://www.multitran.ru/c/m.exe?l1=2&l2=1&s=%EA%EE%F1%EC%E8%F7%E5%F1%EA%E8%E9%20%EB%E5%F2%E0%F2%E5%EB%FC%ED%FB%E9%20%E0%EF%EF%E0%F0%E0%F2'

r = requests.get(url)
print(r.status_code)  # a status code of 200 means that everything is okay

soup = BeautifulSoup(r.content, 'html.parser')
# The secret sauce: every translation link contains "m.exe?t=" in its href.
translations = soup.select('a[href*="m.exe?t="]')

for translation in translations:
    print(translation.text)  # prints out all translations
# That's a wrap!
# Copyright Peter Charles Gleason, 2017
Onward and upward!