Automating web download tasks with Selenium and Python – Part 2

So we can read an excel sheet, and manipulate a browser – how do we put these skills together to do something that saves us time (= money)?

Divide and conquer

When approaching any programming problem, I like to break it down into chunks. It’s not easy trying to solve a problem in its entirety, unless it’s a very easy one – in which case why would you be trying to solve it?

We want to go through each of our spreadsheet items, go to the webpage and download it. Let’s do the first one

with open('filename.xlsx') as file:
  df = pd.read_excel(file)

Since our excel sheet has the first column as the header, we can look at each column by specifying its name. In this case, the items we want to download are under the “Manufacturer Code” heading.

print(df['Manufacturer Code'])
output:
0    DNMG 15 04 08-MF 4335 
1    DNMG 11 04 08-PM 4335 
2    DNMG 15 06 16-MR 4335 
3    DNMG 15 06 12-MR 4335 
4    DNMG 15 06 08-MF 4335 
Name: Manufacturer Code, dtype: object

Let’s pick the first one to download

item_code = df['Manufacturer Code'][0]

Now in the website I wanted to download the files from, I noticed there was a standard URL to access each item’s page.

driver.get('www.company.com/productdetails?c=' + item_code)

This should take us to the item’s page. Now we want to check we’re logged in and download the files we need. Let’s first define some helper functions. The download button on the page I was using had no unique class names or IDs, so I have to use an XPath expression to find it. I’ve also added a small check that the download button exists, we could add this check to every element we look for.

def log_in(driver):
    logins = driver.find_elements_by_class_name('login')
    logins[0].click()

    user_fields = driver.find_elements_by_name('Username')
    pw_fields = driver.find_elements_by_name('Password')
    login_buttons = driver.find_elements_by_id('loginbutton')

    user_fields[0].send_keys('yourlogin@email.com')
    pw_fields[0].send_keys('yourPassword')
    login_buttons[0].click()

def download_file(driver):
    download_button = driver.find_elements_by_xpath("""//a[@ng-click="checkIfLoggedIn($event, 'link3DModelBasic')"]""")
    if download_button:
        download_button[0].click()
    else:
        raise Exception("Download button not found")

Ok, let’s put this all together to download a single item from the list

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import selenium.common.exceptions as web_e
driver_options = Options()
#driver_options.add_argument("--headless")
driver = webdriver.Chrome('./chromedriver', options=driver_options)
driver.get('https://www.sandvik.coromant.com/en-gb/pages/default.aspx')
#Log me in!
log_in(driver)
#Get first item in excel sheet
with open('filename.xlsx') as file:
  df = pd.read_excel(file)
item_code = df['Manufacturer Code'][0]
#Go to the item page
driver.get('https://www.sandvik.coromant.com/en-gb/products/Pages/productdetails.aspx?c=' + item_code)
download_file(driver)

Easy! This needs a little bit of tweaking, as the browser will download the files to your standard location. It might also ask you whether you want to open or save the file the first time. You can tick the “don’t ask me again” box when you choose to save. These are small issues that we can fix later by changing the options that the browser starts up with.

Now – since we were doing “divide and conquer” we’ve only downloaded a single file. How easy is it to change it to download all the files in the list? Let’s have a look.

If we swap

item_code = df['Manufacturer Code'][0]

For

for item_code in df['Manufacturer Code']:

and indent the rest of the code accordingly, the browser loops through the excel sheet and downloads everything for us!

That’s the beauty of divide and conquer, by solving a small problem one change solves a much bigger problem.

We’ll look at tweaking the browser, some problems we might encounter along the way and running it headless in the next part.

Send your questions to rowan@workyourtech.com

Leave a Reply

Your email address will not be published. Required fields are marked *