So we can read an excel sheet, and manipulate a browser – how do we put these skills together to do something that saves us time (= money)?
Divide and conquer
When approaching any programming problem, I like to break it down into chunks. It’s not easy trying to solve a problem in its entirety, unless it’s a very easy one – in which case why would you be trying to solve it?
We want to go through each of our spreadsheet items, go to the webpage and download it. Let’s do the first one
with open('filename.xlsx') as file: df = pd.read_excel(file)
Since our excel sheet has the first column as the header, we can look at each column by specifying its name. In this case, the items we want to download are under the “Manufacturer Code” heading.
print(df['Manufacturer Code']) output: 0 DNMG 15 04 08-MF 4335 1 DNMG 11 04 08-PM 4335 2 DNMG 15 06 16-MR 4335 3 DNMG 15 06 12-MR 4335 4 DNMG 15 06 08-MF 4335 Name: Manufacturer Code, dtype: object
Let’s pick the first one to download
item_code = df['Manufacturer Code']
Now in the website I wanted to download the files from, I noticed there was a standard URL to access each item’s page.
driver.get('www.company.com/productdetails?c=' + item_code)
This should take us to the item’s page. Now we want to check we’re logged in and download the files we need. Let’s first define some helper functions. The download button on the page I was using had no unique class names or IDs, so I have to use an XPath expression to find it. I’ve also added a small check that the download button exists, we could add this check to every element we look for.
def log_in(driver): logins = driver.find_elements_by_class_name('login') logins.click() user_fields = driver.find_elements_by_name('Username') pw_fields = driver.find_elements_by_name('Password') login_buttons = driver.find_elements_by_id('loginbutton') user_fields.send_keys('firstname.lastname@example.org') pw_fields.send_keys('yourPassword') login_buttons.click() def download_file(driver): download_button = driver.find_elements_by_xpath("""//a[@ng-click="checkIfLoggedIn($event, 'link3DModelBasic')"]""") if download_button: download_button.click() else: raise Exception("Download button not found")
Ok, let’s put this all together to download a single item from the list
from selenium import webdriver from selenium.webdriver.chrome.options import Options import selenium.common.exceptions as web_e driver_options = Options() #driver_options.add_argument("--headless") driver = webdriver.Chrome('./chromedriver', options=driver_options) driver.get('https://www.sandvik.coromant.com/en-gb/pages/default.aspx') #Log me in! log_in(driver) #Get first item in excel sheet with open('filename.xlsx') as file: df = pd.read_excel(file) item_code = df['Manufacturer Code'] #Go to the item page driver.get('https://www.sandvik.coromant.com/en-gb/products/Pages/productdetails.aspx?c=' + item_code) download_file(driver)
Easy! This needs a little bit of tweaking, as the browser will download the files to your standard location. It might also ask you whether you want to open or save the file the first time. You can tick the “don’t ask me again” box when you choose to save. These are small issues that we can fix later by changing the options that the browser starts up with.
Now – since we were doing “divide and conquer” we’ve only downloaded a single file. How easy is it to change it to download all the files in the list? Let’s have a look.
If we swap
item_code = df['Manufacturer Code']
for item_code in df['Manufacturer Code']:
and indent the rest of the code accordingly, the browser loops through the excel sheet and downloads everything for us!
That’s the beauty of divide and conquer, by solving a small problem one change solves a much bigger problem.
We’ll look at tweaking the browser, some problems we might encounter along the way and running it headless in the next part.
Send your questions to email@example.com