Repetitive tasks are boring. Let’s face it; if you don’t have to sit in front of a compute, click a link and type something in to a field, why do it? Computers are amazing tools that are capable of doing all sorts of boring tasks if we just knew how to setup the task. Although there are many different ways to do this, today we are going to focus on the Python programming language.
Why Python?
There are a multitude of programming languages out there. I’ve personally programmed at least a little bit in the following languages: BASIC, Fortran, Visual Basic, C/C++, JavaScript, Perl, PHP, Pascal, SQL, and Python. According to CodingDojo.com and Tiobe.com, Python is at #5 for the most in-demand programming languages in 2016. Topping the list are languages like Java and C# but Python is ranked pretty high. And not just on these two sites; if you do a search for the most popular or in-demand programming languages, chances are really good that you will see Python in the top 10 if not in the top 5.
If you are going to learn a new skill and spend the time to get pretty good at it, it would be nice if that skill was fun, helpful, or valuable in some way. Sure, underwater basket weaving may be challenging, but is it really valuable? Not to me.
Computer programming is a valuable skill but only if you can program in a language that is popular enough to be able to create something interesting or get paid to program. Looking at lists like the ones I mentioned above, you can narrow down your choices to the top 10 or really, the top 5. Combining both lists, we narrow things down to Java, C#, SQL, C/C++, and Python. How do we choose?
There are trade-offs with any choice. You have to consider the end result and how you can reach several goals with one skill or task. Plus, it helps to see what’s popular in your geographic area or the geographic area you want to work (if you are trying to learn a valuable skill). Although C# is the most popular in my area, I decided to learn Python. Not because it is easier than C# or another language, but because of Minecraft.
Minecraft is an amazing game. Considered a sand-box game, Minecraft lets you pretty much do anything you want. You can cut down trees, you can burn them down, you can plant saplings. You can mine resources, kill monsters, farm beets, grow sugar cane and much, much more. But what does that have to do with Python? And for that matter, Rasberry Pi?
Minecraft is most often played as a Windows game or a console game (XBox, Playstation, etc) but it can also be played on a small computer called a Rasberry Pi that runs Linux. While the Windows or console versions are built on who knows what programming language, the Rasberry Pi version is built using Python.
So, how do you get a kid interested in programming? Find something they already like (such as Minecraft) and use programming to make it better. Knowing Python programming and having a Rasberry Pi, a child (or adult….) can modify the game, add cool stuff, and generally tweak things to their hearts content. A Rasberry Pi is only about $50 for a kit (with all the wires and stuff) so the only obstacle to our plan is knowing Python.
Learning Python
Enter Automate the Boring Stuff with Python. Remember, our overall goal here is to automate repetitive tasks. What book could be more applicable than a book with the title Automate the Boring Stuff with Python. A quick search on Amazon will find the book for about $25. If you are lucky, your local library may have the book that you can read for free. But….. I have a secret…….
(…looking around to make sure no one is listening….)
…. you can read the entire book online for FREE.
Yes, free. The entire book, for free. There may be better resources online to learn Python like CodeAcademy or Learn Python, but for our goal, to automate tasks, this option seemed to be the best fit and the right price. Plus, if you would rather read a physical book, you can easily buy it.
For anyone that has programmed before, Python will be quick and easy to learn, at least the basics. There are functions to manipulate strings, build lists (arrays), something called a tuple (which was new to me), loops, and more. If you’ve programmed before, you understand most things that a programming language can do so you just need the specific Python code to get those thing done.
The main benefit of this particular book is that it’s geared towards automating tasks so many of the examples and sample code is focused on that. That works great if automating tasks is your goal, like it is for us.
Automating with Python
Our goal today is to automate, as much as possible, the monthly OIG (Office of the Inspector General) check and the monthly GSA (U.S. General Services Administration). In many companies (specifically healthcare related), all or most employees must be checked for OIG and GSA exceptions. These exceptions would prevent an employee from working in a healthcare related field for a variety of reasons. You can read more about the OIG here and GSA here but for our uses, we just need to know that if anyone is on either list, they may not be able to work in healthcare.
To look for OIG exclusions, the process is as follows:
- Go to https://exclusions.oig.hhs.gov/
- You have the option to search for an individual (default), Multiple Individuals, Single Entities, or Multiple Entities. Although you can use the Multiple Individuals since we are going to automate things, let’s stick with the default Single Individual. It will be easier to save individual search results that we can refer back to later.
- Enter the last name and first name of the employee into the appropriate fields
- Click the search button.
- Verify the results (including investigating any exclusions) and document the search (usually by printing or saving the screen as a PDF)
- Repeat steps 3-5 for all employees
This procedure is not that bad for a list of 20 or so employees and really doesn’t take that long (especially when you use the Multiple Individuals search when you can put in up to 5 names).
It does, however, get a little tedious when your list gets to 50+ people. It’s not hard, just time consuming and sorta boring. And when something is time consuming and boring, we want to automate it if possible.
To look for GSA exclusions, the process is as follows:
- Go to https://www.sam.gov and click on Search Records
- In the Quick Search box, enter the employee’s name
- Click the Search button
- Verify the results (including investigating any exclusions) and document the search (usually by printing or saving the screen as a PDF)
- Click the Clear Search button
- Repeat steps 2-5 for all employees
Just like the OIG check, the GSA check is not difficult or time consuming for 20 or so employees but it seem to take more time than the OIG check to get the results. Even an extra 15 seconds per employee can turn in to many extra minutes waiting when you have a list of 100+ to check. Another great candidate to automate.
The Task
In the end, we want to be able to run a program that do the procedures above with little or no intervention from us. It should just work. To simplify things, here is what our program will do:
- Get a list of employees
- Run the OIG check for all employees
- Run the GSA check for all employees
- Check to make sure all search results have been saved (and check for possible exclusions)
- Email the results
As you are building any program, there will be a lot of trial and error as you figure out what works, what doesn’t, and how to get around obstacles. The code below is the culmination of several individual pieces and many iterations.
This is not the only way to solve this problem and I am not a professional coder. There is always room for improvement but to a certain point, why spend a lot of time to improve something that works. With all of that said, let’s get started.
The Program
Here is the program as it stands in June 2016:
#! python3 # oig-gsa-check.py - OIG/gsa check - v1.0 # ********************** # This progam automates OIG & GSA checks into one program. It imports an employee list from an Excel sheet # and loads these names into two lists, lastNames and firstNames. It then opens # the OIG website, fills in the first and last name of the employee, and clicks # the search button. After the results are returned, the page is saved as a PDF # with the file name to include the month the check was performed and the name # of the employee. The program then loops to the next employee and continues until # all employees on the list are checked. # The program then moves to the GSA check and does a similar action. Once both checks # are complete, the filenames are checked to make sure all employees have saved # results. The program also checks the size of the files to indicate any exceptions, # reporting those exceptions on screen and via an email # # Note: the PDF printer must be set as default. This program does not choose the # printer, only prints the page to the default printer. If you want physical copies # printed, set your desk printer as the default from selenium import webdriver from selenium.webdriver.common.keys import Keys import win32com.client import time import datetime import openpyxl import win32gui import sys import os import smtplib # ****** Set constant variables (URL, Year, Month, etc) address = r'https://exclusions.oig.hhs.gov/' year = datetime.date.today().year month = datetime.date.today().month day = datetime.date.today().day pdfFilePath = 'C:\\OIG-GSA\\' count = 0 timeDelay = 0 maxTimeDelay = 10 months = { 1: "January", 2: "February", 3: "March", 4: "April", 5: "May", 6: "June", 7: "July", 8: "August", 9: "September", 10: "October", 11: "November", 12: "December"} lastNameCol = 2 firstNameCol = 3 empListDateRow = 1 empListDateCol = 6 excelSheetName = r'T:\GSA - OIG List.xlsx' smtpServer = '[enter your SMTP server here]' smtpFrom = 'oig-gsa@here.com' # enter your From email account here smtpToExceptions = 'it@here.com' # enter email address to receive exceptions smtpToSuccess = 'it@here.com' # enter email address to receive success message smtpBody = '' # ***** Variables for OIG/GSA verify exceptionFileSizeOIG = 75000 # Exceptions start around 81k exceptionFileSizeGSA = 115000 # Exceptions start around 121k # ***** Setup lists and assign values firstNames = [] lastNames = [] # ***** Opens the Excel file wb = openpyxl.load_workbook(excelSheetName) sheet = wb.get_sheet_by_name('Sheet1') # ***** Checks the date in the Excel file to see if the month listed is the same as the current month. If not, exit empListDate = sheet.cell(row=empListDateRow, column=empListDateCol).value if (month != empListDate.month): print ("This list may be outdated. Please verify. List date is: " + str(empListDate)[0:10]) #sys.exit() # ********** Load data from Excel sheet for row in range(2, sheet.max_row + 1): if(sheet.cell(row=row, column=firstNameCol).value == None): continue firstNames.append(sheet.cell(row=row, column=firstNameCol).value) lastNames.append(sheet.cell(row=row, column=lastNameCol).value) # ***** Checks to make sure there are the same number of last names as first names. If not, exit if (len(firstNames) != len(lastNames)): print('There is a mismatch in the number of last names vs. first names') sys.exit() # ***** Open/start Chrome driver = webdriver.Chrome() # ***** ***** ***** Start of OIG-specific code (does include some common snippets) print("\nStarting OIG check...") for count in range(0,len(firstNames)): # ***** Open the URL(address) in the current browser session driver.get(address) # ***** Find the first name field and fill in the First Name of person elem = driver.find_element_by_id("ctl00_cpExclusions_txtSPFirstName") elem.send_keys(firstNames[count]) # ***** Find the last name field and fill in the Last Name of the person elem = driver.find_element_by_id("ctl00_cpExclusions_txtSPLastName") elem.send_keys(lastNames[count]) # ***** Click on the Search button elem = driver.find_element_by_id("ctl00_cpExclusions_ibSearchSP") elem.click() # ***** Send the SHIFT+CTRL+P command to print using the system dialog shell = win32com.client.Dispatch("WScript.Shell") shell.SendKeys('^+p') #prints using system dialog - CTRL-SHIFT-P # ***** Checks to see if the Print window has popped up and if not, wait 1 second while True: try: printWindow = win32gui.FindWindow(None, 'Print') win32gui.SetForegroundWindow(printWindow) except: time.sleep(1) else: break shell.SendKeys('%p') # send ALT-P # ***** Checks to see if the Save PDF File as window has popped up and if not, wait 1 second while True: try: printWindow = win32gui.FindWindow(None, 'Save PDF File as') win32gui.SetForegroundWindow(printWindow) except: time.sleep(1) timeDelay += 1 if (timeDelay > maxTimeDelay): print("Max time delay reached") shell.SendKeys("%p") timeDelay = 0 else: timeDelay = 0 break # ***** Enter the path and file name for the PDF file shell.SendKeys(pdfFilePath+'OIG Search Results - ' + str(year)+' - '+str(month)+' - ' +lastNames[count]+', ' + firstNames[count]+'.pdf') time.sleep(1) shell.SendKeys('%s') # send ALT-S print("Saving OIG check for "+firstNames[count]+" " + lastNames[count]) time.sleep(1) print("\nOIG check complete. " + str(len(firstNames)) + " employees checked.") print("***********************************************") # ****** Sets variables for GSA check address = r'https://www.sam.gov' count = 0 # ***** Clicks on the Search Records button to get search ready driver.get(address) elem = driver.find_element_by_link_text("Search Records") elem.click() print("\nStarting GSA check...") for count in range(0,len(firstNames)): # ***** Enters name into search box elem = driver.find_element_by_id("q") elem.send_keys(lastNames[count] + ", "+firstNames[count]) # ***** Clicks Search button elem = driver.find_element_by_id("RegSearchButton") elem.click() # ***** Trigger print using system dialog - SHIFT-CTRL-P shell = win32com.client.Dispatch("WScript.Shell") shell.SendKeys("^+p") #prints using system dialog # ***** Checks to see if the Print window has popped up and if not, wait 1 second while True: try: printWindow = win32gui.FindWindow(None, 'Print') win32gui.SetForegroundWindow(printWindow) except: time.sleep(1) else: break shell.SendKeys("%p") # ***** Checks to see if the Save PDF File as window has popped up and if not, wait 1 second while True: try: printWindow = win32gui.FindWindow(None, 'Save PDF File as') win32gui.SetForegroundWindow(printWindow) except: time.sleep(1) timeDelay += 1 if (timeDelay > maxTimeDelay): print("Max time delay reached") shell.SendKeys("%p") timeDelay = 0 else: timeDelay = 0 break # ***** Enter the path and file name for the PDF file shell.SendKeys(pdfFilePath+'GSA Search Results - ' + str(year)+' - '+str(month)+' - ' +lastNames[count]+', ' + firstNames[count]+'.pdf') time.sleep(1) shell.SendKeys('%s') # send ALT-S print("Saving GSA check for "+firstNames[count]+" " + lastNames[count]) time.sleep(1) # ***** Clears search form for next search elem = driver.find_element_by_xpath("//input[@title='Clear Search']") elem.click() print("\nGSA check complete " + str(len(firstNames)) + " employees checked.") print("***********************************************\n") exceptionOIG = 0 exceptionGSA = 0 exceptionListOIG = [] exceptionListGSA = [] smtpBody = smtpBody + 'The following are the possible OIG-GSA exceptions for ' + months[month] + ' ' + str(year) + ':<br><br>' # ***** ***** ***** ***** Verify OIG/GSA results # ***** Loops through the pdfFilePath directory for a list of filenames (and subfolders) for folderName, subfolders, filenames in os.walk(pdfFilePath): # ***** Iterates through list of file names in the filenames value for filename in filenames: # ***** Gets the size of each file pdfFileSize = os.path.getsize(pdfFilePath + filename) # if file size is larger than normal, most likely an exception if(pdfFileSize>exceptionFileSizeOIG and filename.find('OIG',0)!=-1): exceptionOIG = exceptionOIG + 1 exceptionListOIG.append(filename) elif(pdfFileSize>exceptionFileSizeGSA and filename.find('GSA',0)!=-1): exceptionGSA += 1 exceptionListGSA.append(filename) print('OIG exceptions: ' + str(exceptionOIG) + " out of " + str(len(firstNames)) + " employees that were checked.") print(*exceptionListOIG, sep='\n') smtpBody = smtpBody + '<br>'.join(x for x in exceptionListOIG) + '<br>' print('GSA exceptions: ' + str(exceptionGSA) + " out of " + str(len(firstNames)) + " employees that were checked.") print(*exceptionListGSA, sep='\n') smtpBody += '<br>' smtpBody = smtpBody + '<br>'.join(x for x in exceptionListGSA) headers = ["From: " + smtpFrom, "Subject: OIG-GSA Check - Possible Exceptions", "To: " + smtpToExceptions, "MIME-Version: 1.0", "Content-Type: text/html"] headers = "\r\n".join(headers) # ***** ***** ***** ***** Send email list of all possible exceptions print ('\n\nSending OIG-GSA check email...') smtpObj = smtplib.SMTP(smtpServer, 25) # ***** If no exceptions, send success message. Otherwise, send exceptions if(len(exceptionListGSA)+len(exceptionListOIG) == 0): smtpTo = smtpToSuccess smtpBody = 'The OIG-GSA checks are complete. ' + str(len(firstNames)) + ' employees were checked on ' + str(month) + '/' + str(day) + '/' + str(year) + '.<br><br>No exclusions were found' else: smtpTo = smtpToExceptions smtpObj.sendmail (smtpFrom, smtpTo, headers + '\r\n\r\n' + smtpBody) smtpObj.quit()
I’ve tried to add comments to the relevant parts of the code not only for your benefit but for the benefit of anyone who looks at the program in the future (including me). Let’s go through it step by step.
OIG automation code walkthrough
First off, all of the lines that start with # are comments, which includes the first 18 lines and some others scattered throughout the program. I won’t get into the fine details of things like the first line, why the formatting is like it is, etc. The goal here is to explain what each part does.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import win32com.client
import time
import datetime
import openpyxl
import win32gui
import sys
import os
import smtplib
This sections tells the Python interpreter that you want to use some pre-written code in your programs. Specifically:
- Selenium – control browsers
- win32com.client – control the print alert in Windows
- time – pause or sleep the program
- datetime – get the date so we can save the files with descriptive file names
- openpyxl – ability to access Excel sheets
- sys – exit the program in the middle (in case of a problem with the employee list)
- os – step through the directory with all of the search results and look at the files
- smtplib – send emails
# ****** Set constant variables (URL, Year, Month, etc) address = r'https://exclusions.oig.hhs.gov/' year = datetime.date.today().year month = datetime.date.today().month day = datetime.date.today().day pdfFilePath = 'C:\\OIG-GSA\\' count = 0 timeDelay = 0 maxTimeDelay = 10 months = { 1: "January", 2: "February", 3: "March", 4: "April", 5: "May", 6: "June", 7: "July", 8: "August", 9: "September", 10: "October", 11: "November", 12: "December"}
Next, we setup a few variables that we will use throughout the program. We set the URL or address (the r in front of the value tells Python to treat the string as a raw string – useful when entering strings with special characters) to the search page of the OIG site. Then, we get the year part of today’s date and save it as year. Same thing for month and day. We then set the variable filePath to the path we want to save the resulting PDF files.
We initialize our loop variable count (lists in Python start with 0 as the first entry) and set the timeDelay to 0 and the maxTimeDelay to 10 (we’ll get into why in a bit). Finally, we set up a dictionary to easily give us the full month from the month number.
In Python, a dictionary is a lot like a 2-dimensional array. When you reference the key (in this case, numbers), the dictionary returns the element (in this case, the names of the month). We could have just used an array, but this is a good opportunity to use a dictionary.
lastNameCol = 2 firstNameCol = 3 empListDateRow = 1 empListDateCol = 6 excelSheetName = r'T:\GSA - OIG List.xlsx'
Continuing with the variables, we set lastNameCol to the column number for the last name in the Excel sheet. Same with the firstNameCol. We set empListDateRow and empListDateCol to correspond to the location of the date on the employee list. The date is there remind us when the last time the list was updated so we can do each check with the most up-to-date information. Finally, we set excelSheetName to the location and file name of the employee list Excel file.
smtpServer = '[enter your SMTP server here]'
smtpFrom = 'oig-gsa@here.com' # enter your From email account here
smtpToExceptions = 'it@here.com' # enter email address to receive exceptions
smtpToSuccess = 'it@here.com' # enter email address to receive success message
smtpBody = ''
These variables setup the email server, the From email address, and the different email addresses for exceptions or success (if needed; here we are using the same email address for both success and exceptions). The body of the email is setup later in the program.
# ***** Variables for OIG/GSA verify exceptionFileSizeOIG = 75000 # Exceptions start around 81k exceptionFileSizeGSA = 115000 # Exceptions start around 121k
Here, we add some file size limits, one for the OIG and GSA. These values will help us identify possible exceptions automatically. After looking back at monthly searches for the past year, all of the OIG PDF files were around 50k bytes if they did not have any search results and over 81k if there was results found (more text in the file). Using that information, we can use Python to check the file sizes and point us in the direction of possible exceptions that we need to research further. Same goes for the GSA except the file sizes are a little different. We’ll discuss it more near the end of the program.
# ***** Setup lists and assign values firstNames = [] lastNames = []
We setup the employee lists, getting them ready for the Excel data.
# ***** Opens the Excel file wb = openpyxl.load_workbook(excelSheetName) sheet = wb.get_sheet_by_name('Sheet1')
Now comes the data. We open the Excel file that contains the employee list and get a reference to the sheet we need, Sheet1.
# ***** Checks the date in the Excel file to see if the month listed is the same as the current month. If not, exit empListDate = sheet.cell(row=empListDateRow, column=empListDateCol).value if (month != empListDate.month): print ("This list may be outdated. Please verify. List date is: " + str(empListDate)[0:10]) sys.exit()
We pull the list date from the spreadsheet and check to make sure the list has been updated during the same month that we are doing the test. If not, we exit the program. You can always comment out the sys.exit(). The program will let you know the list may be outdated but it will continue to do the checks.
# ********** Load data from Excel sheet for row in range(2, sheet.max_row + 1): if(sheet.cell(row=row, column=firstNameCol).value == None): continue firstNames.append(sheet.cell(row=row, column=firstNameCol).value) lastNames.append(sheet.cell(row=row, column=lastNameCol).value)
Here, we go through the spreadsheet with a for loop loading up the firstNames and lastNames lists with employee information. If we reach a cell that is blank, we skip that row and go to the next. The loop stops at the last row with data.
# ***** Checks to make sure there are the same number of last names as first names. If not, exit if (len(firstNames) != len(lastNames)): print('There is a mismatch in the number of last names vs. first names') sys.exit()
We compare the number of elements in the firstNames list and the lastNames list. If the numbers don’t match, an error is displayed and the program exits. Since most people have both a first and last name, this is a good check to make sure all of the data was added properly. It’s not foolproof, but it’s an easy check.
# ***** Open/start Chrome driver = webdriver.Chrome()
We open a Chrome window and get prepared to run the first check on the OIG website.
# ***** ***** ***** Start of OIG-specific code (does include some common snippets) print("\nStarting OIG check...") for count in range(0,len(firstNames)): # ***** Open the URL(address) in the current browser session driver.get(address)
Just to give some feedback, the program displays that the OIG check is starting in the command window. We iterate through a for loop that starts at 0 (since lists in Python start at 0) and stops when the last element in firstNames is reached. Since we’ve already checked to make sure the number of firstNames is the same as the number of lastNames, we only need to check one or the other. Using the Chrome window we opened a second ago, we open the OIG website, https://exclusions.oig.hhs.gov/.
# ***** Find the first name field and fill in the First Name of person elem = driver.find_element_by_id("ctl00_cpExclusions_txtSPFirstName") elem.send_keys(firstNames[count]) # ***** Find the last name field and fill in the Last Name of the person elem = driver.find_element_by_id("ctl00_cpExclusions_txtSPLastName") elem.send_keys(lastNames[count]) # ***** Click on the Search button elem = driver.find_element_by_id("ctl00_cpExclusions_ibSearchSP") elem.click()
Web pages are setup for people to use. We see a button that says search, we click on it. It doesn’t matter what the button is named in the code, users are just interested in what it looks like on the page. Unfortunately, we can’t easily use Python or another programming language to visually look at a page. We need some way to identify specific elements of the page. Looking around in the code of the search page, I found the names of the form field for the first name, last name, and the name of the search button.
Using that information, we can get a reference to the first name field and send that field the text of the first name of the first employee. We then move to the last name and fill it in. To do the search, we get a reference to the search button and get Python to click the button.
# ***** Send the SHIFT+CTRL+P command to print using the system dialog shell = win32com.client.Dispatch("WScript.Shell") shell.SendKeys('^+p') #prints using system dialog - CTRL-SHIFT-P # ***** Checks to see if the Print window has popped up and if not, wait 1 second while True: try: printWindow = win32gui.FindWindow(None, 'Print') win32gui.SetForegroundWindow(printWindow) except: time.sleep(1) else: break
One of the great things about the Selenium module is when commands are sent to the browser, the program pauses until the browser has finished loading. That way, your program is not sending data or commands to a busy browser. While that might not seem important, when your program is sending data or commands to a browser that’s busy, those commands get skipped and your program doesn’t do what it is suppose to do.
After the search button is clicked and the browser returns control to the program, we reference the window and send a SHIFT-CTRL-P command that prints using the system dialog.
Occasionally, the print dialog takes a few seconds to pop-up and if we try to focus on the print dialog window before it is created, the program will crash. To fix this issue, a loop was created that will look for a window with the title Print and then focus on that window. If there is not window with the title Print or the program can’t focus on that window, we sleep the program for 1 second then look for the window again. If the window is found, we exit the loop and send an ALT-P to the print dialog box to initiate the print (see the code below):
shell.SendKeys('%p') # send ALT-P # ***** Checks to see if the Save PDF File as window has popped up and if not, wait 1 second while True: try: printWindow = win32gui.FindWindow(None, 'Save PDF File as') win32gui.SetForegroundWindow(printWindow) except: time.sleep(1) timeDelay += 1 if (timeDelay > maxTimeDelay): print("Max time delay reached") shell.SendKeys("%p") timeDelay = 0 else: timeDelay = 0 break
After we send the ALT-P command, we are taken to the Save PDF File as dialog box. It is important to mention that the program will print the search results to the default printer. If you don’t have a PDF printer set as your default, your dialog box may be different. The PDF printer I’m using is the Adobe PDF printer but with just a few changes, the program can be modified to work with other PDF printers.
Like the CTRL-SHIFT-P command, the ALT-P command will occasionally experience some delays and we need to use a similar loop like we did before. This time, if the Save PDF File as dialog box is not found, we will sleep the program for 1 second and increment the timeDelay variable. Once timeDelay gets to 15 seconds, we output that the max time delay was reached and we send another ALT-P command. Then, we go through the loop again. In my experience, there will be 5-6 Max time delay reached messages displayed, scattered throughout the list of 100 people.
# ***** Enter the path and file name for the PDF file shell.SendKeys(pdfFilePath+'OIG Search Results - ' + str(year)+' - '+str(month)+' - ' +lastNames[count]+', ' + firstNames[count]+'.pdf') time.sleep(1) shell.SendKeys('%s') # send ALT-S print("Saving OIG check for "+firstNames[count]+" " + lastNames[count]) time.sleep(1)
After the Save PDF File as dialog box pops up, we fill in the file name we want to use for each search result. The program pauses for a second to give the dialog box plenty of time to process. It then sends the ALT-S to save the file and displays a message in the command window that the search results for a particular person are being saved. The file names are setup to look like this:
C:\OIG-GSA\OIG Search Results - 2016 - 06 - Doe, John.pdf
At the end of the loop, the program pauses for a second before starting the loop all over again with the next employee.
Wow, that was a long description but I hope it was in depth enough to provide some good information.
But wait, there’s more….. We still need to do the GSA check. Before we get into the GSA check, the program outputs that the OIG check was complete and how many employees were checked.
print("\nOIG check complete. " + str(len(firstNames)) + " employees checked.") print("***********************************************")
GSA automation code walkthrough
The code for the OIG and GSA checks are very similar so we’ll only look at the parts that are different enough to be interesting.
# ***** Enters name into search box elem = driver.find_element_by_id("q") elem.send_keys(lastNames[count] + ", "+firstNames[count])
The GSA check has one form field for the first and last names or the business name so we combine the last name with the first name, add a comma in between, and send that to the field.
# ***** Clears search form for next search elem = driver.find_element_by_xpath("//input[@title='Clear Search']") elem.click()
Due to the structure of the site, we can’t just access the search page by going to the original URL. When you access the search page, it adds several variables to the URL. There had to be another way to get back to the search form. Luckily, the web page had a button that cleared the search and takes you back to the search form. It took a bit of trial and error to access the button, but the way that worked was to access the button by the title.
After that, the loop repeats for the next employee.
print("\nGSA check complete " + str(len(firstNames)) + " employees checked.") print("***********************************************\n")
After the loop finishes, the program displays that the GSA check is complete and how many employees were checked, just like with the OIG check.
Automating the results
Now that both the OIG and GSA checks have been completed, we can look at all of the files and see if we can identify any possible exceptions to display and email.
exceptionOIG = 0 exceptionGSA = 0 exceptionListOIG = [] exceptionListGSA = [] smtpBody = smtpBody + 'The following are the possible OIG-GSA exceptions for ' + months[month] + ' ' + str(year) + ':<br><br>'
The variables exceptionsOIG, exceptionsGSA, exceptionListOIG and exceptionListGSA are initialized. They will be used to count and contain the file names of any file that are larger than the file sizes we setup at the beginning of the program. We also add some text to the body of the email that we will be sending later.
# ***** ***** ***** ***** Verify OIG/GSA results # ***** Loops through the pdfFilePath directory for a list of filenames (and subfolders) for folderName, subfolders, filenames in os.walk(pdfFilePath): # ***** Iterates through list of file names in the filenames value for filename in filenames: # ***** Gets the size of each file pdfFileSize = os.path.getsize(pdfFilePath + filename) # if file size is larger than normal, most likely an exception if(pdfFileSize>exceptionFileSizeOIG and filename.find('OIG',0)!=-1): exceptionOIG = exceptionOIG + 1 exceptionListOIG.append(filename) elif(pdfFileSize>exceptionFileSizeGSA and filename.find('GSA',0)!=-1): exceptionGSA += 1 exceptionListGSA.append(filename)
In this code, we look at the individual search result files and do several things. First, we walk through the directory and get the file size of each file. If the file name contains the letters OIG and is larger than the exception file size for the OIG results, we increment the exceptionOIG variable and add the file name to the exceptionListOIG list. Same thing for the GSA. We walk the directory one time and check all of the files depending on the file name.
print('OIG exceptions: ' + str(exceptionOIG) + " out of " + str(len(firstNames)) + " employees that were checked.") print(*exceptionListOIG, sep='\n') smtpBody = smtpBody + '<br>'.join(x for x in exceptionListOIG) + '<br>' print('GSA exceptions: ' + str(exceptionGSA) + " out of " + str(len(firstNames)) + " employees that were checked.") print(*exceptionListGSA, sep='\n') smtpBody += '<br>' smtpBody = smtpBody + '<br>'.join(x for x in exceptionListGSA)
After going through all of the files, it’s time for some answers. We output a list of the OIG exceptions to the command window including how many possible exceptions there were out of the total number. Python let’s us print all the items in a list with a character or string to separate them (like we did with the new line character, \n). Then, we add the file names to the body of the email, separating each line with a <BR> (new line HMTL code) for ease of viewing. Same thing for the GSA list.
headers = ["From: " + smtpFrom, "Subject: OIG-GSA Check - Possible Exceptions", "To: " + smtpToExceptions, "MIME-Version: 1.0", "Content-Type: text/html"] headers = "\r\n".join(headers)
Here, the header of the email is setup. The last header value, Content-Type, is what let’s us use the <BR> HTML tag to display each exception on a different line in the email.
# ***** ***** ***** ***** Send email list of all possible exceptions print ('\n\nSending OIG-GSA check email...') smtpObj = smtplib.SMTP(smtpServer, 25) # ***** If no exceptions, send success message. Otherwise, send exceptions if(len(exceptionListGSA)+len(exceptionListOIG) == 0): smtpTo = smtpToSuccess smtpBody = 'The OIG-GSA checks are complete. ' + str(len(firstNames)) + ' employees were checked on ' + str(month) + '/' + str(day) + '/' + str(year) + '.<br><br>No exclusions were found' else: smtpTo = smtpToExceptions smtpObj.sendmail (smtpFrom, smtpTo, headers + '\r\n\r\n' + smtpBody) smtpObj.quit()
We finish up by displaying that we are sending the OIG-GSA check email and then setup the email itself. An email object is created first. Then, the program checks the two lists to see if there are any possible exclusions. If both list is empty (or more specifically, it the length of the arrays added together is equal to zero), the body of the email is updated to say that the checks were successful with no exceptions found. The email address we are sending the message to is updated just in case you want to send successful messages to one address and exceptions to another. Once all that is updated, we send the email and close the email object.
And now, at long last, our program is finished. It’s not perfect but it works. The whole process, if done manually with 100 employees, takes over 3 hours. Automated with Python, it took about 30 minutes of computer time and less than 1 minute typing in the command to run the program. Not bad for a little bit of coding, a time reduction of over 97%.
Here is a short video showing the program working with a list of 15 randomly generated names (names generated on http://random-name-generator.info/). The program took about 5 minutes to run the 15 names through both the OIG and GSA databases. The only user input was running the program (5 seconds maybe?).
Around 3:00 minutes you can see the program working through a temporary delay, displaying a Max time delay reached message. There is another delay around 4:05 minutes. When the program fails to find the correct window, it delays for 1 second up to a max of 10 seconds
Summary
There you have it: one way to automate the tedious, boring task of performing monthly OIG and GSA checks. We discussed why Python was a good choice, what steps were needed to perform the goal task, how each part of the program performed those tasks, and some concepts concerning automating your own tasks with Python.
Now it’s your turn. Find a task on the computer that is repetitive and see if you can use Python to make it easier. You never know, you just might learn an important, valuable skill while you’re at it.
Directives
To automate tasks with Python, consider the following
- Have a good, clear understanding of the task – Make note of each staep, no matter how small, to accomplish your goal. You may click two or three links to get where you need to go and not think about it but you must tell a program exactly what steps to take.
- Start small and simple – The program that you start with will often not look like the end result. Get your program to complete one repetition of the task successfully before adding any loops.
- There is more than one way to skin a cat – If you hit a road block with your programming, Google is your friend. Think about using keyboard combinations, mouse clicks, etc to accomplish the task. If one way doesn’t work, try another input method.
- After it works for one iteration, add a loop – After you have a program that will do the task once, add in a loop to repeat the task. It doesn’t help much to automate a task you only do once.
- Once you have a finished program, add comments – You’ll forget not only what this program does but how it does it very quickly. Comments help not only you but anyone that comes after you to better understand the program
Leave a Reply
You must be logged in to post a comment.