Help & Info
Instructors
There are 3 instructors for this course:
- Marie-Helene (Marie) Burle, from WestGrid and Compute Canada
- Grace Fishbein, from ACENET and Compute Canada
- Lydia Vermeyden, from ACENET and Compute Canada
Schedule
Day 1
9:00–9:15am Introductions and Course Plan
9:15–11:00am Intro to Command Line
11:00–11:15am Break
11:15–1:00pm Intro to Python
Day 2
9:00–11:00am Dive into Python part 1
11:00–11:15am Break
11:15–1:00pm Dive into Python part 2
Day 3
9:00–11:00am Room 1: Topic Modeling; Room 2: Web Scraping
11:00–11:15am Break
11:15–1:00pm Room 1: Natural Language Processing; Room 2: API querying
Day 4
5:30am–5:30pm Office hours
Day 5
5:30–8:30am
9:00–11:00am Project Presentations
11:00–11:15am Break
11:15–1:00pm Project Presentations
Project selection
There will be an independent study project as part of the course, for which we will assign small groups (2–3 people, depending on shared interests). We have built a Google Form to help us assign groups and projects. Please fill out the form no later than May 14th, as this will allow us to plan projects and groups ahead of time and tailor the content accordingly.
Google Form for project selection:
Project Selection Form.
If you don't have a project in mind, you can choose from several suggestions on the form:
Web scraping
This is an automated way to pull specific data from web pages. The data could be text, links, images, etc., but the key is that it is kept on web pages in a format and quantity that make it difficult or time-consuming to gather manually. For example, suppose you want to gather all the public health notices that mention "lockdown" from every Canadian public health website. Many of these websites present their information in a form that is not easy to export, and you may want to repeat the collection over time. Python allows you to write a script that gathers this data for you automatically.
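As a rough illustration, the sketch below uses only Python's standard library (`urllib.request` and `html.parser`) to download a page and keep the paragraphs that mention a keyword. It assumes the relevant text sits inside `<p>` elements, which will not be true for every site, and the names `ParagraphCollector`, `paragraphs_mentioning`, and `fetch_html` are hypothetical helpers written for this example. For real projects, a dedicated parsing library such as Beautiful Soup is usually more convenient.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element on a page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buffer).strip())
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still part of the paragraph.
        if self._in_p:
            self._buffer.append(data)


def paragraphs_mentioning(html, keyword):
    """Return the paragraphs in an HTML string that contain keyword (case-insensitive)."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if keyword.lower() in p.lower()]


def fetch_html(url):
    """Download a page and decode it, assuming the site serves UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `paragraphs_mentioning(fetch_html(url), "lockdown")` would return the matching paragraphs from the page at `url`; running the same script on a schedule is one way to track changes over time.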
API querying
This is similar to web scraping and is also about gathering data. Some websites (like Twitter) have an application programming interface (API). An API is essentially an interface on top of the main application (Twitter, for example) that lets other programs, such as a Python script, send it requests and receive data in return. Many social media platforms and popular websites have APIs that can be used to gather curated datasets (e.g., all the tweets from the last month containing "snow in April").
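A minimal sketch of the request/response cycle, using only the standard library: build a query URL, send it, and parse the JSON that comes back. The endpoint `https://api.example.com/search`, its parameter names, and the `{"results": [{"text": ...}]}` response shape are all hypothetical; real APIs, including Twitter's, document their own URLs and parameters and usually require an authentication key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: real APIs have their own URLs, parameters, and authentication.
BASE_URL = "https://api.example.com/search"


def build_query_url(base_url, **params):
    """Encode keyword arguments as a query string appended to base_url."""
    return f"{base_url}?{urlencode(params)}"


def query_api(url):
    """Send a GET request and parse the JSON response body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_texts(payload):
    """Pull the 'text' field out of each result, assuming the hypothetical
    {"results": [{"text": ...}, ...]} response shape."""
    return [item["text"] for item in payload.get("results", [])]
```

Here `build_query_url(BASE_URL, q="snow in April", days=30)` produces `https://api.example.com/search?q=snow+in+April&days=30`; passing that URL to `query_api` and the result to `extract_texts` would give a list of matching texts, provided the service really returned the assumed format.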
Topic modelling
Topic modelling is a technique that came out of machine learning. It uses an algorithm to sort textual data at the level of individual words. It treats a text as a "bag of words" (or several bags of words, for multiple manuscripts), ignoring word order, and groups the words into topics. Each topic is defined by a set number of keywords, which gives the researcher insight into the kinds of words that cluster together in their textual dataset.
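The "bag of words" idea can be illustrated with the standard library alone. The sketch below shows only that first step: each document becomes a count of its words, with word order thrown away and a few common words removed. The tiny stop-word list and the sample sentences are illustrative assumptions. Fitting an actual topic model (for example Latent Dirichlet Allocation, available in libraries such as gensim and scikit-learn) would build on counts like these.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real projects use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "on"}


def bag_of_words(text):
    """Lowercase the text, split it into words, drop stop words, and count the rest.
    Word order is discarded: only the counts survive."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


documents = [
    "The ship sailed to the harbour and the ship docked.",
    "Rain fell on the harbour in the morning.",
]
# One bag per document, as with one bag per manuscript.
bags = [bag_of_words(doc) for doc in documents]
```

Here `bags[0]` counts `ship` twice, while `the` is absent from both bags; a word such as `harbour` that shows up in several bags is the kind of co-occurrence a topic model works from.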
Natural language processing (NLP)
Natural language processing is a subfield of artificial intelligence focusing on the computational processing of human languages. Its applications are diverse and include language translation, predictive text, message filtering, text analytics, survey analysis, voice assistants, processing of notes…
With the explosion of machine learning techniques such as deep neural networks in recent years, this field has become extremely active, with rapidly growing potential.
Interactive game 1
Build a text-based adventure game with a theme of your choice.
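One common way to structure a text-based adventure is to store the rooms in a dictionary, where each room has a description and a mapping from directions to neighbouring rooms. The sketch below is one possible starting point, not a prescribed design; the room names and the `move` and `play` functions are made up for illustration.

```python
# Hypothetical map: each room has a description and exits keyed by direction.
ROOMS = {
    "hall": {
        "description": "A dusty hall. Doors lead north and east.",
        "exits": {"north": "library", "east": "kitchen"},
    },
    "library": {
        "description": "Shelves of old books. The hall is to the south.",
        "exits": {"south": "hall"},
    },
    "kitchen": {
        "description": "A cold kitchen. The hall is to the west.",
        "exits": {"west": "hall"},
    },
}


def move(current, direction):
    """Return the room reached by going in direction, or current if there is no exit."""
    return ROOMS[current]["exits"].get(direction, current)


def play(start="hall"):
    """Describe the current room and read commands until the player types 'quit'."""
    room = start
    while True:
        print(ROOMS[room]["description"])
        command = input("> ").strip().lower()
        if command == "quit":
            break
        next_room = move(room, command)
        if next_room == room:
            print("You can't go that way.")
        room = next_room
```

Calling `play()` in a terminal starts the game; type a direction such as `north`, or `quit` to stop. Items, puzzles, and a theme of your own can be layered on top of the same room dictionary.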
Interactive game 2
Build an interactive rock-paper-scissors(-lizard-Spock) game.
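The core of this game is a table of which move beats which. In the sketch below, `BEATS` maps each move to the moves it defeats, following the usual rock-paper-scissors-lizard-Spock rules; `winner` and `play` are hypothetical names chosen for this example, and the computer simply picks a move at random.

```python
import random

# Each move maps to the two moves it defeats.
BEATS = {
    "rock": {"scissors", "lizard"},   # rock crushes scissors and lizard
    "paper": {"rock", "spock"},       # paper covers rock, disproves Spock
    "scissors": {"paper", "lizard"},  # scissors cut paper, decapitate lizard
    "lizard": {"spock", "paper"},     # lizard poisons Spock, eats paper
    "spock": {"scissors", "rock"},    # Spock smashes scissors, vaporizes rock
}


def winner(player, computer):
    """Return 'player', 'computer', or 'tie' for one round."""
    if player == computer:
        return "tie"
    return "player" if computer in BEATS[player] else "computer"


def play(rounds=3, choices=tuple(BEATS)):
    """Play a number of valid rounds at the keyboard against random computer moves."""
    played = 0
    while played < rounds:
        player = input(f"Choose one of {', '.join(choices)}: ").strip().lower()
        if player not in choices:
            print("Not a valid choice, try again.")
            continue  # invalid entries do not count as a round
        computer = random.choice(choices)
        print(f"Computer chose {computer}. Result: {winner(player, computer)}")
        played += 1
```

`play()` uses all five moves, while `play(choices=("rock", "paper", "scissors"))` gives the classic three-move game, since the extra entries in `BEATS` are then never consulted.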
Interactive game 3
Build a Mad Libs generator.
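A Mad Libs generator boils down to a template with named blanks that are filled in from the player's answers. This sketch relies on Python's built-in `str.format` and on `string.Formatter().parse` to find the blank names; the template sentence and the function names are invented for the example.

```python
from string import Formatter

# An example template; each {name} is a blank for the player to fill in.
TEMPLATE = "Yesterday, a {adjective} {noun} {past_verb} into the {place}."


def placeholders(template):
    """Return the blank names in the order they appear, without duplicates."""
    names = [field for _, field, _, _ in Formatter().parse(template) if field]
    return list(dict.fromkeys(names))


def fill(template, answers):
    """Substitute the player's answers into the template."""
    return template.format(**answers)


def play(template=TEMPLATE):
    """Ask for one word per blank, then print the finished story."""
    answers = {}
    for name in placeholders(template):
        answers[name] = input(f"Enter a word ({name.replace('_', ' ')}): ")
    print(fill(template, answers))
```

Calling `play()` asks for each word in turn and prints the finished sentence; keeping a list of templates and picking one at random is a natural extension.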
Custom notification
Build a desktop notification application that notifies you based on criteria you specify.
Software installation
You will need to install several pieces of software on your machine for this course.
Python and Python packages
The simplest way to install Python and a number of very useful packages and tools is to install Anaconda. Follow the instructions from that link for your operating system.
Terminal
Linux and macOS users already have a terminal, so this section is for Windows users only.
Windows users should install Git for Windows. While we will not use Git in this course, this software comes with a good Bash emulation called "Git Bash".
JupyterLab
Once you have a terminal and Anaconda installed, you will be able to install JupyterLab. As this installation requires the use of the command line, we will help you with this at the start of the course if you have any issues.
A good text editor
Microsoft Word and other word processors are not text editors: they add a lot of invisible formatting to the text you type. For this course, you need a text editor, so they are not suitable.
Notepad, which comes with Windows, is a text editor, but it is too limited for our purposes.
Examples of good free text editors suitable for beginners include Visual Studio Code, Atom, Notepad++, and Sublime Text. Once you have installed one, it is a good idea to familiarize yourself with it.
Access to our training JupyterHub
For this course, we will use a temporary JupyterHub.
Here is how to log in:
- Go to https://uu.c3.ca.
- Sign in with the username & password we will give you during the course.
- Set the server options according to the image below. These are the only values that you should edit:
  - Change the time to 8.0
  - Change the memory to 2000
  - Make sure the interface is set to JupyterLab
- Press start.
Please note that, unlike other JupyterHubs you might have used, this JupyterHub is not permanent and can only be used for this course.
Resources
Books
There are many books on Python, several of which can be accessed online for free, either directly or through your university.
Books by O'Reilly
- Think Python, 2nd Edition, by Allen B. Downey
- Python Pocket Reference, 5th Edition, by Mark Lutz
- Introducing Python, by Bill Lubanovic
- Python in a Nutshell, 3rd Edition, by Alex Martelli, Anna Ravenscroft, and Steve Holden
- Learning Python, 5th Edition, by Mark Lutz
- Python Cookbook, 3rd Edition, by David Beazley and Brian K. Jones
- The Hitchhiker's Guide to Python, by Kenneth Reitz and Tanya Schlusser
- Fluent Python, by Luciano Ramalho
- High Performance Python, by Micha Gorelick and Ian Ozsvald
- Web Scraping with Python, by Ryan Mitchell
- Python Data Science Handbook, by Jake VanderPlas
- Python for Data Analysis, by Wes McKinney
- Foundations for Analytics with Python, by Clinton W. Brownley
- Data Wrangling with Python, by Jacqueline Kazil and Katharine Jarmul
- Data Visualization with Python and JavaScript, by Kyran Dale
- Natural Language Processing with Python, by Steven Bird, Ewan Klein, and Edward Loper
- Thoughtful Machine Learning with Python, by Matthew Kirk
- Python for Finance, by Yves Hilpisch
Books by No Starch Press
- Automate the Boring Stuff with Python, by Al Sweigart
- Python Crash Course, by Eric Matthes
- Python Playground, by Mahesh Venkitachalam
- Doing Math with Python, by Amit Saha
- Invent Your Own Computer Games with Python, by Al Sweigart
Other books
- Python Machine Learning, by Sebastian Raschka
- Practical Programming: An Introduction to Computer Science Using Python 3, by Paul Gries, Jennifer Campbell, and Jason Montojo
- Python for Dummies, by Stef Maruch and Aahz Maruch
- Python Essential Reference, 4th Edition, by David Beazley
- Head First Python, by Paul Barry
- Python for Data Science for Dummies, by John Paul Mueller and Luca Massaron
- Beginning Programming with Python for Dummies, by John Paul Mueller
- Python for Everybody, by Charles Severance
Course Lessons and Materials
You can find the lesson notes and the materials for each day's sessions in this Google Drive folder. It also contains code examples from past projects and everything from the coursepack (syllabus and readings).
Feel free to use some of the code to build your own project, and create folders to share work within your project group.