
An intensive course on the modern practice of scientific computing aimed at the late-undergraduate or graduate level in the physical sciences, mathematics and engineering.

Introduction

Scientific computing, broadly understood as the use of computational resources to solve research-specific problems, has in the last decade grown immensely in its reach across disciplines and has also changed in nature. The vast majority of computational effort in science during the 20th century was devoted to the solution of numerical problems (first exclusively with Fortran and then including other languages). But in the past two decades, higher-level environments that allow for more rapid and flexible development and incorporate extensive libraries, data analysis and interactive exploration have become extremely popular. This approach was pioneered by Matlab and IDL, followed by systems with symbolic capabilities like Mathematica and Maple, all of them commercial products.

Simultaneously, we have reached the point where scientists from virtually every discipline need to perform extensive computational work. This is partly thanks to the existence of such systems, which enable someone who is not an expert at low-level programming to perform sophisticated computational tasks, but it is also fueled by the explosion of data that is flooding many fields in a way that inescapably mandates a change towards semi-automated, large-scale computational analysis and modeling (for a particularly interesting example of this, see Josh Bloom’s current work on automatic real-time astronomical data acquisition and analysis).

In the last decade, open source alternatives to the above commercial systems have emerged, and the most successful ones are all based on the Python programming language, supported by a vast array of additional tools and libraries to do everything from basic array manipulation to sophisticated modeling and visualization.

This course will present a practical perspective on these tools, targeted at the level of graduate students in the physical sciences, mathematics and engineering. By practical we mean that we will not spend much time proving theorems regarding the convergence rate of a particular algorithm, for example. We will spend time discussing issues that sit at the intersection of scientific computing and software engineering, such as documentation, testing and validation of codes, as well as version control and the participation of scientists as developers of their own research tools in open-source collaborations. This is important because I have found through painful experience that scientific research projects often run into difficulties not because we aren’t smart enough (we never are) or because we don’t work hard enough (we actually do put in a lot of time). The problems arise from a nasty interlocking mess of poor practices regarding tool development, data management and accrual of verifiable results, which leads to highly sub-optimal outcomes. This course will therefore try to at least touch on these topics as a natural part of the workflow. Rather than repeating here what others have said better, I’d encourage you to look at Greg Wilson’s excellent Software Carpentry materials, as well as a related post by Joel Spolsky which, despite being made in the context of a commercial software house, in my opinion applies 100% to our problem of interest.

Format

The course is organized around 24 hands-on 2-hour sessions, where we will mix short presentations with work on actual problems to illustrate and explore. The class will take place in a computer lab where systems that have all the prerequisites pre-installed are available, but you are also welcome to bring your own laptop if you so desire. I will present short problems during class which you will be given some time to try and solve, and whose solution we will then discuss immediately; longer problems will be given as homework.

At the end of the course, a final project will be required of each student, which can take one of two forms:

Tools required

If you wish to install the tools on your own system, you can find the necessary information in my “Starter Kit” page. It is important that you actually run the checklist script, which, for example, on my Ubuntu 9.10 system produces:

> python workshop_checklist.py 
Running tests:
__main__.test_imports('setuptools',) ... MOD: setuptools, version: 0.6c9
ok
__main__.test_imports('IPython',) ... MOD: IPython, version: 0.11.bzr.r1346
ok
__main__.test_imports('numpy',) ... MOD: numpy, version: 1.3.0
ok
__main__.test_imports('scipy',) ... MOD: scipy, version: 0.7.0
ok

[ lots of intermediate output removed ]

Simple plot generation. ... ok
Plots with math ... ok

----------------------------------------------------------------------
Ran 21 tests in 37.571s

OK

***************************************************************************
               TESTS FINISHED
***************************************************************************

If the printout above did not finish in 'OK' but instead says 'FAILED', copy
and send the *entire* output (all info above and summary below) to the
instructor for help.

==================
System information
==================
os.name      : posix
os.uname     : ('Linux', 'maqroll', '2.6.31-17-generic', '#54-Ubuntu SMP Thu Dec 10 16:20:31 UTC 2009', 'i686')
platform     : linux2
platform+    : Linux-2.6.31-17-generic-i686-with-Ubuntu-9.10-karmic
prefix       : /usr
exec_prefix  : /usr
executable   : /usr/bin/python
version_info : (2, 6, 4, 'final', 0)
version      : 2.6.4 (r264:75706, Dec  7 2009, 18:45:15) 
[GCC 4.4.1]
==================
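
To give a sense of what a check like this involves, here is a minimal sketch. It is not the actual workshop_checklist.py script: the function names check_imports and system_info are hypothetical, it only covers the import and system-information parts of the output above (not the plotting tests), and it is written for Python 3 rather than the Python 2.6 shown in the listing.

```python
import importlib
import platform
import sys


def check_imports(names):
    """Try to import each module; map its name to a version string or None.

    Hypothetical helper for illustration, not part of the real checklist.
    """
    results = {}
    for name in names:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            # The module is not installed (or one of its own imports failed).
            results[name] = None
        else:
            # Not every module defines __version__, so fall back to a marker.
            results[name] = str(getattr(mod, "__version__", "unknown"))
    return results


def system_info():
    """Collect a few of the fields shown in the 'System information' block."""
    return {
        "platform": sys.platform,
        "platform+": platform.platform(),
        "prefix": sys.prefix,
        "executable": sys.executable,
        "version": sys.version,
    }


if __name__ == "__main__":
    # Module list taken from the sample output above.
    report = check_imports(["setuptools", "IPython", "numpy", "scipy"])
    for name, version in report.items():
        status = "MISSING" if version is None else "version: " + version
        print("MOD: %s, %s" % (name, status))
    for key, value in system_info().items():
        print("%-12s: %s" % (key, value))
```

A module reported as MISSING in a sketch like this would correspond to a failed import test in the real checklist, which is the situation in which you should send me the full output.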

Prerequisites

This course assumes the level of a graduate student in the physical sciences, math or engineering, with a grasp of calculus, linear algebra, ordinary differential equations, basic statistics and elementary numerical methods.

In addition, we will assume a reasonable familiarity with programming. If you have never programmed in your life, I am afraid this is not the course for you: I will not be teaching basic programming concepts from scratch. Python is an extremely readable language, and its syntax is regular and easy to remember, but this doesn’t change the fact that some ideas in programming require a certain amount of practice to understand.

You must also work through (not just read, but actually type in and execute) the basic Python tutorial, as well as the introductory NumPy tutorial. I will explain Python syntax and language details as we go, but I will do so quickly; it is therefore really important that you have some practice with the basics of the language and of NumPy arrays first.

Specifically, I assume you have read and understood these topics (you don’t have to be an expert, and I will cover all of these, but they should be reasonably familiar to you in general, and specifically in how Python implements them):

(hint: all of these are covered in the basic tutorials linked above).

Please see the “Starter Kit” page for detailed links to multiple references available online.

Topics

The following is a tentative list of topics to be covered. I emphasize tentative because I hope to adjust the actual workflow of the course based on your feedback regarding your background, your interests and the relevance of the material to your own projects.