To be frank: this notebook is rather boring. In this class, we will use the software package Python. The best way to learn new software (and probably most things) is when motivated by a particular problem. Would you read assembly instructions for furniture you do not plan to own? Probably not. In other notebooks we will pursue specific questions driven by neuronal data, and use our desire to understand these data to motivate the development and application of computational methods. But not in this notebook. Here, we focus on basic coding techniques and principles in Python in the abstract, without motivation. You - poor learner - must trust that these ideas and techniques will eventually be useful. We begin by dipping our toe into the Python pool, and learning the basic strokes; the fun and interesting parts in the “real world” of neuronal data happen later.
Let us delay no further. In the following examples, you are asked to execute code in Python. If your Python experience is limited, you should actually do this, not just read the text below. If you intend to ignore this advice - and not execute the code in Python - then instead walk to the local coffee shop, get a double espresso, and return to attempt these examples.
This notebook follows in spirit, and sometimes in detail, Chapter 2 of MATLAB for Neuroscientists, an excellent reference for learning to use MATLAB in neuroscience with many additional examples. If you have not used Python before, there are many excellent resources online (e.g., the Python Data Science Handbook).
There are two ways to interact with this notebook. First, you could run it locally on your own computer using Jupyter. This is an excellent choice, because you’ll be able to read, edit, and execute the Python code directly, and you can save any changes you make or notes that you want to record. The second way is to open this notebook in your browser and execute the examples there, without installing additional software on your computer. In either case, we encourage you to execute each line of code in this file!
Throughout this notebook, we assume that you are running Python 3. Most of the functions used here are the same in Python 2 and 3. One notable exception, however, is division. If you are using Python 2, you will find that the division operator / actually computes the floor of the division if both operands are integers (i.e., no decimal points). For example, in Python 2, 4/3 equals 1, while in Python 3, 4/3 equals approximately 1.333.
We encourage you to use Python 3 for the sake of compatibility with this notebook, as well as for compatibility with future releases of Python.
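A quick way to check the division behavior described above in Python 3 is the short sketch below. The variable names are illustrative only.

```python
# True division (/) always returns a float in Python 3.
true_quotient = 4 / 3

# Floor division (//) reproduces what Python 2 did for integer operands.
floor_quotient = 4 // 3

print(true_quotient)
print(floor_quotient)
```

If you need the old integer behavior in Python 3, use // explicitly.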
We begin this notebook with an “on-ramp” to analysis in Python. The purpose of this on-ramp is to introduce you immediately to some aspects of Python. You may not understand all aspects of the Python language here, but that’s not the point. Instead, the purpose of this on-ramp is to illustrate what can be done. Our advice is to simply run the code below and see what happens…
Q: Try to read the code above. Can you see how it loads data, extracts useful information to print, then selects an interval of data to plot?
A: If you’ve never used Python before, that’s an especially difficult question. Please continue on to learn more!
Execute the following commands in Python:
Q: What does Python return? Does it make sense?
Enter the following command in Python:
Q: Does this answer make sense?
Q: Can you use parentheses to change the answer?
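As an illustration of how parentheses change the order of evaluation, consider the sketch below. The expression is an assumption chosen only to make the point; it is not necessarily the command intended above.

```python
# Exponentiation (**) binds more tightly than division (/).
without_parens = 4 / 10 ** 2      # evaluated as 4 / (10 ** 2)

# Parentheses force the division to happen first.
with_parens = (4 / 10) ** 2       # evaluated as (0.4) ** 2

print(without_parens)
print(with_parens)
```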
A function is a program that operates on arguments. Standard math functions and variables (and other useful things) can be accessed from the math
and numpy
modules. To use the math
and numpy
modules, we must first import both:
In this style, we indicate which module, or namespace, contains the function we want to call: x = np.arange(10)
or plt.plot(x, y)
.
You will often begin your data analysis with import statements, to load the functionality you need. We can now call functions from these modules by prefixing them with the module name, using math.* or np.*. For example,
Above, sin
is the sine function. It operates on the argument 2*pi
. Notice that, once we have imported the numpy
module, Python knows the value of pi
. Here’s another example function that operates on arguments:
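A sketch of both calls discussed here, assuming the modules are imported as math and np (the alias np is the one used elsewhere in this notebook; the variable names are illustrative):

```python
import math
import numpy as np

# sin(2*pi) is exactly 0 in theory; floating point gives a tiny nonzero value.
s = np.sin(2 * np.pi)

# atan is the inverse tangent; the result is an angle in radians.
t = math.atan(2 * math.pi)

print(s)
print(t)
```

The first result is not exactly zero because pi cannot be represented exactly as a floating-point number.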
Q: What is math.atan
?
A: To answer this, try using Python Help. To start the Python Help, simply put a ?
at the end of math.atan
and then run this code block.
You should see a description of the function pop up at the bottom of the window.
Python Help is extremely useful, but may not work in a web browser. You can always look there when you have questions about a function, or search the internet for help, i.e., google it.
In Python, there are several different data structures that are designed to store more than one element. Here we will focus on the array
data structure, but if you are curious to know how and when to use other structures, there is a good explanation here. Let’s define an array:
A scalar is a single number. Consider,
Q: What do you find?
A: Notice that the scalar operates on each element of the array.
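A minimal sketch of scalar-times-array multiplication follows. The values in a and the scalar c are assumptions chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4]])   # an array with one row and four columns
c = 3                          # a scalar: a single number

scaled = c * a                 # the scalar multiplies every element of a
print(scaled)
```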
Let’s create an array and multiply it by itself,
Q: What does this return?
A: We see that the operator *
performs element-by-element multiplication of the values in array a
.
Q: What operation does np.multiply()
perform?
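One way to explore this question is to compare the two results directly. This is a sketch with illustrative values for a.

```python
import numpy as np

a = np.array([[1, 2, 3, 4]])

product_operator = a * a               # element-by-element multiplication
product_function = np.multiply(a, a)   # NumPy function applied to the same inputs

print(product_operator)
print(product_function)
print(np.array_equal(product_operator, product_function))
```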
To see a list of the variables you’ve defined, type who
or whos
in a code block by themselves. Notice whos
provides more information.
The functions who
and whos
can be extremely useful, but may not work in a web browser.
To examine the dimensions of an array, we can ask for the shape
,
We find that the shape of a
is (1,4)
or 1 row and 4 columns. Notice we have two options to execute the shape
function:
In a.shape
we return the attribute shape
of the variable a
.
In np.shape(a)
we apply the function shape
from numpy
to the variable a
.
The result is equivalent.
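The two options can be compared side by side. The sketch below assumes a is defined with one row and four columns, which matches the shape quoted above; the values themselves are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4]])

shape_attribute = a.shape        # attribute of the array
shape_function = np.shape(a)     # NumPy function applied to the array

print(shape_attribute)
print(shape_function)
```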
Sometimes we want to clear the workspace and get rid of all the variables we have defined. To do so, type %reset
and enter y when prompted.
Q. What command could we use to confirm there are no variables in the workspace?
A. Consider who
.
The %reset
command is an example of a magic. Magics are commands that start with the %
symbol and use a language other than Python. They are only available in the notebook environment. In fact, the set of magics that is available is specific to the notebook kernel. This means that if you have a Jupyter notebook running a Ruby kernel the magics will be different.
A matrix is an array with more than one dimension. Consider the following:
This creates a matrix with two rows and three columns. Consider,
Q: Can you see the two rows and three columns?
We can manipulate matrices like we manipulate vectors.
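A minimal sketch of a matrix with two rows and three columns, and of one simple manipulation, is given below. The name p and its values are assumptions for illustration.

```python
import numpy as np

# Each inner list is one row of the matrix.
p = np.array([[1, 2, 3],
              [4, 5, 6]])

print(p)
print(p.shape)      # number of rows, number of columns

# Scalar multiplication works element by element, just as for vectors.
tripled = 3 * p
print(tripled)
```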
Matrices and vectors are arrays of numbers, and sometimes we want to access individual elements or small subsets of these lists. That’s easy to do in Python. Consider,
Python indexes from 0 (like C, C++, and Java, and unlike MATLAB and Fortran, which start at 1). To access the 2nd element of a or b, type a[1] or b[1]. We’ll be a bit fancier with our printing now to distinguish variables. Calling str(a) converts the variable a to a string that can be printed easily. Adding two strings just concatenates them: "hi" + " bye" gives "hi bye".
Q. Do the results make sense? How would you access the 4th element of each vector?
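The indexing and printing described above can be sketched as follows. The contents of a and b are assumptions; any two vectors of equal length would do.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])

# Index 1 is the 2nd element because counting starts at 0.
second_a = a[1]
second_b = b[1]

# str() converts a value to a string so it can be concatenated with +.
print("a[1] = " + str(second_a))
print("b[1] = " + str(second_b))
```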
We can combine a
and b
to form a matrix with a
as the first row and b
as the second. Note that we apply the function array()
to the list [a,b]
, which it converts to a matrix.
To learn the size (or shape) of c
we use shape()
:
The shape of c is (2, 5). It has two rows and five columns. To access the individual element in the 1st row and 4th column of c, type c[0,3].
We access matrices using ‘row, column’ notation. So c[0,3]
means print the element in row 0, column 3 of c
.
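Putting these steps together, a sketch of building c and reading one element might look like this. The values in a and b are assumptions.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])

# array() converts the list [a, b] into a matrix: a is row 0, b is row 1.
c = np.array([a, b])

print(np.shape(c))   # rows, columns
element = c[0, 3]    # row 0, column 3
print(element)
```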
Q. How would you print all rows in the 2nd column of c
?
Often we are interested in only some of the elements of a matrix or vector. For example, we might want to look at the data from a single experimental trial which is stored in a particular row of a matrix. Alternatively, we might want to find out when the values in a time series cross a given boundary. Doing this is simple in Python.
Slicing means that we want to look at a specific portion of a vector or matrix, for example, the first row of a matrix. We will continue with the matrix c
from the previous example. The notation ‘:
’ means ‘all indices’. To access all columns in the entire first row of c
, type c[0,:]
. To access the 2nd through 4th columns of the first row of c
, type c[0,1:4]
.
The notation 1:4
means all integers from 1 up to, but not including 4, which in this case gives columns 1, 2, and 3.
Leaving out the number before the colon tells Python to start at index 0. Leaving out the number after the colon tells Python to continue all the way to the end.
We can also tell Python how to step through the indices. To access only the even columns of c
, we can use the following:
This code tells Python to start at 0, continue to the end, and step by 2. The result should be the values in row 0, columns 0, 2, and 4 of c
. We could write this explicitly as c[0,0:5:2]
.
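The slicing forms described in this section can be collected in one short sketch. The contents of c are assumptions for illustration.

```python
import numpy as np

c = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])

first_row = c[0, :]          # all columns of row 0
middle = c[0, 1:4]           # columns 1, 2, and 3 (4 is excluded)
every_other = c[0, ::2]      # start at 0, go to the end, step by 2
explicit = c[0, 0:5:2]       # the same selection written out in full

print(first_row)
print(middle)
print(every_other)
print(explicit)
```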
#### Selecting elements that satisfy a condition

Sometimes we’re interested in locating particular values within a matrix or vector. As an example, let’s first define a vector.
Q. Calculate the shape of a
. What is the maximum value of a
? Hint: Use the max()
function.
Now let’s find all values in a
that exceed 10.
This is called logical indexing. Let’s look at what a>10
returns:
When we index a
using this array lgIdx
we get back only the entries in a
corresponding to True
, as above:
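A minimal sketch of logical indexing follows. The values in a are assumptions; the name lgIdx follows the text.

```python
import numpy as np

a = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18])

# The comparison returns an array of True/False values, one per element.
lgIdx = a > 10
print(lgIdx)

# Indexing with that array keeps only the entries where lgIdx is True.
selected = a[lgIdx]
print(selected)
```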
Sometimes we want to know the actual indices in a where a > 10
. We can get them using the nonzero()
array method, which returns the index of all entries that were True
, or non-zero.
The command nonzero()
can be used as both a function and a method. A method is called by adding it after the object it is meant to operate on with a period in between ( lgIdx.nonzero()
). A function is called with the argument explicitly provided inside the parentheses ( nonzero(lgIdx)
). Basically, a function and a method do the same thing, but a function needs to be given an argument, while a method assumes that the argument is the object that the method is attached to. Note that if we use nonzero()
as a function, we need to tell it to look in NumPy for the definition (i.e., add np. at the beginning of the function call, as in np.nonzero(lgIdx)).
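The method and function forms can be compared in a short sketch. The values in a are assumptions for illustration.

```python
import numpy as np

a = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18])
lgIdx = a > 10

# Method form: called on the object itself.
idx_method = lgIdx.nonzero()

# Function form: the object is passed as an argument, with the np. prefix.
idx_function = np.nonzero(lgIdx)

# Both return a tuple holding one index array per dimension.
print(idx_method)
print(idx_function)
```

Because the result is a tuple, idx_method[0] gives the plain array of indices.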
Now we have another way to select the desired elements of a
:
We can use these two types of indexing to change subsets of the values of a
.
Q: How does a
change in the first and second print statements?
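As a sketch of both kinds of assignment, the example below changes some values with logical indexing and others with integer indices. The values, thresholds, and replacement numbers are all assumptions.

```python
import numpy as np

a = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18])

# Logical indexing: replace every value above 10.
a[a > 10] = 100
after_logical = a.copy()
print(after_logical)

# Integer indexing: replace the values at the indices where a < 5.
idx = (a < 5).nonzero()
a[idx] = 0
after_integer = a.copy()
print(after_integer)
```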
We can perform these same logical operations for a matrix,
Notice that the last line collapses the True
entries to an array, ordered by row and then by column. If you’ve used MATLAB, this is the opposite of what it does!
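The ordering of the collapsed result can be seen in a small sketch. The matrix values and the threshold are assumptions chosen so that the order is easy to spot.

```python
import numpy as np

m = np.array([[1, 9, 3],
              [8, 5, 7]])

mask = m > 4          # True/False matrix with the same shape as m
print(mask)

# The True entries are collected row by row into a one-dimensional array.
selected = m[mask]
print(selected)
```

Here 9 (row 0) comes before 8 (row 1), which is the row-first ordering described above.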
It’s not easy to look at lists of numbers and gain an intuitive feeling for their behavior, especially when the lists are long. In these cases, it’s better to visualize the lists of numbers by plotting them. Consider
Q. Looking at the values in y printed above, can you tell what’s happening?
A. Not really … let’s visualize y
vs x
instead.
To visualize y
versus x
let’s plot it. To do so, let’s first import some basic plotting routines from matplotlib
, which provides a nice 2D plotting library. We’ll also tell Python to show matplotlib
graphics inline, in this notebook.
Let’s start by plotting a simple example for x
and y
,
Q. Does the plot above make sense for the variables x
and y
?
Now, let’s go back to the definitions of x
and y
that we started this example with and plot y
versus x
.
The plot of x
versus y
should look a bit jagged, and not smooth like a sinusoid. To make the curve smoother, let’s redefine x
as,
Q. Compare this definition of x
to the definition above. How do these two definitions differ?
Q. What is the size of x
? Does this make sense?
Now let’s replot the sine function.
Q. Does this plot make sense, given your knowledge of x
, y
, and trigonometry?
Continuing the example in the previous section, let’s define a second vector
and plot it:
We’d now like to compare the two variables y
and z
. To do this, let’s plot both vectors on the same figure, label the axes, and provide a legend,
Notice that we’ve included a third input to the function plot
. Here the third input tells Python to draw the curve in a particular color: 'r'
for red. There are many options we can use to plot; to see more, check out the documentation for plot.
We can also label the axes, give the figure a title, and provide a legend,
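A sketch of a labeled two-curve figure is given below. The definitions of x, y, and z, and all label text, are assumptions; only the use of 'r' for red comes from the text.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)   # assumed finely sampled x-axis
y = np.sin(x)                        # first curve
z = np.cos(x)                        # assumed second vector

plt.plot(x, y, label="y = sin(x)")
plt.plot(x, z, "r", label="z = cos(x)")   # third input 'r' draws in red
plt.xlabel("x")
plt.ylabel("y or z")
plt.title("y vs x, and z vs x")
plt.legend()
plt.show()
```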
To further edit this plot, you might decide, for example, that the font size for the labels is too small. We can change the default with:
To generate a single Gaussian random number in Python, use the function in the NumPy random
module.
Let’s generate a vector of 1000 Gaussian random numbers:
… and look at a histogram of the vector:
Q. Does this histogram make sense? Is it what you expect for a distribution of Gaussian random variables?
See Python Help (hist?
) to learn about the function hist()
.
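A sketch of generating and histogramming Gaussian random numbers follows. It assumes the randn function from the NumPy random module and an arbitrary choice of 30 bins; neither is confirmed by the text.

```python
import numpy as np
import matplotlib.pyplot as plt

single = np.random.randn()      # one Gaussian random number (mean 0, sd 1)
r = np.random.randn(1000)       # a vector of 1000 Gaussian random numbers

# hist() returns the bin counts, the bin edges, and the drawn patches.
counts, edges, _ = plt.hist(r, bins=30)
plt.xlabel("Value")
plt.ylabel("Count")
plt.show()
```

The values differ on every run because no random seed is set.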
Sometimes we’ll want to repeat the same command over and over again. For example, what if we want to plot sin(x + k*pi/4)
where k
varies from 1 to 5 in steps of 1; how do we do it? Consider the following:
That’s horrible code! All I did was cut and paste the same thing four times. As a general rule, if you’re repeatedly cutting and pasting in code, what you’re doing is inefficient and typically error prone. There’s a much more elegant way to do this, and it involves making a for
loop. Consider:
Now let’s declare a for
loop where k
successively takes the values 1, then 2, then 3, …, up to 5. Note, any code we want to execute as part of the loop must be indented one level. The first line of code that is not indented, in this case show()
below, executes after the for loop completes.
The small section of code above replaces all the cutting-and-pasting. Instead of cutting and pasting, we update the definition of y
with different values of k
and plot it within this for-loop.
Q. Spend some time studying this for-loop. Does it make sense?
Important: Python uses indentation to define for
loops.
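The loop described above can be sketched as follows. The definition of x and the legend labels are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)   # assumed x-axis

# range(1, 6) yields k = 1, 2, 3, 4, 5 (the end value 6 is excluded).
for k in range(1, 6):
    # These indented lines run once for each value of k.
    y = np.sin(x + k * np.pi / 4)
    plt.plot(x, y, label="k = " + str(k))

# Not indented: these lines run once, after the loop completes.
plt.legend()
plt.show()
```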
We’ve spent some time in this notebook writing and executing code. Sometimes we’ll need to write our own Python functions. Let’s do that now.
Our function will do something very simple: it will take as input a vector and return as output the vector elements squared plus an additive constant.
If we have a vector, v
, and a constant, b
, we would like to call:
vsq = my_square_function(v, b)
This won’t work! We first need to define my_square_function
. Let’s do so now,
The function begins with the keyword def
followed by the function name and the inputs in parentheses. Notice that this first line ends with a colon :
. All of the function components that follow this first line should be indented one level. This is just like the for
loop we applied earlier; the operations performed by the for loop were indented one level.
When defining the function, the code the function executes should be indented one level.
The text inside triple quotes provides an optional documentation string that describes our function. While optional, including a ‘doc string’ is an important part of making your code understandable and reusable.
The keyword return
exits the function, and in this case returns the expression x * x + c
. Note that a return statement with no arguments returns None
, indicating the absence of a value.
With the function defined, let’s now call it. To do so we first define the inputs, and then run the function, as follows:
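A minimal sketch of the definition and call is shown below. The argument names x and c follow the return expression quoted in the text; the docstring wording and the input values are assumptions.

```python
import numpy as np

def my_square_function(x, c):
    """Square each element of x and add the constant c."""
    return x * x + c

# Define the inputs, then run the function.
v = np.array([1, 2, 3, 4])
b = 2
vsq = my_square_function(v, b)
print(vsq)
```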
Q. Try to make a function, my_power, so that y = my_power(x, n) evaluates \(y = x^n\). (In Python you can use x**n to take the power.)
For our last example let’s load a data file on the web in the .csv
format into Python.
To do so, let’s first import the pandas
module,
Now, let’s load a data file using the function read_csv
,
The variable df
that holds the loaded data is a pandas DataFrame. We can think of it as a simple table that holds our data.
Let’s print it,
We see that the columns in the dataframe consist of two variables: d
and t
. Our collaborator who provided the data tells us that these correspond to the voltage recording (d
) and a time axis (t
) for her data.
Let’s define variables to hold the data corresponding to each key,
Here we convert the data in each column to a numpy array, because we’d (probably) like numpy to function on these values.
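The loading and conversion steps can be sketched without the actual data file. The example below reads a small made-up CSV from an in-memory string as a stand-in; the values are not real recordings, and only the column names d and t come from the text. The function read_csv also accepts a file path or a web address in place of the in-memory object.

```python
import io
import numpy as np
import pandas as pd

# Made-up stand-in for the contents of the real .csv file.
csv_text = "d,t\n0.10,0.000\n-0.20,0.001\n0.30,0.002\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Convert each column to a NumPy array.
d = np.array(df["d"])   # voltage values
t = np.array(df["t"])   # time axis
print(d.shape, t.shape)
```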
Now, let’s plot the LFP data versus the time axis,
Contributed by @mateouma
Let’s do some statistics. First, our standard imports.
With numpy, we can find the mean and standard deviation of our LFP data.
Now, let’s use numpy to randomly generate numbers according to a normal distribution with the same mean and standard deviation as the LFP data. The syntax is np.random.normal(mean, sd, size)
, where mean
, sd
, and size
are variables or numbers.
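A sketch of these steps on stand-in data is given below. The array lfp is a made-up placeholder for the recorded data, and the sample size of 1000 is an assumption.

```python
import numpy as np

# Made-up placeholder values standing in for the LFP data.
lfp = np.array([0.1, -0.2, 0.3, 0.05, -0.15, 0.25])

mean = np.mean(lfp)   # sample mean of the data
sd = np.std(lfp)      # standard deviation of the data

# Draw random numbers from a normal distribution with the same mean and sd.
size = 1000
surrogate = np.random.normal(mean, sd, size)
print(mean, sd)
print(surrogate.shape)
```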
We can use a histogram to compare the distribution of the data with a normal distribution with the same mean and standard deviation.
As we can see, our data doesn’t look normally distributed, but in practice we should use a statistical test to make this assessment.