Basic Python Datatypes#

Python is a dynamically typed language – this means that you don’t need to specify ahead of time what kind of data you are going to store in a variable. Nevertheless, there are some core datatypes that we need to become familiar with as we use the language.

The first set of datatypes are similar to those found in other languages (like C/C++ and Fortran): floating point numbers, integers, and strings.

Floating point is essential for computational science. A great introduction to floating point and its limitations is: What every computer scientist should know about floating-point arithmetic by D. Goldberg.

The next set of datatypes are containers. In python, unlike some languages, these are built into the language and make it very easy to do complex operations. We’ll look at these later.

Some examples come from the python tutorial: http://docs.python.org/3/tutorial/

integers#

Integers are numbers without a decimal point. They can be positive or negative. Most programming languages use a fixed amount of memory to store a single integer, but python will expand the amount of memory as necessary to store large integers.

The basic operators, +, -, *, and / work with integers.

2+2+3
7
2*-4
-8

Note: integer division is one place where python 2 and python 3 differ.

In python 3.x, dividing 2 integers results in a float. In python 2.x, dividing 2 integers results in an integer. The latter is consistent with many statically typed programming languages (like Fortran or C), since the datatype of the result is the same as the inputs, but the former is more in line with our expectations.

1/2
0.5

To get an integer result, we can use the // operator.

1//2
0
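As a small sketch of how the division operators relate to one another (the variable names are only illustrative): // rounds toward negative infinity rather than toward zero, % gives the matching remainder, and the built-in divmod() returns both at once.

```python
# Floor division and remainder for positive and negative operands.
n, d = 7, 2

true_quotient = n / d      # 3.5, always a float in python 3
floor_quotient = n // d    # 3, an int, rounded down
remainder = n % d          # 1

# // rounds toward negative infinity, not toward zero
neg_floor = -7 // 2        # -4
neg_remainder = -7 % 2     # 1, so that (-4)*2 + 1 == -7

# divmod() returns the quotient and remainder together
q, r = divmod(n, d)

print(true_quotient, floor_quotient, remainder)
print(neg_floor, neg_remainder)
print(q, r)
```

The identity (n // d) * d + (n % d) == n holds for both signs, which is why the remainder of a negative number by a positive one is non-negative.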

Python is a dynamically-typed language—this means that we do not need to declare the datatype of a variable before initializing it.

Here we’ll create a variable (think of it as a descriptive label that can refer to some piece of data). The = operator assigns a value to a variable.

a = 1
b = 2

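We can now use these variables in expressions. In a notebook, the value of the last expression in a cell is displayed automatically. Functions operate on variables and return a result; print(), which we use below, will output to the screen.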

a + b
3
a * b
2

Note that variable names are case sensitive, so a and A are different

A = 2048
print(a, A)
1 2048

Here we initialize 3 variables all to 0, but these are still distinct variables, so we can change one without affecting the others.

x = y = z = 0
print(x, y, z)
0 0 0
z = 1
z
1

Python has some built in help (and Jupyter/ipython has even more)

try doing:

help(x)

alternatively, try:

x?

(this only works in Jupyter)

Another function, type() returns the data type of a variable

type(x)
int

Note that in languages like Fortran and C, you specify the amount of memory an integer can take (usually 2 or 4 bytes). This puts a restriction on the largest integer that can be represented. Python will adapt the size of the integer so you don’t overflow.

a = 12345678901234567890123456789012345123456789012345678901234567890
print(a)
print(a.bit_length())
print(type(a))
12345678901234567890123456789012345123456789012345678901234567890
213
<class 'int'>
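A further sketch of arbitrary-precision integers, using values chosen only for illustration: both a large power of two and a factorial are still ordinary int objects, and bit_length() reports how many bits each one needs.

```python
import math

big = 2**100                   # far beyond what a 64-bit integer can hold
print(big)                     # 1267650600228229401496703205376
print(big.bit_length())        # 101 bits are needed to store it

fact = math.factorial(30)      # the product 1*2*...*30, computed exactly
print(fact)
print(type(big), type(fact))   # both are plain int
```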

floating point#

When an operation mixes floating point numbers and integers, the result is promoted to a float.

1. + 2
3.0

But note that the floor division operator, //, still rounds down to a whole number, even though the result is now a float:

1.//2
0.0
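The promotion rule can be checked directly with type(). This is a minimal sketch with illustrative variable names; it also shows that int() converts back by truncating toward zero rather than rounding.

```python
i = 3
f = 2.0

total = i + f            # int + float -> float
product = i * f          # int * float -> float
floor_div = 7.0 // 2     # floor division with a float operand gives a float
back_to_int = int(7.9)   # int() truncates toward zero; it does not round

print(type(total), total)
print(type(product), product)
print(type(floor_div), floor_div)
print(type(back_to_int), back_to_int)
```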

It is important to understand that there are infinitely many real numbers between any two bounds, but a computer can only store a finite number of them, so most real numbers have to be approximated. There is an IEEE standard for floating point that pretty much all languages and processors follow.

This means two things

  • not every real number will have an exact representation in floating point

  • there is a finite precision to numbers – below this we lose track of differences (this is usually called roundoff error)

On our course website, I posted a link to a paper, What every computer scientist should know about floating-point arithmetic – this is a great reference on understanding how a computer stores numbers.

Consider the following expression, for example:

0.3/0.1 - 3
-4.440892098500626e-16

Here’s another example: The number 0.1 cannot be exactly represented on a computer. In our print, we use a format specifier (the stuff inside of the {}) to ask for more precision to be shown:

a = 0.1
print("{:30.20}".format(a))
        0.10000000000000000555
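A common consequence of this, sketched below: sums of decimal fractions may not compare equal to the value we expect. Rather than testing floats with ==, we can compare within a tolerance, for example with math.isclose() from the standard library (its default relative tolerance is used here).

```python
import math

s = 0.1 + 0.2
print(s)                      # 0.30000000000000004
print(s == 0.3)               # False: the two roundoff errors do not cancel
print(math.isclose(s, 0.3))   # True: equal to within a relative tolerance

# the stored value of 0.1, shown to 20 decimal places
print(f"{0.1:.20f}")          # 0.10000000000000000555
```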

we can ask python to report the limits on floating point

import sys
sys.float_info
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

Note that this says that the magnitude of a (normalized) floating point number must lie between 2.2250738585072014e-308 and 1.7976931348623157e+308.
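To see what happens at these limits, here is a minimal sketch, assuming the usual IEEE 754 double precision floats reported by sys.float_info above. Multiplying past max gives inf, while values below min lose precision gradually (so-called subnormal numbers) before finally rounding to zero.

```python
import sys

fmax = sys.float_info.max
fmin = sys.float_info.min

overflow = fmax * 10      # exceeds the largest float, so the result is inf
subnormal = fmin / 2      # below min but still nonzero (a subnormal number)
underflow = 5e-324 / 2    # half of the smallest subnormal rounds to 0.0

print(overflow)
print(subnormal)
print(underflow)
```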

We also see that the precision is 2.220446049250313e-16 (this is commonly called machine epsilon). To see this, consider adding a small number to 1.0. We’ll use the equality operator (==) to test if two numbers are equal:

Quick Exercise

Define two variables, \(a = 1\), and \(e = 10^{-16}\).

Now define a third variable, b = a + e

We can use the python == operator to test for equality. What do you expect b == a to return? Run it and see if it agrees with your guess.

modules#

The core python language is extended by a standard library that provides additional functionality. These added pieces are in the form of modules that we can import into our python session (or program).

The math module provides functions that do the basic mathematical operations as well as provide constants (note there is a separate cmath module for complex numbers).

In python, you import a module. The functions are then defined in a separate namespace—this is a separate region that defines names and variables, etc. A variable in one namespace can have the same name as a variable in a different namespace, and they don’t clash. You use the “.” operator to access a member of a namespace.

By default, when you type stuff into the python interpreter or here in the Jupyter notebook, or in a script, it is in its own default namespace, and you don’t need to prefix any of the variables with a namespace indicator.

import math

math provides the value of pi

math.pi
3.141592653589793

This is distinct from any variable pi we might define here

pi = 3
print(pi, math.pi)
3 3.141592653589793

Note here that pi and math.pi are distinct from one another—they are in different namespaces.
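The same idea applies to functions. As a sketch, both math and cmath define a function called sqrt, and because each lives in its own namespace they do not clash: math.sqrt() works with real numbers only, while cmath.sqrt() returns a complex result.

```python
import math
import cmath

print(math.sqrt(4.0))      # 2.0
print(cmath.sqrt(4.0))     # (2+0j)

# math.sqrt() refuses a negative argument...
try:
    math.sqrt(-1.0)
except ValueError as err:
    print("math.sqrt failed:", err)

# ...while the function of the same name in cmath handles it
root = cmath.sqrt(-1.0)
print(root)                # 1j
```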

floating point operations#

The same operators, +, -, *, and /, work as usual for floating point numbers. To raise a number to a power, we use the ** operator (this is the same as Fortran).

R = 2.0
math.pi * R**2
12.566370614359172

Operator precedence follows that of most languages. See

https://docs.python.org/3/reference/expressions.html#operator-precedence

In order of precedence, from highest to lowest:

  • quantities in ()

  • slicing, calls, subscripts

  • exponentiation (**)

  • +x, -x, ~x

  • *, @, /, //, %

  • +, -

(after this are bitwise operations and comparisons)

Parentheses can be used to override the precedence.
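A short sketch of these rules in action (the variable names are only for illustration). The first case is a frequent surprise: exponentiation binds more tightly than the unary minus.

```python
a = -2**2         # ** binds tighter than unary minus: -(2**2) = -4
b = (-2)**2       # parentheses force the negation first: 4
c = 2 + 3 * 4     # * before +: 14
d = (2 + 3) * 4   # parentheses override: 20
e = 10 - 4 - 3    # equal precedence, evaluated left to right: 3

print(a, b, c, d, e)
```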

Quick Exercise

Consider the following expressions. Using the ideas of precedence, think about what value will result, then try it out in the cell below to see if you were right.

  • 1 + 3*2**2

  • 1 + (3*2)**2

  • 2**3**2

The math module provides a lot of the standard math functions we might want to use.

For the trig functions, the expectation is that the argument to the function is in radians; you can use math.radians() to convert from degrees to radians, e.g.:

math.cos(math.radians(45))
0.7071067811865476

Notice that in that statement we are feeding the output of one function (math.radians()) into a second function, math.cos()

When in doubt, ask for help to learn what a module or one of its functions provides:

help(math.sin)
Help on built-in function sin in module math:

sin(x, /)
    Return the sine of x (measured in radians).
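A few more functions from the math module, as a sketch with illustrative values. The last line connects back to roundoff: since math.pi is only the closest float to π, math.sin(math.pi) comes out tiny but not exactly zero.

```python
import math

hyp = math.hypot(3.0, 4.0)                    # length of the hypotenuse: 5.0
root = math.sqrt(2.0)
angle = math.degrees(math.atan2(1.0, 1.0))    # about 45 degrees
log_e = math.log(math.e)                      # natural log: 1.0

s = math.sin(math.pi)                         # tiny, but not exactly 0

print(hyp, root, angle, log_e)
print(s)
```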

complex numbers#

python uses ‘j’ to denote the imaginary unit

1.0 + 2j
(1+2j)
a = 1j
b = 3.0 + 2.0j
print(a + b)
print(a * b)
(3+3j)
(-2+3j)

we can use abs() to get the magnitude and separately get the real or imaginary parts

print("magnitude: ", abs(b))
print("real part: ", a.real)
print("imag part: ", a.imag)
magnitude:  3.605551275463989
real part:  0.0
imag part:  1.0
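Some further operations on complex numbers, sketched here with the same value of b as above (renamed z to keep the block self-contained). The conjugate() method is built into complex numbers, and the cmath module converts between rectangular and polar forms.

```python
import cmath

z = 3.0 + 2.0j

zbar = z.conjugate()          # (3-2j)
mag2 = (z * zbar).real        # |z|^2 = 3^2 + 2^2 = 13.0

r, phi = cmath.polar(z)       # magnitude and phase angle (in radians)
z_back = cmath.rect(r, phi)   # back to rectangular form, up to roundoff

print(zbar, mag2)
print(r, phi)
print(z_back)
```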

strings#

python doesn’t care if you use single or double quotes for strings:

a = "this is my string"
b = 'another string'
print(a)
print(b)
this is my string
another string

Many of the usual mathematical operators are defined for strings as well. For example to concatenate or duplicate:

a + b
'this is my stringanother string'
a + ". " + b
'this is my string. another string'
a * 2
'this is my stringthis is my string'

There are several escape codes that are interpreted in strings. These start with a backslash, \. E.g., you can use \n for a new line.

a = a + "\n"
print(a)
this is my string
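A couple of other escape codes, as a sketch: \t is a tab and \\ is a literal backslash. Each escape sequence counts as a single character in the string, even though we type two characters to write it.

```python
text = "name:\tvalue\nback\\slash"
print(text)

newline_len = len("\n")     # 1: the two typed characters become one newline
backslash_len = len("\\")   # 1: an escaped backslash is a single backslash

print(newline_len, backslash_len)
```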

Quick Exercise

The input() function can be used to ask the user for input.

  • Use help(input) to see how it works.

  • Write code to ask for input and store the result in a variable. input() will return a string.

  • Use the float() function to convert a number entered as input to a floating point variable.

  • Check to see if the conversion worked using the type() function.

Triple quotes (""") can enclose multiline strings. This is useful for docstrings at the start of functions (more on that later…)

c = """
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor 
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis 
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. 
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore 
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt 
in culpa qui officia deserunt mollit anim id est laborum."""
print(c)
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor 
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis 
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. 
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore 
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt 
in culpa qui officia deserunt mollit anim id est laborum.

A raw string does not replace escape sequences (like \n). Just put an r before the first quote:

d = r"this is a raw string\n"
d
'this is a raw string\\n'

slicing is used to access a portion of a string.

slicing a string can seem a bit counterintuitive if you are coming from Fortran. The trick is to think of the index as representing the left edge of a character in the string. When we do arrays later, the same will apply.

Also note that python (like C) uses 0-based indexing

Negative indices count from the right.

a = "this is my string"
print(a)
print(a[5:7])
print(a[0])
print(d)
print(d[-2])
this is my string
is
t
this is a raw string\n
\
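The left-edge picture described above can be sketched as follows (the variable names are illustrative). A slice s[i:j] runs from edge i up to edge j, so it contains j - i characters, and omitting an index means "from the start" or "to the end".

```python
s = "this is my string"

first_word = s[:4]     # from the start up to edge 4: "this"
middle = s[5:7]        # between edges 5 and 7: "is" (7 - 5 = 2 characters)
last_word = s[-6:]     # from 6 characters before the end: "string"
last_char = s[-1]      # a negative index counts from the right: "g"

print(first_word, middle, last_word, last_char)

# splitting at the same edge loses nothing
print(s[:4] + s[4:] == s)
```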

Quick Exercise:

Strings have a lot of methods (functions that know how to work with a particular datatype, in this case strings). A useful method is .find(). For a string a, a.find(s) will return the index of the first occurrence of s.

For our string c above, find the first . (identifying the first full sentence), and print out just the first sentence in c using this result.

There are also a number of methods and functions that work with strings. Here are some examples:

print(a.replace("this", "that"))
print(len(a))
print(a.strip())    # strip() removes leading and trailing whitespace (such as a trailing \n)
print(a.strip()[-1])
that is my string
17
this is my string
g

Note that our original string, a, has not changed. In python, strings are immutable. Operations on strings return a new string.

a
'this is my string'
type(a)
str
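Immutability can be seen directly by trying to change a character in place. This is a minimal sketch: the assignment raises a TypeError, and the way around it is to build a new string from slices of the old one.

```python
s = "this is my string"

# assigning to an index is not allowed for strings
try:
    s[0] = "T"
except TypeError as err:
    print("cannot modify in place:", err)

# instead, create a new string; the original is left untouched
t = "T" + s[1:]
print(t)
print(s)
```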

As usual, ask for help to learn more:

#help(str)

We can format strings when we are printing to insert quantities in particular places in the string. A {} serves as a placeholder for a quantity and is replaced using the .format() method:

a = 1
b = 2.0
c = "test"
print("a = {}; b = {}; c = {}".format(a, b, c))
a = 1; b = 2.0; c = test

But the more modern way to do this is to use f-strings:

print(f"a = {a}; b = {b}; c = {c}")
a = 1; b = 2.0; c = test

Note the f preceding the opening quote (").
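Format specifiers, like the one used earlier to print 0.1 with extra digits, also work inside f-strings: they go after a colon within the {}. A short sketch with illustrative values:

```python
import math

x = math.pi
n = 42

fixed = f"{x:.3f}"      # 3 digits after the decimal point
sci = f"{x:12.4e}"      # scientific notation, padded to a width of 12
padded = f"{n:5d}"      # integer, right-aligned in a width of 5
zeros = f"{n:05d}"      # same width, padded with leading zeros

print(f"[{fixed}] [{sci}] [{padded}] [{zeros}]")
```

The square brackets in the final print are only there to make the padding visible.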