At the end of this class, you will:
You need to have a computer, and either:
An IDE (integrated development environment) is an improved text editor. It is software that provides features such as syntax highlighting, auto-completion, help, and a debugger. One example is Visual Studio Code (install and learn how to use it with Python), but any other IDE will work.
vgilbart/python-intro
to copy from. This is a free solution up to 60 hours of computing and 15 GB per month.
Python is a programming language first released in 1991 and implemented by Guido van Rossum.
It is widely used, with various applications, such as:
It supports different programming paradigms (i.e. ways of thinking), including the procedural programming paradigm. In this approach, the program moves through a linear series of instructions.
# Create a string
seq = 'ATGAAGGGTCC'
# Call the function len() to retrieve the length of the string
size = len(seq)
# Call the function print() to print a text
print('The sequence has', size, 'bases.')
The sequence has 11 bases.
numpy for tables/matrices, matplotlib for graphs, scikit-learn for machine learning…)
Python is an interpreted language: all scripts written in Python need a piece of software in order to run. This software is called an interpreter; it “translates” each line of the code into instructions that the computer can understand. By extension, the interpreter that is able to read Python scripts is also called Python. So, whenever you want your Python code to run, you give it to the Python interpreter.
One way to launch the Python interpreter is to type the following on the command line of a terminal:
python3
You can also try python, /usr/bin/env python3, /usr/bin/python3… There are many ways to call Python!
You can see where your current Python is located by running which python3.
From this, you can start using Python interactively, e.g. run:
print("Hello world")
Hello world
To get out of the Python interpreter, type quit() or exit(), followed by [enter]. Alternatively, on Linux/Mac press [ctrl + d]; on Windows press [ctrl + z] followed by [enter].
To run a script, create a folder named script, in which a file named intro.py contains:
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
print("Hello world")
and run
./script/intro.py
You should get the same output as before, that is:
Hello world
The shebang #! followed by the interpreter /usr/bin/env python3 can be put at the beginning of the script in order to omit calling python3 on the command line. If you don’t put it, you will have to run python3 script/intro.py instead of simply ./script/intro.py.
The -*- coding: UTF-8 -*- line specifies the encoding to use. UTF-8 is used by default (which means that this line in the script is not necessary). It accepts characters from all languages. Other valid encodings are available, such as ascii (English characters only).
Some common errors can occur at this step:
bash: script/intro.py: No such file or directory
i.e. you are not in the right directory to run the file.
Solution: run ls */ and make sure you can find script/: intro.py; if not, go to the correct directory by running cd <insert directory name here>
bash: script/intro.py: Permission denied
i.e. you don’t have the right to execute your script.
Solution: run ls -l script/intro.py and make sure you have at least -rwx (read, write, execute rights) as the first 4 characters; if not, run chmod 744 script/intro.py to change your rights.
You will manipulate values such as integers, characters or dictionaries. These values can be stored in memory using variables. To assign a value to a variable, use the = operator as follows:
seq = 'ATGAAGGGTCC'
To output the variable value, either type the variable name or use a function like print():
seq
'ATGAAGGGTCC'
print(seq)
ATGAAGGGTCC
We can change a variable value by assigning it a new one:
seq = seq + 'AAAA'  # The + operator can be used to concatenate strings
seq
'ATGAAGGGTCCAAAA'
A variable can have a short name (like x and y) or a more descriptive name (seq, motif, genome_file). Rules for Python variable names:
help('keywords') to find the list of keywords).
Are the following variable names legal?
2_sequences
_sequence
seq-2
seq 2
You can try to assign a value to these variable names to be sure of your answer!
A function stores a piece of code that performs a certain task, and that gets run when called. It takes some data as input (parameters that are required or optional), and returns an output (that can be of any type). Some functions are predefined (but we will also learn how to create our own later on).
To run a function, write its name followed by parentheses. Parameters are added inside the parentheses as follows:
# round(number, ndigits=None)
x = round(number=5.76543, ndigits=2)
print(x)
5.77
Here the function round() needs a numerical value as input. As an option, one can add the number of decimal places to be used with ndigits. If an option is not provided, a default value is used. In the case of the option ndigits, None is the default. The function returns a numerical value that corresponds to the rounded value. This value, just like any other, can be stored in a variable.
To get more information about a function, use the help() function.
If you provide the parameters in the exact same order as they are defined, you don’t have to name them. If you name the parameters you can switch their order. As good practice, put all required parameters first.
round(5.76543, 2)
5.77
round(ndigits = 2, number = 5.76543)
5.77
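As a small sketch of the points above (the variable names are only illustrative), the following compares a call that relies on the default value of ndigits with calls that pass it positionally or by name:

```python
value = 5.76543

# ndigits is omitted, so its default (None) is used: the result is an integer
rounded_default = round(value)

# Parameters given in the order of the definition do not need to be named
rounded_two = round(value, 2)

# Named parameters can be given in any order
rounded_named = round(ndigits=1, number=value)

print(rounded_default, type(rounded_default))
print(rounded_two)
print(rounded_named)
```

With the default, round() returns an int rather than a float, which is worth keeping in mind when the result is used in later calculations.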
In Table 6.1 you will find some basic but useful Python functions:
Function | Description |
---|---|
print() | Print to the screen the values given as arguments |
help() | Execute the built-in help system |
quit() or exit() | Exit from Python |
len() | Return the length of an object |
round() | Round a number |
In Python, you will also hear about methods. This vocabulary belongs to a programming paradigm called “Object-oriented programming” (OOP).
A method is a function that belongs to a specific class of objects. It is defined within a class and operates only on objects from that class. Methods can access and modify the object’s state.
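To make the difference concrete, here is a minimal sketch contrasting a function call with method calls. The variable names are chosen for illustration; .lower() and .append() are standard methods of str and list objects.

```python
seq = 'ATGAAGGGTCC'

# len() is a function: the object is passed to it as an argument
n_bases = len(seq)

# .lower() is a method of str objects: it is called on the object itself
seq_lower = seq.lower()

# .append() is a method of list objects, and it modifies the object's state
bases = ['A', 'T']
bases.append('G')

print(n_bases, seq_lower, bases)
```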
To get more information about a function or an operator, you can use the help() function. For example, in interactive mode, run help(print) to display the help of the print() function, giving you information about the input and output of this function. If you need information about an operator, you will have to put it into quotes, e.g. help('+')
If the help is long, press [enter] to get the next line or [space] to get the next ‘page’ of information.
To quit the help, press q.
Each programming language has its own set of data types, from the most basic (bool, int, str) to more complex structures (list, tuple, set…).
Booleans represent one of two values: True or False.
When you compare two values, the expression is evaluated and Python returns the Boolean answer:
print(10 > 9)
True
Python provides three kinds of numerical type:
int (\(\mathbb{Z}\)), integers
float (\(\mathbb{R}\)), real numbers
complex (\(\mathbb{C}\)), complex numbers
Python will assign a numerical type automatically.
x = 1
y = 2.8
z = 1j + 2  # j is the convention in electrical engineering
type(x)
int
type(y)
float
type(z)
complex
The string type represents textual data composed of letters, numbers, and symbols. A character string must be written between quotes.
"""my string"""
'''my string'''
"my string"
'my string'
are all the same thing. The difference with triple quotes is that they allow a string to extend over multiple lines. You can also use single quotes and double quotes freely within the triple quotes.
# A multi-line string
my_str = '''This is a multi-line string. This is the first line.
This is the second line.
"What's your name?," I asked.
He said "Bond, James Bond."
'''
print(my_str)
This is a multi-line string. This is the first line.
This is the second line.
"What's your name?," I asked.
He said "Bond, James Bond."
You can get the number of characters inside a string with len().
print(seq)
len(seq)
ATGAAGGGTCCAAAA
15
Strings have specific methods (i.e. functions specific to this class of object). Here are a few:
Method | Description |
---|---|
.count() | Returns the number of times a specified value occurs in a string |
.startswith() | Returns True if the string starts with the specified value |
.endswith() | Returns True if the string ends with the specified value |
.find() | Searches the string for a specified value and returns the position where it was found |
.replace() | Returns a string where a specified value is replaced with another specified value |
They are called like this:
seq.count('A')
7
To get the help() of the .count() method, you need to run help(str.count).
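As an illustration of some of the methods in the table, here is a short sketch on a made-up sequence (dna is a hypothetical variable, not one used elsewhere in this class):

```python
dna = 'GATTACAGATTACA'  # hypothetical example sequence

n_a = dna.count('A')            # number of occurrences of 'A'
ends_with_aca = dna.endswith('ACA')
first_tta = dna.find('TTA')     # index of the first occurrence
not_found = dna.find('CCC')     # .find() returns -1 when nothing is found

print(n_a, ends_with_aca, first_tta, not_found)
```

Note that .find() signals a missing value with -1 rather than with an error, so its result should be checked before being used as an index.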
seq starts with the codon ATG
T into U in seq
Data structures are a collection of data types and/or data structures, organized in some way.
A list is a collection which is ordered and changeable. It allows duplicate members. Lists are created using square brackets [].
seq = ['ATGAAGGGTCCAAAA', 'AGTCCCCGTATGAT', 'ACCT', 'ACCT']
List items are indexed: the first item has index [0], the second item has index [1], etc.
seq[1]
'AGTCCCCGTATGAT'
You can count backwards, with the index [-1] that retrieves the last item.
As a list is changeable, we can change, add, and remove items in a list after it has been created.
seq[1] = 'ATG'
seq
['ATGAAGGGTCCAAAA', 'ATG', 'ACCT', 'ACCT']
You can specify a range of indexes by specifying the start (included) and the end (not included) of the range.
seq[0:2]
['ATGAAGGGTCCAAAA', 'ATG']
By leaving out the start value, the range will start at the first item:
seq[:2]
['ATGAAGGGTCCAAAA', 'ATG']
Similarly, by leaving out the end value, the range will end at the last item.
Indexes also conveniently work on str types.
print(seq[0])
print(seq[0][0:5])
print(seq[0][2])
print(seq[0][-1])
ATGAAGGGTCCAAAA
ATGAA
G
A
You can get how many items are in a list with len().
len(seq)
4
Lists have specific methods. Here are a few:
Method | Description |
---|---|
.append() | Inserts an item at the end |
.insert() | Inserts an item at the specified index |
.extend() | Appends elements from another list to the current list |
.remove() | Removes the first occurrence of a specified item |
.pop() | Removes the item at the specified index (by default the last one) |
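The following sketch walks through a few of these methods on a small made-up list (codons is a hypothetical name); the comments show the state of the list after each call:

```python
codons = ['ATG', 'GCC']  # hypothetical example list

codons.insert(1, 'AAA')        # ['ATG', 'AAA', 'GCC']
codons.extend(['TGA', 'AAA'])  # ['ATG', 'AAA', 'GCC', 'TGA', 'AAA']
codons.remove('AAA')           # only the first 'AAA' is removed
last = codons.pop()            # removes and returns the last item
second = codons.pop(1)         # removes and returns the item at index 1

print(codons, last, second)
```

These methods change the list in place; .pop() is the only one here that also returns the removed item.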
l = ['AAA', 'AAT', 'AAC'], and add AAG at the end, using .append().
T into U in the element AAT, using .replace().
A tuple is a collection which is ordered and unchangeable. It allows duplicate members. Tuples are written with round brackets ().
my_favorite_amino_acid = ('Y', 'Tyr', 'Tyrosine')
Just like for the list, you can get items with their index. The only difference is that you cannot change a tuple that has been created.
Tuples have specific methods. Here are a few:
Method | Description |
---|---|
.count() | Returns the number of times a specified value occurs |
.index() | Searches for a specified value and returns the position where it was found |
Try to change the value of the first element of my_favorite_amino_acid and see what happens.
A set is a collection which is unordered and unindexed. It does not allow duplicate members (they will be ignored). Sets are written with curly brackets {}.
seq = {'BRCA1', 'TP53', 'EGFR', 'MYC'}
Once a set is created, you cannot change its items directly (as they don’t have an index), but you can modify the set by removing and adding items.
Sets have specific methods. Here are a few:
Method | Description |
---|---|
.add() | Adds an element to the set |
.difference() | Returns a set containing the difference between two sets |
.intersection() | Returns a set containing the intersection between two sets |
.union() | Returns a set containing the union of two sets |
.remove() | Removes the specified item |
.pop() | Removes an arbitrary element |
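Here is a brief sketch of some of these methods, using made-up sets of nucleotide letters (all names are hypothetical):

```python
purines = {'A', 'G'}
pyrimidines = {'C', 'T'}

# .union() returns a new set and leaves both originals unchanged
dna_bases = purines.union(pyrimidines)

# .difference() also returns a new set
rna_bases = dna_bases.difference({'T'})

# .add() modifies the set in place; adding an existing member is ignored
rna_bases.add('U')
rna_bases.add('A')

print(sorted(dna_bases), sorted(rna_bases))
```

sorted() is used for printing only because a set has no defined order.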
Get the common genes between the following sets:
organism1_genes = {'BRCA1', 'TP53', 'EGFR', 'MYC'}
organism2_genes = {'TP53', 'MYC', 'KRAS', 'BRAF'}
Dictionaries are used to store data values in key: value pairs. A dictionary is a collection which is ordered (as of Python >= 3.7), changeable and does not allow duplicate keys. Dictionaries are written with curly brackets {}, with keys and values.
organism1_genes = {
    # key: value,
    'BRCA1': 'DNA repair',
    'TP53': 'Tumor suppressor',
    'EGFR': 'Cell growth',
    'MYC': 'Regulation of gene expression'
}
Dictionary items can be referred to by using the key name.
organism1_genes["BRCA1"]
'DNA repair'
Dictionaries have specific methods. Here are a few:
Method | Description |
---|---|
.items() | Returns a view of the dictionary’s (key, value) pairs, each as a tuple |
.keys() | Returns a view of the dictionary’s keys |
.values() | Returns a view of all the values in the dictionary |
.pop() | Removes the element with the specified key |
.get() | Returns the value of the specified key |
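A minimal sketch of some of these methods, on a small made-up dictionary (codon_table is a hypothetical name). The results of .keys(), .items() and .values() are wrapped in list() here so that they can be printed and indexed like lists:

```python
codon_table = {'ATG': 'Met', 'TGG': 'Trp', 'TAA': 'Stop'}  # hypothetical data

codons = list(codon_table.keys())
pairs = list(codon_table.items())

# .pop() removes the key and returns the associated value
removed = codon_table.pop('TAA')
remaining = list(codon_table.values())

print(codons)
print(pairs)
print(removed, remaining)
```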
From the dictionary organism1_genes created as example, get the value of the key BRCA1. If the key does not exist, return Unknown by default. Try your code before and after removing the BRCA1 key:value pair.
Check the help of get by running help(dict.get).
You can get the data type of any object by using the function type(). You can (more or less easily) convert between data types.
Function | Description |
---|---|
bool() | Convert to boolean type |
int() , float() | Convert between integer or float types |
complex() | Convert to complex type |
str() | Convert to string type |
list() , tuple() , set() | Convert between list, tuple, and set types |
dict() | Convert a collection of (key, value) tuples into a dictionary type |
bool(1)
True
int(5.8)
5
str(1)
'1'
list({1, 2, 3})
[1, 2, 3]
set([1, 2, 3, 3])
{1, 2, 3}
dict((('a', 1),
      ('f', 2),
      ('g', 3)))
{'a': 1, 'f': 2, 'g': 3}
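Conversions are especially common when data arrive as text, for example when read from a file. The sketch below uses made-up values and hypothetical variable names:

```python
n_reads = int('42')              # text to integer
gc_content = float('0.55')       # text to float
bases = list('ATG')              # a string is split into its characters
label = str(n_reads) + ' reads'  # a number must be converted before concatenation

is_empty = bool('')     # empty objects convert to False
is_nonzero = bool(0.1)  # any non-zero number converts to True

print(n_reads, gc_content, bases, label, is_empty, is_nonzero)
```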
In the previous section we have learned how data can be represented in different types and gathered in various data structures. In this section we will see how we can manipulate data in order to do more complex tasks.
Operators are used to perform operations on variables and values. We will present a few common ones here.
Arithmetic operators are used with numeric values to perform common mathematical operations:
Operator | Name |
---|---|
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
** | Power |
Do not use the ^ operator to raise to a power. That is actually the operator for bitwise XOR, which we will not cover.
Python will convert data types as necessary. Thus, when you divide two int values you will obtain a float; if you add a float to an int, you will get a float, …
# Example
2/10
0.2
+ also conveniently works on str types.
'AC' + 'AT'
'ACAT'
Assignment operators are used to assign values to variables:
Operator | Example | Same as |
---|---|---|
= | x = 5 | x = 5 |
+= | x += 5 | x = x + 5 |
-= | x -= 5 | x = x - 5 |
The same principle applies to multiplication, division and power (*=, /=, **=), but these are less commonly used.
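As a quick illustration, here is a sketch that applies several assignment operators in turn to one variable (count is a hypothetical name); the comments give the value after each line:

```python
count = 10
count += 5    # 15
count -= 3    # 12
count *= 2    # 24
count /= 4    # 6.0 (division gives a float)
count **= 2   # 36.0

print(count)
```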
Comparison operators are used to compare two values:
Operator | Name |
---|---|
== | Equal |
!= | Not equal |
> | Greater than |
>= | Greater than or equal to |
< | Less than |
<= | Less than or equal to |
# Example
2 == 1 + 1
True
You should never use equality operators (== or !=) with floats or complex values.
# Example
2.1 + 3.2 == 5.3
False
This is a floating point arithmetic problem that is also seen in other programming languages. It is due to the difficulty of accurately representing some decimal numbers with a fixed number of binary digits (bits). This leads to small rounding errors in calculations.
2.1 + 3.2
5.300000000000001
If you need to test equality, do it with a tolerance:
tol = 1e-6; abs((2.1 + 3.2) - 5.3) < tol
True
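The standard library offers a ready-made version of this tolerance test, math.isclose(). The sketch below (with hypothetical variable names) compares it with a plain == on the same values:

```python
import math

total = 2.1 + 3.2

exact_match = (total == 5.3)            # affected by the rounding error
close_match = math.isclose(total, 5.3)  # uses a relative tolerance (1e-09 by default)

# When comparing against zero, an absolute tolerance has to be given explicitly
close_to_zero = math.isclose(0.1 + 0.2 - 0.3, 0.0, abs_tol=1e-9)

print(exact_match, close_match, close_to_zero)
```

The default tolerance of math.isclose() is relative to the size of the values, which is why comparisons against 0 need abs_tol.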
Logical operators are used to combine conditional statements:
Operator | Description |
---|---|
and | Returns True if both statements are true |
or | Returns True if one of the statements is true |
not | Reverses the result, returns False if the result is true |
# Example
False and False, False and True, True and False, True and True
(False, False, False, True)
# Example
False or False, False or True, True or False, True or True
(False, True, True, True)
# Example
True or not True
True
Operator | Description |
---|---|
in | Returns True if a sequence with the specified value is present in the object |
not in | Returns True if a sequence with the specified value is not present in the object |
# Example
'ACCT' in seq
False
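The behaviour of in depends on the type of the object on the right. A short sketch with made-up data (dna and codons are hypothetical names):

```python
dna = 'ATGAAGGGTCC'
codons = ['ATG', 'AAG', 'GGT']

has_start = 'ATG' in dna            # in a string: looks for a substring
has_u = 'U' in dna
missing_stop = 'TAA' not in codons  # in a list: compares whole items
partial = 'AT' in codons            # 'AT' is not an item of the list

print(has_start, has_u, missing_stop, partial)
```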
Operator precedence describes the order in which operations are performed.
The precedence order is described in the table below, starting with the highest precedence at the top:
Operator | Description |
---|---|
() | Parentheses |
** | Power |
* / | Multiplication, division |
+ - | Addition, subtraction |
== , != , > , >= , < , <= , is , is not , in , not in | Comparisons, identity, and membership operators |
not | Logical NOT |
and | AND |
or | OR |
If two operators have the same precedence, the expression is evaluated from left to right (the exception is **, which is evaluated from right to left).
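A few worked cases may help to read the table. This is only a sketch with arbitrary numbers; the comments explain which operation is applied first:

```python
a = 2 + 3 * 4     # multiplication before addition
b = (2 + 3) * 4   # parentheses are evaluated first
c = 2 ** 3 ** 2   # ** groups from right to left: 2 ** (3 ** 2)
d = 10 - 4 - 3    # same precedence, left to right: (10 - 4) - 3
e = not 1 > 2     # the comparison is evaluated before not

print(a, b, c, d, e)
```

When in doubt, adding parentheses makes the intended order explicit and costs nothing.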
Try to guess what the following expressions will output:
1+1 == 2 and "actg" == "ACTG"
True or False and True and False
"Homo sapiens" == "Homo" + "sapiens"
'Tumor suppressor' in organism1_genes
Verify with Python.
Conditionals allow you to make decisions in your code based on certain conditions.
if something is true:
    do task a
otherwise:
    do task b
The comparison (==, !=, >, >=, <, <=), logical (and, or, not) and membership (in, not in) operators can be used as conditions.
In Python, this is written with an if ... elif ... else statement like so:
# Define gene expression levels
gene1_expression = 100
gene2_expression = 50

# Analyze gene expression levels
if gene1_expression > gene2_expression:
    print("Gene 1 has higher expression level.")
elif gene1_expression < gene2_expression:
    print("Gene 2 has higher expression level.")
else:
    print("Gene 1 and Gene 2 have the same expression level.")
Gene 1 has higher expression level.
The elif keyword is Python’s way of saying “if the previous conditions were not true, then try this condition”. The following code is equivalent to the one before:
# Analyze gene expression levels
if gene1_expression > gene2_expression:
    print("Gene 1 has higher expression level.")
else:
    if gene1_expression < gene2_expression:
        print("Gene 2 has higher expression level.")
    else:
        print("Gene 1 and Gene 2 have the same expression level.")
Gene 1 has higher expression level.
Are these two codes equivalent?
# Code A
if "ATG" in dna_sequence:
    print("Start codon found.")
elif "TAG" in dna_sequence:
    print("Stop codon found.")
else:
    print("No interesting codon found.")
# Code B
if "ATG" in dna_sequence:
    print("Start codon found.")
if "TAG" in dna_sequence:
    print("Stop codon found.")
else:
    print("No interesting codon found.")
An if statement cannot be empty, but if for some reason you have an if statement with no content, put in the pass statement to avoid getting an error.
a = 33
b = 200

if b > a:
    pass
Python relies on indentation (the spaces at the beginning of the lines).
Indentation is not just for readability. In Python, you use spaces or tabs to indent code blocks. Python uses it to determine the scope of functions, loops, conditional statements, and classes.
Any code that is at the same level of indentation is considered part of the same block. Blocks of code are typically defined by ending a line with a colon (:) and then indenting the following lines.
When you have nested structures like a conditional statement inside another conditional statement, you must indent further to show the hierarchy. Each level of indentation represents a deeper level of nesting.
It’s essential to be consistent with your indentation throughout your code. Mixing tabs and spaces can lead to errors, so it’s recommended to choose one and stick with it.
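The sketch below shows correctly nested blocks, using four spaces per level (a common convention). The values and variable names are made up for illustration:

```python
gc_content = 0.62  # hypothetical values
length = 150

if gc_content > 0.5:
    # First level: these lines belong to the outer if
    label = 'GC-rich'
    if length > 100:
        # Second level: this line belongs to the inner if
        label = label + ', long'
else:
    # Same level as the outer if, so this else is paired with it
    label = 'AT-rich'

print(label)
```

The else lines up with the outer if, which is what tells Python which condition it belongs to.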
Here are three pieces of code. They are all incorrect; can you tell why?
Of course, you can run them and read the error that Python gives!
amino_acid_list = ["MET", "ARG", "THR", "GLY"]

if "MET" in amino_acid_list:
print("Start codon found.")
if "GLY" in amino_acid_list:
print("Glycine found.")
else:
print("Start codon not found.")
dna_sequence = "ATGCTAGCTAGCTAG"

if "ATG" in dna_sequence:
print("Start codon found.")
if "TAG" in dna_sequence
print("Stop codon found.")
x = 7

if x > 5:
print("x is greater than 5")
if y > 10:
print("x is greater than 10")
elif y = 10:
print("x equals 10")
else:
print("x is less than 10")
Iteration involves repeating a set of instructions or a block of code multiple times.
There are two types of loops in Python, for and while.
Iterating through data structures like lists allows you to access each element individually, making it easier to perform operations on them.
When using a for loop, you iterate over a sequence of elements, such as a list, tuple, or dictionary.
for item in data_structure:
    do task a
The loop will execute the indented block of code for each element in the sequence until all elements have been processed. This is particularly useful when you know the number of times you need to iterate.
all_codons = [
    'AAA', 'AAC', 'AAG', 'AAT',
    'ACA', 'ACC', 'ACG', 'ACT',
    'AGA', 'AGC', 'AGG', 'AGT',
    'ATA', 'ATC', 'ATG', 'ATT',
    'CAA', 'CAC', 'CAG', 'CAT',
    'CCA', 'CCC', 'CCG', 'CCT',
    'CGA', 'CGC', 'CGG', 'CGT',
    'CTA', 'CTC', 'CTG', 'CTT',
    'GAA', 'GAC', 'GAG', 'GAT',
    'GCA', 'GCC', 'GCG', 'GCT',
    'GGA', 'GGC', 'GGG', 'GGT',
    'GTA', 'GTC', 'GTG', 'GTT',
    'TAA', 'TAC', 'TAG', 'TAT',
    'TCA', 'TCC', 'TCG', 'TCT',
    'TGA', 'TGC', 'TGG', 'TGT',
    'TTA', 'TTC', 'TTG', 'TTT'
]

count = 0
for codon in all_codons:
    if codon[1] == 'T':
        count += 1

print(count, 'codons have a T as a second nucleotide.')
16 codons have a T as a second nucleotide.
What it does is the following: it processes each element in the list all_codons, called codon inside the loop. If the codon has a T as its second character, it adds 1 to a counter (the variable called count).
You cannot modify an element of a list that way.
for codon in all_codons:
    if 'T' in codon:
        codon = codon.replace('T', 'U')

print(all_codons)
['AAA', 'AAC', 'AAG', 'AAT', 'ACA', 'ACC', 'ACG', 'ACT', 'AGA', 'AGC', 'AGG', 'AGT', 'ATA', 'ATC', 'ATG', 'ATT', 'CAA', 'CAC', 'CAG', 'CAT', 'CCA', 'CCC', 'CCG', 'CCT', 'CGA', 'CGC', 'CGG', 'CGT', 'CTA', 'CTC', 'CTG', 'CTT', 'GAA', 'GAC', 'GAG', 'GAT', 'GCA', 'GCC', 'GCG', 'GCT', 'GGA', 'GGC', 'GGG', 'GGT', 'GTA', 'GTC', 'GTG', 'GTT', 'TAA', 'TAC', 'TAG', 'TAT', 'TCA', 'TCC', 'TCG', 'TCT', 'TGA', 'TGC', 'TGG', 'TGT', 'TTA', 'TTC', 'TTG', 'TTT']
This is because all_codons was converted to an iterator in the for statement.
An iterator is a special object that gives values in succession.
In the previous example, codon is a name that is bound in turn to each item given by the iterator. Assigning a new value to codon inside the for block only rebinds that name (strings cannot be changed in place); it does not replace the item stored in the list, so the list is left untouched.
A way to modify the list would be to use an iterable to access the original data. The range(start, stop) function creates an iterable to count from one integer to another.
for i in range(2, 10):
    print(i, end=' ')
2 3 4 5 6 7 8 9
We could count from 0 to the size of the list, loop through every element of the list by calling them by their index, and modify them if necessary. That’s what the following code does:
for i in range(0, len(all_codons)):
    if 'T' in all_codons[i]:
        all_codons[i] = all_codons[i].replace('T', 'U')

print(all_codons)
['AAA', 'AAC', 'AAG', 'AAU', 'ACA', 'ACC', 'ACG', 'ACU', 'AGA', 'AGC', 'AGG', 'AGU', 'AUA', 'AUC', 'AUG', 'AUU', 'CAA', 'CAC', 'CAG', 'CAU', 'CCA', 'CCC', 'CCG', 'CCU', 'CGA', 'CGC', 'CGG', 'CGU', 'CUA', 'CUC', 'CUG', 'CUU', 'GAA', 'GAC', 'GAG', 'GAU', 'GCA', 'GCC', 'GCG', 'GCU', 'GGA', 'GGC', 'GGG', 'GGU', 'GUA', 'GUC', 'GUG', 'GUU', 'UAA', 'UAC', 'UAG', 'UAU', 'UCA', 'UCC', 'UCG', 'UCU', 'UGA', 'UGC', 'UGG', 'UGU', 'UUA', 'UUC', 'UUG', 'UUU']
Another useful function that returns an iterator is enumerate(). It is an iterator that generates pairs of index and value. It is commonly used when you need to access both the index and value of items simultaneously.
seq = 'ATGCATGC'

# Print index and identity of bases
for i, base in enumerate(seq):
    print(i, base)
0 A
1 T
2 G
3 C
4 A
5 T
6 G
7 C
# Loop through sequence and print index of G's
for i, base in enumerate(seq):
    if base == 'G':
        print(i, end=' ')
2 6
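enumerate() can also replace range(0, len(...)) when the goal is to modify a list while looping over it. Here is a sketch on a short made-up list (codons is a hypothetical name):

```python
codons = ['AAT', 'ACG', 'TTT']  # short hypothetical list

for i, codon in enumerate(codons):
    if 'T' in codon:
        # Assign through the index so that the list itself is updated
        codons[i] = codon.replace('T', 'U')

print(codons)
```

The value codon is convenient for reading the item, while the index i is what makes it possible to write the new value back into the list.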
A while loop continues executing a set of statements as long as a condition is true.
while condition is true:
    do task a
This type of loop is handy when you’re not sure how many iterations you’ll need to perform or when you need to repeat a block of code until a certain condition is met.
seq = 'TACTCTGTCGATCGTACGTATGCAAGCTGATGCATGATTGACTTCAGTATCGAGCGCAGCA'
start_codon = 'ATG'

# Initialize sequence index
i = 0
# Scan sequence until we hit the start codon
while seq[i:i+3] != start_codon:
    i += 1

# Show the result
print('The start codon begins at index', i)
The start codon begins at index 19
Remember to increment i, or you’ll get stuck in a loop.
Actually, the previous code is quite dangerous. You can also get stuck in a loop… if the start_codon does not appear in seq at all.
Indeed, even when you go beyond the length of seq, the condition seq[i:i+3] != start_codon will still be true because seq[i:i+3] will output an empty string.
seq[9999:9999+3]
''
So, once the end of the sequence is reached, the condition seq[i:i+3] != start_codon will always be true, and you’ll get stuck in an infinite loop.
To interrupt a process, press [ctrl + c].
Iteration stops in a for loop when the iterator is exhausted. It stops in a while loop when the conditional evaluates to False. There is another way to stop iteration: the break keyword. Whenever break is encountered in a for or while loop, the iteration stops and execution continues outside the loop.
seq = 'ACCATTTTTTGGGGGGGCGGGGGGAGGGGGGG'
start_codon = 'ATG'

# Initialize sequence index
i = 0
# Scan sequence until we hit the start codon
while seq[i:i+3] != start_codon:
    i += 1
    if i+3 > len(seq):  # Get out of the loop if we parsed the full seq
        print('Codon not found in sequence.')
        break
else:
    print('The start codon starts at index', i)
Codon not found in sequence.
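Another way to avoid the infinite loop, shown here only as an alternative sketch, is to put the length check directly in the while condition using and. The variable found is a hypothetical name introduced for this example:

```python
seq = 'ACCATTTTTTGGGGGGGCGGGGGGAGGGGGGG'
start_codon = 'ATG'

i = 0
# Keep scanning only while a full codon still fits and it is not the one we want
while i + 3 <= len(seq) and seq[i:i+3] != start_codon:
    i += 1

# If the loop stopped because no full codon fits any more, nothing was found
found = i + 3 <= len(seq)

if found:
    print('The start codon starts at index', i)
else:
    print('Codon not found in sequence.')
```

Both versions stop at the end of the sequence; this one keeps all the stopping logic in a single condition, at the cost of checking the length again after the loop.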
Also, note that the else statement can be used in for and while loops. In for loops it is executed when the loop is finished. In while loops, it is executed when the condition is no longer true. In both cases, the else block is entered only if the loop did not encounter a break.
In addition to the break statement, there is also the continue statement in Python that can be used to alter the flow of iteration in loops. When continue is encountered within a loop, it skips the remaining code inside the loop for the current iteration and moves on to the next iteration.
Here’s an example showcasing the continue statement in a loop:
# List of DNA sequences
dna_sequences = ['ATGCTAGCTAG', 'ATCGATCGATC', 'ATGGCTAGCTA', 'ATGTAGCTAGC']

# Find sequences starting with a start codon
for sequence in dna_sequences:
    if sequence[:3] != 'ATG':  # Check if the sequence does not start with a start codon
        print(f"Sequence '{sequence}' does not start with a start codon. Skipping analysis.")
        continue  # Skip further analysis for this sequence
    print(f"Analyzing sequence '{sequence}' for protein coding regions...")
    # Additional analysis code here
else:
    print('All sequences were processed.')
Analyzing sequence 'ATGCTAGCTAG' for protein coding regions...
Sequence 'ATCGATCGATC' does not start with a start codon. Skipping analysis.
Analyzing sequence 'ATGGCTAGCTA' for protein coding regions...
Analyzing sequence 'ATGTAGCTAGC' for protein coding regions...
All sequences were processed.
The continue statement in this example skips the analysis code for any sequence that does not start with a start codon.
Given a list of DNA sequences, find the first sequence that contains a specific motif 'TATA', print the sequence, and stop the process. If no sequence contains the motif, print a message accordingly. You must use only one for loop.
With the input given below, the output should look like this:
# List of DNA sequences with a TATA
dna_sequences = [
    'ATGCTACAGCTAG',
    'ATCGATATAATC',  # TATA
    'ATGGCTAGCTA',
    'ATGTAGCTAGC',
    'ATGTAGCTATA'  # TATA
]

for ...
    # Your code here
Sequence 'ATCGATATAATC' contains the 'TATA' motif.
# List of DNA sequences without a TATA
dna_sequences = [
    'ATGCTACAGCTAG',
    'ATCGATACAATC',
    'ATGGCTAGCTA',
    'ATGTAGCTAGC'
]

for ...
    # Your code here
No sequence contains the 'TATA' motif.
Analyze a DNA sequence to count the number of consecutive 'A' nucleotides. You must use only one while loop.
With the input given below, the output should look like this:
# DNA sequence to analyze
dna_sequence = 'ATGATAAGAGAAAGTAAAAGCGATCGAAAAAA'

while ...
    # Your code here
Number of consecutive 'A's: 6
Congrats! You now know the (very) basics of Python programming.
If you want to keep on practising with simple exercises, you can check out w3schools.
For more biology-related exercises check out pythonforbiologist.org, which has exercises available in each chapter.
For French speakers, the AFPy (Association Francophone Python) has a learning tool called HackInScience.
Or keep on googling for more Python exercises!
To continue your learning journey, follow lesson 2.
A Python conference organized by the AFPy (Association Francophone Python) is held in Strasbourg at the end of October 2024!
Here are some references and resources that (greatly) inspired this class:
6.4 Comment your code
Except for the shebang and coding specifications seen before, everything after a hash character (#) will be ignored by the interpreter until the end of the line. This is used to add comments in your code.
Comments are used to: