Barcelona Python Developers Seminars: biopython, doctest and makefiles
This is me
Giovanni Marco Dall'Olio, PhD student in a population genetics lab.
Not a Biopython developer.
(That may not be my real photo.)
Intro
Biopython: a collection of Python modules for common bioinformatics tasks.
Advantages of using open-source libraries in science:
- more reproducibility
- results that are easier to compare
- fewer errors
- less time spent
Biopython – some use cases
The human genome sequencing project (2001):
TCCATGGCCTCCCGGCAAGCCTAAGCTAGCGCAATTGTCAGACGCACAGGACCGGTCTGGGGAGACCAATGTGTTCAGACAACGATTCCCAGCTAGTACCACTGTTTGACTCGGAAGATGTGTACAACTATTGTAGCGACTGTGTCCCATCATTGCATTCAAACCCAAGTAATTGATGGATCAACAAAGGATACACTCCAAAAGTCGCACAGAGATTGGTCATCTTAACGCGAGATTAAACATGCGTCTATACGCCCGTGTTAAGTTCGGCCGCCATCGTACAAATAAGCGAGNNNNTATCAATCTAATCTTAAACCGGCTCTTGAGAAGGGCTAGCGGCGTTAGGACCCGCTGCCGGCCGTGAGCGTGCGTTCACTCTGAACAGCGCCATCGATGGGTCGCTTGTGTAGCTATTTTAAGGACGCGACATAGGCCCTGGGGCAGTTACTGGGGCATGCCCACTATATCCGCGGGCAAGTTGGTATTCAGCTATGTTTATCTCTCGCCCAATGCGTGAAAGCGCCAAACGTGGGTAGAGGACTTAGCAATTTGGGGCATGCCCTGCTCTTTTAGATCTGTTAAGCAATCCGCGCGTAGGGCTCGCTGCGTCGTAAATGTGAGCGCAAGTCACCGACGCAGTGGTAATATACGTGTAACTGATCATCNNNNNNTCCCGAACCATGCCTTCTAACAGGAGATGCCCAAGGTCGAGGGTCACCGCCAACGACCGGCTGATCCCTGTTGGTGAGGATTTATGGAGGTGGACTGTCAGGTAGGCAAGAACTCTGGGTGAATTTGCGAGCGCTATCTCTAAGTTACACGCTTTACTGGGGCATGCCCGGGCCGTAGAAGTTACTGGGGCATGCCCCACGTAATAGGTTTTCATGAGGAGATGTTTGGTCTGATTCTCGAGATTGTGGCTAAGTATTGAGTCAGACTTACTGGGGCATTTACTGGGGCATGCCCGCCCTGCTCTTTTAGATCTGTTAAGCAATCCGCGCGTAGGGCTCGCTGCGTCGTAAATGTGAGCGCAAGTCACCGACGCAGTGGTAATATACGTGTAACTGATCATCTTCATGATTCCCGAACCATGCCTTCTAACAGGAGATGCCCAAGGTCGAGGGTCACCGCCAACGACCGGCTGATTTACTGGGGCATGCCCCCCNNNNNGAGGATTTNNNNTGGAGCCTATCTCACATTTTAAACTTCAATCATCATAACACGTGCGCACTTTTTCCGCGCTTGACGGCGAAGTGACTGGCCACTTCCTGCTCCCTGTTTTTCCCAATACCTGACAAGTGTGGCATCTGTCCCCCTGAAGAGGACTAGAGTATCATTACGGGGGGCTTGACACTTACCTTCATAGG.............
Up to ~3×10^9 characters. Lots of regular expressions (Perl programmers like that). A genome sequence could be obtained for less than $1000 in the near future.
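Much of this work is pattern matching on very long strings. As a minimal sketch using only Python's standard re module (not Biopython), the hypothetical helpers below report every start position of a motif, including overlapping hits, and locate runs of N (unknown bases) like the ones in the sequence above. The EcoRI site GAATTC is used only as an illustrative motif.

```python
import re


def find_motif(sequence, motif):
    """Return the 0-based start positions of every match of a regex motif.

    The pattern is wrapped in a zero-width lookahead, so overlapping
    matches are reported too (plain re.finditer would skip them).
    """
    pattern = re.compile('(?=(?:%s))' % motif)
    return [match.start() for match in pattern.finditer(sequence)]


def find_gaps(sequence):
    """Return (start, end) pairs for each run of N (unknown bases)."""
    return [(match.start(), match.end()) for match in re.finditer('N+', sequence)]


if __name__ == '__main__':
    dna = 'GAATTCGAATTCNNNNTTAGGAATTC'
    print(find_motif(dna, 'GAATTC'))  # [0, 6, 20]
    print(find_motif(dna, 'TA[AG]'))  # [17]  (a motif with a character class)
    print(find_gaps(dna))             # [(12, 16)]
```

The lookahead trick matters for short, repetitive motifs: searching "AA" in "AAAA" yields three positions instead of two.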
Biopython – use cases
- Converting between different formats
- Structuring data into objects (genes, proteins, species, etc.)
- Matching regular expressions/motifs
- Launching external tools (web or local)
- Retrieving data from public online resources
- Querying databases
Biopython documentation
What should the documentation of a project like Biopython look like? It should:
- follow strict specifications (it already does, with epydoc)
- always be up to date
- include many usage examples (the tutorials already have many)
A Python module called doctest can help with this.
doctest
doctest lets you put examples of how a function is used in its docstring, and run them as tests. Here the docstring of say_hello (the text between the triple quotes) contains an example of its usage:

    def say_hello(name):
        '''print hello <name> to the screen

        example:
        >>> say_hello('Albert Einstein')
        hello Albert Einstein!!!
        '''
        print('hello ' + name + '!!!')
The docstring
The docstring is what Python shows when you ask for help on a function:

    >>> help(say_hello)
    Help on function say_hello in module __main__:

    say_hello(name)
        print hello <name> to the screen

        example:
        >>> say_hello('Albert Einstein')
        hello Albert Einstein!!!
doctest – how does it work?

    #!/usr/bin/env python
    def add(x, y):
        '''sums two numbers

        example:
        >>> print(add(1, 2))
        3
        '''
        return x + y

    if __name__ == '__main__':
        import doctest
        doctest.testmod()

doctest.testmod() searches the docstrings of the module for lines beginning with '>>> ' and executes them as Python commands. The output is compared with the lines that follow (the expected output). If they differ, doctest reports a failure: if print(add(1, 2)) doesn't print 3, the test fails. When every example passes, nothing is printed; run the script with -v to see each example being checked.
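To see what "reports a failure" means in practice, here is a small sketch built on doctest's own DocTestParser and DocTestRunner classes. The helper run_examples and the deliberately buggy broken_add are hypothetical. A runner returns a TestResults tuple with failed and attempted counts; doctest.testmod() returns the same kind of tuple and prints a report for each failing example rather than raising an exception.

```python
import doctest


def add(x, y):
    """Sum two numbers.

    >>> add(1, 2)
    3
    """
    return x + y


def broken_add(x, y):
    """Deliberately buggy: subtracts instead of adding.

    >>> broken_add(1, 2)
    3
    """
    return x - y


def run_examples(func):
    """Run the >>> examples in func's docstring; return doctest.TestResults."""
    parser = doctest.DocTestParser()
    # The examples run with only the function itself in scope.
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report text; only the counts are kept.
    return runner.run(test, out=lambda text: None)


if __name__ == '__main__':
    for func in (add, broken_add):
        results = run_examples(func)
        print(func.__name__, 'failed', results.failed, 'of', results.attempted)
```

In everyday use you would simply call doctest.testmod(); the explicit runner is only there to make the failure count visible.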
doctest – examples
Biopython – SeqIO.parse
doctest – file parsing example
In bioinformatics there are many formats with similar-sounding names: ped, tped, bed, tmap, pdb, fasta... It is useful to include an example of the input file in the docstring of every parser function.
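As a minimal sketch of this idea (a hypothetical parse_fasta, not Biopython's SeqIO), the docstring below carries a tiny example input file. io.StringIO stands in for a real file handle, so the doctest needs no file on disk.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA file handle.

    Example input: two records, the first one split over two lines.

    >>> fasta_text = '''>seq1 first example
    ... ACGT
    ... TTGA
    ... >seq2
    ... GGCC
    ... '''
    >>> for name, seq in parse_fasta(io.StringIO(fasta_text)):
    ...     print(name, seq)
    seq1 ACGTTTGA
    seq2 GGCC
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if name is not None:
                yield name, ''.join(chunks)
            # The identifier is the first word of the header line.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ''), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, ''.join(chunks)


if __name__ == '__main__':
    import doctest
    doctest.testmod()
```

A reader who is unsure whether "fasta" means the format they have in mind can compare their file with the example in the docstring.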
Choose good examples
- Write the doctest together with the people who will use the script (e.g. a fellow scientist).
- Ask them: "How is this function supposed to behave in this example?"
- Simplify: round all numbers to multiples of 100, and add comments.
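For instance, a hypothetical gc_content function can be documented with a sequence of exactly 100 bases, half of them G or C, so the expected result can be checked by hand. A >>> line holding only a comment explains the numbers without producing output. Python 3 is assumed here, where / is true division.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence.

    >>> # 100 bases in total, 50 of which are G or C
    >>> seq = 'GC' * 25 + 'AT' * 25
    >>> len(seq)
    100
    >>> gc_content(seq)
    0.5
    """
    if not sequence:
        raise ValueError('empty sequence')
    sequence = sequence.upper()
    return (sequence.count('G') + sequence.count('C')) / len(sequence)


if __name__ == '__main__':
    import doctest
    doctest.testmod()
```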
Doctest – pros and cons
Pros:
- docs always up to date
- usage examples
- quick tests while you are coding
Cons:
- functions that read files (StringIO? NamedTemporaryFile?)
- you still need to write unit tests
- long examples are awkward: PEP 8 limits lines to 79 characters
- random generators / statistics / rounding
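Two common workarounds for the last con can be sketched as follows (hypothetical functions, Python 3 assumed): for floating-point results, round them or let doctest match only the leading digits with the +ELLIPSIS directive; for random output, seed the generator and check properties of the result instead of exact values.

```python
import random


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers.

    Round floating-point results in the example:

    >>> round(mean([0.1, 0.2, 0.3]), 3)
    0.2

    or match only the leading digits with the ELLIPSIS option:

    >>> mean([1, 2, 2])  # doctest: +ELLIPSIS
    1.666...
    """
    return sum(values) / len(values)


def random_sample(n, seed=None):
    """Return n pseudo-random floats in the interval [0, 1).

    Check properties of the output rather than its exact values:

    >>> sample = random_sample(100, seed=1)
    >>> len(sample)
    100
    >>> all(0 <= x < 1 for x in sample)
    True
    """
    rng = random.Random(seed)  # private generator: the seed makes runs repeatable
    return [rng.random() for _ in range(n)]


if __name__ == '__main__':
    import doctest
    doctest.testmod()
```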
Bioinformatics – a different approach
Programming experiments is different from programming software:
- testing has other dimensions (biological meaning, reproducibility)
- usually you write many scripts, each carrying out a small task, and glue them together with a pipeline: a wrapper script, a makefile, an automated build tool, an XML-described workflow, or something else
I am a makefile guy.
What is a makefile?
GNU make is a build tool, commonly used to compile C/C++ programs. It can also be used to save shell commands, with their options, and re-execute them at will. Example:

    $ make all
    python retrieve_data.py --option1 --option2
    perl convert_format.pl --input inputfile --option3
    perl convert_format.pl --inputfile inputfile2
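The same "save commands and re-run them" idea can be sketched as a plain Python wrapper script, another of the glue options listed earlier. The script names and options are the placeholders from the example above, and run_pipeline is a hypothetical helper; with dry_run=True it only prints the commands, similar in spirit to make -n.

```python
import shlex
import subprocess

# Placeholder commands from the slide; the scripts themselves are not provided.
PIPELINE = [
    ['python', 'retrieve_data.py', '--option1', '--option2'],
    ['perl', 'convert_format.pl', '--input', 'inputfile', '--option3'],
    ['perl', 'convert_format.pl', '--inputfile', 'inputfile2'],
]


def run_pipeline(steps, dry_run=False):
    """Echo each saved command and run it, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the list of command lines that were echoed.
    """
    echoed = []
    for argv in steps:
        line = shlex.join(argv)  # shell-quoted form, for display only (Python 3.8+)
        print(line)
        echoed.append(line)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # much as make stops when a command fails.
            subprocess.run(argv, check=True)
    return echoed


if __name__ == '__main__':
    run_pipeline(PIPELINE, dry_run=True)
```

Unlike make, this script always re-runs every step; it has no notion of outputs that are already up to date.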
Simplest Makefile example

    $ cat Makefile
    help:
    	echo 'execute "make all" to carry out the whole analysis'

    get_data:
    	python retrieve_data.py --database ensembl --specie Human --output sequences.fasta

    calculate_results:
    	perl calculate_results.pl --option1 --option2 --input sequences.fasta --output results.txt

    all: get_data calculate_results

Recipe lines (the commands under each target) must begin with a tab character. Running make without arguments executes the first target, help.
Makefiles – pros
- Conditional execution: if a command doesn't need to run, it is skipped (make checks whether the expected output file already exists and is up to date).
- Chaining commands: you can define the order in which commands must be executed (download the sequences first, then read them).
- Support for clusters.
- The syntax is ugly, but standard.
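The first two pros boil down to a simple rule. The sketch below is a simplified, hypothetical model of it in Python, not how GNU make is implemented: a target is rebuilt if it is missing or older than any of its prerequisites, and prerequisites are built first. The rules dictionary and the file names in the usage part are assumptions for illustration.

```python
import os
import tempfile


def is_up_to_date(target, prerequisites):
    """True if target exists and is at least as new as every prerequisite."""
    if not os.path.exists(target):
        return False
    target_time = os.path.getmtime(target)
    return all(os.path.getmtime(p) <= target_time for p in prerequisites)


def build(target, rules, rebuilt=None):
    """Build target after its prerequisites, skipping up-to-date files.

    rules maps a file name to (prerequisites, action), where action is a
    zero-argument callable that creates the file. Returns the list of
    targets that were actually rebuilt, in execution order.
    """
    if rebuilt is None:
        rebuilt = []
    prerequisites, action = rules.get(target, ([], None))
    for prereq in prerequisites:
        build(prereq, rules, rebuilt)  # chaining: prerequisites come first
    if action is None:
        if not os.path.exists(target):
            raise FileNotFoundError('no rule to make target %r' % target)
    elif not is_up_to_date(target, prerequisites):
        action()  # conditional execution: only when needed
        rebuilt.append(target)
    return rebuilt


if __name__ == '__main__':
    workdir = tempfile.mkdtemp()
    fasta = os.path.join(workdir, 'sequences.fasta')
    results = os.path.join(workdir, 'results.txt')

    def get_data():
        with open(fasta, 'w') as handle:
            handle.write('>seq1\nACGT\n')

    def calculate_results():
        with open(fasta) as src, open(results, 'w') as dst:
            dst.write('%d lines\n' % len(src.readlines()))

    rules = {fasta: ([], get_data), results: ([fasta], calculate_results)}
    print(build(results, rules))  # first run: both files are created
    print(build(results, rules))  # second run: nothing to do
```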
Make – cons
- GNU make has a very ugly syntax. Really, I hate its syntax.
- I am looking for substitutes written in Python: SCons, Paver, waf (Google Summer of Code project). I still haven't started using them.
- Implement something in Biopython?
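A more complicated Makefile
- Pattern rules and automatic variables: % (matches a file-name stem), $@ (the target), $< (the first prerequisite)
- Modifiers like - (ignore errors) and @ (don't echo the command)
- Functions like addprefix, addsuffix
?? Triple parentheses ??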
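To make the notation less cryptic, here is a rough Python emulation of what some of these constructs expand to. The function names are hypothetical and this is not how make parses Makefiles; it only mirrors the documented meaning: in a pattern rule such as "%.txt: %.fasta", % matches a non-empty stem, $@ is the target and $< the first prerequisite; $(addprefix ...) and $(addsuffix ...) apply to each whitespace-separated word; a recipe line prefixed with @ is not echoed, and one prefixed with - has its errors ignored.

```python
def match_pattern(pattern, name):
    """Return the stem matched by '%' in pattern, or None if name doesn't match."""
    prefix, _, suffix = pattern.partition('%')
    if (name.startswith(prefix) and name.endswith(suffix)
            and len(name) > len(prefix) + len(suffix)):  # stem must be non-empty
        return name[len(prefix):len(name) - len(suffix)]
    return None


def automatic_variables(target_pattern, prereq_patterns, target):
    """Values of $@ and $< for target under a pattern rule, or None."""
    stem = match_pattern(target_pattern, target)
    if stem is None:
        return None
    prereqs = [p.replace('%', stem, 1) for p in prereq_patterns]
    return {'$@': target, '$<': prereqs[0] if prereqs else ''}


def addprefix(prefix, names):
    """Like $(addprefix prefix,names): prepend prefix to each word."""
    return ' '.join(prefix + word for word in names.split())


def addsuffix(suffix, names):
    """Like $(addsuffix suffix,names): append suffix to each word."""
    return ' '.join(word + suffix for word in names.split())


def parse_recipe_line(line):
    """Strip leading '@' (don't echo) and '-' (ignore errors) modifiers.

    Returns (command, echo, ignore_errors).
    """
    echo, ignore_errors = True, False
    while line[:1] in ('@', '-'):
        if line[0] == '@':
            echo = False
        else:
            ignore_errors = True
        line = line[1:]
    return line, echo, ignore_errors


if __name__ == '__main__':
    print(automatic_variables('%.txt', ['%.fasta'], 'results.txt'))
    # {'$@': 'results.txt', '$<': 'results.fasta'}
    print(addprefix('data/', 'human chimp'))  # data/human data/chimp
    print(parse_recipe_line('@-rm tmp.txt'))  # ('rm tmp.txt', False, True)
```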
Thanks for your attention! Did you like the talk?
Biopython – use cases
Single Nucleotide Polymorphisms (SNPs) are positions in the genome where a single base varies between individuals. We are working with data on 650,000 SNPs in about 1000 individuals. We need to organize the data into objects (SNPs, genotypes, individuals, populations), use a database for support, and calculate statistics on them.
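A minimal sketch of the object layer, assuming genotypes are stored as two-letter strings such as 'AG' (the class and function names are hypothetical, and the database layer is left out): each individual belongs to a population and carries its genotypes keyed by SNP identifier, and a per-population allele frequency is one of the simplest statistics to compute on top of that.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SNP:
    snp_id: str        # e.g. an rs identifier
    chromosome: str
    position: int


@dataclass
class Individual:
    name: str
    population: str
    # SNP id -> genotype as a two-letter string, e.g. 'AG'; missing SNPs are absent.
    genotypes: dict = field(default_factory=dict)


def allele_frequencies(individuals, snp_id, population=None):
    """Return {allele: frequency} for one SNP, optionally within one population."""
    counts = Counter()
    for person in individuals:
        if population is not None and person.population != population:
            continue
        genotype = person.genotypes.get(snp_id)
        if genotype is None:
            continue  # missing data: skip this individual
        counts.update(genotype)  # counts each of the two alleles
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


if __name__ == '__main__':
    snp = SNP('rs0001', '1', 12345)  # made-up identifier and coordinates
    people = [
        Individual('ind1', 'POP1', {snp.snp_id: 'AG'}),
        Individual('ind2', 'POP1', {snp.snp_id: 'AA'}),
        Individual('ind3', 'POP2', {snp.snp_id: 'GG'}),
    ]
    print(allele_frequencies(people, snp.snp_id, 'POP1'))  # {'A': 0.75, 'G': 0.25}
```

At the scale mentioned above (650,000 SNPs), keeping every genotype in Python dictionaries may use a lot of memory, which is one reason for the supporting database.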
Doctest – a closer look

    #!/usr/bin/env python
    def say_hello(name):
        '''print hello <name> to the screen

        example:
        >>> say_hello('Albert Einstein')
        hello Albert Einstein!!!
        '''
        print('hello ' + name + '!!!')

    if __name__ == '__main__':
        import doctest
        doctest.testmod()

Parts of the listing: the def line starts a new function definition; the first docstring line is the normal documentation; the >>> line is an example of function usage, and the line after it is the expected output; the print line is the body of the function; the final block is the call to the doctest module.
