File handling
     Karin Lagesen

karin.lagesen@bio.uio.no
Homework
●   ATCurve.py
      ●   take an input string from the user
      ●   check if the sequence only contains DNA – if
          not, prompt for new sequence.
      ●   calculate a running average of AT content
          along the sequence. Window size should be
          3, and the step size should be 1. Print one
          value per line.
●   Note: you need to include several runtime
    examples to show that all parts of the code
    work.
ATCurve.py - thinking
●   Take input from user:
     ●   raw_input
●   Check for characters other than A, T, C and G
     ●   use sets – very easy
●   Calculate AT – window = 3, step = 1
     ●   iterate over string in slices of three
ATCurve.py
# The variable valid records whether the string is OK or not.
valid = False
while not valid:
    # Prompt the user for input using raw_input(), store it in a string
    # and convert all characters to uppercase
    test_string = raw_input("Enter string: ")
    upper_string = test_string.upper()

    # Figure out if anything other than A, T, G or C is present
    dnaset = set(list("ATGC"))
    upper_string_set = set(list(upper_string))

    if len(upper_string_set - dnaset) > 0:
        print "Non-DNA present in your string, try again"
    else:
        valid = True

if valid:
    # The last window starts at len(upper_string)-3, hence the + 1
    for i in range(0, len(upper_string)-3 + 1, 1):
        at_sum = 0.0
        # count() excludes the end index, so i+3 covers three characters
        at_sum += upper_string.count("A",i,i+3)
        at_sum += upper_string.count("T",i,i+3)
        print at_sum/3
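To check the window logic against values worked out by hand, it can help to wrap it in a function. The following is a minimal sketch in Python 3 syntax (the slides themselves use Python 2); the function names is_dna and at_curve are made up for this illustration and are not part of the course material.

```python
def is_dna(sequence):
    """Return True if the sequence contains only A, T, G and C."""
    return len(set(sequence.upper()) - set("ATGC")) == 0


def at_curve(sequence, window=3):
    """Return the AT fraction of every window, moving one step at a time."""
    sequence = sequence.upper()
    values = []
    # The last window starts at len(sequence) - window, hence the + 1
    for i in range(0, len(sequence) - window + 1):
        chunk = sequence[i:i + window]
        values.append((chunk.count("A") + chunk.count("T")) / float(window))
    return values


# "ATGCA" has three windows: ATG (2 of 3), TGC (1 of 3) and GCA (1 of 3)
if is_dna("ATGCA"):
    for value in at_curve("ATGCA"):
        print(value)
```

A sequence shorter than the window gives an empty list, which is one of the runtime cases worth showing.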
Homework
●   CodonFrequency.py
     ●   take an input string from the user
     ●   if the sequence only contains DNA
           –   find a start codon in your string
           –   if startcodon is present
                  ●   count the occurrences of each three-mer from start
                      codon and onwards
                  ●   print the results
CodonFrequency.py - thinking
●   First part – same as earlier
●   Find start codon: locate index of AUG
      ●   Note, can simplify and find ATG
●   If start codon is found:
      ●   create dictionary
      ●   for slice of three in input[StartCodon:]:
            –   get codon
            –   if codon is in dict:
                    ●   add to count
            –   if not:
                    ●   create key-value pair in dict
CodonFrequency.py
input = raw_input("Type a piece of DNA here: ")

# Anything left after removing A, T, G and C is not DNA
if len(set(input) - set(list("ATGC"))) > 0:
    print "Not a valid DNA sequence"
else:
    atg = input.find("ATG")
    if atg == -1:
        print "Start codon not found"
    else:
        codondict = {}
        # The last full codon starts at len(input)-3, so stop before len(input)-2
        for i in xrange(atg,len(input)-2,3):
            codon = input[i:i+3]
            if codon not in codondict:
                codondict[codon] = 1
            else:
                codondict[codon] +=1

        for codon in codondict:
            print codon, codondict[codon]
CodonFrequency.py w/ stop codon
input = raw_input("Type a piece of DNA here: ")

if len(set(input) - set(list("ATGC"))) > 0:
    print "Not a valid DNA sequence"
else:
    atg = input.find("ATG")
    if atg == -1:
        print "Start codon not found"
    else:
        codondict = {}
        # The last full codon starts at len(input)-3, so stop before len(input)-2
        for i in xrange(atg,len(input)-2,3):
            codon = input[i:i+3]
            # The input is DNA, so the stop codons are spelled with T, not U
            if codon in ['TAG', 'TAA', 'TGA']:
                break
            elif codon not in codondict:
                codondict[codon] = 1
            else:
                codondict[codon] +=1

        for codon in codondict:
            print codon, codondict[codon]
Results

[karinlag@freebee]/projects/temporary/cees-python-course/Karin% python CodonFrequency2.py
Type a piece of DNA here: ATGATTATTTAAATG
ATG 1
ATT 2
TAA 1
[karinlag@freebee]/projects/temporary/cees-python-course/Karin% python CodonFrequency2.py
Type a piece of DNA here: ATGATTATTTAAATGT
ATG 2
ATT 2
TAA 1
[karinlag@freebee]/projects/temporary/cees-python-course/Karin%
Working with files
●   Reading – get info into your program
●   Parsing – processing file contents
●   Writing – get info out of your program
Reading and writing
●   Three-step process
     ●   Open file
           –   create file handle – reference to file
     ●   Read or write to file
     ●   Close file
           –   will be closed automatically when the program ends,
               but it is bad form not to close it
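The three steps can be sketched as follows. This is a minimal illustration in Python 3 syntax (the slides use Python 2), and the file name example.txt in a temporary directory is an assumption made only for this sketch. The with statement at the end is not covered in the slides; it is shown as one common way to make sure the file gets closed.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Step 1: open the file (this creates the file handle)
fh = open(path, "w")
# Step 2: write to the file through the handle
fh.write("ATGC\n")
# Step 3: close the file so the contents are flushed to disk
fh.close()
print(fh.closed)

# A with block closes the file automatically, even if an error occurs
with open(path, "r") as fh:
    contents = fh.read()
print(fh.closed)
```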
Opening files
●   Opening modes:
     ●   "r" - read file
     ●   "w" - write file (an existing file is overwritten)
     ●   "a" - append to end of file
●   fh = open("filename", "mode")
●   fh = file handle, a reference to a file, NOT the
    file itself
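The difference between the modes is easiest to see by writing to the same file several times. A small sketch in Python 3 syntax, assuming a made-up file called modes.txt in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file

fh = open(path, "w")      # "w" creates the file, or empties an existing one
fh.write("first\n")
fh.close()

fh = open(path, "w")      # opening with "w" again discards "first"
fh.write("second\n")
fh.close()

fh = open(path, "a")      # "a" keeps the contents and adds to the end
fh.write("third\n")
fh.close()

fh = open(path, "r")      # "r" only allows reading
contents = fh.read()
fh.close()
print(contents)
```

Only "second" and "third" end up in the file, because the second open with "w" threw away what was there.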
Reading a file
●   Three ways to read
     ●   read([n]) - n = bytes to read, default is all
     ●   readline() - read one line, incl. newline
     ●   readlines() - read file into a list, one element
         per line, including newline
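The three methods all read from the current position in the file, which moves forward as you read. The sketch below uses Python 3 syntax (where read(n) on a text file counts characters rather than bytes) and a made-up three-line file; seek(0) is not mentioned in the slides and is included here only to show how to go back to the start without reopening.

```python
import os
import tempfile

# Small made-up file in a temporary directory, just for this sketch
path = os.path.join(tempfile.mkdtemp(), "example.txt")
fh = open(path, "w")
fh.write("line one\nline two\nline three\n")
fh.close()

fh = open(path, "r")
first_four = fh.read(4)        # the first 4 characters
rest_of_line = fh.readline()   # continues where read() stopped
remaining = fh.readlines()     # list with the lines that are left
at_end = fh.read()             # nothing is left to read
fh.seek(0)                     # move back to the start of the file
everything = fh.read()         # the whole file as one string
fh.close()

print(repr(first_four))
print(repr(rest_of_line))
print(remaining)
print(repr(at_end))
print(repr(everything))
```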
Reading example
●   Log on to freebee, and go to your area
●   do cp ../Karin/fastafile.fsa .
●   open python
       >>> fh = open("fastafile.fsa", "r")
       >>> fh
       <open file 'fastafile.fsa', mode 'r' at 0x...>

●   Q: what does the response mean?
Read example
●   Use all three methods to read the file. Print
    the results.
     ●   read
     ●   readlines
     ●   readline
●   Q: what happens after you have read the
    file?
●   Q: What is the difference between the
    three?
Read example
>>> fh = open("fastafile.fsa", "r")
>>> withread = fh.read()
>>> withread
'>This is the description line\nATGCGCTTAGGATCGATAGCGATTTAGA\nTTAGCGGA\n'
>>> withreadlines = fh.readlines()
>>> withreadlines
[]
>>> fh = open("fastafile.fsa", "r")
>>> withreadlines = fh.readlines()
>>> withreadlines
['>This is the description line\n', 'ATGCGCTTAGGATCGATAGCGATTTAGA\n', 'TTAGCGGA\n']
>>> fh = open("fastafile.fsa", "r")
>>> withreadline = fh.readline()
>>> withreadline
'>This is the description line\n'
>>>
Parsing
●   Getting information out of a file
●   Commonly used string methods
      ●   split([character]) – default is whitespace
      ●   replace("old", "new") – replaces every occurrence of old with new
      ●   "separator".join(list)
            –   joins all elements in the list with the separator
                string between them
            –   common construction: ''.join(list)
      ●   slicing
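The string methods listed above can be tried on a single line of text. This is a small sketch in Python 3 syntax; the tab-separated line is made up for the illustration, and rstrip() is an extra string method not listed in the slides, used here to remove the trailing newline.

```python
line = "gene1\t120\tATG-CGC-TTA\n"       # made-up tab-separated line

fields = line.rstrip("\n").split("\t")   # split on tabs
print(fields)

words = "one  two   three".split()       # no argument: split on any whitespace
print(words)

sequence = fields[2].replace("-", "")    # replace every "-" with nothing
print(sequence)

print("".join(["ATG", "CGC", "TTA"]))    # join with an empty separator
print(", ".join(["ATG", "CGC", "TTA"]))  # join with ", " as separator
print(sequence[0:3])                     # slicing: the first three characters
```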
Type conversions
●   Everything that comes on the command
    line or from a file is a string
●   Conversions:
     ●   int(X)
           –   a string with decimals, such as "3.7", cannot be converted
           –   floats are truncated toward zero, so int(3.7) is 3
     ●   float(X)
     ●   str(X)
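A few conversions tried out in practice, as a sketch in Python 3 syntax. The values are arbitrary and the variable name window is made up for the illustration.

```python
print(int("42") + 1)        # string to integer
print(float("3.7") * 2)     # string to float
print(str(42) + " bases")   # number to string

print(int(3.7))             # the decimal part is dropped
print(int(-3.7))            # truncation goes toward zero, so this is -3

try:
    int("3.7")              # a string with decimals cannot be converted directly
except ValueError as error:
    print("int failed:", error)

window = int(float("3.7"))  # going through float() first works
print(window)
```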
Parsing example
●   Continue using fastafile.fsa
●   Print only the description line to screen
●   Print the whole DNA string
    >>> fh = open("fastafile.fsa", "r")
    >>> firstline = fh.readline()
    >>> print firstline[1:-1]
    This is the description line
    >>> sequence = ''
    >>> for line in fh:
    ...     sequence += line.replace("\n", "")
    ...
    >>> print sequence
    ATGCGCTTAGGATCGATAGCGATTTAGATTAGCGGA
    >>>
Accepting input from command line
●   Need to be able to specify file name on
    command line
●   Command line parameters are stored in a list
    called sys.argv – the program name is at index 0
●   Usage:
      ●   python pythonscript.py arg1 arg2 arg3....
●   In script:
      ●   at the top of the file, write import sys
      ●   arg1 = sys.argv[1]
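Because sys.argv is an ordinary list of strings, the code that unpacks it can be put in a function and tried with a hand-written list. The sketch below uses Python 3 syntax; the function name parse_arguments and the usage message are assumptions made for this illustration.

```python
import sys


def parse_arguments(argv):
    """Return (filename, windowsize) from a list shaped like sys.argv."""
    # argv[0] is the program name, so two arguments means a length of 3
    if len(argv) != 3:
        raise SystemExit("Usage: python " + argv[0] + " filename windowsize")
    filename = argv[1]
    windowsize = int(argv[2])   # arguments always arrive as strings
    return filename, windowsize


# In a script this would be called as parse_arguments(sys.argv).
# Here a hand-written list stands in for the command line:
filename, windowsize = parse_arguments(["pythonscript.py", "fastafile.fsa", "3"])
print(filename, windowsize)
```

Checking the length first gives the user a usage message instead of an IndexError when an argument is missing.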
Batch example
●   Read fastafile.fsa with all three methods
●   Per method, print method, name and
    sequence
●   Remember to close the file at the end!
Batch example
import sys
filename = sys.argv[1]

# using readline()
fh = open(filename, "r")
firstline = fh.readline()
name = firstline[1:-1]
sequence = ''
for line in fh:
    sequence += line.replace("\n", "")
fh.close()
print "Readline", name, sequence

# using readlines()
fh = open(filename, "r")
inputlines = fh.readlines()
fh.close()
name = inputlines[0][1:-1]
sequence = ''
for line in inputlines[1:]:
    sequence += line.replace("\n", "")
print "Readlines", name, sequence

# using read()
fh = open(filename, "r")
inputlines = fh.read()
fh.close()
# split("\n") already removes the newline, so only the ">" is sliced off
name = inputlines.split("\n")[0][1:]
sequence = "".join(inputlines.split("\n")[1:])
print "Read", name, sequence
Classroom exercise
●   Modify ATCurve.py script so that it accepts
    the following input on the command line:
      ●   fasta filename
      ●   window size
●   Let the user input an alternate filename if the
    sequence contains anything other than A, T, G or C
●   Print results to screen
ATCurve2.py
import sys
# Define filename and window size
filename = sys.argv[1]
windowsize = int(sys.argv[2])

# The variable valid records whether the sequence is OK or not.
valid = False
while not valid:
    fh = open(filename, "r")
    inputlines = fh.readlines()
    fh.close()
    name = inputlines[0][1:-1]
    sequence = ''
    for line in inputlines[1:]:
        sequence += line.replace("\n", "")
    upper_string = sequence.upper()

    # Figure out if anything other than A, T, G or C is present
    dnaset = set(list("ATGC"))
    upper_string_set = set(list(upper_string))

    if len(upper_string_set - dnaset) > 0:
        print "Non-DNA present in your file, try again"
        filename = raw_input("Type in filename: ")
    else:
        valid = True

if valid:
    for i in range(0, len(upper_string)-windowsize + 1, 1):
        at_sum = 0.0
        at_sum += upper_string.count("A",i,i+windowsize)
        at_sum += upper_string.count("T",i,i+windowsize)
        print i + 1, at_sum/windowsize
Writing to files
●   Similar procedure as for reading
     ●   Open file, mode is "w" or "a"
     ●   fh.write(string)
           –   Note: takes one single string
           –   No newlines are added
     ●   fh.close()
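Since write() takes a single string and adds nothing to it, numbers have to be converted with str() and each line needs its own newline. A minimal sketch in Python 3 syntax, with a made-up file name and made-up values standing in for real results:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")  # hypothetical file

values = [0.667, 0.333, 1.0]    # made-up numbers standing in for results

fh = open(path, "w")
for position, value in enumerate(values, 1):
    # write() takes one string: convert the numbers and add the newline yourself
    fh.write(str(position) + " " + str(value) + "\n")
fh.close()

fh = open(path, "r")
written = fh.read()
fh.close()
print(written)
```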
ATCurve3.py
●   Modify the previous script so that it takes the
    following on the command line
     ●   fasta filename for input file
     ●   window size
     ●   output file
●   Output should be in the format
     ●   number, AT content
     ●   number is the 1-based position of the first
         nucleotide in the window
ATCurve3.py

import sys
# Define filenames and window size
filename = sys.argv[1]
windowsize = int(sys.argv[2])
outputfile = sys.argv[3]

# (reading and validating the input file is unchanged from ATCurve2.py)

if valid:
    fh = open(outputfile, "w")
    for i in range(0, len(upper_string)-windowsize + 1, 1):
        at_sum = 0.0
        at_sum += upper_string.count("A",i,i+windowsize)
        at_sum += upper_string.count("T",i,i+windowsize)
        fh.write(str(i + 1) + " " + str(at_sum/windowsize) + "\n")
    fh.close()
Homework: TranslateProtein.py
●   Input files are in
    /projects/temporary/cees-python-course/Karin
      ●   translationtable.txt - tab separated
      ●   dna31.fsa
●   The script should:
      ●   Open the translationtable.txt file and read it into a
          dictionary
      ●   Open the dna31.fsa file and read the contents
      ●   Translate the DNA into protein using the dictionary
      ●   Print the translation in fasta format to the file
          TranslateProtein.fsa. Each protein line should be 60
          characters long.
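The 60-character requirement comes down to slicing the sequence in steps of 60. The sketch below, in Python 3 syntax, shows only that one step; the function name wrap_sequence and the test protein are made up for this illustration, and the translation itself is left as the homework.

```python
def wrap_sequence(sequence, width=60):
    """Split a sequence into lines of at most width characters."""
    lines = []
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return lines


protein = "M" * 130   # made-up protein of 130 residues
for line in wrap_sequence(protein):
    print(len(line))
```

The last line is simply shorter when the length is not a multiple of 60.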
