TAPAS (Testing Aligner Performance for Ancient Samples) is a software package written in Python and R that lets you test how well different parameter settings of NGS short-read mappers perform when mapping reads to a reference. The reads can be artificially damaged to simulate the evolutionary distance and chemical alteration of ancient DNA samples.
Installation
Download the package from the project's GitHub page.
See the section Dependencies of the scripts for details on installing TAPAS.
Workflow
TAPAS can generate artificial NGS reads from a given reference genome, introduce mutations that follow a user-defined pattern, and evaluate how successfully a user-supplied short-read mapper maps the reads.
See the section Input files for the files that must be supplied before the analysis.
The workflow is outlined below, with links to the manual chapters that describe each step in more detail.
- Prepare the reference genome. See the section Genome Preparation.
- Generate artificial reads from a sample genome. See the section Read generation.
- Introduce mutations into the reads to mimic the chemical damage of ancient DNA and the evolutionary distance between the reads and the reference. See the section Read mutation for more information.
- Map the mutated artificial reads back to the reference from which they originated. This simulates mapping ancient DNA reads to a recent reference genome, and TAPAS lets you compare how different mapping parameters affect mapping success. The sections Mapping and How to write a mapping script describe how the mapping commands and parameters are passed to TAPAS.
- The read mapping yields one SAM file for every mapping parameter set tested. The section SAM parsing describes how the mapping positions of the artificial reads are compared with their known true positions in the reference to compute measures of mapping accuracy such as sensitivity and specificity.
- Finally, analyse the accuracy values to determine how strongly each mapping parameter choice influences mapping success. See the section Parameter influence for details.
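The mutation and evaluation steps above can be illustrated with a minimal Python sketch. It does not use TAPAS's own scripts or input formats. The substitution table `PATTERN`, the rate `DIVERGENCE` and the functions `mutate_read`, `is_correctly_mapped` and `fraction_correct` are hypothetical and only show the idea. The sketch assumes that a read counts as correctly mapped if it lands on its true chromosome within a position tolerance. The accuracy measures TAPAS actually reports are defined in the section SAM parsing.

```python
import random

# Hypothetical damage pattern (not TAPAS's input format):
# probability that a given reference base is replaced by another base.
PATTERN = {
    ("C", "T"): 0.05,  # deamination-like C->T damage
    ("G", "A"): 0.05,  # complementary G->A damage
}
# Hypothetical uniform substitution rate mimicking evolutionary distance.
DIVERGENCE = 0.02
BASES = "ACGT"


def mutate_read(seq, rng):
    """Return a copy of seq with damage-like and random substitutions."""
    out = []
    for base in seq:
        new = base
        for (src, dst), p in PATTERN.items():
            if base == src and rng.random() < p:
                new = dst
                break
        else:
            # No damage substitution applied: maybe apply uniform divergence.
            if rng.random() < DIVERGENCE:
                new = rng.choice([b for b in BASES if b != base])
        out.append(new)
    return "".join(out)


def is_correctly_mapped(true_chrom, true_pos, mapped_chrom, mapped_pos, tolerance=0):
    """A read is correct if it maps to its true chromosome within the tolerance."""
    return true_chrom == mapped_chrom and abs(true_pos - mapped_pos) <= tolerance


def fraction_correct(records, tolerance=0):
    """Fraction of reads mapped to their true position.

    records: (true_chrom, true_pos, mapped_chrom, mapped_pos) tuples;
    unmapped reads have mapped_chrom set to None.
    """
    if not records:
        return 0.0
    hits = sum(
        1
        for t_chrom, t_pos, m_chrom, m_pos in records
        if m_chrom is not None
        and is_correctly_mapped(t_chrom, t_pos, m_chrom, m_pos, tolerance)
    )
    return hits / len(records)


if __name__ == "__main__":
    rng = random.Random(42)
    print(mutate_read("ACGTCCGGTACGGCTA", rng))
    records = [
        ("chr1", 100, "chr1", 100),  # exact hit
        ("chr1", 200, "chr1", 205),  # off by 5 bases
        ("chr2", 50, None, None),    # unmapped
    ]
    print(fraction_correct(records))               # 0.3333333333333333
    print(fraction_correct(records, tolerance=5))  # 0.6666666666666666
```

The position tolerance is a design choice in this sketch. Without it, reads that a mapper soft-clips or shifts slightly at damaged ends would all count as wrong.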
The appendices provide additional information:
- All commands from the tutorial, collected into a single script: Appendix A1
- An overview of all scripts provided by TAPAS: Appendix A2
- How to assemble artificial FASTQ files from their parts ((1) nucleotide strings, (2) quality strings and (3) read names), which allows finer-grained read generation and the use of external tools: Appendix A3: Advanced Read generation
- Appendix B: Glossary