FASTA/FASTQ Scripts

Scripts involving FASTA and FASTQ files.

dump_fasta_stats.py

This script checks a .fasta file for duplicate sequence IDs.

Input:

  • -i : .fasta file to check

Output:

  • If a duplicate ID is found, the output is “Id repeated: bad fasta file”.

  • If there are no duplicate IDs, the output is the number of sequences.

Usage:

  • python dump_fasta_stats.py --version

  • Shows the program’s version.

  • python dump_fasta_stats.py -h

  • Shows help information.

  • python dump_fasta_stats.py -i <filename.fasta>

  • Checks the .fasta file for duplicate IDs and, if there are none, prints the number of sequences.

dump_fasta_stats.count_sequences(file_to_check)

Counts the number of sequences and informs the user if an ID repeats.
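The duplicate check can be pictured as a single pass that records every header ID in a set. The sketch below is a minimal illustration, not the script’s actual code: the function name `count_fasta_records` is hypothetical, it accepts any iterable of lines (such as an open file handle) rather than a file path, and it assumes the ID is the first whitespace-delimited token after `>`.

```python
def count_fasta_records(lines):
    """Count FASTA records, or return None if a record ID repeats.

    Hypothetical sketch: the ID is assumed to be the first
    whitespace-delimited token after '>' on each header line.
    """
    seen = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            if record_id in seen:
                # Same message the script is documented to print.
                print("Id repeated: bad fasta file")
                return None
            seen.add(record_id)
    # Every ID was unique, so the set size equals the record count.
    return len(seen)


if __name__ == "__main__":
    print(count_fasta_records([">seq1 first", "ACGT", ">seq2", "GGCC"]))  # prints 2
```

Using a set keeps each membership test O(1) on average, so the whole check stays linear in the number of lines.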

dump_fasta_stats.main()

Prints the number of sequences.

fasta_parser.py

This script prints (to stdout) the sequence of the record with the specified ID.

Input:

  • -i : .fasta file to search for record

  • -v : ID to search for

Output:

  • Sequence of record with specified ID

Usage:

  • python fasta_parser.py --version

  • Shows the program’s version.

  • python fasta_parser.py -h

  • Shows help information.

  • python fasta_parser.py -i <filename.fasta> -v <ID>

Runs the program with the specified file and ID.

fasta_parser.main()

Finds the record with the specified ID and prints its sequence.
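A lookup like this can stop reading as soon as the target record ends. The following is a minimal sketch under stated assumptions, not the script’s implementation: `find_sequence` is a hypothetical name, the record ID is assumed to be the first token after `>`, and multi-line sequences are joined into one string.

```python
def find_sequence(lines, target_id):
    """Return the sequence of the first record whose ID equals target_id, or None."""
    current_id = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if current_id == target_id:
                break  # the target record is complete; stop reading
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id == target_id:
            chunks.append(line)  # collect (possibly wrapped) sequence lines
    if current_id == target_id:
        return "".join(chunks)
    return None


if __name__ == "__main__":
    records = [">x description", "AC", "GT", ">y", "TT"]
    print(find_sequence(records, "x"))  # prints ACGT
```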

fastq_parser.py

This script prints (to stdout) the read_id, read_seq and read_qual from the input FASTQ file.

Input:

  • -i : fastq file to print values from

Output:

  • read_id, read_seq and read_qual

Usage:

  • python fastq_parser.py --version

Shows the program’s version.

  • python fastq_parser.py -h

Shows help information.

  • python fastq_parser.py -i <filename.fastq>

Runs the program with the specified FASTQ file.

fastq_parser.main()

Prints read_id, read_seq and read_qual.
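Each FASTQ record normally spans four lines: an `@` header, the sequence, a `+` separator and the quality string. The sketch below assumes that unwrapped four-line layout; `iter_fastq` is a hypothetical name, and stripping the leading `@` from read_id is an assumption about what the script prints.

```python
from itertools import islice


def iter_fastq(lines):
    """Yield (read_id, read_seq, read_qual) from unwrapped 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return  # clean end of input
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"Malformed FASTQ record: {record!r}")
        read_id = record[0][1:]  # drop the leading '@' (assumption)
        read_seq, read_qual = record[1], record[3]
        if len(read_seq) != len(read_qual):
            raise ValueError(f"Sequence/quality length mismatch for {read_id}")
        yield read_id, read_seq, read_qual


if __name__ == "__main__":
    demo = ["@r1\n", "ACGT\n", "+\n", "IIII\n"]
    for read_id, read_seq, read_qual in iter_fastq(demo):
        print(read_id, read_seq, read_qual, sep="\t")
```

Reading four lines at a time with `islice` keeps memory use constant regardless of file size.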

parse_big_fasta.py

Parses RVDB-formatted FASTA headers so they can be interpreted by HIVE-hexagon’s tablequery.

Input:

  • -i : input FASTA file to reformat

  • -o : specified output file

Output:

  • Reformatted FASTA file

Usage:

  • python parse_big_fasta.py --version

Shows the program’s version.

  • python parse_big_fasta.py -h

Shows help information.

  • python parse_big_fasta.py -i <filename.fasta> -o <output_file>

Runs the program with the specified FASTA file and output file.

parse_big_fasta.create_arg_parser()

Creates and returns the ArgumentParser object.
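A parser supporting the documented `-i`, `-o`, `-h` and `--version` options could be built with the standard `argparse` module, which adds `-h` automatically. This is a sketch only: the long option names `--input`/`--output` and the version string `0.1` are hypothetical placeholders, not taken from the script.

```python
import argparse


def create_arg_parser():
    """Build a parser with -i, -o and --version options (sketch)."""
    parser = argparse.ArgumentParser(
        description="Reformat RVDB FASTA headers (sketch)."
    )
    # Long option names are hypothetical; the script documents only -i and -o.
    parser.add_argument("-i", "--input", required=True, help="input FASTA file to reformat")
    parser.add_argument("-o", "--output", required=True, help="output file")
    # action="version" prints the version string and exits; "0.1" is a placeholder.
    parser.add_argument("--version", action="version", version="%(prog)s 0.1")
    return parser


if __name__ == "__main__":
    args = create_arg_parser().parse_args(["-i", "in.fasta", "-o", "out.fasta"])
    print(args.input, args.output)
```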

parse_big_fasta.format_header(parsed_args)

Parses the RVDB-formatted FASTA headers and rewrites them in the desired format.
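Header reformatting amounts to splitting the pipe-delimited header into fields and reassembling the ones that are needed. The exact RVDB field layout and the format HIVE-hexagon’s tablequery expects are not given here, so the sketch below assumes a hypothetical `>acc|SOURCE|ACCESSION|DESCRIPTION` layout and an equally hypothetical `>ACCESSION DESCRIPTION` target; `reformat_header` and the sample header are illustrative only.

```python
def reformat_header(header):
    """Rewrite a pipe-delimited header as '>ACCESSION DESCRIPTION'.

    Hypothetical layout: >acc|SOURCE|ACCESSION|DESCRIPTION...
    The real RVDB layout and target format may differ.
    """
    fields = header.lstrip(">").rstrip("\n").split("|")
    if len(fields) < 4:
        return header.rstrip("\n")  # leave unrecognized headers unchanged
    accession = fields[2]
    # Rejoin the remainder in case the description itself contains '|'.
    description = "|".join(fields[3:])
    return f">{accession} {description}"


if __name__ == "__main__":
    # Hypothetical sample header, not real RVDB data.
    print(reformat_header(">acc|GENBANK|AB000001.1|Example virus, complete genome"))
```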

parse_big_fasta.main()

Writes the reformatted .fasta file to the specified output file.
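Because RVDB files can be large, one reasonable design is to stream the input line by line, rewrite only header lines and copy sequence lines unchanged. In this sketch, `rewrite_lines`, `write_reformatted` and the `transform` parameter (any function that maps one header string to another) are hypothetical names, not the script’s API.

```python
def rewrite_lines(lines, transform):
    """Yield lines, passing header lines (starting with '>') through transform."""
    for line in lines:
        if line.startswith(">"):
            yield transform(line.rstrip("\n")) + "\n"
        else:
            yield line  # sequence lines are copied unchanged


def write_reformatted(in_path, out_path, transform):
    """Stream in_path to out_path without loading the whole file into memory."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rewrite_lines(src, transform))
```

Separating the line-level generator from the file handling keeps the rewriting logic testable on in-memory lists.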