Create random sequences from a predefined vocabulary and write them to FASTA files.

Usage

create_dummy_data(
  file_path,
  num_files,
  header = "header",
  seq_length,
  num_seq,
  fasta_name_start = "file",
  write_to_file_path = FALSE,
  prob = NULL,
  vocabulary = c("a", "c", "g", "t")
)

Arguments

file_path

Output directory. Can also be a file name, but only if write_to_file_path = TRUE and num_files = 1.

num_files

Number of files to create.

header

Fasta header name.

seq_length

Length of one sequence. If a vector with more than one element is given, the length of each sequence is sampled randomly from that vector.

num_seq

Number of sequences per file.

fasta_name_start

Prefix of the output file names. Output files are named fasta_name_start followed by _i.fasta, where i is an integer index (for example, file_1.fasta).

write_to_file_path

Whether to write output directly to file_path, i.e. file_path is treated as a file name rather than a directory.

prob

Probability of sampling each character in the vocabulary. If NULL, every character has the same probability.

vocabulary

Set of characters to sample sequences from.

Value

None. Writes data to files.

Examples

path_output <- tempfile()
dir.create(path_output)
create_dummy_data(file_path = path_output,
                  num_files = 3,
                  seq_length = 11,
                  num_seq = 5,
                  vocabulary = c("a", "c", "g", "t"))
list.files(path_output)
#> [1] "file_1.fasta" "file_2.fasta" "file_3.fasta"
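
The effect of prob and of a vector-valued seq_length can be illustrated with a small base R sketch. This is not the function's actual implementation: sample_sequence is a hypothetical helper, and the header text and file name are assumptions chosen only for illustration.

```r
# Hypothetical helper (not part of the package): draws one random sequence.
sample_sequence <- function(seq_length,
                            vocabulary = c("a", "c", "g", "t"),
                            prob = NULL) {
  # If several lengths are given, pick one of them at random.
  len <- if (length(seq_length) > 1) sample(seq_length, 1) else seq_length
  # With prob = NULL, sample() draws every character with equal probability.
  chars <- sample(vocabulary, len, replace = TRUE, prob = prob)
  paste(chars, collapse = "")
}

set.seed(1)
# Five sequences; "c" and "g" are four times as likely as "a" and "t".
seqs <- vapply(
  1:5,
  function(i) sample_sequence(c(8, 11, 14), prob = c(0.1, 0.4, 0.4, 0.1)),
  character(1)
)

# Write FASTA-style records: a ">" header line followed by the sequence.
# The header text and file name here are assumptions for this sketch.
out_file <- file.path(tempdir(), "sketch_1.fasta")
writeLines(as.vector(rbind(">header", seqs)), out_file)
readLines(out_file)
```

The values in prob are matched to vocabulary by position, so both vectors need the same length.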