Create random sequences from a predefined vocabulary and write them to fasta files.
Usage
create_dummy_data(
  file_path,
  num_files,
  header = "header",
  seq_length,
  num_seq,
  fasta_name_start = "file",
  write_to_file_path = FALSE,
  prob = NULL,
  vocabulary = c("a", "c", "g", "t")
)
Arguments
- file_path
- Output directory. Can also be a file name, but only if write_to_file_path = TRUE and num_files = 1.
- num_files
- Number of files to create. 
- header
- Fasta header name. 
- seq_length
- Length of one sequence. If a vector with more than one element is given, the length is randomly sampled from that vector. 
- num_seq
- Number of sequences per file. 
- fasta_name_start
- Beginning string of file name. Output files are named fasta_name_start + _i.fasta where i is an integer index. 
- write_to_file_path
- Whether to write output directly to file_path, i.e. file_path is a file name rather than a directory.
- prob
- Probability of each character in the vocabulary being sampled. If NULL, each character has the same probability.
- vocabulary
- Set of characters to sample sequences from. 
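The interaction of seq_length, prob and vocabulary described above can be illustrated with base R alone. This is a minimal sketch under stated assumptions, not the function's actual implementation: the helper sketch_random_seq is hypothetical, and it is only assumed here that sampling behaves like base sample() with replacement.

```r
# Hypothetical helper, not part of the package: mimics the documented
# sampling behaviour using base R only.
sketch_random_seq <- function(seq_length,
                              vocabulary = c("a", "c", "g", "t"),
                              prob = NULL) {
  # If several lengths are supplied, pick one of them at random.
  # Index via sample.int() because sample(x, 1) on a single number
  # would draw from 1:x instead of returning x.
  if (length(seq_length) > 1) {
    seq_length <- seq_length[sample.int(length(seq_length), 1)]
  }
  # prob = NULL makes sample() draw every character with equal probability.
  chars <- sample(vocabulary, size = seq_length, replace = TRUE, prob = prob)
  paste(chars, collapse = "")
}

set.seed(1)
# Equal probability for every character.
uniform_seq <- sketch_random_seq(seq_length = 11)
# "a" is drawn far more often than the other characters.
skewed_seq <- sketch_random_seq(seq_length = 1000,
                                prob = c(0.7, 0.1, 0.1, 0.1))
# Length is one of 5, 10 or 15.
varied_seq <- sketch_random_seq(seq_length = c(5, 10, 15))

# Documented naming scheme: fasta_name_start + _i.fasta
file_names <- paste0("file", "_", 1:3, ".fasta")
```

Note that prob must have one entry per element of vocabulary for sample() to accept it; the same presumably holds for create_dummy_data, though the arguments above do not state it explicitly.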
Examples
path_output <- tempfile()
dir.create(path_output)
create_dummy_data(file_path = path_output,
                  num_files = 3,
                  seq_length = 11, 
                  num_seq = 5,                   
                  vocabulary = c("a", "c", "g", "t"))
list.files(path_output)                
#> [1] "file_1.fasta" "file_2.fasta" "file_3.fasta"
   
