Create random sequences from a predefined vocabulary and write them to fasta files.
Usage
create_dummy_data(
file_path,
num_files,
header = "header",
seq_length,
num_seq,
fasta_name_start = "file",
write_to_file_path = FALSE,
prob = NULL,
vocabulary = c("a", "c", "g", "t")
)
Arguments
- file_path
Output directory; can also be a file name, but only if write_to_file_path = TRUE and num_files = 1.
- num_files
Number of files to create.
- header
Fasta header name.
- seq_length
Length of one sequence. If a vector with more than one element is given, sequence lengths are randomly sampled from that vector.
- num_seq
Number of sequences per file.
- fasta_name_start
Beginning string of file name. Output files are named fasta_name_start + _i.fasta where i is an integer index.
- write_to_file_path
Whether to write output directly to file_path, i.e. file_path is a file name and not a directory.
- prob
Probability of each character in the vocabulary to be sampled. If NULL, each character has the same probability.
- vocabulary
Set of characters to sample sequences from.
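The interplay of seq_length, prob and vocabulary can be illustrated with a minimal base R sketch. This is not the package's implementation; sample_sequence is a hypothetical helper introduced only for illustration, and it assumes that characters are drawn independently with replacement.

```r
# Hypothetical helper, not part of the package: draws one random sequence.
sample_sequence <- function(seq_length,
                            vocabulary = c("a", "c", "g", "t"),
                            prob = NULL) {
  # If several lengths are given, pick one of them at random.
  # Sampling an index avoids sample()'s special case for a single number.
  len <- seq_length[sample.int(length(seq_length), 1)]
  # prob = NULL lets sample() draw every character with equal probability.
  chars <- sample(vocabulary, size = len, replace = TRUE, prob = prob)
  paste(chars, collapse = "")
}

set.seed(1)
uniform_seq <- sample_sequence(11)
skewed_seq <- sample_sequence(c(5, 10), prob = c(0.7, 0.1, 0.1, 0.1))

# A fasta record is a ">" header line followed by the sequence.
out_file <- tempfile(fileext = ".fasta")
writeLines(c(">header", uniform_seq), out_file)
```

With prob = c(0.7, 0.1, 0.1, 0.1), the first vocabulary entry ("a") is expected to dominate the sampled sequence, while the sequence length is either 5 or 10.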
Examples
path_output <- tempfile()
dir.create(path_output)
create_dummy_data(file_path = path_output,
num_files = 3,
seq_length = 11,
num_seq = 5,
vocabulary = c("a", "c", "g", "t"))
list.files(path_output)
#> [1] "file_1.fasta" "file_2.fasta" "file_3.fasta"
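A further sketch, assuming the package that provides create_dummy_data is already loaded in the session: according to the file_path and write_to_file_path descriptions, a single file can be written directly to a given file name when num_files = 1. The variable name single_file is only illustrative.

```r
# Assumes create_dummy_data() is available in the session.
single_file <- tempfile(fileext = ".fasta")
create_dummy_data(file_path = single_file,
                  num_files = 1,
                  seq_length = 11,
                  num_seq = 5,
                  write_to_file_path = TRUE)
# Check that the file was written to the given name.
file.exists(single_file)
```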