Creates a transformer block consisting of self-attention, dense layers, layer normalization, residual connections and dropout.
Usage
layer_transformer_block_wrapper(
num_heads = 2,
head_size = 4,
dropout_rate = 0,
ff_dim = 64,
vocabulary_size = 4,
load_r6 = FALSE,
embed_dim = 64
)
Arguments
- num_heads
Number of attention heads.
- head_size
Dimension of the attention key.
- dropout_rate
Rate at which connections are randomly dropped out.
- ff_dim
Number of units in the first dense layer after the attention block.
- vocabulary_size
Number of unique characters in the vocabulary.
- load_r6
Whether to return the layer class.
- embed_dim
Dimension of the token embedding. No embedding is applied if set to 0. Should be used when the input is not one-hot encoded (i.e. an integer sequence).
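Examples
A minimal sketch (assumed, not taken from the package examples): it treats the return value of layer_transformer_block_wrapper as a keras layer object that can be applied to a tensor of integer-encoded sequences, and combines it with standard keras functions (layer_input, layer_global_average_pooling_1d, layer_dense, keras_model).
library(keras)

# Create the transformer block layer (assumed to be callable on a tensor).
transformer_block <- layer_transformer_block_wrapper(
  num_heads = 2,
  head_size = 4,
  dropout_rate = 0.1,
  ff_dim = 64,
  vocabulary_size = 4,
  embed_dim = 64  # > 0 because the input below is an integer sequence
)

# Integer-encoded sequences of length 100 over a 4-character vocabulary.
input <- layer_input(shape = c(100))
output <- input %>%
  transformer_block() %>%
  layer_global_average_pooling_1d() %>%
  layer_dense(units = 2, activation = "softmax")

model <- keras_model(inputs = input, outputs = output)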