Creates a transformer block, consisting of self-attention, dense layers, layer normalization, residual connections and dropout.
Usage
layer_transformer_block_wrapper(
  num_heads = 2,
  head_size = 4,
  dropout_rate = 0,
  ff_dim = 64,
  vocabulary_size = 4,
  load_r6 = FALSE,
  embed_dim = 64
)

Arguments
- num_heads: Number of attention heads.
- head_size: Dimension of the attention key.
- dropout_rate: Rate at which connections are randomly dropped out.
- ff_dim: Number of units in the first dense layer after the attention block.
- vocabulary_size: Number of unique characters in the vocabulary.
- load_r6: Whether to return the layer class.
- embed_dim: Dimension of the token embedding. If set to 0, no embedding is applied. Should be used when the input is an integer sequence rather than one-hot encoded.
