
Create a transformer block. Consists of self-attention, dense layers, layer normalization, residual connections and dropout.

Usage

layer_transformer_block_wrapper(
  num_heads = 2,
  head_size = 4,
  dropout_rate = 0,
  ff_dim = 64,
  vocabulary_size = 4,
  load_r6 = FALSE,
  embed_dim = 64
)

Arguments

num_heads

Number of attention heads.

head_size

Dimension of the attention key.

dropout_rate

Rate at which connections are randomly dropped out during training.

ff_dim

Number of units in the first dense layer after the attention block.

vocabulary_size

Number of unique characters in the vocabulary.

load_r6

Whether to return the R6 layer class instead of an instantiated layer.

embed_dim

Dimension of the token embedding. If set to 0, no embedding is applied. Use a non-zero value when the input is an integer sequence rather than one-hot encoded (see the second example below).

Value

A keras layer implementing a transformer block.

Examples

if (FALSE) { # reticulate::py_module_available("tensorflow")
l <- layer_transformer_block_wrapper()  # transformer block layer with default settings
}
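
The following is a minimal sketch of using the block inside a model; it is not taken from the package documentation. It assumes the returned object can be called on a Keras tensor like any other layer, that an integer-encoded input is embedded internally because embed_dim > 0, and that the block preserves the sequence length. maxlen and all argument values are illustrative placeholders.

if (FALSE) { # reticulate::py_module_available("tensorflow")
library(keras)

maxlen <- 100     # placeholder length of the integer-encoded input sequence
vocab_size <- 4   # e.g. nucleotide alphabet

# Transformer block with an internal token embedding (embed_dim > 0),
# intended for integer sequences rather than one-hot encoded input.
block <- layer_transformer_block_wrapper(
  num_heads = 2,
  head_size = 4,
  dropout_rate = 0.1,
  ff_dim = 64,
  vocabulary_size = vocab_size,
  embed_dim = 64
)

input <- layer_input(shape = c(maxlen))     # integer-encoded sequence
output <- input %>%
  block() %>%                               # assumed shape: (maxlen, embed_dim)
  layer_global_average_pooling_1d() %>%     # pool over the sequence dimension
  layer_dense(units = vocab_size, activation = "softmax")

model <- keras_model(input, output)
summary(model)
}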