
Creates a network consisting of an arbitrary number of CNN, LSTM and dense layers with multiple input layers. After the LSTM/CNN part, all representations get aggregated (by summation by default; see aggregation_method). Can be used to make a single prediction for a combination of multiple input sequences. Implements the approach described here

Usage

create_model_lstm_cnn_multi_input(
  maxlen = 50,
  dropout_lstm = 0,
  recurrent_dropout_lstm = 0,
  layer_lstm = NULL,
  layer_dense = c(4),
  dropout_dense = NULL,
  solver = "adam",
  learning_rate = 0.001,
  vocabulary_size = 4,
  bidirectional = FALSE,
  batch_size = NULL,
  compile = TRUE,
  kernel_size = NULL,
  filters = NULL,
  strides = NULL,
  pool_size = NULL,
  padding = "same",
  dilation_rate = NULL,
  gap_inputs = NULL,
  use_bias = TRUE,
  zero_mask = FALSE,
  label_smoothing = 0,
  label_noise_matrix = NULL,
  last_layer_activation = "softmax",
  loss_fn = "categorical_crossentropy",
  auc_metric = FALSE,
  f1_metric = FALSE,
  bal_acc = FALSE,
  samples_per_target,
  batch_norm_momentum = 0.99,
  aggregation_method = c("sum"),
  verbose = TRUE,
  model_seed = NULL,
  mixed_precision = FALSE,
  mirrored_strategy = NULL
)

Arguments

maxlen

Length of predictor sequence.

dropout_lstm

Fraction of the units to drop for inputs.

recurrent_dropout_lstm

Fraction of the units to drop for recurrent state.

layer_lstm

Number of cells per network layer. Can be a scalar or vector.

layer_dense

Vector specifying number of neurons per dense layer after last LSTM or CNN layer (if no LSTM used).

dropout_dense

Vector of dropout rates between dense layers. No dropout if NULL.

solver

Optimization method, options are "adam", "adagrad", "rmsprop" or "sgd".

learning_rate

Learning rate for optimizer.

vocabulary_size

Number of unique characters in the vocabulary.

bidirectional

Use bidirectional wrapper for LSTM layers.

batch_size

Number of samples that are used for one network update. Only used if stateful = TRUE.

compile

Whether to compile the model.

kernel_size

Kernel size of the 1D convolutional layers. For multiple layers, assign a vector (e.g., rep(3, 2) for two layers with kernel size 3).

filters

Number of filters. For multiple layers, assign a vector.

strides

Stride values. For multiple layers, assign a vector.

pool_size

Integer, size of the max pooling windows. For multiple layers, assign a vector.

padding

Padding of CNN layers, e.g. "same", "valid" or "causal".

dilation_rate

Integer, the dilation rate to use for dilated convolution.

gap_inputs

Global pooling method to apply. Same options as for flatten_method argument in create_model_transformer function.

use_bias

Boolean. Usage of bias for CNN layers.

zero_mask

Boolean, whether to apply zero masking before LSTM layer. Only used if model does not use any CNN layers.

label_smoothing

Float in [0, 1]. If 0, no smoothing is applied. If > 0, the loss is computed between the predicted labels and a smoothed version of the true labels, where the smoothing squeezes the labels towards 0.5. The closer the argument is to 1, the more the labels get smoothed.
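
For instance, under the usual Keras smoothing formula (stated here as an assumption about the underlying implementation), a one-hot label y is transformed to y * (1 - label_smoothing) + label_smoothing / num_classes:

# Illustration only; assumes the standard Keras label smoothing formula.
y <- c(1, 0)
label_smoothing <- 0.1
y * (1 - label_smoothing) + label_smoothing / 2   # 0.95 0.05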

label_noise_matrix

Matrix of label noise. Every row stands for one class and the columns give the percentages of labels assigned to each class. For example, if the first class contains 5 percent wrong labels and the second class no noise, then

label_noise_matrix <- matrix(c(0.95, 0.05, 0, 1), nrow = 2, byrow = TRUE)
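
Each row presumably sums to 1. A sketch for three classes, with values chosen purely for illustration:

label_noise_matrix <- matrix(c(0.9, 0.05, 0.05,
                               0,   1,    0,
                               0.1, 0.1,  0.8), nrow = 3, byrow = TRUE)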

last_layer_activation

Activation function of output layer(s). For example "sigmoid" or "softmax".

loss_fn

Either "categorical_crossentropy" or "binary_crossentropy". If label_noise_matrix given, will use custom "noisy_loss".

auc_metric

Whether to add AUC metric.

f1_metric

Whether to add F1 metric.

bal_acc

Whether to add balanced accuracy.

samples_per_target

Number of samples to combine for one target.

batch_norm_momentum

Momentum for the moving mean and the moving variance.

aggregation_method

At least one of the options "sum", "mean", "max".
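
Since more than one method can be supplied, a call like the one below should be a valid use of the argument; how the per-method aggregates are combined afterwards is not specified here, so treat this only as a sketch of the call format:

create_model_lstm_cnn_multi_input(
  maxlen = 50,
  vocabulary_size = 4,
  samples_per_target = 3,
  layer_lstm = c(32),
  layer_dense = c(2),
  aggregation_method = c("sum", "max"))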

verbose

Boolean.

model_seed

Set seed for model parameters in tensorflow if not NULL.

mixed_precision

Whether to use mixed precision (https://www.tensorflow.org/guide/mixed_precision).

mirrored_strategy

Whether to use distributed mirrored strategy. If NULL, will use distributed mirrored strategy only if >1 GPU available.

Value

A keras model with multiple input layers. Input goes through shared LSTM/CNN layers.

Examples

if (FALSE) { # reticulate::py_module_available("tensorflow")
create_model_lstm_cnn_multi_input(
  maxlen = 50,
  vocabulary_size = 4,
  samples_per_target = 7,
  kernel_size = c(10, 10),
  filters = c(64, 128),
  pool_size = c(2, 2),
  layer_lstm = c(32),
  layer_dense = c(64, 2),
  aggregation_method = c("max"),
  learning_rate = 0.001)
}
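
A further sketch, not run here: an LSTM-only variant with mean aggregation. The final check assumes the returned keras model exposes one input layer per sample in samples_per_target; treat that as an assumption rather than documented behaviour.

if (FALSE) { # reticulate::py_module_available("tensorflow")
model <- create_model_lstm_cnn_multi_input(
  maxlen = 100,
  vocabulary_size = 4,
  samples_per_target = 3,
  layer_lstm = c(64, 64),
  layer_dense = c(32, 2),
  aggregation_method = c("mean"),
  learning_rate = 0.001)

# Expected to be 3 (one input layer per sample), under the assumption above.
length(model$inputs)
}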