# Create LSTM/CNN network for combining multiple sequences

Source: `R/create_model_set_learning.R`

`create_model_lstm_cnn_time_dist.Rd`

Creates a network consisting of an arbitrary number of CNN, LSTM and dense layers. The input is a 4D tensor whose axes correspond to:

- batch size
- number of samples combined for one target (`samples_per_target`)
- length of one sample (`maxlen`)
- size of vocabulary (`vocabulary_size`)
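The base-R sketch below makes the axis order concrete by one-hot encoding character sequences into an array with this layout. It assumes a DNA-style vocabulary `c("A", "C", "G", "T")` (matching the default `vocabulary_size = 4`) and that consecutive sequences belong to the same target. The helper `one_hot_4d` is hypothetical and not part of the package, whose own data generators may arrange samples differently.

```r
# Hypothetical helper (not part of the package): one-hot encode sequences
# into an array of shape (batch_size, samples_per_target, maxlen, vocab).
# Assumes every character occurs in `vocabulary` and every string has
# exactly `maxlen` characters.
one_hot_4d <- function(seqs, samples_per_target, maxlen,
                       vocabulary = c("A", "C", "G", "T")) {
  batch_size <- length(seqs) %/% samples_per_target
  x <- array(0, dim = c(batch_size, samples_per_target, maxlen,
                        length(vocabulary)))
  for (i in seq_along(seqs)) {
    b <- (i - 1) %/% samples_per_target + 1  # index along axis 1 (batch)
    s <- (i - 1) %% samples_per_target + 1   # index along axis 2 (sample)
    chars <- strsplit(seqs[i], "")[[1]]
    for (pos in seq_len(maxlen)) {
      # axis 3 = position, axis 4 = vocabulary index
      x[b, s, pos, match(chars[pos], vocabulary)] <- 1
    }
  }
  x
}

# 2 targets, each combining 3 sequences of length 5
seqs <- c("ACGTA", "CCGTT", "GGGAC",
          "TTTTA", "ACACA", "GTGTG")
x <- one_hot_4d(seqs, samples_per_target = 3, maxlen = 5)
dim(x)  # 2 3 5 4
```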

After the LSTM/CNN part, the representations of all samples are aggregated (by summation or as set by `aggregation_method`). This can be used to make a single prediction for a combination of multiple input sequences. The architecture is equivalent to `create_model_lstm_cnn_multi_input`, but instead of multiple input layers with 3D input, the input here is one 4D tensor.
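The aggregation step reduces the sample axis, so each target ends up with one vector regardless of how many sequences it combines. The sketch below shows that reduction in base R on a made-up array `h` standing in for the output of the shared LSTM/CNN part, with shape (batch size, samples per target, units). The function `aggregate_samples` is hypothetical and handles one method at a time; it is not the package's implementation.

```r
# Hypothetical sketch: reduce axis 2 (samples) of h with sum, mean or max,
# keeping axis 1 (batch/targets) and axis 3 (units).
aggregate_samples <- function(h, method = c("sum", "mean", "max")) {
  method <- match.arg(method)
  f <- switch(method, sum = sum, mean = mean, max = max)
  apply(h, c(1, 3), f)  # result: batch_size x units matrix
}

h <- array(1:12, dim = c(2, 3, 2))  # 2 targets, 3 samples, 2 units
aggregate_samples(h, "sum")         # rows = targets: 9 27 / 12 30
```

Because summation, mean and max are all invariant to the order of the samples, the prediction for a target does not depend on the order in which its sequences are stacked along axis 2.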

## Usage

```
create_model_lstm_cnn_time_dist(
maxlen = 50,
dropout_lstm = 0,
recurrent_dropout_lstm = 0,
layer_lstm = NULL,
layer_dense = c(4),
solver = "adam",
learning_rate = 0.001,
vocabulary_size = 4,
bidirectional = FALSE,
stateful = FALSE,
batch_size = NULL,
compile = TRUE,
kernel_size = NULL,
filters = NULL,
strides = NULL,
pool_size = NULL,
padding = "same",
dilation_rate = NULL,
gap_time_dist = NULL,
use_bias = TRUE,
zero_mask = FALSE,
label_smoothing = 0,
label_noise_matrix = NULL,
last_layer_activation = "softmax",
loss_fn = "categorical_crossentropy",
auc_metric = FALSE,
f1_metric = FALSE,
samples_per_target,
batch_norm_momentum = 0.99,
verbose = TRUE,
model_seed = NULL,
aggregation_method = NULL,
transformer_args = NULL,
lstm_time_dist = NULL,
mixed_precision = FALSE,
bal_acc = FALSE,
mirrored_strategy = NULL
)
```

## Arguments

- maxlen
Length of predictor sequence.

- dropout_lstm
Fraction of the units to drop for inputs.

- recurrent_dropout_lstm
Fraction of the units to drop for recurrent state.

- layer_lstm
Number of cells per network layer. Can be a scalar or vector.

- layer_dense
Vector specifying number of neurons per dense layer after last LSTM or CNN layer (if no LSTM used).

- solver
Optimization method, options are `"adam"`, `"adagrad"`, `"rmsprop"` or `"sgd"`.

- learning_rate
Learning rate for optimizer.

- vocabulary_size
Number of unique characters in the vocabulary.

- bidirectional
Whether to use a bidirectional wrapper for LSTM layers.

- stateful
Boolean. Whether to use stateful LSTM layer.

- batch_size
Number of samples that are used for one network update. Only used if `stateful = TRUE`.

- compile
Whether to compile the model.

- kernel_size
Kernel size of 1D convolutional layers. For multiple layers, assign a vector (e.g. `rep(3, 2)` for two layers with kernel size 3).

- filters
Number of filters. For multiple layers, assign a vector.

- strides
Stride values. For multiple layers, assign a vector.

- pool_size
Integer, size of the max pooling windows. For multiple layers, assign a vector.

- padding
Padding of CNN layers, e.g. `"same"`, `"valid"` or `"causal"`.

- dilation_rate
Integer, the dilation rate to use for dilated convolution.

- gap_time_dist
Pooling or flatten method after the last time-distribution wrapper. Same options as for the `flatten_method` argument in the `create_model_transformer` function.

- use_bias
Boolean. Whether to use bias in CNN layers.

- zero_mask
Boolean, whether to apply zero masking before LSTM layer. Only used if model does not use any CNN layers.

- label_smoothing
Float in [0, 1]. If 0, no smoothing is applied. If > 0, the loss is computed between the predicted labels and a smoothed version of the true labels, where the smoothing squeezes the labels towards 0.5. The closer the argument is to 1, the more the labels get smoothed.

- label_noise_matrix
Matrix of label noise. Every row stands for one class, and the columns give the fraction of that class's labels assigned to each class. If the first label contains 5 percent wrong labels and the second label no noise, then

`label_noise_matrix <- matrix(c(0.95, 0.05, 0, 1), nrow = 2, byrow = TRUE)`

- last_layer_activation
Activation function of output layer(s). For example `"sigmoid"` or `"softmax"`.

- loss_fn
Either `"categorical_crossentropy"` or `"binary_crossentropy"`. If `label_noise_matrix` is given, the custom `"noisy_loss"` is used.

- auc_metric
Whether to add AUC metric.

- f1_metric
Whether to add F1 metric.

- samples_per_target
Number of samples to combine for one target.

- batch_norm_momentum
Momentum for the moving mean and the moving variance of batch normalization layers.

- verbose
Boolean.

- model_seed
Set seed for model parameters in TensorFlow if not `NULL`.

- aggregation_method
At least one of the options `"sum"`, `"mean"` or `"max"`.

- transformer_args
List of arguments for transformer blocks; see `layer_transformer_block_wrapper`. Additionally, the list can contain a `pool_flatten` argument to apply global pooling or flattening after the last transformer block (same options as the `flatten_method` argument in the `create_model_transformer` function).

- lstm_time_dist
Vector containing the number of units for each LSTM layer. Applied after the time-distribution part.

- mixed_precision
Whether to use mixed precision (https://www.tensorflow.org/guide/mixed_precision).

- bal_acc
Whether to add balanced accuracy.

- mirrored_strategy
Whether to use distributed mirrored strategy. If `NULL`, the mirrored strategy is used only if more than one GPU is available.
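To illustrate what `label_smoothing` and `label_noise_matrix` do to the targets, here is a base-R sketch. It assumes the smoothing follows the Keras categorical formula `y * (1 - label_smoothing) + label_smoothing / num_classes` (for two classes this is the same as squeezing towards 0.5), and it reads the noise matrix as a transition matrix, so a one-hot true label times the matrix gives the expected distribution of observed labels. The function `smooth_labels` is hypothetical; the package's `"noisy_loss"` is not reproduced here.

```r
# Assumption: Keras-style categorical label smoothing.
smooth_labels <- function(y, label_smoothing) {
  y * (1 - label_smoothing) + label_smoothing / ncol(y)
}

# Noise matrix from the label_noise_matrix description:
# row i = true class i, entry [i, j] = fraction of class-i labels given class j.
label_noise_matrix <- matrix(c(0.95, 0.05, 0, 1), nrow = 2, byrow = TRUE)
stopifnot(isTRUE(all.equal(rowSums(label_noise_matrix), c(1, 1))))

y_true <- diag(2)                # one-hot labels for class 1 and class 2
smooth_labels(y_true, 0.1)       # rows: 0.95 0.05 / 0.05 0.95
y_true %*% label_noise_matrix    # expected observed-label distribution
```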

## Examples

```
if (FALSE) { # reticulate::py_module_available("tensorflow")
  create_model_lstm_cnn_time_dist(
    maxlen = 50,
    vocabulary_size = 4,
    samples_per_target = 7,
    kernel_size = c(10, 10),
    filters = c(64, 128),
    pool_size = c(2, 2),
    layer_lstm = c(32),
    aggregation_method = c("max"),
    layer_dense = c(64, 2),
    learning_rate = 0.001
  )
}
```
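For the example above, it can help to know how long the sequence axis is once it reaches the LSTM layer. The sketch below computes this with standard Keras length arithmetic, under the assumption that each CNN layer uses stride 1 and is followed by a max-pooling layer whose window (and stride) is `pool_size[i]`. The function `cnn_output_lengths` is hypothetical, and the package's actual layer arrangement may differ.

```r
# Hypothetical helper: sequence length after each CNN block, assuming
# conv stride 1 followed by max pooling with stride = window ("valid").
cnn_output_lengths <- function(maxlen, kernel_size, pool_size,
                               padding = "same", dilation_rate = 1) {
  len <- maxlen
  out <- numeric(length(kernel_size))
  for (i in seq_along(kernel_size)) {
    if (padding == "valid") {
      eff_kernel <- (kernel_size[i] - 1) * dilation_rate + 1
      len <- len - eff_kernel + 1  # "valid" convolution shrinks the axis
    }                              # "same" and "causal" keep the length
    len <- (len - pool_size[i]) %/% pool_size[i] + 1  # max pooling
    out[i] <- len
  }
  out
}

cnn_output_lengths(maxlen = 50, kernel_size = c(10, 10), pool_size = c(2, 2))
# 25 12
```

With `padding = "same"` only the pooling shortens the sequence, so each of the 7 samples would reach the LSTM layer as 12 steps of 128 filters under these assumptions.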