Train your own Metric

To train our models we rely on PyTorch Lightning Library. This means that all our models are Lightning Modules.

To train a new metric we just need to run 1 command:

comet train -f {my_configs}.yaml

This will setup a Lightning Trainer and fit your module accordingly.

Data Format

To train your metric we expect your data to be a csv with the following columns:

  • src: The source segment.
  • mt: The machine translation hypothesis.
  • ref: The reference segment.
  • score: The human judgment score.

Example:

src mt ref score
Hello world! Oi mundo. Olá mundo! 0.5
This is a sample este é um exemplo isto é um exemplo! 0.8

Training flags

Lightning Trainer Configurations

Argument Default Description
seed 3 Training seed.
deterministic True If true enables cudnn.deterministic. Might make your system slower, but ensures reproducibility.
verbose False Verbosity mode.
early_stopping True Enables early stopping.
save_top_k 1 Sets how many checkpoints we want to save (keeping only the best ones).
monitor Kendall Metric to monitor during training.
metric_mode max 'min' or 'max' depending if we wish to maximize or minimize the metric.
min_delta 0 Sensitivity to the metric.
patience 1 Number of epochs without improvement before stopping training
accumulate_grad_batches 1 Gradient accumulation steps
lr_finder False Enables the learning rate finder described in Cyclical Learning Rates for Training Neural Networks

Base Model Configurations

Argument Default Description
model required Type of metric we want to train. Options: [CometEstimator, CometRanker, QualityEstimator]
batch_size 8 Batch size used to train the model.
nr_frozen_epochs 0 Number of epochs we keep the encoder frozen.
keep_embeddings_frozen False If set to True, keeps the embedding layer frozen during training. Usefull to save some GPU memory.
optimizer Adam PyTorch Optimizer class name
learning_rate 1e-05 Learning rate to be used during training.
scheduler constant Learning Rate scheduler. Options: [constant, linear_warmup, warmup_constant]
warmup_steps None Scheduler warmup steps.
encoder_model XLMR Encoder Model to be used: Options: [LASER, BERT, XLMR].
pretrained_model xlmr.base pretrained model to be used e.g: xlmr.base vs xlmr.large (for LASER this is ignored)
pool avg Pooling technique to create the sentence embeddings. Options: [avg, avg+cls, max, cls, default]
load_weights False Loads compatible weights from another checkpoint file.
train_path required Path to the training csv.
val_path required Path to the validation csv.
test_path None Path to the test csv. (Not used)
loader_workers False Number of workers for loading and preparing the batches.

Note: The Ranker model requires no further configs.

Estimator Specific Configurations

Argument Default Description
encoder_learning_rate required Learning rate used to fine-tune the encoder. Note that this is different from learning_rate config that will be used only for the top layer.
layerwise_decay 1.0 Decay for the layer wise learning rates. If 1.0 no decay is applied.
layer mix Layer from the pretrained encoder that we wish to extract the word embeddings. If mix uses a layer-wise attention mechanism to combine different layers.
scalar_mix_dropout mix Sets the layer-wise dropout. Ignored if layer != mix.
loss mse mse for Mean Squared Error or binary_xentfor Binary Cross Entropy.
hidden_sizes 1536,768 Hidden sizes of the different Feed-Forward layers on top.
activations Tanh Activation functions for the Feed-Forward on top.
dropout 0.1 Dropout used in the Feed-Forward on top.
final_activation Sigmoid Feed-Forward final activation function. If False the model outputs the logits