API Reference: Advanced

masterful.core.fit

masterful.core.fit(fit_policy, model, model_spec, training_data, validation_data, unlabeled_data, synthetic_data, data_spec, **kwargs)

Trains the given model on the provided datasets using the policy provided.

Parameters
  • fit_policy (masterful.policy.FitPolicy) – The policy to apply for training.

  • model (keras.engine.training.Model) – The model to train.

  • model_spec (masterful.spec.ModelSpec) – The specification for the model.

  • training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.

  • validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.

  • unlabeled_data (Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – A tuple or list of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques. Pass an empty sequence or None if no unlabeled data.

  • synthetic_data (Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – A tuple or list of labeled, synthetic data that can be used to improve the performance of the model. Pass an empty sequence or None if no synthetic data.

  • data_spec (masterful.spec.DataSpec) – A spec that describes the datasets.

Returns

An instance of masterful.core.FitReport, containing the results of training the given model using the provided policy on the provided datasets.

Return type

masterful.core.FitReport

masterful.core.ensemble

masterful.core.ensemble(ensemble_policy, model, model_spec, training_data, validation_data, unlabeled_data, synthetic_data, data_spec, **kwargs)

Similar to masterful.core.fit(), trains the given model on the provided datasets using the policy provided. However, masterful.core.ensemble() trains ensemble_policy.multiplier models in sequence, and returns an ensembled model which is the joint prediction from all child models.

Parameters
  • ensemble_policy (masterful.policy.EnsemblePolicy) – The policy to apply for ensembling.

  • model (keras.engine.training.Model) – The model to train.

  • model_spec (masterful.spec.ModelSpec) – The spec for the model.

  • policy_path – The policy to use when training the model. This must be a policy previously found through analyze().

  • training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.

  • validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.

  • unlabeled_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – A tuple or list of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques. Pass an empty sequence or None if no unlabeled data.

  • synthetic_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – A tuple or list of labeled, synthetic data that can be used to improve the performance of the model. Pass an empty sequence or None if no synthetic data.

  • data_spec (masterful.spec.DataSpec) – A specification that describes the datasets.

Returns

An instance of masterful.core.EnsembleReport, containing the results of training the given models and ensembling their results using the provided policy on the provided datasets.

Return type

masterful.core.EnsembleReport
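
This reference does not state how the child predictions are combined into the joint prediction. As a rough illustration only, the sketch below assumes the simplest option, a plain mean of per-class probabilities across child models. `average_predictions` is a hypothetical helper and is not part of Masterful.

```python
from typing import List, Sequence


def average_predictions(child_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across child models.

    Assumption: the ensemble combines its children by a plain mean.
    Masterful's actual combination rule is not documented here.
    """
    if not child_predictions:
        raise ValueError("At least one child prediction is required.")
    num_classes = len(child_predictions[0])
    if any(len(p) != num_classes for p in child_predictions):
        raise ValueError("All child predictions must have the same length.")
    num_children = len(child_predictions)
    return [
        sum(p[c] for p in child_predictions) / num_children
        for c in range(num_classes)
    ]


# Three hypothetical child models (a multiplier of 3) scoring one example.
children = [[0.5, 0.5], [0.75, 0.25], [1.0, 0.0]]
joint = average_predictions(children)
```

If every child outputs a probability distribution, the averaged result is also a distribution, which is one reason a mean is a common default for ensembling classifiers.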

masterful.core.distill

masterful.core.distill(distillation_policy, source_model, source_model_spec, target_model, target_model_spec, labeled_data, unlabeled_data, synthetic_data, data_spec, **kwargs)

Distills the knowledge from source_model into target_model.

Parameters
  • task – The task that we are learning.

  • source_model (keras.engine.training.Model) – The source model that we are trying to match. This model is already trained.

  • target_model (keras.engine.training.Model) – The target model that we are training.

  • labeled_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.

  • unlabeled_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – [Optional] A set of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques.

  • synthetic_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – [Optional] A set of labeled, synthetic data that can be used to improve the performance of the model.

  • data_spec (masterful.spec.DataSpec) – A specification that describes the datasets.

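  • distillation_policy (masterful.policy.DistillationPolicy) – The policy to apply for distillation.

  • source_model_spec (masterful.spec.ModelSpec) – The specification for the source model.

  • target_model_spec (masterful.spec.ModelSpec) – The specification for the target model.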

Returns

An instance of FitReport, containing the results of distilling source_model into target_model.

Return type

masterful.core.FitReport

masterful.core.pretrain

masterful.core.pretrain(pretrain_policy, model, labeled_data, validation_data, unlabeled_data, synthetic_data, data_spec, **kwargs)

Pretrain the weights of the given model using the provided datasets. The model is assumed to be the feature extractor (backbone) of a larger model, so there should be no classification heads (softmax output) in the model provided.

Parameters
  • pretrain_policy (masterful.policy.PretrainPolicy) – The policy used for pretraining.

  • model (keras.engine.training.Model) – The model to pretrain. Models used here should have no classification head attached.

  • labeled_data (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.

  • validation_data (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.

  • unlabeled_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – [Optional] A set of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques.

  • synthetic_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – [Optional] A set of labeled, synthetic data that can be used to improve the performance of the model.

  • data_spec (masterful.spec.DataSpec) – A specification that describes the datasets.

Returns

An instance of FitReport, containing the results of pretraining the model. To measure the performance of pretraining, a small task-specific head is temporarily attached to the model and trained at the end.

Return type

masterful.core.FitReport

masterful.core.adapt

masterful.core.adapt(adapt_policy, model, model_spec, source_labeled_data, target_unlabeled_data, data_spec, source_unlabeled_data=None, target_labeled_data=None, **kwargs)

Adapts the given model from the source domain to the target domain, given labeled datasets in the source domain and unlabeled datasets in the target domain. The provided model is assumed to be untrained in any domain.

Parameters
  • task – The task that we are learning.

  • model (keras.engine.training.Model) – The model we are adapting to perform well on the source and target domains.

  • source_labeled_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data from the source domain.

  • target_unlabeled_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The unlabeled data from the target domain.

  • source_unlabeled_data (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – [Optional] The unlabeled data from the source domain.

  • target_labeled_data (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – [Optional] The labeled data from the target domain.

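  • adapt_policy (masterful.policy.AdaptationPolicy) – The policy to apply for adaptation.

  • model_spec (masterful.spec.ModelSpec) – The specification for the model.

  • data_spec (masterful.spec.DataSpec) – A specification that describes the datasets.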

Returns

An instance of FitReport, containing the results of adapting the model to the target domain. If labeled data in the target domain is provided, performance on the target domain is measured against that data.

Return type

masterful.core.FitReport

masterful.core.FitReport

masterful.core.FitReport(model=None, validation_metrics=<factory>, history=None)

Holds the results of training a model using Masterful. This is the output of the masterful.core.fit() function, among others.

Parameters
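  • model (keras.engine.training.Model) – The trained model.

  • validation_metrics (Dict) – The results of evaluating the trained model on the validation data.

  • history (keras.callbacks.History) – The full training history report, containing the results at the end of each epoch for key metrics.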

Return type

None

masterful.core.model

The trained model.

masterful.core.validation_metrics

The results of evaluating the trained model on the validation data.

masterful.core.history

The full training history report, containing the results at the end of each epoch for key metrics.

masterful.core.EnsembleReport

masterful.core.EnsembleReport(model=None, validation_metrics=<factory>, history=None, multiplier=None)

Holds the results of ensembling a group of models using Masterful. This is the output of the masterful.core.ensemble() function.

Parameters
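  • model (keras.engine.training.Model) – The trained model.

  • validation_metrics (Dict) – The results of evaluating the trained model on the validation data.

  • history (keras.callbacks.History) – The full training history report, containing the results at the end of each epoch for key metrics.

  • multiplier (int) – The number of models trained for the ensemble.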

Return type

None

masterful.core.multiplier

The number of models trained for the ensemble.

masterful.core.find_fit_policy

masterful.core.find_fit_policy(model, model_spec, training_data, validation_data, unlabeled_data, synthetic_data, data_spec, **kwargs)

Finds an optimal policy for use with masterful.core.fit().

Parameters
  • model (keras.engine.training.Model) – The model to analyze.

  • model_spec (masterful.spec.ModelSpec) – The specification for the model.

  • training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.

  • validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.

  • unlabeled_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – A tuple or list of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques. Pass an empty sequence or None if no unlabeled data.

  • synthetic_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – A tuple or list of labeled, synthetic data that can be used to improve the performance of the model. Pass an empty sequence or None if no synthetic data.

  • data_spec (masterful.spec.DataSpec) – A spec that describes the datasets.

Returns

An instance of masterful.FitPolicy that can be used with masterful.core.fit().
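
As a hedged sketch of how the two calls fit together, the helper below searches for a policy and then trains with it, passing arguments in the order given by the signatures in this reference. `find_policy_then_fit` is a hypothetical helper. Its `core` argument stands for the `masterful.core` module and is passed in so that the snippet itself imports nothing.

```python
from typing import Any, Optional, Sequence


def find_policy_then_fit(
    core: Any,
    model: Any,
    model_spec: Any,
    training_data: Any,
    validation_data: Any,
    data_spec: Any,
    unlabeled_data: Optional[Sequence[Any]] = None,
    synthetic_data: Optional[Sequence[Any]] = None,
) -> Any:
    """Search for a fit policy, then train with it (hypothetical helper).

    `core` is expected to expose `find_fit_policy` and `fit` with the
    positional signatures documented in this reference.
    """
    fit_policy = core.find_fit_policy(
        model, model_spec, training_data, validation_data,
        unlabeled_data, synthetic_data, data_spec,
    )
    # fit() takes the policy first, then the same arguments as the search.
    return core.fit(
        fit_policy, model, model_spec, training_data, validation_data,
        unlabeled_data, synthetic_data, data_spec,
    )
```

With Masterful installed, this would presumably be called as `find_policy_then_fit(masterful.core, model, model_spec, training_data, validation_data, data_spec)`. The returned report exposes `model`, `validation_metrics`, and `history`.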

masterful.core.find_standard_loss

masterful.core.find_standard_loss(model_spec, data_spec)

Finds a prototypical loss policy given a masterful.ModelSpec and masterful.DataSpec.

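Parameters
  • model_spec (masterful.spec.ModelSpec) – The specification for the model.

  • data_spec (masterful.spec.DataSpec) – A specification that describes the datasets.

Returns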

A loss policy for use in training. A loss policy is a dictionary with the following keys:

  • loss_types (tf.keras.losses.Loss) – The types of loss functions to use in training.

  • loss_configs – Loss-specific configuration. This will be passed to tf.keras.losses.Loss.from_config(loss_config).

  • metrics_types (tf.keras.metrics.Metric) – The types of metrics associated with the above losses, to be reported during training.

  • metrics_configs – Metric-specific configurations. These will be passed as kwargs to the metric type initializer.

Return type

Dict
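
To make the shape of this dictionary concrete, the sketch below builds losses and metrics from a loss policy. It assumes that each key holds a list with one entry per loss, which this reference does not confirm. It also uses small stand-in classes (`StandInLoss`, `StandInMetric`) in place of real `tf.keras.losses.Loss` and `tf.keras.metrics.Metric` subclasses so that it runs without TensorFlow. `build_losses_and_metrics` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


class StandInLoss:
    """Stand-in for a tf.keras.losses.Loss subclass (assumption)."""

    def __init__(self, from_logits: bool = False):
        self.from_logits = from_logits

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "StandInLoss":
        return cls(**config)


class StandInMetric:
    """Stand-in for a tf.keras.metrics.Metric subclass (assumption)."""

    def __init__(self, name: str = "metric"):
        self.name = name


def build_losses_and_metrics(
    loss_policy: Dict[str, List[Any]],
) -> Tuple[List[Any], List[Any]]:
    """Instantiate losses via from_config and metrics via keyword arguments."""
    losses = [
        loss_type.from_config(config)
        for loss_type, config in zip(
            loss_policy["loss_types"], loss_policy["loss_configs"]
        )
    ]
    metrics = [
        metric_type(**config)
        for metric_type, config in zip(
            loss_policy["metrics_types"], loss_policy["metrics_configs"]
        )
    ]
    return losses, metrics


# Hypothetical policy; real values would come from find_standard_loss().
example_policy = {
    "loss_types": [StandInLoss],
    "loss_configs": [{"from_logits": True}],
    "metrics_types": [StandInMetric],
    "metrics_configs": [{"name": "accuracy"}],
}
losses, metrics = build_losses_and_metrics(example_policy)
```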

masterful.core.find_batch_size

masterful.core.find_batch_size(model, model_spec, sample_data, data_spec)

Finds the optimal batch size.

The algorithm is an exponential binary search, which guarantees that the search finishes in fewer than $\log_2(\text{batchsize})$ iterations.

Parameters
  • model (keras.engine.training.Model) – The model to be used in training.

  • model_spec (masterful.spec.ModelSpec) – Specification of the model to be trained.

  • sample_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – Sample data used in finding the batch size. Does not have to be actual training data, but must be correctly typed and ranged to allow the optimizer to backprop.

  • data_spec (masterful.spec.DataSpec) – Specification of the sample data.

Returns

An estimate of the best batch size to use.

Return type

int
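
The search strategy named above can be sketched as follows. This illustrates a generic exponential search followed by a binary search and is not Masterful's implementation. `fits` is a hypothetical predicate that would attempt one training step at a given batch size and report whether it succeeded (for example, without running out of memory), and `upper_limit` is an assumed cap.

```python
from typing import Callable


def find_largest_batch_size(
    fits: Callable[[int], bool], upper_limit: int = 1 << 16
) -> int:
    """Return the largest batch size in [1, upper_limit] for which fits() is True.

    Assumes fits() is monotonic: if a batch size fits, every smaller one does.
    """
    if not fits(1):
        raise ValueError("Even a batch size of 1 does not fit.")

    # Exponential phase: double until a failure or the cap is reached.
    low = 1  # largest size known to fit
    high = 2
    while high <= upper_limit and fits(high):
        low = high
        high *= 2

    # Exclusive upper bound: this size either failed or exceeds the cap.
    high = min(high, upper_limit + 1)

    # Binary phase: narrow the gap between the last success and the bound.
    while high - low > 1:
        mid = (low + high) // 2
        if fits(mid):
            low = mid
        else:
            high = mid
    return low


# Toy predicate standing in for a real memory check.
best = find_largest_batch_size(lambda batch_size: batch_size <= 100)
```

Both phases halve or double the candidate range, so the number of calls to `fits` grows logarithmically with the resulting batch size.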

masterful.core.find_max_learning_rate

masterful.core.find_max_learning_rate(batch_size, optimizer_type)

Finds the optimal maximum learning rate.

Parameters
  • batch_size (int) – The batch size used in training.

  • optimizer_type (type) – The type of optimizer used in training.

Returns

An estimate of the optimal maximum learning rate.

Return type

float

masterful.core.find_optimizer_policy

masterful.core.find_optimizer_policy(model, model_spec, training_data, validation_data, data_spec, batch_size, **kwargs)

Finds an optimizer policy that approximates the ideal policy.

Returns

A dictionary with the following keys:

  • optimizer_type (tf.keras.optimizers.Optimizer) – The type of optimizer to use. See note on learning rates.

  • optimizer_config – Optimizer-specific configuration. This will be passed to tf.keras.optimizers.Optimizer.from_config(optimizer_config). See note on learning rates.

  • learning_rate_callback_type – To control the learning rate using a callback, this key holds the type of the callback and learning_rate_callback_config holds its config.

  • learning_rate_callback_config – This dictionary is passed as keyword args to learning_rate_callback_type. See note on learning rates.

  • learning_rate_schedule – A callable that matches the signature f(step)->lr. See note on learning rates.

  • epochs_callback_type – To control early stopping behavior, this key holds the type of an early stopping callback.

  • epochs_callback_config – This dictionary is passed as keyword args to epochs_callback_type.

  • epochs – A fixed number of epochs to run. Either this value, or epochs_callback_type together with epochs_callback_config, must be set.

  • warmup_initial_lr – The initial learning rate to start warming up a model.

  • warmup_final_lr – The final learning rate to finish warming up a model.

  • warmup_steps – The number of steps to warm up a model.

Return type

Dict

Parameters
  • model (keras.engine.training.Model) –

  • model_spec (masterful.spec.ModelSpec) –

  • training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) –

  • validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) –

  • data_spec (masterful.spec.DataSpec) –

  • batch_size (int) –
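
As an illustration of how the warmup keys and learning_rate_schedule could relate, the sketch below builds a callable with the f(step)->lr signature. The linear ramp and the constant rate after warmup are assumptions made for the example, since this reference does not specify the shape of the schedule. `make_warmup_schedule` is a hypothetical helper.

```python
from typing import Callable


def make_warmup_schedule(
    warmup_initial_lr: float, warmup_final_lr: float, warmup_steps: int
) -> Callable[[int], float]:
    """Build f(step) -> lr with a linear warmup (an assumed schedule shape)."""
    if warmup_steps < 0:
        raise ValueError("warmup_steps must be non-negative.")

    def schedule(step: int) -> float:
        # After warmup (or with no warmup at all), hold the final rate.
        if warmup_steps == 0 or step >= warmup_steps:
            return warmup_final_lr
        # During warmup, interpolate linearly between the two rates.
        fraction = step / warmup_steps
        return warmup_initial_lr + fraction * (warmup_final_lr - warmup_initial_lr)

    return schedule


# Hypothetical values; a real policy would supply its own.
schedule = make_warmup_schedule(0.0, 0.1, warmup_steps=100)
halfway_lr = schedule(50)
```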

masterful.core.find_augmentation_policy

masterful.core.find_augmentation_policy(model, model_spec, training_data, validation_data, synthetic_data, data_spec, batch_size, optimizer_type, optimizer_config, loss, loss_weights, warmup_initial_lr, warmup_final_lr, warmup_steps, **kwargs)

Analyzes the model and provided datasets, and returns the optimal policy to train the model given the provided datasets.

Parameters
  • model (keras.engine.training.Model) – The model to analyze.

  • model_spec (masterful.spec.ModelSpec) – Specification of the model being trained.

  • training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each sample.

  • validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each sample.

  • synthetic_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) –

  • data_spec (masterful.spec.DataSpec) – Specification of the datasets to use for analysis.

  • batch_size (int) – Batch size to use in finding the policy.

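  • optimizer_type – The type of optimizer to use.

  • optimizer_config – Optimizer-specific configuration.

  • loss – tbd

  • loss_weights – tbd

  • warmup_initial_lr (float) – The initial learning rate to start warming up a model.

  • warmup_final_lr (float) – The final learning rate to finish warming up a model.

  • warmup_steps (int) – The number of steps to warm up a model.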

Returns

The optimal augmentation policy for training the model with the provided datasets. An augmentation policy is a dictionary with the following keys (for more information on each key, see the corresponding key under masterful.FitPolicy):

  • mirror

  • rot90

  • rotate

  • hsv

  • contrast

  • blur

  • spatial

  • hsv_clustering

  • contrast_clustering

  • blur_clustering

  • spatial_clustering

  • mixup

  • cutmix

  • synthetic_proportion

Return type

Dict
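
A small sanity check on a returned policy can be sketched from the keys listed in the Returns section. The key names are taken from this reference. `missing_policy_keys` and the placeholder values are hypothetical, since the value type of each key is not described here.

```python
from typing import Any, Dict, List

# Key names as listed in this reference for an augmentation policy.
AUGMENTATION_POLICY_KEYS = (
    "mirror", "rot90", "rotate", "hsv", "contrast", "blur", "spatial",
    "hsv_clustering", "contrast_clustering", "blur_clustering",
    "spatial_clustering", "mixup", "cutmix", "synthetic_proportion",
)


def missing_policy_keys(policy: Dict[str, Any]) -> List[str]:
    """Return the documented keys that are absent from the given policy."""
    return [key for key in AUGMENTATION_POLICY_KEYS if key not in policy]


# Placeholder values only; real values come from find_augmentation_policy().
partial_policy = {"mirror": None, "mixup": None}
missing = missing_policy_keys(partial_policy)
```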