API Reference: Advanced
masterful.core.fit
- masterful.core.fit(fit_policy, model, model_spec, training_data, validation_data, unlabeled_data, synthetic_data, data_spec, **kwargs)
Trains the given model on the provided datasets using the policy provided.
- Parameters
fit_policy (masterful.policy.FitPolicy) – The policy to apply for training.
model (keras.engine.training.Model) – The model to train.
model_spec (masterful.spec.ModelSpec) – The specification for the model.
training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
unlabeled_data (Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – A tuple or list of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques. Pass an empty sequence or None if no unlabeled data.
synthetic_data (Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – A tuple or list of labeled, synthetic data that can be used to improve the performance of the model. Pass an empty sequence or None if no synthetic data.
data_spec (masterful.spec.DataSpec) – A spec that describes the datasets.
- Returns
An instance of masterful.core.FitReport containing the results of training the given model using the provided policy on the provided datasets.
- Return type
masterful.core.FitReport
masterful.core.ensemble
- masterful.core.ensemble(ensemble_policy, model, model_spec, training_data, validation_data, unlabeled_data, synthetic_data, data_spec, **kwargs)
Similar to masterful.core.fit(), trains the given model on the provided datasets using the provided policy. However, masterful.core.ensemble() trains ensemble_policy.multiplier models in sequence and returns an ensembled model whose output is the joint prediction of all child models.
- Parameters
ensemble_policy (masterful.policy.EnsemblePolicy) – The policy to apply for ensembling.
model (keras.engine.training.Model) – The model to train.
model_spec (masterful.spec.ModelSpec) – The spec for the model.
policy_path – The policy to use when training the model. This must be a policy previously found through analyze().
training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
unlabeled_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – A tuple or list of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques. Pass an empty sequence or None if no unlabeled data.
synthetic_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – A tuple or list of labeled, synthetic data that can be used to improve the performance of the model. Pass an empty sequence or None if no synthetic data.
data_spec (masterful.spec.DataSpec) – A specification that describes the datasets.
- Returns
An instance of masterful.core.EnsembleReport containing the results of training the given models and ensembling their results using the provided policy on the provided datasets.
- Return type
masterful.core.EnsembleReport
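This reference says only that the ensembled model is "the joint prediction from all child models"; it does not say how the child outputs are combined. Averaging per-class scores is one common choice, and the sketch below assumes it. The helper ensemble_average is hypothetical and works on plain Python lists rather than Keras models.

```python
from typing import List, Sequence


def ensemble_average(
    child_predictions: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Combine child-model outputs by taking the per-class mean.

    child_predictions[m][i][c] is child model m's score for example i, class c.
    All child models must score the same examples over the same classes.
    """
    if not child_predictions:
        raise ValueError("at least one child model is required")
    num_models = len(child_predictions)
    # zip(*child_predictions) groups the rows for the same example across models;
    # zip(*rows) then groups the scores for the same class across models.
    return [
        [sum(class_scores) / num_models for class_scores in zip(*rows)]
        for rows in zip(*child_predictions)
    ]


# Two hypothetical child models, each scoring two examples over two classes.
model_a = [[0.2, 0.8], [0.6, 0.4]]
model_b = [[0.4, 0.6], [0.8, 0.2]]
joint = ensemble_average([model_a, model_b])
# joint is approximately [[0.3, 0.7], [0.7, 0.3]]
```

Inside a Keras model, the same mean could be taken with tf.keras.layers.Average applied to the child models' outputs.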
masterful.core.distill
- masterful.core.distill(distillation_policy, source_model, source_model_spec, target_model, target_model_spec, labeled_data, unlabeled_data, synthetic_data, data_spec, **kwargs)
Distills the knowledge from source_model into target_model.
- Parameters
task – The task that we are learning.
source_model (keras.engine.training.Model) – The source model that we are trying to match. This model is already trained.
target_model (keras.engine.training.Model) – The target model that we are training.
labeled_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
unlabeled_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – [Optional] A set of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques.
synthetic_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – [Optional] A set of labeled, synthetic data that can be used to improve the performance of the model.
data_spec (masterful.spec.DataSpec) – A specification that describes the datasets.
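distillation_policy (masterful.policy.DistillationPolicy) – The policy to apply for distillation.
source_model_spec (masterful.spec.ModelSpec) – The specification for the source model.
target_model_spec (masterful.spec.ModelSpec) – The specification for the target model.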
- Returns
An instance of masterful.core.FitReport containing the results of distilling source_model into target_model.
- Return type
masterful.core.FitReport
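This reference does not say which objective masterful.core.distill() optimizes or what distillation_policy controls. As an illustration only, the sketch below shows the classic soft-target loss from Hinton et al.'s knowledge-distillation work: the target model is trained to match the temperature-softened class distribution of the already-trained source model. The function names and the default temperature are assumptions for this example, not Masterful's API.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(
    source_logits: Sequence[float],
    target_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(p_source || p_target) on softened distributions, scaled by T**2."""
    p = softmax(source_logits, temperature)  # soft labels from the trained source model
    q = softmax(target_logits, temperature)  # predictions of the target model being trained
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    # Multiplying by T**2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

A higher temperature flattens both distributions, so the target model also learns from the relative scores the source model assigns to incorrect classes.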
masterful.core.pretrain
- masterful.core.pretrain(pretrain_policy, model, labeled_data, validation_data, unlabeled_data, synthetic_data, data_spec, **kwargs)
Pretrains the weights of the given model using the provided datasets. The model is assumed to be the feature extractor (backbone) of a larger model, so it should not include a classification head (softmax output).
- Parameters
pretrain_policy (masterful.policy.PretrainPolicy) – The policy used for pretraining.
model (keras.engine.training.Model) – The model to pretrain. Models used here should have no classification head attached.
labeled_data (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
validation_data (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
unlabeled_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – [Optional] A set of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques.
synthetic_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – [Optional] A set of labeled, synthetic data that can be used to improve the performance of the model.
data_spec (masterful.spec.DataSpec) – A specification that describes the datasets.
- Returns
An instance of masterful.core.FitReport containing the results of pretraining the model. To measure the quality of the pretrained weights, a small task-specific head is temporarily attached to the model and trained at the end.
- Return type
masterful.core.FitReport
masterful.core.adapt
- masterful.core.adapt(adapt_policy, model, model_spec, source_labeled_data, target_unlabeled_data, data_spec, source_unlabeled_data=None, target_labeled_data=None, **kwargs)
Adapts the given model from the source domain to the target domain, given labeled datasets in the source domain and unlabeled datasets in the target domain. The provided model is assumed to be untrained in any domain.
- Parameters
task – The task that we are learning.
model (keras.engine.training.Model) – The model we are adapting to perform well on the source and target domains.
source_labeled_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data from the source domain.
target_unlabeled_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The unlabeled data from the target domain.
source_unlabeled_data (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – [Optional] The unlabeled data from the source domain.
target_labeled_data (Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]) – [Optional] The labeled data from the target domain.
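adapt_policy (masterful.policy.AdaptationPolicy) – The policy to apply for domain adaptation.
model_spec (masterful.spec.ModelSpec) – The specification for the model.
data_spec (masterful.spec.DataSpec) – A specification that describes the datasets.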
- Returns
An instance of masterful.core.FitReport containing the results of adapting the model to the target domain. If target_labeled_data is provided, performance on the target domain is measured against it.
- Return type
masterful.core.FitReport
masterful.core.FitReport
- masterful.core.FitReport(model=None, validation_metrics=<factory>, history=None)
Holds the results of training a model using Masterful. This is the output of the masterful.core.fit() function, among others.
- Parameters
model (keras.engine.training.Model) –
validation_metrics (Dict) –
history (keras.callbacks.History) –
- Return type
None
- masterful.core.FitReport.model
The trained model.
- masterful.core.FitReport.validation_metrics
The results of evaluating the trained model on the validation data.
- masterful.core.FitReport.history
The full training history, containing the values of key metrics at the end of each epoch.
masterful.core.EnsembleReport
- masterful.core.EnsembleReport(model=None, validation_metrics=<factory>, history=None, multiplier=None)
Holds the results of ensembling a group of models using Masterful. This is the output of the masterful.core.ensemble() function.
- Parameters
model (keras.engine.training.Model) –
validation_metrics (Dict) –
history (keras.callbacks.History) –
multiplier (int) –
- Return type
None
- masterful.core.EnsembleReport.multiplier
The number of models trained for the ensemble.
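The `<factory>` default in the two report signatures, together with the `None` return type listed under their parameters, is how Python's dataclasses module renders a field declared with `field(default_factory=...)` and the generated `__init__`. That suggests, as an assumption this reference does not confirm, that FitReport and EnsembleReport are dataclasses. The hypothetical FitReportSketch and EnsembleReportSketch below are not the library's classes; they only mirror the documented fields to show where those markers come from.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FitReportSketch:
    """Hypothetical stand-in mirroring the documented FitReport fields."""

    model: Optional[Any] = None  # the trained Keras model in the real class
    # default_factory gives every report its own dict; inspect shows it as <factory>.
    validation_metrics: Dict[str, float] = field(default_factory=dict)
    history: Optional[Any] = None  # a keras.callbacks.History in the real class


@dataclass
class EnsembleReportSketch(FitReportSketch):
    """Adds the ensemble size on top of the FitReport fields."""

    multiplier: Optional[int] = None  # number of child models trained


sig = inspect.signature(EnsembleReportSketch)
# str(sig) includes "validation_metrics: Dict[str, float] = <factory>"
```

Using a default factory instead of a shared `{}` default means that metrics written into one report never leak into another.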
masterful.core.find_fit_policy
- masterful.core.find_fit_policy(model, model_spec, training_data, validation_data, unlabeled_data, synthetic_data, data_spec, **kwargs)
Finds an optimal policy for use with masterful.core.fit().
- Parameters
model (keras.engine.training.Model) – The model to analyze.
model_spec (masterful.spec.ModelSpec) – The specification for the model.
training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
unlabeled_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – A tuple or list of unlabeled datasets which can be used to improve the training of the model through semi-supervised and unsupervised techniques. Pass an empty sequence or None if no unlabeled data.
synthetic_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – A tuple or list of labeled, synthetic data that can be used to improve the performance of the model. Pass an empty sequence or None if no synthetic data.
data_spec (masterful.spec.DataSpec) – A spec that describes the datasets.
- Returns
An instance of masterful.FitPolicy that can be used with masterful.core.fit().
masterful.core.find_standard_loss
- masterful.core.find_standard_loss(model_spec, data_spec)
Finds a prototypical loss policy given a masterful.ModelSpec and a masterful.DataSpec.
- Parameters
model_spec (masterful.spec.ModelSpec) – Specification of the model to be trained.
data_spec (masterful.spec.DataSpec) – Specification of the data used in training.
- Returns
A loss policy for use in training. A loss policy is a dictionary with the following keys:
- loss_types:
tf.keras.losses.Loss. The types of loss functions to use in training.
- loss_configs:
Loss-specific configuration. This is passed to tf.keras.losses.Loss.from_config(loss_config).
- metrics_types:
tf.keras.metrics.Metric. The types of metrics associated with the above losses, to be reported during training.
- metrics_configs:
Metric-specific configuration. This is passed as kwargs to the metric type's initializer.
- Return type
Dict
masterful.core.find_batch_size
- masterful.core.find_batch_size(model, model_spec, sample_data, data_spec)
Finds the optimal batch size.
The algorithm is an exponential binary search, so the number of trial batch sizes grows only logarithmically with the result: $O(\log_2(\text{batchsize}))$.
- Parameters
model (keras.engine.training.Model) – The model to be used in training.
model_spec (masterful.spec.ModelSpec) – Specification of the model to be trained.
sample_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – Sample data used in finding the batch size. Does not have to be actual training data, but must be correctly typed and ranged to allow the optimizer to backprop.
data_spec (masterful.spec.DataSpec) – Specification of the sample data.
- Returns
An estimate of the best batch size to use.
- Return type
int
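The reference names exponential binary search but not the trial that decides whether a batch size works. The sketch below assumes a hypothetical predicate fits(batch_size), for example one training step that either succeeds or runs out of memory, and returns the largest batch size that passes together with the number of trials. find_batch_size() itself returns an estimate of the best batch size and may apply further criteria; exponential_search_max is not part of Masterful.

```python
from typing import Callable, Tuple


def exponential_search_max(
    fits: Callable[[int], bool], upper_limit: int = 1 << 16
) -> Tuple[int, int]:
    """Return (largest batch size that fits, number of trials run).

    Assumes `fits` is monotone: if a batch size fits, every smaller one does too.
    """
    trials = 0

    def probe(batch_size: int) -> bool:
        nonlocal trials
        trials += 1
        return fits(batch_size)

    if not probe(1):
        return 0, trials  # not even a single example fits
    # Phase 1: double the candidate until it fails or passes upper_limit.
    good, bad = 1, 2
    while bad <= upper_limit and probe(bad):
        good, bad = bad, bad * 2
    bad = min(bad, upper_limit + 1)  # exclusive upper bound for phase 2
    # Phase 2: binary search strictly between the last success and first failure.
    while bad - good > 1:
        mid = (good + bad) // 2
        if probe(mid):
            good = mid
        else:
            bad = mid
    return good, trials


# Hypothetical predicate: pretend every batch size up to 300 fits in memory.
result = exponential_search_max(lambda b: b <= 300)
# → (300, 18)
```

The doubling phase takes about log2(b) trials to bracket the answer b, and the binary search takes about log2(b) more, which is where the logarithmic bound comes from.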
masterful.core.find_max_learning_rate
- masterful.core.find_max_learning_rate(batch_size, optimizer_type)
Finds the optimal maximum learning rate.
- Parameters
batch_size (int) – The batch size used in training.
optimizer_type (type) – The type of optimizer used in training.
- Returns
An estimate of the optimal maximum learning rate.
- Return type
float
masterful.core.find_optimizer_policy
- masterful.core.find_optimizer_policy(model, model_spec, training_data, validation_data, data_spec, batch_size, **kwargs)
Finds an optimizer policy that approximates the ideal policy.
- Returns
A dictionary with the following keys:
- optimizer_type:
tf.keras.optimizers.Optimizer. The type of optimizer to use. See note on learning rates.
- optimizer_config:
Optimizer-specific configuration. This is passed to tf.keras.optimizers.Optimizer.from_config(optimizer_config). See note on learning rates.
- learning_rate_callback_type:
The type of a callback used to control the learning rate. Its configuration is held in learning_rate_callback_config.
- learning_rate_callback_config:
This dictionary is passed as keyword args to learning_rate_callback_type. See note on learning rates.
- learning_rate_schedule:
A callable that matches the signature f(step)->lr. See note on learning rates.
- epochs_callback_type:
The type of an early-stopping callback used to control when training stops.
- epochs_callback_config:
This dictionary is passed as keyword args to epochs_callback_type.
- epochs:
A fixed number of epochs to run. Either this value or both epochs_callback_type and epochs_callback_config must be set.
- warmup_initial_lr:
The initial learning rate to start warming up a model.
- warmup_final_lr:
The final learning rate to finish warming up a model.
- warmup_steps:
The number of steps to warm up a model.
- Return type
Dict
- Parameters
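model (keras.engine.training.Model) – The model to be trained.
model_spec (masterful.spec.ModelSpec) – Specification of the model to be trained.
training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model.
validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for validating the model.
data_spec (masterful.spec.DataSpec) – Specification of the datasets.
batch_size (int) – The batch size used in training.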
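The optimizer policy describes warmup with warmup_initial_lr, warmup_final_lr, and warmup_steps, and describes learning_rate_schedule as a callable f(step)->lr. The reference does not say what shape the warmup takes or what happens after it, so the sketch below assumes a linear ramp followed by a constant learning rate. make_warmup_schedule is a hypothetical helper, not part of Masterful.

```python
from typing import Callable


def make_warmup_schedule(
    warmup_initial_lr: float, warmup_final_lr: float, warmup_steps: int
) -> Callable[[int], float]:
    """Build f(step) -> lr that ramps linearly from initial to final LR, then holds."""

    def schedule(step: int) -> float:
        if warmup_steps <= 0 or step >= warmup_steps:
            return warmup_final_lr  # warmup finished (or disabled)
        fraction = step / warmup_steps  # in [0, 1) during warmup
        return warmup_initial_lr + fraction * (warmup_final_lr - warmup_initial_lr)

    return schedule


lr = make_warmup_schedule(warmup_initial_lr=1e-4, warmup_final_lr=1e-2, warmup_steps=100)
# lr(0) is 1e-4, lr(50) is halfway between the two rates, lr(100) onward is 1e-2
```

Returning a closure keeps the warmup parameters out of the call signature, so the result matches the f(step)->lr shape that learning_rate_schedule describes.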
masterful.core.find_augmentation_policy
- masterful.core.find_augmentation_policy(model, model_spec, training_data, validation_data, synthetic_data, data_spec, batch_size, optimizer_type, optimizer_config, loss, loss_weights, warmup_initial_lr, warmup_final_lr, warmup_steps, **kwargs)
Analyzes the model and provided datasets, and returns the optimal augmentation policy for training the model on those datasets.
- Parameters
model (keras.engine.training.Model) – The model to analyze.
model_spec (masterful.spec.ModelSpec) – Specification of the model being trained.
training_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for training the model. Labeled data must be unbatched, and use the Keras formulation of (features, targets) for each sample.
validation_data (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled data to use for validating the model. Validation data must be unbatched, and use the Keras formulation of (features, targets) for each element of data.
synthetic_data (Optional[Sequence[tensorflow.python.data.ops.dataset_ops.DatasetV2]]) – A tuple or list of labeled, synthetic data that can be used to improve the performance of the model.
data_spec (masterful.spec.DataSpec) – Specification of the datasets to use for analysis.
batch_size (int) – Batch size to use in finding the policy.
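optimizer_type – The type of optimizer used in training, such as the optimizer_type returned by find_optimizer_policy().
optimizer_config – Optimizer-specific configuration, such as the optimizer_config returned by find_optimizer_policy().
loss – tbd
loss_weights – tbd
warmup_initial_lr (float) – The initial learning rate to start warming up the model.
warmup_final_lr (float) – The final learning rate to finish warming up the model.
warmup_steps (int) – The number of steps to warm up the model.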
- Returns
The optimal augmentation policy for training the model with the provided datasets. An augmentation policy is a dictionary with the following keys (for more information on each key, see the corresponding key under masterful.FitPolicy):
- mirror
- rot90
- rotate
- hsv
- contrast
- blur
- spatial
- hsv_clustering
- contrast_clustering
- blur_clustering
- spatial_clustering
- mixup
- cutmix
- synthetic_proportion
- Return type
Dict