API Reference: SSL

masterful.ssl.SemiSupervisedParams

class masterful.ssl.SemiSupervisedParams(algorithms=<factory>)

Parameters which control the semi-supervised learning aspects of Masterful training.

In this context, semi-supervised learning incorporates self-training, self-supervised learning, and traditional semi-supervised learning (any learning that combines labeled and unlabeled data).

Parameters

algorithms (Optional[Sequence[str]]) – An optional list of semi-supervised learning algorithms to use during training. Can be any combination of [“noisy_student”, “barlow_twins”]. Defaults to [“noisy_student”].

Return type

None
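
As a rough illustration of the signature above, the sketch below mirrors the documented default with a plain Python dataclass. SemiSupervisedParamsSketch and KNOWN_ALGORITHMS are hypothetical names used only for this example, and the validation step is an assumption of the sketch rather than documented behavior of masterful.ssl.SemiSupervisedParams.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence

# Algorithm names listed in the parameter description above.
KNOWN_ALGORITHMS = ("noisy_student", "barlow_twins")


@dataclass
class SemiSupervisedParamsSketch:
    """Hypothetical stand-in for masterful.ssl.SemiSupervisedParams."""

    # A default_factory gives every instance its own list, which is what
    # the `<factory>` default in the documented signature indicates.
    algorithms: Optional[Sequence[str]] = field(
        default_factory=lambda: ["noisy_student"]
    )

    def __post_init__(self) -> None:
        # Assumption of this sketch: reject names the documentation does not list.
        for name in self.algorithms or []:
            if name not in KNOWN_ALGORITHMS:
                raise ValueError(f"Unknown algorithm: {name!r}")


# Default construction uses only noisy_student.
default_params = SemiSupervisedParamsSketch()

# Both documented algorithms can be combined.
both = SemiSupervisedParamsSketch(algorithms=["noisy_student", "barlow_twins"])
```

With the real class, the equivalent calls would presumably be masterful.ssl.SemiSupervisedParams() and masterful.ssl.SemiSupervisedParams(algorithms=[...]), following the signature documented above.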

masterful.ssl.learn_ssl_params

masterful.ssl.learn_ssl_params(training_dataset, training_dataset_params, unlabeled_datasets=None, synthetic_datasets=None)

Learns the optimal set of semi-supervised learning parameters to use during training.

Parameters
  • training_dataset (tensorflow.python.data.ops.dataset_ops.DatasetV2) – The labeled dataset to use during training.

  • training_dataset_params (masterful.data.DataParams) – The parameters of the labeled dataset.

  • unlabeled_datasets (Optional[Sequence[Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, masterful.data.DataParams]]]) – Optional sequence of unlabeled datasets and their parameters to use during training. If an unlabeled dataset is specified, a set of algorithms must also be specified in ssl_params; otherwise the unlabeled data has no effect.

  • synthetic_datasets (Optional[Sequence[Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, masterful.data.DataParams]]]) – Optional sequence of synthetic data and parameters to use during training. The amount of synthetic data used during training is controlled by masterful.regularization.RegularizationParams.synthetic_proportion.

Return type

masterful.ssl.params.SemiSupervisedParams

masterful.ssl.analyze_data_then_save_to

masterful.ssl.analyze_data_then_save_to(*args, **kwargs)

Analyze labeled and unlabeled data then save intermediate results to disk.

Please see the Simple Semi-Supervised Learning Recipe for more details.

Parameters
  • model

    A trained model. The output must be probabilities, in other words, your model’s final layer should be a softmax or sigmoid activation.

    If your model ends with a tf.keras.layers.Dense layer without an activation, it is said to be ‘outputting logits’. In that case, you will typically use a loss function initialized with from_logits=True. If this describes your model, attach an extra sigmoid or softmax activation to it and pass the new model into this function. You do not need to change your original model, loss function, or training loop. For example, the model below outputs logits:

    ```
    import tensorflow as tf

    m = tf.keras.Sequential([tf.keras.Input((32,32,3)),
                             tf.keras.layers.Dense(10)])
    ```

    To use this model, attach a softmax activation:

    ```
    import masterful

    # Wrap the logits model so that its output is probabilities.
    activated_model = tf.keras.Sequential([m, tf.keras.layers.Softmax()])
    masterful.ssl.analyze_data_then_save_to(activated_model, ...)
    ```

  • architecture_params – Parameters about the model architecture.

  • labeled_training_data – Labeled training data as a tf.data.Dataset. The data should be batched. Each example should have the following structure: (original_images, original_labels).

  • labeled_training_data_params – Params that describe the labeled training data.

  • unlabeled_training_data – Unlabeled training data as a tf.data.Dataset. The data should be batched. Each example should be a tensor of images.

  • unlabeled_training_data_params – Params that describe the unlabeled training data.

  • path – The filepath to save to.

Raises
  • ValueError – If the path is empty or malformed.
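
To make the distinction between logits and probabilities in the model parameter concrete, here is a minimal pure-Python sketch of the softmax computation. The softmax helper and the sample values are illustrative only; tf.keras.layers.Softmax applies the same transformation to tensors.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability and does not
    # change the result.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Raw scores such as a Dense layer without an activation might produce.
logits = [2.0, 1.0, 0.1]

# Probabilities: each value lies in (0, 1), and the largest logit keeps
# the largest probability.
probs = softmax(logits)
```

Because softmax preserves the ordering of the scores, wrapping a logits model in this way changes the scale of the outputs but not the predicted class.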

masterful.ssl.load_from

masterful.ssl.load_from(path, unlabeled_weight=1.0)

Load data from disk into a tf.data.Dataset.

Please see the Simple Semi-Supervised Learning Recipe for more details.

Parameters
  • path (str) – The location on disk to load from.

  • unlabeled_weight (Optional[float]) – The weight applied to examples from the unlabeled data. Defaults to 1.0.

Returns

A dataset ready for training. The dataset is unbatched. If unlabeled_weight is specified and is not 1.0, the dataset elements are (image, label, weight), where weight is 1.0 for labeled data and unlabeled_weight for unlabeled data. Otherwise, the dataset elements are (image, label).

Return type

tf.data.Dataset

Raises
  • ValueError – If the path is empty or malformed.

  • FileNotFound – If the path does not point to a valid file on disk.
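
The element structure described under Returns can be sketched in plain Python. This is an illustration of the documented rule, not the library's implementation: merge_examples is a hypothetical helper, strings stand in for image tensors, the labeled-then-unlabeled ordering is arbitrary, and the sketch assumes the unlabeled examples already carry a label by the time they are loaded.

```python
from itertools import chain
from typing import Iterable, Iterator, Optional, Tuple

Example = Tuple[str, int]  # (image, label); a string stands in for an image tensor


def merge_examples(
    labeled: Iterable[Example],
    unlabeled: Iterable[Example],
    unlabeled_weight: Optional[float] = 1.0,
) -> Iterator[tuple]:
    """Yield elements shaped as the Returns section describes."""
    # With the default weight, elements stay as (image, label).
    if unlabeled_weight is None or unlabeled_weight == 1.0:
        yield from chain(labeled, unlabeled)
        return
    # Otherwise every element gains a weight: 1.0 for labeled data and
    # unlabeled_weight for unlabeled data.
    for image, label in labeled:
        yield (image, label, 1.0)
    for image, label in unlabeled:
        yield (image, label, unlabeled_weight)


labeled = [("img_a", 0)]
unlabeled = [("img_b", 1)]

plain = list(merge_examples(labeled, unlabeled))
weighted = list(merge_examples(labeled, unlabeled, unlabeled_weight=0.5))
```

Training code that consumes the result therefore needs to handle both two-element and three-element tuples, depending on the unlabeled_weight that was passed.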