src.actions.actionWrapper module

Action Wrapper Module

This module provides high-level wrapper functions and utilities for complex data processing workflows within the SNN2 neural network framework. It contains specialized actions for:

  • Dataset separation based on quality thresholds (good/bad/gray classification)

  • Duration-based experiment grouping and analysis

  • Feature statistics computation (mean, std, min, max)

  • Data normalization operations (z-score and min-max)

  • Triplet generation for metric learning (including gray zone handling)

  • Random sampling and dataset balancing utilities

  • Action selection and dynamic function dispatch

The module serves as a high-level interface for combining multiple basic actions into complex data processing pipelines, particularly for network performance analysis and anomalous behavior detection.

Functions

check_df_not_none

Utility function to validate DataFrame inputs.

GoodBadGraySeparation

Separate experiments into good, bad, and gray categories based on performance thresholds.

durationSeparation

Group experiments by duration for comparative analysis.

featureMean, featureStd, featureMax, featureMin

Compute statistical measures across feature dimensions.

normalize

Apply z-score normalization using provided mean and standard deviation.

normalizeMinMax

Apply min-max normalization to dataset tuples.

generateTripletsNG

Generate triplets from anchor data excluding difficult samples.

generateTriplets

Generate triplets for metric learning with class balancing.

generatePredictionTriplets

Create prediction triplets from separate anchor, positive, and negative sets.

GenerateGrayTriplets

Generate triplets specifically handling gray zone samples.

print_data

Utility function for debugging data inspection.

action_selector

Dynamic action selection and execution.

GoodBad_randomGray

Create random gray samples from good and bad datasets.

get_randomGray

Extract random gray samples with specified portions (incomplete implementation).

Notes

All functions in this module use the @action decorator for consistent tracking within the SNN2 framework. Many functions are specifically designed for network performance analysis with delay and packet drop rate thresholds.

The module includes sophisticated triplet generation algorithms for metric learning, with special handling for difficult/gray zone samples that don’t clearly belong to either good or bad categories.

Examples

Basic dataset separation workflow:

>>> # Separate experiments based on performance thresholds
>>> separated_df = GoodBadGraySeparation(
...     df, delay_lower=20.0, delay_upper=50.0,
...     drop_lower=0.0, drop_upper=0.7
... )
>>>
>>> # Generate triplets for metric learning
>>> anchor_data = DataManager()
>>> triplets = generateTripletsNG(anchor_data)
>>>
>>> # Apply normalization
>>> normalized_data = normalize(data, mean_tensor, std_tensor)

See also

SNN2.src.actions.separation

Basic separation utilities

SNN2.src.core.data.DataManager

DataManager class for data handling

SNN2.src.decorators.decorators

Action decorator implementation

src.actions.actionWrapper.GenerateGrayTriplets(anchor: ~SNN2.src.decorators.decorators.c_logger.<locals>.augmented_cls, positive: ~SNN2.src.decorators.decorators.c_logger.<locals>.augmented_cls, negative: ~SNN2.src.decorators.decorators.c_logger.<locals>.augmented_cls, keep_all: bool = False, *, logger=None, write_msg=<function f_logger.<locals>.__dummy_log>, **kwargs) Dict[str, Dict[str, Any]]

Generate gray triplets with comprehensive statistics and metadata.

This function creates triplets for gray (uncertain) samples while collecting detailed statistics about the triplet generation process. It’s specifically designed for handling uncertain or ambiguous data points in metric learning.

Parameters:
  • anchor (DataManager) – DataManager containing anchor samples (typically gray/uncertain samples).

  • positive (DataManager) – DataManager containing positive samples (same class as anchors).

  • negative (DataManager) – DataManager containing negative samples (different class from anchors).

  • keep_all (bool, default=False) – Whether to preserve all intermediate data and statistics.

  • **kwargs (dict) – Additional keyword arguments, including logger (a logger instance for recording operations) and write_msg (a function for writing log messages).

Returns:

Nested dictionary containing:

  • Triplet data organized by categories

  • Statistics about triplet generation

  • Metadata about data shapes and distributions

  • Debug information and logs

Return type:

Dict[str, Dict[str, Any]]

Notes

This function is specialized for gray sample processing and includes:

  1. Enhanced logging and debugging capabilities

  2. Detailed statistics collection during triplet creation

  3. Shape validation and dimension checking

  4. Flexible data preservation options

The gray triplet generation process:

  • Creates balanced triplets using uncertain samples as anchors

  • Maintains proper positive/negative relationships

  • Collects comprehensive metadata for analysis

  • Supports various data retention policies

Examples

>>> # Generate gray triplets with full statistics
>>> gray_triplets = GenerateGrayTriplets(
...     anchor=gray_samples,
...     positive=good_samples,
...     negative=bad_samples,
...     keep_all=True
... )
>>> print(f"Generated {len(gray_triplets['data'])} gray triplet sets")
src.actions.actionWrapper.GoodBadGraySeparation(df: DataFrame = None, delay_lower: float = 20.0, delay_upper: float = 50.0, drop_lower: float = 0.0, drop_upper: float = 0.7) DataFrame

Separate experiments into good, bad, and gray categories based on performance thresholds.

This function classifies network experiments based on delay and packet drop rate performance metrics, creating three categories: good (acceptable performance), bad (poor performance), and gray (uncertain/intermediate performance).

Parameters:
  • df (pd.DataFrame, optional) – Input DataFrame containing experiment data with columns ‘exp_id’ (experiment identifier), ‘problem’ (problem type: ‘good’, ‘delay’, or ‘drop’), and ‘value’ (performance metric value).

  • delay_lower (float, default=20.0) – Lower threshold for delay classification. Values below this are considered good.

  • delay_upper (float, default=50.0) – Upper threshold for delay classification. Values above this are considered bad.

  • drop_lower (float, default=0.0) – Lower threshold for drop rate classification. Values below this are considered good.

  • drop_upper (float, default=0.7) – Upper threshold for drop rate classification. Values above this are considered bad.

Returns:

DataFrame with an added ‘Dataset’ column indicating the classification:

  • ‘good’: Good performance experiments

  • ‘bad’: Poor performance experiments

  • ‘gray’: Intermediate/uncertain performance experiments

Return type:

pd.DataFrame

Notes

Classification logic:

  • Good: problem=’good’ OR (problem=’delay’ AND value < delay_lower) OR (problem=’drop’ AND value < drop_lower)

  • Bad: (problem=’delay’ AND value > delay_upper) OR (problem=’drop’ AND value > drop_upper)

  • Gray: experiments with values between the lower and upper thresholds

The function assumes experiments with problem=’good’ are inherently good regardless of other metrics.
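
A minimal pandas sketch of the thresholding described above (illustrative only; it assumes the column names from the parameter description and the threshold arguments, and may differ from the actual implementation):

>>> import numpy as np
>>> good = ((df['problem'] == 'good')
...         | ((df['problem'] == 'delay') & (df['value'] < delay_lower))
...         | ((df['problem'] == 'drop') & (df['value'] < drop_lower)))
>>> bad = (((df['problem'] == 'delay') & (df['value'] > delay_upper))
...        | ((df['problem'] == 'drop') & (df['value'] > drop_upper)))
>>> df['Dataset'] = np.select([good, bad], ['good', 'bad'], default='gray')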

Examples

>>> # Classify experiments with custom thresholds
>>> classified_df = GoodBadGraySeparation(
...     df, delay_lower=15.0, delay_upper=60.0,
...     drop_lower=0.05, drop_upper=0.8
... )
>>> print(classified_df['Dataset'].value_counts())
src.actions.actionWrapper.GoodBad_randomGray(df: DataFrame | None = None, exp_column: str = 'op_id', wdw_column: str = 'window', sep_column: str = 'anomaly_window', new_column: str = 'Dataset', gray_portions: List[float] = [0.1, 0.1]) DataFrame

Create random gray samples from good and bad datasets.

This function separates data into good and bad categories based on an anomaly flag, then randomly selects portions from both categories to create a gray/uncertain category. This is useful for creating balanced datasets with uncertain samples.

Parameters:
  • df (Optional[pd.DataFrame], optional) – Input DataFrame containing experiment data.

  • exp_column (str, default="op_id") – Column name containing experiment/operation identifiers.

  • wdw_column (str, default="window") – Column name containing window identifiers.

  • sep_column (str, default="anomaly_window") – Column name containing binary anomaly flags (0=good, 1=bad).

  • new_column (str, default="Dataset") – Name of the new column to store dataset classifications.

  • gray_portions (List[float], default=[0.1, 0.1]) – Portions of good and bad samples to convert to gray category. [good_portion, bad_portion] where each value is between 0 and 1.

Returns:

DataFrame with added classification column containing ‘good’, ‘bad’, or ‘gray’ labels.

Return type:

pd.DataFrame

Notes

The function operates at the window level, selecting entire windows rather than individual samples. This ensures temporal consistency within windows.

Gray samples are created by:

  1. Identifying unique (experiment, window) combinations

  2. Randomly selecting specified portions from the good and bad categories

  3. Relabeling the selected windows as ‘gray’

The gray_portions parameter currently uses only the first value for both good and bad portions.
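
A minimal pandas sketch of the window-level relabeling described above, assuming the default column names and an existing ‘Dataset’ column holding the good/bad labels (illustrative, not the actual implementation):

>>> import pandas as pd
>>> good_wdw = df.loc[df['anomaly_window'] == 0, ['op_id', 'window']].drop_duplicates()
>>> gray_wdw = good_wdw.sample(frac=gray_portions[0], random_state=0)
>>> is_gray = df.set_index(['op_id', 'window']).index.isin(
...     pd.MultiIndex.from_frame(gray_wdw))
>>> df.loc[is_gray, 'Dataset'] = 'gray'  # whole windows are relabeled, not single rows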

Examples

>>> # Create gray samples using 15% from each category
>>> balanced_df = GoodBad_randomGray(
...     df, gray_portions=[0.15, 0.15],
...     exp_column='experiment_id',
...     sep_column='is_anomaly'
... )
>>> print(balanced_df['Dataset'].value_counts())
src.actions.actionWrapper.action_selector(obj, *args, **kwargs)

Dynamic action selection and execution.

This function provides dynamic dispatch for registered actions, allowing runtime selection and execution of action functions based on string identifiers.

Parameters:
  • obj (str) – String identifier for the action to execute. Must be registered in the global actions dictionary.

  • *args (tuple) – Positional arguments to pass to the selected action function.

  • **kwargs (dict) – Keyword arguments to pass to the selected action function.

Returns:

Return value from the executed action function.

Return type:

Any

Raises:

ValueError – If the specified action is not found in the actions registry.

Notes

This function relies on the global ‘actions’ dictionary that contains mappings from string identifiers to action functions. Actions must be registered using the @action decorator to be available.
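
The registry-dispatch pattern described here can be illustrated with a self-contained stand-in (a generic sketch, not the actual SNN2 decorator or registry):

>>> actions = {}                        # stand-in for the module's global registry
>>> def action(fn):
...     actions[fn.__name__] = fn       # register the function under its name
...     return fn
>>> @action
... def double(x):
...     return 2 * x
>>> def select(name, *args, **kwargs):  # simplified selector, mirrors the behaviour above
...     if name not in actions:
...         raise ValueError(f"Unknown action: {name}")
...     return actions[name](*args, **kwargs)
>>> select('double', 21)
42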

Examples

>>> # Execute action by name
>>> result = action_selector('normalize', data, mean, std)
>>> # Equivalent to: result = normalize(data, mean, std)
src.actions.actionWrapper.check_df_not_none(df: DataFrame) None

Validate that a DataFrame input is not None.

This utility function provides consistent DataFrame validation across the module, ensuring that functions receive valid DataFrame inputs before processing.

Parameters:

df (pd.DataFrame) – DataFrame to validate; the check fails if it is None.

Returns:

Function performs validation only, no return value.

Return type:

None

Raises:

AssertionError – If the input DataFrame is None.

Examples

>>> check_df_not_none(my_dataframe)  # Passes if df is valid
>>> check_df_not_none(None)  # Raises AssertionError
src.actions.actionWrapper.durationSeparation(df: DataFrame = None) List[DataFrame]

Group experiments by duration for comparative analysis.

This function separates experiments into groups based on their duration, enabling analysis of experiments with similar temporal characteristics. Duration is determined by the maximum ‘second’ value for each experiment.

Parameters:

df (pd.DataFrame, optional) – Input DataFrame containing experiment data with columns ‘exp_id’ (experiment identifier) and ‘second’ (time progression within each experiment).

Returns:

List of DataFrames, each containing experiments with the same duration. The number of DataFrames equals the number of unique durations found.

Return type:

List[pd.DataFrame]

Notes

The function groups experiments by their maximum ‘second’ value, which represents the duration of each experiment. This is useful for:

  • Comparing experiments with similar temporal scope

  • Analyzing duration-dependent patterns

  • Ensuring fair comparisons between experiments

The grouping preserves all original data while organizing it by duration.
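
A minimal pandas sketch of the grouping described above (assuming the ‘exp_id’ and ‘second’ columns; the actual implementation may differ):

>>> durations = df.groupby('exp_id')['second'].max()   # duration per experiment
>>> groups = [df[df['exp_id'].isin(durations[durations == d].index)]
...           for d in sorted(durations.unique())]
>>> # 'groups' holds one DataFrame per distinct duration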

Examples

>>> # Separate experiments by duration
>>> duration_groups = durationSeparation(experiment_df)
>>> print(f"Found {len(duration_groups)} different durations")
>>> for i, group in enumerate(duration_groups):
...     duration = group.groupby('exp_id')['second'].max().iloc[0]
...     print(f"Group {i}: {len(group['exp_id'].unique())} experiments of {duration}s")
src.actions.actionWrapper.featureMax(data: Tensor) Tensor

Compute maximum values across all samples for each feature dimension.

This function flattens the input tensor and computes the maximum across all samples while preserving the feature dimension structure.

Parameters:

data (tf.Tensor) – Input tensor with shape (…, n_features) where the last dimension represents feature channels.

Returns:

1D tensor of shape (n_features,) containing maximum values for each feature.

Return type:

tf.Tensor

Examples

>>> data = tf.random.normal([100, 10, 5])  # 100 samples, 10 timesteps, 5 features
>>> maxs = featureMax(data)
>>> print(maxs.shape)  # (5,)
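
The flatten-then-reduce behaviour described above is equivalent to the following sketch (an equivalent computation, not necessarily the actual implementation):

>>> flat = tf.reshape(data, [-1, data.shape[-1]])  # collapse sample/timestep dims
>>> maxs_equivalent = tf.reduce_max(flat, axis=0)  # per-feature maximum, shape (5,)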
src.actions.actionWrapper.featureMean(data: Tensor) Tensor

Compute mean values across all samples for each feature dimension.

This function flattens the input tensor and computes the mean across all samples while preserving the feature dimension structure.

Parameters:

data (tf.Tensor) – Input tensor with shape (…, n_features) where the last dimension represents feature channels.

Returns:

1D tensor of shape (n_features,) containing mean values for each feature.

Return type:

tf.Tensor

Examples

>>> data = tf.random.normal([100, 10, 5])  # 100 samples, 10 timesteps, 5 features
>>> means = featureMean(data)
>>> print(means.shape)  # (5,)
src.actions.actionWrapper.featureMin(data: Tensor) Tensor

Compute minimum values across all samples for each feature dimension.

This function flattens the input tensor and computes the minimum across all samples while preserving the feature dimension structure.

Parameters:

data (tf.Tensor) – Input tensor with shape (…, n_features) where the last dimension represents feature channels.

Returns:

1D tensor of shape (n_features,) containing minimum values for each feature.

Return type:

tf.Tensor

Examples

>>> data = tf.random.normal([100, 10, 5])  # 100 samples, 10 timesteps, 5 features
>>> mins = featureMin(data)
>>> print(mins.shape)  # (5,)
src.actions.actionWrapper.featureStd(data: Tensor) Tensor

Compute standard deviation across all samples for each feature dimension.

This function flattens the input tensor and computes the standard deviation across all samples while preserving the feature dimension structure.

Parameters:

data (tf.Tensor) – Input tensor with shape (…, n_features) where the last dimension represents feature channels.

Returns:

1D tensor of shape (n_features,) containing standard deviation values for each feature.

Return type:

tf.Tensor

Examples

>>> data = tf.random.normal([100, 10, 5])  # 100 samples, 10 timesteps, 5 features
>>> stds = featureStd(data)
>>> print(stds.shape)  # (5,)
src.actions.actionWrapper.generatePredictionTriplets(anchor: ~SNN2.src.decorators.decorators.c_logger.<locals>.augmented_cls, positive: ~SNN2.src.decorators.decorators.c_logger.<locals>.augmented_cls, negative: ~SNN2.src.decorators.decorators.c_logger.<locals>.augmented_cls, keep_tf_dft: bool = False, keep_anchor_wdw: bool = False, keep_all_wdw: bool = False, *, logger=None, write_msg=<function f_logger.<locals>.__dummy_log>, **kwargs) augmented_cls

Generate triplets for prediction tasks using DataManager objects.

This function creates triplets (anchor, positive, negative) from separate DataManager objects for training neural networks in a metric learning setup. It handles data alignment and repetition to ensure proper triplet formation.

Parameters:
  • anchor (DataManager) – DataManager containing anchor samples.

  • positive (DataManager) – DataManager containing positive samples (same class as anchors).

  • negative (DataManager) – DataManager containing negative samples (different class from anchors).

  • keep_tf_dft (bool, default=False) – Whether to preserve TensorFlow dataset format in output.

  • keep_anchor_wdw (bool, default=False) – Whether to preserve anchor window information.

  • keep_all_wdw (bool, default=False) – Whether to preserve window information for all samples.

  • **kwargs (dict) – Additional keyword arguments, including logger (a logger instance for recording operations) and write_msg (a function for writing log messages).

Returns:

DataManager object containing generated triplets with properly aligned anchor, positive, and negative samples.

Return type:

DataManager

Notes

The function performs the following operations:

  1. Extracts indices from each DataManager’s window data

  2. Calculates repetition factors to align different-sized datasets

  3. Repeats and tiles indices to create balanced triplets

  4. Combines data from all three sources into a unified triplet structure

The repetition logic ensures that:

  • Each anchor has corresponding positive and negative examples

  • Smaller datasets are repeated to match larger ones

  • All triplets maintain proper class relationships
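
The repeat/tile alignment can be sketched on small index tensors (hypothetical sizes; illustrative only, not the actual implementation):

>>> import tensorflow as tf
>>> a_idx = tf.range(3)              # anchor indices
>>> n_idx = tf.range(2)              # negative indices
>>> a_rep = tf.repeat(a_idx, 2)      # repeat each anchor once per negative: [0 0 1 1 2 2]
>>> n_til = tf.tile(n_idx, [3])      # tile negatives across anchors:        [0 1 0 1 0 1]
>>> # every anchor index is now paired with every negative index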

Examples

>>> # Generate triplets from separated datasets
>>> triplet_data = generatePredictionTriplets(
...     anchor=good_data,
...     positive=good_validation,
...     negative=bad_data,
...     keep_tf_dft=True
... )
>>> print(f"Generated triplets: {triplet_data.dft('Windows').shape[0]}")
src.actions.actionWrapper.generatePredictionTripletsReverse(anchor: Tuple[DatasetV2, DatasetV2], positive: Tuple[DatasetV2, DatasetV2], negative: Tuple[DatasetV2, DatasetV2]) Tuple[Tuple[DatasetV2, DatasetV2], ...]

Generate triplets in reverse format using TensorFlow datasets.

This function creates triplets from TensorFlow dataset tuples, processing data in a reverse manner compared to the standard triplet generation. It handles tensor conversion and index alignment for metric learning.

Parameters:
  • anchor (Tuple[tf.data.Dataset, tf.data.Dataset]) – Tuple containing (features, targets) datasets for anchor samples.

  • positive (Tuple[tf.data.Dataset, tf.data.Dataset]) – Tuple containing (features, targets) datasets for positive samples.

  • negative (Tuple[tf.data.Dataset, tf.data.Dataset]) – Tuple containing (features, targets) datasets for negative samples.

Returns:

Nested tuple structure containing triplet datasets with properly aligned anchor, positive, and negative samples.

Return type:

Tuple[Tuple[tf.data.Dataset, tf.data.Dataset], …]

Notes

The function performs the following operations:

  1. Converts TensorFlow datasets to tensors for processing

  2. Extracts window features and target labels

  3. Creates index ranges for each dataset

  4. Aligns datasets through repetition and tiling

  5. Reconstructs datasets in triplet format

The “reverse” nature refers to the processing order or structure compared to the standard generatePredictionTriplets function.

This function is optimized for TensorFlow dataset operations and maintains computational graph compatibility for training pipelines.
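
Steps 1 and 5 above (dataset-to-tensor conversion and reconstruction) can be sketched for a small in-memory dataset as follows (illustrative; the actual conversion logic may differ):

>>> import tensorflow as tf
>>> feats_ds = tf.data.Dataset.from_tensor_slices(tf.random.normal([4, 5]))
>>> feats = tf.stack(list(feats_ds))                     # dataset -> tensor, shape (4, 5)
>>> rebuilt = tf.data.Dataset.from_tensor_slices(feats)  # tensor -> dataset again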

Examples

>>> # Generate reverse triplets from TF datasets
>>> anchor_data = (features_ds, labels_ds)
>>> positive_data = (pos_features_ds, pos_labels_ds)
>>> negative_data = (neg_features_ds, neg_labels_ds)
>>> triplets = generatePredictionTripletsReverse(
...     anchor_data, positive_data, negative_data
... )
src.actions.actionWrapper.generateTriplets(anchor: Tuple[DatasetV2, DatasetV2, DatasetV2, DatasetV2], log: LogHandler | None = None) Tuple[DatasetV2, DatasetV2, DatasetV2, DatasetV2]

Generate triplets from a unified anchor dataset containing all sample types.

This function creates triplets from a single anchor tuple containing four TensorFlow datasets representing different aspects or views of the data. It’s designed for cases where all sample types are pre-organized in one structure.

Parameters:
  • anchor (Tuple[tf.data.Dataset, tf.data.Dataset, tf.data.Dataset, tf.data.Dataset]) – Tuple containing four TensorFlow datasets: [0] window features, [1] additional features or metadata, [2] labels or classifications, and [3] auxiliary data or indices.

  • log (Optional[LH], optional) – Logger instance for recording operations. If None, uses a dummy logger.

Returns:

Tuple containing four processed TensorFlow datasets with triplet structure:

  • Anchor samples

  • Positive samples

  • Negative samples

  • Associated metadata

Return type:

Tuple[tf.data.Dataset, tf.data.Dataset, tf.data.Dataset, tf.data.Dataset]

Notes

This function differs from other triplet generators by:

  1. Taking a unified input structure with four components

  2. Processing all data through a single anchor reference

  3. Maintaining the four-dataset output structure

  4. Supporting optional logging for debugging

The triplet generation process:

  • Converts datasets to tensors for processing

  • Creates index mappings for efficient access

  • Generates balanced positive/negative pairs

  • Reconstructs datasets in triplet format

Examples

>>> # Generate triplets from unified anchor structure
>>> anchor_tuple = (features_ds, meta_ds, labels_ds, indices_ds)
>>> triplets = generateTriplets(anchor_tuple, log=logger)
>>> anchor_out, pos_out, neg_out, meta_out = triplets
src.actions.actionWrapper.generateTripletsNG(anchor: ~SNN2.src.decorators.decorators.c_logger.<locals>.augmented_cls, *, logger=None, write_msg=<function f_logger.<locals>.__dummy_log>, **kwargs) augmented_cls

Generate triplets from anchor data excluding difficult samples.

This function creates triplet datasets for metric learning by separating anchor data into good and bad samples (excluding difficult/gray samples) and then generating triplets using the GenerateGrayTriplets function.

Parameters:
  • anchor (DataManager) – DataManager instance containing anchor data with ‘Windows’ and ‘Classes’ keys. Classes should be encoded as: 0 (good), 1 (bad), 2 (difficult/gray).

  • **kwargs (dict) – Additional keyword arguments containing logger and write_msg from decorators.

Returns:

DataManager instance containing triplet data generated from good and bad samples.

Return type:

DataManager

Notes

The function filters out samples with class label 2 (difficult) and separates the remaining samples into good (class 0) and bad (class 1) categories. It then uses GenerateGrayTriplets to create the final triplet structure.

This approach ensures that triplets are generated only from clearly classified samples, avoiding ambiguity from difficult cases.
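
A minimal sketch of the class filtering described above, using stand-in tensors (the real windows and class labels live inside the DataManager):

>>> import tensorflow as tf
>>> classes = tf.constant([0, 1, 2, 0, 1, 2])      # 0=good, 1=bad, 2=difficult/gray
>>> windows = tf.random.normal([6, 10, 5])
>>> good = tf.boolean_mask(windows, classes == 0)  # good samples only
>>> bad = tf.boolean_mask(windows, classes == 1)   # bad samples only
>>> # class 2 (difficult/gray) is excluded from both subsets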

Examples

>>> anchor_data = DataManager()
>>> # ... populate anchor_data with Windows and Classes ...
>>> triplets = generateTripletsNG(anchor_data, logger=my_logger)
>>> triplet_dataset = triplets['TripletDst']['TfDataset']
src.actions.actionWrapper.get_randomGray(df: DataFrame | None = None, anomal_clm: str = 'anomalous', wdw_size: int = 120, portion: List[float] = [0.1, 0.1]) DataFrame

Create random gray samples from normal and anomalous data.

This function separates data into normal and anomalous categories, then randomly selects portions from each to create gray (uncertain) samples. It operates at the individual sample level rather than window level.

Parameters:
  • df (Optional[pd.DataFrame], optional) – Input DataFrame containing the data to process.

  • anomal_clm (str, default="anomalous") – Column name containing binary anomaly flags (0=normal, 1=anomalous).

  • wdw_size (int, default=120) – Window size parameter for processing (currently unused in implementation).

  • portion (List[float], default=[0.1, 0.1]) – Portions of normal and anomalous samples to convert to gray category. [normal_portion, anomalous_portion] where each value is between 0 and 1.

Returns:

DataFrame with samples classified as normal, anomalous, or gray, with a new column indicating the classification.

Return type:

pd.DataFrame

Notes

The function performs the following operations:

  1. Separates data based on the anomaly column

  2. Randomly selects specified portions from each category

  3. Creates gray samples from the selected data

  4. Combines all categories into a unified dataset

This differs from GoodBad_randomGray by:

  • Operating at the individual sample level

  • Using different column naming conventions

  • Having a simpler selection mechanism

The gray sample creation helps in:

  • Creating balanced training sets

  • Handling uncertain or borderline cases

  • Improving model robustness to ambiguous data
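
A minimal pandas sketch of the sample-level selection described above, assuming the default column name and the portion argument (illustrative only; the function itself is noted elsewhere as having an incomplete implementation):

>>> import pandas as pd
>>> normal = df[df['anomalous'] == 0]
>>> anomal = df[df['anomalous'] == 1]
>>> gray = pd.concat([normal.sample(frac=portion[0], random_state=0),
...                   anomal.sample(frac=portion[1], random_state=0)])
>>> # rows in 'gray' would then be relabeled; the rest stay normal/anomalous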

Examples

>>> # Create gray samples using 15% from each category
>>> processed_df = get_randomGray(
...     df=network_data,
...     anomal_clm='is_attack',
...     portion=[0.15, 0.15]
... )
>>> print(processed_df['classification'].value_counts())
src.actions.actionWrapper.normalize(data: Tensor, mean: Tensor, std: Tensor) Tensor

Apply z-score normalization using provided mean and standard deviation.

This function normalizes input data by subtracting the mean and dividing by the standard deviation for each feature dimension. The operation is applied element-wise across all samples.

Parameters:
  • data (tf.Tensor) – Input tensor to be normalized with shape (…, n_features).

  • mean (tf.Tensor) – Mean values for each feature dimension with shape (n_features,).

  • std (tf.Tensor) – Standard deviation values for each feature dimension with shape (n_features,).

Returns:

Normalized tensor with the same shape as input data.

Return type:

tf.Tensor

Notes

The function uses tf.map_fn with autograph disabled for efficient element-wise operations. The normalization formula is: (x - mean) / std.

Examples

>>> data = tf.random.normal([100, 5])
>>> mean = featureMean(data)
>>> std = featureStd(data)
>>> normalized = normalize(data, mean, std)
>>> # Verify normalization: mean ≈ 0, std ≈ 1
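
The tf.map_fn application is numerically equivalent to a single broadcasted expression over the trailing feature dimension (a sketch, not the actual implementation):

>>> z_equivalent = (data - mean) / std  # broadcasting gives the same result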
src.actions.actionWrapper.normalizeMinMax(data: Tuple[DatasetV2, DatasetV2], l_max: Tensor, l_min: Tensor) Tuple[DatasetV2, DatasetV2]

Apply min-max normalization to dataset tuples.

This function applies min-max scaling to the first dataset in the tuple (typically the feature dataset) while leaving the second dataset (typically targets) unchanged.

Parameters:
  • data (Tuple[tf.data.Dataset, tf.data.Dataset]) – Tuple containing (features_dataset, targets_dataset).

  • l_max (tf.Tensor) – Maximum values for each feature dimension used for scaling.

  • l_min (tf.Tensor) – Minimum values for each feature dimension used for scaling.

Returns:

Tuple with normalized features dataset and unchanged targets dataset.

Return type:

Tuple[tf.data.Dataset, tf.data.Dataset]

Notes

The normalization formula is: (x - min) / (max - min), which scales values to the range [0, 1]. Only the first dataset is normalized.
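
A minimal sketch of this transformation, assuming a tf.data map applied only to the feature dataset (illustrative; the actual implementation may differ). The names refer to the parameters above:

>>> features_ds, targets_ds = data
>>> scaled_ds = features_ds.map(lambda x: (x - l_min) / (l_max - l_min))
>>> result = (scaled_ds, targets_ds)  # targets pass through unchanged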

Examples

>>> features_ds = tf.data.Dataset.from_tensor_slices(features)
>>> targets_ds = tf.data.Dataset.from_tensor_slices(targets)
>>> data_tuple = (features_ds, targets_ds)
>>> normalized_tuple = normalizeMinMax(data_tuple, max_vals, min_vals)
src.actions.actionWrapper.print_data(*args, df: DataFrame | None = None, **kwargs) DataFrame | None

Utility function for debugging data inspection.

This function prints DataFrame contents and other arguments for debugging purposes while passing the DataFrame through unchanged. It’s useful for inspecting data at various stages of processing pipelines.

Parameters:
  • *args (tuple) – Variable arguments to be printed alongside the DataFrame.

  • df (Optional[pd.DataFrame], optional) – DataFrame to be printed and returned. Can be None.

  • **kwargs (dict) – Keyword arguments passed to the print function.

Returns:

Returns the input DataFrame unchanged, or None if input was None.

Return type:

Union[pd.DataFrame, None]

Examples

>>> # Use in a processing pipeline for debugging
>>> processed_df = some_processing_function(df)
>>> processed_df = print_data("After processing:", df=processed_df)
>>> # DataFrame is printed and continues through pipeline