assert_frame_equal

diffly.testing.assert_frame_equal(
left: DataFrame | LazyFrame,
right: DataFrame | LazyFrame,
/,
*,
primary_key: str | Sequence[str] | None = None,
check_dtypes: bool = True,
abs_tol: float | Mapping[str, float] = 1e-08,
rel_tol: float | Mapping[str, float] = 1e-05,
abs_tol_temporal: timedelta | Mapping[str, timedelta] = datetime.timedelta(0),
show_perfect_column_matches: bool = False,
top_k_column_changes: int = 0,
sample_k_rows_only: int = 0,
show_sample_primary_key_per_change: bool = False,
left_name: str = Side.LEFT,
right_name: str = Side.RIGHT,
slim: bool = False,
hidden_columns: list[str] | None = None,
) -> None

Assert that two polars data frames are equal.

In contrast to polars.testing.assert_frame_equal(), this function uses diffly’s comparison logic. On failure, it can therefore print a much more comprehensive summary of the differences between the two data frames, which makes debugging considerably easier.

Parameters:
  • left – The first data frame in the comparison.

  • right – The second data frame in the comparison.

  • primary_key – Primary key columns to use for joining the data frames. If not provided, the summary of the changes between the two data frames is limited. Providing primary key columns does NOT affect whether the assertion passes or fails; it only makes the summary more detailed.

  • check_dtypes – Whether to check that the data types of columns match exactly.

  • abs_tol – Absolute tolerance for comparing floating point types. If a Mapping is provided, it should map from column name to absolute tolerance for every column in the data frame (except the primary key).

  • rel_tol – Relative tolerance for comparing floating point types. If a Mapping is provided, it should map from column name to relative tolerance for every column in the data frame (except the primary key).

  • abs_tol_temporal – Absolute tolerance for comparing temporal types. If a Mapping is provided, it should map from column name to absolute temporal tolerance for every column in the data frame (except the primary key).

  • show_perfect_column_matches – Whether to list a column’s match rate in the assertion error message even if that match rate is 100%.

  • top_k_column_changes – The maximum number of column value changes to display in the summary for each column with a match rate below 100%. Because this prints actual values, make sure that no sensitive data is leaked when enabling this feature (see hidden_columns).

  • sample_k_rows_only – The number of rows to show in the “Rows left/right only” section of the summary. If 0 (default), no rows are shown. For each row shown, only its primary key is printed. An error is raised if a positive number is provided and any of the primary key columns is also in hidden_columns.

  • show_sample_primary_key_per_change – Whether to show a sample primary key per column change in the summary. If False (default), no primary key values are shown. A sample primary key can only be shown if top_k_column_changes is greater than 0, as each sample primary key is linked to a specific column change. An error will be raised if True and any of the primary key columns is also in hidden_columns.

  • left_name – Custom display name for the left data frame.

  • right_name – Custom display name for the right data frame.

  • slim – Whether to generate a slim summary. In slim mode, the summary is as concise as possible, only showing sections that contain differences. As the structure of the summary can vary, it should only be used by advanced users who are familiar with the summary format.

  • hidden_columns – Columns for which no values are printed, e.g. because they contain sensitive information.

Raises:

AssertionError – If the data frames are not equal.

Note

Contrary to polars.testing.assert_frame_equal(), left and right may each be either eager (DataFrame) or lazy (LazyFrame), and they do not need to be of the same type: an eager data frame can be compared directly against a lazy one.