Data Comparison
- diffly.compare_frames(
- left: DataFrame | LazyFrame,
- right: DataFrame | LazyFrame,
- /,
- *,
- primary_key: str | Sequence[str] | None = None,
- abs_tol: float | Mapping[str, float] = 1e-08,
- rel_tol: float | Mapping[str, float] = 1e-05,
- abs_tol_temporal: timedelta | Mapping[str, timedelta] = datetime.timedelta(0),
Compare two polars data frames.
- Parameters:
left – The first data frame in the comparison.
right – The second data frame in the comparison.
primary_key – Primary key columns to use for joining the data frames. If not provided, comparisons based on joins will raise an error.
abs_tol – Absolute tolerance for comparing floating point types. If a Mapping is provided, it should map from column name to absolute tolerance for every column in the data frame (except the primary key).
rel_tol – Relative tolerance for comparing floating point types. If a Mapping is provided, it should map from column name to relative tolerance for every column in the data frame (except the primary key).
abs_tol_temporal – Absolute tolerance for comparing temporal types. If a Mapping is provided, it should map from column name to absolute temporal tolerance for every column in the data frame (except the primary key).
- Returns:
A data frame comparison object that can be used to explore the differences of the provided data frames.
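The scalar-or-Mapping tolerance parameters described above can be illustrated with a small pure-Python sketch. The helper name `expand_tolerance` is hypothetical and not part of diffly. Raising on a missing column is an assumption made for this example, based only on the statement that a Mapping should cover every non-key column.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def expand_tolerance(
    tol: float | Mapping[str, float],
    columns: Sequence[str],
    primary_key: Sequence[str],
) -> dict[str, float]:
    """Hypothetical helper: resolve a tolerance to one value per non-key column."""
    value_columns = [c for c in columns if c not in primary_key]
    if isinstance(tol, Mapping):
        # Assumption: a mapping must name every non-key column.
        missing = [c for c in value_columns if c not in tol]
        if missing:
            raise ValueError(f"missing tolerance for columns: {missing}")
        return {c: float(tol[c]) for c in value_columns}
    # A scalar applies uniformly to every non-key column.
    return {c: float(tol) for c in value_columns}


columns = ["id", "price", "weight"]
uniform = expand_tolerance(1e-08, columns, primary_key=["id"])
per_column = expand_tolerance(
    {"price": 0.01, "weight": 0.5}, columns, primary_key=["id"]
)
```

Here `uniform` assigns the same tolerance to `price` and `weight`, while `per_column` keeps the values given for each column. The primary key column is excluded in both cases.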
Note
The implementation of floating point equivalence mirrors the implementation of math.isclose().
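As a rough illustration of what mirroring math.isclose() means, the sketch below spells out the documented formula for finite values, using the default tolerances from the signature above. The function names `floats_close` and `temporal_close` are hypothetical. The temporal check is an assumption about how an absolute temporal tolerance would plausibly be applied, not a description of the library's internals.

```python
import math
from datetime import datetime, timedelta


def floats_close(
    a: float, b: float, *, rel_tol: float = 1e-05, abs_tol: float = 1e-08
) -> bool:
    # Formula documented for math.isclose() on finite values:
    # the difference must be within the larger of the two tolerances.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


def temporal_close(
    a: datetime, b: datetime, *, abs_tol: timedelta = timedelta(0)
) -> bool:
    # Assumption: temporal values match when their absolute
    # difference does not exceed the tolerance.
    return abs(a - b) <= abs_tol


# The sketch agrees with math.isclose() for these finite inputs.
pairs = [(1.0, 1.000001), (1.0, 1.001), (0.0, 1e-9)]
results = [floats_close(a, b) for a, b in pairs]
reference = [math.isclose(a, b, rel_tol=1e-05, abs_tol=1e-08) for a, b in pairs]

t0 = datetime(2024, 1, 1, 12, 0, 0)
t1 = datetime(2024, 1, 1, 12, 0, 1)
strict = temporal_close(t0, t1)  # default tolerance of zero
lenient = temporal_close(t0, t1, abs_tol=timedelta(seconds=1))
```

The absolute tolerance matters mainly near zero, where a purely relative tolerance would reject almost every pair of values.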
- class diffly.comparison.DataFrameComparison(
- left: LazyFrame,
- right: LazyFrame,
- left_schema: Schema,
- right_schema: Schema,
- primary_key: list[str] | None,
- _other_common_columns: list[str],
- abs_tol_by_column: dict[str, float],
- rel_tol_by_column: dict[str, float],
- abs_tol_temporal_by_column: dict[str, timedelta],
Object representing a comparison between two polars data frames.
Note
Do not initialize this object directly. Instead, use compare_frames().
See also
Schema Comparison: inspect column names and data types via schemas.
- Whether the data frames are equal, independent of row and column order.
- Whether the left and right data frames have the same number of rows.
- The rows of both data frames that can be joined, regardless of whether column values match in columns which are not used for joining.
- The rows of both data frames that can be joined and have matching values in all columns in subset.
- The rows of both data frames that can be joined and have at least one mismatching value across any column in subset.
- Number of rows in the left data frame.
- Number of rows in the right data frame.
- The number of rows that can be joined, regardless of whether column values match in columns which are not used for joining.
- The number of rows that can be joined and have matching values in all columns in subset.
- The number of rows of both data frames that can be joined and have at least one mismatching value across any column in subset.
- The rows in the left data frame which cannot be joined with a row in the right data frame.
- The rows in the right data frame which cannot be joined with a row in the left data frame.
- The number of rows in the left data frame which cannot be joined with a row in the right data frame.
- The number of rows in the right data frame which cannot be joined with a row in the left data frame.
- Compute the fraction of matching values.
- Get the changes of a column, sorted in descending order of frequency.
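The row categories described above (joined, joined and equal, joined and unequal, left only, right only) can be sketched in plain Python with dictionaries keyed by the primary key. This is a conceptual illustration only: it does not use diffly or polars, all names and data are made up for the example, and it compares values with exact equality rather than with tolerances.

```python
from collections import Counter

left = [
    {"id": 1, "name": "a", "score": 1.0},
    {"id": 2, "name": "b", "score": 2.0},
    {"id": 3, "name": "c", "score": 3.0},
]
right = [
    {"id": 2, "name": "b", "score": 2.0},
    {"id": 3, "name": "x", "score": 3.0},
    {"id": 4, "name": "d", "score": 4.0},
]
primary_key = ["id"]
subset = ["name", "score"]  # non-key columns to compare


def key_of(row):
    # Build a hashable join key from the primary key columns.
    return tuple(row[k] for k in primary_key)


left_by_key = {key_of(r): r for r in left}
right_by_key = {key_of(r): r for r in right}

# Rows that can be joined share a primary key in both frames.
joined_keys = sorted(left_by_key.keys() & right_by_key.keys())
left_only = [r for k, r in left_by_key.items() if k not in right_by_key]
right_only = [r for k, r in right_by_key.items() if k not in left_by_key]

joined_equal, joined_unequal = [], []
for k in joined_keys:
    l_row, r_row = left_by_key[k], right_by_key[k]
    matches = all(l_row[c] == r_row[c] for c in subset)
    (joined_equal if matches else joined_unequal).append((l_row, r_row))

# Changes of one column, most frequent first.
name_changes = Counter(
    (l_row["name"], r_row["name"])
    for l_row, r_row in joined_unequal
    if l_row["name"] != r_row["name"]
).most_common()
```

Every left row is either joined or left-only, so the counts add up: the number of left rows equals the number of joined rows plus the number of left-only rows, and the same holds on the right.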