10 changes: 5 additions & 5 deletions .github/workflows/publish.yaml
@@ -14,18 +14,18 @@ jobs:
GITHUB_CONTEXT: ${{ toJson(github) }}
run: echo "$GITHUB_CONTEXT"

- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: "3.8"
python-version: "3.11"

- name: Install poetry
run: pip install poetry==1.3.2
run: pip install "poetry>=1.5.0"

- name: Install Dependencies
run: poetry install --no-dev
run: poetry install --without dev

- name: Build
run: poetry build
18 changes: 9 additions & 9 deletions .github/workflows/test.yaml
@@ -14,15 +14,15 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: 3.9
python-version: "3.11"

- name: Install poetry
run: pip install poetry==1.3.2
run: pip install "poetry>=1.5.0"

- name: Install dependencies
run: make install
@@ -36,18 +36,18 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.7", "3.8", "3.9", "3.10"]
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install poetry
run: pip install poetry==1.3.2
run: pip install "poetry>=1.5.0"

- name: Install dependencies
run: make install
@@ -56,4 +56,4 @@ jobs:
run: make test

- name: Upload coverage
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
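
Review note on the `Install poetry` steps: in a `run:` step the command is handed to a shell, and an unquoted `>=` inside a requirement specifier is parsed as an output redirection rather than as part of the package name. The sketch below illustrates that behaviour under stated assumptions: it needs a POSIX `sh`, it uses `echo` as a stand-in for `pip install` so nothing is installed, and `run_in_shell` is a hypothetical helper written for this note.

```python
import subprocess
import tempfile
from pathlib import Path


def run_in_shell(command: str, workdir: Path) -> str:
    """Run a command line through the system shell and return its stdout."""
    result = subprocess.run(
        command, shell=True, cwd=workdir, capture_output=True, text=True, check=True
    )
    return result.stdout


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)

    # Unquoted: the shell sees the argument `poetry` plus a redirection of
    # stdout into a file literally named `=1.5.0`.
    unquoted_stdout = run_in_shell("echo poetry>=1.5.0", workdir)
    stray_file_created = (workdir / "=1.5.0").exists()

    # Quoted: the whole specifier reaches the command as a single argument.
    quoted_stdout = run_in_shell('echo "poetry>=1.5.0"', workdir)
```

With `pip install` in place of `echo`, the unquoted form would install whatever `poetry` version pip resolves by default and write pip's output to the stray file, so the intended lower bound would not be enforced.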
3 changes: 3 additions & 0 deletions .gitignore
@@ -44,6 +44,9 @@ settings.json
.mypy_cache/
.pytest_cache/

# Claude Code
CLAUDE.md

# Tests artifacts
reports/
coverage.xml
8 changes: 4 additions & 4 deletions .readthedocs.yaml
@@ -1,13 +1,13 @@
version: 2

build:
os: ubuntu-20.04
os: ubuntu-22.04
tools:
python: "3.8"
python: "3.11"
jobs:
post_install:
- pip install --no-cache-dir poetry
- poetry export -f requirements.txt -o requirements.txt --without-hashes
- pip install --no-cache-dir poetry poetry-plugin-export
- poetry export -f requirements.txt -o requirements.txt --without-hashes --without dev
- pip install --no-cache-dir -r requirements.txt

sphinx:
44 changes: 44 additions & 0 deletions CHANGELOG.rst
@@ -1,6 +1,50 @@
Release Notes
=============

Version 0.5.0 (06.01.2025)
---------------------------

**Breaking Changes:**

* Minimum Python version raised to 3.9 (dropped support for 3.7, 3.8)

* Minimum PySpark version raised to 3.4 (dropped support for 3.2, 3.3)

**New Features:**

* Added support for Python 3.11, 3.12, 3.13

**Bug Fixes:**

* Added hnswlib as fallback for nmslib on macOS ARM (fixes segfault in metric split)

**Dependencies:**

* Updated numpy to >=1.24.0, <3.0.0

* Updated pandas to >=1.5.0, <3.0.0

* Updated scipy to >=1.10.0

* Updated scikit-learn to >=1.3.0

* Updated nmslib to >=2.1.0

* Added hnswlib >=0.7.0 as alternative KNN backend

* Updated catboost to >=1.2.0

* Updated other dependencies for Python 3.12/3.13 compatibility

**Internal:**

* Replaced deprecated ``pkg_resources`` with ``importlib.metadata``

* Updated CI/CD to test Python 3.9-3.13

* Updated GitHub Actions to v4/v5


Version 0.4.1 (21.04.2023)
---------------------------

4 changes: 3 additions & 1 deletion README.rst
@@ -17,7 +17,7 @@ Ambrosia
:target: https://codecov.io/gh/MobileTeleSystems/Ambrosia
.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/psf/black
.. |Python Versions| image:: https://img.shields.io/pypi/pyversions/ambrosia.svg
.. |Python Versions| image:: https://img.shields.io/pypi/pyversions/ambrosia.svg?v=0.5.0
:target: https://pypi.org/project/ambrosia
.. |Telegram Channel| image:: https://img.shields.io/badge/telegram-Ambrosia-blueviolet.svg?logo=telegram
:target: https://t.me/+Tkt43TNUUSAxNWNi
@@ -62,6 +62,8 @@ and `Tutorials <https://github.com/MobileTeleSystems/Ambrosia/tree/main/examples
Installation
------------

**Requirements:** Python 3.9+

You can always get the newest *Ambrosia* release using ``pip``.
Stable version is released on every tag to ``main`` branch.

2 changes: 1 addition & 1 deletion ambrosia/VERSION
@@ -1 +1 @@
0.4.1
0.5.0
3 changes: 2 additions & 1 deletion ambrosia/preprocessing/__init__.py
@@ -21,7 +21,7 @@
from .ml_var_reducer import MLVarianceReducer
from .preprocessor import Preprocessor
from .robust import IQRPreprocessor, RobustPreprocessor
from .transformers import BoxCoxTransformer, LogTransformer
from .transformers import BoxCoxTransformer, LinearizationTransformer, LogTransformer

__all__ = [
"AggregatePreprocessor",
@@ -32,5 +32,6 @@
"RobustPreprocessor",
"IQRPreprocessor",
"BoxCoxTransformer",
"LinearizationTransformer",
"LogTransformer",
]
46 changes: 45 additions & 1 deletion ambrosia/preprocessing/preprocessor.py
@@ -34,7 +34,7 @@
from ambrosia.preprocessing.aggregate import AggregatePreprocessor
from ambrosia.preprocessing.cuped import Cuped, MultiCuped
from ambrosia.preprocessing.robust import IQRPreprocessor, RobustPreprocessor
from ambrosia.preprocessing.transformers import BoxCoxTransformer, LogTransformer
from ambrosia.preprocessing.transformers import BoxCoxTransformer, LinearizationTransformer, LogTransformer


class Preprocessor:
@@ -378,6 +378,50 @@ def multicuped(
self.transformers.append(transformer)
return self

def linearize(
self,
numerator: types.ColumnNameType,
denominator: types.ColumnNameType,
transformed_name: Optional[types.ColumnNameType] = None,
load_path: Optional[Path] = None,
) -> Preprocessor:
"""
Linearize a ratio metric for use in A/B testing.

Computes a per-unit linearized value that is approximately normally
distributed, enabling correct t-test usage for ratio metrics:

linearized_i = numerator_i - ratio * denominator_i

where ratio = mean(numerator) / mean(denominator) is estimated on
the data passed to this ``Preprocessor`` instance (reference / control data).

Parameters
----------
numerator : ColumnNameType
Column name of the ratio numerator (e.g. ``"revenue"``).
denominator : ColumnNameType
Column name of the ratio denominator (e.g. ``"orders"``).
transformed_name : ColumnNameType, optional
Name for the new linearized column. Defaults to
``"{numerator}_lin"``.
load_path : Path, optional
Path to a json file with pre-fitted parameters.

Returns
-------
self : Preprocessor
Instance object.
"""
transformer = LinearizationTransformer()
if load_path is None:
transformer.fit_transform(self.dataframe, numerator, denominator, transformed_name, inplace=True)
else:
transformer.load_params(load_path)
transformer.transform(self.dataframe, inplace=True)
self.transformers.append(transformer)
return self

def transformations(self) -> List:
"""
List of all transformations which were called.
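
Review note on `Preprocessor.linearize`: the docstring states the formula but not why the linearized column can be compared between groups. A short worked derivation follows, under the assumption that the ratio is fitted on the reference (control) data and then applied unchanged to the test data. The symbols are introduced here for illustration and do not appear in the code.

```latex
% x_i = numerator value, y_i = denominator value of unit i;
% subscripts C and T denote the reference (control) and test groups.
\begin{align*}
R_C &= \frac{\bar{x}_C}{\bar{y}_C}, \qquad L_i = x_i - R_C\, y_i \\
\bar{L}_C &= \bar{x}_C - R_C\, \bar{y}_C = 0 \\
\bar{L}_T &= \bar{x}_T - R_C\, \bar{y}_T
           = \bar{y}_T \left( R_T - R_C \right),
           \qquad R_T = \frac{\bar{x}_T}{\bar{y}_T}
\end{align*}
```

So the mean of the linearized column is zero on the reference data by construction, and on the test data it has the same sign as the difference in ratios whenever the mean denominator is positive.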
133 changes: 132 additions & 1 deletion ambrosia/preprocessing/transformers.py
@@ -16,7 +16,7 @@
Module contains tools for metrics transformations during a
preprocessing task.
"""
from typing import Dict, Union
from typing import Dict, Optional, Union

import numpy as np
import pandas as pd
@@ -386,3 +386,134 @@ def inverse_transform(self, dataframe: pd.DataFrame, inplace: bool = False) -> U
transformed: pd.DataFrame = dataframe if inplace else dataframe.copy()
transformed[self.column_names] = np.exp(transformed[self.column_names].values)
return None if inplace else transformed


class LinearizationTransformer(AbstractFittableTransformer):
"""
Linearization transformer for ratio metrics.

Converts a ratio metric (numerator / denominator) into a per-unit linearized
metric that is approximately normally distributed, enabling correct t-test usage:

linearized_i = numerator_i - ratio * denominator_i

where ratio = mean(numerator) / mean(denominator), estimated on the reference
(control group / historical) data passed to fit().

Parameters
----------
numerator : str
Column name of the ratio numerator (e.g. "revenue").
denominator : str
Column name of the ratio denominator (e.g. "orders").
transformed_name : str, optional
Name for the new column. Defaults to ``"{numerator}_lin"``.

Examples
--------
>>> transformer = LinearizationTransformer()
>>> transformer.fit(control_df, "revenue", "orders", "arpu_lin")
>>> transformer.transform(experiment_df, inplace=True)
"""

def __str__(self) -> str:
return "Linearization transformation"

def __init__(self) -> None:
self.numerator: Optional[str] = None
self.denominator: Optional[str] = None
self.transformed_name: Optional[str] = None
self.ratio: Optional[float] = None
super().__init__()

def get_params_dict(self) -> Dict:
self._check_fitted()
return {
"numerator": self.numerator,
"denominator": self.denominator,
"transformed_name": self.transformed_name,
"ratio": self.ratio,
}

def load_params_dict(self, params: Dict) -> None:
for key in ("numerator", "denominator", "transformed_name", "ratio"):
if key not in params:
raise TypeError(f"params argument must contain: {key}")
setattr(self, key, params[key])
self.fitted = True

def fit(
self,
dataframe: pd.DataFrame,
numerator: str,
denominator: str,
transformed_name: Optional[str] = None,
):
"""
Estimate ratio = mean(numerator) / mean(denominator) on reference data.

Parameters
----------
dataframe : pd.DataFrame
Reference dataframe (typically control group or historical data).
numerator : str
Column name of the ratio numerator.
denominator : str
Column name of the ratio denominator.
transformed_name : str, optional
Name for the linearized column. Defaults to ``"{numerator}_lin"``.
"""
self._check_cols(dataframe, [numerator, denominator])
denom_mean = dataframe[denominator].mean()
if denom_mean == 0:
raise ValueError(f"Mean of denominator column '{denominator}' is zero; cannot compute ratio.")
self.numerator = numerator
self.denominator = denominator
self.transformed_name = transformed_name if transformed_name is not None else f"{numerator}_lin"
self.ratio = dataframe[numerator].mean() / denom_mean
self.fitted = True
return self

def transform(self, dataframe: pd.DataFrame, inplace: bool = False) -> Union[pd.DataFrame, None]:
"""
Apply linearization: transformed = numerator - ratio * denominator.

Parameters
----------
dataframe : pd.DataFrame
Dataframe to transform.
inplace : bool, default: ``False``
If ``True`` modifies dataframe in place, otherwise returns a copy.
"""
self._check_fitted()
self._check_cols(dataframe, [self.numerator, self.denominator])
df = dataframe if inplace else dataframe.copy()
df[self.transformed_name] = df[self.numerator] - self.ratio * df[self.denominator]
return None if inplace else df

def fit_transform(
self,
dataframe: pd.DataFrame,
numerator: str,
denominator: str,
transformed_name: Optional[str] = None,
inplace: bool = False,
) -> Union[pd.DataFrame, None]:
"""
Fit and transform in one step.

Parameters
----------
dataframe : pd.DataFrame
Reference dataframe for fitting and transformation.
numerator : str
Column name of the ratio numerator.
denominator : str
Column name of the ratio denominator.
transformed_name : str, optional
Name for the linearized column.
inplace : bool, default: ``False``
If ``True`` modifies dataframe in place.
"""
self.fit(dataframe, numerator, denominator, transformed_name)
return self.transform(dataframe, inplace)
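
Review note on `LinearizationTransformer`: a small numeric check of the fit-on-control, transform-on-test flow described in the docstrings. This is a standalone sketch, not the class above: plain Python lists stand in for DataFrame columns, `fit_ratio` and `linearize` are hypothetical helpers, and the revenue and order figures are made-up toy data.

```python
from statistics import fmean
from typing import List, Sequence


def fit_ratio(numerator: Sequence[float], denominator: Sequence[float]) -> float:
    """Estimate ratio = mean(numerator) / mean(denominator) on reference data."""
    denom_mean = fmean(denominator)
    if denom_mean == 0:
        raise ValueError("Mean of denominator is zero; cannot compute ratio.")
    return fmean(numerator) / denom_mean


def linearize(
    numerator: Sequence[float], denominator: Sequence[float], ratio: float
) -> List[float]:
    """Per-unit linearized value: numerator_i - ratio * denominator_i."""
    return [n - ratio * d for n, d in zip(numerator, denominator)]


# Toy reference (control) data: revenue and orders per user.
control_revenue = [12.0, 0.0, 25.0, 23.0]
control_orders = [1.0, 0.0, 3.0, 2.0]

# Toy test-group data, linearized with the ratio fitted on control.
test_revenue = [15.0, 9.0, 40.0]
test_orders = [1.0, 1.0, 3.0]

ratio = fit_ratio(control_revenue, control_orders)
control_lin = linearize(control_revenue, control_orders, ratio)
test_lin = linearize(test_revenue, test_orders, ratio)

# The control mean is zero by construction; the test mean carries the effect.
control_mean = fmean(control_lin)
test_mean = fmean(test_lin)
```

Reusing the control-fitted ratio for the test group mirrors the docstring example, where `fit` is called on `control_df` and `transform` on `experiment_df`.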