py-glm is a library for fitting, inspecting, and evaluating Generalized Linear Models in Python.
The py-glm library can be installed directly from GitHub:
pip install git+https://github.com/madrury/py-glm.git
py-glm supports models from various exponential families:
from glm.glm import GLM
from glm.families import Gaussian, Bernoulli, Poisson, Exponential
linear_model = GLM(family=Gaussian())
logistic_model = GLM(family=Bernoulli())
poisson_model = GLM(family=Poisson())
exponential_model = GLM(family=Exponential())
Models with dispersion parameters are also supported. The dispersion parameters in these models are estimated using the deviance.
from glm.families import QuasiPoisson, Gamma
quasi_poisson_model = GLM(family=QuasiPoisson())
gamma_model = GLM(family=Gamma())
Fitting a model proceeds in sklearn style, and uses the Fisher scoring algorithm:
logistic_model.fit(X, y_logistic)
If your data resides in a pandas.DataFrame, you can pass this to fit along with a model formula:
logistic_model.fit(X, formula="y ~ Moshi + SwimSwim")
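For readers unfamiliar with Fisher scoring, here is a minimal NumPy sketch of the idea for a logistic model with the canonical logit link. It is an illustration only and is not taken from py-glm's source: the function name `fit_logistic_fisher_scoring`, its parameters, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def fit_logistic_fisher_scoring(X, y, n_iter=25, tol=1e-8):
    """Illustrative Fisher scoring for logistic regression (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # conditional mean E[y | X]
        w = mu * (1.0 - mu)                      # Bernoulli variance at the current fit
        score = X.T @ (y - mu)                   # gradient of the log-likelihood
        information = X.T @ (X * w[:, None])     # Fisher information matrix
        step = np.linalg.solve(information, score)
        beta = beta + step                       # scoring update
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (assumed for illustration): an intercept plus two features.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(2000), rng.normal(size=(2000, 2))])
true_beta = np.array([1.0, -2.0, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

beta_hat = fit_logistic_fisher_scoring(X, y)
print(beta_hat)
```

Each iteration solves a weighted least squares problem, which is why the same procedure is often called iteratively reweighted least squares.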
Offsets and sample weights are supported when fitting:
linear_model.fit(X, y_linear, sample_weights=sample_weights)
poisson_model.fit(X, y_poisson, offset=np.log(expos))
Predictions are also made in sklearn style:
logistic_model.predict(X)
Note: There is one major place we deviate from the sklearn interface. The predict method on a GLM object always returns an estimate of the conditional expectation E[y | X]. This is in contrast to sklearn behavior for classification models, where it returns a class assignment. We make this choice so that the py-glm library is consistent with its use of predict. If the user would like class assignments from a model, they will need to threshold the probability returned by predict manually.
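As a small sketch of the manual thresholding described in the note, the snippet below converts probabilities into 0/1 class assignments. The `probabilities` array is a hypothetical stand-in for the output of `logistic_model.predict(X)`, and the 0.5 cutoff is an assumption; choose whatever threshold suits your application.

```python
import numpy as np

# Hypothetical values standing in for the output of logistic_model.predict(X).
probabilities = np.array([0.05, 0.40, 0.50, 0.73, 0.99])

# Assumed cutoff; probabilities at or above it are assigned to class 1.
threshold = 0.5
class_assignments = (probabilities >= threshold).astype(int)
print(class_assignments)
```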
Once the model is fit, parameter estimates, parameter covariance estimates, and p-values from a standard z-test are available:
logistic_model.coef_
logistic_model.coef_covariance_matrix_
logistic_model.coef_standard_error_
logistic_model.p_values_
To get a quick summary, use the summary method:
logistic_model.summary()
Binomial GLM Model Summary.
===============================================
Name         Parameter Estimate  Standard Error
-----------------------------------------------
Intercept                  1.02            0.01
Moshi                     -2.00            0.02
SwimSwim                   1.00            0.02
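To show how a z-test p-value relates to a parameter estimate and its standard error, here is a short sketch using only the standard library. It assumes the usual two-sided Wald z-test; the helper name `z_test_p_value` is hypothetical, and py-glm's own computation may differ in detail.

```python
import math

def z_test_p_value(estimate, standard_error):
    """Two-sided p-value for H0: parameter == 0 (hypothetical helper)."""
    z = estimate / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# An estimate 1.96 standard errors from zero gives a p-value of roughly 0.05.
print(z_test_p_value(1.96, 1.0))
```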
Re-sampling methods are also supported in the simulation subpackage, namely the
parametric and non-parametric bootstraps:
from glm.simulation import Simulation
sim = Simulation(logistic_model)
sim.parametric_bootstrap(X, n_sim=1000)
sim.non_parametric_bootstrap(X, n_sim=1000)
Ridge regression is supported for each model. Note that the regularization parameter is called alpha instead of lambda, because lambda is a reserved word in Python:
logistic_model.fit(X, y_logistic, alpha=1.0)
The glmnet code included in glm.glmnet is experimental. Please use at your own risk.