Statistical Models
statsmodels is a Python library for statistical modeling, hypothesis testing, and econometric analysis, used across economics, the social sciences, and other research and industry applications.
Build robust regression and estimation models
Conduct rigorous statistical inference
Analyze temporal patterns and forecasting
Visualize and understand data relationships
Watch the setup tutorial and follow the installation steps to start building statistical models.
Complete walkthrough covering Python setup, VSCode configuration, and statsmodels installation.
Python 3.8+, VSCode with Python extension, pip installed
Complete documentation for regression and linear models from statsmodels
Ordinary Least Squares regression for fitting linear models to data with independently and identically distributed errors.
import numpy as np
import statsmodels.api as sm
# Generate sample data
x = np.linspace(0, 10, 100)
X = sm.add_constant(np.column_stack((x, x**2)))
y = np.dot(X, [1, 0.1, 10]) + np.random.normal(size=100)
# Fit OLS model
results = sm.OLS(y, X).fit()
print(results.summary())
Flexible regression framework for exponential family distributions with customizable link functions.
Binomial: For binary or proportional response data
Poisson: For count data
Gaussian: For continuous normal data
Gamma: For positive continuous data
Inverse Gaussian: For positive skewed data
Negative Binomial: For overdispersed counts
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Fit Gamma GLM with a log link (df is a DataFrame with columns y, x1, x2)
model = smf.glm('y ~ x1 + x2', data=df,
                family=sm.families.Gamma(link=sm.families.links.Log()))
results = model.fit()
print(results.summary())
Residuals vs Fitted: Scatter plot showing residuals against fitted values to assess homoscedasticity.
Q-Q Plot: Quantile-quantile plot comparing residual distribution to theoretical normal distribution.
Marginal regression for panel, cluster, or repeated measures data with correlated observations within clusters.
import statsmodels.api as sm
# Poisson GEE with an exchangeable working correlation within each 'id' cluster
fam = sm.families.Poisson()
exch = sm.cov_struct.Exchangeable()
model = sm.GEE.from_formula('y ~ x', groups='id',
                            data=data, cov_struct=exch, family=fam)
results = model.fit()
print(results.summary())
M-estimators for regression resistant to outliers using iteratively reweighted least squares.
Quadratic for small residuals, linear for large.
Exponential downweighting of outliers.
Completely downweights beyond threshold.
import statsmodels.api as sm
# Fit robust model with Huber's T norm
rlm_model = sm.RLM(y, X, M=sm.robust.norms.HuberT())
rlm_results = rlm_model.fit()
# Compare with OLS
ols_results = sm.OLS(y, X).fit()
print(rlm_results.summary())
print(ols_results.params, rlm_results.params)
Hierarchical models with both fixed and random effects for clustered or longitudinal data.
import statsmodels.formula.api as smf
md = smf.mixedlm("y ~ x", data, groups=data["group_id"])
mdf = md.fit()
print(mdf.summary())
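A random-intercept model can be checked against simulated data where the group effects are known. The sketch below assumes 40 groups of 10 observations each, with made-up variance components and coefficients.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)  # illustrative seed
n_groups, n_per = 40, 10
group_id = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)

# One random intercept per group (sd 1.0), plus observation noise (sd 0.5)
b = np.repeat(rng.normal(scale=1.0, size=n_groups), n_per)
y = 1.0 + 2.0 * x + b + rng.normal(scale=0.5, size=n_groups * n_per)
data = pd.DataFrame({"y": y, "x": x, "group_id": group_id})

md = smf.mixedlm("y ~ x", data, groups=data["group_id"])
mdf = md.fit()

print(mdf.fe_params)  # fixed effects: Intercept and x
print(mdf.cov_re)     # estimated variance of the random intercept
print(mdf.scale)      # estimated residual variance
```

Passing `re_formula="~x"` to `mixedlm` would add a random slope for `x`; it is omitted here to keep the sketch small.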
Maximum likelihood estimation for binary, multinomial, count, and ordinal outcomes.
import statsmodels.api as sm
logit_model = sm.Logit(y, X)
results = logit_model.fit()
# Calculate marginal effects
margeff = results.get_margeff(at='mean')
print(margeff.summary())
Statistical tests for differences among group means and variance partitioning.
import statsmodels.api as sm
from statsmodels.formula.api import ols
model = ols('y ~ C(factor1) * C(factor2)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
Estimate conditional quantiles of the response variable for distributional analysis.
import statsmodels.formula.api as smf
quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
for q in quantiles:
    model = smf.quantreg('y ~ x', data)
    results = model.fit(q=q)
    print(f"τ={q}: {results.params['x']}")
Scatter plot with multiple regression lines at different quantiles (τ=0.1, 0.25, 0.5, 0.75, 0.9) showing how relationships vary across the distribution. Median line (τ=0.5) in bold.
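Quantile slopes differ from one another only when the spread of the response changes with the predictor. The sketch below assumes simulated data whose noise scale grows with `x`, so the fitted slope should rise with the quantile; the data-generating process and seed are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)  # illustrative seed
n = 500
x = rng.uniform(0, 10, n)

# Heteroscedastic noise: the spread of y widens as x increases
y = 1.0 + 2.0 * x + (0.5 + 0.5 * x) * rng.normal(size=n)
data = pd.DataFrame({"x": x, "y": y})

model = smf.quantreg("y ~ x", data)  # one model object, refit per quantile
slopes = {}
for q in [0.1, 0.25, 0.5, 0.75, 0.9]:
    res = model.fit(q=q)
    slopes[q] = res.params["x"]
    print(f"tau={q}: slope={slopes[q]:.3f}")
```

Plotting the fitted line for each quantile over a scatter of the data reproduces the fan-shaped picture described above.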
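Recursive least squares, estimated in a state space (Kalman filter) framework, for tracking how coefficient estimates evolve as observations are added and for checking parameter stability.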
import statsmodels.api as sm
model = sm.RecursiveLS(y, X)
results = model.fit()
# Plot recursive coefficients
results.plot_recursive_coefficient(0, alpha=0.05)
results.plot_recursive_coefficient(1, alpha=0.05)
Intercept: Time series plot showing estimated intercept over time with 95% confidence bands.
Slope: Evolution of slope coefficient with widening confidence bands during uncertainty periods. CUSUM test included for stability.