ordinary_regression#
This module contains the concrete implementation of ordinary regression.
A polynomial regression model is used when the given dataset does not lie on the unisolvent nodes of an interpolating polynomial. In that case, the function values at the unisolvent nodes are obtained via a least-squares fit.
- class minterpy.extras.regression.ordinary_regression.OrdinaryRegression(multi_index=None, grid=None, origin_poly=LagrangePolynomial)[source]#
Bases:
RegressionABC
Implementation of an ordinary (weighted/unweighted) polynomial regression.
OrdinaryRegression fits a polynomial model specified by a multi-index set on a given dataset.
- Parameters:
multi_index (MultiIndexSet, optional) – The multi-index set that defines the underlying polynomial. This parameter is optional if a grid is specified; in that case, the multi-index set is set to the one attached to the grid.
grid (Grid, optional) – The grid on which the polynomial lives. This parameter is optional if a multi-index set is specified; in that case, the grid is constructed from the specified multi-index set.
origin_poly (Type[MultivariatePolynomialSingleABC], optional) – The polynomial basis on which the regression is carried out. This parameter is optional and, by default, is set to LagrangePolynomial.
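A minimal construction sketch, assuming the module path shown above and the MultiIndexSet.from_degree() constructor; the dimension, degree, and lp-degree values are illustrative:

import minterpy as mp
from minterpy.extras.regression.ordinary_regression import OrdinaryRegression

# Multi-index set of a two-dimensional polynomial of degree 3;
# lp_degree controls the shape of the index set
mi = mp.MultiIndexSet.from_degree(spatial_dimension=2, poly_degree=3, lp_degree=2.0)

# The grid is constructed from the multi-index set and the origin basis
# defaults to LagrangePolynomial
model = OrdinaryRegression(multi_index=mi)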
Properties
coeffs – The fitted coefficients with respect to the origin polynomial basis.
eval_poly – Polynomial basis used for the prediction.
grid – Grid on which the polynomial lives.
loocv_error – Leave-one-out CV error of the fitted polynomial (abs. and norm.).
multi_index – Multi-index set that defines the polynomial.
origin_poly – Polynomial basis on which the regression is carried out.
regfit_l2_error – \(L_2\) regression fit error (in abs. and norm. terms).
regfit_linf_error – \(L_{\infty}\) regression fit error (in abs. and norm. terms).
Methods
fit(xx, yy[, weights, lstsq_solver]) – Fit the ordinary polynomial regression model.
get_regression_matrix(xx) – Get the regression matrix on a set of query points.
predict(xx) – Predict the response at a set of query points using the fitted polynomial model.
show() – Show the summary of the polynomial regression model.
- property multi_index: MultiIndexSet#
Multi-index set that defines the polynomial.
- property loocv_error: Tuple[float, float]#
Leave-one-out CV error of the fitted polynomial (abs. and norm.).
The leave-one-out (LOO) cross-validation (CV) error is defined as follows:
\[\epsilon_{LOO} = \frac{1}{N} \sum_{i = 1}^N \left( y^{(i)} - \hat{f}_{\setminus i} \left(\boldsymbol{x}^{(i)}\right) \right)^2\]
where:
\(N\) is the number of data points;
\(y^{(i)}\) is the \(i\)-th response data point;
\(\boldsymbol{x}^{(i)}\) is the \(i\)-th input data point;
\(\hat{f}_{\setminus i}\) is the polynomial fitted on the dataset excluding the \(i\)-th data point.
The normalized LOO-CV error is defined as follows:
\[\epsilon_{LOO, \text{norm}} = \frac{\epsilon_{LOO}}{\mathbb{V}[\boldsymbol{y}]}\]
where \(\mathbb{V}[\boldsymbol{y}]\) denotes the variance of the response data (\(\boldsymbol{y}\)).
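As an illustration of this definition, a naive plain-NumPy sketch that refits a generic least-squares problem once per left-out point (not Minterpy's internal implementation; rr denotes the regression matrix and yy the responses):

import numpy as np

def naive_loocv(rr: np.ndarray, yy: np.ndarray) -> tuple:
    """Return the absolute and normalized LOO-CV errors."""
    nn = len(yy)
    res = np.empty(nn)
    for i in range(nn):
        mask = np.arange(nn) != i  # leave the i-th data point out
        coeffs, *_ = np.linalg.lstsq(rr[mask], yy[mask], rcond=None)
        res[i] = yy[i] - rr[i] @ coeffs  # error at the left-out point
    eps_loo = np.mean(res**2)
    return eps_loo, eps_loo / np.var(yy)  # abs. and norm.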
- property regfit_linf_error: Tuple[float, float]#
\(L_{\infty}\) regression fit error (in abs. and norm. terms).
The \(L_{\infty}\) regression fit error is defined as follows:
\[\epsilon_{L_{\infty}} = \max_{i} \; \lvert y^{(i)} - \hat{f}\left(\boldsymbol{x}^{(i)}\right) \rvert\]
where:
\(y^{(i)}\) is the \(i\)-th response data point;
\(\boldsymbol{x}^{(i)}\) is the \(i\)-th input data point;
\(\hat{f}\) is the fitted polynomial.
The normalized \(L_{\infty}\) regression fit error is defined as follows:
\[\epsilon_{L_{\infty}, \text{norm}} = \frac{\epsilon_{L_{\infty}}}{\sqrt{\mathbb{V}[\boldsymbol{y}]}}\]
where \(\mathbb{V}[\boldsymbol{y}]\) denotes the variance of the response data (\(\boldsymbol{y}\)).
- property regfit_l2_error: Tuple[float, float]#
\(L_2\) regression fit error (in abs. and norm. terms).
The \(L_2\) regression fit error is defined as follows:
\[\epsilon_{L_2} = \frac{1}{N} \sum_{i = 1}^N \left( y^{(i)} - \hat{f}\left(\boldsymbol{x}^{(i)}\right) \right)^2\]
where:
\(N\) is the number of data points;
\(y^{(i)}\) is the \(i\)-th response data point;
\(\boldsymbol{x}^{(i)}\) is the \(i\)-th input data point;
\(\hat{f}\) is the fitted polynomial.
The normalized \(L_2\) regression fit error is defined as follows:
\[\epsilon_{L_2, \text{norm}} = \frac{\epsilon_{L_2}}{\mathbb{V}[\boldsymbol{y}]}\]
where \(\mathbb{V}[\boldsymbol{y}]\) denotes the variance of the response data (\(\boldsymbol{y}\)).
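Both fit errors (this one and the \(L_{\infty}\) error above) follow directly from the residuals; a plain-NumPy sketch of the two definitions (yy are the observed responses, yy_hat the fitted polynomial's predictions):

import numpy as np

def fit_errors(yy: np.ndarray, yy_hat: np.ndarray):
    """Return the L2 and L-infinity fit errors, each as (abs., norm.)."""
    res = yy - yy_hat               # residuals y - f_hat(x)
    eps_l2 = np.mean(res**2)        # mean of the squared residuals
    eps_linf = np.max(np.abs(res))  # largest absolute residual
    var_y = np.var(yy)
    return (eps_l2, eps_l2 / var_y), (eps_linf, eps_linf / np.sqrt(var_y))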
- property origin_poly: MultivariatePolynomialSingleABC#
Polynomial basis on which the regression is carried out.
- property coeffs#
The fitted coefficients with respect to the origin polynomial basis.
- property eval_poly: NewtonPolynomial | CanonicalPolynomial | ChebyshevPolynomial#
Polynomial basis used for the prediction.
Notes
Because Minterpy cannot directly evaluate a polynomial in the Lagrange basis on a set of query points, the corresponding polynomial in the Newton basis is created and used for prediction.
- get_regression_matrix(xx)[source]#
Get the regression matrix on a set of query points.
- Parameters:
xx (numpy.ndarray) – Evaluation points, an array of shape \(N_e \times M\), where \(N_e\) and \(M\) correspond to the number of evaluation points and the number of spatial dimensions, respectively.
- Returns:
Regression matrix in the chosen polynomial basis, an array of shape \(N_e \times N_b\), where \(N_e\) and \(N_b\) correspond to the number of evaluation points and the number of monomials, respectively.
- Return type:
numpy.ndarray
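Continuing the construction sketch above, a hypothetical shape check (the point count and dimension are illustrative):

import numpy as np

rng = np.random.default_rng(42)
xx_eval = rng.uniform(-1, 1, size=(100, 2))  # N_e = 100 points, M = 2 dimensions

rr = model.get_regression_matrix(xx_eval)
print(rr.shape)  # (N_e, N_b): one row per point, one column per monomial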
- fit(xx, yy, weights=None, lstsq_solver='lstsq', **kwargs)[source]#
Fit the ordinary polynomial regression model.
Fitting an (unweighted) ordinary polynomial regression model solves for the set of coefficients \(\boldsymbol{c}\) in the following least-squares problem:
\[\hat{\boldsymbol{c}} = \underset{\boldsymbol{c}}{\mathrm{arg min}} \lVert \boldsymbol{R} \boldsymbol{c} - \boldsymbol{y} \rVert_2^2\]
where:
\(\lVert \cdot \rVert_2^2\) denotes the square of the Euclidean norm;
\(\boldsymbol{R}\) is the regression matrix evaluated on the input data points in the chosen polynomial basis;
\(\boldsymbol{y}\) is the vector of response data points.
Moreover, if a weights matrix \(\boldsymbol{W}\) is provided, then fitting a weighted ordinary polynomial regression model obtains the fitted coefficients \(\hat{\boldsymbol{c}}\) from the following relation:
\[(\boldsymbol{R}^T \boldsymbol{W} \boldsymbol{R}) \hat{\boldsymbol{c}} = \boldsymbol{R}^T \boldsymbol{W} \boldsymbol{y}\]
- Parameters:
xx (numpy.ndarray) – Input matrix, also known as the training inputs.
yy (numpy.ndarray) – Response vector, observed or evaluated at xx.
weights (numpy.ndarray, optional) – Individual weights for each input point. The default is None.
lstsq_solver (Union[str, Callable], optional) – Least-squares solver. The default is "lstsq" from SciPy. The following additional linear solvers are available as pre-defined:
"inv": (Gram) matrix inversion (NumPy)
"pinv": pseudo-inversion of the regression matrix (NumPy)
"dgesv": LU decomposition with partial pivoting solver (SciPy)
"dsysv": diagonal pivoting solver (SciPy)
"dposv": Cholesky decomposition solver (SciPy)
"qr": QR-decomposition-based solver (NumPy)
"svd": SVD-based solver (NumPy)
compute_loocv (bool, optional) – Flag to compute the leave-one-out cross-validation (LOO-CV) error. For problems beyond a certain size, the LOO-CV computation may be costly. The default is True.
- Returns:
The instance itself is updated with a fitted polynomial. After a successful fitting, the instance can be evaluated on a set of query points.
- Return type:
None
Notes
Additional keyword arguments passed via **kwargs are forwarded to the selected least-squares solver. Refer to the documentation of the solver for the list of supported keyword arguments.
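A fitting sketch under the same assumptions as the construction example above; the test function, weights, and solver choice are illustrative:

import numpy as np

rng = np.random.default_rng(0)
xx_train = rng.uniform(-1, 1, size=(200, 2))      # training inputs in [-1, 1]^2
yy_train = np.cos(np.pi * xx_train).prod(axis=1)  # some smooth test function

# Unweighted fit with the default SciPy least-squares solver
model.fit(xx_train, yy_train)

# Weighted fit via the normal equations, here with the Cholesky-based solver;
# unit weights reproduce the unweighted fit
model.fit(xx_train, yy_train, weights=np.ones(200), lstsq_solver="dposv")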
- predict(xx)[source]#
Predict the response at a set of query points using the fitted polynomial model.
- Parameters:
xx (numpy.ndarray) – Query points, a two-dimensional array of shape \(N_q \times M\), where \(N_q\) and \(M\) correspond to the number of query points and the number of spatial dimensions, respectively.
- Returns:
Predicted response at the query points, a one-dimensional array of length \(N_q\), the number of query points.
- Return type:
numpy.ndarray
Notes
A fitted polynomial regression model can be called directly on a set of query points without accessing the predict() method.
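For instance, continuing the fitting sketch above (the query points are illustrative):

import numpy as np

rng = np.random.default_rng(7)
xx_query = rng.uniform(-1, 1, size=(10, 2))  # N_q = 10 query points

yy_pred = model.predict(xx_query)  # one-dimensional array of length N_q
yy_same = model(xx_query)          # direct call, equivalent to predict()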