# Polynomial Regression
Interpolating polynomials are constructed from carefully selected points in the domain (i.e., the unisolvent nodes). In many practical applications, however, the data are given either at randomly scattered points or on a uniform grid. In these situations, a polynomial, albeit not strictly interpolatory, can still be constructed via least squares, which minimizes the error between the polynomial and the given data.
Given a polynomial with respect to a multi-index set \(A\) of the form

\[
Q(\boldsymbol{x}) = \sum_{\boldsymbol{\alpha} \in A} c_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(\boldsymbol{x}),
\]

represented in any basis, and a set of data \(\mathcal{D} = \{ (\boldsymbol{x}^{(i)}, y^{(i)}) \}_{i = 1}^N\), the coefficients \(c_{\boldsymbol{\alpha}}\) can be obtained by solving the least squares problem

\[
\min_{\boldsymbol{c}} \lVert \mathbf{R}_A \boldsymbol{c} - \boldsymbol{y} \rVert_2^2,
\]
where \(\boldsymbol{y} = (y^{(i)})_{i = 1}^N\) is the vector of dependent variables and \(\mathbf{R}_A\) is the regression matrix. The regression matrix, in turn, is obtained by evaluating the basis polynomials at the data points \((\boldsymbol{x}^{(i)})_{i = 1}^N\):

\[
(\mathbf{R}_A)_{i,j} = \Psi_{\boldsymbol{\alpha}_j}(\boldsymbol{x}^{(i)}), \quad i = 1, \ldots, N, \quad j = 1, \ldots, \lvert A \rvert,
\]

where \(\Psi_{\boldsymbol{\alpha}}\) are the basis polynomials corresponding to the chosen polynomial basis.
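As a minimal sketch of this construction, consider a hypothetical one-dimensional example with the monomial basis \(\Psi_{\alpha}(x) = x^{\alpha}\) and the multi-index set \(A = \{0, 1, 2\}\) (the specific data and basis here are illustrative, not part of the text above). The regression matrix is assembled column by column and the least squares problem is solved with NumPy:

```python
import numpy as np

# Illustrative 1D setting: monomial basis Psi_alpha(x) = x**alpha
# with multi-index set A = {0, 1, 2} (a quadratic polynomial).
rng = np.random.default_rng(0)
exponents = np.array([0, 1, 2])          # the multi-index set A
x = rng.uniform(-1.0, 1.0, size=50)      # N = 50 scattered data points
y = 1.0 + 2.0 * x - 3.0 * x**2           # data from a known polynomial

# Regression matrix: (R_A)_{i,j} = Psi_{alpha_j}(x^{(i)}), shape (N, |A|)
R = x[:, np.newaxis] ** exponents

# Least squares solution of min_c ||R c - y||_2^2
coeffs, *_ = np.linalg.lstsq(R, y, rcond=None)
# coeffs recovers approximately [1, 2, -3]
```

Because the data here are generated without noise, the least squares fit recovers the underlying coefficients up to floating-point accuracy; with noisy data the solution would instead be the best approximation in the \(\ell_2\) sense.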
To ensure that the least squares problem above is well-posed, the regression matrix \(\mathbf{R}_A\) must have full column rank; a necessary condition is that the number of data points \(N\) is at least the number of basis polynomials \(\lvert A \rvert\). This guarantees that there is enough data to uniquely determine the coefficients of the polynomial.
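The rank condition can be checked numerically. The following sketch (again using an assumed 1D monomial basis for illustration) contrasts an underdetermined case, \(N < \lvert A \rvert\), with a well-posed one:

```python
import numpy as np

rng = np.random.default_rng(1)
exponents = np.array([0, 1, 2, 3])       # |A| = 4 basis polynomials

# Underdetermined: N = 3 < |A| = 4, so R cannot have full column rank
x_few = rng.uniform(-1.0, 1.0, size=3)
R_few = x_few[:, np.newaxis] ** exponents
print(np.linalg.matrix_rank(R_few))      # 3: rank-deficient, no unique solution

# Well-posed: N = 10 >= |A| = 4 with distinct points -> full column rank
x_ok = rng.uniform(-1.0, 1.0, size=10)
R_ok = x_ok[:, np.newaxis] ** exponents
print(np.linalg.matrix_rank(R_ok))       # 4: coefficients uniquely determined
```

Note that \(N \ge \lvert A \rvert\) alone does not guarantee full rank (e.g., repeated data points can still produce rank deficiency), which is why checking the rank of the assembled matrix is a useful diagnostic.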