Polynomial Regression

Interpolating polynomials are constructed from carefully selected points in the domain (i.e., the unisolvent nodes). In many practical applications, however, the data are either randomly scattered or arranged on a uniform grid. In these situations, a polynomial, albeit not strictly interpolatory, can still be constructed via least squares, which minimizes the discrepancy between the given data and the polynomial.

Given a polynomial with respect to a multi-index set \(A\) of the form

\[Q(\boldsymbol{x}) = \sum_{\boldsymbol{\alpha} \in A} c_{\boldsymbol{\alpha}} \, \Psi_{\boldsymbol{\alpha}} (\boldsymbol{x}), \tag{6}\]

represented in any basis and a set of data \(\mathcal{D} = \{ (\boldsymbol{x}^{(i)}, y^{(i)}) \}_{i = 1}^N\), the coefficients \(c_{\boldsymbol{\alpha}}\) can be obtained by solving the least squares problem

\[\hat{\boldsymbol{c}} = \underset{\boldsymbol{c} \in \mathbb{R}^{\lvert A \rvert}}{\mathrm{argmin}} \; \lVert \mathbf{R}_A \boldsymbol{c} - \boldsymbol{y} \rVert^2_2,\]

where \(\boldsymbol{y} = (y^{(i)})_{i = 1}^N\) is the vector of dependent variables and \(\mathbf{R}_A\) is the regression matrix. The regression matrix, in turn, is obtained by evaluating the basis polynomials at the data points \((\boldsymbol{x}^{(i)})_{i = 1}^N\):

\[\mathbf{R}_A = \left( \Psi_{\boldsymbol{\alpha}} \left( \boldsymbol{x}^{(i)} \right) \right)_{i = 1, \ldots, N, \boldsymbol{\alpha} \in A} \in \mathbb{R}^{N \times \lvert A \rvert}.\]

Here, \(\Psi_{\boldsymbol{\alpha}}\) denotes the basis polynomials that correspond to the chosen polynomial basis.
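To make this concrete, below is a minimal sketch in Python with NumPy, assuming a one-dimensional setting in which the multi-index set \(A\) reduces to the exponents \(\{0, 1, \ldots, d\}\) and the basis polynomials \(\Psi_{\alpha}\) are plain monomials \(x^{\alpha}\). The data, degree, and random seed are hypothetical and only serve to illustrate the construction.

```python
import numpy as np

# Hypothetical scattered 1D data (x^(i), y^(i)), i = 1, ..., N
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, size=50)            # N = 50 data points
y = np.sin(np.pi * x) + 0.05 * rng.standard_normal(50)

# Multi-index set A: in 1D it reduces to the exponents {0, 1, ..., d}
d = 5
A = np.arange(d + 1)                           # |A| = d + 1 basis polynomials

# Regression matrix R_A: evaluate each basis polynomial Psi_alpha
# (here the monomial x**alpha) at every data point; shape (N, |A|)
R = x[:, np.newaxis] ** A[np.newaxis, :]

# Solve the least-squares problem argmin_c ||R c - y||_2^2
c_hat, residuals, rank, sing_vals = np.linalg.lstsq(R, y, rcond=None)

# Evaluate the fitted polynomial Q at new points
x_new = np.linspace(-1.0, 1.0, 11)
Q_new = (x_new[:, np.newaxis] ** A[np.newaxis, :]) @ c_hat
```

Note that `np.linalg.lstsq` solves the problem via an SVD-based LAPACK routine rather than explicitly forming the normal equations \(\mathbf{R}_A^\top \mathbf{R}_A \boldsymbol{c} = \mathbf{R}_A^\top \boldsymbol{y}\), which tends to be numerically more robust.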

To ensure that the least-squares problem above is well-posed, the regression matrix \(\mathbf{R}_A\) must have full column rank, which in particular requires that the number of data points \(N\) be at least the number of basis polynomials \(\lvert A \rvert\). Full column rank guarantees that the data uniquely determine the coefficients of the polynomial.
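As a quick numerical sanity check (again only a sketch, reusing the hypothetical matrix `R` from the snippet above), the rank condition can be verified before solving:

```python
# Well-posedness check for the regression matrix R of shape (N, |A|):
# full column rank requires rank(R) == |A|, which in turn needs N >= |A|.
N, n_basis = R.shape
assert N >= n_basis, "fewer data points than basis polynomials"
assert np.linalg.matrix_rank(R) == n_basis, "regression matrix is rank-deficient"

# A large condition number signals a badly conditioned (near rank-deficient)
# least-squares problem even when the rank condition formally holds.
print(np.linalg.cond(R))
```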