A Review of the Least Squares Method


What least squares is used for

Least squares is commonly used to estimate the slope and intercept of a linear regression, producing a linear least squares fit.

res = ys - (inter + slope * xs)

res: residuals
ys: sequence of values of the dependent variable
xs: sequence of values of the independent variable
inter: intercept
slope: slope
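As a quick illustration of the residual formula, the sketch below computes `res` with NumPy for a small made-up data set and a guessed line. The data and the values of `inter` and `slope` are hypothetical. They are not a fitted result.

```python
import numpy as np

# Hypothetical sample data, made up for illustration only
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.1, 3.9, 6.2, 7.8])

# A guessed candidate line (not the least squares fit)
inter, slope = 0.0, 2.0

# Residual = observed value minus the value the line predicts
res = ys - (inter + slope * xs)
print(res)
```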

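Ideally, we want values of inter and slope that make the residuals as small as possible in absolute value. The usual approach is to minimize the sum of the squared residuals. Squaring removes the effect of each residual's sign, and it gives larger residuals more weight.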

```python
sum(res**2)
```
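To make the criterion concrete, this sketch computes `sum(res**2)` for two candidate lines and keeps the one with the smaller value. The helper name `sse`, the data, and the candidate `(inter, slope)` pairs are all hypothetical. For the first candidate the plain sum of residuals is close to zero because the positive and negative residuals cancel, while the sum of squares does not cancel.

```python
import numpy as np

def sse(xs, ys, inter, slope):
    """Sum of squared residuals for the line y = inter + slope * x (hypothetical helper)."""
    res = np.asarray(ys) - (inter + slope * np.asarray(xs))
    return np.sum(res ** 2)

# Hypothetical data for illustration
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.1, 3.9, 6.2, 7.8])

# Two hypothetical candidate (inter, slope) pairs
candidates = [(0.0, 2.0), (1.0, 1.5)]
scores = [sse(xs, ys, b0, b1) for b0, b1 in candidates]

# Keep the candidate with the smallest sum of squared residuals
best = candidates[int(np.argmin(scores))]

# A plain (unsquared) sum lets residuals of opposite sign cancel out
plain_sum = np.sum(ys - (0.0 + 2.0 * xs))
print(scores, best, plain_sum)
```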

How least squares works

For the detailed algorithms, see https://en.wikipedia.org/wiki/Numerical_methods_for_linear_least_square
https://zh.wikipedia.org/wiki/%E6%9C%80%E5%B0%8F%E4%BA%8C%E4%B9%98%E6%B3%95
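The linked articles describe matrix-based solvers. As one sketch of that approach, the same straight-line fit can be posed as a design matrix with a column of ones (for the intercept) next to the x values and passed to NumPy's `np.linalg.lstsq`. The data are made up for illustration.

```python
import numpy as np

# Hypothetical data for illustration
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.1, 3.9, 6.2, 7.8])

# Design matrix: column of ones (intercept term) beside the x values
A = np.column_stack([np.ones_like(xs), xs])

# lstsq minimizes ||A @ beta - ys||^2; rcond=None selects NumPy's default cutoff
beta, residuals, rank, sv = np.linalg.lstsq(A, ys, rcond=None)
inter, slope = beta
print(inter, slope)
```

The same formulation extends to more than one explanatory variable by adding columns to `A`.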


Linear model

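For a model f with estimated parameters, least squares chooses the estimates that minimize the sum of squared residuals Q:
Q = \sum_{i=1}^{n} \ [y_i - f(\vec{x}_i;\hat{\vec{\beta}})]^2
For the simple linear model, where \varepsilon is the error term,
y = \beta_0 + \beta_1x + \varepsilon \, .
this becomes
Q = \sum_{i=1}^{n} \ [y_i - (\hat{\beta}_0 + \hat{\beta}_1x_i)]^2 \, .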

The estimates are obtained by
1) taking the partial derivatives of Q with respect to β̂0 and β̂1,
2) setting each partial derivative equal to zero, and
3) solving the resulting system of two equations in two unknowns.
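Written out for the linear model, steps 1) and 2) give the two normal equations below. Solving them, with \bar{x} and \bar{y} denoting the sample means, leads to the closed-form estimates.

```latex
% Steps 1) and 2): set both partial derivatives of Q to zero
\frac{\partial Q}{\partial \hat{\beta}_0} = -2\sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1x_i)] = 0
\frac{\partial Q}{\partial \hat{\beta}_1} = -2\sum_{i=1}^{n} x_i[y_i - (\hat{\beta}_0 + \hat{\beta}_1x_i)] = 0
% Step 3): dividing the first equation by -2n gives
\bar{y} = \hat{\beta}_0 + \hat{\beta}_1\bar{x}
% substituting \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} into the second equation gives
\sum_{i=1}^{n} x_i(y_i-\bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i(x_i-\bar{x})
% since \sum_{i}(y_i-\bar{y}) = 0 and \sum_{i}(x_i-\bar{x}) = 0, x_i may be replaced by (x_i-\bar{x}) on both sides
```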

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n} (x_i-\bar{x})^2}
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}
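A direct translation of these two formulas into NumPy might look like the sketch below. The function name `fit_line` and the data are hypothetical. The result is cross-checked against `np.polyfit` with degree 1, which returns the coefficients highest degree first, i.e. `[slope, intercept]`.

```python
import numpy as np

def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (beta0_hat, beta1_hat)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    dx = xs - xs.mean()  # x_i - x_bar
    dy = ys - ys.mean()  # y_i - y_bar
    beta1 = np.sum(dx * dy) / np.sum(dx ** 2)
    beta0 = ys.mean() - beta1 * xs.mean()
    return beta0, beta1

# Hypothetical data for illustration
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = fit_line(xs, ys)

# Cross-check with NumPy's degree-1 polynomial fit
slope_ref, inter_ref = np.polyfit(xs, ys, 1)
print(b0, b1, inter_ref, slope_ref)
```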

Example code

The book Think Stats provides a Python implementation.

See https://github.com/AllenDowney/ThinkStats2/blob/master/thinkstats2/thinkstats2.py

The relevant code is below.


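```python
import numpy as np


def LeastSquares(xs, ys):
    """Computes a linear least squares fit for ys as a function of xs.
    Args:
        xs: sequence of values
        ys: sequence of values
    Returns:
        tuple of (intercept, slope)
    """
    meanx, varx = MeanVar(xs)
    meany = Mean(ys)

    # slope = Cov(x, y) / Var(x); both divide by n, so n cancels
    slope = Cov(xs, ys, meanx, meany) / varx
    # the fitted line passes through the point (meanx, meany)
    inter = meany - slope * meanx

    return inter, slope

def MeanVar(xs, ddof=0):
    """Computes mean and variance.
    Based on http://stackoverflow.com/questions/19391149/
    numpy-mean-and-variance-from-single-function
    xs: sequence of values
    ddof: delta degrees of freedom

    returns: pair of float, mean and var
    """
    xs = np.asarray(xs)
    mean = xs.mean()
    s2 = Var(xs, mean, ddof)
    return mean, s2

def Mean(xs):
    """Computes mean.
    xs: sequence of values
    returns: float mean
    """
    return np.mean(xs)

def Cov(xs, ys, meanx=None, meany=None):
    """Computes Cov(X, Y).
    Args:
        xs: sequence of values
        ys: sequence of values
        meanx: optional float mean of xs
        meany: optional float mean of ys
    Returns:
        Cov(X, Y)
    """
    xs = np.asarray(xs)
    ys = np.asarray(ys)

    if meanx is None:
        meanx = np.mean(xs)
    if meany is None:
        meany = np.mean(ys)

    # population covariance: divide by n
    cov = np.dot(xs-meanx, ys-meany) / len(xs)
    return cov

# Var is called by MeanVar but is not part of the snippet above.
# This minimal version is an assumption added so the code runs on its own:
# it returns sum((x - mu)^2) / (n - ddof).
def Var(xs, mu=None, ddof=0):
    """Computes variance.
    xs: sequence of values
    mu: optional known mean
    ddof: delta degrees of freedom
    returns: float
    """
    xs = np.asarray(xs)
    if mu is None:
        mu = xs.mean()
    ds = xs - mu
    return np.dot(ds, ds) / (len(xs) - ddof)
```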
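For comparison, the same slope and intercept can be computed with NumPy built-ins. This sketch is not from Think Stats. It uses `np.cov` with `bias=True` so that it divides by n like `Cov` above, and `np.var`, whose default `ddof=0` matches the default of `MeanVar`. The data are hypothetical.

```python
import numpy as np

# Hypothetical data for illustration
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.1, 3.9, 6.2, 7.8])

# bias=True -> divide by n (population covariance), like Cov
cov_xy = np.cov(xs, ys, bias=True)[0, 1]
# np.var divides by n by default (ddof=0), like MeanVar
var_x = np.var(xs)

slope = cov_xy / var_x
inter = ys.mean() - slope * xs.mean()
print(inter, slope)
```

What matters is that the covariance and the variance use the same divisor, because it cancels in the ratio.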

Category: Mathematics