# [Technical] Deep Dive into Machine Learning, Part 3: Logistic Regression

## 2 Multinomial Logistic Regression

## 3 Pros and Cons of Logistic Regression

• Pros: low computational cost, fast to train and evaluate, and easy to understand and implement.

• Cons: prone to underfitting, so the accuracy of classification (and regression) may not be high.
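To illustrate why evaluation is so cheap, here is a minimal sketch of the binary decision rule in plain Scala, independent of MLlib (the function names are mine): a prediction is just one dot product plus one sigmoid.

```scala
// Sigmoid squashes a linear score into (0, 1).
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

// Predict 1.0 when the probability exceeds a threshold (0.5 here).
def predict(weights: Array[Double], x: Array[Double]): Double = {
  val margin = weights.zip(x).map { case (w, v) => w * v }.sum
  if (sigmoid(margin) > 0.5) 1.0 else 0.0
}

val w = Array(2.0, -1.0)
println(predict(w, Array(3.0, 1.0)))  // margin = 5.0, sigmoid ≈ 0.993 -> prints 1.0
```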

## 4 Example

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionModel}
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Split the data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

// Train the model.
val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(10)
  .run(training)

// Compute raw scores on the test set.
val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
  val prediction = model.predict(features)
  (prediction, label)
}

// Get evaluation metrics.
val metrics = new MulticlassMetrics(predictionAndLabels)
val precision = metrics.precision
println("Precision = " + precision)

// Save and load the model.
model.save(sc, "myModelPath")
val sameModel = LogisticRegressionModel.load(sc, "myModelPath")
```

## 5.1 Training the Model

Both LogisticRegressionWithLBFGS and LogisticRegressionWithSGD enter training through GeneralizedLinearAlgorithm.run. This method is analyzed in detail below.

```scala
def run(input: RDD[LabeledPoint]): M = {
  if (numFeatures < 0) {
    // Compute the number of features.
    numFeatures = input.map(_.features.size).first()
  }
  val initialWeights = {
    if (numOfLinearPredictor == 1) {
      Vectors.zeros(numFeatures)
    } else if (addIntercept) {
      Vectors.zeros((numFeatures + 1) * numOfLinearPredictor)
    } else {
      Vectors.zeros(numFeatures * numOfLinearPredictor)
    }
  }
  run(input, initialWeights)
}
```

## 5.1.1 Scaling the Features and Appending an Intercept Based on the Given Parameters

```scala
val scaler = if (useFeatureScaling) {
  new StandardScaler(withStd = true, withMean = false).fit(input.map(_.features))
} else {
  null
}

val data =
  if (addIntercept) {
    if (useFeatureScaling) {
      input.map(lp => (lp.label, appendBias(scaler.transform(lp.features)))).cache()
    } else {
      input.map(lp => (lp.label, appendBias(lp.features))).cache()
    }
  } else {
    if (useFeatureScaling) {
      input.map(lp => (lp.label, scaler.transform(lp.features))).cache()
    } else {
      input.map(lp => (lp.label, lp.features))
    }
  }

val initialWeightsWithIntercept = if (addIntercept && numOfLinearPredictor == 1) {
  appendBias(initialWeights)
} else {
  /** If `numOfLinearPredictor > 1`, initialWeights already contains intercepts. */
  initialWeights
}
```

```scala
def appendBias(vector: Vector): Vector = {
  vector match {
    case dv: DenseVector =>
      val inputValues = dv.values
      val inputLength = inputValues.length
      val outputValues = Array.ofDim[Double](inputLength + 1)
      System.arraycopy(inputValues, 0, outputValues, 0, inputLength)
      outputValues(inputLength) = 1.0
      Vectors.dense(outputValues)
    case sv: SparseVector =>
      val inputValues = sv.values
      val inputIndices = sv.indices
      val inputValuesLength = inputValues.length
      val dim = sv.size
      val outputValues = Array.ofDim[Double](inputValuesLength + 1)
      val outputIndices = Array.ofDim[Int](inputValuesLength + 1)
      System.arraycopy(inputValues, 0, outputValues, 0, inputValuesLength)
      System.arraycopy(inputIndices, 0, outputIndices, 0, inputValuesLength)
      outputValues(inputValuesLength) = 1.0
      outputIndices(inputValuesLength) = dim
      Vectors.sparse(dim + 1, outputIndices, outputValues)
    case _ => throw new IllegalArgumentException(s"Do not support vector type ${vector.getClass}")
  }
}
```
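The dense branch above simply copies the values and appends a trailing 1.0, so that the last weight acts as the intercept. A minimal, self-contained sketch of the same idea on a plain array (not MLlib's API):

```scala
// Append a bias term (constant 1.0) to a dense feature array,
// mirroring the DenseVector branch of appendBias above.
def appendBias(features: Array[Double]): Array[Double] = {
  val out = Array.ofDim[Double](features.length + 1)
  System.arraycopy(features, 0, out, 0, features.length)
  out(features.length) = 1.0
  out
}

println(appendBias(Array(0.5, 2.0)).mkString(","))  // prints 0.5,2.0,1.0
```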

## 5.1.2 Computing the Final Weights with the Optimization Algorithm

```scala
val weightsWithIntercept = optimizer.optimize(data, initialWeightsWithIntercept)
```

```scala
// In GradientDescent:
private val gradient = new LogisticGradient()
private val updater = new SquaredL2Updater()
override val optimizer = new GradientDescent(gradient, updater)
  .setStepSize(stepSize)
  .setNumIterations(numIterations)
  .setRegParam(regParam)
  .setMiniBatchFraction(miniBatchFraction)

// In LBFGS:
override val optimizer = new LBFGS(new LogisticGradient, new SquaredL2Updater)
```

```scala
val margin = -1.0 * dot(data, weights)
val multiplier = (1.0 / (1.0 + math.exp(margin))) - label
// y += a * x, i.e. cumGradient += multiplier * data
axpy(multiplier, data, cumGradient)
if (label > 0) {
  // The following is equivalent to log(1 + exp(margin)) but more numerically stable.
  MLUtils.log1pExp(margin)
} else {
  MLUtils.log1pExp(margin) - margin
}
```
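The same margin/multiplier/loss computation can be traced by hand for a single example. A self-contained plain-Scala sketch (function name and the inlined log1pExp are mine, following the formulas above, not MLlib's API):

```scala
// One evaluation of the binary logistic gradient for a single (label, x) pair.
def gradientAndLoss(weights: Array[Double], x: Array[Double], label: Double)
    : (Array[Double], Double) = {
  val margin = -1.0 * weights.zip(x).map { case (w, v) => w * v }.sum
  val multiplier = 1.0 / (1.0 + math.exp(margin)) - label
  // cumGradient += multiplier * x, here built fresh instead of accumulated in place.
  val grad = x.map(_ * multiplier)
  // log(1 + exp(m)) written in the numerically stable form used by MLUtils.log1pExp.
  def log1pExp(m: Double): Double =
    if (m > 0) m + math.log1p(math.exp(-m)) else math.log1p(math.exp(m))
  val loss = if (label > 0) log1pExp(margin) else log1pExp(margin) - margin
  (grad, loss)
}

// label = 1, x = (2, 0), w = (1, 1): margin = -2, multiplier ≈ -0.119.
val (g, loss) = gradientAndLoss(Array(1.0, 1.0), Array(2.0, 0.0), label = 1.0)
```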

```scala
// Weights.
val weightsArray = weights match {
  case dv: DenseVector => dv.values
  case _ =>
    throw new IllegalArgumentException
}
// Gradient.
val cumGradientArray = cumGradient match {
  case dv: DenseVector => dv.values
  case _ =>
    throw new IllegalArgumentException
}
// Compute the largest margin over all classes.
var marginY = 0.0
var maxMargin = Double.NegativeInfinity
var maxMarginIndex = 0
val margins = Array.tabulate(numClasses - 1) { i =>
  var margin = 0.0
  data.foreachActive { (index, value) =>
    if (value != 0.0) margin += value * weightsArray((i * dataSize) + index)
  }
  if (i == label.toInt - 1) marginY = margin
  if (margin > maxMargin) {
    maxMargin = margin
    maxMarginIndex = i
  }
  margin
}
// Compute sum, shifting the margins so each is at most 0 to avoid arithmetic overflow.
val sum = {
  var temp = 0.0
  if (maxMargin > 0) {
    for (i <- 0 until numClasses - 1) {
      margins(i) -= maxMargin
      if (i == maxMarginIndex) {
        temp += math.exp(-maxMargin)
      } else {
        temp += math.exp(margins(i))
      }
    }
  } else {
    for (i <- 0 until numClasses - 1) {
      temp += math.exp(margins(i))
    }
  }
  temp
}
// Compute the multiplier and accumulate the gradient.
for (i <- 0 until numClasses - 1) {
  val multiplier = math.exp(margins(i)) / (sum + 1.0) - {
    if (label != 0.0 && label == i + 1) 1.0 else 0.0
  }
  data.foreachActive { (index, value) =>
    if (value != 0.0) cumGradientArray(i * dataSize + index) += multiplier * value
  }
}
// Compute the loss.
val loss = if (label > 0.0) math.log1p(sum) - marginY else math.log1p(sum)
if (maxMargin > 0) {
  loss + maxMargin
} else {
  loss
}
```
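The maxMargin subtraction above is the standard log-sum-exp stabilization: shifting every margin by the largest positive one keeps exp() from overflowing, and the shift is added back to the loss at the end. A compact plain-Scala sketch of the same identity (function name is mine):

```scala
// Numerically stable log(1 + sum_i exp(margin_i)): shift by the largest
// positive margin so no exp() overflows, then add the shift back.
// log(1 + Σ exp(m_i)) = log(exp(-M) + Σ exp(m_i - M)) + M, with M = max(0, max_i m_i).
def stableLossTerm(margins: Array[Double]): Double = {
  val maxMargin = margins.max.max(0.0)
  val sum = margins.map(m => math.exp(m - maxMargin)).sum + math.exp(-maxMargin)
  math.log(sum) + maxMargin
}
```

With a margin of 1000.0 the naive `math.log1p(math.exp(1000.0))` overflows to infinity, while the shifted version returns approximately 1000.0.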

• SquaredL2Updater

```scala
class SquaredL2Updater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    // w' = w - thisIterStepSize * (gradient + regParam * w)
    // w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
    // The step size, i.e. how far to move along the negative gradient direction.
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
    // Regularization: scale every entry of brzWeights by (1.0 - thisIterStepSize * regParam).
    brzWeights :*= (1.0 - thisIterStepSize * regParam)
    // y += x * a, i.e. brzWeights -= gradient * thisIterStepSize
    brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
    // The L2 norm ||w||_2 for the regularization term.
    val norm = brzNorm(brzWeights, 2.0)
    (Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
  }
}
```

```
w' = w - thisIterStepSize * (gradient + regParam * w)
   = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
```
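One update step following this rule can be sketched in plain Scala (the function name and arguments are mine, not MLlib's; the step size decays as stepSize / sqrt(iter), matching the updater above):

```scala
// One gradient-descent step with squared-L2 regularization:
// w' = (1 - eta * regParam) * w - eta * gradient, where eta = stepSize / sqrt(iter).
def l2Step(w: Array[Double], grad: Array[Double],
           stepSize: Double, iter: Int, regParam: Double): Array[Double] = {
  val eta = stepSize / math.sqrt(iter)
  w.zip(grad).map { case (wi, gi) => (1.0 - eta * regParam) * wi - eta * gi }
}

// eta = 1 on the first iteration: (1 - 0.1) * 1.0 - 1 * 0.5 = 0.4.
val w1 = l2Step(Array(1.0), Array(0.5), stepSize = 1.0, iter = 1, regParam = 0.1)
```

Note that the regularization shrinks the weights toward zero on every step, independently of the gradient.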

## 5.1.3 Post-processing the Final Weights

```scala
val intercept = if (addIntercept && numOfLinearPredictor == 1) {
  weightsWithIntercept(weightsWithIntercept.size - 1)
} else {
  0.0
}
var weights = if (addIntercept && numOfLinearPredictor == 1) {
  Vectors.dense(weightsWithIntercept.toArray.slice(0, weightsWithIntercept.size - 1))
} else {
  weightsWithIntercept
}
```

```scala
if (useFeatureScaling) {
  if (numOfLinearPredictor == 1) {
    weights = scaler.transform(weights)
  } else {
    var i = 0
    val n = weights.size / numOfLinearPredictor
    val weightsArray = weights.toArray
    while (i < numOfLinearPredictor) {
      // Exclude the intercept.
      val start = i * n
      val end = (i + 1) * n - { if (addIntercept) 1 else 0 }
      val partialWeightsArray = scaler.transform(
        Vectors.dense(weightsArray.slice(start, end))).toArray
      System.arraycopy(partialWeightsArray, 0, weightsArray, start, partialWeightsArray.size)
      i += 1
    }
    weights = Vectors.dense(weightsArray)
  }
}
```
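The reason scaler.transform can be reused on the weights: if training used x / std, then wScaled · (x / std) = (wScaled / std) · x, so dividing the weights by the same std recovers weights that apply to the unscaled features. A plain-Scala check of that identity (all names here are mine, chosen for illustration):

```scala
// Weights learned on std-scaled features are converted back by dividing by std:
// wScaled . (x / std) == (wScaled / std) . x
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (u, v) => u * v }.sum

val std = Array(2.0, 4.0)
val wScaled = Array(3.0, 8.0)
val x = Array(1.0, 2.0)

val marginScaledSpace = dot(wScaled, x.zip(std).map { case (v, s) => v / s })
val marginOriginal    = dot(wScaled.zip(std).map { case (w, s) => w / s }, x)
println(marginScaledSpace == marginOriginal)  // prints true
```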

## 5.1.4 Creating the Model

```scala
createModel(weights, intercept)
```

## 5.2 Prediction

• The binary case

```scala
val margin = dot(weightMatrix, dataMatrix) + intercept
val score = 1.0 / (1.0 + math.exp(-margin))
threshold match {
  case Some(t) => if (score > t) 1.0 else 0.0
  case None => score
}
```

• The multiclass case

```scala
var bestClass = 0
var maxMargin = 0.0
val withBias = dataMatrix.size + 1 == dataWithBiasSize
(0 until numClasses - 1).foreach { i =>
  var margin = 0.0
  dataMatrix.foreachActive { (index, value) =>
    if (value != 0.0) margin += value * weightsArray((i * dataWithBiasSize) + index)
  }
  // Intercept is required to be added into margin.
  if (withBias) {
    margin += weightsArray((i * dataWithBiasSize) + dataMatrix.size)
  }
  if (margin > maxMargin) {
    maxMargin = margin
    bestClass = i + 1
  }
}
bestClass.toDouble
```
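The loop above computes one margin per non-reference class and returns the argmax; class 0 is the pivot with an implicit margin of 0, so it wins whenever every other margin is non-positive. A self-contained plain-Scala sketch of the same rule, without the intercept handling (the function name is mine):

```scala
// Multinomial prediction: numClasses - 1 flattened weight rows,
// class 0 is the pivot class with implicit margin 0.
def predictMulti(weights: Array[Double], x: Array[Double], numClasses: Int): Double = {
  val n = x.length
  var bestClass = 0
  var maxMargin = 0.0
  for (i <- 0 until numClasses - 1) {
    var margin = 0.0
    for (j <- 0 until n) margin += x(j) * weights(i * n + j)
    if (margin > maxMargin) { maxMargin = margin; bestClass = i + 1 }
  }
  bestClass.toDouble
}

// Three classes, two features, no intercept: one weight row each for classes 1 and 2.
val w = Array(1.0, 0.0,   // class 1
              0.0, 1.0)   // class 2
println(predictMulti(w, Array(0.2, 2.0), numClasses = 3))  // class 2 has the largest margin -> prints 2.0
```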

