Objective functions in regression are mathematical formulas that quantify the error, or discrepancy, between a model's predicted outputs and the actual observed data. Their primary role is to provide a measurable target for the regression algorithm to optimize, usually by minimizing this error. The ultimate goal is to find the optimal set of parameters (the weights, or 'ws') that best fit the data, enabling accurate predictions.
In essence, an objective function measures how "wrong" a model's predictions are. For any given data point, the difference between the true dependent variable (yᵢ) and the model's estimated dependent variable (ŷᵢ) is known as the residual error (eᵢ = yᵢ - ŷᵢ). The objective function takes these individual residual errors and aggregates them into a single value that the regression algorithm then seeks to minimize.
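To make the notation concrete, here is a minimal Python sketch (using NumPy, with made-up values) that computes the residual for each data point before any aggregation:

```python
import numpy as np

# Hypothetical observed values and model predictions (illustrative numbers only)
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 11.0])

# Residual errors: e_i = y_i - y_hat_i
residuals = y_true - y_pred
print(residuals)  # approximately [ 0.2 -0.4  0.5 -1. ]
```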
Understanding the Purpose of Objective Functions
The core purpose of an objective function, often also called a loss function or cost function, is to guide the learning process of a regression model. Without it, the model wouldn't know which set of parameters produces a better fit to the data.
Here's a breakdown of their importance:
- Quantifying Error: They provide a numerical measure of the model's performance. A lower value generally indicates a better-performing model.
- Guiding Optimization: During model training, an optimization algorithm (like Gradient Descent) iteratively adjusts the model's parameters to reduce the value of the objective function.
- Parameter Estimation: By minimizing the objective function, the model estimates the optimal coefficients (the 'ws' mentioned previously) that define the relationship between the independent and dependent variables.
- Model Comparison: Different regression models or different parameter sets for the same model can be compared based on their objective function values, helping in selecting the best model.
Common Types of Objective Functions in Regression
Various objective functions are used, each with its own characteristics and suitability for different types of data or problem requirements.
1. Mean Squared Error (MSE)
MSE is one of the most widely used objective functions. It calculates the average of the squared differences between predicted and actual values.
- Formula: $MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
- Characteristics:
- Penalizes larger errors more significantly due to the squaring.
- Results in a differentiable function, which is beneficial for optimization algorithms.
- Highly sensitive to outliers, as they contribute disproportionately to the error.
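As a small illustration, the following Python sketch computes MSE with NumPy; the `mse` helper and the sample arrays are just for this example, not a library function:

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Squared Error: average of the squared residuals."""
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 11.0])
print(mse(y_true, y_pred))  # ≈ 0.3625
```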
2. Root Mean Squared Error (RMSE)
RMSE is simply the square root of the MSE. It brings the error measure back into the same units as the dependent variable, making it more interpretable.
- Formula: $RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$
- Characteristics:
- Shares similar properties with MSE regarding outlier sensitivity.
- Easier to understand in the context of the original data scale.
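A corresponding sketch for RMSE (again a hypothetical helper rather than a library function), which simply takes the square root of the mean squared residual:

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root Mean Squared Error: square root of MSE, in the units of y."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(rmse(np.array([3.0, 5.0, 7.5, 10.0]),
           np.array([2.8, 5.4, 7.0, 11.0])))  # ≈ 0.602
```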
3. Mean Absolute Error (MAE)
MAE calculates the average of the absolute differences between predicted and actual values.
- Formula: $MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$
- Characteristics:
- Less sensitive to outliers compared to MSE/RMSE because it doesn't square the errors.
- Provides a more robust measure of average error when the data contains extreme values.
- Its gradient is undefined at zero error (the absolute value function is not differentiable there), which can sometimes pose challenges for certain gradient-based optimization algorithms.
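And a sketch of MAE on the same illustrative data:

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Absolute Error: average of the absolute residuals."""
    return float(np.mean(np.abs(y_true - y_pred)))

print(mae(np.array([3.0, 5.0, 7.5, 10.0]),
          np.array([2.8, 5.4, 7.0, 11.0])))  # ≈ 0.525
```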
4. Huber Loss (Smooth MAE)
Huber Loss is a hybrid function that combines the strengths of MSE and MAE. It is quadratic for small errors and linear for large errors, making it less sensitive to outliers than MSE while remaining differentiable.
- Formula: $L_\delta(e_i) = \begin{cases} \frac{1}{2} e_i^2 & \text{if } |e_i| \le \delta \\ \delta \left( |e_i| - \frac{1}{2}\delta \right) & \text{otherwise} \end{cases}$, where $e_i = y_i - \hat{y}_i$
- Characteristics:
- Robust to outliers, like MAE.
- Differentiable, like MSE, which is good for optimization.
- Requires tuning a parameter (delta, $\delta$) to define the threshold between quadratic and linear behavior.
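A possible NumPy implementation of Huber Loss, with `delta` as the tunable threshold described above (the default of 1.0 is just an example choice):

```python
import numpy as np

def huber_loss(y_true: np.ndarray, y_pred: np.ndarray, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    residual = y_true - y_pred
    quadratic = 0.5 * residual ** 2
    linear = delta * (np.abs(residual) - 0.5 * delta)
    return float(np.mean(np.where(np.abs(residual) <= delta, quadratic, linear)))

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 11.0])
print(huber_loss(y_true, y_pred))  # ≈ 0.181 (all residuals fall in the quadratic region here)
```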
5. Log-Cosh Loss
Log-Cosh Loss is another robust objective that is smoother than Huber Loss. It is the logarithm of the hyperbolic cosine of the prediction error.
- Formula: $\text{LogCosh} = \frac{1}{N} \sum_{i=1}^{N} \log\left(\cosh(\hat{y}_i - y_i)\right)$
- Characteristics:
- Works similarly to MSE for small errors and MAE for large errors.
- Twice differentiable everywhere and convex, which makes it convenient for gradient-based optimizers.
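One way to implement Log-Cosh Loss in NumPy is sketched below; the `logaddexp` identity is used only to keep `cosh` from overflowing for large residuals, and is an implementation detail rather than part of the definition:

```python
import numpy as np

def log_cosh_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Log-Cosh loss, averaged over the data.

    Uses log(cosh(x)) = logaddexp(x, -x) - log(2) for numerical stability.
    """
    residual = y_pred - y_true
    return float(np.mean(np.logaddexp(residual, -residual) - np.log(2.0)))

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 11.0])
print(log_cosh_loss(y_true, y_pred))  # ≈ 0.163
```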
Comparison of Common Regression Objective Functions
| Objective Function | Characteristics | Sensitivity to Outliers | Primary Use Case |
|---|---|---|---|
| Mean Squared Error (MSE) | Penalizes larger errors heavily; differentiable. | High | General purpose, when large errors are critical. |
| Root Mean Squared Error (RMSE) | Scale-interpretable version of MSE. | High | Similar to MSE, but easier to interpret. |
| Mean Absolute Error (MAE) | Treats all errors equally; robust. | Low | When outliers are present and should not dominate. |
| Huber Loss | Combines MSE (small errors) and MAE (large errors). | Moderate (tunable via $\delta$) | When robustness to outliers is needed, but differentiability is also important. |
| Log-Cosh Loss | Smooth function that behaves like MSE for small errors and MAE for large errors. | Moderate | When a robust, everywhere-differentiable objective is preferred. |
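The outlier column of the table can be illustrated with a short sketch (made-up numbers): corrupting a single prediction inflates MSE far more than MAE.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_clean = np.array([2.8, 5.4, 7.0, 11.0])
y_outlier = y_clean.copy()
y_outlier[3] = 30.0  # one badly mispredicted point

for name, loss in [
    ("MSE", lambda t, p: np.mean((t - p) ** 2)),
    ("MAE", lambda t, p: np.mean(np.abs(t - p))),
]:
    print(name, loss(y_true, y_clean), loss(y_true, y_outlier))
# MSE jumps from ≈ 0.36 to ≈ 100.1, while MAE rises only from ≈ 0.53 to ≈ 5.3,
# showing how squaring lets a single outlier dominate the objective.
```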
How Objective Functions Drive Model Training
During the training phase of a regression model, the process typically involves these steps:
1. Initialization: The model's parameters (weights and biases) are initialized, often randomly.
2. Prediction: For each data point in the training set, the model makes a prediction (ŷᵢ) based on the current parameters.
3. Error Calculation: The objective function calculates the total error based on the difference between the actual values (yᵢ) and the predicted values (ŷᵢ).
4. Parameter Adjustment: An optimization algorithm uses the calculated error to determine how to adjust the model's parameters to reduce this error. For example, in linear regression, the model estimates the 'ws' (coefficients) that minimize the sum of squared residuals, which is equivalent to using MSE as the objective.
5. Iteration: Steps 2-4 are repeated iteratively until the objective function's value is minimized (or converges to a stable minimum), indicating that the model has learned the best possible fit to the training data.
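These steps can be made concrete with a minimal gradient-descent sketch for simple linear regression with MSE as the objective; the synthetic data, learning rate, and iteration count are arbitrary choices for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known line y = 2x + 1, plus noise (illustrative only)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=100)

# Step 1: initialize the parameters (a weight and a bias)
w, b = 0.0, 0.0
learning_rate = 0.01

for _ in range(2000):
    # Step 2: predict with the current parameters
    y_pred = w * x + b
    # Step 3: evaluate the objective (MSE) via its gradients
    error = y_pred - y
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Step 4: adjust the parameters in the direction that reduces the error
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should approach the true values 2 and 1
```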
Choosing the right objective function is crucial, as it directly impacts how the model learns and what kind of errors it prioritizes minimizing. This choice often depends on the specific dataset, the presence of outliers, and the desired behavior of the model.