Multiple Variable Linear Regression

Multiple Variable Linear Regression#

This documentation demonstrates a comprehensive analysis of sea surface temperature (SST) variability explained by two climate indices (AO and Niño 3.4) using multiple linear regression. The analysis covers data preparation, index calculation, regression modeling, and visualization of results.

\[y = a_1 x_1 + a_2 x_2\]

Before proceeding with all the steps, first import some necessary libraries and packages

import xarray as xr
import cartopy.crs as ccrs
import easyclimate as ecl

Two time ranges are defined to account for seasonal analysis with different base periods

  • time_range: Primary analysis period (1982-2020)

  • time_range_plus1: Offset by one year for seasonal calculations

time_range = slice("1982-01-01", "2020-12-31")
time_range_plus1 = slice("1983-01-01", "2021-12-31")

The Arctic Oscillation index is calculated using:

  1. Sea level pressure (SLP) data from the Northern Hemisphere

  2. Seasonal mean calculation for December-January-February (DJF)

  3. EOF analysis following Thompson & Wallace (1998) methodology

The resulting index is then subset to our analysis period.

<xarray.DataArray 'PC' (time: 39)> Size: 312B
array([-0.99517237, -0.37496718,  1.10574761,  1.04859641,  0.12535887,
        0.10322009, -2.66546857, -1.16177425, -0.662752  , -1.67955732,
       -2.06468711,  0.1944445 , -1.04616313,  0.59380495, -0.50431692,
        0.06580437, -0.79229132, -1.28876489,  0.77994097, -0.80616942,
        0.01642609,  0.19134825, -0.25406337,  0.62810255, -1.32004343,
       -1.01681461, -0.07190612,  2.71685131,  0.88742413, -0.59954162,
        0.86927108, -0.13355856, -1.16031679, -0.54596922, -1.17149519,
        0.1642507 , -0.39054768, -2.05494313,  1.3799055 ])
Coordinates:
  * time     (time) datetime64[ns] 312B 1982-12-01 1983-12-01 ... 2020-12-01
Attributes: (12/15)
    model:          EOF analysis
    software:       xeofs
    version:        3.0.4
    date:           2025-06-21 03:18:07
    n_modes:        2
    center:         False
    ...             ...
    sample_name:    sample
    feature_name:   feature
    random_state:   None
    compute:        True
    solver:         auto
    solver_kwargs:  {}


The Nino3.4 index is derived from:

  1. Hadley Centre SST dataset

  2. Seasonal mean for DJF period

  3. Area averaging over the Niño 3.4 region (5°N-5°S, 170°W-120°W)

<xarray.DataArray 'Nino34_index' (time: 39)> Size: 156B
array([ 0.05414302,  0.01795494,  0.24856713, -0.08231278, -0.28434724,
       -0.0455286 ,  0.15132348,  0.2527848 ,  0.1607705 ,  0.51797485,
        0.71319145,  0.50263333,  0.10288487,  0.5135583 ,  0.21392706,
       -0.32183036, -0.34634715, -0.31410912, -0.570967  , -0.22579144,
        0.19944632,  0.19656877,  0.35891408, -0.1823232 , -0.4064587 ,
       -0.21349731, -0.35575086, -0.64792055, -0.3729663 , -0.262498  ,
       -0.42721397,  0.36359602,  0.47783482,  0.3777843 ,  0.5817891 ,
        0.5604557 , -0.12119712, -0.23322351, -0.25149944], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 312B 1982-12-01 1983-12-01 ... 2020-12-01
    month    (time) int64 312B 12 12 12 12 12 12 12 12 ... 12 12 12 12 12 12 12
Attributes:
    standard_name:  sea_surface_temperature
    long_name:      sst
    units:          C
    cell_methods:   time: lat: lon: mean


The dependent variable for our regression is prepared as:

  • Seasonal mean SST for September-October-November (SON)

  • Using the offset time range to examine potential lagged relationships

<xarray.DataArray 'sst' (time: 39, lat: 30, lon: 360)> Size: 2MB
array([[[27.609848, 27.626818, 27.69435 , ..., 27.638601, 27.64776 ,
         27.637032],
        [27.979582, 27.987864, 28.057   , ..., 28.017029, 28.030457,
         28.020773],
        [28.206915, 28.224077, 28.293734, ..., 28.309912, 28.25866 ,
         28.227865],
        ...,
        [27.851227, 27.814247, 27.795563, ..., 28.03151 , 27.987627,
         27.92563 ],
        [27.730936, 27.695229, 27.656563, ..., 27.95027 , 27.84813 ,
         27.773071],
        [27.72257 , 27.671392, 27.599901, ..., 27.993307, 27.842398,
         27.749907]],

       [[27.396734, 27.40095 , 27.45458 , ..., 27.41075 , 27.441338,
         27.435522],
        [27.766632, 27.772455, 27.823105, ..., 27.736404, 27.781382,
         27.794256],
        [28.059565, 28.063883, 28.119484, ..., 28.043167, 28.083147,
         28.092148],
...
        [28.654837, 28.630167, 28.609818, ..., 28.773338, 28.744364,
         28.702963],
        [28.67233 , 28.666992, 28.647272, ..., 28.724535, 28.692022,
         28.67267 ],
        [28.66854 , 28.645025, 28.609365, ..., 28.700104, 28.689146,
         28.676376]],

       [[28.383835, 28.371572, 28.373255, ..., 28.342772, 28.391098,
         28.40364 ],
        [28.755556, 28.737017, 28.712332, ..., 28.629898, 28.711897,
         28.756578],
        [29.015615, 29.005974, 28.980433, ..., 28.918898, 28.959196,
         28.99497 ],
        ...,
        [28.526505, 28.477676, 28.350496, ..., 28.564367, 28.53063 ,
         28.516111],
        [28.422213, 28.383987, 28.281607, ..., 28.44673 , 28.415787,
         28.405664],
        [28.376661, 28.32148 , 28.226974, ..., 28.431257, 28.40944 ,
         28.388994]]], shape=(39, 30, 360), dtype=float32)
Coordinates:
  * lat      (lat) float32 120B -14.5 -13.5 -12.5 -11.5 ... 11.5 12.5 13.5 14.5
  * lon      (lon) float32 1kB -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * time     (time) datetime64[ns] 312B 1983-09-01 1984-09-01 ... 2021-09-01
Attributes:
    standard_name:  sea_surface_temperature
    long_name:      sst
    units:          C
    cell_methods:   time: lat: lon: mean


The core analysis applies multiple linear regression to quantify how:

  • AO index (first predictor)

  • Niño 3.4 index (second predictor)

jointly explain spatial patterns of SON SST variability.

The function returns a dataset containing:

  • Regression coefficients (slopes) for each predictor

  • Intercept values

  • R-squared values (goodness of fit)

  • Statistical significance (p-values) for each parameter

/home/runner/work/easyclimate/easyclimate/src/easyclimate/core/stat.py:433: UserWarning: All variables must have the same time coordinates
  warnings.warn("All variables must have the same time coordinates")
<xarray.Dataset> Size: 606kB
Dimensions:      (lat: 30, lon: 360, coef: 2)
Coordinates:
  * lat          (lat) float32 120B -14.5 -13.5 -12.5 -11.5 ... 12.5 13.5 14.5
  * lon          (lon) float32 1kB -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
  * coef         (coef) int64 16B 0 1
Data variables:
    slopes       (lat, lon, coef) float64 173kB -0.001755 -0.02993 ... 0.4529
    intercept    (lat, lon) float64 86kB 28.06 28.08 28.11 ... 28.58 28.54 28.5
    r_squared    (lat, lon) float64 86kB 0.001115 0.0004253 ... 0.2671 0.2581
    slopes_p     (lat, lon, coef) float64 173kB 0.9731 0.845 ... 0.7543 0.001605
    intercept_p  (lat, lon) float64 86kB 1.168e-71 1.273e-71 ... 5.079e-74


The final visualization shows:

  • Top panel: Spatial pattern of AO influence on SON SST: Colors show regression coefficients, and Contours indicate statistically significant areas (p < 0.05)

  • Bottom panel: Spatial pattern of Niño 3.4 influence: Similar interpretation as top panel, and the central longitude is set to 200° for Pacific-centric viewing.

Key interpretation points:

  • Positive coefficients indicate SST increases with positive phase of the index

  • Negative coefficients indicate inverse relationships

  • Non-significant areas suggest no robust statistical relationship

fig, ax = ecl.plot.quick_draw_spatial_basemap(nrows=2 ,figsize = (10, 5), central_longitude=200)

result.slopes.sel(coef = 0).plot(
    ax=ax[0],
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "bottom", "pad": 0.2, "aspect": 100, "shrink": 0.8},
)
ecl.plot.draw_significant_area_contourf(
    result.slopes_p.sel(coef = 0),
    ax = ax[0],
    transform=ccrs.PlateCarree()
)

result.slopes.sel(coef = 1).plot(
    ax=ax[1],
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "bottom", "pad": 0.2, "aspect": 100, "shrink": 0.8},
)
ecl.plot.draw_significant_area_contourf(
    result.slopes_p.sel(coef = 1),
    ax = ax[1],
    transform=ccrs.PlateCarree()
)
coef = 0, coef = 1

Total running time of the script: (0 minutes 8.560 seconds)