Multiple Variable Linear Regression

This documentation demonstrates a comprehensive analysis of sea surface temperature (SST) variability explained by two climate indices (AO and Niño 3.4) using multiple linear regression. The analysis covers data preparation, index calculation, regression modeling, and visualization of results.

\[y = a_1 x_1 + a_2 x_2 + b\]

Before proceeding with all the steps, first import some necessary libraries and packages

import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import easyclimate as ecl

Two time ranges are defined to account for seasonal analysis with different base periods

  • time_range: Primary analysis period (1982-2020)

  • time_range_plus1: Offset by one year for seasonal calculations

time_range = slice("1982-01-01", "2020-12-31")
time_range_plus1 = slice("1983-01-01", "2021-12-31")

The Arctic Oscillation index is calculated using:

  1. Sea level pressure (SLP) data from the Northern Hemisphere

  2. Seasonal mean calculation for December-January-February (DJF)

  3. EOF analysis following Thompson & Wallace (1998) methodology

The resulting index is then subset to our analysis period.

slp_monmean_NH.nc ━━━━━━━━━━━━━━━━━━━━ 100.0% • 3.3/3.3 MB • 21.3 MB/s • 0:00:00
/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/xeofs/preprocessing/multi_index_converter.py:49: FutureWarning: Deleting a single level of a MultiIndex is deprecated. Previously, this deleted all levels of a MultiIndex. Please also drop the following variables: {'dim2', 'dim1'} to avoid an error in the future.
  X_transformed = X_transformed.drop_vars(dim)
<xarray.DataArray 'PC' (time: 39)> Size: 312B
array([-0.11359502, -0.04278451,  0.1251718 ,  0.1180264 ,  0.01372303,
        0.01148666, -0.30164561, -0.13134693, -0.07526516, -0.19079561,
       -0.23400238,  0.02196399, -0.11893917,  0.06695014, -0.05743778,
        0.0068418 , -0.08987437, -0.14630476,  0.08800577, -0.091502  ,
        0.00138097,  0.02114858, -0.02929297,  0.07069781, -0.14965371,
       -0.11538558, -0.00784751,  0.30721541,  0.10058031, -0.06837294,
        0.09842305, -0.01502733, -0.13184147, -0.06253175, -0.1326965 ,
        0.01857373, -0.04430921, -0.23259293,  0.15605242])
Coordinates:
  * time     (time) datetime64[ns] 312B 1982-12-01 1983-12-01 ... 2020-12-01
    month    (time) float64 312B 12.0 12.0 12.0 12.0 ... 12.0 12.0 12.0 12.0
    mode     int64 8B 1
Attributes: (12/15)
    model:          EOF analysis
    software:       xeofs
    version:        3.0.4
    date:           2026-05-17 10:10:28
    n_modes:        2
    center:         False
    ...             ...
    sample_name:    sample
    feature_name:   feature
    random_state:   None
    compute:        True
    solver:         auto
    solver_kwargs:  {}


Plot the AO index

AO index
Text(0.5, 1.0, 'AO index')

The Nino3.4 index is derived from:

  1. Hadley Centre SST dataset

  2. Seasonal mean for DJF period

  3. Area averaging over the Niño 3.4 region (5°N-5°S, 170°W-120°W)

mini_HadISST_sst.nc ━━━━━━━━━━━━━━━━ 100.0% • 10.8/10.8 MB • 63.2 MB/s • 0:00:00
<xarray.DataArray 'Nino34_index' (time: 39)> Size: 156B
array([        nan, -0.0124734 ,  0.21813878, -0.11274114, -0.3147756 ,
       -0.07595692,  0.12089519,  0.22235654,  0.13034225,  0.48754653,
        0.68276316,  0.47220498,  0.07245654,  0.48312995,  0.18349876,
       -0.35225862, -0.37677544, -0.3445374 , -0.6013953 , -0.2562197 ,
        0.169018  ,  0.16614044,  0.3284857 , -0.21275155, -0.43688706,
       -0.24392569, -0.38617924, -0.6783489 , -0.40339458, -0.29292628,
       -0.4576423 ,  0.3331677 ,  0.44740644,  0.34735596,  0.5513607 ,
        0.5300274 , -0.15162548, -0.2636519 ,         nan], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 312B 1982-12-01 1983-12-01 ... 2020-12-01
    month    (time) int64 312B 12 12 12 12 12 12 12 12 ... 12 12 12 12 12 12 12
Attributes:
    standard_name:  sea_surface_temperature
    long_name:      sst
    units:          C
    cell_methods:   time: lat: lon: mean


Plot the Niño 3.4 index

Niño 3.4 index
Text(0.5, 1.0, 'Niño 3.4 index')

The dependent variable for our regression is prepared as:

  • Seasonal mean SST for September-October-November (SON)

  • Using the offset time range to examine potential lagged relationships

<xarray.DataArray 'sst' (time: 39, lat: 30, lon: 360)> Size: 2MB
array([[[27.609846, 27.626818, 27.694347, ..., 27.6386  , 27.64776 ,
         27.637032],
        [27.979582, 27.987865, 28.057   , ..., 28.01703 , 28.030458,
         28.020773],
        [28.206915, 28.224077, 28.293734, ..., 28.30991 , 28.25866 ,
         28.227865],
        ...,
        [27.851225, 27.814247, 27.79556 , ..., 28.031511, 27.987627,
         27.925632],
        [27.730938, 27.695229, 27.656563, ..., 27.95027 , 27.84813 ,
         27.77307 ],
        [27.72257 , 27.671392, 27.599901, ..., 27.993307, 27.842398,
         27.749905]],

       [[27.396732, 27.400951, 27.45458 , ..., 27.41075 , 27.441338,
         27.43552 ],
        [27.76663 , 27.772455, 27.823105, ..., 27.736406, 27.781384,
         27.794256],
        [28.059566, 28.063883, 28.119484, ..., 28.043169, 28.083147,
         28.092148],
...
        [28.654837, 28.630167, 28.60982 , ..., 28.773338, 28.744364,
         28.702961],
        [28.67233 , 28.666992, 28.647274, ..., 28.724535, 28.69202 ,
         28.67267 ],
        [28.66854 , 28.645025, 28.609365, ..., 28.700104, 28.689146,
         28.676378]],

       [[28.383835, 28.371572, 28.373255, ..., 28.342772, 28.391098,
         28.403643],
        [28.755556, 28.737017, 28.712332, ..., 28.6299  , 28.711895,
         28.756578],
        [29.015614, 29.005974, 28.98043 , ..., 28.918898, 28.959194,
         28.99497 ],
        ...,
        [28.526505, 28.477678, 28.350496, ..., 28.564365, 28.53063 ,
         28.516111],
        [28.422213, 28.383987, 28.281607, ..., 28.44673 , 28.415787,
         28.405664],
        [28.376661, 28.32148 , 28.226974, ..., 28.431257, 28.40944 ,
         28.388994]]], shape=(39, 30, 360), dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 312B 1982-12-01 1983-12-01 ... 2020-12-01
    month    (time) float64 312B 12.0 12.0 12.0 12.0 ... 12.0 12.0 12.0 12.0
  * lat      (lat) float32 120B -14.5 -13.5 -12.5 -11.5 ... 11.5 12.5 13.5 14.5
  * lon      (lon) float32 1kB -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
    mode     int64 8B 1
Attributes:
    standard_name:  sea_surface_temperature
    long_name:      sst
    units:          C
    cell_methods:   time: lat: lon: mean


The core analysis applies multiple linear regression to quantify how:

  • AO index (first predictor)

  • Niño 3.4 index (second predictor)

jointly explain spatial patterns of SON SST variability.

The function returns a dataset containing:

  • Regression coefficients (slopes) for each predictor

  • Intercept values

  • R-squared values (goodness of fit)

  • Statistical significance (p-values) for each parameter

<xarray.Dataset> Size: 606kB
Dimensions:      (lat: 30, lon: 360, coef: 2)
Coordinates:
  * lat          (lat) float32 120B -14.5 -13.5 -12.5 -11.5 ... 12.5 13.5 14.5
  * lon          (lon) float32 1kB -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
  * coef         (coef) int64 16B 0 1
    mode         int64 8B 1
Data variables:
    slopes       (lat, lon, coef) float64 173kB -0.2015 -0.02943 ... 0.4433
    intercept    (lat, lon) float64 86kB 28.06 28.08 28.1 ... 28.61 28.57 28.53
    r_squared    (lat, lon) float64 86kB 0.005617 0.00266 ... 0.3082 0.3067
    slopes_p     (lat, lon, coef) float64 173kB 0.6666 0.845 ... 0.0008305
    intercept_p  (lat, lon) float64 86kB 7.685e-68 9.51e-68 ... 3.241e-71


The final visualization shows:

  • Top panel: Spatial pattern of AO influence on SON SST: Colors show regression coefficients, and Contours indicate statistically significant areas (p < 0.05)

  • Bottom panel: Spatial pattern of Niño 3.4 influence: Similar interpretation as top panel, and the central longitude is set to 200° for Pacific-centric viewing.

Key interpretation points:

  • Positive coefficients indicate SST increases with positive phase of the index

  • Negative coefficients indicate inverse relationships

  • Non-significant areas suggest no robust statistical relationship

fig, ax = ecl.plot.quick_draw_spatial_basemap(nrows=2 ,figsize = (10, 5), central_longitude=200)

result.slopes.sel(coef = 0).plot(
    ax=ax[0],
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "bottom", "pad": 0.2, "aspect": 100, "shrink": 0.8},
)
ecl.plot.draw_significant_area_contourf(
    result.slopes_p.sel(coef = 0),
    ax = ax[0],
    transform=ccrs.PlateCarree()
)
ax[0].set_title("$a_1$: (AO index coefficient)")

result.slopes.sel(coef = 1).plot(
    ax=ax[1],
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "bottom", "pad": 0.2, "aspect": 100, "shrink": 0.8},
)
ecl.plot.draw_significant_area_contourf(
    result.slopes_p.sel(coef = 1),
    ax = ax[1],
    transform=ccrs.PlateCarree()
)
ax[1].set_title("$a_2$: (Niño 3.4 index coefficient)")
$a_1$: (AO index coefficient), $a_2$: (Niño 3.4 index coefficient)
Text(0.5, 1.0, '$a_2$: (Niño 3.4 index coefficient)')

Total running time of the script: (0 minutes 7.794 seconds)