Basic Statistical Analysis#

Before proceeding with all the steps, first import some necessary libraries and packages

import easyclimate as ecl
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

Obtain the sea ice concentration (SIC) data from the Barents-Kara Seas (30°−90°E, 65°−85°N).

See also

Luo, B., Luo, D., Ge, Y. et al. Origins of Barents-Kara sea-ice interannual variability modulated by the Atlantic pathway of El Niño–Southern Oscillation. Nat Commun 14, 585 (2023). https://doi.org/10.1038/s41467-023-36136-5

sic_data_Barents_Sea = ecl.open_tutorial_dataset("mini_HadISST_ice").sic
sic_data_Barents_Sea
<xarray.DataArray 'sic' (time: 508, lat: 20, lon: 60)> Size: 2MB
[609600 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 4kB 1981-01-31 1981-02-28 ... 2023-04-30
  * lat      (lat) float32 80B 65.5 66.5 67.5 68.5 69.5 ... 81.5 82.5 83.5 84.5
  * lon      (lon) float32 240B 30.5 31.5 32.5 33.5 34.5 ... 86.5 87.5 88.5 89.5
Attributes:
    standard_name:  sea_ice_area_fraction
    long_name:      Monthly 1 degree resolution sea ice concentration
    units:          1
    cell_methods:   time: lat: lon: median


And tropical SST dataset.

See also

Rayner, N. A.; Parker, D. E.; Horton, E. B.; Folland, C. K.; Alexander, L. V.; Rowell, D. P.; Kent, E. C.; Kaplan, A. (2003) Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century J. Geophys. Res.Vol. 108, No. D14, 4407 https://doi.org/10.1029/2002JD002670 (pdf ~9Mb)

sst_data = ecl.open_tutorial_dataset("mini_HadISST_sst").sst
sst_data
<xarray.DataArray 'sst' (time: 504, lat: 30, lon: 360)> Size: 22MB
[5443200 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 4kB 1981-01-16T12:00:00 ... 2022-12-16T12:...
  * lat      (lat) float32 120B -14.5 -13.5 -12.5 -11.5 ... 11.5 12.5 13.5 14.5
  * lon      (lon) float32 1kB -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
Attributes:
    standard_name:  sea_surface_temperature
    long_name:      sst
    units:          C
    cell_methods:   time: lat: lon: mean


Mean States for Special Month#

easyclimate.get_specific_months_data allows us to easily obtain data on the SIC for December alone.

<xarray.DataArray 'sic' (time: 42, lat: 20, lon: 60)> Size: 202kB
[50400 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 336B 1981-12-31 1982-12-31 ... 2022-12-31
  * lat      (lat) float32 80B 65.5 66.5 67.5 68.5 69.5 ... 81.5 82.5 83.5 84.5
  * lon      (lon) float32 240B 30.5 31.5 32.5 33.5 34.5 ... 86.5 87.5 88.5 89.5
Attributes:
    standard_name:  sea_ice_area_fraction
    long_name:      Monthly 1 degree resolution sea ice concentration
    units:          1
    cell_methods:   time: lat: lon: median


Now we try to draw the mean states of the SIC in the Barents-Kara for the December.

draw_sic_mean_state = sic_data_Barents_Sea_12.mean(dim="time")

fig, ax = plt.subplots(
    subplot_kw={
        "projection": ccrs.Orthographic(central_longitude=70, central_latitude=70)
    }
)

ax.gridlines(draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--")
ax.coastlines(edgecolor="black", linewidths=0.5)

draw_sic_mean_state.plot.contourf(
    ax=ax,
    # projection on data
    transform=ccrs.PlateCarree(),
    # Colorbar is placed at the bottom
    cbar_kwargs={"location": "right"},
    cmap="Blues",
    levels=21,
)
plot basic statistical analysis
<cartopy.mpl.contour.GeoContourSet object at 0x7f804cfad0c0>

Linear Trend#

As we all know, the area of Arctic sea ice has been decreasing more and more in recent years due to the impact of global warming. We can obtain the change of SIC by solving the linear trend of SIC data from 1981-2022. easyclimate.calc_linregress_spatial can provide the calculation results of solving the linear trend for each grid point.

<xarray.Dataset> Size: 58kB
Dimensions:           (lat: 20, lon: 60)
Coordinates:
  * lat               (lat) float32 80B 65.5 66.5 67.5 68.5 ... 82.5 83.5 84.5
  * lon               (lon) float32 240B 30.5 31.5 32.5 33.5 ... 87.5 88.5 89.5
Data variables:
    slope             (lat, lon) float64 10kB nan nan ... 4.943e-05 0.0002091
    intercept         (lat, lon) float64 10kB nan nan nan ... 0.9792 0.9733
    rvalue            (lat, lon) float64 10kB nan nan nan ... 0.04545 0.1818
    pvalue            (lat, lon) float64 10kB nan nan nan ... 0.775 0.2493
    stderr            (lat, lon) float64 10kB nan nan ... 0.0001718 0.0001788
    intercept_stderr  (lat, lon) float64 10kB nan nan nan ... 0.004091 0.004259


The slope is our desired linear trend, let’s try to plot the linear trend of each grid point.

draw_sic_slope = sic_data_Barents_Sea_12_linear_trend.slope

fig, ax = plt.subplots(
    subplot_kw={
        "projection": ccrs.Orthographic(central_longitude=70, central_latitude=70)
    }
)

ax.gridlines(draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--")
ax.coastlines(edgecolor="black", linewidths=0.5)

draw_sic_slope.plot.contourf(
    ax=ax,
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "right"},
    cmap="RdBu_r",
    levels=21,
)
plot basic statistical analysis
<cartopy.mpl.contour.GeoContourSet object at 0x7f804e329510>

The pvalue is the corresponding p-value, and we can determine a significance level (e.g., significance level is set to 0.05) in order to plot the region of significance.

draw_sic_pvalue = sic_data_Barents_Sea_12_linear_trend.pvalue

fig, ax = plt.subplots(
    subplot_kw={
        "projection": ccrs.Orthographic(central_longitude=70, central_latitude=70)
    }
)

ax.gridlines(draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--")
ax.coastlines(edgecolor="black", linewidths=0.5)

ecl.plot.draw_significant_area_contourf(
    draw_sic_pvalue, ax=ax, thresh=0.05, transform=ccrs.PlateCarree()
)
plot basic statistical analysis

Further, we can superimpose the linear trend and the region of significance to study the linear trend of the region of significance (since the linear trend of the region of non-significance is often spurious).

fig, ax = plt.subplots(
    subplot_kw={
        "projection": ccrs.Orthographic(central_longitude=70, central_latitude=70)
    }
)

ax.gridlines(draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--")
ax.coastlines(edgecolor="black", linewidths=0.5)

# SIC slope
draw_sic_slope.plot.contourf(
    ax=ax,
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "right"},
    cmap="RdBu_r",
    levels=21,
)

# SIC 95% significant level
ecl.plot.draw_significant_area_contourf(
    draw_sic_pvalue, ax=ax, thresh=0.05, transform=ccrs.PlateCarree()
)
plot basic statistical analysis

Regression#

Regression analysis is a statistical technique used to investigate the connection between a dependent variable and one or more independent variables.

It is frequently employed in climatology to analyze trends and patterns in climatic data, identify correlations between different climatic parameters, and create models that can predict future changes. By identifying patterns and connections in massive datasets, regression analysis offers several benefits for weather research. For instance, regression analysis can be used to pinpoint the elements that affect global temperatures, such as solar radiation, atmospheric greenhouse gases, and volcanic eruptions. Climate scientists can create models that can accurately predict future changes by including these variables in a regression model.

Moreover, regression analysis can assist climate experts in spotting natural fluctuations in climate data, like El Niño events, and in determining how human activities like deforestation and fossil fuel combustion affect the environment. Regression analysis can also evaluate the effectiveness of various mitigation tactics, such as carbon pricing policies or renewable energy initiatives.

Overall, regression analysis is a potent tool for analyzing complex climate data and producing reliable projections of upcoming alterations.

See also

In this subsection we try to regress the Niño 3.4 index on the Barents-Kara December SIC data. Before performing the regression analysis, we can see that the longitude range of the SST data is -180°~180°, try to convert the longitude range to 0°~360° using easyclimate.utility.transfer_xarray_lon_from180TO360.

sst_data_0_360 = ecl.utility.transfer_xarray_lon_from180TO360(sst_data)
sst_data_0_360
<xarray.DataArray 'sst' (time: 504, lat: 30, lon: 360)> Size: 22MB
[5443200 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 4kB 1981-01-16T12:00:00 ... 2022-12-16T12:...
  * lat      (lat) float32 120B -14.5 -13.5 -12.5 -11.5 ... 11.5 12.5 13.5 14.5
  * lon      (lon) float32 1kB 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
Attributes:
    standard_name:  sea_surface_temperature
    long_name:      sst
    units:          C
    cell_methods:   time: lat: lon: mean


Further, easyclimate.remove_seasonal_cycle_mean is used to remove the climate state of each month in order to obtain the individual month anomalies. The figure below illustrates the November SST anomaly in the tropical equatorial Pacific during the 1982-83 super El Niño.

See also

Philander, S. Meteorology: Anomalous El Niño of 1982–83. Nature 305, 16 (1983). https://doi.org/10.1038/305016a0

sst_data_anormaly = ecl.remove_seasonal_cycle_mean(sst_data_0_360)

fig, ax = plt.subplots(
    figsize=(10, 4), subplot_kw={"projection": ccrs.PlateCarree(central_longitude=180)}
)

sst_data_anormaly.sel(lon=slice(120, 290)).isel(time=22).plot.contourf(
    ax=ax,
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "bottom", "pad": 0.1},
    cmap="RdBu_r",
    levels=21,
)

ax.gridlines(draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--")
ax.coastlines(edgecolor="black", linewidths=0.5)
time = 1982-11-16, month = 11
<cartopy.mpl.feature_artist.FeatureArtist object at 0x7f804dff23b0>

The Niño3.4 index is commonly used as an indicator for detecting ENSO, and easyclimate provides easyclimate.field.air_sea_interaction.calc_index_nino34 to calculate the index using SST original dataset.

See also

Anthony G. Bamston, Muthuvel Chelliah & Stanley B. Goldenberg (1997) Documentation of a highly ENSO‐related sst region in the equatorial pacific: Research note, Atmosphere-Ocean, 35:3, 367-383, DOI: https://doi.org/10.1080/07055900.1997.9649597

nino34_monthly_index = ecl.field.air_sea_interaction.calc_index_nino34(sst_data_0_360)

nino34_monthly_index.plot(
    figsize=(8, 3),
)
plot basic statistical analysis
[<matplotlib.lines.Line2D object at 0x7f804dff2410>]

easyclimate.calc_yearly_climatological_mean is then used to solve for the annual average of the monthly index data

nino34_12_index = ecl.get_specific_months_data(nino34_monthly_index, 12)
nino34_dec_yearly_index = ecl.calc_yearly_climatological_mean(nino34_12_index)
nino34_dec_yearly_index
<xarray.DataArray 'Nino34_index' (time: 42)> Size: 168B
array([-0.0535326 ,  2.2554278 , -0.86200035, -1.0462521 , -0.55916315,
        1.08696   ,  0.9741691 , -1.8894404 , -0.06711823,  0.2840484 ,
        1.4100466 ,  0.06721933,  0.16351025,  1.0622212 , -0.6885618 ,
       -0.31378114,  2.3521311 , -1.3440579 , -1.4383495 , -0.71757525,
       -0.21213251,  1.1417981 ,  0.34162298,  0.59405655, -0.5856687 ,
        0.7327775 , -1.5969441 , -0.6306157 ,  1.3947657 , -1.5393288 ,
       -0.895552  , -0.05969566, -0.22111896,  0.6862107 ,  2.4407508 ,
       -0.424509  , -0.74715966,  0.8113012 ,  0.60371643, -1.0101154 ,
       -0.8180708 ,         nan], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 336B 1981-01-01 1982-01-01 ... 2022-01-01
Attributes:
    standard_name:  sea_surface_temperature
    long_name:      sst
    units:          C
    cell_methods:   time: lat: lon: mean


Unlike solving for linear trend without passing in x, regression analysis must use the parameter x to pass in the object to be regressed. Care must be taken to ensure that the time dimensions are identical.

/home/runner/work/easyclimate/easyclimate/src/easyclimate/core/stat.py:104: UserWarning: Assuming that the coordinate value of 'time' in `data_input` and `x` is same. Ignoring.
  warnings.warn(
<xarray.Dataset> Size: 58kB
Dimensions:           (lat: 20, lon: 60)
Coordinates:
  * lat               (lat) float32 80B 65.5 66.5 67.5 68.5 ... 82.5 83.5 84.5
  * lon               (lon) float32 240B 30.5 31.5 32.5 33.5 ... 87.5 88.5 89.5
Data variables:
    slope             (lat, lon) float64 10kB nan nan nan nan ... nan nan nan
    intercept         (lat, lon) float64 10kB nan nan nan nan ... nan nan nan
    rvalue            (lat, lon) float64 10kB nan nan nan nan ... nan nan nan
    pvalue            (lat, lon) float64 10kB nan nan nan nan ... nan nan nan
    stderr            (lat, lon) float64 10kB nan nan nan nan ... nan nan nan
    intercept_stderr  (lat, lon) float64 10kB nan nan nan nan ... nan nan nan


Here is an attempt to plot the results of the regression analysis.

draw_sic_slope = sic_reg_nino34.slope
draw_sic_pvalue = sic_reg_nino34.pvalue

fig, ax = plt.subplots(
    subplot_kw={
        "projection": ccrs.Orthographic(central_longitude=70, central_latitude=70)
    }
)

ax.gridlines(draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--")
ax.coastlines(edgecolor="black", linewidths=0.5)

draw_sic_slope.plot.contourf(
    ax=ax,
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "right"},
    cmap="RdBu_r",
    levels=21,
)

ecl.plot.draw_significant_area_contourf(
    draw_sic_pvalue, ax=ax, thresh=0.05, transform=ccrs.PlateCarree()
)
plot basic statistical analysis

Detrend#

Sea ice area shows an approximately linear trend of decreasing due to global warming. We remove the linear trend from SIC in order to study the variability of SIC itself. In addition, here we explore the differences between the trend followed by regional averaging and regional averaging followed by detrending approaches.

sic_data_Barents_Sea_12_spatial_mean = sic_data_Barents_Sea_12.mean(dim=("lat", "lon"))
sic_data_Barents_Sea_12_spatial_detrendmean = ecl.calc_detrend_spatial(
    sic_data_Barents_Sea_12, time_dim="time"
).mean(dim=("lat", "lon"))
sic_data_Barents_Sea_12_time_detrendmean = ecl.calc_detrend_spatial(
    sic_data_Barents_Sea_12_spatial_mean, time_dim="time"
)

The results show that there is no significant difference between these two detrending methods to study the variability of SIC in the Barents-Kara Seas.

fig, ax = plt.subplots(2, 1, sharex=True)

sic_data_Barents_Sea_12_spatial_mean.plot(ax=ax[0])
ax[0].set_xlabel("")
ax[0].set_title("Original")


sic_data_Barents_Sea_12_spatial_detrendmean.plot(
    ax=ax[1], label="detrend -> spatial mean"
)
sic_data_Barents_Sea_12_time_detrendmean.plot(
    ax=ax[1], ls="--", label="spatial mean -> detrend"
)
ax[1].set_xlabel("")
ax[1].set_title("Detrend")
ax[1].legend()
Original, Detrend
<matplotlib.legend.Legend object at 0x7f804d83ca00>

Weighted Spatial Data#

When calculating regional averages in high-latitude areas, considering weights is necessary because regions at different latitudes cover unequal surface areas on the Earth. Since the Earth is approximately a spheroid, areas closer to the poles have a different distribution of surface area on the spherical surface.

One common way to incorporate weights is by using the cosine of latitude, i.e., multiplying by \(\cos (\varphi)\), where \(\varphi\) represents the latitude of a location. This is because areas at higher latitudes, close to the poles, have higher latitudes and smaller cosine values, allowing for a smaller weight to be applied to these regions when calculating averages.

In summary, considering weights is done to more accurately account for the distribution of surface area on the Earth, ensuring that contributions from different regions are weighted according to their actual surface area when calculating averages or other regional statistical measures.

easyclimate.utility.get_weighted_spatial_data can help us create an xarray.core.weighted.DataArrayWeighted object. This object will automatically consider and calculate weights in subsequent area operations, thereby achieving the operation of the weighted spatial average.

sic_data_Barents_Sea_12_detrend = ecl.calc_detrend_spatial(
    sic_data_Barents_Sea_12, time_dim="time"
)
grid_detrend_data_weighted_obj = ecl.utility.get_weighted_spatial_data(
    sic_data_Barents_Sea_12_detrend, lat_dim="lat", lon_dim="lon"
)
print(type(grid_detrend_data_weighted_obj))
<class 'xarray.core.weighted.DataArrayWeighted'>

Solve for regional averaging for grid_detrend_data_weighted_obj objects (the role of weights is considered at this point)

<xarray.DataArray 'sic' (time: 42)> Size: 168B
array([ 0.00733505, -0.03790337,  0.00323731, -0.07254851, -0.01491249,
        0.00765501, -0.00645751,  0.04360954, -0.0093088 , -0.00049735,
       -0.02658244, -0.00516641, -0.01209194, -0.0102047 ,  0.02112157,
       -0.01286527,  0.07362372,  0.10449733,  0.00810461,  0.00628584,
       -0.0582574 ,  0.08276358,  0.08993486,  0.02257021, -0.04379203,
       -0.02342463, -0.04546626, -0.01492424,  0.03098435,  0.08953879,
       -0.04640904, -0.12334438,  0.03096083,  0.08812726, -0.04354496,
       -0.1620483 , -0.02880639, -0.05406104,  0.10325759, -0.0727714 ,
        0.11438236, -0.00260423], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 336B 1981-12-31 1982-12-31 ... 2022-12-31
Attributes:
    standard_name:  sea_ice_area_fraction
    long_name:      Monthly 1 degree resolution sea ice concentration
    units:          1
    cell_methods:   time: lat: lon: median


We can find some differences between the data considering latitude weights and those not considering latitude weights.

fig, ax = plt.subplots(2, 1, sharex=True)

sic_data_Barents_Sea_12_spatial_mean.plot(ax=ax[0])
ax[0].set_xlabel("")
ax[0].set_title("Original")


sic_data_Barents_Sea_12_spatial_detrendmean.plot(ax=ax[1], label="Regular mean")
sic_data_Barents_Sea_12_spatial_detrend_weightedmean.plot(
    ax=ax[1], ls="--", label="Weighted mean"
)
ax[1].set_xlabel("")
ax[1].set_title("Detrend")
ax[1].legend()
fig.show()
Original, Detrend

Skewness#

Skewness is a measure of the asymmetry of a probability distribution.

It quantifies the extent to which a distribution deviates from being symmetric around its mean. A distribution with a skewness value of zero is considered symmetric, meaning that it has equal probabilities of occurring above and below its mean. However, when the skewness value is non-zero, the distribution becomes asymmetric, indicating that there is a higher likelihood of occurrence on one side of the mean than the other.

Skewness can be positive or negative, depending on whether the distribution is skewed towards larger or smaller values.

In climate analysis, skewness can arise due to various factors such as changes in atmospheric circulation patterns, uneven temperature or precipitation distributions, or differences in measurement instruments.

See also

The skewness is calculated using easyclimate.calc_skewness_spatial. The result of the calculation contains the skewness and p-value.

sic_data_Barents_Sea_12_detrend = ecl.calc_detrend_spatial(
    sic_data_Barents_Sea_12, time_dim="time"
)
sic_data_Barents_Sea_12_skew = ecl.calc_skewness_spatial(
    sic_data_Barents_Sea_12_detrend, dim="time"
)
sic_data_Barents_Sea_12_skew
<xarray.Dataset> Size: 10kB
Dimensions:   (lat: 20, lon: 60)
Coordinates:
  * lat       (lat) float32 80B 65.5 66.5 67.5 68.5 69.5 ... 81.5 82.5 83.5 84.5
  * lon       (lon) float32 240B 30.5 31.5 32.5 33.5 ... 86.5 87.5 88.5 89.5
Data variables:
    skewness  (lat, lon) float32 5kB nan nan nan nan ... 0.06856 0.1416 0.07797
    pvalue    (lat, lon) float32 5kB nan nan nan nan ... 0.1773 0.01788 0.4496


fig, ax = plt.subplots(
    subplot_kw={
        "projection": ccrs.Orthographic(central_longitude=70, central_latitude=70)
    }
)

ax.gridlines(draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--")
ax.coastlines(edgecolor="black", linewidths=0.5)

# SIC slope
sic_data_Barents_Sea_12_skew.skewness.plot.contourf(
    ax=ax,
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "right"},
    cmap="RdBu_r",
    levels=21,
)

# SIC 95% significant level
ecl.plot.draw_significant_area_contourf(
    sic_data_Barents_Sea_12_skew.pvalue,
    ax=ax,
    thresh=0.05,
    transform=ccrs.PlateCarree(),
)
plot basic statistical analysis

Kurtosis#

Kurtosis is a measure of the “tailedness” of a probability distribution. It describes how heavy or light the tails of a distribution are relative to a standard normal distribution. A distribution with high kurtosis has heavier tails and a greater propensity for extreme events, whereas a distribution with low kurtosis has lighter tails and fewer extreme events. Kurtosis is particularly useful in climate analysis because it can reveal information about the frequency and intensity of extreme weather events such as hurricanes, droughts, or heatwaves.

See also

The skewness is calculated using easyclimate.calc_kurtosis_spatial.

<xarray.DataArray 'sic' (lat: 20, lon: 60)> Size: 5kB
array([[       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       ...,
       [16.684706 , 18.235487 , 15.48351  , ...,  2.6224325,  2.4516895,
         3.073731 ],
       [11.324911 , 11.565623 , 10.90156  , ...,  2.3805072,  2.4513185,
         2.1890404],
       [ 2.8223338,  3.5768723,  3.2213914, ...,  2.0441246,  1.835163 ,
         2.2185273]], dtype=float32)
Coordinates:
  * lat      (lat) float32 80B 65.5 66.5 67.5 68.5 69.5 ... 81.5 82.5 83.5 84.5
  * lon      (lon) float32 240B 30.5 31.5 32.5 33.5 34.5 ... 86.5 87.5 88.5 89.5
Attributes:
    standard_name:  sea_ice_area_fraction
    long_name:      Monthly 1 degree resolution sea ice concentration
    units:          1
    cell_methods:   time: lat: lon: median


Consider plotting kurtosis below

fig, ax = plt.subplots(
    subplot_kw={
        "projection": ccrs.Orthographic(central_longitude=70, central_latitude=70)
    }
)

ax.gridlines(draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--")
ax.coastlines(edgecolor="black", linewidths=0.5)

# SIC slope
sic_data_Barents_Sea_12_kurt.plot.contourf(
    ax=ax,
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "right"},
    cmap="RdBu_r",
    levels=21,
)
plot basic statistical analysis
<cartopy.mpl.contour.GeoContourSet object at 0x7f804d6d16c0>

Composite Analysis#

In the process of climate analysis, composite analysis is a statistical and integrative method used to study the spatial and temporal distribution of specific climate events or phenomena. The primary purpose of this analysis method is to identify common features and patterns among multiple events or time periods by combining their information.

Specifically, the steps of composite analysis typically include the following aspects:

  1. Event Selection: Firstly, a set of events related to the research objective is chosen. These events could be specific climate phenomena such as heavy rainfall, drought, or temperature anomalies.

  2. Data Collection: Collect meteorological data related to the selected events, including observational data, model outputs, or remote sensing data.

  3. Event Alignment: Time-align the chosen events to ensure that they are within the same temporal framework for analysis.

  4. Data Combination: Combine data values at corresponding time points into a composite dataset. Averages or weighted averages are often used to reduce the impact of random noise.

The advantages of this method include:

  • Highlighting Common Features: By combining data from multiple events or time periods, composite analysis can highlight common features, aiding in the identification of general patterns in climate events.

  • Noise Reduction: By averaging data, composite analysis helps to reduce the impact of random noise, resulting in more stable and reliable analysis outcomes.

  • Spatial Consistency: Through spatial averaging, this method helps reveal the consistent spatial distribution of climate events, providing a more comprehensive understanding.

  • Facilitating Comparisons: Composite analysis makes it convenient to compare different events or time periods as it integrates them into a unified framework.

Here we try to extract the El Niño and La Niña events using the standard deviation of the Niño 3.4 index as a threshold.

array(1.0746759, dtype=float32)

easyclimate.get_year_exceed_index_upper_bound is able to obtain the years that exceed the upper bound of the Niño 3.4 exponential threshold, and similarly easyclimate.get_year_exceed_index_lower_bound can obtain the years that exceed the lower bound of the exponential threshold.

elnino_year = ecl.get_year_exceed_index_upper_bound(
    nino34_dec_yearly_index, thresh=nino34_dec_yearly_index_std
)
lanina_year = ecl.get_year_exceed_index_lower_bound(
    nino34_dec_yearly_index, thresh=-nino34_dec_yearly_index_std
)
print("El niño years: ", elnino_year)
print("La niña years: ", lanina_year)
El niño years:  [1982 1986 1991 1997 2002 2009 2015]
La niña years:  [1988 1998 1999 2007 2010]

Further we use easyclimate.get_specific_years_data to extract data for the El Niño years within sic_data_Barents_Sea_12_detrend. The results show that six temporal levels were extracted.

<xarray.DataArray 'El niño years' (time: 7, lat: 20, lon: 60)> Size: 34kB
array([[[        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        ...,
        [ 0.04322249,  0.01938468,  0.04936236, ..., -0.00439435,
         -0.00613469, -0.00160259],
        [-0.01840574, -0.01017874,  0.00120759, ..., -0.00662297,
         -0.00390106, -0.011338  ],
        [-0.00223053, -0.01166272, -0.0034039 , ..., -0.01167816,
          0.00072545, -0.0035426 ]],

       [[        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
...
        [ 0.04887038,  0.04507154,  0.04859674, ...,  0.00190991,
          0.00191987,  0.00413281],
        [ 0.03411204,  0.03380483,  0.03195328, ...,  0.00738251,
          0.00765401,  0.00765753],
        [-0.00430715, -0.00485522,  0.00482559, ...,  0.01009589,
          0.00939119,  0.01081306]],

       [[        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        ...,
        [-0.04431903, -0.0514425 , -0.04157346, ...,  0.00886643,
          0.0092653 ,  0.0109629 ],
        [-0.01755071, -0.00753218, -0.01010329, ..., -0.0039497 ,
         -0.00422269, -0.00478798],
        [-0.01699084, -0.01778686, -0.02779007, ..., -0.01173216,
         -0.01090533, -0.0104413 ]]], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 56B 1982-12-31 1986-12-31 ... 2015-12-31
  * lat      (lat) float32 80B 65.5 66.5 67.5 68.5 69.5 ... 81.5 82.5 83.5 84.5
  * lon      (lon) float32 240B 30.5 31.5 32.5 33.5 34.5 ... 86.5 87.5 88.5 89.5
Attributes:
    standard_name:  sea_ice_area_fraction
    long_name:      Monthly 1 degree resolution sea ice concentration
    units:          1
    cell_methods:   time: lat: lon: median


Similarly, 5 temporal levels were extracted for the La Niña years.

<xarray.DataArray 'La niña years' (time: 5, lat: 20, lon: 60)> Size: 24kB
array([[[           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        ...,
        [ 7.0033133e-02,  5.2870691e-02,  5.9192181e-02, ...,
          2.2562087e-02,  2.1210730e-02,  1.5227437e-02],
        [ 5.9931576e-02,  5.8484256e-02,  4.9151063e-02, ...,
          2.2044897e-02,  2.4222255e-02,  2.6216567e-02],
        [ 5.5085778e-02,  5.5405617e-02,  5.3980470e-02, ...,
          2.6493847e-02,  2.0428896e-02,  2.5203109e-02]],

       [[           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
...
        [-6.6518784e-05,  3.9095283e-03,  1.8653393e-02, ...,
          2.9244423e-03,  2.8046966e-03,  5.1894188e-03],
        [ 2.4666309e-02,  2.4250507e-02,  2.2638798e-02, ...,
          1.7826498e-02,  1.8279552e-02,  1.8472672e-02],
        [-1.3412595e-02, -3.8779974e-03, -4.3025017e-03, ...,
          2.0705163e-02,  1.9490004e-02,  2.1231115e-02]],

       [[           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        ...,
        [ 6.8338752e-02,  6.5652549e-02,  6.8568349e-02, ...,
          1.4026761e-03,  1.4774799e-03,  3.6044717e-03],
        [ 4.3834925e-02,  4.3582022e-02,  4.1610479e-02, ...,
          7.1604848e-03,  7.3412657e-03,  7.2499514e-03],
        [ 3.5245597e-02,  3.4656227e-02,  3.4389675e-02, ...,
          9.7911954e-03,  9.3417764e-03,  1.0603964e-02]]], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 40B 1988-12-31 1998-12-31 ... 2010-12-31
  * lat      (lat) float32 80B 65.5 66.5 67.5 68.5 69.5 ... 81.5 82.5 83.5 84.5
  * lon      (lon) float32 240B 30.5 31.5 32.5 33.5 34.5 ... 86.5 87.5 88.5 89.5
Attributes:
    standard_name:  sea_ice_area_fraction
    long_name:      Monthly 1 degree resolution sea ice concentration
    units:          1
    cell_methods:   time: lat: lon: median


We now plot the distribution of SIC during El Niño and La Niña years.

fig, ax = plt.subplots(
    1,
    2,
    subplot_kw={
        "projection": ccrs.Orthographic(central_longitude=70, central_latitude=70)
    },
    figsize=(10, 4),
)

for axi in ax.flat:
    axi.gridlines(
        draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--"
    )
    axi.coastlines(edgecolor="black", linewidths=0.5)

sic_data_Barents_Sea_12_detrend_elnino.mean(dim="time").plot.contourf(
    ax=ax[0],
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "bottom"},
    cmap="RdBu_r",
    levels=21,
)
ax[0].set_title("El niño years")

sic_data_Barents_Sea_12_detrend_lanina.mean(dim="time").plot.contourf(
    ax=ax[1],
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "bottom"},
    cmap="RdBu_r",
    levels=21,
)
ax[1].set_title("La niña years")
El niño years, La niña years
Text(0.5, 1.0, 'La niña years')

So is there a significant difference in the distribution of SIC between these two events? easyclimate.calc_ttestSpatialPattern_spatial provides a two-sample t-test operation to investigate whether there is a significant difference between the means of the two samples.

<xarray.Dataset> Size: 20kB
Dimensions:    (lat: 20, lon: 60)
Coordinates:
  * lat        (lat) float32 80B 65.5 66.5 67.5 68.5 ... 81.5 82.5 83.5 84.5
  * lon        (lon) float32 240B 30.5 31.5 32.5 33.5 ... 86.5 87.5 88.5 89.5
Data variables:
    statistic  (lat, lon) float64 10kB nan nan nan nan ... -4.9 -2.239 -2.086
    pvalue     (lat, lon) float64 10kB nan nan nan ... 0.0006233 0.04909 0.06352


We can find that there is little difference in the effect on SIC under different ENSO events.

fig, ax = plt.subplots(
    subplot_kw={
        "projection": ccrs.Orthographic(central_longitude=70, central_latitude=70)
    }
)

ax.gridlines(draw_labels=["bottom", "left"], color="grey", alpha=0.5, linestyle="--")
ax.coastlines(edgecolor="black", linewidths=0.5)

# SIC slope
diff = sic_data_Barents_Sea_12_detrend_lanina.mean(
    dim="time"
) - sic_data_Barents_Sea_12_detrend_elnino.mean(dim="time")
diff.plot.contourf(
    ax=ax,
    transform=ccrs.PlateCarree(),
    cbar_kwargs={"location": "right"},
    cmap="RdBu_r",
    levels=21,
)

ax.set_title("La niña minus El niño", loc="left")

ecl.plot.draw_significant_area_contourf(
    sig_diff.pvalue, ax=ax, thresh=0.1, transform=ccrs.PlateCarree()
)
La niña minus El niño

Total running time of the script: (0 minutes 14.929 seconds)