8 Ways To Calculate Correlation Between Two Time Series In Python

Analyzing correlations is a critical step in understanding complex data relationships.

It’s a fast way to find how similar two time series are.

Python offers a wide range of libraries that make calculating correlations between two time series a breeze.

In this tutorial, we’ll explore some of the most popular libraries for correlation analysis: NumPy, Pandas, SciPy, Polars, CuPy, CuDF, Dask, and PyTorch.

Let’s get started!

Correlation Between Two Time Series Using NumPy

NumPy is the most popular Python library for numerical computing.

To compute the correlation between two time series, we can use the np.corrcoef function.

import numpy as np

x = np.random.randn(100)
y = np.random.randn(100)

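# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the x-y coefficient
corr_coef = np.corrcoef(x, y)[0, 1]
print("Correlation coefficient:", corr_coef)

This function calculates the Pearson correlation coefficient.

Just pass the two time series to the np.corrcoef function, and it will return the 2x2 correlation matrix. The diagonal is always 1, and the off-diagonal element [0, 1] is the correlation between x and y.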
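To make the contents of that matrix concrete, here is a minimal sketch that compares the off-diagonal entry with the Pearson formula computed by hand. The seed and the way y is built from x are arbitrary choices for illustration, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.standard_normal(100)
y = 0.5 * x + rng.standard_normal(100)  # y is constructed to correlate with x

corr_matrix = np.corrcoef(x, y)
r = corr_matrix[0, 1]  # correlation between x and y

# Pearson coefficient by hand: covariance divided by the product of the spreads
xc = x - x.mean()
yc = y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

print("Matrix shape:", corr_matrix.shape)
print("np.corrcoef:", r)
print("By hand:    ", r_manual)
```

Both values should agree up to floating-point error, and the matrix is symmetric, so [0, 1] and [1, 0] hold the same number.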

Correlation Between Two Time Series Using Pandas

Pandas is a popular Python library for data analysis built on top of NumPy.

To compute the correlation between two time series that are columns in a Pandas DataFrame, we can use the DataFrame.corr method.

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.randn(100), 'y': np.random.randn(100)})

corr_matrix = df.corr()
print("Correlation matrix:")
print(corr_matrix)

	x	y
x	1	-0.0592785
y	-0.0592785	1

This will calculate the correlation between all pairs of columns in the DataFrame.

If you have two Pandas Series, you can use the Series.corr method to calculate the correlation between them.

The two Series should share the same index, because Pandas aligns the values by index label before computing the correlation. Labels that appear in only one of the Series are ignored.

import pandas as pd
import numpy as np

series1 = pd.Series(np.random.randn(100))
series2 = pd.Series(np.random.randn(100))

series1.corr(series2)
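Index alignment is easy to overlook, so here is a small sketch of its effect. The two Series below are made-up data whose indexes only partly overlap; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = np.arange(10, dtype=float)
a = pd.Series(values, index=range(0, 10))
b = pd.Series(values ** 2, index=range(5, 15))  # only labels 5..9 overlap with a

# Series.corr aligns on index labels and uses only the overlapping ones
aligned_corr = a.corr(b)

# The same computation done explicitly on the shared labels
common = a.index.intersection(b.index)
manual_corr = np.corrcoef(a.loc[common], b.loc[common])[0, 1]

# Correlating by position instead means dropping the index first
positional_corr = np.corrcoef(a.to_numpy(), b.to_numpy())[0, 1]

print("Aligned on index:", aligned_corr)
print("Manual on overlap:", manual_corr)
print("By position:", positional_corr)
```

The first two numbers should match, while the positional result differs. If your two time series have different timestamps, it is worth deciding explicitly which behavior you want.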

If you want to calculate the correlation between a DataFrame and a Series, you can use the DataFrame.corrwith method.

import pandas as pd
import numpy as np

df = pd.DataFrame({'x': np.random.randn(100), 'y': np.random.randn(100)})
series = pd.Series(np.random.randn(100))

df.corrwith(series)

	pearson
x	-0.0256518
y	0.20236

By default, Pandas uses the Pearson correlation. To calculate the Spearman or Kendall correlation between two time series, you can use the method argument in any of the functions above.

df.corrwith(series, method='spearman')

	spearman
x	-0.0158176
y	0.188407

df.corrwith(series, method='kendall')

	kendall
x	-0.0109091
y	0.129697
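As a rough intuition for what method='spearman' does, the Spearman coefficient is the Pearson coefficient computed on the ranks of the values. The sketch below checks this on made-up data where y is a non-linear but mostly monotonic function of x; the seed and column construction are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
x = rng.standard_normal(100)
df = pd.DataFrame({
    "x": x,
    "y": np.exp(x) + 0.1 * rng.standard_normal(100),  # non-linear in x
})

pearson = df.corr().loc["x", "y"]
spearman = df.corr(method="spearman").loc["x", "y"]

# Spearman is Pearson applied to the ranks of each column
spearman_via_ranks = df.rank().corr().loc["x", "y"]

print("Pearson:", pearson)
print("Spearman:", spearman)
print("Pearson on ranks:", spearman_via_ranks)
```

The last two values should agree. Because Spearman only looks at ordering, it is usually less sensitive to outliers and to non-linear but monotonic relationships than Pearson.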

Correlation Between Two Time Series Using SciPy

Another way to calculate the correlation between two time series is to use the scipy.stats module.

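We can use the pearsonr function to calculate the Pearson correlation, the spearmanr function for the Spearman correlation, and the kendalltau function for the Kendall correlation. Each function returns the coefficient together with a p-value, which the code here discards by assigning it to _.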

import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

x = np.random.randn(100)
y = np.random.randn(100)

pearson_coef, _ = pearsonr(x, y)
print("Pearson correlation coefficient:", pearson_coef)

spearman_coef, _ = spearmanr(x, y)
print("Spearman correlation coefficient:", spearman_coef)

kendall_coef, _ = kendalltau(x, y)
print("Kendall correlation coefficient:", kendall_coef)
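The second value returned by these functions is a p-value. As a minimal sketch of how it can be read, the example below compares a series built to depend on x with an unrelated one. The data, seed, and variable names are made up for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.standard_normal(200)
y_related = 0.8 * x + 0.2 * rng.standard_normal(200)  # depends on x
y_unrelated = rng.standard_normal(200)                # independent of x

r_rel, p_rel = pearsonr(x, y_related)
r_unrel, p_unrel = pearsonr(x, y_unrelated)

print(f"related:   r={r_rel:.3f}, p={p_rel:.3g}")
print(f"unrelated: r={r_unrel:.3f}, p={p_unrel:.3g}")
```

Keep in mind that these p-values assume independent observations. Real time series are often autocorrelated, so treat the p-value as a rough guide rather than a strict test.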

Correlation Between Two Time Series Using Polars

Polars is a newer DataFrame library written in Rust, with a Python API, that is gaining popularity for its speed and ease of use.

It covers much of the same functionality as Pandas and is often considerably faster.

import numpy as np
import polars as pl

df = pl.DataFrame({'x': pl.Series(np.random.randn(100)), 'y': pl.Series(np.random.randn(100))})

corr = df.select(pl.corr('x', 'y'))
print(corr)

x
f64
---
0.171804

To get the Spearman correlation, you can use the method argument of the pl.corr function.

df.select(pl.corr('x', 'y', method='spearman'))

x
f64
---
0.141122

Correlation Between Two Time Series Using CuPy

If you have a GPU, you can use CuPy to calculate the correlation between two time series.

It’s a library inspired by NumPy that uses the GPU to accelerate the calculations, so you can expect very similar function names.

As a rule of thumb, try the same NumPy function name in CuPy first to see if it works.

Here we can use the cp.corrcoef function.

import cupy as cp

x = cp.random.randn(100)
y = cp.random.randn(100)

corr_coef = cp.corrcoef(x, y)[0, 1]
print("Correlation coefficient:", corr_coef)

Correlation Between Two Time Series Using CuDF

Just like you can think of CuPy as a GPU version of NumPy, you can think of CuDF as a GPU version of Pandas.

We can easily compute the correlation between two time series that are columns in a CuDF DataFrame with the DataFrame.corr method.

import cudf
import cupy as cp

df = cudf.DataFrame({'x': cp.random.randn(100), 'y': cp.random.randn(100)})

corr_matrix = df.corr()
print("Correlation matrix:")
print(corr_matrix)

Like in Pandas, this will calculate the correlation between all pairs of columns in the DataFrame.

If you have two CuDF Series, you can use the Series.corr method to calculate the correlation between them.

series1 = cudf.Series(cp.random.randn(100))
series2 = cudf.Series(cp.random.randn(100))

series1.corr(series2)

By default, CuDF uses the Pearson correlation, but it has the same method argument as Pandas to calculate the Spearman correlation.

df.corr(method='spearman')

Correlation Between Two Time Series Using Dask

Another library inspired by Pandas is Dask.

It’s a library that allows you to scale your Pandas code to work with datasets that don’t fit in memory.

To calculate the correlation between two time series stored as columns, you can use the corr method of a Dask DataFrame. The result is lazy, so call compute() to get the actual values.

import dask.dataframe as dd
import pandas as pd
import numpy as np

pandas_df = pd.DataFrame({'x': np.random.randn(100), 'y': np.random.randn(100)})
df = dd.from_pandas(pandas_df, npartitions=2)

corr_matrix = df.corr()
print("Correlation matrix:")
print(corr_matrix.compute())

	x	y
x	1	0.101782
y	0.101782	1

Correlation Between Two Time Series Using PyTorch

PyTorch has a simple torch.corrcoef function that you can use to calculate the correlation between two time series.

import torch

x = torch.randn((100,2))

corr_coef = torch.corrcoef(x.T)

Like np.corrcoef with a 2D input, and unlike the DataFrame-based libraries above, this function treats each row as a variable and each column as an observation.

So if your series are in columns, as in the example above, you need to transpose the matrix before passing it to the function.
