Atmospheric Toolbox - Basics of HARP functionalities
This example demonstrates some of the basic functionalities of the ESA Atmospheric Toolbox for handling TROPOMI data. It focuses on using the toolbox's HARP component in Python, implemented as a Jupyter notebook.
The ESA/Copernicus TROPOMI instrument on board the Sentinel-5P satellite observes atmospheric constituents at very high spatial resolution. In this tutorial we demonstrate basic data reading and plotting procedures using TROPOMI SO2 observations obtained during the explosive eruption of the La Soufrière volcano in the Caribbean in April 2021. The eruption released large amounts of SO2 into the atmosphere, resulting in extensive volcanic SO2 plumes that were transported over long distances. This notebook demonstrates how this event can be visualized using TROPOMI SO2 observations and HARP.
In the steps below this tutorial shows:
- basic data reading using HARP
- how to plot single-orbit TROPOMI data on a map, and
- how to apply operations to the TROPOMI data when importing with HARP
Initial preparations
To follow this notebook, some preparations are needed. The TROPOMI SO2 data used in this notebook is obtained from the Copernicus Data Space Ecosystem.
This example uses the following TROPOMI SO2 file, acquired on 12 April 2021:
S5P_RPRO_L2__SO2____20210412T151823_20210412T165953_18121_03_020401_20230209T050738.nc
In addition to HARP, this notebook uses several other Python packages that need to be installed beforehand. The packages needed for running the notebook are:
- harp: for reading and handling of TROPOMI data
- numpy: for working with arrays
- matplotlib: for visualizing data
- cartopy: for geospatial data processing, e.g. for plotting maps
- cmcrameri: for universally readable scientific colormaps
- eofetch: for downloading the data files from the Copernicus Data Space Ecosystem
Instructions on how to get started with the Atmospheric Toolbox in Python and how to install HARP can be found here (add link to getting started Jupyter notebook). Please note that if you have installed HARP in a specific Python environment, check that you have activated that environment before running the notebook.
Step 1: Reading TROPOMI SO2 data using HARP
First the needed Python packages are imported: os, harp, numpy, matplotlib, cartopy, cmcrameri, and eofetch:
import os
import harp
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import cartopy.crs as ccrs
from cmcrameri import cm
import eofetch
The second step is to import the TROPOMI Level 2 SO2 file using harp.import_product(). If the file does not yet exist on your local machine, we use the eofetch library to automatically download it from the Copernicus Data Space Ecosystem (CDSE). (Because the original netCDF file is large, both downloading and importing it might take a while.)
filename = "S5P_RPRO_L2__SO2____20210412T151823_20210412T165953_18121_03_020401_20230209T050738.nc"
We use eofetch to download the S5P product. To be able to perform the download yourself you will need to retrieve and configure credentials as described in the eofetch README. Alternatively, you can download the file manually and put it in the same directory as this notebook.
eofetch.download(filename)
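As an optional sanity check before importing (a small sketch using the os module imported above, not part of the original workflow), you can verify that the file is actually present in the working directory:
# Raise a clear error if the product file is missing from the current directory
if not os.path.isfile(filename):
    raise FileNotFoundError(f"{filename} not found - download it or copy it next to this notebook")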
product = harp.import_product(filename)
After a successful import, you have created a Python variable called product. The variable product contains a record of the SO2 product variables; the data is imported as numpy arrays. You can view the contents of product using the Python print() function:
print(product)
source product = 'S5P_RPRO_L2__SO2____20210412T151823_20210412T165953_18121_03_020401_20230209T050738.nc'

int scan_subindex {time=1877400}
double datetime_start {time=1877400} [seconds since 2010-01-01]
float datetime_length [s]
long orbit_index
long validity {time=1877400}
float latitude {time=1877400} [degree_north]
float longitude {time=1877400} [degree_east]
float latitude_bounds {time=1877400, 4} [degree_north]
float longitude_bounds {time=1877400, 4} [degree_east]
float sensor_latitude {time=1877400} [degree_north]
float sensor_longitude {time=1877400} [degree_east]
float sensor_altitude {time=1877400} [m]
float solar_zenith_angle {time=1877400} [degree]
float solar_azimuth_angle {time=1877400} [degree]
float sensor_zenith_angle {time=1877400} [degree]
float sensor_azimuth_angle {time=1877400} [degree]
double pressure {time=1877400, vertical=34} [Pa]
float SO2_column_number_density {time=1877400} [mol/m^2]
float SO2_column_number_density_uncertainty_random {time=1877400} [mol/m^2]
float SO2_column_number_density_uncertainty_systematic {time=1877400} [mol/m^2]
byte SO2_column_number_density_validity {time=1877400}
float SO2_column_number_density_amf {time=1877400} []
float SO2_column_number_density_amf_uncertainty_random {time=1877400} []
float SO2_column_number_density_amf_uncertainty_systematic {time=1877400} []
float SO2_column_number_density_avk {time=1877400, vertical=34} []
float SO2_volume_mixing_ratio_dry_air_apriori {time=1877400, vertical=34} [ppv]
float SO2_slant_column_number_density {time=1877400} [mol/m^2]
byte SO2_type {time=1877400}
float O3_column_number_density {time=1877400} [mol/m^2]
float O3_column_number_density_uncertainty {time=1877400} [mol/m^2]
float absorbing_aerosol_index {time=1877400} []
float cloud_albedo {time=1877400} []
float cloud_albedo_uncertainty {time=1877400} []
float cloud_fraction {time=1877400} []
float cloud_fraction_uncertainty {time=1877400} []
float cloud_height {time=1877400} [km]
float cloud_height_uncertainty {time=1877400} [km]
float cloud_pressure {time=1877400} [Pa]
float cloud_pressure_uncertainty {time=1877400} [Pa]
float surface_albedo {time=1877400} []
float surface_altitude {time=1877400} [m]
float surface_altitude_uncertainty {time=1877400} [m]
float surface_pressure {time=1877400} [Pa]
float surface_meridional_wind_velocity {time=1877400} [m/s]
float surface_zonal_wind_velocity {time=1877400} [m/s]
double tropopause_pressure {time=1877400} [Pa]
long index {time=1877400}
With the print command you can also inspect a specific SO2 product variable (listed above), e.g. by typing:
print(product.SO2_column_number_density)
type = float
dimension = {time=1877400}
unit = 'mol/m^2'
valid_min = -inf
valid_max = inf
description = 'SO2 vertical column density'
data = [nan nan nan ... nan nan nan]
From the listing above you can see, for example, that the unit of the SO2_column_number_density variable is mol/m^2. The type and the shape (size) of the SO2_column_number_density data array can be checked with the following commands:
print(type(product.SO2_column_number_density.data))
print(product.SO2_column_number_density.data.shape)
<class 'numpy.ndarray'> (1877400,)
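Since the data is returned as a plain numpy array, standard numpy functions can be applied to it directly. As a small illustration (not part of the original notebook), the number of valid (non-NaN) pixels and the value range can be checked like this:
# Quick statistics on the imported SO2 column data
so2_values = product.SO2_column_number_density.data
print("valid pixels:", np.count_nonzero(~np.isnan(so2_values)))
print("min/max [mol/m^2]:", np.nanmin(so2_values), np.nanmax(so2_values))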
Here it is important to notice that the harp.import_product command imports and converts the TROPOMI Level 2 data into a structure that is compatible with the HARP conventions. This HARP-compatible structure is different from the structure of the netCDF file itself; the conversion includes e.g. restructuring data dimensions and renaming variables. For example, the print commands above show that after the HARP import the SO2_column_number_density data has a single time dimension (=1877400), whereas when working with the netCDF file directly, e.g. using a library such as netCDF4, the same data field would be a 2D array with scanline x ground-pixel dimensions.
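The 1D time dimension is simply the flattened 2D grid of the original file. As a small sketch (assuming that scan_subindex holds the across-track pixel index within each scanline), the original grid shape can be recovered as follows:
# Recover the original (scanline, ground_pixel) grid shape from the flattened data
n_ground_pixels = int(product.scan_subindex.data.max()) + 1
n_scanlines = product.latitude.data.size // n_ground_pixels
print(n_scanlines, "scanlines x", n_ground_pixels, "ground pixels")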
HARP has built-in converters for a large number of atmospheric data products. For each conversion the HARP documentation contains a description of the variables it will return and how they are mapped from the original product format. The description for the TROPOMI SO2 product can be found here.
HARP performs this conversion such that data from other satellite data products, such as OMI or GOME-2, ends up having the same structure and naming conventions. This makes it a lot easier to work with data coming from different satellite instruments.
Step 2: Plotting single-orbit data on a map
Now that the TROPOMI SO2 data product is imported, the data can be visualized on a map. The parameter we want to plot is SO2_column_number_density, which gives the total atmospheric SO2 column. For this we will be using cartopy and the scatter function. This plotting approach uses only the pixel centre coordinates of the satellite data, not the actual latitude and longitude bounds: the scatter function plots each satellite pixel as a single coloured dot on the map, based on its latitude and longitude. Cartopy also supports other plotting options, such as pcolormesh, but pcolormesh requires the input data to be a 2D array. That type of plotting will be demonstrated in other use cases.
First, the SO2 values and the latitude and longitude centre coordinates are defined. In addition, the unit and description of the SO2 data are read, as they are needed for the colorbar label. For plotting, a colormap named "batlow" is chosen from the cmcrameri library. cmcrameri provides scientific colormaps whose colour combinations remain readable for people with colour-vision deficiencies. The available Crameri colormaps can be viewed here. In the script the colormaps are referenced e.g. as cm.batlow; if you wish to use a reversed colormap, append _r to the colormap's name. With vmin and vmax the scaling of the colormap values is defined.
SO2val = product.SO2_column_number_density.data
SO2units = product.SO2_column_number_density.unit
SO2description = product.SO2_column_number_density.description
latc = product.latitude.data
lonc = product.longitude.data
colortable = cm.batlow
vmin = 0
vmax = 0.0001
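As an optional variation (not part of the original example), the Normalize class imported above can be used instead of passing vmin and vmax separately, and a reversed colormap is selected by appending _r:
# Alternative colour scaling: a Normalize object plus the reversed "batlow" colormap.
norm = Normalize(vmin=0, vmax=0.0001)
reversed_colortable = cm.batlow_r
# Passing norm=norm and cmap=reversed_colortable to plt.scatter would then replace
# the vmin/vmax and cmap arguments used below.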
Next the figure properties are defined. The matplotlib figsize argument sets the figure size, plt.axes(projection=ccrs.PlateCarree()) sets up a GeoAxes instance, and ax.coastlines() adds the coastlines to the map. The actual data is plotted with the plt.scatter command, where the latitude and longitude coordinates are given as input and the dots are coloured according to the pixel SO2 value (SO2val).
fig=plt.figure(figsize=(20, 10))
ax = plt.axes(projection=ccrs.PlateCarree())
img = plt.scatter(lonc, latc, c=SO2val,
vmin=vmin, vmax=vmax, cmap=colortable, s=1, transform=ccrs.PlateCarree())
ax.coastlines()
cbar = fig.colorbar(img, ax=ax, orientation='horizontal', fraction=0.04, pad=0.1)
cbar.set_label(f'{SO2description} [{SO2units}]')
cbar.ax.tick_params(labelsize=14)
plt.show()
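To focus on the volcanic plume, you could additionally restrict the map to the region around the Caribbean and add labelled gridlines by placing the following lines just before plt.show() in the cell above; this is a minimal sketch using standard cartopy calls, and the extent values are only an illustrative choice:
# Zoom in on the Caribbean / tropical Atlantic (lon_min, lon_max, lat_min, lat_max)
ax.set_extent([-80, -20, 0, 30], crs=ccrs.PlateCarree())
# Add dashed, labelled latitude/longitude gridlines
ax.gridlines(draw_labels=True, linewidth=0.5, linestyle='--')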
Step 3: Applying operations when importing data with HARP
In the previous steps one orbit of TROPOMI SO2 data was imported with HARP and plotted on a map as is. However, one very important step is still missing that is essential when working with almost any satellite data: applying the quality flag(s). To ensure that you work only with good-quality data and make correct interpretations, it is essential to follow the recommendations given for each TROPOMI Level 2 product.
One of the main features of HARP is the ability to perform operations as part of the data import.
This unique feature of HARP allows you to apply different kinds of operations on the data already when importing it, so that no post-processing is needed. These operations include e.g. restricting the data to a certain area, converting units, and of course applying the quality flags. Information on all operations that can be applied can be found in the HARP operations documentation.
Now we will import the same data file as in Step 1, but this time adding four different operations as part of the import command:
- we only ingest data between 20°S and 40°N latitude
- we only consider pixels for which the data quality is high enough. The basic quality flag in any TROPOMI Level 2 netCDF file is given as qa_value. In the Product Readme File for SO2 you can find that the basic recommendation for SO2 data is to use only those pixels where qa_value > 0.5. When HARP imports the data, the quality values are interpreted as numbers between 0 and 100 (not 0 and 1), hence our limit in this case is 50. In HARP the qa_value is renamed to SO2_column_number_density_validity. The list of variables in the HARP product after ingestion of the S5P TROPOMI SO2 product can be found here.
- we limit the variables that we read to those that we need
- we convert the unit of the SO2 column number density to Dobson units (DU) instead of the mol/m^2 in which the original data is stored
All these operations will be performed by HARP while the data is being read, and before it is returned to Python.
In the following, the HARP operations to be performed during import are given in an "operations" variable, in which each HARP operation (name, condition) is a string. The individual HARP operations are separated by ";" and joined together with the join() command. The "keep" operation defines which variables from the original netCDF file are imported, while the "derive" operation performs the conversion from the original units to Dobson units. After joining the operations together you can print the resulting string with the print() command. In Python, defining an "operations" string like this is a convenient way to define and keep track of the different operations to be applied when importing the data. Another option would be to pass the operations directly to the HARP import command as a single string: "operation1;operation2;operation3".
operations = ";".join([
"latitude>-20;latitude<40",
"SO2_column_number_density_validity>50",
"keep(datetime_start,scan_subindex,latitude,longitude,SO2_column_number_density)",
"derive(SO2_column_number_density [DU])",
])
print(type(operations))
print(operations)
<class 'str'>
latitude>-20;latitude<40;SO2_column_number_density_validity>50;keep(datetime_start,scan_subindex,latitude,longitude,SO2_column_number_density);derive(SO2_column_number_density [DU])
The import with HARP including operations is executed with the same harp.import_product() command as before, but in addition to the filename, the "operations" variable is now also given as input, separated by a comma. We will call the newly imported variable "reduced_product":
reduced_product = harp.import_product(filename, operations)
You will see that importing the data now goes a lot faster. If we print the contents of reduced_product, it shows that the variable contains only those parameters we requested, and that the SO2 unit is now DU. Also, the time dimension of the data is smaller than in Step 1, because only pixels between 20°S and 40°N have been kept:
print(reduced_product)
source product = 'S5P_RPRO_L2__SO2____20210412T151823_20210412T165953_18121_03_020401_20230209T050738.nc'
history = "2024-05-17T14:33:21Z [harp-1.21] harp.import_product('S5P_RPRO_L2__SO2____20210412T151823_20210412T165953_18121_03_020401_20230209T050738.nc',operations='latitude>-20;latitude<40;SO2_column_number_density_validity>50;keep(datetime_start,scan_subindex,latitude,longitude,SO2_column_number_density);derive(SO2_column_number_density [DU])')"

int scan_subindex {time=455625}
double datetime_start {time=455625} [seconds since 2010-01-01]
float latitude {time=455625} [degree_north]
float longitude {time=455625} [degree_east]
double SO2_column_number_density {time=455625} [DU]
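As a quick check (an extra illustration, not part of the original notebook), you can compare the number of pixels before and after the filtering operations:
# Compare the full orbit with the reduced, quality-filtered product
print("all pixels:     ", product.SO2_column_number_density.data.size)
print("filtered pixels:", reduced_product.SO2_column_number_density.data.size)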
Now that the new reduced data is imported, the same approach as in Step 2 can be used to plot the data on a map. Note that the unit of SO2 has now changed, and therefore a different scaling of the colour scale is needed. First define the parameters for plotting:
SO2val = reduced_product.SO2_column_number_density.data
SO2units = reduced_product.SO2_column_number_density.unit
SO2description = reduced_product.SO2_column_number_density.description
latc = reduced_product.latitude.data
lonc = reduced_product.longitude.data
colortable = cm.batlow
# For Dobson units
vmin = 0
vmax = 8
And then plot the figure:
fig=plt.figure(figsize=(20, 10))
ax = plt.axes(projection=ccrs.PlateCarree())
img = plt.scatter(lonc, latc, c=SO2val,
vmin=vmin, vmax=vmax, cmap=colortable, s=1, transform=ccrs.PlateCarree())
ax.coastlines()
cbar = fig.colorbar(img, ax=ax, orientation='horizontal', fraction=0.04, pad=0.1)
cbar.set_label(f'{SO2description} [{SO2units}]')
cbar.ax.tick_params(labelsize=14)
plt.show()
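Finally, if you want to keep the reduced, quality-filtered product for later use, HARP can write it back to a HARP-compliant netCDF file with harp.export_product; a minimal sketch, where the output filename is just an illustrative choice:
# Export the filtered product to a new netCDF file in the current directory
harp.export_product(reduced_product, "S5P_SO2_20210412_reduced.nc")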