Get raster dataset information using GDAL in Python

This section begins by explaining how to call GDAL in Python to access data sets. There would be an example of reading remote sensing imagery.

Open the GeoTIFF file

Next, read the information of a GeoTIFF file. The first step is to open a data set.

First of all, we need to clarify the concept of Data Set. For general file formats, a “data set” is a file, such as a GIF file is a file with gif extension. But for raster data in GIS, a data set is composed of one or more files, and some additional information is used to organize the relationship between them. For GDAL, raster data sets are composed of raster band data and common attributes for all bands.

>>> from osgeo import gdal
>>> rds = gdal.Open("/gdata/geotiff_file.tif")

Now that a GeoTIFF file has been opened as a GDAL operable object, let’s see how it can be manipulated.

Python provides the dir() introspection function [to quickly see what the current object is available for:

The dir() function is probably the most famous part of the Python introspection mechanism. It can return a sorted list of attribute names of any objects passed to it. If you do not specify an object, dir() returns the name in the current scope.

>>> dir(rds)
['AddBand', 'BeginAsyncReader', 'BuildOverviews', 'CommitT ...

Let’s take a look at how to get the basic information of the data set, using the following functions and attributes.

  • rds.GetDescription() : Get the description of the raster

  • rds.RasterCount : Get the number of bands in the raster dataset

  • rds.RasterXSize : the width of the raster data (the number of pixels in the X direction)

  • rds.RasterYSize : height of raster data (number of pixels in the Y direction)

  • rds.GetGeoTransform() : The six parameters of the raster data.

  • GetProjection() : projection of raster data

Metadata for Reading Images

GDAL provides a sufficiently convenient function, which can read some metadata information of images, thus facilitating the processing of data. GDAL usually organizes metadata in the form of dictionary, but the types and keys of metadata for different raster data types may be different.

If you want to process metadata, you can consider writing metadata information into an XML file. This problem is not the concern of this book, so let’s not talk more about it here.

Let’s first look at the metadata information of the most commonly used GeoTIFF file. GDAL can be used as data set level metadata to handle the following basic TIFF flags.

Use Python to access metadata:

>>> rds.GetMetadata()
{'DataType': 'Generic', 'AREA_OR_POINT': 'Area'}

The metadata information above is different for each data.

For example, open another file:

>>> ds = gdal.Open('/gdata/lu75c.tif')
>>> ds.GetMetadata()
{'TIFFTAG_XRESOLUTION': '1', 'TIFFTAG_YRESOLUTION': '1',
    'AREA_OR_POINT': 'Area'}

There are three metadata for this file. Two of them are TIFF logos, and the other one is geospatial metadata.

Using GetDescription() to get raster data information

>>> rds.GetDescription()
'/gdata/geotiff_file.tif'

The image description here is the path name of the image. Specific return values are related to different data sets, and different data sets have different descriptions.

Getting the number of raster bands

In GDAL, each band is a data set; moreover, the raster data set may contain sub-data sets, and each sub-data set may contain bands.

First look at the RasterCount that just opened the data:

>>> rds.RasterCount
3

This is a Landsat remote sensing image consisting of three bands. Note that RasterCount is not followed by parentheses because they are properties and not methods.

Then, take a look at MODIS L1B data:

>>> mds = gdal.Open("/gdata/MOD09A1.A2009193.h28v06.005.2009203125525.hdf")
>>> mds.RasterCount
0

The result of the operation is actually 0. This means that the number of rasters in the current data set rds is 0. In fact, the data format of MODISL1B is in HDF format, and its data is organized in sub-data sets. To obtain its related data information, it needs to continue to access its sub-data sets.

Get the number of rows and columns of the image

The size of the raster data indicates the width and height of the image in pixels, such as:

>>> img_width,img_height=rds.RasterXSize,rds.RasterYSize
>>> img_width,img_height
(10572, 9422)

It can be seen that the image size is \(10572\times 9422\) .

Obtain spatial reference

Let’s look at how to get the projection and spatial reference information from the raster data set. Projection and spatial reference will be further discussed in later chapters.

For remote sensing images, the acquisition of spatial reference needs to be positioned in geographic space. In GDAL, there are two ways to obtain spatial reference, one of which is to use six parameter coordinate transformation model. The implementation of this model is different in different software. In GDAL, these six parameters include upper-left coordinates, pixel X, Y direction size, rotation and other information. Note that the pixel size in the Y direction is negative.

>>> ds.GetGeoTransform()
(1852951.7603168152, 30.0, 0.0, 5309350.360150607, 0.0, -30.0)

Obtaining projection information

Using the GetProjection() function, it is relatively easy to get the projection information of the dataset, but more knowledge is needed about what is map projection and how it is implemented in GDAL. Here is a brief look at the output information, a more detailed explanation will also be introduced in the following chapters.

>>> ds.GetProjection()
'PROJCS["unnamed",GEOGCS["unknown",DATUM["unknown",SPHEROID[ ...

Obtaining band information of raster data using GDAL

The main functions for data set operations are described above. But if you need more information about raster data, you need to use more commonly used band operation functions in remote sensing image processing.

Band to get data sets

The GetRasterBand() function gets the band of the raster dataset. The arguments to the function use the index value of the band.

>>> from osgeo import gdal
>>> rds = gdal.Open('/gdata/lu75c.tif')
>>> rds.RasterCount
1
>>> band = rds.GetRasterBand(1)

Here we get the first band band via GetRasterBand(1). note! The band acquisition here is not the same as the usual array index, and the band starts to get a value of 1 instead of 0.

View basic information about the band

Let’s take a look at the properties and methods of band that we just read. You can also use the dir() function to view it.

>>> dir(band)
['Checksum', 'ComputeBandStats', 'ComputeRasterMinMax', ...

Take a look at the common operations, which are also used to obtain band attribute information.

Get the number of band rows

For example:

>>> band.XSize
10572
>>> band.YSize
9422
>>> band.DataType
1

Executing the above code results in the width and height of the band image (in pixels), which is the same as the value obtained by rds using RasterXSize() and RasterYSize(). DataType is the data type of the actual value in the image. The above example shows an 8-bit unsigned integer.

Attributes for obtaining band data

For example:

>>> band.GetNoDataValue()
256.0
>>> band.GetMaximum()
>>> band.GetMinimum()
>>> band.ComputeRasterMinMax()
(0.0, 255.0)

Maximum indicates the largest value in the band value, and Minimum indicates the smallest value of the band value. By running the results, you can see that there are no values at the beginning RasterXSize() and RasterYSize(). Because the file format does not have an inherent maximum and minimum value. It can be calculated by the function ComputeRasterMinMax(). note! The maximum and minimum values here do not include “meaningless values”! This is the NoDataValue shown above.