Data Article
GeoAI Dataset for Urban Water Body Detection Using TerraSAR-X Satellite Radar Imagery
Eu-Ru Lee1,2, Jun-Hyeok Jung3,4, Ki-Chang Kim5, Seong-Jae Yu6, Hyung-Sup Jung7,8,*
GEO DATA 2024;6(4):435-450.
DOI: https://doi.org/10.22761/GD.2024.0046
Published online: December 31, 2024

1Integrated Master and PhD Student, Department of Geoinformatics, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, 02504 Seoul, South Korea

2Integrated Master and PhD Student, Department of Smart Cities, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, 02504 Seoul, South Korea

3Master Student, Department of Geoinformatics, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, 02504 Seoul, South Korea

4Master Student, Department of Smart Cities, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, 02504 Seoul, South Korea

5Senior, New Formal Business Division, Innodep, 47 Digital-ro 9-gil, Geumcheon-gu, 08511 Seoul, South Korea

6Leader, New Formal Business Division, Innodep, 47 Digital-ro 9-gil, Geumcheon-gu, 08511 Seoul, South Korea

7Professor, Department of Geoinformatics, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, 02504 Seoul, South Korea

8Professor, Department of Smart Cities, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, 02504 Seoul, South Korea

Corresponding Author Hyung-Sup Jung Tel: +82-2-6490-2892 E-mail: hsjung@uos.ac.kr
• Received: November 11, 2024   • Revised: November 26, 2024   • Accepted: December 1, 2024

Copyright © 2024 GeoAI Data Society

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This study presents the generation of a GeoAI dataset for urban water body detection using TerraSAR-X satellite synthetic aperture radar (SAR) imagery. The study area includes urban regions in Seoul and Gyeonggi Province, chosen for their complex structures and frequent flooding, which pose challenges for SAR analysis. The data preprocessing involved generating Sigma0 images, image co-registration, median filtering for speckle noise reduction, decibel conversion, and orthorectification using Copernicus DEM for precise geometric correction. Label data were created using the global river widths from Landsat dataset combined with the Otsu thresholding method and fine-tuned with Google Map imagery. Annotation guidelines were meticulously designed to account for SAR-specific phenomena such as layover, corner reflections, and side lobe effects, ensuring consistent and accurate labeling across different orbits and observation conditions. The resulting dataset supports deep learning models in learning geometric characteristics of SAR imagery, enhancing water body detection capabilities. This work provides a foundational resource for future applications in urban water management and climate-resilient disaster response.
Since the Industrial Revolution in the 19th century, accelerated industrialization and urbanization have intensified climate change globally, resulting in increased frequency and severity of water-related disasters. Many countries, including South Korea, have experienced a rise in floods caused by heavy rainfall during monsoon seasons, resulting in significant social and economic damage. These conditions underscore the importance of disaster management and water body monitoring as essential components in responding to climate change. The rapid expansion of urban areas and population growth further emphasize the necessity of effective water resource management and monitoring within urban environments (Taniguchi et al., 2007). To meet these needs, satellite imagery has become an indispensable tool for continuous and comprehensive observation over large areas. Optical satellite imagery is beneficial for delineating boundaries between water bodies and non-water areas due to its rich spectral information. However, it is limited by weather conditions and cloud cover. To overcome these limitations, synthetic aperture radar (SAR) imagery has gained attention for its capability as an active sensor that can monitor the Earth's surface regardless of day or night and under all weather conditions (Al-Wassai and Kalyankar, 2013).
SAR imagery utilizes an active sensor operating in the high-frequency microwave range, enabling data acquisition independent of time and weather conditions, making it highly effective for real-time monitoring. SAR imagery visualizes signals backscattered from the Earth's surface, which are sensitive to the surface's physical properties, such as roughness and dielectric constant. Due to these characteristics, the contrast between water bodies and land is pronounced, making SAR imagery particularly valuable for water body detection studies. Despite its numerous advantages, SAR imagery has certain limitations due to its inherent characteristics (Cigna et al., 2014). First, the side-looking observation mode of SAR can result in geometric distortions. These distortions include layover, foreshortening, and shadowing, which are more pronounced in complex terrains such as urban areas. Layover occurs when signals from elevated structures, like buildings, are received before those from lower areas, causing the high ground to appear as if it is leaning (Brenner and Roessing, 2008). Foreshortening compresses inclined surfaces, making them appear shorter, while shadowing happens when signals cannot reach steep slopes, resulting in shadowed regions in the imagery (Papson and Narayanan, 2012). Such distortions can confuse water body detection. Second, SAR imagery often contains speckle noise caused by multipath reflections during microwave reception, leading to signal interference that degrades visual resolution and complicates image interpretation (Simard et al., 1998). Lastly, corner reflection occurs when signals reflect off multiple surfaces two or three times, a phenomenon particularly noticeable in structures like bridges over water bodies. These structures cause repeated signal reflections that concentrate back to the SAR sensor, creating abnormally bright areas in the image (Soergel et al., 2008). 
While such characteristics can be useful for detecting urban structures, they introduce challenges and increase uncertainties in water body detection (Baek and Jung, 2019).
Research on water body detection using SAR satellite imagery has primarily relied on threshold-based methods that utilize the backscatter characteristics of SAR data (Liang and Liu, 2020; Yu et al., 2017). These methods distinguish water bodies from non-water areas based on the typically low backscatter values of water in SAR imagery. However, these approaches face limitations in detection accuracy due to inherent geometric distortions (layover, foreshortening, shadowing) and speckle noise present in SAR data. Recently, advancements in artificial intelligence have led to the application of deep learning techniques to SAR imagery, aimed at overcoming these limitations (Lee and Jung, 2023). Deep learning models are capable of recognizing complex patterns and learning multiscale features, which helps to mitigate the impact of distortions and noise inherent in SAR imagery (Sarker, 2021).
In this context, the present dataset paper aims to support the advancement of such research by constructing a GeoAI dataset optimized for water body detection in highly distorted urban areas like Seoul, using high-resolution SAR imagery. This dataset is designed to handle the intricate characteristics of SAR imagery and is expected to serve as a valuable resource for future research in water body detection and analysis.
2.1 Study area
The study area selected for this research includes parts of Seoul and Gyeonggi Province. This region is characterized by a dense concentration of urban structures and infrastructure, providing an environment suitable for effectively analyzing the geometric properties of SAR satellite imagery. In particular, large water bodies such as the Han River feature multiple bridges, where SAR characteristics like corner reflection and layover are prominently observed. Additionally, high-rise buildings and mountainous terrains contribute to geometric distortions, intensifying layover, foreshortening, and shadowing effects, thereby complicating the interpretation of SAR images. These characteristics serve as crucial analytical elements in water body detection research and must be carefully considered for accurate SAR image interpretation. Furthermore, this region is prone to natural disasters such as floods, which are exacerbated by heavy rainfall and climate change (Kim et al., 2016). Since the 2010s, severe weather events, including torrential rains and floods, have frequently occurred, causing significant damage (Lee et al., 2023a; Lee et al., 2017).
2.2 Data
The data used in this study consist of time-series TerraSAR-X imagery capturing urban areas of Seoul and Gyeonggi Province. The backscatter values of water bodies in SAR imagery can fluctuate due to various meteorological factors, such as rainfall and wind. To adequately consider these influences, 13 images were collected over the period from June to December 2012. This approach is essential for evaluating the impact of seasonal changes and climatic conditions on water body detection (Guo et al., 2022). Additionally, the geometric distortions in SAR imagery vary with the satellite's orbit direction (Hong et al., 2017). To achieve a more detailed analysis, images captured from both ascending and descending orbits were collected comprehensively (Fig. 1A). This allowed for the effective observation of distortions arising from different viewing angles. Table 1 provides the detailed specifications of the TerraSAR-X images used in the study, showing that all images were acquired in Stripmap mode with HH polarization.
Fig. 1B presents the Copernicus digital elevation model (DEM) with a spatial resolution of 30 m for the study area. The Copernicus DEM, developed by the European Space Agency (ESA) and the European Union (EU), is based on data from the TanDEM-X mission and utilizes interferometric SAR (InSAR) technology for precise elevation measurements. This DEM includes surface features such as buildings and vegetation, accurately reflecting the natural environment. Consequently, it allows for the correction of geometric distortions during orthorectification, enhancing the geographic projection accuracy of SAR imagery (Li et al., 2022).
The overall workflow of this study is presented in Fig. 2 and consists of two main stages.
The first stage involves the preprocessing of TerraSAR-X imagery. This step is essential for enhancing the accuracy of water body detection and was conducted with careful consideration of the data's characteristics. The preprocessing process includes correcting geometric distortions and removing noise to transform the data into a state suitable for training. This processed data plays a critical role in enabling the deep learning model to effectively distinguish between water and non-water areas under various environmental conditions.
The second stage focuses on the creation of precise label data. For this purpose, reference maps and annotation guidelines were developed. Based on these reference maps and criteria, detailed label data for water body regions were constructed.
3.1 Data preprocessing

3.1.1 Generation of Sigma0 image

Satellite radar data are received as raw data at ground stations and provided as level-1 images, such as single look complex (SLC) data, after undergoing preprocessing. To construct a high-quality GeoAI dataset optimized for urban water body detection, Sigma0 images were generated from SLC data. Sigma0 images represent the backscatter coefficient, which indicates the energy emitted by the SAR antenna and reflected by ground scatterers, with these values assigned to each pixel. This conversion enables the imagery to more accurately reflect the Earth's physical characteristics, crucial for precise water body delineation (Yu et al., 2022). Sigma0 images minimize the influence of sensor imaging geometry and incidence angle, allowing for consistent comparison across different sensors. This characteristic makes them particularly effective for applications such as target detection, land cover classification, and water body analysis in complex urban environments. Since SLC images do not inherently consider the local incidence angle, even images taken from the same orbit may display variations in backscatter distribution (Yu et al., 2022). To address this challenge, the generation of Sigma0 images was an essential step in this study, ensuring uniformity and reliability in data analysis. The Sigma0 image generation equation is as follows (Eq. 1):
Eq. 1
$$\sigma^0 = \left(k_s \cdot |DN|^2 - NEBN\right) \cdot \sin\theta_{loc}$$
where ks is the calibration and processor scaling factor, DN is the brightness value of the SAR image, NEBN represents the noise equivalent beta naught, and θloc denotes the local incidence angle.
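As an illustration only, the calibration in Eq. 1 might be sketched in NumPy as follows. The sketch assumes that NEBN is subtracted from the scaled intensity, as in the usual TerraSAR-X calibration form, and that the local incidence angle is given in degrees. The function name `sigma0_from_dn` and the clipping of negative values to zero are assumptions made for this example, not part of the published workflow.

```python
import numpy as np

def sigma0_from_dn(dn, ks, nebn, theta_loc_deg):
    """Sketch of Eq. 1: sigma0 = (ks * |DN|^2 - NEBN) * sin(theta_loc)."""
    dn = np.asarray(dn)
    # Scaled intensity of the (possibly complex) SLC pixel values
    beta0 = ks * np.abs(dn) ** 2
    # Local incidence angle converted from degrees to radians
    theta = np.deg2rad(theta_loc_deg)
    sigma0 = (beta0 - nebn) * np.sin(theta)
    # Assumption: noise subtraction can yield small negative values over
    # dark targets, so they are clipped to zero here
    return np.clip(sigma0, 0.0, None)
```

In practice `ks`, `nebn`, and the incidence angle would be read from the product annotation rather than passed as constants.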

3.1.2 Image co-registration

Even when SAR satellite images are captured from the same orbit, differences in incidence angles and various other factors can result in misalignment between images. These positional discrepancies can significantly impact the accuracy of analyses involving time-series data. To correct for this, this study conducted an image co-registration process for the SAR images.
The cross-correlation algorithm proposed by Bernstein (1983) was employed for the image co-registration. This algorithm refines the relative positioning between images by moving a kernel across the entire image and calculating the correlation coefficient to estimate displacement. By comparing the central pixel of the kernel with the corresponding pixels in the image to be aligned, the algorithm identifies the displacement with the highest correlation coefficient. Based on the estimated displacement, a two-dimensional polynomial model was established to determine horizontal and vertical displacements at points within the image, enabling precise image alignment. The co-registration process was performed for each orbit separately. The master image for Orbit 20 was the image captured on October 3, 2012, while for Orbit 73, the master image used was from October 17, 2012.
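The core idea of correlation-based offset estimation can be illustrated with a short NumPy sketch. Note that this is a simplified stand-in: it uses a single FFT-based circular cross-correlation over a whole patch and returns an integer offset, whereas the procedure described above evaluates correlation coefficients kernel by kernel and fits a two-dimensional polynomial to the resulting displacements. The function name `estimate_shift` is hypothetical.

```python
import numpy as np

def estimate_shift(master, slave):
    """Return (dy, dx) so that np.roll(slave, (dy, dx), axis=(0, 1)) best matches master."""
    # Remove the mean so the peak reflects structure, not overall brightness
    m = master - master.mean()
    s = slave - slave.mean()
    # Circular cross-correlation computed in the frequency domain
    corr = np.fft.ifft2(np.fft.fft2(m) * np.conj(np.fft.fft2(s))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices beyond half the image size correspond to negative shifts
    shifts = [p if p <= n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return int(shifts[0]), int(shifts[1])
```

Running such an estimate on many small windows spread over the image would give the set of local displacements to which a polynomial model could then be fitted.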

3.1.3 Noise reduction and decibel conversion

The generated Sigma0 images inherently contain speckle noise due to the nature of SAR data. Speckle noise is characterized as multiplicative noise and follows a Rayleigh distribution, which degrades the quality of SAR imagery and can reduce the accuracy of analyses (Liang et al., 2023). If speckle noise is pronounced, it may affect the feature maps within the training data, ultimately impacting the performance of deep learning models.
To mitigate the impact of noise, this study employed a median filter, known for its effectiveness in reducing impulse noise. The median filter assigns the median value from the surrounding pixel neighborhood to the target pixel, thereby removing noise while preserving important edge information. For this study, a 3×3 filter was applied (Lee et al., 2023b), resulting in Sigma0 images with reduced noise and improved quality for downstream analysis and training.
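A 3×3 median filter of the kind described above can be sketched with NumPy alone. The reflect padding at the image border and the function name `median_filter_3x3` are assumptions for this example; the source does not state how borders were handled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_3x3(img):
    """Replace each pixel with the median of its 3x3 neighborhood."""
    # Assumption: borders are handled by reflecting the image by one pixel
    padded = np.pad(img, 1, mode="reflect")
    # View of shape (H, W, 3, 3) holding every 3x3 neighborhood
    windows = sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1))
```

An isolated bright pixel in an otherwise uniform area is removed entirely, while a straight edge between two uniform regions is left in place, which is the edge-preserving behavior referred to in the text.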
Given the broad range of values in SAR Sigma0 images, direct visualization or analysis poses challenges. To address this, a decibel (dB) conversion was applied to compress the range and enhance contrast. This step not only simplifies visualization and analysis but also optimizes the input data for training deep learning models by presenting the data in a more manageable scale. The conversion transforms the backscatter values from a linear scale to a logarithmic scale, making them more interpretable and contributing to effective feature extraction. The dB conversion is expressed by the following equation (Eq. 2):
Eq. 2
$$\mathrm{dB} = 10 \log_{10}\left(\sigma^0\right)$$
This conversion is essential to ensuring that the processed data aligns with the study’s goal of constructing an effective GeoAI dataset for accurate urban water body detection.
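The dB conversion of Eq. 2 is a one-line operation; a minimal sketch is given below. The small positive floor applied before the logarithm is an assumption added here to avoid taking the logarithm of zero, and the function name `to_db` is hypothetical.

```python
import numpy as np

def to_db(sigma0, floor=1e-6):
    """Sketch of Eq. 2: convert linear Sigma0 values to decibels."""
    # Assumption: values at or below zero are raised to a small floor
    # so that the logarithm is defined everywhere
    return 10.0 * np.log10(np.maximum(sigma0, floor))
```

With this floor, a linear value of 1.0 maps to 0 dB, 0.1 maps to -10 dB, and any non-positive value maps to -60 dB.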

3.1.4 Orthorectification

Subsequently, time-series SAR images, which had their positions corrected per orbit, were orthorectified using a terrain correction method based on Copernicus DEM. To project TerraSAR-X images accurately onto a geographic coordinate system, the original resolution of Copernicus DEM (30 m) was enhanced to 3 m using bicubic interpolation. This adjustment was made to align with the resolution of TerraSAR-X stripmap mode and achieve greater precision.
The purpose of the orthorectification process is to correct the geometric distortions in SAR images and assign precise geographic location information. Initially, the orbital data of the SAR images were collected to obtain initial geometric information, and a simulated SAR image was generated using DEM data of the corresponding area. This simulated SAR image was constructed to include geometric information where each pixel reflects the actual terrain elevation (Lee et al., 2012).
The simulated SAR image was matched to the actual SAR image with the same topographic geometry, and cross-correlation was used to align the two images. During this process, pixel displacement was extracted by comparing the pixel shift information between the simulated and actual SAR images, forming a lookup table. This lookup table indicates the amount by which each pixel in the SAR image is displaced from its true geographic position, aiding in enhancing the geographic accuracy of the SAR image.
Finally, the lookup table was applied to the noise-reduced SAR images for orthorectification and conversion to a geographic coordinate system. The resolution of the converted TerraSAR-X image in both range and azimuth directions was set to 3 m, with bicubic interpolation used for resampling. This orthorectification process ensured geographic accuracy, making the SAR images a foundational resource for various applications. The final step involved extracting the overlapping regions from the orthorectified time-series SAR images across all orbits to create the source dataset.
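How a lookup table drives the resampling step can be illustrated with a small sketch. It assumes the table stores, for every output map pixel, the row and column of the corresponding SAR pixel, and it uses nearest-neighbor sampling for brevity instead of the bicubic interpolation used in the study. The function name `apply_lookup_table` and the array layout are hypothetical.

```python
import numpy as np

def apply_lookup_table(sar, lut_rows, lut_cols, fill=np.nan):
    """Resample a SAR image onto a map grid using a per-pixel lookup table."""
    # Assumption: the table holds finite SAR (row, col) coordinates for each
    # map pixel; they are rounded for nearest-neighbor sampling
    r = np.rint(lut_rows).astype(int)
    c = np.rint(lut_cols).astype(int)
    # Map pixels pointing outside the SAR image keep the fill value
    valid = (r >= 0) & (r < sar.shape[0]) & (c >= 0) & (c < sar.shape[1])
    out = np.full(lut_rows.shape, fill, dtype=float)
    out[valid] = sar[r[valid], c[valid]]
    return out
```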
3.2 Generation of label data
The construction of precise label datasets is a critical factor that determines the performance and reliability of deep learning models. In satellite imagery-based water body detection, consistent and accurate label data are essential. To this end, this study developed a labeling guideline document that considers the unique characteristics of SAR imagery and the diverse environments surrounding water bodies. This guideline document (refer to 3.2.2) was designed to reflect the geometric distortions, speckle noise, and corner reflections inherent in SAR imagery, as well as environmental factors, ensuring consistent labeling.
If the precision of the labeling process is insufficient, it can lead to a degradation in the quality of training data, directly impacting the performance of deep learning models. This is especially important for tasks such as water body detection, where distinguishing fine differences is necessary. Variability in individual judgments when creating the dataset can decrease reliability, negatively affecting the predictive accuracy and consistency of the model (Lee and Jung, 2023).

3.2.1 Creation of reference images for label data

To construct the initial reference images, global river widths from Landsat (GRWL) data and the Otsu thresholding method were applied to preprocessed SAR image patches (Allen and Pavelsky, 2018). GRWL data is a global database containing information on river widths and is valuable for accurately identifying water bodies in satellite imagery. The Otsu method analyzes the distribution of pixel values within an image to automatically select the threshold that minimizes the variance within each of the two classes, which is equivalent to maximizing the variance between them. For each time-series SAR image, the Otsu method was first used to classify areas as water bodies, and GRWL data was then applied to refine the initial reference images by excluding misclassified areas.
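For readers unfamiliar with the Otsu method, a minimal histogram-based sketch is given below. It selects the bin center that maximizes the between-class variance, which is equivalent to minimizing the within-class variance. The number of bins and the function name `otsu_threshold` are assumptions for this example and do not reflect the settings used to produce Table 2.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the histogram bin center that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    # Cumulative weight and cumulative mean of the lower class
    w0 = np.cumsum(p)
    w1 = 1.0 - w0
    cum_mean = np.cumsum(p * centers)
    total_mean = cum_mean[-1]
    # Between-class variance for every candidate threshold
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return centers[np.argmax(between)]
```

Applied to a dB image, pixels at or below the returned value would be treated as water candidates, since water typically shows low backscatter.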
However, due to characteristics such as speckle noise, geometric distortions, and corner reflections, the boundaries between water and non-water areas in SAR imagery can become ambiguous, complicating accurate interpretation. To address this, the study used Google Map (Google, Mountain View, CA, USA) imagery from 2012 for detailed correction of the water body label data. Because the high-resolution 2012 Google Map images were not available for direct use, images of the study area were captured instead, and georeferencing was performed using ArcGIS Pro 10.3 to align the coordinates. Although this process resulted in lower resolution than the original high-resolution Google Map images, it still enabled clear differentiation between water and non-water areas. Consequently, the precision and consistency of the label data were enhanced, contributing to the optimization of the deep learning model's performance.

3.2.2 Design of label data annotation criteria

In this study, a labeling guideline document for TerraSAR-X imagery was developed to provide clear criteria for consistently distinguishing between water bodies and non-water areas. Due to the inherent characteristics of SAR imagery, it can be challenging to delineate water boundaries accurately, making the establishment of specific rules essential. This study designed the following criteria, taking into account various environmental factors and the characteristics of SAR imagery.
1. Definition of water bodies. Areas where water flows or accumulates within river boundaries and regions impounded by dams or levees were defined as water bodies. This criterion was set to enhance the clarity of water body detection and maintain consistent boundary delineation.
2. Impact of layover. If signals appear within water bodies due to layover effects, such areas were classified as water bodies. Layover is a type of geometric distortion inherent in SAR imagery, and considering it ensures consistency in water body detection despite these distortions.
3. Bridge classification. Bridges spanning across water bodies were classified as non-water areas. However, in cases where strong signals due to corner reflections appeared in front of or behind the bridge, resulting from double or triple reflections, these areas were included as water bodies. This approach was adopted to clearly define the boundary between bridges and water bodies, considering the complex reflection properties of SAR imagery.
4. Side lobe effects. Areas affected by side lobe effects were also classified as water bodies. Side lobe effects arise from the radiative properties of SAR signals and can introduce signal distortion that influences the delineation of water body boundaries.
5. Land areas within water bodies. Land regions situated within water bodies, such as islands or docks, were classified as non-water areas. These regions, while located within a body of water, are not submerged and must be distinguished to ensure the accuracy and consistency of water body detection.
6. Ship classification. Ships within water bodies were classified as part of the water body. Ships, being structures that float on water, exhibit backscatter signals similar to water bodies in SAR imagery, and were therefore considered as part of the water body.
By applying these criteria, consistent labeling that accurately reflects the characteristics of SAR imagery and environmental factors was achieved.
4.1 Results of preprocessing input data
Fig. 3 presents the original TerraSAR-X imagery used in this study and the dB images generated through the data preprocessing steps in geographic coordinates. Compared to the original images, the dB images exhibit a reduced range of values and follow a normal distribution, allowing for clearer differentiation between water bodies and non-water areas. The application of the median filter effectively reduced the impact of speckle noise, resulting in improved image resolution and quality. Additionally, the transformation from the image coordinate system to the WGS 84 geographic coordinate system was confirmed. Finally, the processed images projected in the WGS 84 coordinate system were reprojected to the UTM Zone 52N coordinate system for further analysis.
Fig. 3A and Fig. 3C show TerraSAR-X images captured from different orbits, illustrating how the shadow areas' direction and geometry vary according to the line of sight (LOS) direction. In contrast, Fig. 3B and Fig. 3D, which are projected onto the same geographic coordinate system, demonstrate that while the geographic coordinates remain consistent, the shapes of the shadow areas differ based on the LOS direction of each image. This highlights the importance of considering the geometric properties and observation angles of SAR imagery during water body detection.
In conclusion, the preprocessed dB images more clearly represent the backscatter coefficient of SAR imagery, offering data suitable for water body analysis. These preprocessing steps enhance the reliability of water body detection and contribute to improved performance across various environmental conditions.
4.2 Results of reference map generation
Table 2 presents the threshold values used to distinguish between water and non-water areas in the time-series images. Each threshold was determined by selecting 256×256 patches from the time-series images with an approximately equal proportion of water and non-water areas and applying the Otsu method. The analysis revealed a trend of decreasing threshold values over time, which is attributed to changes in the surface roughness of water bodies due to rainfall from June to August, resulting in relatively higher backscatter values. These thresholds were applied to create the reference maps for the entire dataset.
Fig. 4 displays the reference maps generated for precise water body labeling in this study. Fig. 4A shows the GRWL data, which marks the center points of water bodies in the study area. Fig. 4B and Fig. 4C illustrate the water body detection results for TerraSAR-X images captured on October 3, 2012, and October 17, 2012, respectively, created by applying both the Otsu thresholding method and GRWL data. The areas marked in light blue represent water bodies detected using only the Otsu method, while the dark blue areas indicate water bodies detected with both GRWL data and the Otsu method, showing a refined reference map where non-water areas are accurately excluded.
However, misclassifications can be observed in areas with noise or at the boundaries of water bodies. These limitations stem from the inherent speckle noise and geometric distortions present in SAR imagery. To address these issues, additional correction steps were undertaken, and 2012 Google Map imagery was utilized to refine the label data, resulting in more precise labeling.
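One plausible way to combine an Otsu water mask with GRWL center points, as in the reference maps described in this section, is to keep only those connected water regions that contain at least one GRWL point. The sketch below illustrates that idea with a breadth-first search over 4-connected pixels. This selection rule, the rasterized `seed_mask` input, and the function name are assumptions; the source does not specify exactly how GRWL was applied.

```python
from collections import deque
import numpy as np

def keep_components_touching_seeds(water_mask, seed_mask):
    """Keep 4-connected regions of water_mask that contain a seed pixel."""
    h, w = water_mask.shape
    keep = np.zeros((h, w), dtype=bool)
    # Start from water pixels that coincide with rasterized GRWL points
    queue = deque(zip(*np.nonzero(water_mask & seed_mask)))
    for r, c in queue:
        keep[r, c] = True
    # Grow each region through neighboring water pixels
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and water_mask[rr, cc] and not keep[rr, cc]:
                keep[rr, cc] = True
                queue.append((rr, cc))
    return keep
```

Low-backscatter areas that are not connected to a river center point, such as shadows or smooth paved surfaces, would be dropped by this rule.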
4.3 GeoAI dataset construction
Fig. 5 shows the images of the AI dataset constructed in this study for different orbits. Fig. 5A presents the source data acquired from Orbit 20 and Orbit 73. Differences in geometric characteristics and shadow areas can be observed between the two images due to the SAR satellite’s orbit direction and LOS.
Fig. 5B displays the reference data generated using the Otsu thresholding method combined with GRWL data. The reference maps closely resemble the label data shown in Fig. 5C, confirming that the dataset was constructed consistently. Furthermore, a comparison between the orbits reveals that the dataset reflects the geometric characteristics corresponding to the LOS for each image. More detailed results of the dataset construction can be found in Fig. 6.
Fig. 6 shows the label annotation results for TerraSAR-X images constructed in this study, annotated according to the established criteria. By comparing the SAR images from Orbit 20 and Orbit 73 (Fig. 6A, B) with the reference Google Map (Fig. 6C), the consistency and accuracy of the labeling can be analyzed.
1. Layover area: Fig. 6A-1 and Fig. 6B-1 present SAR images of the same region observed from Orbit 20 and Orbit 73, respectively. Fig. 6A-1 displays a layover effect that is not observed in Fig. 6B-1, resulting in a distorted water boundary. In contrast, Fig. 6C-1 shows the actual terrain as verified through Google Maps. The red boundary in Fig. 6A-1 indicates that the layover-distorted region was disregarded, and the label data was created by referencing the surrounding water boundaries.
2. Corner reflection area: Fig. 6A-2 and Fig. 6B-2 show instances where bridge detection over the Han River in the SAR images was distorted due to corner reflections. Both orbits display strong backscatter values near bridges and surrounding structures due to corner reflections. Fig. 6C-2 clearly shows the actual boundary of the water body, highlighting the differences between the SAR imagery and the reference. The annotation in Fig. 6C-2 confirms that the bridge boundary was accurately identified by consulting the reference map.
3. Lake area: Fig. 6A-3 and Fig. 6B-3 represent the labeling results for flat lake regions. Stable water body detection was achieved in both orbits, with some boundary distortions due to side lobe effects. However, by referencing the surrounding water boundaries, precise annotations were confirmed.
4. Dock area: Fig. 6A-4 and Fig. 6B-4 illustrate the annotation results for areas with complex structures such as docks. Docks, being structures that do not change over time, were classified as non-water areas. This criterion reflects the stationary nature of docks, ensuring consistent labeling.
5. Side lobe area: Fig. 6A-5 and Fig. 6B-5 show regions where water boundaries were distorted due to side lobe effects caused by the SAR signal's radiative pattern. Side lobes produce unintended reflection values around water bodies, distorting the boundary. In this study, the annotations were refined by referencing the surrounding water boundaries, as shown in the results.
This study addressed the process of constructing an AI dataset for urban water body detection using TerraSAR-X satellite SAR imagery. In the initial phase, urban areas in Seoul and Gyeonggi Province were selected as the study area. Source data were collected and preprocessed with consideration for the characteristics of the area and the geometric distortions inherent in SAR satellite imagery.
The preprocessing included the generation of Sigma0 images to convert the original SAR data into a format that accurately reflects the physical properties of the Earth's surface. The image co-registration process was then performed to align the SAR images precisely, ensuring positional consistency across time-series data. This step was followed by noise reduction using a median filter to mitigate the impact of speckle noise and enhance the quality of the data for analysis and training. Decibel conversion was then applied to compress the value range and improve the interpretability of the SAR images, facilitating clearer visualization and analysis. Subsequently, orthorectification was conducted using Copernicus DEM to accurately correct the geometric distortions in the SAR images and project them onto a geographic coordinate system, culminating in the creation of a dataset optimized for urban water body detection.
Label data were created by combining GRWL data and the Otsu thresholding method to differentiate between water and non-water areas, followed by additional fine labeling using Google Map imagery. Emphasis was placed on applying consistent annotation criteria that accounted for the geometric characteristics of SAR imagery, including layover, corner reflections, and side lobe effects, particularly under various orbits and observation conditions. This approach ensured that the labeling process remained consistent and accurate despite the inherent complexities and variations of SAR data.
The constructed AI dataset can serve as a foundational resource for evaluating and improving deep learning models for urban water body detection. Because it covers multiple orbits with different geometric characteristics, it gives deep learning models the opportunity to learn the geometric properties specific to each orbit. In addition, because the SAR images are high resolution, they must be divided into patches before being applied to deep learning models.
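The patch division mentioned above could look like the following sketch. The patch size, the stride, and the choice to drop incomplete tiles at the right and bottom edges are assumptions for illustration; the paper does not prescribe them, and `tile_pairs` is a hypothetical helper name.

```python
import numpy as np


def tile_pairs(image, label, patch=512, stride=512):
    """Yield ((top, left), image_patch, label_patch) for aligned tiles.

    Tiles that would extend past the image border are skipped.
    """
    if image.shape[:2] != label.shape[:2]:
        raise ValueError("image and label must share height and width")
    height, width = label.shape[:2]
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            window = (slice(top, top + patch), slice(left, left + patch))
            yield (top, left), image[window], label[window]
```

A stride smaller than the patch size produces overlapping tiles, which is one simple way to increase the number of training samples near water boundaries.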
However, this study has several limitations that should be acknowledged. The dataset was constructed over a relatively short period, from June 2012 to November 2012. While this timeframe captures some seasonal variations, it does not fully encompass long-term climate change patterns or extreme weather conditions such as droughts or heavy rainfall. Addressing these limitations would require extending the data collection period to include both seasonal and long-term climatic trends. Future research should also consider integrating data from multiple satellite sources to enhance temporal and spatial diversity. Developing methodologies to mitigate discrepancies between satellite characteristics, including resolution, incidence angle, and NESZ, will be essential to ensure consistent and high-quality datasets.
Additionally, data imbalance stemming from SAR-specific geometric distortions, such as layover, foreshortening, and shadowing, poses challenges in model training. Diverse data augmentation techniques are needed to address this imbalance. Augmentation allows deep learning models to be trained with greater robustness to distortions such as layover and foreshortening, which in turn supports more precise and reliable performance evaluation in urban water body detection tasks.
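One simple family of augmentations is sketched below: random 90-degree rotations and horizontal flips applied identically to a source patch and its label. This is an assumed example rather than a technique specified by the authors, and `augment_pair` is a hypothetical helper. Note that flips and rotations change the apparent look direction of layover and shadow, so whether they suit a given experiment is a design decision.

```python
import numpy as np


def augment_pair(image, label, rng):
    """Apply the same random rotation and flip to an image patch and its label."""
    k = int(rng.integers(0, 4))              # number of 90-degree rotations
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.random() < 0.5:                   # horizontal flip half of the time
        image, label = np.fliplr(image), np.fliplr(label)
    # Copy so the outputs do not share memory with the input patches.
    return image.copy(), label.copy()
```

Passing a seeded generator, for example `np.random.default_rng(0)`, makes the augmentation sequence reproducible.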
Despite its limited temporal and spatial scope, this dataset systematically addresses the geometric distortions and noise characteristics inherent in SAR satellite imagery, establishing itself as a critical foundational resource for urban water body detection research. Furthermore, this dataset is anticipated to extend its utility beyond water body detection, serving as a valuable tool for monitoring water-related disasters intensified by climate change and aiding in the development of effective mitigation and response strategies in future studies.

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Funding Information

This research was supported by the Ministry of Environment, under the Development of Ground Operation System for Water Resources Satellite from K-water.

This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. RS-2024-00397964).

Data Availability Statement

The data that support the findings of this study are openly available in DataON at https://doi.org/10.22711/idr/1050.

Fig. 1.
Study area. (A) Red box is Orbit 20 swath (ascending direction). Blue box is Orbit 73 swath (descending direction), white line is Seoul border. (B) Copernicus DEM. DEM, digital elevation model.
GD-2024-0046f1.jpg
Fig. 2.
Overview of the data preprocessing and annotation workflow. (A) Data preprocessing. (B) The annotation process. COP DEM, Copernicus digital elevation model; UTM, universal transverse mercator; SAR, synthetic aperture radar; DSC, descending; ASC, ascending.
GD-2024-0046f2.jpg
Fig. 3.
TerraSAR-X raw data and processed data. (A) TerraSAR-X raw data (October 3, 2012). (B) Processed data (October 3, 2012). (C) TerraSAR-X raw data (October 17, 2012). (D) Processed data (October 17, 2012). LOS, line of sight.
GD-2024-0046f3.jpg
Fig. 4.
(A) GRWL data. (B) Index data (October 3, 2012). (C) Index data (October 17, 2012). Sky blue shows the Otsu method result; blue shows the GRWL+Otsu method result. LOS, line of sight; GRWL, global river widths from Landsat.
GD-2024-0046f4.jpg
Fig. 5.
TerraSAR-X AI data. (A) Source data. (B) Index data. (C) Label data. (1) Orbit 20 (October 3, 2012). (2) Orbit 73 (October 17, 2012). LOS, line of sight.
GD-2024-0046f5.jpg
Fig. 6.
TerraSAR-X label annotation results. The red line represents the boundary between water and non-water areas for Orbit 20, while the blue line indicates the same for Orbit 73. (A) Orbit 20. (B) Orbit 73. (C) Google Map. (1) Example of layover. (2) Corner reflection. (3) Lake area. (4) Dock area. (5) Side lobe area.
GD-2024-0046f6.jpg
Table 1.
Specifications of the TerraSAR-X data
Number | Acquisition date | Relative orbit (direction) | Incidence angle (°) | Pixel spacing (range×azimuth) | Image mode | Resolution (range×azimuth) (m) | Polarization
1 | 20120618 | Orbit 20 (DSC) | 28.78183 | 0.909×1.900 | Stripmap | 3×3 | HH
2 | 20120707 | Orbit 73 (ASC) | 42.8246 | 1.364×2.194 | | |
3 | 20120710 | Orbit 20 (DSC) | 28.7943 | 0.909×1.900 | | |
4 | 20120718 | Orbit 73 (ASC) | 42.8182 | 1.364×2.194 | | |
5 | 20120812 | Orbit 20 (DSC) | 28.7742 | 0.909×1.900 | | |
6 | 20120820 | Orbit 73 (ASC) | 42.8635 | 1.364×2.194 | | |
7 | 20120823 | Orbit 20 (DSC) | 28.7802 | 0.909×1.900 | | |
8 | 20120831 | Orbit 73 (ASC) | 42.86 | 1.364×2.194 | | |
9 | 20120925 | Orbit 20 (DSC) | 28.7803 | 0.909×1.900 | | |
10 | 20121003 | Orbit 73 (ASC) | 42.8648 | 1.364×2.194 | | |
11 | 20121017 | Orbit 20 (DSC) | 28.7797 | 0.909×1.900 | | |
12 | 20121127 | Orbit 73 (ASC) | 42.8654 | 1.364×2.194 | | |
13 | 20121130 | Orbit 20 (DSC) | 28.7652 | 0.909×1.900 | | |

DSC, descending; ASC, ascending.

Table 2.
TerraSAR-X Otsu threshold value
Number | Acquisition date | Threshold value
1 | 20120618 | -10.92
2 | 20120707 | -12.99
3 | 20120710 | -11.37
4 | 20120718 | -12.58
5 | 20120812 | -12.82
6 | 20120820 | -13.20
7 | 20120823 | -13.51
8 | 20120831 | -13.45
9 | 20120925 | -13.88
10 | 20121003 | -14.38
11 | 20121017 | -14.37
12 | 20121127 | -14.52
13 | 20121130 | -14.62
  • Allen GH, Pavelsky TM (2018) Global extent of rivers and streams. Science 361(6402):585–588ArticlePubMed
  • Al-Wassai FA, Kalyankar NV (2013) Major limitations of satellite images. arXiv:1307.2434
  • Baek WK, Jung HS (2019) A review of change detection techniques using multi-temporal synthetic aperture radar images. KJRS 35(5_1):737–750
  • Bernstein R (1983) Image geometry and rectification. In: Colwell RN, Simonett DS, Ulaby FT (eds) Manual of remote sensing, vol. 1: Theory, instruments and techniques, 2nd ed. American Society of Photogrammetry, Falls Church, pp 873-922
  • Brenner AR, Roessing L (2008) Radar imaging of urban areas by means of very high-resolution SAR and interferometric SAR. IEEE T Geosci Remote 46(10):2971–2982Article
  • Cigna F, Bateson LB, Jordan CJ, Dashwood C (2014) Simulating SAR geometric distortions and predicting Persistent Scatterer densities for ERS-1/2 and ENVISAT C-band SAR and InSAR applications: Nationwide feasibility assessment to monitor the landmass of Great Britain with SAR imagery. RSE 152:441–466Article
  • Guo Z, Wu L, Huang Y, Guo Z, Zhao J, Li N (2022) Water-body segmentation for SAR images: past, current, and future. Remote Sens 14(7):1752Article
  • Hong S, Choi Y, Park I, Sohn HG (2017) Comparison of orbit-based and time-offset-based geometric correction models for SAR satellite imagery based on error simulation. Sensors (Basel) 17(1):170ArticlePubMedPMC
  • Kim H, Kim YK, Song SK, Lee HW (2016) Impact of future urban growth on regional climate changes in the Seoul Metropolitan Area, Korea. Sci Total Environ 571:355–363ArticlePubMed
  • Lee ER, Jung HS (2023) A study of development and application of an inland water body training dataset using Sentinel-1 SAR images in Korea. KJRS 3:9
  • Lee ER, Lee HS, Lee JM, Park SC, Jung HS (2023a) The Cheonji Lake GeoAI dataset based in synthetic aperture radar images: TerraSAR-X, sentinel-1 and ALOS PALSAR-2. GEO DATA 5(4):251–261ArticlePDF
  • Lee KY, Byun YG, Kim YS (2012) Accuracy evaluation of terrain correction of high resolution SAR imagery with the quality of DEM. JKSGPC 30 30(6_1):9–528Article
  • Lee S, Choi Y, Ji J, Lee E, Yi S, Yi J (2023b) Flood vulnerability assessment of an urban area: a case study in Seoul, South Korea. Water 15(11):1979Article
  • Lee S, Kim JC, Jung HS, Lee MJ, Lee S (2017) Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul metropolitan city, Korea. Geomat Nat Haz Risk 8(2):1185–1203ArticlePDF
  • Li H, Zhao J, Yan B, Yue L, Wang L (2022) Global DEMs vary from one to another: an evaluation of newly released Copernicus, NASA and AW3D30 DEM on selected terrains of China using ICESat-2 altimetry data. IJDE 15(1):1149–1168Article
  • Liang J, Liu D (2020) A local thresholding approach to flood water delineation using Sentinel-1 SAR imagery. ISPRS J Photogramm Remote Sens 159:53–62Article
  • Liang Y, Sun J, Zhang J, et al (2023) Prediction of fiber Rayleigh scattering responses based on deep learning. Sci China Inf Sci 66(12):222301ArticlePDF
  • Papson S, Narayanan RM (2012) Classification via the shadow region in SAR imagery. IEEE Trans Aerosp Electron Syst 48(2):969–980Article
  • Sarker IH (2021) Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput Sci 2(6):420ArticlePubMedPMCPDF
  • Simard M, DeGrandi G, Thomson KP, Benie GB (1998) Analysis of speckle noise contribution on wavelet decomposition of SAR images. IEEE Trans Geosci Remote Sens 36(6):1953–1962Article
  • Soergel U, Cadario E, Thiele A, Thoennessen U (2008) Feature extraction and visualization of bridges over water from high-resolution InSAR data and one orthophoto. IEEE J Sel Top Appl Earth Obs Remote Sens 1(2):147–153Article
  • Taniguchi M, Uemura T, Jago-on K (2007) Combined effects of urbanization and global warming on subsurface temperature in four Asian cities. VZJ 6(3):591–596ArticlePDF
  • Yu F, Sun W, Li J, Zhao Y, Zhang Y, Chen G (2017) An improved Otsu method for oil spill detection from SAR images. Oceanologia 59(3):311–317Article
  • Yu JW, Yoon YW, Lee ER, Baek WK, Jung HS (2022) Flood mapping using modified U-NET from TerraSAR-X images. KJRS 3:8
Meta Data for Dataset
Essential
Field Sub-Category
Title of Dataset GeoAI Dataset for Urban Water Body Detection Using TerraSAR-X Satellite Radar Imagery
DOI https://doi.org/10.22711/idr/1050
Category Inland waters
Temporal Coverage 2012.06.-2012.11.
Spatial Coverage Address Seoul and Gyeonggi-do, South Korea
WGS84 Coordinates [Polygon]
Top 37.640847 (dd)
Bottom 37.369736 (dd)
Left 126.794431 (dd)
Right 127.185819 (dd)
Personnel Name Eu-Ru Lee
Affiliation University of Seoul
E-mail Eurulee22@uos.ac.kr
CC License CC BY-NC
Optional
Field Sub-Category
Summary of Dataset GeoAI dataset for urban water body detection using TerraSAR-X Satellite radar imagery
Project Development of major regional analysis and realization intelligence technology based on micro satellite images
Instrument Information & communications Technology Promotion
