
GEO DATA : GEO DATA

OPEN ACCESS
Search results for "Dataset": 14 articles
Data Article
KOMPSAT-3/3A Image-text Dataset for Training Large Multimodal Models
Han Oh, Dong-Bin Shin, Dae-Won Chung
Received February 6, 2025  Accepted February 28, 2025  Published online March 19, 2025  
DOI: https://doi.org/10.22761/GD.2025.0003    [Epub ahead of print]
This study aims to improve the accuracy and interpretability of large multimodal models (LMMs) specialized in satellite image analysis by constructing an image-text dataset based on KOMPSAT-3/3A imagery and presenting the results of training on this dataset. Conventional LMMs are trained primarily on general images, which limits their ability to interpret the specific characteristics of satellite imagery, such as spectral bands, spatial resolution, and viewing angles. To address this limitation, we developed an image-text dataset, divided into pretraining and fine-tuning stages, based on the existing KOMPSAT object detection dataset. The pretraining dataset consists of captions summarizing the overall theme and key information of each image. The fine-tuning dataset combines metadata (including acquisition time, sensor type, and coordinates) with detailed object detection labels to generate six types of question-answer pairs: detailed descriptions, conversations with varying answer lengths, bounding box identification, multiple-choice questions, and complex reasoning. This structured dataset enables the model to learn not only the general context of satellite images but also fine-grained details such as object quantity, location, and geographic attributes. Training on the new KOMPSAT-based dataset significantly improved the model's accuracy in recognizing regional information and object characteristics in satellite imagery. The fine-tuned models achieved substantially higher accuracy than previous models, even surpassing GPT-4o, which demonstrates the effectiveness of a domain-specific dataset. The findings of this study are expected to contribute to various remote sensing applications, including automated satellite image analysis, change detection, and object detection.
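The fine-tuning step above turns existing object detection labels and image metadata into question-answer pairs. A minimal sketch of that idea follows; the record schema, the field names (`sensor`, `acquired`, `bbox`), the question wording, the type names, and the `build_qa_pairs` helper are all hypothetical and do not reflect the published dataset format. Only three of the question types are illustrated (object counts, an acquisition description, and bounding box identification).

```python
import json
from collections import Counter


def build_qa_pairs(image_id, metadata, objects):
    """Turn metadata and detection labels into question-answer records.

    Hypothetical schema (not the published format):
      metadata: {"sensor": str, "acquired": str, "lat": float, "lon": float}
      objects:  [{"class": str, "bbox": [x_min, y_min, x_max, y_max]}, ...]
    """
    records = []
    counts = Counter(obj["class"] for obj in objects)

    # Short-answer conversation: object counts come straight from the labels.
    for cls, n in sorted(counts.items()):
        records.append({
            "image": image_id,
            "type": "conversation",
            "question": f"How many objects of class '{cls}' are visible?",
            "answer": str(n),
        })

    # Description built from metadata: sensor, acquisition time, location.
    records.append({
        "image": image_id,
        "type": "detailed_description",
        "question": "Describe the acquisition conditions of this image.",
        "answer": (f"Acquired by {metadata['sensor']} on {metadata['acquired']} "
                   f"near ({metadata['lat']:.4f}, {metadata['lon']:.4f})."),
    })

    # Bounding box identification for the first labeled object, if any.
    if objects:
        first = objects[0]
        records.append({
            "image": image_id,
            "type": "bounding_box",
            "question": f"Give the bounding box of one {first['class']}.",
            "answer": json.dumps(first["bbox"]),
        })
    return records


meta = {"sensor": "KOMPSAT-3A", "acquired": "2021-05-01T02:13:00Z",
        "lat": 35.1, "lon": 129.04}
objs = [{"class": "ship", "bbox": [10, 20, 50, 60]},
        {"class": "ship", "bbox": [70, 80, 90, 95]},
        {"class": "storage_tank", "bbox": [5, 5, 15, 15]}]
pairs = build_qa_pairs("K3A_0001", meta, objs)
```

Deriving answers directly from labels keeps every answer verifiable against the source annotations, which is the main appeal of building instruction data this way.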
Data Article
GeoAI Dataset for Industrial Park Segmentation from Sentinel-2 Satellite Imagery and GEMS
Sung-Hyun Gong, Hyung-Sup Jung, Geun-han Kim, Geun-Hyouk Han, Il-Hoon Choi, Jin-Sung Hong
Received November 20, 2024  Accepted January 7, 2025  Published online February 13, 2025  
DOI: https://doi.org/10.22761/GD.2024.0054    [Epub ahead of print]
Air pollution in East Asia presents critical environmental and health challenges, particularly in industrial regions affected by both domestic and cross-border emissions. This study developed a GeoAI dataset for industrial park segmentation that integrates Sentinel-2 satellite imagery, Geostationary Environment Monitoring Spectrometer (GEMS) geostationary satellite data, and Air Quality Monitoring Network data. Optimized for semantic segmentation, with labels specific to industrial park classification, the dataset serves as a foundation for the precise identification and spatial tracking of major air pollution sources. We validated its applicability using a modified U-Net model, which achieved a mean intersection over union (mIoU) of 0.8146 and a pixel accuracy of 0.9608, demonstrating its potential for detecting and monitoring pollutant sources in industrial areas. With future expansion through additional temporal data and more diverse pollutant measurements, the dataset is expected to support regional air quality monitoring and inform pollution control strategies across East Asia.
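The industrial park and urbanized area datasets both report mean intersection over union (mIoU) and pixel accuracy. Using the standard definitions, both metrics can be computed from a per-class confusion matrix, as in the NumPy sketch below. Whether the authors averaged IoU over all classes including background, or pooled pixels over the whole test set, is not stated in the abstracts and is an assumption here; the helper names are illustrative.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = num_classes * np.asarray(y_true).ravel() + np.asarray(y_pred).ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Share of all pixels whose predicted class matches the label."""
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    """Average of per-class TP / (TP + FP + FN), skipping absent classes."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    iou = tp / np.where(denom == 0, 1, denom)  # avoid division by zero
    return iou[denom > 0].mean()


cm = confusion_matrix(np.array([[0, 0, 1], [1, 1, 0]]),
                      np.array([[0, 1, 1], [1, 0, 0]]), num_classes=2)
```

In this toy case four of six pixels are correct (pixel accuracy 2/3), while each class has IoU 2/4, so mIoU is 0.5; mIoU penalizes boundary errors more heavily than pixel accuracy, which is why the reported mIoU values sit below the pixel accuracies.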
Data Articles
GeoAI Dataset for Training a Deep Learning-based GEMS Snow Detection Model
Jin-Woo Yu, Jun-Hyeok Jung, Kyoung-Hee Kang, Yong-Mi Lee, Hyung-Sup Jung
GEO DATA. 2024;6(4):552-560.   Published online December 31, 2024
DOI: https://doi.org/10.22761/GD.2024.0060
The Geostationary Environment Monitoring Spectrometer (GEMS) observes air quality across East Asia from an altitude of approximately 36,000 km, analyzing the spatiotemporal distribution of atmospheric pollutants that spread beyond local regions. GEMS currently provides 21 core air quality-related products, most of which are derived from Level 1C data, which have undergone geometric and radiometric correction. Accurate air quality analysis requires precise surface reflectance estimation. However, highly reflective surfaces such as snow interfere with accurate radiance estimation, so these areas must be detected precisely. GEMS observes only in the ultraviolet and part of the visible spectrum and lacks the infrared bands that are crucial for snow detection; it also has no snow detection algorithm of its own and instead uses near-real-time ice and snow extent data from the U.S. National Snow and Ice Data Center. Deep learning techniques have recently outperformed traditional algorithms in image processing and could address these limitations, but no deep learning training dataset for GEMS snow detection currently exists. To address this gap, this study developed a GeoAI dataset for training a deep learning-based snow detection model for GEMS. We constructed the input data from GEMS Level 1C data and generated the label data from GEMS, Advanced Meteorological Imager (AMI), and MODIS snow cover data. The dataset is expected to address the snow detection limitations of GEMS and to provide foundational data that enhances the reliability of future geostationary satellite-based air quality research.
GeoAI Dataset for Urbanized Area Segmentation from Landsat 8/9 Satellite Imagery and GEMS
Sung-Hyun Gong, Hyung-Sup Jung, Geun-Han Kim, Geun-Hyouk Han, Il-Hoon Choi, Jin-Sung Hong
GEO DATA. 2024;6(4):478-486.   Published online December 31, 2024
DOI: https://doi.org/10.22761/GD.2024.0053
In South Korea, air pollution has emerged as a pressing social issue, calling for data-driven approaches to monitoring air pollutant sources. This study constructed a GeoAI dataset for detecting air pollution sources in urbanized areas using Landsat 8/9 satellite imagery, Geostationary Environment Monitoring Spectrometer (GEMS) geostationary satellite data, and air quality monitoring network data. The dataset is optimized for semantic segmentation, includes labeled data for urban area segmentation, and is designed to enable precise detection of pollution sources within urban regions by integrating satellite imagery with air quality information. Using this dataset, we applied a modified U-Net model to classify pollutant sources in urbanized areas, achieving a mean intersection over union (mIoU) of 0.8592 and a pixel accuracy of 0.9433. These results demonstrate the effectiveness of the GeoAI dataset as a tool for identifying and managing major pollution sources and provide foundational data for air quality monitoring and policy development across South Korea and East Asia. With the integration of additional air pollution data, the dataset is expected to contribute to long-term air quality management and to mitigating the health impacts of pollution.
GeoAI Dataset for Urban Water Body Detection Using TerraSAR-X Satellite Radar Imagery
Eu-Ru Lee, Jun-Hyeok Jung, Ki-Chang Kim, Seong-Jae Yu, Hyung-Sup Jung
GEO DATA. 2024;6(4):435-450.   Published online December 31, 2024
DOI: https://doi.org/10.22761/GD.2024.0046
This study presents a GeoAI dataset for urban water body detection built from TerraSAR-X synthetic aperture radar (SAR) satellite imagery. The study area covers urban regions in Seoul and Gyeonggi Province, chosen for their complex structures and frequent flooding, both of which complicate SAR analysis. Preprocessing consisted of Sigma0 image generation, image co-registration, median filtering to reduce speckle noise, decibel conversion, and orthorectification with the Copernicus DEM for precise geometric correction. Label data were created from the Global River Widths from Landsat (GRWL) dataset combined with Otsu thresholding and then refined using Google Maps imagery. The annotation guidelines were designed to account for SAR-specific phenomena such as layover, corner reflections, and side-lobe effects, ensuring consistent and accurate labeling across different orbits and observation conditions. The resulting dataset helps deep learning models learn the geometric characteristics of SAR imagery, improving water body detection. This work provides a foundational resource for future applications in urban water management and climate-resilient disaster response.
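A NumPy sketch of three of the preprocessing and labeling steps named above: speckle reduction with a median filter on linear Sigma0, conversion to decibels, and an Otsu threshold that separates dark (low-backscatter) water pixels from land. Assumptions: the input is already calibrated, co-registered Sigma0; the 3 × 3 window is an illustrative choice; orthorectification with the Copernicus DEM and the GRWL constraint on the threshold are omitted. The helper names (`median3x3`, `to_db`, `otsu_threshold`) are hypothetical, and the scene is synthetic.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median3x3(sigma0):
    """3x3 median filter on a linear-scale Sigma0 image (edges replicated)."""
    padded = np.pad(sigma0, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1))


def to_db(sigma0, floor=1e-6):
    """Convert linear backscatter to decibels, clamping zeros to a small floor."""
    return 10.0 * np.log10(np.maximum(sigma0, floor))


def otsu_threshold(values, bins=256):
    """Return the histogram bin edge that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                # pixel count in the lower class
    w1 = w0[-1] - w0                    # pixel count in the upper class
    s0 = np.cumsum(hist * centers)      # running sum of the lower class
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    k = int(np.argmax(between))
    return edges[k + 1]                 # values below this edge form the lower class


# Synthetic scene: left half water (about -20 dB), right half land (about -7 dB),
# with multiplicative 4-look gamma speckle.
rng = np.random.default_rng(0)
clean = np.where(np.arange(64)[None, :] < 32, 0.01, 0.2) * np.ones((64, 1))
speckled = clean * rng.gamma(shape=4.0, scale=0.25, size=clean.shape)
db = to_db(median3x3(speckled))
water = db < otsu_threshold(db)
```

Filtering before the dB conversion follows the order listed in the abstract; thresholding in dB is convenient because the log scale makes the water and land modes roughly symmetric.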
Dataset for Deep Learning-based GEMS Asian Dust Detection
Jin-Woo Yu, Che-Won Park, Won-Jin Lee, Yong-Mi Lee, Yu-Ha Kim, Hyung-Sup Jung
GEO DATA. 2024;6(3):175-185.   Published online September 27, 2024
DOI: https://doi.org/10.22761/GD.2023.0049
  • 1 Citation
In South Korea, Asian dust occurs frequently in spring and causes various health problems, including respiratory diseases. As a result, public awareness of and concern about air pollutants have grown, along with demand for better air quality and more accurate forecasting. To meet this demand, the Ministry of Environment deployed the Geostationary Environment Monitoring Spectrometer (GEMS) on the GEO-KOMPSAT-2B (GK2B) satellite to monitor atmospheric pollutants and climate change-inducing substances in real time. The current GEMS dust product, generated by thresholding the UV aerosol index and the visible aerosol index, has shown limitations in accurately detecting suspended particulate matter. This study aims to develop a comprehensive AI dataset for improving GEMS Asian dust detection. Data were collected from January to May 2021, focusing on dates with significant dust events. Label data were generated through careful annotation based on the outputs of various satellites and ground-based observations. Preprocessing and augmentation techniques, including normalization and cut-mix, were then applied to improve the dataset's robustness and generalizability. To evaluate the dataset, a model was trained on it, and its predictions improved on the detection results of existing algorithms. Future versions of the dataset will use improved labeling methods and accuracy verification techniques, and these improvements are expected to support deep learning models that outperform current dust detection algorithms.

Citations

Citations to this article as recorded by  
  • GeoAI Dataset for Training a Deep Learning-based GEMS Snow Detection Model
    Jin-Woo Yu, Jun-Hyeok Jung, Kyoung-Hee Kang, Yong-Mi Lee, Hyung-Sup Jung
    GEO DATA. 2024;6(4):552.
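The normalization and cut-mix augmentation mentioned in the Asian dust abstract can be sketched as follows. For segmentation labels, cut-mix pastes the same rectangle into both the image and its label mask, so no mixed label weights are needed. The channel statistics, the Beta(1, 1) sampling of the kept-area ratio (as in the original CutMix formulation), and the helper names are assumptions for illustration, not the authors' settings.

```python
import numpy as np


def normalize(x, mean, std):
    """Per-channel standardization of an array shaped (C, H, W)."""
    return (x - mean[:, None, None]) / std[:, None, None]


def cutmix_segmentation(img_a, mask_a, img_b, mask_b, rng, lam=None):
    """Paste a random box from sample B into sample A, image and mask alike.

    lam is the nominal fraction of A that is kept; the box side scales with
    sqrt(1 - lam), and boxes clipped at the border keep more of A.
    """
    _, h, w = img_a.shape
    if lam is None:
        lam = rng.beta(1.0, 1.0)
    cut_h, cut_w = int(h * np.sqrt(1.0 - lam)), int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    img, mask = img_a.copy(), mask_a.copy()      # leave the inputs untouched
    img[:, y0:y1, x0:x1] = img_b[:, y0:y1, x0:x1]
    mask[y0:y1, x0:x1] = mask_b[y0:y1, x0:x1]
    return img, mask


rng = np.random.default_rng(42)
img_a, mask_a = np.zeros((3, 32, 32)), np.zeros((32, 32), dtype=np.uint8)
img_b, mask_b = np.ones((3, 32, 32)), np.ones((32, 32), dtype=np.uint8)
mixed_img, mixed_mask = cutmix_segmentation(img_a, mask_a, img_b, mask_b, rng, lam=0.75)
```

Because image and mask share the same box, every pasted pixel keeps its own label, which is what makes cut-mix safe for dense labels such as dust masks.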
Original Papers
GeoAI Dataset for Rural Hazardous Facilities Segmentation from KOMPSAT Ortho Mosaic Imagery
Sung-Hyun Gong, Hyung-Sup Jung, Moung-Jin Lee, Kwang-Jae Lee, Kwan-Young Oh, Jae-Young Chang
GEO DATA. 2023;5(4):231-237.   Published online December 28, 2023
DOI: https://doi.org/10.22761/GD.2023.0054
  • 1 Citation
In South Korea, rural areas have been recognized as potentially sustainable spaces for the future, but they currently face major problems. Unplanned construction of facilities such as factories, livestock facilities, and solar panels near residential areas is damaging the rural environment and degrading residents' quality of life. Detecting and monitoring rural facilities is necessary to prevent disorderly development and to manage rural space in a planned manner. This study used satellite imagery to obtain information on rural areas because, compared with field surveys, it is better suited to observing large areas and monitoring changes over time. KOMPSAT ortho-mosaic optical imagery from 2019 and 2020 was used to construct AI training datasets for segmenting rural hazardous facilities in the Seosan, Anseong, Naju, and Geochang areas. The dataset can be used with image segmentation models to classify rural facilities and to monitor potentially hazardous facilities in rural areas. It is expected to help solve rural problems by serving as a basis for rural planning.

Citations

Citations to this article as recorded by  
  • Performance Comparison of Water Body Detection from Sentinel-1 SAR and Sentinel-2 Optical Imagery Using Attention U-Net Model
    Il-Hoon Choi, Eu-Ru Lee, Hyung-Sup Jung
    Korean Journal of Remote Sensing. 2024;40(5-1):507.
GeoAI Dataset for Training Deep Learning-Based Optical Satellite Image Matching Model
Jin-Woo Yu, Che-Won Park, Hyung-Sup Jung
GEO DATA. 2023;5(4):244-250.   Published online December 28, 2023
DOI: https://doi.org/10.22761/GD.2023.0048
  • 1 Citation
Satellite imagery is used to monitor the Earth because it continuously provides multi-temporal observations of consistent quality. To analyze time-series remote sensing data with high accuracy, image registration must be performed beforehand. Image registration techniques are mainly divided into region-based and feature-based registration, and both extract corresponding points based on the similarity of spectral characteristics and object shapes between the master and slave images. Recently, deep learning-based Siamese neural networks and convolutional neural networks have also been used to match images. These models outperform earlier non-deep learning algorithms, but training a deep learning-based image registration model requires a very large amount of data. In this study, we aim to generate a dataset for training a deep learning-based optical image registration model. To build it, we acquired the Satellite Side-Looking (S2Looking) open dataset and applied preprocessing and data augmentation to create the input data. We then added offsets in the X and Y directions between the master and slave images to create the label data. The preprocessed input data and label data together form a dataset suited to image registration, which is expected to be useful for training deep learning-based satellite image registration models.

Citations

Citations to this article as recorded by  
  • Performance Comparison of Water Body Detection from Sentinel-1 SAR and Sentinel-2 Optical Imagery Using Attention U-Net Model
    Il-Hoon Choi, Eu-Ru Lee, Hyung-Sup Jung
    Korean Journal of Remote Sensing. 2024;40(5-1):507.
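The label-generation step in the image matching abstract (adding X and Y offsets between the master and slave images) can be sketched as cropping two co-registered images with a known, randomly drawn shift, which then becomes the regression target. The patch size, maximum shift, integer-pixel offsets, and the `make_offset_pair` helper are illustrative assumptions; the abstract does not specify how the offsets were sampled.

```python
import numpy as np


def make_offset_pair(master_img, slave_img, patch, max_shift, rng):
    """Crop a master patch and a slave patch displaced by a random (dx, dy).

    Both images are assumed to be co-registered arrays of the same shape.
    The returned (dx, dy) is the slave window's offset relative to the master
    window, in pixels, and serves as the label.
    """
    h, w = master_img.shape[:2]
    dx = int(rng.integers(-max_shift, max_shift + 1))
    dy = int(rng.integers(-max_shift, max_shift + 1))
    # Keep both windows inside the image for any shift within +/- max_shift.
    y0 = int(rng.integers(max_shift, h - patch - max_shift + 1))
    x0 = int(rng.integers(max_shift, w - patch - max_shift + 1))
    master = master_img[y0:y0 + patch, x0:x0 + patch]
    slave = slave_img[y0 + dy:y0 + dy + patch, x0 + dx:x0 + dx + patch]
    return master, slave, (dx, dy)


rng = np.random.default_rng(7)
image = np.arange(100 * 100).reshape(100, 100)
master, slave, (dx, dy) = make_offset_pair(image, image, patch=32, max_shift=8, rng=rng)
```

Passing the same array twice makes the result easy to check: each slave pixel equals the master pixel displaced by exactly (dx, dy). With a real bi-temporal pair, the two images would differ in content while the label still records the geometric shift.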
Research on Building AI Learning Dataset for Synthetic Aperture Radar Waterbody Detection through Optical Satellite Image Fusion
Joonhyuk Choi, Ki-mook Kang, Euiho Hwang
GEO DATA. 2023;5(3):177-184.   Published online September 27, 2023
DOI: https://doi.org/10.22761/GD.2023.0029
  • 1 Citation
Water body detection using satellite imagery is crucial for the spatiotemporal analysis of water resources and disasters. AI-based methods have recently been widely used for this task, but they require a substantial amount of training data. When creating training data for water body detection, optical and synthetic aperture radar (SAR) imagery each have their own strengths and weaknesses. To exploit the advantages of both, this study proposes a water body detection method that fuses optical and SAR imagery. The proposed model achieves an Intersection over Union (IoU) of 0.612 and an F1 score of 0.759, outperforming models that use either optical or SAR imagery alone. The method can easily generate large amounts of water body data, making it a promising source of AI training data for water body detection.

Citations

Citations to this article as recorded by  
  • A Comprehensive Review of Remote Sensing for Water-Related Disaster Management in South Korea: Focus on Floods and Droughts
    Eui-Ho Hwang, Jin-Gyeom Kim, Jang-Yong Sung, Ki-Mook Kang
    Korean Journal of Remote Sensing. 2024;40(5-2):833.
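For a binary water mask, IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), so when both are computed from the same pooled pixel counts, F1 = 2·IoU / (1 + IoU). The IoU of 0.612 reported in the optical-SAR fusion abstract gives 2(0.612)/1.612 ≈ 0.759, matching the reported F1 score. The sketch below computes both metrics from masks and checks that identity; pooling over all pixels is an assumption, and per-image averaging would make the relation only approximate.

```python
import numpy as np


def iou_and_f1(pred, truth):
    """Binary IoU and F1 from pooled pixel counts (True = water)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)


def f1_from_iou(iou):
    """F1 implied by IoU when both use the same TP, FP, and FN counts."""
    return 2 * iou / (1 + iou)


iou, f1 = iou_and_f1([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])
print(round(f1_from_iou(0.612), 3))  # 0.759
```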
Articles
Construction of a Training Dataset for Vessel Distribution Prediction: The Northern Seas of Jeju Island
Yonggil Park, Taehoon Kim, Hyeon-Gyeong Han, Cholyoung Lee
GEO DATA. 2022;4(2):37-46.   Published online June 30, 2022
DOI: https://doi.org/10.22761/DJ2022.4.2.004
Interest in maritime accident and safety research, such as preventing collisions between vessels, detecting illegal vessels, and predicting vessel routes, has been increasing. Vessel distribution maps based on vessel location data can support decision-making in maritime safety management, and if vessel distributions can be predicted, preemptive measures for maritime security, such as fishing safety management and illegal fishing prevention, become possible. In this study, a training dataset for vessel distribution prediction was constructed by collecting V-Pass data, weather warnings, and marine environment data. The vessel location data were resampled to a common reporting interval and mapped onto grid data to evaluate vessel density, yielding a total of 1,314,000 training samples for the study area. Future work should evaluate the accuracy of vessel distribution prediction models built on this dataset.
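The resampling-and-gridding step described above can be sketched as binning each position report into a time slot and a grid cell, then counting distinct vessels per cell. The one-hour slot, the 0.1° cell size, integer-coded vessel IDs, the input layout, and the `vessel_density` helper are hypothetical; the abstract does not give the V-Pass field names or the grid resolution used.

```python
import numpy as np


def vessel_density(lon, lat, t_sec, vessel_id, bbox, cell_deg, interval_sec):
    """Count distinct vessels per (time slot, grid row, grid column).

    Inputs are equal-length 1-D arrays; vessel_id must be integer-coded.
    bbox = (lon_min, lat_min, lon_max, lat_max); row 0 is the northern edge.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    n_cols = int(np.ceil((lon_max - lon_min) / cell_deg))
    n_rows = int(np.ceil((lat_max - lat_min) / cell_deg))
    inside = (lon >= lon_min) & (lon < lon_max) & (lat >= lat_min) & (lat < lat_max)
    col = np.clip(((lon[inside] - lon_min) // cell_deg).astype(int), 0, n_cols - 1)
    row = np.clip(((lat_max - lat[inside]) // cell_deg).astype(int), 0, n_rows - 1)
    slot = (t_sec[inside] // interval_sec).astype(int)
    # Resample: several reports from one vessel in the same slot and cell count once.
    keys = np.unique(np.stack([slot, row, col, vessel_id[inside]], axis=1), axis=0)
    n_slots = int(slot.max()) + 1 if slot.size else 0
    density = np.zeros((n_slots, n_rows, n_cols), dtype=int)
    np.add.at(density, (keys[:, 0], keys[:, 1], keys[:, 2]), 1)
    return density


lon = np.array([126.05, 126.06, 126.05, 126.25])
lat = np.array([33.95, 33.96, 33.95, 33.85])
t = np.array([0, 100, 4000, 0])
vid = np.array([1, 1, 1, 2])
d = vessel_density(lon, lat, t, vid, (126.0, 33.5, 126.5, 34.0), 0.1, 3600)
```

Deduplicating per vessel, slot, and cell removes the bias from vessels that report more often than others, which is the practical reason for resampling the reporting interval before counting.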
A Geological Environment Characteristics Dataset of Tidal Flat Surface Sediments: A 2021 Pilot Study of the Gomso Bay Tidal Flat Area to Use of Sediment Type Data
Kyoungkyu Park, Han Jun Woo, Hoi-Soo Jung, Joo Bong Jeong, Joo-Hyung Ryu, Jun-Ho Lee
GEO DATA. 2022;4(2):9-22.   Published online June 30, 2022
DOI: https://doi.org/10.22761/DJ2022.4.2.002
  • 2 Citations
The Gomso Bay tidal flat is located between Buan-gun and Gochang-gun in Jeollabuk-do, Korea; it is a semi-enclosed bay in which tides dominate over waves. Tidal flats are mainly found south of Gochang-gun, and the main channel north of the tidal flats is about 15 m deep and 900 m wide at low tide. Direct sampling of the geological environment of intertidal flats is limited by the expected ebb-tide time and by the number of survey items that can be completed while the flat is accessible. This study assessed field measurement and laboratory analysis items for obtaining and establishing geological environment data for the use of sediment type data in a pilot area of the Gomso Bay tidal flat. Thirty sites were examined on June 22 and 24, 2021 (a total survey time of about 3.5 hours over the 2 days). The field measurements were the sampling date and time (year/month/day/hour/minute), ellipsoid height (m) measured with a real-time kinematic global positioning system (RTK GPS), shear strength (kg/cm²), and Munsell color. Samples for grain size (phi, Φ), specific density, porosity (%), moisture content (%), total organic carbon (%), total carbon (%), and total nitrogen (%) were placed in zipper bags and polypropylene (PP) bottles. Sedimentary facies were classified following Folk and Ward (1957), organic matter was characterized with reference to the grain size analysis, and each experimental result was verified. A geological environment characteristics dataset based on this pilot study will serve as baseline data for assessing changes in tidal flat topography and the sedimentary environment. It is expected to be useful for research and for tidal flat conservation and management, and to be freely available as open data to researchers in related fields.

Citations

Citations to this article as recorded by  
  • Study on Grain Size, Physical Properties and Organic Matter Characteristics of Tidal Flat Surface Sediments: May 2022 Hwangdo Tidal Flat Dataset, Cheonsu Bay
    Jun-Ho Lee, Hoi-Soo Jung, Huigyeong Ryu, Keunyong Kim, Joo-Hyung Ryu, Yeongjae Jang
    GEO DATA. 2024;6(3):159.
  • Characteristics of temporal-spatial variations of zooplankton community in Gomso Bay in the Yellow Sea, South Korea
    Young Seok Jeong, Min Ho Seo, Seo Yeol Choi, Seohwi Choo, Dong Young Kim, Sung-Hun Lee, Kyeong-Ho Han, Ho Young Soh
    Environmental Biology Research. 2023;41(4):720.
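The grain-size statistics usually associated with Folk and Ward (1957), cited in the Gomso Bay abstract, are graphic measures read from the cumulative curve at the 5th, 16th, 50th, 84th, and 95th percentiles (in phi units): mean Mz = (φ16 + φ50 + φ84)/3, inclusive graphic standard deviation (sorting) σI = (φ84 − φ16)/4 + (φ95 − φ5)/6.6, and inclusive graphic skewness SkI = (φ16 + φ84 − 2φ50)/[2(φ84 − φ16)] + (φ5 + φ95 − 2φ50)/[2(φ95 − φ5)]. The sketch below computes them from a cumulative weight-percent curve. Linear interpolation between size classes is an assumption (some laboratories interpolate on a probability scale), and the sample curve is made up for illustration.

```python
import numpy as np


def folk_ward(phi, cum_percent):
    """Folk and Ward (1957) graphic mean, sorting, and skewness in phi units.

    phi: class boundaries in phi, increasing (coarse to fine).
    cum_percent: cumulative weight percent coarser than each boundary, increasing.
    """
    # Read the phi value at each required percentile by linear interpolation.
    q = {p: float(np.interp(p, cum_percent, phi)) for p in (5, 16, 50, 84, 95)}
    mean = (q[16] + q[50] + q[84]) / 3
    sorting = (q[84] - q[16]) / 4 + (q[95] - q[5]) / 6.6
    skewness = ((q[16] + q[84] - 2 * q[50]) / (2 * (q[84] - q[16]))
                + (q[5] + q[95] - 2 * q[50]) / (2 * (q[95] - q[5])))
    return mean, sorting, skewness


# Hypothetical, perfectly symmetric curve: phi5 = 1, phi16 = 2, ..., phi95 = 5.
mean, sorting, skewness = folk_ward([0, 1, 2, 3, 4, 5, 6], [0, 5, 16, 50, 84, 95, 100])
```

A symmetric curve gives zero skewness, which is a quick sanity check before applying the function to measured sieve or laser-diffraction results.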
AI Dataset for Road Detection using KOMPSAT Images
Hoonhee Lee, Han Oh
GEO DATA. 2022;4(1):43-48.   Published online March 31, 2022
DOI: https://doi.org/10.22761/DJ2022.4.1.005
Information on the shape and type of roads in optical satellite imagery is useful for digital mapping and for monitoring road changes. Processing and structuring optical image data collected by the payloads on KOMPSAT-3 and KOMPSAT-3A can accelerate the development of road detection algorithms and the extraction of road information with them. In particular, a training dataset prepared for artificial intelligence (AI) allows the latest deep learning techniques from computer science to be transferred to satellite image-based road detection, enabling a wide range of analyses. The Korea Aerospace Research Institute constructed an AI training image dataset from satellite optical imagery in collaboration with Korean companies, and this paper describes the types and sizes of the datasets along with examples of their use. The dataset is available through the website aihub.or.kr.
Dataset for Water Body Detection Using Satellite SAR Images
SeungJae Lee, Han Oh
GEO DATA. 2021;3(2):12-19.   Published online July 21, 2021
DOI: https://doi.org/10.22761/DJ2021.3.2.002
Satellite synthetic aperture radar (SAR) provides valid image information in all weather conditions. It can therefore be used effectively for near-real-time monitoring and damage analysis of flooded areas, which are often under overcast skies. Water body detection (WBD) using SAR images can be implemented with various techniques that discriminate between the electromagnetic characteristics of water and non-water areas. In particular, semantic segmentation using artificial intelligence techniques can be used to develop a high-performance WBD model. To this end, the Korea Aerospace Research Institute has built a WBD dataset using KOMPSAT-5 images. The dataset is currently available through the website aihub.or.kr.
AI Training Dataset for Cloud Detection of KOMPSAT Images
Bo-Ram Kim, Han Oh
GEO DATA. 2020;2(2):56-62.   Published online December 30, 2020
DOI: https://doi.org/10.22761/DJ2020.2.2.008
  • 1,389 View
  • 57 Download
  • 1 Citations
AbstractAbstract PDF
Clouds, which inevitably appear in optical satellite images, hinder the interpretation of surface information, so removing them is crucial for increasing the utility of satellite images. Currently, KOMPSAT (Korea Multi-Purpose Satellite) images are provided only with a visually estimated cloud amount for the entire scene, without detailed cloud masks. Because cloud detection is time-consuming, we built a cloud dataset for KOMPSAT images to support the development of algorithms that expedite the task using state-of-the-art artificial intelligence techniques. Satellite images were selected from various regions, since cloud characteristics vary by region, and the masks are classified into thin clouds, thick clouds, cloud shadows, and clear sky. The dataset contains over 4,000 image/mask pairs of 1000 × 1000 pixels, making it one of the largest publicly available cloud datasets at the time of writing. It was built under a government AI training dataset construction program and will be available through the website aihub.or.kr.

Citations

Citations to this article as recorded by  
  • Cloud Detection Using a UNet3+ Model with a Hybrid Swin Transformer and EfficientNet (UNet3+STE) for Very-High-Resolution Satellite Imagery
    Jaewan Choi, Doochun Seo, Jinha Jung, Youkyung Han, Jaehong Oh, Changno Lee
    Remote Sensing. 2024;16(20):3880.
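Because each cloud mask in the KOMPSAT cloud dataset labels pixels as thin cloud, thick cloud, cloud shadow, or clear sky, a per-tile cloud fraction can be derived directly, in contrast to the single scene-level visual estimate that the abstract describes. The integer encoding below (0 to 3), the decision to count thin and thick cloud but not shadow as cloud cover, and the helper names are assumptions; the published masks may use different label values.

```python
import numpy as np

# Hypothetical label encoding; the published masks may use different values.
CLASSES = {0: "clear_sky", 1: "thick_cloud", 2: "thin_cloud", 3: "cloud_shadow"}


def class_fractions(mask):
    """Fraction of pixels in each class for one integer-coded mask."""
    counts = np.bincount(mask.ravel(), minlength=len(CLASSES))
    return {name: counts[code] / mask.size for code, name in CLASSES.items()}


def cloud_cover(mask):
    """Share of pixels labeled thin or thick cloud (shadows excluded)."""
    f = class_fractions(mask)
    return f["thick_cloud"] + f["thin_cloud"]


mask = np.zeros((1000, 1000), dtype=np.uint8)   # one 1000 x 1000 tile
mask[:250, :] = 1          # thick cloud
mask[250:500, :500] = 2    # thin cloud
mask[500:600, :] = 3       # cloud shadow
print(cloud_cover(mask))   # 0.375
```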
