Object-Based Land Cover Classification Using SAM and Google Earth Engine

AGSRT Student

1. INTRODUCTION

Land cover classification is one of the most important applications of remote sensing, as it helps in monitoring urban expansion, studying vegetation cover, identifying water bodies, and analyzing land use. Many traditional classification methods rely on classifying each pixel independently; however, this approach can produce noisy maps, especially when using Sentinel-2 imagery with medium spatial resolution, where a single pixel can contain more than one type of land cover.

In this project, a different methodology was used, based on object-based classification instead of direct pixel-by-pixel classification. The core idea is to use the Segment Anything Model (SAM) to extract homogeneous objects or regions from Sentinel-2 imagery, and then classify these objects based on spectral indices and land cover probabilities generated by Dynamic World within Google Earth Engine.

SAM is not used here as a direct semantic classifier: it produces segmentation masks, not categories such as water, vegetation, or built-up. Therefore, SAM was combined with spectral and semantic information from Sentinel-2 and Dynamic World to classify the resulting objects into three main visually distinct categories: water, vegetation, and urban areas, in addition to an auxiliary category called Other land for objects that do not meet the criteria of the three main categories.

This study aims to build a practical workflow combining Google Earth Engine, SAM, and Python/Colab to produce an object-based land cover map, and then to carry out a preliminary evaluation of it using visually defined reference points from Google Earth Pro.

2. STUDY AREA

The study area in Chandigarh, India, was chosen because it contains a clear diversity of land cover patterns. The area includes dense urban areas, extensive vegetation, hills and forests in the east and northeast, and prominent bodies of water such as Sukhna Lake.

This area is suitable for the project because it contains three distinct classes that can be visually and spectrally differentiated: buildings, vegetation, and water. It also contains mixed and challenging areas at the edges, making it suitable for testing the usefulness of object-based classification rather than direct pixel-based classification.

**Figure 1.** Study area location in Chandigarh, India

3. MATERIALS AND METHODS

3.1 Data to be Downloaded

The following data was used in the project:

Sentinel-2 images were exported from Google Earth Engine, and the following spectral bands were used: B2, B3, B4, B8, B11, and B12.

The following spectral indices were also calculated: NDVI, NDWI, MNDWI, NDBI, and BSI.
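The article does not spell out the index formulas. The following is a minimal sketch, assuming the commonly used Sentinel-2 formulations (NDWI in its green/NIR form, BSI built from B11, B4, B8, and B2); the function names are illustrative and not taken from the project code.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with 0 where the denominator is 0."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def spectral_indices(b2, b3, b4, b8, b11):
    """Compute the five indices from reflectance arrays of equal shape.

    Assumed formulations (not confirmed by the article):
      NDVI  = (B8 - B4) / (B8 + B4)
      NDWI  = (B3 - B8) / (B3 + B8)
      MNDWI = (B3 - B11) / (B3 + B11)
      NDBI  = (B11 - B8) / (B11 + B8)
      BSI   = ((B11 + B4) - (B8 + B2)) / ((B11 + B4) + (B8 + B2))
    """
    b2, b3, b4, b8, b11 = (np.asarray(x, dtype="float64") for x in (b2, b3, b4, b8, b11))
    return {
        "NDVI": normalized_difference(b8, b4),
        "NDWI": normalized_difference(b3, b8),
        "MNDWI": normalized_difference(b3, b11),
        "NDBI": normalized_difference(b11, b8),
        "BSI": normalized_difference(b11 + b4, b8 + b2),
    }
```

Each index is a normalized difference, so all values fall between -1 and 1, which makes thresholds comparable across indices.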

3.2 Methodology Overview

3.3 Sentinel-2 Composite Preparation

Sentinel-2 images were processed within Google Earth Engine, and several spectral composites were exported in GeoTIFF format. The final methodology relied on three composites: RGB, SWIR, and a water-sensitive composite.

Using more than one composite was necessary because each one renders certain elements differently. RGB was used as the base, SWIR to cover additional areas, and the water-sensitive composite to enhance water capture.

**Figure 3.** Sentinel-2 composites used for SAM-based object extraction
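SAM expects a three-channel 8-bit image, so each composite has to be scaled from reflectance to the 0-255 range before segmentation. A minimal sketch of one way to do this, assuming a 2-98 percentile stretch per band; the stretch limits and the function name are assumptions, and the article does not state which scaling or which band combination was used for each composite.

```python
import numpy as np

def to_uint8_composite(bands, low=2, high=98):
    """Stack three reflectance bands into an HxWx3 uint8 image.

    Each band is stretched independently between its `low` and `high`
    percentiles (assumed values), clipped, and scaled to 0-255.
    """
    channels = []
    for band in bands:
        band = np.asarray(band, dtype="float64")
        lo, hi = np.nanpercentile(band, [low, high])
        scale = hi - lo if hi > lo else 1.0
        stretched = np.clip((band - lo) / scale, 0.0, 1.0)
        # NaN (nodata) pixels become 0 in the output image.
        channels.append(np.nan_to_num(stretched * 255.0).astype(np.uint8))
    return np.dstack(channels)
```

For example, a true-color composite would be built as `to_uint8_composite([b4, b3, b2])`, since B4, B3, and B2 are the Sentinel-2 red, green, and blue bands.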

3.4 SAM-Based Object Extraction

The SAM ViT-B model was used to extract objects from composite images. The SAM output was transformed from nested masks to a non-overlapping object map, so that each object received a unique identifier.

**Figure 4.** SAM-based object extraction

SAM was used solely as an object generator, not as a land-cover classifier. The object map was then saved in GeoTIFF format and used as the basis for extracting properties and classifying each object.
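The conversion from nested masks to a non-overlapping object map can be sketched as follows. This assumes the masks arrive as a list of dicts with a boolean `segmentation` array, which is the format returned by `SamAutomaticMaskGenerator` in the `segment_anything` package. The painting order (largest first, so that smaller nested masks win) is an assumption, not a rule stated in the article.

```python
import numpy as np

def masks_to_object_map(masks, shape):
    """Flatten possibly nested boolean masks into one label map (0 = no object)."""
    object_map = np.zeros(shape, dtype=np.int32)
    # Paint large masks first so smaller nested masks overwrite them.
    ordered = sorted(
        masks,
        key=lambda m: int(np.count_nonzero(m["segmentation"])),
        reverse=True,
    )
    for label, mask in enumerate(ordered, start=1):
        object_map[np.asarray(mask["segmentation"], dtype=bool)] = label

    # Some masks may be fully overwritten; renumber survivors as 1..n.
    labels = np.unique(object_map[object_map > 0])
    lookup = np.zeros(len(ordered) + 1, dtype=np.int32)
    lookup[labels] = np.arange(1, labels.size + 1)
    return lookup[object_map]
```

With this ordering every pixel belongs to at most one object, at the cost that a large object can be split into disconnected parts by a smaller object nested inside it.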

The final object map was created by merging the SAM objects extracted from the RGB, SWIR, and water-sensitive composites. The goal of this step was to improve object coverage without introducing significant noise. The water-sensitive composite also helped capture some small water bodies that were not clearly visible in the original image. The merged object map covered approximately 70% of the study area; the remaining 29.73% is reported as "No SAM object" in Section 4.2.

**Figure 5.** Final RGB + SWIR + Water-aware SAM object fusion
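One simple way to fuse several object maps is a gap-filling merge: keep the base map as is, and let each additional map contribute objects only where no object exists yet. The sketch below assumes that strategy and a hypothetical `min_pixels` filter to limit noise from tiny fragments; the article does not describe the exact fusion rule used.

```python
import numpy as np

def fuse_object_maps(base, extras, min_pixels=1):
    """Fill uncovered pixels (ID 0) of `base` with objects from `extras`, in order.

    IDs from each extra map are offset so they stay unique in the result.
    Fragments smaller than `min_pixels` (hypothetical parameter) are skipped.
    """
    fused = np.asarray(base, dtype=np.int64).copy()
    for extra in extras:
        extra = np.asarray(extra, dtype=np.int64)
        # Only the part of each extra object that falls on uncovered pixels.
        candidate = np.where(fused == 0, extra, 0)
        counts = np.bincount(candidate.ravel())
        keep = counts >= min_pixels
        keep[0] = False  # ID 0 means "no object"
        fill = keep[candidate]
        fused[fill] = candidate[fill] + fused.max()
    return fused
```

Passing the maps in the order RGB, SWIR, water-sensitive matches the roles described above: RGB as the base, the other two as gap fillers.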

3.6 Object-Level Feature Extraction

After creating the final object map, properties were extracted for each object. Each row in the properties table represents one object.

The averages within each object were calculated for the following data:

Sentinel-2 bands: B2, B3, B4, B8, B11, B12.
Spectral indices: NDVI, NDWI, MNDWI, NDBI, BSI.
Dynamic World probabilities.
Object area in square meters.

These properties were used to determine the class of each object.

**Figure 6.** Example of extracted object-level spectral and semantic features.
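The per-object averaging can be sketched with `np.bincount`, which sums values per object ID in a single pass. The function name is illustrative, and the default pixel area of 100 m² assumes all layers are on a 10 m grid.

```python
import numpy as np

def object_means(object_map, layers, pixel_area_m2=100.0):
    """Return one row per object: ID, area, and the mean of each layer.

    `object_map` holds integer object IDs (0 = no object); `layers` maps a
    feature name (e.g. "NDVI") to an array with the same shape.
    """
    ids = np.asarray(object_map).ravel()
    counts = np.bincount(ids)
    object_ids = np.flatnonzero(counts)
    object_ids = object_ids[object_ids > 0]  # drop the background ID
    table = {
        "object_id": object_ids,
        "area_m2": counts[object_ids] * pixel_area_m2,
    }
    for name, layer in layers.items():
        values = np.asarray(layer, dtype="float64").ravel()
        sums = np.bincount(ids, weights=values, minlength=counts.size)
        table[name] = sums[object_ids] / counts[object_ids]
    return table
```

The returned dict of equal-length arrays can be passed directly to `pandas.DataFrame` to obtain the one-row-per-object table described above. This sketch does not handle NaN pixels.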

3.7 Object-Based Classification Rules

The objects were classified into four categories: Water, Built-up, Vegetation, and Other land.

The classification was based on a prioritized set of rules:

Water: If there is strong water evidence from DW_water, MNDWI, or NDWI, together with low NDVI and low built-up evidence.
Built-up: If there is support from DW_built, NDBI, or the built-up support ratio.
Vegetation: If there is strong support from NDVI or Dynamic World vegetation probability.
Other land: For residual objects.
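The prioritized rules above can be sketched as a chain of checks in which the first match wins. All threshold values below are hypothetical placeholders, since the article does not report the ones actually used. `DW_veg` is an assumed name for the Dynamic World vegetation probability, and the built-up support ratio is left out because it is not defined in the text.

```python
# Hypothetical thresholds for illustration only; not the values used in the study.
THRESHOLDS = {
    "dw_water": 0.5, "mndwi": 0.2, "ndwi": 0.2,  # water evidence
    "ndvi_low": 0.2, "dw_built_low": 0.3,        # conditions that must hold for water
    "dw_built": 0.5, "ndbi": 0.1,                # built-up evidence
    "ndvi": 0.4, "dw_veg": 0.5,                  # vegetation evidence
}

def classify_object(f, t=THRESHOLDS):
    """Assign a class to one object from its mean features (dict `f`)."""
    water_evidence = (
        f["DW_water"] >= t["dw_water"]
        or f["MNDWI"] >= t["mndwi"]
        or f["NDWI"] >= t["ndwi"]
    )
    # Rule 1: water needs evidence plus low NDVI and low built-up probability.
    if water_evidence and f["NDVI"] < t["ndvi_low"] and f["DW_built"] < t["dw_built_low"]:
        return "Water"
    # Rule 2: built-up.
    if f["DW_built"] >= t["dw_built"] or f["NDBI"] >= t["ndbi"]:
        return "Built-up"
    # Rule 3: vegetation.
    if f["NDVI"] >= t["ndvi"] or f["DW_veg"] >= t["dw_veg"]:
        return "Vegetation"
    # Rule 4: everything else.
    return "Other land"
```

Because the rules are ordered, an object with water-like MNDWI but high NDVI fails the water rule and falls through to the later checks. That ordering makes the water class conservative.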

4. RESULTS

4.1 Final Classification Map

The final methodology produced an object-based classification map for the Chandigarh region. The map shows four classes (water, vegetation, built-up, and other land), in addition to areas not covered by any SAM object ("No SAM object").

**Figure 7.** Final object-based land-cover classification map

4.2 Area Statistics

The following table shows the final areas for each class:

The results indicate that urban areas account for the largest share of the classified area, followed by vegetation. Water accounts for only a small percentage, which is expected given the limited extent of water bodies within the study zone. Approximately 29.73% of the area remained in the "No SAM object" category, meaning it was not covered by any SAM object.
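Area statistics of this kind can be derived by counting pixels per class code in the final class raster. A minimal sketch, assuming integer class codes and 10 m pixels; the codes and the function name are illustrative.

```python
import numpy as np

def class_area_table(class_map, class_names, pixel_area_m2=100.0):
    """Return (name, area in km², percent of all pixels) for each class code."""
    class_map = np.asarray(class_map)
    total = class_map.size
    rows = []
    for code, name in class_names.items():
        n = int(np.count_nonzero(class_map == code))
        rows.append((name, n * pixel_area_m2 / 1e6, 100.0 * n / total))
    return rows
```

Percentages here are relative to every pixel in the array, so the "No SAM object" class needs its own code if its share is to be reported alongside the others.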

5. ACCURACY ASSESSMENT

5.1 Reference Samples

Accuracy was assessed using reference points identified from Google Earth Pro for Water, Vegetation, and Built-up.

**Figure 8.** Google Earth Pro reference samples used for accuracy assessment

The number of reference samples:

5.2 Accuracy Assessment

The accuracy of the final classification map was assessed based on visually interpreted reference points from Google Earth Pro imagery. The assessment focused on the three most visually prominent categories in the study area: water, vegetation, and urban areas. Three metrics were used to measure the accuracy of each class: Precision, Recall, and F1-score.

The results showed that the final classification map achieved an overall accuracy of 80.49%, with a Kappa coefficient of 0.705, indicating good agreement between the classified map and the visual reference samples. The urban area category performed best, achieving an F1-score of 0.97, demonstrating the methodology's high ability to distinguish built-up areas. The vegetation category also performed well, with an F1-score of 0.80. The water class achieved Precision = 1.00, meaning that every reference point classified as water was in fact water. Its Recall of 0.38, however, means that only 38% of the reference water points were captured; the missed points were mainly in small water bodies or near edges. The following table shows the confusion matrix for the final three-class classification.
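For reference, the metrics reported here can all be computed from a confusion matrix. The sketch below uses the standard definitions of overall accuracy, Cohen's Kappa, precision, recall, and F1; the function name and the row/column convention (rows = reference, columns = predicted) are assumptions.

```python
import numpy as np

def accuracy_report(reference, predicted, classes):
    """Compute the confusion matrix, overall accuracy, Kappa, and per-class scores."""
    index = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    cm = np.zeros((k, k), dtype=np.int64)  # rows = reference, cols = predicted
    for r, p in zip(reference, predicted):
        cm[index[r], index[p]] += 1

    n = cm.sum()
    diag = np.diag(cm).astype("float64")
    ref_total = cm.sum(axis=1)
    pred_total = cm.sum(axis=0)

    overall = diag.sum() / n
    # Expected chance agreement from the row and column totals.
    expected = (ref_total * pred_total).sum() / float(n) ** 2
    kappa = (overall - expected) / (1 - expected) if expected < 1 else float("nan")

    precision = diag / np.maximum(pred_total, 1)
    recall = diag / np.maximum(ref_total, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {"matrix": cm, "overall": overall, "kappa": kappa,
            "precision": precision, "recall": recall, "f1": f1}
```

Every label in `reference` and `predicted` must appear in `classes`, so any "Other land" predictions at reference points need to be included as an extra class.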

These results indicate that the proposed methodology is effective in classifying visually distinct classes, particularly urban and vegetation cover. The water results also show that the classification was conservative: an object labeled as water is reliably water, but some small or mixed water bodies were missed and may require further refinement in future work.

6. DISCUSSION

The results showed that the object-based SAM-GEE methodology is capable of producing a useful land cover map for visually distinct categories, particularly urban and vegetation. The urban category achieved an F1 score of 0.97, indicating that urban objects were well distinguished within the final map.

Vegetation achieved a very high recall, with all reference vegetation points being captured during the conditional evaluation. However, its precision was lower than that of urban because some water points were misclassified as vegetation, especially in mixed areas or near edges.

The water category achieved a precision of 1.00, meaning that the reference points classified as water were all correct. However, its recall remained low, indicating that many water points were not captured. This is likely due to the presence of small water bodies, as well as the effect of the 10-meter Sentinel-2 resolution, where a single pixel may mix water, vegetation, or shadows.
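As a consistency check, the reported per-class values can be related through the definition of F1 as the harmonic mean of precision and recall. The two derived numbers below are implied values, not figures reported in the article. They assume that the reported scores are rounded to two decimals and that vegetation recall is exactly 1.00.

$$
F_1 = \frac{2PR}{P + R}
\quad\Longrightarrow\quad
P = \frac{F_1 R}{2R - F_1}
$$

$$
\text{Vegetation: } P = \frac{0.80 \times 1.00}{2 \times 1.00 - 0.80} = \frac{0.80}{1.20} \approx 0.67
$$

$$
\text{Water: } F_1 = \frac{2 \times 1.00 \times 0.38}{1.00 + 0.38} = \frac{0.76}{1.38} \approx 0.55
$$

Under these assumptions, roughly one in three points classified as vegetation belongs to another reference class, which is consistent with water points being absorbed into vegetation.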

The results demonstrate that using SAM alone is insufficient for semantic classification but is very useful for object generation. When combined with Sentinel-2 and Dynamic World indicators, it becomes possible to build a clear and explainable object-based workflow. Using Google Earth Pro as a reference source also allowed for an initial visual assessment of map quality.

7. LIMITATIONS

Several limitations must be considered:

1- Study area

The methodology was applied to Chandigarh only, so it needs to be tested in other areas to confirm its generalizability.

2- Sentinel-2 spatial resolution

A resolution of 10 meters may not be sufficient to distinguish small objects, especially small bodies of water or mixed urban areas.

3- SAM does not provide semantic classes

SAM only generates segmentation masks, so we needed spectral rules and Dynamic World probabilities to determine classes.

4- Final assessment

Google Earth Pro points were interpreted visually, so they are suitable only as a preliminary reference source.