
Getting Started with GeoAnalytics for Fabric: Let's Get Some Spatial Data!

SBattersby | Esri Contributor | 03-05-2025

To help kick off the Public Preview of ArcGIS GeoAnalytics for Microsoft Fabric, we will have a series of blog posts exploring the functionality in GeoAnalytics for Fabric.  In this first post, we will explore the fundamentals of working with spatial data in GeoAnalytics for Fabric by showcasing the process of reading data and creating new geometries to power further analysis and visualization.

Where can you use GeoAnalytics for Fabric?

GeoAnalytics for Fabric is accessible in the Data Science and Data Engineering workloads within Fabric.  In these workloads, you'll be able to work with GeoAnalytics for Fabric within notebooks.  The library is included only in Microsoft Fabric Runtime 1.3; it is not available in earlier runtimes.

Fabric data science and data engineering workloads

 

Importing GeoAnalytics for Fabric

To import GeoAnalytics for Fabric, you just need to add import geoanalytics_fabric to a cell in your notebook. For more information on enabling the library, see the Getting started documentation.

For convenience, we recommend also importing the geospatial functions directly and giving them an easy-to-use alias. In the example below, we import the library and the functions with an alias of ST. Since each function is then referenced as ST.<function_name>, this makes the functions easy to call and identifies them as Spatial Type (ST) functions. It also aligns with the structure of the examples throughout the ArcGIS GeoAnalytics documentation.  Your cell will look like this:

 

# import ArcGIS GeoAnalytics
import geoanalytics_fabric
import geoanalytics_fabric.sql.functions as ST

 

For instance, after importing, you would be able to reference the ST_Buffer function using ST.buffer instead of geoanalytics_fabric.sql.functions.buffer.
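
As a quick illustration, here is a minimal sketch of a buffer call using the alias. The DataFrame df and its "geometry" column are hypothetical placeholders:

# buffer each geometry by 100 units of its spatial reference using the ST alias
# (df and the "geometry" column name are placeholders for your own data)
df_buffered = df.withColumn("buffer_100m", ST.buffer("geometry", 100))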

Ingest data

Now, let's look at how we can read data to work with in GeoAnalytics for Fabric.  Since we're working in Microsoft Fabric, we have access to datasets from any source that Fabric can connect to directly, and using GeoAnalytics for Fabric functionality we can also read from Esri feature services.

Spark, by itself, supports reading data from a number of different source types, and GeoAnalytics for Fabric adds an additional set of spatial data sources.  

For this example, we will demonstrate reading a parquet-format dataset of New York City yellow cab pick-up / drop-off locations from Azure Open Datasets.  This dataset contains latitude and longitude coordinates for both pick-up and drop-off locations. The dataset is quite large, so in this example we will limit it to 1,000 records collected after 2015 to keep things simple.

The process is as easy as pointing Spark's read function to the location of the dataset and indicating that it is in parquet format - like this:

 

# https://learn.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "nyctlc"
blob_relative_path = "yellow"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# read the parquet data with Spark, keeping 1,000 records collected after 2015
from pyspark.sql import functions as F

df = spark.read.parquet(wasbs_path).filter(F.col("puYear") > 2015).limit(1000)
df.persist().count()

 

This stores our data in a Spark DataFrame that we can use for further transformation or analysis.  Microsoft Fabric has nice functionality for displaying and interacting with DataFrames as tables using the display() function.  display() also allows for some lightweight analysis and exploration of the non-spatial aspects of the data using tables and charts.   As an example, we can quickly explore our data like this:

 

# display the table
display(df)

 

 Which results in a nicely formatted table in Fabric:

Microsoft Fabric's display() functionality for exploring the contents of a DataFrame

Create Geometry

To work with data using GeoAnalytics for Fabric, you will want to have a geometry.  This dataset, however, didn't come with any geometry - but it does have columns for latitude and longitude.  We can use those to create a geometry column.

Since this dataset has point coordinates in two separate columns, we will use ST_Point; GeoAnalytics for Fabric also provides numerous other functions for creating geometries from formats such as well-known text, well-known binary, and GeoJSON.

Let's look at how ST_Point works.  

ST_Point takes two numeric columns or double values as input and returns a point geometry column. The input columns or values must contain the x,y coordinates of the point geometries. When using ST_Point, you can optionally specify a spatial reference for the resulting point geometry column. In the example below, we include the spatial reference identifier 4326, which indicates that the coordinates use the World Geodetic System 1984 (WGS84) coordinate system.

 

# create a new column called geom_start to represent the starting location for each yellow cab trip
# the optional third argument is the spatial reference identifier: 4326 (World Geodetic System 1984, or WGS84)
df = df.withColumn("geom_start", ST.point("startLon", "startLat", 4326))

 

You can find more information about coordinate systems and spatial reference identifiers in the ArcGIS GeoAnalytics core concepts on Coordinate systems and transformations.
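
If your analysis needs a different coordinate system, the ST_Transform function can reproject a geometry column. Here is a minimal sketch; the output column name is just illustrative:

# reproject the start points from WGS84 (4326) to Web Mercator (3857)
df = df.withColumn("geom_start_webmerc", ST.transform("geom_start", 3857))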

Note that the ordering of the coordinates used as input for ST_Point is (longitude, latitude) or (x,y).
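
For example, we can create point geometries for the drop-off locations the same way (assuming the dataset's endLon and endLat columns), with longitude first:

# x (longitude) comes first, then y (latitude)
df = df.withColumn("geom_end", ST.point("endLon", "endLat", 4326))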

Once we create our geometry column using ST_Point, we have a new column that looks like the table below. Note that by default GeoAnalytics for Fabric displays the geometry in a human-readable format to make it easier to get a feel for what is in the column.  You may also notice NaN values for the "z" and "m" elements in each point.  This is because ST_Point only takes x and y coordinates as input, but a point geometry can also store values for "m" (measure) and "z" (third dimension) elements.

One column of point geometry data
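
You can also inspect the new column in code by selecting the input coordinates alongside the geometry:

# show the first few rows of the input coordinates next to the new point geometry
df.select("startLon", "startLat", "geom_start").show(3, truncate=False)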

Now that we have a point geometry, we can map it or use it in our analyses. 

We can use the st.plot functionality to map the points.  

 

df.st.plot(basemap="light", 
             marker_size=5, 
             color="black", 
             geometry="geom_start");

 

Example of plotting point geometries

 


Note that in the Public Preview, to render basemaps you need to include a special configuration cell at the start of your notebooks.  More detail is provided in the Known Limitations post in the GeoAnalytics for Fabric Community.

Or we can perform any number of analyses using the functions and tools available in GeoAnalytics for Fabric.  For instance, we could create hexagonal bins to explore the data distribution visually, or to simplify the data for incorporation into a Power BI dashboard, using the Aggregate Points tool:

 

from geoanalytics_fabric.tools import AggregatePoints

# Use Aggregate Points to summarize the count of taxi pickups
result = AggregatePoints() \
            .setBins(bin_size=0.5, bin_size_unit="Kilometers", bin_type="Hexagon") \
            .run(df)

result.st.plot(basemap="light", cmap_values="count")

 

Hexagonal bins summarizing the density of taxi pickup locations

Read in data from spatial files

You might also have data that is already in a spatial format.  Using GeoAnalytics for Fabric, you can read data from a number of spatial sources, such as file geodatabases, GeoJSON, GeoParquet, shapefiles, and Esri feature services.  The full list of supported data sources, with code samples for reading each type, is included in the developer documentation.
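
As a quick illustration (a minimal sketch; the file paths below are hypothetical placeholders), reading spatial files follows the standard Spark reader pattern:

# sketches of reading common spatial formats from a lakehouse; paths are placeholders
geojson_df = spark.read.format("geojson").load("Files/sample.geojson")
geoparquet_df = spark.read.format("geoparquet").load("Files/sample_geoparquet")
shapefile_df = spark.read.format("shapefile").load("Files/sample_shapefile")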

Here we'll show a quick example of reading a feature service from the ArcGIS Living Atlas of the World. The Living Atlas is the foremost collection of geographic information from around the globe, including maps, apps, and data layers to support your work.

A feature service is a data service that stores and hosts spatial and non-spatial data online. In a feature service, spatial datasets are feature layers and non-spatial datasets are tables. You can query, edit, and analyze the data in a feature service.

Feature services are not stored in OneLake, but you can access them using the URL for the feature service, along with any credentials needed if the service is not publicly available.

As an example, we can read a feature service with USA States Generalized Boundaries from the ArcGIS Living Atlas like this (the URL for the feature service comes from the item's details page):

 

# read a feature service hosted in the Living Atlas of the World
myFS="https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_States_Generalized_Boundaries/FeatureServer/0"
df = spark.read.format('feature-service').load(myFS)

# plot a DataFrame with geometry from a feature service
df.st.plot(basemap="light", facecolor="yellow", edgecolor="black", alpha=0.5);

 

Map of USA state boundaries plotted from the Living Atlas feature service

The feature service shown above has numerous attributes, including the polygon geometry that we used to plot the map.

Attribute columns in the USA States Generalized Boundaries feature service
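
You can also list those attributes directly from the DataFrame's schema:

# print the schema to see the attribute columns and the geometry column
df.printSchema()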

 

Writing results 

DataFrames with the results of your analyses can be written back to OneLake or into ArcGIS as feature services.  In this section we will show examples of writing our taxi dataset into several common formats.

Note that when you are writing spatial data into formats that don't support a geometry type (e.g., CSV or Delta), you will want to convert your geometry to a string or binary format.  Here are examples of how you can convert a geometry into a number of common formats:

 

# Examples of converting a geometry type to other formats
# Convert to GeoJSON, well-known text, well-known binary, and separate columns for the x & y coordinates
# (written to a new DataFrame so the write examples below keep the original columns)
df_converted = df\
      .withColumn("Geom as GeoJSON", ST.as_geojson("geom_start"))\
      .withColumn("Geom as Text", ST.as_text("geom_start"))\
      .withColumn("Geom as Binary", ST.as_binary("geom_start"))\
      .withColumn("Geom X coordinate", ST.x("geom_start"))\
      .withColumn("Geom Y coordinate", ST.y("geom_start"))

 

And here are examples of writing our DataFrame into various common formats:

 

# Write a delta file with well-known binary geometry
df\
    .withColumn("geom_start", ST.as_binary("geom_start"))\
    .write\
    .mode("overwrite")\
    .format("delta")\
    .save("Tables/taxi_delta")

# Write a CSV file with geometry as well-known text
# (non-Delta formats are written under Files; the Tables area of a lakehouse is reserved for Delta tables)
df\
    .withColumn("geom_start", ST.as_text("geom_start"))\
    .write\
    .mode("overwrite")\
    .format("csv")\
    .option("header", True)\
    .save("Files/taxi_csv_wkt")

# Write a GeoJSON file using the geometry column in the DataFrame
df\
    .write\
    .mode("overwrite")\
    .format("geojson")\
    .save("Files/taxi_geojson")

# Write GeoParquet using the geometry column in the DataFrame
df\
    .write\
    .mode("overwrite")\
    .format("geoparquet")\
    .save("Files/taxi_geoparquet")

# Write a feature service
## First, register a GIS to save feature services

## The default GIS is ArcGIS Online
## The credentials below are not valid in ArcGIS Online, this code is for demonstration only
## If you have valid credentials, you can uncomment the line below and provide your credential information
## It is recommended to secure your credentials using the Azure Key Vault instead of hard-coding them directly into any notebooks in Microsoft Fabric.
## https://learn.microsoft.com/en-us/azure/key-vault/general/overview

# geoanalytics_fabric.register_gis("myGIS", username="User", password="p@ssw0rd")

# example code for writing to a feature service
# this will not write unless you connect to and register a valid GIS as shown above

# df\
#     .write\
#     .format("feature-service")\
#     .option("gis", "myGIS")\
#     .option("serviceName", "myServiceName")\
#     .option("layerName", "myLayer")\
#     .save()

 


Conclusion

Hopefully this quick primer on reading data, creating geometries, and writing results back to OneLake or to Esri feature services has been helpful.  We'll be posting additional content to help you get started, so check back on the Community for more!  And please let us know what topics you'd like to learn more about!
