To help kick off the Public Preview of ArcGIS GeoAnalytics for Microsoft Fabric, we will have a series of blog posts exploring the functionality in GeoAnalytics for Fabric. In this post, we will explore the fundamentals of constructing data and converting between geometry types in GeoAnalytics for Fabric.
We'll start with a refresher on how to import GeoAnalytics for Fabric and start working with it. To import GeoAnalytics for Fabric you just need to add import geoanalytics_fabric to a cell in your notebook. For more information on enabling the library, see the Getting started documentation.
For convenience, we recommend also importing the geospatial functions directly and giving them an easy-to-use alias. In the example below, we import the library and the functions with the alias ST. Since each function is then referenced as ST.<function_name>, they are easy to call and are clearly identified as Spatial Type (ST) functions. This also aligns with the structure of the examples throughout the ArcGIS GeoAnalytics documentation. Your cell will look like this:
# import ArcGIS GeoAnalytics
import geoanalytics_fabric
import geoanalytics_fabric.sql.functions as ST
For instance, after importing, you would be able to reference the ST_Buffer function using ST.buffer instead of geoanalytics_fabric.sql.functions.buffer.
Now, let's look at how we can read data to work with in GeoAnalytics for Fabric. Since we're working in Microsoft Fabric, we have access to datasets from any source that Fabric can connect to directly, and with GeoAnalytics for Fabric we can also read from Esri feature services.
Spark, by itself, supports reading data from a number of different source types, and GeoAnalytics for Fabric adds an additional set of spatial data sources.
For this example, we will demonstrate reading a Parquet-format public safety dataset for the city of Boston, MA from Azure Open Datasets. This dataset contains latitude and longitude coordinates for public safety service requests in Boston.
The process is as easy as pointing Spark's read function to the location of the dataset and indicating that it is in parquet format - like this:
# https://learn.microsoft.com/en-us/azure/open-datasets/dataset-boston-safety?tabs=pyspark
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=Boston"
blob_sas_token = r""
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)
print('Remote blob path: ' + wasbs_path)
# SPARK read parquet
df = spark.read.parquet(wasbs_path)
df.persist().count()
This stores our data in a Spark DataFrame that we can access for further transformation or analysis. Microsoft Fabric has convenient functionality for displaying and interacting with DataFrames as tables using the display() function. display() also allows for some lightweight analysis and exploration of the non-spatial aspects of the data using tables and charts. For example, we can quickly explore our data like this:
# display the table
display(df)
Which results in a nicely formatted table with the option to inspect the different columns of data using charts and graphs:
Microsoft Fabric's display() functionality for exploring the contents of a DataFrame
The GeoAnalytics for Fabric library contains a set of data constructor functions for creating or converting geometry. These are useful for ingesting data and transforming it into a geometry.
There are five spatial formats supported. Each has a function for creating geometry from its text or binary representation (ST_GeomFrom{spatial type}) and for converting geometry back to that representation (ST_As{spatial type}). You can ingest data and convert between geometry types using the functions for these spatial formats. Below we list the generic "geometry" form for each format; however, each also has associated functions for reading in specific geometry types (Point, Line, and Polygon). For instance, ST_PointFromBinary is the Point-specific variant of ST_GeomFromBinary.
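To make these formats more concrete, here is a plain-Python sketch, independent of Spark and GeoAnalytics for Fabric, of how the same point is commonly written in three of the text formats. The exact strings produced by the ST_As functions may differ in whitespace and precision:

```python
import json

# The same Boston-area point expressed in three text formats
x, y = -71.0589, 42.3601  # longitude, latitude

# Well-known text (WKT)
wkt = f"POINT ({x} {y})"

# GeoJSON: coordinates are stored as [longitude, latitude]
geojson = json.dumps({"type": "Point", "coordinates": [x, y]})

# EsriJSON: explicit x/y keys plus a spatial reference
esrijson = json.dumps({"x": x, "y": y, "spatialReference": {"wkid": 4326}})

print(wkt)
print(geojson)
print(esrijson)
```

All three carry the same coordinates; they differ only in how the structure and spatial reference are written.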
There are also constructors for generating geometry from x,y coordinates or arrays of points:
Let's take a look at how these functions work for creating or converting geometry.
We'll start by creating a geometry from the Boston public safety dataset that we imported above. This dataset doesn't include a geometry, but it does have latitude and longitude coordinates for each record. To work with data in GeoAnalytics for Fabric, we need at least one geometry column, so our first step is to create one.
Depending on the format of the data, there are a number of functions included with GeoAnalytics for Fabric that can be used. Since this dataset has point coordinates in two separate columns, we will use ST_Point.
ST_Point takes two numeric columns or double values as input and returns a point column. The two numeric columns or values must contain the x,y coordinates of the point geometries. You can optionally specify a spatial reference for the result point column. In the example below, we add the spatial reference identifier of 4326. This indicates that the coordinates use the World Geodetic System 1984 coordinate system, or WGS84.
You can find more information about coordinate systems and spatial reference identifiers in the GeoAnalytics for Microsoft Fabric core concepts on Coordinate systems and transformations.
Note that the ordering of the coordinates used as input for ST_Point is (longitude, latitude) or (x,y).
# create a new column called geometry for each record
# when we create the point geometries here, we add an optional value for the
# spatial reference. In this case it is 4326 (the World Geodetic System 1984, or WGS84)
df = df\
.withColumn("geometry", ST.point("Longitude", "Latitude", 4326))
When displaying tables with a geometry, the point, line, or polygon geometry is automatically converted into a more human-readable format for display.
Take a look at the point geometry listed in the new geometry column - we have a single column with all of the details for our points. We can now use this for analysis or visualization using GeoAnalytics for Fabric.
Now that we have a geometry field to work with, we can start to explore the spatial functions available in the GeoAnalytics for Fabric library.
To demonstrate the process of converting between geometry types, we can translate this new point geometry between the different formats supported. The formats that we'll demonstrate are:
In the example below, we perform the conversions only for display so that we can see the differences; we don't explicitly create new columns with the converted geometry. If you wanted to create new columns, you would use the .withColumn() function.
# show the results of converting a geometry to binary, geoJSON, esriJSON, text, and shape formats
display(
df.select("geometry",
ST.as_binary("geometry").alias("geom_binary"),
ST.as_geojson("geometry").alias("geom_geojson"),
ST.as_esri_json("geometry").alias("geom_esrijson"),
ST.as_text("geometry").alias("geom_text"),
ST.as_shape("geometry").alias("geom_shape"))
)
The GeoJSON, EsriJSON, and well-known text (WKT) formats are all human-readable representations of the geometry; the well-known binary and shapefile representations are not.
We can see all of these in the image of the displayed table below:
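For a sense of why the binary formats aren't human-readable, here is a plain-Python sketch (independent of Spark and GeoAnalytics for Fabric) of the standard well-known binary (WKB) layout for a single 2D point:

```python
import struct

# WKB layout for a 2D point:
#   1 byte  -> byte order flag (1 = little-endian)
#   4 bytes -> geometry type (1 = Point) as an unsigned int
#   8 bytes -> x (longitude) as a float64
#   8 bytes -> y (latitude)  as a float64
x, y = -71.0589, 42.3601
wkb = struct.pack("<BIdd", 1, 1, x, y)

print(len(wkb))   # 21 bytes for a 2D point
print(wkb.hex())  # compact, but not something you'd read by eye
```

The coordinates survive a round trip through the binary form exactly, which is part of why binary representations are a good fit for storage.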
When working with geometry using the GeoAnalytics in Microsoft Fabric library, you will primarily work with the native geometry format. This is what you end up with when you read a spatial file directly into a DataFrame (e.g., from a shapefile, feature service, or geodatabase) or convert using any of the ST_GeomFrom{spatial type} functions.
However, the geometry format that GeoAnalytics for Fabric uses isn't recognized everywhere, and you might need to use a different format when you save your data back into OneLake or into another storage location. You might want a well-known text or GeoJSON format to save data into Snowflake, or an EsriJSON format to visualize your geometries in a Power BI dashboard.
Or, since the Delta format doesn't have a native geometry type, if you want to save the results of spatial analysis back to OneLake in a Delta format, we recommend you first convert your geometries into a text or binary representation. Well-known binary is generally the most compact representation, however, it isn't human-readable. If you need a human-readable format, well-known text is another option.
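As a rough, Spark-free illustration of that size difference: a 2D point in well-known binary is always 21 bytes, while the equivalent well-known text string grows with the printed coordinate precision:

```python
import struct

x, y = -71.05890123456, 42.36010987654  # high-precision coordinates

# WKB: fixed-size layout (byte order + type + two float64 values)
wkb = struct.pack("<BIdd", 1, 1, x, y)  # always 21 bytes for a 2D point

# WKT: length depends on how many digits get printed
wkt = f"POINT ({x} {y})"

print(len(wkb), len(wkt))
```

For lines and polygons with many vertices, the gap between the binary and text representations grows accordingly.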
You can convert your data into one of those formats with the ST_AsBinary or ST_AsText functions.
Let's look at an example of writing a Delta table and converting the point geometries.
# Write a delta file with well-known binary geometry
df\
.withColumn("geometry_wkb", ST.as_binary("geometry"))\
.write\
.mode("overwrite")\
.option("overwriteSchema", True)\
.format("delta")\
.save("Tables/service_calls_delta")
Hopefully this quick primer on data types and data constructors has been helpful. We'll be posting additional content to help you get started, so check back on the Community for more! And please let us know what type of things you'd like to learn more about!