To help kick off the Public Preview of ArcGIS GeoAnalytics for Microsoft Fabric, we are publishing a series of blog posts exploring its functionality. In this post, we cover the fundamentals of identifying and transforming coordinate systems in GeoAnalytics for Fabric. We won't dive deep into how coordinate systems work or how they affect your analysis; if you'd like that background first, we recommend the GeoAnalytics for Fabric core concept on coordinate systems and transformations.
We'll start with a refresher on how to import GeoAnalytics for Fabric and start working with it. To import GeoAnalytics for Fabric you just need to add import geoanalytics_fabric to a cell in your notebook. For more information on enabling the library, see the Getting started documentation.
For convenience, we also recommend importing the geospatial functions directly and giving them an easy-to-use alias. In the example below, we import the library and the functions with an alias of ST. Since each function is then referenced as ST.<function_name>, this makes them easy to call and identifies them as Spatial Type (ST) functions. It also aligns with the structure of examples throughout the ArcGIS GeoAnalytics documentation. Your cell will look like this:
# import ArcGIS GeoAnalytics
import geoanalytics_fabric
import geoanalytics_fabric.sql.functions as ST
For instance, after importing, you would be able to reference the ST_Buffer function using ST.buffer instead of geoanalytics_fabric.sql.functions.buffer.
For the examples in this blog post, we will use a dataset of public safety data for the city of Boston, MA, from Azure Open Datasets. This dataset contains latitude and longitude coordinates for public safety service requests from Boston.
# https://learn.microsoft.com/en-us/azure/open-datasets/dataset-boston-safety?tabs=pyspark
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=Boston"
blob_sas_token = r""
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)
print('Remote blob path: ' + wasbs_path)
# SPARK read parquet
df = spark.read.parquet(wasbs_path)
df.persist().count()
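The dataset stores locations as latitude and longitude values, while the examples that follow reference a column named geometry. One way to build that column is sketched below. It rests on two assumptions that you should verify against your own environment: that the coordinate columns are named longitude and latitude, and that the ST module exposes a point(x, y, sr) function as in the ArcGIS GeoAnalytics documentation. The helper name with_point_geometry is hypothetical and only used for illustration.

```python
def with_point_geometry(df, st_functions, x_col="longitude", y_col="latitude", srid=4326):
    """Return a copy of df with a point column named "geometry".

    Assumptions (not confirmed by this post): st_functions is the module
    imported as ST, and it provides point(x, y, sr). The default column
    names and the WGS84 SRID (4326) are also assumptions about the data.
    """
    # x is longitude and y is latitude; the SRID records that the values are WGS84
    point_column = st_functions.point(x_col, y_col, srid)
    return df.withColumn("geometry", point_column)


# In a Fabric notebook, after the imports and the read above:
# df = with_point_geometry(df, ST)
```

Passing the functions module in as an argument keeps the helper free of notebook globals, so it can be reused with a different alias or exercised with stand-in objects.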
Now let's dig into coordinate systems using this dataset!
A spatial reference describes where features are located in the world. Most spatial references are either geographic (using a geographic coordinate system, i.e., latitude and longitude) or projected (using a projected coordinate system).
More information about spatial reference, coordinate systems, and transformations can be found in the core concepts for ArcGIS GeoAnalytics.
If there is a spatial reference system defined for your data, you can check this information using the ST_SRID function or get_spatial_reference().
In the examples below, the result should show that the spatial reference is "4326", which is the identifier for WGS84. You can look up spatial reference IDs on the spatialreference.org website, which includes an entry for WGS84.
# check the spatial reference ID (SRID) for each row in a DataFrame
display(df.select("geometry", ST.srid("geometry").alias("SRID")))
We can also look at more detailed spatial reference information for a geometry column in a DataFrame like this:
# retrieve detailed spatial reference information for a geometry column in a DataFrame
# if there is only one geometry column in the dataset, you do not need to specify the column
sr = df.st.get_spatial_reference("geometry")
print("SRID:", sr.srid)
print("Is Projected:", sr.is_projected)
print("Unit:", sr.unit)
print("WKT:", sr.wkt)
If a spatial reference is set for your geometry, you can transform it into a different spatial reference using ST_Transform, specifying either the spatial reference ID (SRID) or a valid Well-Known Text (WKT) string for the new spatial reference.
For some GeoAnalytics functions and tools, you can perform your analysis without transforming the data between coordinate systems, because the GeoAnalytics library will project on the fly as needed. However, when calculating spatial properties (e.g., area or distance), the unit of measure is generally the unit of the input geometry. If the geometry is in a geographic coordinate system, your results will likely be in decimal degrees, which are angular units. Work in a projected coordinate system to get results in meters or other planar linear units. We recommend always reading the documentation to confirm the spatial units of your dataset and the units that each analysis function or tool returns. Additional information can be found in the GeoAnalytics for Fabric core concept on coordinate systems and transformations.
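To see why decimal degrees make poor distance units, consider how much ground one degree of longitude covers at different latitudes. The sketch below is plain Python rather than a GeoAnalytics call. It assumes a spherical Earth with a mean radius of 6,371,008.8 m, and both the function name and the approximate latitude used for Boston are illustrative.

```python
import math

# Assumption: spherical Earth with this mean radius, in meters
EARTH_RADIUS_M = 6_371_008.8


def meters_per_degree_longitude(latitude_deg: float) -> float:
    """Approximate ground length, in meters, of one degree of longitude."""
    meters_per_degree_at_equator = math.radians(1.0) * EARTH_RADIUS_M
    # Meridians converge toward the poles, so the length shrinks with cos(latitude)
    return meters_per_degree_at_equator * math.cos(math.radians(latitude_deg))


equator = meters_per_degree_longitude(0.0)
boston = meters_per_degree_longitude(42.36)  # approximate latitude of Boston
print(f"Equator: {equator:,.0f} m per degree of longitude")
print(f"Boston:  {boston:,.0f} m per degree of longitude")
```

Because the same angular difference corresponds to different ground distances depending on where you are, a distance reported in degrees cannot be converted to meters with a single constant.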
Many calculations of geographic properties also offer a geodesic option, which takes the curvature of the Earth into account and returns accurate linear distance measurements (in meters).
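As a rough illustration of the idea behind a geodesic measurement, the sketch below computes a great-circle distance with the haversine formula in plain Python. This is a spherical approximation chosen for brevity, not the method GeoAnalytics uses internally. The Earth radius, the function name, and the two sample coordinates in the Boston area are all assumptions made for the example.

```python
import math

# Assumption: spherical Earth with this mean radius, in meters
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two illustrative (longitude, latitude) pairs in the Boston area
point_a = (-71.0580, 42.3604)
point_b = (-71.0972, 42.3467)

distance_m = haversine_m(*point_a, *point_b)
# Planar distance on the raw coordinates, which is in degrees and hard to interpret
distance_deg = math.hypot(point_b[0] - point_a[0], point_b[1] - point_a[1])

print(f"Great-circle distance: {distance_m:,.0f} m")
print(f"Planar distance on raw coordinates: {distance_deg:.4f} degrees")
```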
In the example below, we transform the geometry from WGS84 (SRID 4326) to the Massachusetts State Plane Coordinate System (SRID 2249). This is an example of transforming a geographic coordinate system to a projected coordinate system. The units of the Massachusetts State Plane Coordinate System are feet, so distance measurements on the projected data will be in feet. Note that this example only transforms the geometry and displays the result in a table.
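# Transform geometry using ST_Transform and check the SRID values
display(
    df.select(
        "geometry",
        ST.srid("geometry").alias("geometry_srid"),
        ST.transform("geometry", 2249).alias("geometry_2249"),
        # compute the SRID from the transformed expression rather than the alias,
        # since not every Spark version resolves an alias defined in the same select
        ST.srid(ST.transform("geometry", 2249)).alias("geometry_2249_srid"),
    )
)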
When we look at the results, it's clear that the coordinates are very different for our geometry and geometry_2249 columns. geometry is using latitude and longitude in WGS84 and geometry_2249 is using the Massachusetts State Plane Coordinate System in feet.
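Because SRID 2249 uses feet, measurements taken on the transformed geometry often need converting before they can be compared with metric values. Here is a minimal helper in plain Python. It assumes the unit is the US survey foot (defined as exactly 1200/3937 m), which is how the EPSG registry lists this coordinate system; confirm this against the Unit value reported by get_spatial_reference() for your data. The function name is illustrative.

```python
# Assumption: SRID 2249 is measured in US survey feet (1 ftUS = 1200/3937 m)
US_SURVEY_FOOT_M = 1200 / 3937


def us_survey_feet_to_meters(feet: float) -> float:
    """Convert a length in US survey feet to meters."""
    return feet * US_SURVEY_FOOT_M


# Example: convert a 5,280 ft measurement to meters
print(f"{us_survey_feet_to_meters(5280):.2f} m")
```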
The example above only displayed a table of transformed values. To persist the transformation, by either creating a new column or updating the existing one, we need to update the DataFrame with the transformed values. Here is an example of replacing our original geometry column with the transformed data:
# update a DataFrame with a transformed geometry
df = df.withColumn("geometry", ST.transform("geometry", 2249))
If you have more questions about coordinate systems in GeoAnalytics for Fabric, take a look at the core concept documentation for Coordinate systems and transformations. This provides additional detail on coordinate systems, selecting correct coordinate systems for your data or analysis, and how they impact plotting, analyzing, and sharing data.
Hopefully this quick primer on coordinate systems has been helpful. We'll be posting additional content to help you get started, so check back on the Community for more! And please let us know what type of things you'd like to learn more about!