I need help creating a select statement for an attribute table of temperature data. The table has fields for Station_Name, Date and Max_Temp and I would like to select only those records where the temperature difference between weather stations was 20 degrees or more in the same day. I'm stumped on how to compare the 18 Max_Temp values in any single day for each of 365 Dates.
Using Summary Statistics—Help | ArcGIS for Desktop , you can get the Min. and Max. temperature of each date. Then, in the new table, you could find the temp. difference using field calculator (or using MS Excel).
Your options in situations like this depend on several factors. How are you storing the information currently? If it is stored in shapefiles or a file geodatabase, can it be stored in other data stores that support more robust SQL? How do you want to work with it afterwards? Is this a once-and-done kind of action or something you will be repeating over and over?
Thanks Joshua. The data comes from a Hosted Feature Class on arcgis.com. I'm going to try exporting it into a MySql DB and working with the data there. I'm not sure what steps to take to get the outliers - I'm thinking it can be done with subqueries, which I've never written, but I think I need to learn 🙂
After reading over your question a couple of times, I must admit I am not completely clear what you are trying to select. Since scaling from 1 day to 365 days will be straightforward, let's talk about a single day. There is what I believe you have for information:
I get a bit confused with, "select only those records where the temperature difference between weather stations was 20 degrees or more in the same day." Are you wanting to compare each weather station to the other 17? If so, do you select a pair of weather station records if the maximum temperature is 20 degree apart? If that is the case, won't the output be pairs of stations by date?
If you could mock up what you expect the results to look like, it would be helpful.
I ended up using your suggestion and pulled it into a MySql DB and queried it there, then exported the tables as CSV and imported those into the GDB. I used two subqueries to test for the upper and lower deviations from the average like so:
SELECT `ObjectID`, `Date`, `StationName`, `MaxTemp`  FROM  `DailyTemps` WHERE    `MaxTemp` > (SELECT AVG(`MaxTemp`) FROM `DailyTemps`) +10OR    `MaxTemp` < (SELECT AVG(`MaxTemp`) FROM `DailyTemps`) -10I ultimately exported two tables - 1 for extreme highs and one for extreme lows. I'll display XY locations for each table using graduated color symbology to illustrate the extremes.
I should have added that this is all for time-aware layers so the map shows the temperature changes over time.
Shame we can't add formatted code into these messages - here's a screenshot:
You can add Syntax Highlighting, including for SQL: https://community.esri.com/docs/DOC-8691-posting-code-with-syntax-highlighting-on-geonet?sr=search&s..., /blogs/dan_patterson/2016/08/14/script-formatting?sr=search&searchId=cfd1fec6-e6ee-4e2d-b5fb-67d9e22..., https://community.esri.com/people/curtvprice/blog/2014/09/25/posting-code-blocks-in-the-new-geonet?s....
I don't believe the query you provided is giving the answer you expect. As written, you are using the averages over the entire data set, not just one day, which is what I thought you were wanting. Try the following, let me know if it works for you (may not be formatted completely correctly for MySQL😞
SELECT  dt.ObjectID ,
        dt.Date ,
        dt.StationName ,
        dt.MaxTemp ,
        at.AvgMax
FROM    DailyTemps dt
        INNER JOIN ( SELECT Date ,
                            AVG(MaxTemp) AS AvgMax
                     FROM   DailyTemps
                     GROUP BY Date
                   ) AS at ON dt.Date = at.Date
WHERE   ABS(dt.MaxTemp - at.AvgMax) > 10