SQL: Select first row of duplicates (RESOLVED)

171
6
12-01-2020 07:54 AM
Labels (3)
New Contributor

Edit: I was able to accomplish my objective by summarizing my field, joining it back to the original dataset and selecting my desired rows that way. Thank you everyone for responding to my post.

Happy December everyone!

Could anyone suggest me a chunk of SQL code to use in Select by Attributes to select only the first row of each group of duplicates? Or select a duplicate that associated with another column's max value? 

Objective:

I have a table of 520,000 census block codes that include broadband download speeds. I have the table sorted by Census Blocks (Ascending) and then by download speeds (Descending) to list the census blocks in the order of highest download speed to the lowest (each group of duplicate census blocks are ordered from highest to lowest download speeds).

Because there are duplicates of each census block, I only want to select the census block with the highest download speed. I have the table sorted where the first census block code of each group of duplicates has the highest download speed.

My goal is to select each block code with the highest download speed. The data is stored in a file geodatabase 

Example: 

 

Block Code Download Speed 
1234 (row to select)25
12342
12342
5678 (row to select)100
56782

 

Thanks,

Jamie 

 

Edit: I do not have access to Delete Identical/Find Identical Tools

 

 

Reply
0 Kudos
6 Replies
MVP Frequent Contributor

Find Identical (Data Management)—ArcGIS Pro | Documentation

you can use a single or multiple fields to determine how identical, identical is


... sort of retired...
Reply
0 Kudos
New Contributor

I forgot to mention that I do not have the license for that tool. Thanks for the reply! 

Reply
0 Kudos
by
Regular Contributor III

For SQL, you can may also be able to use SELECT DISTINCT [field] FROM [table] ORDER BY ID. Adjust to your situation.

Reply
0 Kudos
MVP Esteemed Contributor

Regarding


@JamieHoffmann wrote:

I have a table of 1000 internet providers. Lets say three rows are listed Comcast, Comcast, Comcast. I only want to select the first of the three that appears in the table. 


I think it is better stated that you only want to select one of them, not selecting the "first" because there is no first if the records are truly duplicates.  If you want to select first based on OID, then it would be important to state that.

The SQL that is supported with the Select By ... tools varies depending on back-end data store.  What is the back-end data store, file geodatabase?

Reply
0 Kudos
New Contributor

You are right about that which I do not clear up. I did not fully clarify my objective. 

I have a table of 520,000 census block codes that include broadband download speeds. I have the table sorted by Census Blocks (Ascending) and then by download speeds (Descending) to list the census blocks in the order of highest download speed to lowest (each group of duplicate census blocks are ordered from highest to lowest download speeds).

Because there are duplicates of each census block, I only want to select the census block with the highest download speed. I have the table sorted where the first census block code of each group of duplicates has the highest download speed.

My goal is to select each block code with the highest download speed. The data is stored in a file geodatabase 

Example: 

Block Code Download Speed
1234 (row to be selected)25
1234 2
12342
5678 (row to be selected)100
56782
Reply
0 Kudos
by
Regular Contributor III

Definitely test this before relying on it, but in my response above, replace ID by Download Speed in the ORDER BY clause.
There are other SQL methods, such as TOP(n), subqueries, etc., but been a while since I've used it much and don't want to mislead you.

Reply
0 Kudos