Is there a way to find groupings of non-consecutive numbers in a feature class? I know how to sequentially number assets in a feature class, but in this case that's not what I need to do. I just need to identify where the groups of non-consecutive numbers are so we can use those groups up.
How exactly are these "groupings" being defined? You use the term "non-consecutive" and "non-sequential" in your post, but these can be interpreted differently.
Are you looking for places where a sequence breaks? And does a series of numbers only match the grouping if there is nothing sequential on both sides, or on either? For instance, the series 1, 2, 4, 7, 8. The inner series 2, 4, 7 in non-sequential, but the members 2 and 7 both have sequential numbers before or after them. Would you consider these as part of your group, or excluded from it? If excluded, how would you identify the "skipped" numbers? Are you even interested in the gaps?
Should your data be sorted prior to looking for non-sequential values? Or are we looking for non-sequential entries using the default sort order of the table?
Good point. I should have said non-consecutive. These are just unique identifiers that start from 1 and go to X, where X is the last number that was created using attribute assistant when creating a new feature. There were times before attribute assistant was used though and sometimes we those groups (or blocks), if they are large enough, to provide to contractors when the find whatever asset, in this case manholes, in the field. They provide them with one of these numbers and it makes our lives a little easier for certain tasks in our GIS database or in CityWorks when creating work orders against the asset.
Yes, I would sort the data based on this column, we call it Facility ID, then I'm attempting to find the gaps or non-consecutive numbers.
-Mark
City of Durham, NC
The logic. It can be done by exporting the field using TableToNumPyArray and proceeding from there, but at least the sequence of events is here for someone wanting to make a field calculation from it.
# -- putting some sequences together
a0 = np.arange(5); a1 = np.arange(6, 11); a2 = np.arange(13, 20)
a = np.concatenate((a0, a1, a2))
a # -- note 5, 11, 12 are missing
array([ 0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18, 19])
#
# -- calculate the sequential difference in the array/list/column and
# identify where they are
diff = a[1:] - a[:-1]
whr = np.where(diff != 1)
diff
array([1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1])
#
# -- note the start and end values determined from the position and difference
st = a[whr[0]] # -- start of the mis-sequence
array([ 4, 10])
en = st + diff[whr[0]] # -- end determined from start and numeric difference
en
array([ 6, 13])
# -- piece the sequences together
m = [np.arange(st[i] + 1, en[i]) for i in range(st.size)]
m = [np.arange(st[i] + 1, en[i]) for i in range(st.size)]
np.concatenate(m)
array([ 5, 11, 12])
def get_missing_groups(values):
if not values:
return []
# get the missing values
val_range = range(1, max(values))
missing_vals = sorted(list(set(val_range) - set(values)))
# get the differences between the missing values
differences = [missing_vals[i] - missing_vals[i-1] for i in range(1, len(missing_vals))]
# group up
missing_groups = []
start = None
i_max = len(differences) - 1
for i, k in enumerate(missing_vals):
if start is None:
start = k
if i > i_max or differences[i] > 1:
missing_groups.append([start, k])
start = None
return missing_groups
# some tests
get_missing_groups([1, 2, 5, 6, 10, 11, 20])
# [[3, 4], [7, 9], [12, 19]]
get_missing_groups([5, 6, 10, 178])
# [[1, 4], [7, 9], [11, 177]]
get_missing_groups([])
# []
get_missing_groups(range(20))
# []
# get the groups missing in the table
values = [row[0] for row in arcpy.da.SearchCursor("Table", ["Field"])]
get_missing_groups(values)
or using a def with numpy and my previous example in table form
def skipped(tbl, fld, do_sort=False):
"""Specify path to the table and the field name and sort if needed."""
import numpy as np
from arcpy.da import TableToNumPyArray
a = TableToNumPyArray(tbl, fld, skip_nulls=True).astype('int')
if do_sort:
a.sort() # sort inplace
diff = a[1:] - a[:-1]
whr = np.where(diff != 1)[0]
st = a[whr]
en = st + diff[whr]
m = [np.arange(st[i] + 1, en[i]) for i in range(st.size)]
return np.concatenate(m).tolist()
skipped(tbl, fld)
[5, 11, 12]