Flatten nested data

214
6
4 weeks ago
DanPatterson
MVP Esteemed Contributor
2 6 214

Replace the numbers below with anything you want.

lst = [
       [[1, 2, 3], [4, 5]],
       [[1, 2]],
       [1, 2],
       [[]],
       [[1, 2, 3], [4, 5], [6, 7, 8]],
       [[[1, 2, 3], [4, 5]], [[6, 7, 8]]],
       ]

 

Flatten with recursion (a function calling itself 'recursively')

def flatten(a_list, flat_list=None):
    """Change the isinstance as appropriate.

    :  Flatten an object using recursion
    :  see: itertools.chain() for an alternate method of flattening.
    """
    if flat_list is None:
        flat_list = []
    for item in a_list:
        if isinstance(item, list):
            flatten(item, flat_list)
        else:
            flat_list.append(item)
    return flat_list

 

Simple flattening

flatten(lst)

[1, 2, 3, 4, 5, 1, 2, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8]

Who says you can't start out with initial values.

flatten(lst, ['a', 'b']))
['a', 'b', 1, 2, 3, 4, 5, 1, 2, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8]

6 Comments
HaydenWelch
MVP Regular Contributor

Love some recursive solutions! Here's a version of that flatten function that utilizes Python generators. Assuming your large list of nested values and wanted to avoid loading the whole dataset into memory, this would make sure only one item at a time is loaded in:

NOTE: Because the check here is for an Iterable, strings will be flattened as well because they are an Iterable container of bytes, you can explicitly exclude them if you want

 

from typing import (
    Any,
    Generator,
    Iterable,
)

def flatten(sequence: Iterable[Any]) -> Generator[Any, None, None]:
    for item in sequence:
        if not isinstance(item, Iterable):
            yield item
        else:
            yield from flatten(item)

 

 

Because this version lazy loads, you can access it's values in multiple ways:

 

>>> x = flatten([1,2,[3,4,[5,6,[7,8],9],10],11])
>>> next(x)
1
>>> for i in x:
...    print(i)
2
3
4
5
6
...
>>> x = flatten([1,2,[3,4,[5,6,[7,8],9],10],11])
>>> list(x)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> list(x)
[]

 

 Calling `next` will ask the generator to get the next value of the sequence it represents. As you can see that initial next call moves the pointer to the next item so iterating `x` after calling `next` on it starts at 2.

You can also just cast a generator to any container that supports Iterable arguments (list, tuple, set, deque, etc.). This will immediately exhaust the generator meaning that the next time you try to call next on it (either with an implicit sequence cast, next, for a for loop) it will return an empty sequence.

DavidSolari
MVP Regular Contributor

If you have a more regular set of nested data you can also one line it with a comprehension:

z =[[[1, 2], [3, 4]]]

// We can use this generalized form for flattening a list
// where n is the amount of nesting
// and a starts at 0 but increases by 1 every loop

// start = [$0 for $n in z
// while a < n:
//  start += for $n-(a+1) in $n-a
//  a + 1
// return start + ]

// concrete example for z, which has 2 levels of nesting
[a for c in z for b in c for a in b]
DanPatterson
MVP Esteemed Contributor

but only if you know all entities are lists of the same depth

z =[[[1, 2], [3, 4], [5, 6], [7, 8, 9], [10]]]
sum(*z, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# .... but

z =[[[1, 2], [3, 4], [5, 6], [7, 8, 9], [10], [[11]], [[[12]]]]]
sum(*z, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, [11], [[12]]]

# and no, sum(sum... fails
DanPatterson
MVP Esteemed Contributor

Forgot the python/numpy solution,.

What it lacks in elegance, it should be admired for its persistence

z = [[1, 2], [3, 4], [5, 6], [7, 8, 9], [10], [[11]], [[[12]]]]
z0 = sum([np.array(i).reshape(-1).tolist() for i in z], [])
z0
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

 

HaydenWelch
MVP Regular Contributor

What about going maximalist with it?

 

 

from typing import (
    Generator,
    Iterable,
    TypeVar,
    Generic,
)

T = TypeVar('T')            
class Flattener(Generic[T]):
    @classmethod
    def flatten(cls, sequence: Iterable[T]) -> Generator[T, None, None]:
        for item in sequence:
            if isinstance(item, Iterable):
                yield from Flattener.flatten(item)
            else:
                yield item
                
    def __mul__(self, other) -> list[T]:
        return list(Flattener.flatten(other))
>>> int_flat: Flattener[int] = Flattener()
>>> int_flat * [1,[2,3,[4,5]]]
[1,2,3,4,5]

 

 

Here's probably the most illegal, cursed version:

 

from typing import (
    Generator,
    Iterable,
    TypeVar,
    Generic,
)

T = TypeVar('T')            
class Flattener(Generic[T]):
    
    def flatten(self, sequence: Iterable[T]) -> Generator[T, None, None]:
        for item in sequence:
            # Iterate everything but strings (they are infinite)
            if isinstance(item, Iterable) and not isinstance(item, str):
                yield from self.flatten(item)
            
            # No Concrete Generic Filter
            elif not hasattr(self, '__orig_class__'):
                yield item
            
            # Has Concrete Generic Filter 
            elif isinstance(item, self.__orig_class__.__args__[0]):
                yield item
                
    def __mul__(self, other) -> list[T]:
        return list(self.flatten(other))
            

 

 

 

>>> str_flat = Flattener[str]()
>>> int_flat = Flattener[int]()
>>> flt_flat = Flattener[float]()

>>> seq = [1,2,[1.4, 'foo', [9, 42.0, 6], 'bar'], 'baz', 2, [9]]

>>> str_flat * seq
['foo', 'bar', 'baz']
>>> int_flat * seq
[1, 2, 9, 6, 2, 9]
>>> flt_flat * seq
[1.4, 42.0]

 

 

DanPatterson
MVP Esteemed Contributor

😂

About the Author
Retired Geomatics Instructor (also DanPatterson_Retired). Currently working on geometry projects (various) as they relate to GIS and spatial analysis. I use NumPy, python and kin and interface with ArcGIS Pro.
Labels