Python string slicing fails to return the desired data.

BusinessNews · 01-02-2023

I am trying to count the occurrences of a substring within a string:

string = 'ABCDCDC'
sub_string = 'CDC'
for i in range(len(string)-len(sub_string)):
    print(string[i:len(sub_string)])

I am unsure why this is my output:

ABC
BC
C

Shouldn't it be:

ABC
BCD
CDC
DCD
CDC

Luke_Pinner · 01-02-2023

I think you mean

print(string[i:i+len(sub_string)])
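
Note that with that change alone, the original `range(len(string)-len(sub_string))` still stops one window short, because `range` excludes its upper bound. A minimal sketch of the full loop, assuming the goal is to print every window (the `windows` list and the name `n` are only there for illustration):

```python
string = 'ABCDCDC'
sub_string = 'CDC'
n = len(sub_string)

# The last valid start index is len(string) - n, so the range needs + 1.
windows = [string[i:i + n] for i in range(len(string) - n + 1)]
for window in windows:
    print(window)
```

This prints the five windows listed in the question, ending with the second CDC.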

JohannesBierer · 01-02-2023

In addition, if you want to count the number of overlapping sub_strings in your string, maybe this could work:

string = 'ABCDCDC'
sub_string = 'CDC'

results = 0
sub_len = len(sub_string)
for i in range(len(string)):
    if string[i:i+sub_len] == sub_string:
        results += 1
print(results)

source:

https://stackoverflow.com/questions/8899905/count-number-of-occurrences-of-a-substring-in-a-string#c...
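
For comparison, the built-in `str.count` only counts non-overlapping matches, so it reports 1 here rather than 2. One possible alternative to slicing at every index is to call `str.find` repeatedly, restarting one character after each hit. The function name `count_overlapping` is made up for this sketch, and returning 0 for an empty substring is my assumption:

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps."""
    if not sub:
        return 0  # assumption: treat an empty substring as zero matches
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Restart one character after the last hit so overlaps are found.
        start = text.find(sub, start + 1)
    return count


print('ABCDCDC'.count('CDC'))               # non-overlapping count
print(count_overlapping('ABCDCDC', 'CDC'))  # overlapping count
```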

DanPatterson · 01-03-2023

Love sliding windows questions

import numpy as np

z = np.array(list('ABCDCDC'))  # -- your base string
s = np.array(list('CDC'))  # -- sub string for query
# -- numpy magic
sl_window = np.lib.stride_tricks.sliding_window_view(z, len(s))  # -- windowing
whr_eq = (sl_window == s[None, :]).all(-1)  # -- find out where equal
position = np.nonzero(whr_eq)[0]  # -- at what location

sl_window
array([['A', 'B', 'C'],
       ['B', 'C', 'D'],
       ['C', 'D', 'C'],
       ['D', 'C', 'D'],
       ['C', 'D', 'C']], dtype='<U1')
position
array([2, 4], dtype=int64)

sl_window[position] 
array([['C', 'D', 'C'],
       ['C', 'D', 'C']], dtype='<U1')

len(position)
2

recorded for posterity

Anonymous User · 01-03-2023

Your code is doing exactly what you are telling it to do. Replace the lengths with numbers for testing and you can see it:

for i in range(7-3):  # same as range(4), so i takes the values 0, 1, 2, 3
    print(string[i:3])  # -> string[0:3], string[1:3], string[2:3], string[3:3]

You're not moving the end position, so every slice stops at index 3 (exclusive), just after the first 'C'. With your loop:

ABC -> [0:3]
BC -> [1:3]
C -> [2:3]
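
The loop actually runs a fourth time with `[3:3]`, which is an empty slice and prints a blank line. Slicing never raises an `IndexError`: a start at or past the stop gives an empty string, and a stop past the end is clamped to the length of the string. A small sketch of both cases, using the same `string` as in the question (the names `fixed_end` and `sliding_end` are only for illustration):

```python
string = 'ABCDCDC'

fixed_end = [string[i:3] for i in range(4)]        # end stays at 3
sliding_end = [string[i:i + 3] for i in range(4)]  # end moves with i

print(fixed_end)          # the last entry is the empty string from [3:3]
print(sliding_end)
print(repr(string[5:8]))  # stop past the end is clamped, no error
```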

There are other solutions, as Dan posted, but here is how your code changes once the end position moves with the start and 'slides' the slice:

pairs = {}
for start_index in range(len(string)):  # start_index runs from 0 to 6
    end_index = start_index + 3  # <- move the end index
    combo = string[start_index:end_index]
    if len(combo) == 3:  # <- keep only full 3-character slices
        print(combo)  # <- print it
        # added for counting the items
        pairs[combo] = pairs.get(combo, 0) + 1
    else:
        print(f'end of string: {combo}')

print(pairs)
print(f'{sub_string} count: {pairs[sub_string]}')

Gives you the results:

ABC
BCD
CDC
DCD
CDC
end of string: DC
end of string: C
{'ABC': 1, 'BCD': 1, 'CDC': 2, 'DCD': 1}
CDC count: 2
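
If a tally of every window is what you're after, `collections.Counter` from the standard library can replace the manual dictionary bookkeeping. This is just one way to write it, assuming the same `string` and `sub_string` as in the question; the names `n` and `window_counts` are illustrative:

```python
from collections import Counter

string = 'ABCDCDC'
sub_string = 'CDC'
n = len(sub_string)

# Stopping at len(string) - n + 1 yields only full-length windows,
# so no length check is needed inside the loop.
window_counts = Counter(string[i:i + n] for i in range(len(string) - n + 1))

print(dict(window_counts))
print(f'{sub_string} count: {window_counts[sub_string]}')
```

A `Counter` returns 0 for a key it has never seen, so looking up a substring that does not occur will not raise a `KeyError` the way a plain dict would.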