
Python string slicing fails to return the desired data.

01-02-2023 09:29 PM
BusinessNews
Deactivated User

I am trying to count the occurrences of a substring within a string:


string = 'ABCDCDC'
sub_string = 'CDC'
for i in range(len(string)-len(sub_string)):
    print(string[i:len(sub_string)])


I am unsure why this is my output:

ABC
BC
C

Shouldn't it be:

ABC
BCD
CDC
DCD
CDC

Luke_Pinner
MVP Regular Contributor

I think you mean 

print(string[i:i+len(sub_string)])
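One more detail, added here as an editorial sketch: the question's loop bound, range(len(string)-len(sub_string)), gives only 4 start positions (0 to 3), so even with the corrected slice the final 'CDC' starting at index 4 would be missed. A string of length L has L - n + 1 windows of length n, hence the + 1 below. The list name `windows` is only illustrative.

```python
string = 'ABCDCDC'
sub_string = 'CDC'
n = len(sub_string)

# A string of length L has L - n + 1 windows of length n, so the range
# needs "+ 1" to reach the last start index (4 for this string).
windows = [string[i:i + n] for i in range(len(string) - n + 1)]

for w in windows:
    print(w)  # ABC, BCD, CDC, DCD, CDC (one per line)
```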
JohannesBierer
Frequent Contributor

In addition, if you want to count the number of overlapping sub_strings in your string, maybe this could work:

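string = 'ABCDCDC'
sub_string = 'CDC'

results = 0
sub_len = len(sub_string)
# Compare the slice starting at every index with sub_string;
# slices near the end are shorter than sub_len and never match.
for i in range(len(string)):
    if string[i:i+sub_len] == sub_string:
        results += 1
print(results)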

source:

https://stackoverflow.com/questions/8899905/count-number-of-occurrences-of-a-substring-in-a-string#c...
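A point worth checking when counting overlapping matches: Python's built-in str.count only counts non-overlapping occurrences, so it gives a different answer for this string. A short comparison sketch:

```python
string = 'ABCDCDC'
sub_string = 'CDC'
n = len(sub_string)

# str.count() counts NON-overlapping matches: after finding 'CDC' at
# index 2 it resumes searching at index 5, so the second 'CDC' is skipped.
non_overlapping = string.count(sub_string)

# Checking every window of length n includes overlapping matches.
overlapping = sum(1 for i in range(len(string) - n + 1)
                  if string[i:i + n] == sub_string)

print(non_overlapping, overlapping)  # 1 2
```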

DanPatterson
MVP Esteemed Contributor

Love sliding windows questions

import numpy as np
z = np.array(list('ABCDCDC'))  # -- your base string
s = np.array(list('CDC'))  # -- sub string for query
# -- numpy magic
sl_window = np.lib.stride_tricks.sliding_window_view(z, len(s))  # -- windowing
whr_eq = (sl_window == s[None, :]).all(-1)  # -- find out where equal
position = np.nonzero(whr_eq)[0]  # -- at what location

sl_window
array([['A', 'B', 'C'],
       ['B', 'C', 'D'],
       ['C', 'D', 'C'],
       ['D', 'C', 'D'],
       ['C', 'D', 'C']], dtype='<U1')
position
array([2, 4], dtype=int64)

sl_window[position] 
array([['C', 'D', 'C'],
       ['C', 'D', 'C']], dtype='<U1')

len(position)
2

recorded for posterity
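For comparison, the same start positions can be found without NumPy using str.find with a moving start offset. This is a sketch; `find_positions` is a hypothetical helper name, not part of any library.

```python
def find_positions(string, sub_string):
    """Return every start index of sub_string in string, overlaps included."""
    positions = []
    start = string.find(sub_string)
    while start != -1:
        positions.append(start)
        # Resume one character later (not len(sub_string) later) so that
        # overlapping matches such as the two 'CDC's are both found.
        start = string.find(sub_string, start + 1)
    return positions

position = find_positions('ABCDCDC', 'CDC')
print(position, len(position))  # [2, 4] 2
```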


... sort of retired...
by Anonymous User

Your code is doing exactly what you are telling it to do. Replace the lengths with numbers for testing and you can see it:

for i in range(7-3):  # range(4): i takes the values 0, 1, 2, 3
    print(string[i:3])  # string[0:3], string[1:3], string[2:3], string[3:3]

You're not moving the end index, so every slice stops just before index 3; the last character it can include is always the 'C' at index 2. With your loop:

ABC -> [0:3]
BC -> [1:3]
C -> [2:3]
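Printing each slice with repr() makes the fourth iteration visible as well: string[3:3] is an empty string, so that print call just outputs a blank line. A small sketch using the question's variables:

```python
string = 'ABCDCDC'
sub_string = 'CDC'

# The start index moves, but the end index is fixed at len(sub_string) == 3.
slices = [string[i:len(sub_string)] for i in range(len(string) - len(sub_string))]

for i, s in enumerate(slices):
    print(i, repr(s))
# 0 'ABC'
# 1 'BC'
# 2 'C'
# 3 ''
```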

There are other solutions, as Dan posted, but to show the difference from your code and how moving the end position 'slides' the slice:

string = 'ABCDCDC'
sub_string = 'CDC'

pairs = {}
for start_index in range(len(string)):  # start_index = 0, 1, ..., 6
    end_index = start_index + 3  # <- move the end index along with the start
    if len(string[start_index:end_index]) == 3:  # <- only full-length (3-character) slices
        combo = string[start_index:end_index]
        print(combo)  # <- print it
        # added for counting the items
        pairs[combo] = pairs.get(combo, 0) + 1
    else:
        print(f'end of string: {string[start_index:end_index]}')

print(pairs)
print(f'{sub_string} count: {pairs[sub_string]}')

This gives you the following output:

ABC
BCD
CDC
DCD
CDC

end of string: DC
end of string: C

{'ABC': 1, 'BCD': 1, 'CDC': 2, 'DCD': 1}
CDC count: 2
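The manual pairs dictionary can also be built with collections.Counter from the standard library. This is a sketch of an alternative, not the poster's code; it only generates full-length windows, so no length check is needed.

```python
from collections import Counter

string = 'ABCDCDC'
sub_string = 'CDC'
n = len(sub_string)

# Count every length-n window; overlapping windows are all included.
pairs = Counter(string[i:i + n] for i in range(len(string) - n + 1))

print(pairs)
print(f'{sub_string} count: {pairs[sub_string]}')  # CDC count: 2
```

One design note: a Counter returns 0 for a missing key instead of raising KeyError, so looking up a substring that never occurs is safe.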