Soundalike, rhyming and similar words

02-13-2024 03:34 PM
TonyAlmeida
Occasional Contributor III

I'm developing a script tool that lets users enter text and search it against a table and a layer. So far it returns very similar words, but I need it to also return soundalike and rhyming words for a given input. For instance, if the input is "Saddle," the tool should return words like "Battle" and "Cattle," since they are soundalike, rhyming, or similar matches.

The messages returned in this case are the following:

'saddle' is similar, sound-alike, or rhymes with 'a' in stname field
'saddle' is similar, sound-alike, or rhymes with 'adele' in stname field
'saddle' is similar, sound-alike, or rhymes with 'crusader' in stname field
'saddle' is similar, sound-alike, or rhymes with 'd' in stname field
'saddle' is similar, sound-alike, or rhymes with 'e' in stname field
'saddle' is similar, sound-alike, or rhymes with 'leo' in stname field
'saddle' is similar, sound-alike, or rhymes with 'saddle' in stname field
'saddle' is similar, sound-alike, or rhymes with 'saddle horn' in stname field
'saddle' is similar, sound-alike, or rhymes with 'saddle mountain' in stname field
'saddle' is similar, sound-alike, or rhymes with 'saddle peak' in stname field
'saddle' is similar, sound-alike, or rhymes with 'saddle up' in stname field
'saddle' is similar, sound-alike, or rhymes with 'saddlehorn' in stname field
'saddle' is similar, sound-alike, or rhymes with 'saddleman ranch' in stname field
'saddle' is similar, sound-alike, or rhymes with 'sand' in stname field
'saddle' is similar, sound-alike, or rhymes with 'sawtelle' in stname field
'saddle' is similar, sound-alike, or rhymes with 'southdale' in stname field
'saddle' is similar, sound-alike, or rhymes with 'southwell' in stname field
'saddle' is similar, sound-alike, or rhymes with 'steele' in stname field
'saddle' is similar, sound-alike, or rhymes with 'stella' in stname field
'saddle' is similar, sound-alike, or rhymes with 'stoll' in stname field
'saddle' is similar, sound-alike, or rhymes with 'n saddlebrook way' in fullstname field
'saddle' is similar, sound-alike, or rhymes with 'saddle ave' in fullstname field
'saddle' is similar, sound-alike, or rhymes with 'saddle horn ln' in fullstname field
'saddle' is similar, sound-alike, or rhymes with 'saddle mountain ave' in fullstname field
'saddle' is similar, sound-alike, or rhymes with 'saddle mountain way' in fullstname field
'saddle' is similar, sound-alike, or rhymes with 'saddle peak ave' in fullstname field
'saddle' is similar, sound-alike, or rhymes with 'saddle up ln' in fullstname field
'saddle' is similar, sound-alike, or rhymes with 'saddleback ln' in fullstname field
'saddle' is similar, sound-alike, or rhymes with 'saddlehorn way' in fullstname field
'saddle' is similar, sound-alike, or rhymes with 'saddleman ranch ct' in fullstname field
Similar words count: 21
Rhyming words count: 7
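
A note on the single-letter matches in the output above ('a', 'd', 'e'): partial matching scores the shorter string against the best-aligned piece of the longer one, so a one-letter street name that occurs anywhere in the search text gets a perfect score. The sketch below approximates that behavior with the standard library's difflib. It is not fuzzywuzzy's implementation, and `best_window_score`, `passes_filter`, and `min_length` are hypothetical names introduced only for illustration.

```python
from difflib import SequenceMatcher

def best_window_score(a, b):
    """Approximate a partial-match score (0-100) by sliding the
    shorter string over the longer one and keeping the best ratio."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(best * 100)

def passes_filter(search_text, name, threshold=70, min_length=3):
    """Reject names too short to be meaningful before scoring them."""
    if len(name) < min_length:
        return False
    return best_window_score(search_text, name) >= threshold
```

With this sketch, `best_window_score("saddle", "a")` is 100, while `passes_filter("saddle", "a")` is False because of the length guard. A similar minimum-length check in front of the partial-ratio test would probably drop the one-letter results.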

Code:

import arcpy
from fuzzywuzzy import fuzz
from SoundsLike.SoundsLike import Search

class Soundex:
    def __init__(self):
        self.soundex_dict = self._build_soundex_dict()

    def _build_soundex_dict(self):
        soundex_dict = {}
        for char in 'bfpv':
            soundex_dict[char] = '1'
        for char in 'cgjkqsxz':
            soundex_dict[char] = '2'
        for char in 'dt':
            soundex_dict[char] = '3'
        for char in 'l':
            soundex_dict[char] = '4'
        for char in 'mn':
            soundex_dict[char] = '5'
        for char in 'r':
            soundex_dict[char] = '6'
        return soundex_dict

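    def get_soundex(self, word):
        # Return a 4-character Soundex code, or None for empty input
        if not word:
            return None
        word = word.lower()
        soundex_code = word[0]
        # Start from the first letter's own digit so an immediate repeat is skipped
        prev_digit = self.soundex_dict.get(word[0])
        for char in word[1:]:
            soundex_char = self.soundex_dict.get(char)
            if soundex_char and soundex_char != prev_digit:
                soundex_code += soundex_char
            # 'h' and 'w' do not separate repeated digits; vowels do
            if char not in 'hw':
                prev_digit = soundex_char
        soundex_code = soundex_code.ljust(4, '0')[:4]
        return soundex_code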

    def are_rhyming(self, word1, word2):
        soundex1 = self.get_soundex(word1)
        soundex2 = self.get_soundex(word2)
        return soundex1 == soundex2

def soundex(name, length=4):
    """ soundex module conforming to Odell-Russell algorithm """

    # digits holds the soundex values for the alphabet
    soundex_digits = '01230120022455012623010202'
    sndx = ''
    fc = ''

    # Translate letters in name to soundex digits
    for c in name.upper():
        if c.isalpha():
            if not fc: fc = c   # Remember first letter
            d = soundex_digits[ord(c)-ord('A')]
            # Duplicate consecutive soundex digits are skipped
            if not sndx or (d != sndx[-1]):
                sndx += d

    # Replace first digit with first letter
    sndx = fc + sndx[1:]

    # Remove all 0s from the soundex code
    sndx = sndx.replace('0', '')

    # Return soundex code truncated or 0-padded to length characters
    return (sndx + (length * '0'))[:length]

# Initialize Soundex
soundex_instance = Soundex()

# Get the search text as input parameter
search_text = arcpy.GetParameterAsText(0).lower()

# Set the workspace
arcpy.env.workspace = "C:/GIS Folder/Addressing.gdb"

# Input feature class and table
feature_class = "C:/GIS Folder/Addressing.gdb/Roads"
table = "C:/GIS Folder/Addressing.gdb/Road_Names_Table"

# Create a list to store the comparison results
comparison_results = []

# Create a dictionary to store the counts of similar, sound-alike, and rhyming words
word_counts = {'similar': 0, 'soundalike': 0, 'rhyming': 0}

# Get a list of unique street names from the feature class and sort them, filtering out None values
unique_stnames = sorted(set(row[0].lower() for row in arcpy.da.SearchCursor(feature_class, "FENAME") if row[0]))

# Get a list of unique full street names from the table and sort them, filtering out None values
unique_fullstnames = sorted(set(row[0].lower() for row in arcpy.da.SearchCursor(table, "FULLSTNAME") if row[0]))

# Find perfect homophones for the search text
homophones = Search.perfectHomophones(search_text)

# Compare the search text with stname values
for stname in unique_stnames:
    if soundex_instance.are_rhyming(search_text, stname) or fuzz.partial_ratio(search_text, stname) >= 70 or stname in homophones:
        comparison_results.append(f"'{search_text}' is similar, sound-alike, or rhymes with '{stname}' in stname field")
        if soundex_instance.are_rhyming(search_text, stname):
            word_counts['rhyming'] += 1
        elif fuzz.partial_ratio(search_text, stname) >= 80:
            word_counts['similar'] += 1
        elif stname in homophones:
            word_counts['soundalike'] += 1

# Compare the search text with fullstname values
for fullstname in unique_fullstnames:
    if soundex_instance.are_rhyming(search_text, fullstname) or fuzz.partial_ratio(search_text, fullstname) >= 70 or fullstname in homophones:
        comparison_results.append(f"'{search_text}' is similar, sound-alike, or rhymes with '{fullstname}' in fullstname field")
        if soundex_instance.are_rhyming(search_text, fullstname):
            word_counts['rhyming'] += 1
        elif fuzz.partial_ratio(search_text, fullstname) >= 80:
            word_counts['similar'] += 1
        elif fullstname in homophones:
            word_counts['soundalike'] += 1

# Print the comparison results using arcpy.AddMessage()
for result in comparison_results:
    arcpy.AddMessage(result)

# Print if similar, soundalike, or rhyming words were found
if word_counts['similar'] > 0:
    arcpy.AddMessage(f"Similar words count: {word_counts['similar']}")
if word_counts['soundalike'] > 0:
    arcpy.AddMessage(f"Sound-alike words count: {word_counts['soundalike']}")
if word_counts['rhyming'] > 0:
    arcpy.AddMessage(f"Rhyming words count: {word_counts['rhyming']}")

3 Replies
TonyAlmeida
Occasional Contributor III

I just realized that my question was truncated. I'm having trouble getting the tool to return words that rhyme or sound similar. For instance, if I input "Saddle," the tool should return words like "Battle" and "Cattle," since they are soundalike, rhyming, or similar matches. However, it isn't doing that, even though I know the table contains "Battle" and "Cattle." How can I get results like these returned?
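
One likely reason, offered as an assumption rather than a confirmed diagnosis: a Soundex code keeps the first letter of the word, so "saddle", "battle", and "cattle" get codes that differ in their first character, and `are_rhyming` can never pair them. A rough way to approximate rhyme is to encode only the part of the word from its first vowel onward. The `rhyme_key` and `roughly_rhymes` helpers below are hypothetical, and this is a heuristic, not a true phonetic rhyme test.

```python
# Consonant groups as used in the Soundex class in the script above.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit

VOWELS = set("aeiouy")

def rhyme_key(word):
    """Encode the word from its first vowel onward, ignoring the opening consonants."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    # Find the first vowel; fall back to the start of the word if there is none.
    start = next((i for i, ch in enumerate(word) if ch in VOWELS), 0)
    key = ""
    for ch in word[start:]:
        digit = CODES.get(ch)
        # Skip letters with no code and collapse repeated digits.
        if digit and not key.endswith(digit):
            key += digit
    return key

def roughly_rhymes(word1, word2):
    """True when both words share the same non-empty rhyme key."""
    key1, key2 = rhyme_key(word1), rhyme_key(word2)
    return bool(key1) and key1 == key2
```

Here `roughly_rhymes("saddle", "battle")` and `roughly_rhymes("saddle", "cattle")` are both True, because "d" and "t" share a consonant group, while `roughly_rhymes("saddle", "sand")` is False. The key is coarse, so it will also pair words that do not really rhyme. For multi-word street names it would probably make sense to test each word separately.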

JoshuaBixby
MVP Esteemed Contributor

This falls squarely in the realm of Natural Language Processing (NLP). Do you really want to roll your own rather than use existing libraries?

TonyAlmeida
Occasional Contributor III

I would prefer using existing libraries.
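
For reference, one existing option is the third-party `pronouncing` package, which wraps the CMU Pronouncing Dictionary. The sketch below assumes it is installed in the Python environment the script tool runs in. `rhyme_part` and `rhymes_strictly` are hypothetical helper names, and the import is guarded so the helpers simply report no match when the package is missing.

```python
try:
    import pronouncing  # third-party package, assumed to be installed
except ImportError:
    pronouncing = None

def rhyme_part(word):
    """Return the rhyming part of a word's first listed pronunciation, or None."""
    if pronouncing is None:
        return None
    phones = pronouncing.phones_for_word(word.lower())
    if not phones:
        return None  # word is not in the dictionary
    return pronouncing.rhyming_part(phones[0])

def rhymes_strictly(word1, word2):
    """True only when both words are in the dictionary and share a rhyming part."""
    part1, part2 = rhyme_part(word1), rhyme_part(word2)
    return part1 is not None and part1 == part2
```

A pronunciation dictionary compares exact rhyming parts, so near-rhymes that differ by one consonant may be missed. Street names that are not ordinary English words will not be in the dictionary, so a fallback is still needed for those.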
