insert takes 2 arguments

In Python 3.4 and 2.7 I get the error "insert takes 2 arguments". This PR fixes it, but I had to assume the call was intended to be an append.
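
For context, here is a minimal sketch of the kind of failure described above. It assumes the failing call was `list.insert` invoked with only a value and no index; the exact call site is not quoted in this description, so the variable names below are illustrative only.

```python
stressList = []

# list.insert needs both an index and a value; passing only the value
# raises TypeError on Python 2.7 and on Python 3.x.
try:
    stressList.insert(2)
except TypeError as err:
    errorName = type(err).__name__

# Two ways to repair the call, depending on the intended position:
stressList.append(2)      # add to the end (the assumption made in this PR)
stressList.insert(0, 1)   # add to the front, with an explicit index
```

If the original intent was to put the value at the front of the list, `insert(0, value)` would be the right repair instead of `append`.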
BUGFIX: Dental and Stop special keys don't match multichar sounds like tʃ
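
A small sketch of what this commit title appears to refer to, assuming it concerns the negative lookaheads in the 'D' entry of `replDict` further down in this diff. The `naiveDental` pattern is a hypothetical comparison and is not taken from the repository.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical naive character class: it also matches the 't' that is
# the first half of the affricate 'tʃ'.
naiveDental = re.compile(u"[tdsz]")

# Lookahead version, copied from the 'D' entry of replDict in this diff:
# 't' and 'd' only match when they are not followed by 'ʃ' or 'ʒ'.
strictDental = re.compile(u"(?:t(?!ʃ)|d(?!ʒ)|[sz])")

pron = u"tʃæt"
naiveMatches = naiveDental.findall(pron)
strictMatches = strictDental.findall(pron)
```

With the lookaheads in place, only the final plain 't' in the sample string counts as a dental, while the naive class also picks up the 't' inside 'tʃ'.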
2026-07-05 17:30:28 +08:00 · 2016-10-20 08:27:40 +08:00 · 2016-07-20 16:35:44 +02:00 · 2016-07-18 17:10:22 +02:00 · 2016-07-18 17:08:02 +02:00 · 2016-07-18 17:06:50 +02:00
8 changed files with 593 additions and 169 deletions
@@ -3,6 +3,9 @@
 pysle
 ---------
 .. image:: https://img.shields.io/badge/license-MIT-blue.svg?
   :target: http://opensource.org/licenses/MIT
 Pronounced like 'p' + 'isle'.
 An interface for the ISLEX (international speech lexicon) dictionary, 
@@ -11,26 +14,96 @@ pronunciations (e.g. a list of phones someone said versus a standard or
 canonical dictionary pronunciation). 
 .. sectnum::
 .. contents::
 Common Use Cases
 ================
 What can you do with this library?
 - look up the list of phones and syllables for canonical pronunciations 
  of a word::
    pysle.isletool.LexicalTool.lookup('cat')
 - map an actual pronunciation to a dictionary pronunciation (can be used 
  to automatically find speech errors)::
    pysle.pronunciationtools.findClosestPronunciation(isleDict, 'cat', ['k', 'æ',])
 - automatically syllabify a praat textgrid containing words and phones 
  (e.g. force-aligned text) -- requires my 
  `praatIO <https://github.com/timmahrt/praatIO>`_ library::
    pysle.syllabifyTextgrid(isleDict, praatioTextgrid, "words", "phones")
 - search for words based on pronunciation::
    e.g. Words that start with a sound, or have a sound word medially, or 
    in stressed vowel position, etc.
    see /tests/dictionary_search.py
 Major revisions
 ================
 Ver 1.4 (July 9, 2016)
 - added search functionality
 - ported code to use the new unicode IPA-based isledict
  (the old one was ascii)
 Ver 1.3 (March 15, 2016)
 - added indices for stressed vowels
 Ver 1.2 (June 20, 2015)
 - Python 3.x support
 Ver 1.1 (January 30, 2015)
 - word lookup ~65 times faster
 Ver 1.0 (October 23, 2014)
 - first public release.
 Requirements
 ================
 - Before you use this library (before or after installing it) you will need
-  to download the ILSEX dictionary.  It can be downloaded here:
+  to download the ISLEX dictionary.  It can be downloaded here under the
  section 'English' linked under the text 'English Pronlex'
  (with a file name of ISLEdict.txt):
-  `ISLEX project page <http://www.isle.illinois.edu/sst/data/dict/>`_
+  `ISLEX project page <http://isle.illinois.edu/sst/data/g2ps/>`_
  `Direct link to the ISLEX file used in this project
-  <http://www.isle.illinois.edu/sst/data/dict/islev2.txt)>`_
+  <http://isle.illinois.edu/sst/data/g2ps/English/ISLEdict.txt>`_ (ISLEdict.txt)
 - ``Python 2.7.*`` or above
 - ``Python 3.3.*`` or above
 - The `praatIO <https://github.com/timmahrt/praatIO>`_ library is required IF 
  you want to use the textgrid functionality.  It is not required 
  for normal use.
 Installation
 ================
-From a command-line shell, navigate to the directory this is located in 
+If you are on Windows, you can use the installer found here (check that it is up to date, though)
-and type::
+`Windows installer <http://www.timmahrt.com/python_installers>`_
-	python setup.py install
+Otherwise, to install manually, download the source from GitHub, then from a command-line shell navigate to the directory containing setup.py and type::
    python setup.py install
 If python is not in your path, you'll need to enter the full path e.g.::
@@ -45,7 +118,7 @@ Here is a typical common usage::
    from pysle import isletool
    isleDict = isletool.LexicalTool('C:\islev2.dict')
    print isleDict.lookup('catatonic')[0] # Get the first pronunciation
-    >> [['kh', '@,'], ['t_(', '&'], ['th', "A'"], ['n', 'I', 'kh']] [2]
+    >> [['k', 'ˌæ'], ['t˺', 'ə'], ['t', 'ˈɑ'], ['n', 'ɪ', 'k']] [2, 0]
 and another::
@@ -53,7 +126,7 @@ and another::
    from pysle import pronunciationtools
    searchWord = 'another'
-    anotherPhoneList = ['n', '@', 'th', 'r'] # Actually produced
+    anotherPhoneList = ['n', '@', 'th', 'r'] # Actually produced (ASCII or IPA ok here)
    returnList = pronunciationtools.findBestSyllabification(isleDict, 
                                                            searchWord, 
@@ -61,7 +134,27 @@ and another::
    print syllableList
    >> [["''"], ['n', '@'], ['th', 'r']]
 stressedSyllable, syllableList, syllabification, stressedIndex = returnList
-Please see \test for example usage
+Please see \\examples for example usage
 Citing pysle
 ===============
 Pysle is general-purpose code and doesn't need to be cited
 (you should cite the
 `ISLEX project <http://isle.illinois.edu/sst/data/g2ps/>`_
 instead) but if you would like to, it can be cited like so:
 Tim Mahrt. Pysle. https://github.com/timmahrt/pysle, 2016.
 Acknowledgements
 ================
 Development of Pysle was possible thanks to NSF grant **IIS 07-03624**
 to Jennifer Cole and Mark Hasegawa-Johnson, NSF grant **BCS 12-51343**
 to Jennifer Cole, José Hualde, and Caroline Smith, and
 to the A*MIDEX project (n° **ANR-11-IDEX-0001-02**) to James Sneed German
 funded by the Investissements d'Avenir French Government program, managed
 by the French National Research Agency (ANR).
@@ -1,84 +1,321 @@
 #encoding: utf-8
 '''
 Created on Oct 11, 2012
@author: timmahrt
 '''
 import io
 import re
-vowelList = ['a', '@', 'e', 'i', 'o', 'u', '^', '&', '>',]
+
 charList = [u'#', u'.', u'aʊ', u'b', u'd', u'dʒ', u'ei', u'f', u'g',
            u'h', u'i', u'j', u'k', u'l', u'm', u'n', u'oʊ', u'p',
            u'r', u's', u't', u'tʃ', u'u', u'v', u'w', u'z', u'æ',
            u'ð', u'ŋ', u'ɑ', u'ɑɪ', u'ɔ', u'ɔi', u'ə', u'ɚ', u'ɛ', u'ɝ',
            u'ɪ', u'ɵ', u'ɹ', u'ʃ', u'ʊ', u'ʒ', u'æ', u'ʌ', ]
 diacriticList = [u'˺', u'ˌ', u'̩', u'̃', ]
 vowelList = [u'aʊ', u'ei', u'i', u'oʊ', u'u', u'æ',
             u'ɑ', u'ɑɪ', u'ɔ', u'ɔi', u'ə', u'ɚ', u'ɛ', u'ɝ',
             u'ɪ', u'ʊ', u'ʌ', ]
 def isVowel(char):
    return any([vowel in char for vowel in vowelList])
 def sequenceMatch(matchChar, searchStr):
    return matchChar in searchStr
 class WordNotInISLE(Exception):
    def __init__(self, word):
        super(WordNotInISLE, self).__init__()
        self.word = word
    def __str__(self):
-        return "Word '%s' not in ISLE dictionary.  Please add it to continue." % self.word
+        return ("Word '%s' not in ISLE dictionary.  "
-
+                "Please add it to continue." % self.word)
 class LexicalTool():
    def __init__(self, islePath):
        self.islePath = islePath
-        self.data = None
+        self.data = self._buildDict()
        self.pronDict = None
    def _buildDict(self):
        '''
        Builds the isle textfile into a dictionary for fast searching
        '''
        lexDict = {}
        with io.open(self.islePath, "r", encoding='utf-8') as fd:
            wordList = [line.rstrip('\n') for line in fd]
        for row in wordList:
            word, pronunciation = row.split(" ", 1)
            word = word.split("(")[0]
            lexDict.setdefault(word, [])
            lexDict[word].append(pronunciation)
        return lexDict
    def lookup(self, word):
        '''
        Lookup a word and receive a list of syllables and stressInfo
        '''
        # All words must be lowercase with no extraneous whitespace
        word = word.lower()
        word = word.strip()
-        # Find indicies in the dictionary
+        pronList = self.data.get(word, None)
-        if self.data == None:
+        if pronList is None:
-            self.data = open(self.islePath, "r").read()
+            raise WordNotInISLE(word)
        else:
            pronList = [_parsePronunciation(pronunciationStr)
                        for pronunciationStr in pronList]
-        wordList = []
+        return pronList
-        searchIndex = 0
+
    def search(self, matchStr, numSyllables=None, wordInitial='ok',
               wordFinal='ok', spanSyllable='ok', stressedSyllable='ok',
               multiword='ok'):
        return search(self.data.items(), matchStr, numSyllables=numSyllables,
                      wordInitial=wordInitial, wordFinal=wordFinal,
                      spanSyllable=spanSyllable,
                      stressedSyllable=stressedSyllable,
                      multiword=multiword)
 def _prepRESearchStr(matchStr, wordInitial='ok', wordFinal='ok',
                     spanSyllable='ok', stressedSyllable='ok'):
    '''
    Prepares a user's RE string for a search
    '''
    # Protect sounds that are two characters
    # After this we can assume that each character represents a sound
    # (We'll revert back when we're done processing the RE)
    replList = [(u'ei', u'9'), (u'tʃ', u'='), (u'oʊ', u'~'),
                (u'dʒ', u'@'), (u'aʊ', u'%'), (u'ɑɪ', u'&'),
                (u'ɔi', u'$')]
    # Add to the replList
    currentReplNum = 0
    startI = 0
    for left, right in (('(', ')'), ('[', ']')):
        while True:
            # (The +1 skips over the "\n" which marks the start of every word)
            startIndex = self.data.find("\n"+word + "(", searchIndex) + 1
            # find() returns -1 if it does not find anything, but
            #    note that we added 1 to the return value
            try:
-                assert(startIndex != 0)
+                i = matchStr.index(left, startI)
-            except AssertionError:
+            except ValueError:
-                if searchIndex == 0:
+                break
-                    raise WordNotInISLE(word)
+            j = matchStr.index(right, i) + 1
-                else:
+            replList.append((matchStr[i:j], str(currentReplNum)))
-                    break
+            currentReplNum += 1
            startI = j
-            endIndex = self.data.find("\n", startIndex)
+    for charA, charB in replList:
        matchStr = matchStr.replace(charA, charB)
-            searchIndex = endIndex
+    # Characters to check between all other characters
-            wordList.append((startIndex, endIndex))
+    # Don't check between all other characters if the character is already
    # in the search string or
    interleaveStr = None
    stressOpt = (stressedSyllable == 'ok' or stressedSyllable == 'only')
    spanOpt = (spanSyllable == 'ok' or spanSyllable == 'only')
    if stressOpt and spanOpt:
        interleaveStr = u"\.?ˈ?"
    elif stressOpt:
        interleaveStr = u"ˈ?"
    elif spanOpt:
        interleaveStr = u"\.?"
-        returnList = []
+    if interleaveStr is not None:
-        for startIndex, endIndex in wordList:
+        matchStr = interleaveStr.join(matchStr)
            isleWord = self.data[startIndex:endIndex]
            syllableTxt = isleWord.split("#")[1].strip()
            syllableList = [x for x in syllableTxt.split(' . ')]
-            # Find stress
+    # Setting search boundaries
-            stressList = []
+    # We search on '[^\.#]' and not '.' so that the search doesn't span
-            for i, syllable in enumerate(syllableList):
+    # multiple syllables or words
-                # Primary stress
+    if wordInitial == 'only':
-                if "'" in syllable:
+        matchStr = u'#' + matchStr
-                    stressList.insert(0, i)
+    elif wordInitial == 'no':
-                # Secondary stress
+        # Match the closest preceding syllable.  If there is none, look
-                elif '"' in syllable:
+        # for word boundary plus at least one other character
-                    stressList.append(i)
+        matchStr = u'(?:\.[^\.#]*?|#[^\.#]+?)' + matchStr
    else:
        matchStr = u'[#\.][^\.#]*?' + matchStr
-            syllableList = [x.split(" ") for x in syllableList]
+    if wordFinal == 'only':
-            returnList.append((syllableList, stressList))
+        matchStr = matchStr + u'#'
    elif wordFinal == 'no':
        matchStr = matchStr + u"(?:[^\.#]*?\.|[^\.#]+?#)"
    else:
        matchStr = matchStr + u'[^\.#]*?[#\.]'
-        return returnList
+    # For sounds that are designated two characters, prevent
    # detecting those sounds if the user wanted a sound
    # designated by one of the contained characters
    # Forward search ('a' and not 'ab')
    insertList = []
    for charA, charB in [(u'e', u'i'), (u't', u'ʃ'), (u'd', u'ʒ'),
                         (u'o', u'ʊ'), (u'a', u'ʊ|ɪ'), (u'ɔ', u'i'), ]:
        startI = 0
        while True:
            try:
                i = matchStr.index(charA, startI)
            except ValueError:
                break
            if matchStr[i + 1] != charB:
                forwardStr = u'(?!%s)' % charB
 #                 matchStr = matchStr[:i + 1] + forwardStr + matchStr[i + 1:]
                startI = i + 1 + len(forwardStr)
                insertList.append((i + 1, forwardStr))
    # Backward search ('b' and not 'ab')
    for charA, charB in [(u't', u'ʃ'), (u'd', u'ʒ'),
                         (u'a|o', u'ʊ'), (u'e|ɔ', u'i'), (u'ɑ', u'ɪ'), ]:
        startI = 0
        while True:
            try:
                i = matchStr.index(charB, startI)
            except ValueError:
                break
            if matchStr[i - 1] != charA:
                backStr = u'(?<!%s)' % charA
 #                 matchStr = matchStr[:i] + backStr + matchStr[i:]
                startI = i + 1 + len(backStr)
                insertList.append((i, backStr))
    insertList.sort()
    for i, insertStr in insertList[::-1]:
        matchStr = matchStr[:i] + insertStr + matchStr[i:]
    # Revert the special sounds back from 1 character to 2 characters
    for charA, charB in replList:
        matchStr = matchStr.replace(charB, charA)
    # Replace special characters
    replDict = {"D": u"(?:t(?!ʃ)|d(?!ʒ)|[sz])",  # dentals
                "F": u"[ʃʒfvszɵðh]",  # fricatives
                "S": u"(?:t(?!ʃ)|d(?!ʒ)|[pbkg])",  # stops
                "N": u"[nmŋ]",  # nasals
                "R": u"[rɝɚ]",  # rhotics
                "V": u"(?:aʊ|ei|oʊ|ɑɪ|ɔi|[iuæɑɔəɛɪʊʌ]):?",  # vowels
                "B": u"\.",  # syllable boundary
                }
    for char, replStr in replDict.items():
        matchStr = matchStr.replace(char, replStr)
    return matchStr
 def search(searchList, matchStr, numSyllables=None, wordInitial='ok',
           wordFinal='ok', spanSyllable='ok', stressedSyllable='ok',
           multiword='ok'):
    '''
    Searches for matching words in the dictionary with regular expressions
    wordInitial, wordFinal, spanSyllable, stressedSyllable, and multiword
    can take three different values: 'ok', 'only', or 'no'.
    Special search characters:
    'D' - any dental; 'F' - any fricative; 'S' - any stop
    'V' - any vowel; 'N' - any nasal; 'R' - any rhotic
    '#' - word boundary
    'B' - syllable boundary
    '.' - anything
    For advanced queries:
    Regular expression syntax applies, so if you wanted to search for any
    word ending with a vowel or rhotic, matchStr = '(?:V|R)#', etc.
    '''
    # Run search for words
    matchStr = _prepRESearchStr(matchStr, wordInitial, wordFinal,
                                spanSyllable, stressedSyllable)
    compiledRE = re.compile(matchStr)
    retList = []
    for word, pronList in searchList:
        newPronList = []
        for pron in pronList:
            searchPron = pron.replace(",", "").replace(" ", "")
            # Ignore diacritics for now:
            for diacritic in diacriticList:
                if diacritic not in matchStr:
                    searchPron = searchPron.replace(diacritic, "")
            if numSyllables is not None:
                if numSyllables != searchPron.count('.') + 1:
                    continue
            # Is this a compound word?
            if multiword == 'only':
                if searchPron.count('#') == 2:
                    continue
            elif multiword == 'no':
                if searchPron.count('#') > 2:
                    continue
            matchList = compiledRE.findall(searchPron)
            if len(matchList) > 0:
                if stressedSyllable == 'only':
                    if all([u"ˈ" not in match for match in matchList]):
                        continue
                if stressedSyllable == 'no':
                    if all([u"ˈ" in match for match in matchList]):
                        continue
                # For syllable spanning, we check if there is a syllable
                # marker inside (not at the border) of the match.
                if spanSyllable == 'only':
                    if all(["." not in txt[1:-1] for txt in matchList]):
                        continue
                if spanSyllable == 'no':
                    if all(["." in txt[1:-1] for txt in matchList]):
                        continue
                newPronList.append(pron)
        if len(newPronList) > 0:
            retList.append((word, newPronList))
    retList.sort()
    return retList
 def _parsePronunciation(pronunciationStr):
    '''
    Parses the pronunciation string
    Returns the list of syllables and a list of primary and
    secondary stress locations
    '''
    syllableTxt = pronunciationStr.split("#")[1].strip()
    syllableList = [x.split() for x in syllableTxt.split(' . ')]
    # Find stress
    stressedSyllableList = []
    stressedPhoneList = []
    for i, syllable in enumerate(syllableList):
        for j, phone in enumerate(syllable):
            if u"ˈ" in phone:
                stressedSyllableList.insert(0, i)
                stressedPhoneList.insert(0, j)
                break
            elif u'ˌ' in phone:
                stressedSyllableList.append(i)
                stressedPhoneList.append(j)
    return syllableList, stressedSyllableList, stressedPhoneList
 def getNumPhones(isleDict, label, maxFlag):
@@ -94,23 +331,27 @@ def getNumPhones(isleDict, label, maxFlag):
        phoneListOfLists = isleDict.lookup(word)
        syllableCountList = []
-        for syllableList, stressIndex in phoneListOfLists:
+        for row in phoneListOfLists:
            syllableList = row[0]
            syllableCountList.append(len(syllableList))
        # In ISLE, there can be multiple pronunciations for each word
        # as we have no reason to believe one pronunciation is more
        # likely than another, we take the average of all of them
        phoneCountList = []
-        for syllableList, stressIndex in phoneListOfLists:
+        for row in phoneListOfLists:
-            phoneCountList.append(len([phon for phoneList in syllableList for phon in phoneList]))
+            syllableList = row[0]
            phoneCountList.append(len([phon for phoneList in syllableList for
                                       phon in phoneList]))
        # The average number of phones for all possible pronunciations
        #    of this word
-        if maxFlag == True:
+        if maxFlag is True:
            syllableCount += max(syllableCountList)
            phoneCount += max(phoneCountList)
        else:
-            syllableCount += sum(syllableCountList) / float(len(syllableCountList))
+            syllableCount += (sum(syllableCountList) /
                              float(len(syllableCountList)))
            phoneCount += sum(phoneCountList) / float(len(phoneCountList))
    return syllableCount, phoneCount
@@ -131,6 +372,3 @@ def findOODWords(isleDict, wordList):
    oodList.sort()
    return oodList
@@ -1,16 +1,18 @@
 #encoding: utf-8
 '''
 Created on Oct 22, 2014
@author: tmahrt
 '''
 class OptionalFeatureError(ImportError):
    def __str__(self):
        return "ERROR: You must have praatio installed to use pysle.praatTools"
 try:
-    import praatio
+    from praatio import tgio
 except ImportError:
    raise OptionalFeatureError()
@@ -34,11 +36,12 @@ def syllabifyTextgrid(isleDict, tg, wordTierName, phoneTierName,
    wordTier = tg.tierDict[wordTierName]
    phoneTier = tg.tierDict[phoneTierName]
-    if skipLabelList == None:
+    if skipLabelList is None:
        skipLabelList = []
    syllableEntryList = []
-    tonicEntryList = []
+    tonicSEntryList = []
    tonicPEntryList = []
    for start, stop, word in wordTier.entryList:
        if word in skipLabelList:
@@ -46,28 +49,43 @@ def syllabifyTextgrid(isleDict, tg, wordTierName, phoneTierName,
        subPhoneTier = phoneTier.crop(start, stop, True, False)[0]
-        phoneList = [phone for startP, endP, phone in subPhoneTier.entryList if phone != '']
+        # entry = (start, stop, phone)
        phoneList = [entry[2] for entry in subPhoneTier.entryList
                     if entry[2] != '']
        try:
            returnList = pronunciationtools.findBestSyllabification(isleDict,
                                                                    word,
                                                                    phoneList)
        except isletool.WordNotInISLE:
-            print "Word ('%s') not is isle -- skipping syllabification" % word
+            print("Word ('%s') not in isle -- skipping syllabification" % word)
            continue
        except (pronunciationtools.NullPronunciationError):
-            print "Word ('%s') has no provided pronunciation" % word
+            print("Word ('%s') has no provided pronunciation" % word)
            continue
-        stressedSyllable, syllableList, syllabification, stressIndexList = returnList
+        syllableList = returnList[1]
        stressedSyllableIndexList = returnList[3]
        stressedPhoneIndexList = returnList[4]
        flattenedPhoneIndexList = returnList[5]
        try:
            stressI = stressedSyllableIndexList[0]
            stressJ = stressedPhoneIndexList[0]
        except IndexError:
            stressI = None  # Function word probably
            stressJ = None  #
        if stressI is not None:
            syllableList[stressI][stressJ] += u"ˈ"
        i = 0
-#         print syllableList
+#         print(syllableList)
        for k, syllable in enumerate(syllableList):
            # Create the syllable tier entry
            j = len(syllable)
-            stubEntryList = subPhoneTier.entryList[i:i+j]
+            stubEntryList = subPhoneTier.entryList[i:i + j]
            i += j
            # The whole syllable was deleted
@@ -76,29 +94,32 @@ def syllabifyTextgrid(isleDict, tg, wordTierName, phoneTierName,
            syllableStart = stubEntryList[0][0]
            syllableEnd = stubEntryList[-1][1]
-            label = "-".join([phone for start, end, phone in stubEntryList])
+            label = "-".join([entry[2] for entry in stubEntryList])
-            syllableEntryList.append( (syllableStart, syllableEnd, label) )
+            syllableEntryList.append((syllableStart, syllableEnd, label))
-            # Create the tonic tier entry
+            # Create the tonic syllable tier entry
-            try:
+            if k == stressI:
-                stressIndex = stressIndexList[0]
+                tonicSEntryList.append((syllableStart, syllableEnd, 'T'))
            except IndexError:
                stressIndex = None # Function word probably
-            tonicLabel = ''
+            # Create the tonic phone tier entry
-            if k == stressIndex:
+            if k == stressI:
-                tonicLabel = 'T'
+                syllablePhoneTier = phoneTier.crop(syllableStart, syllableEnd,
                                                   True, False)[0]
-            tonicEntryList.append( (syllableStart, syllableEnd, tonicLabel) )
+                phoneList = [entry for entry in syllablePhoneTier.entryList
                             if entry[2] != '']
                phoneStart, phoneEnd = phoneList[stressJ][:2]
                tonicPEntryList.append((phoneStart, phoneEnd, 'T'))
    # Create a textgrid with the two syllable-level tiers
-    syllableTier = praatio.TextgridTier("syllable", syllableEntryList, praatio.INTERVAL_TIER)
+    syllableTier = tgio.IntervalTier("syllable", syllableEntryList)
-    tonicTier = praatio.TextgridTier('tonic', tonicEntryList, praatio.INTERVAL_TIER)
+    tonicSTier = tgio.IntervalTier('tonicSyllable', tonicSEntryList)
    tonicPTier = tgio.IntervalTier('tonicVowel', tonicPEntryList)
-    syllableTG = praatio.Textgrid()
+    syllableTG = tgio.Textgrid()
    syllableTG.addTier(syllableTier)
-    syllableTG.addTier(tonicTier)
+    syllableTG.addTier(tonicSTier)
    syllableTG.addTier(tonicPTier)
    return syllableTG
@@ -1,3 +1,4 @@
 #encoding: utf-8
 '''
 Created on Oct 15, 2014
@@ -9,10 +10,10 @@ import itertools
 from pysle import isletool
 class NullPronunciationError(Exception):
    def __init__(self, word):
        super(NullPronunciationError, self).__init__()
        self.word = word
    def __str__(self):
@@ -49,7 +50,7 @@ def _lcs(xs, ys):
        ll_b = _lcs_lens(xb, ys)
        ll_e = _lcs_lens(xe[::-1], ys[::-1])
        _, k = max((ll_b[j] + ll_e[ny - j], j)
-                    for j in range(ny + 1))
+                   for j in range(ny + 1))
        yb, ye = ys[:k], ys[k:]
        return _lcs(xb, yb) + _lcs(xe, ye)
@@ -58,14 +59,13 @@ def _prepPronunciation(phoneList):
    retList = []
    for phone in phoneList:
        if 'r' in phone:
-            phone = ['r',]
+            phone = ['r', ]
        try:
-            phone = phone[0] # Only represent the str by its first letter
+            phone = phone[0]  # Only represent the string by its first letter
            phone = phone.lower()
        except IndexError:
            raise NullPhoneError()
        phone = phone.lower()
        if phone in isletool.vowelList:
            phone = 'V'
        retList.append(phone)
@@ -85,14 +85,14 @@ def _adjustSyllabification(adjustedPhoneList, syllableList):
    retSyllableList = []
    for syllable in syllableList:
        j = len(syllable)
-        tmpPhoneList = adjustedPhoneList[i:i+j]
+        tmpPhoneList = adjustedPhoneList[i:i + j]
        numBlanks = -1
        phoneList = tmpPhoneList[:]
        while numBlanks != 0:
            numBlanks = tmpPhoneList.count("''")
            if numBlanks > 0:
-                tmpPhoneList = adjustedPhoneList[i+j:i+j+numBlanks]
+                tmpPhoneList = adjustedPhoneList[i + j:i + j + numBlanks]
                phoneList.extend(tmpPhoneList)
                j += numBlanks
@@ -116,27 +116,32 @@ def _findBestPronunciation(isleDict, wordText, aPron):
    isleWordList = isleDict.lookup(wordText)
-    aP = _prepPronunciation(aPron) # Mapping to simplified phone inventory
+    aP = _prepPronunciation(aPron)  # Mapping to simplified phone inventory
-    origPronDict = dict((newPron,oldPron) for newPron, oldPron in zip(aP, aPron))
+    origPronDict = dict((newPron, oldPron)
                        for newPron, oldPron in zip(aP, aPron))
    numDiffList = []
    withStress = []
    i = 0
    alignedSyllabificationList = []
    alignedActualPronunciationList = []
-    for syllableList, stressList in isleWordList:
+    for wordTuple in isleWordList:
        syllableList = wordTuple[0]  # syllableList, stressList
        iP = [phone for phoneList in syllableList for phone in phoneList]
        iP = _prepPronunciation(iP)
        alignedIP, alignedAP = alignPronunciations(iP, aP)
-        alignedAP = [origPronDict.get(phon, "''") for phon in alignedAP] # Remapping to actual phones
+        
        # Remapping to actual phones
        alignedAP = [origPronDict.get(phon, "''") for phon in alignedAP]
        alignedActualPronunciationList.append(alignedAP)
        # Adjusting the syllabification for differences between the dictionary
        # pronunciation and the actual pronunciation
-        alignedSyllabification = _adjustSyllabification(alignedIP, syllableList)
+        alignedSyllabification = _adjustSyllabification(alignedIP,
                                                        syllableList)
        alignedSyllabificationList.append(alignedSyllabification)
        # Count the number of misalignments between the two
@@ -147,7 +152,7 @@ def _findBestPronunciation(isleDict, wordText, aPron):
        hasStress = False
        for syllable in syllableList:
            for phone in syllable:
-                hasStress = "'" in phone or hasStress 
+                hasStress = u"ˈ" in phone or hasStress
        if hasStress:
            withStress.append(i)
@@ -164,7 +169,7 @@ def _findBestPronunciation(isleDict, wordText, aPron):
    for i, numDiff in enumerate(numDiffList):
        if numDiff != minDiff:
            continue
-        if bestIndex == None:
+        if bestIndex is None:
            bestIndex = i
            bestIsStressed = i in withStress
        else:
@@ -172,8 +177,8 @@ def _findBestPronunciation(isleDict, wordText, aPron):
                bestIndex = i
                bestIsStressed = True
-    
+    return (isleWordList, alignedActualPronunciationList,
-    return isleWordList, alignedActualPronunciationList, alignedSyllabificationList, bestIndex
+            alignedSyllabificationList, bestIndex)
 def _syllabifyPhones(phoneList, syllableList, isleStressList):
@@ -193,9 +198,9 @@ def _syllabifyPhones(phoneList, syllableList, isleStressList):
    start = 0
    syllabifiedList = []
-    for i, end in enumerate(numPhoneList):
+    for end in numPhoneList:
-        syllable = phoneList[start:start+end]
+        syllable = phoneList[start:start + end]
        syllabifiedList.append(syllable)
        start += end
@@ -212,21 +217,6 @@ def alignPronunciations(pronI, pronA):
    pronI = [char for char in pronI]
    pronA = [char for char in pronA]
    # -- allow for some flexibility in pronunciation
    correctionsTuple = (('d', 't'), ('t', 'd'), ('s', 'z'), ('z', 's'),
                        ('m', 'n'), ('n', 'm'),)
    doMatch = lambda i, a: ((i == a) or 
                            ((i, a) in correctionsTuple))
    def matchExists(targetPhone, pron):
        match = False
        for phone in pron:
            match = match or doMatch(targetPhone, phone)
        return match
    # Remove vowels
    # Remove any elements not in the other list (but maintain order)
    pronITmp = pronI
    pronATmp = pronA
@@ -254,17 +244,19 @@ def alignPronunciations(pronI, pronA):
    # Fill in any blanks such that the sequential items have the same
    # index and the two strings are the same length
-    for x in xrange(len(sequenceIndexListA)):
+    for x in range(len(sequenceIndexListA)):
        indexA = sequenceIndexListA[x]
        indexI = sequenceIndexListI[x]
-        if indexA < indexI :
+        if indexA < indexI:
-            for x in xrange(indexI - indexA):
+            for x in range(indexI - indexA):
                pronA.insert(indexA, "''")
-            sequenceIndexListA = [val + indexI - indexA for val in sequenceIndexListA]
+            sequenceIndexListA = [val + indexI - indexA
                                  for val in sequenceIndexListA]
        elif indexA > indexI:
-            for x in xrange(indexA - indexI):
+            for x in range(indexA - indexI):
                pronI.insert(indexI, "''")
-            sequenceIndexListI = [val + indexA - indexI for val in sequenceIndexListI]
+            sequenceIndexListI = [val + indexA - indexI
                                  for val in sequenceIndexListI]
    return pronI, pronA
@@ -277,19 +269,32 @@ def findBestSyllabification(isleDict, wordText, actualPronunciationList):
    the syllabification for that pronunciation and map it onto the
    input pronunciation.
    '''
-    retList = _findBestPronunciation(isleDict, wordText, actualPronunciationList)
+    retList = _findBestPronunciation(isleDict, wordText,
                                     actualPronunciationList)
    isleWordList, alignedAPronList, alignedSyllableList, bestIndex = retList
    alignedPhoneList = alignedAPronList[bestIndex]
    alignedSyllables = alignedSyllableList[bestIndex]
    syllabification = isleWordList[bestIndex][0]
-    stressedIndex = isleWordList[bestIndex][1]
+    stressedSyllableIndexList = isleWordList[bestIndex][1]
    stressedPhoneIndexList = isleWordList[bestIndex][2]
    stressedSyllable, syllableList = _syllabifyPhones(alignedPhoneList,
                                                      alignedSyllables,
-                                                      stressedIndex)
+                                                      stressedSyllableIndexList)
-    return stressedSyllable, syllableList, syllabification, stressedIndex
+    # Count the index of the stressed phones, if the stress list has
    # become flattened (no syllable information)
    flattenedStressIndexList = []
    for i, j in zip(stressedSyllableIndexList, stressedPhoneIndexList):
        k = j
        for l in range(i):
            k += len(syllableList[l])
        flattenedStressIndexList.append(k)
    return (stressedSyllable, syllableList, syllabification,
            stressedSyllableIndexList, stressedPhoneIndexList,
            flattenedStressIndexList)
 def findClosestPronunciation(isleDict, wordText, aPron):
@@ -298,9 +303,7 @@ def findClosestPronunciation(isleDict, wordText, aPron):
    '''
    retList = _findBestPronunciation(isleDict, wordText, aPron)
-    isleWordList, actualPronunciationList, bestIndex = retList
+    isleWordList = retList[0]
    bestIndex = retList[3]
    return isleWordList[bestIndex]
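
The flattening step added to findBestSyllabification above turns each (stressed syllable index, phone index within that syllable) pair into a single index over the whole word. A minimal standalone sketch of that arithmetic follows, assuming each syllable is a list of phones. The function name `flatten_stress_indices` and the sample syllables are hypothetical; they are not part of pysle or taken from the ISLE dictionary.

```python
def flatten_stress_indices(syllable_list, stressed_syllable_indices,
                           stressed_phone_indices):
    """Map (syllable index, phone-in-syllable index) pairs to word-level indices."""
    flattened = []
    for syl_idx, phone_idx in zip(stressed_syllable_indices,
                                  stressed_phone_indices):
        # Count every phone in the syllables that precede the stressed one
        offset = sum(len(syllable) for syllable in syllable_list[:syl_idx])
        flattened.append(offset + phone_idx)
    return flattened


# Made-up syllabification, for illustration only
syllables = [["k", "æ"], ["t", "ə"], ["t", "ɑ"], ["n", "ɪ", "k"]]

# Stress on the third syllable (index 2), second phone in it (index 1)
print(flatten_stress_indices(syllables, [2], [1]))  # → [5]
```

Summing the lengths of the preceding syllables gives the same result as the nested loop in the diff, without reusing a loop variable.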
@@ -1,16 +1,19 @@
 #!/usr/bin/env python
 # encoding: utf-8
 '''
 Created on Oct 15, 2014
@author: tmahrt
 '''
 import codecs
 from distutils.core import setup
 setup(name='pysle',
-      version='1.0.0',
+      version='1.4.0',
      author='Tim Mahrt',
      author_email='timmahrt@gmail.com',
      package_dir={'pysle':'pysle'},
      packages=['pysle'],
      license='LICENSE',
-      long_description=open('README.rst', 'r').read(),
+      long_description=codecs.open('README.rst', 'r', encoding="utf-8").read(),
 #       install_requires=[], # No requirements! # requires 'from setuptools import setup'
      )
@@ -1,3 +1,4 @@
 #encoding: utf-8
 '''
 Created on Oct 22, 2014
@@ -12,21 +13,23 @@ from pysle import pronunciationtools
 # In this first example we look up the syllabification of a word and get its
 # stress information.
-searchWord = 'pumpkins'
+searchWord = 'catatonic'
-isleDict = isletool.LexicalTool('islev2.txt')
+isleDict = isletool.LexicalTool('ISLEdict.txt')
 lookupResults = isleDict.lookup(searchWord)
 firstEntry = lookupResults[0]
 firstSyllableList = firstEntry[0] 
 firstSyllableList = ".".join([u" ".join(syllable) for syllable in firstSyllableList])
 firstStressList = firstEntry[1]
-print searchWord
+print(searchWord)
-print firstSyllableList, firstStressList # 3rd syllable carries stress
+print(firstSyllableList)
 print(firstStressList) # 3rd syllable carries stress
 # Here we determine the syllabification of a word, as it was said.
 # (Of course, this is just a guess)
-print '-'*50
+print('-'*50)
 searchWord = 'another'
 anotherPhoneList = ['n', '@', 'th', 'r']
@@ -35,10 +38,14 @@ returnList = pronunciationtools.findBestSyllabification(isleDict,
                                                        searchWord, 
                                                        anotherPhoneList)
-stressedSyllable, syllableList, syllabification, stressedIndex = returnList
+(stressedSyllable, syllableList, syllabification,
-
+stressedSyllableIndexList, stressedPhoneIndexList,
-print searchWord
+flattenedStressIndexList) = returnList
-print anotherPhoneList
+print(searchWord)
-print syllableList # We can see the first syllable was elided
+print(anotherPhoneList)
-
+print(stressedSyllableIndexList) # We can see the first syllable was elided
 print(stressedPhoneIndexList)
 print(flattenedStressIndexList)
 print(syllableList)
 print(syllabification)
@@ -0,0 +1,55 @@
 #encoding: utf-8
 '''
 Created on July 08, 2016
@author: tmahrt
 Basic examples of common usage.
 '''
 import random
 from pysle import isletool
 tmpPath = r"C:\Users\Tim\Dropbox\workspace\pysle\test\ISLEdict.txt"
 isleDict = isletool.LexicalTool(tmpPath)
 def printOutMatches(matchStr, numSyllables=None, wordInitial='ok',
                    wordFinal='ok', spanSyllable='ok', stressedSyllable='ok',
                    multiword='ok', numMatches=None, matchList=None):
    if matchList is None:
        matchList = isleDict.search(matchStr, numSyllables, wordInitial,
                                    wordFinal, spanSyllable, stressedSyllable,
                                    multiword)
    else:
        matchList = isletool.search(matchList, matchStr, numSyllables, wordInitial,
                                    wordFinal, spanSyllable, stressedSyllable,
                                    multiword)
    if numMatches is not None and len(matchList) > numMatches:
        random.shuffle(matchList)
    for i, matchTuple in enumerate(matchList):
        if numMatches is not None and i >= numMatches:
            break
        word, pronList = matchTuple
        print("%s: %s" % (word, ",".join(pronList)))
    print("")
    return matchList
 # 2-syllable words with a stressed syllable containing 'dV' but not word initially
 printOutMatches("dV", stressedSyllable="only", spanSyllable="no",
                wordInitial="no", numSyllables=2, numMatches=10)
 # 3-syllable word with an 'ld' sequence that spans a syllable boundary
 printOutMatches("lBd", wordInitial="no", multiword='no',
                numSyllables=3, numMatches=10)
 # words ending in 'inth'
 matchList = printOutMatches(u"ɪnɵ", wordFinal="only", numMatches=10)
 # that also start with 's'
 matchList = printOutMatches("s", wordInitial="only", numMatches=10,
                            matchList=matchList, multiword="no")
@@ -12,21 +12,25 @@ This snippet shows you how to use this function.
 from os.path import join
-import praatio
+from praatio import tgio
 from pysle import isletool
 from pysle import praattools
 path = join('.', 'files')
 path = "/Users/tmahrt/Dropbox/workspace/pysle/test/files"
-tg = praatio.openTextGrid(join(path, "pumpkins.TextGrid"))
+tg = tgio.openTextGrid(join(path, "pumpkins.TextGrid"))
-isleDict = isletool.LexicalTool('/Users/tmahrt/Dropbox/workspace/pysle/test/islev2.txt') # Needs the full path to the file
+
 # Needs the full path to the file
 islevPath = '/Users/tmahrt/Dropbox/workspace/pysle/test/islev2.txt'
 isleDict = isletool.LexicalTool(islevPath)
 # Get the syllabification tiers and add it to the textgrid
 syllableTG = praattools.syllabifyTextgrid(isleDict, tg, "word", "phone",
                                          skipLabelList=["",])
 tg.addTier(syllableTG.tierDict["syllable"])
-tg.addTier(syllableTG.tierDict["tonic"])
+tg.addTier(syllableTG.tierDict["tonicSyllable"])
 tg.addTier(syllableTG.tierDict["tonicVowel"])
Author	SHA1	Message	Date
M Clark	9e212125b1	insert takes 2 arguments In python 3.4 and 2.7 I get an error "insert takes 2 arguments". This PR fixes it, but I had to assume it was intended to be an append.	2016-10-20 08:27:40 +08:00
Tim Mahrt	1b1903bc0b	BUGFIX: Dental and Stop special keys don't match multichar sounds like tʃ So 't' will be matched but not 'tʃ'	2016-07-20 16:35:44 +02:00
Tim Mahrt	5e64deebe6	BUGFIX: Removed diacritics from strings while searching Unless the user is explicitly searching for the diacritic. Also, added some more documentation.	2016-07-18 17:10:22 +02:00
Tim Mahrt	bce3c8ff23	BUGFIX: Protect () and [] in searches	2016-07-18 17:08:02 +02:00
Tim Mahrt	4056b105c9	BUGFIX: Monophthongs searches no longer match dipthongs This was in the code but the functionality didn't work.	2016-07-18 17:06:50 +02:00
Tim Mahrt	d88ff7d8d9	FEATURE: Updated to new isledict format. Now using unicode IPA It made the code a little more complex and now the system is less typing friendly but is more intuitive (no more guessing how to pronounce a character). Update includes changes to documentation.	2016-07-16 00:49:45 +02:00
Tim Mahrt	4cc4bf85ec	DOCUMENTATION: Formatting fix	2016-07-09 17:26:47 +02:00
Tim Mahrt	4c1a26ed03	DOCUMENTATION: Ready for release v1.4	2016-07-09 17:23:44 +02:00
Tim Mahrt	ac8643678b	REFACTOR: Names follow pep008	2016-07-09 17:23:18 +02:00
Tim Mahrt	b76454f626	FEATURE: Added searching of words by pronunciation Based on regular expressions with some keyword parameters to simplify search queries. A set of examples is provided.	2016-07-09 17:22:12 +02:00
timmahrt	ea0bc5c5cd	BUGFIX: Unicode error in installation file python 2.x	2016-07-08 00:19:35 +02:00
timmahrt	5d70367bfc	BUGFIX: OS X default encoding with io.open is whack I'm changing everything to utf-8	2016-06-29 23:58:38 +02:00
Tim Mahrt	81257bdfaf	BUGFIX: Brought example file up-to-date with code	2016-06-28 20:27:35 +02:00
Tim Mahrt	88f79d63e8	BUGFIX: 'U' mode depreciated in open in favor of io.open 'U' is universal line support mode, which is the default mode in io.open	2016-06-28 20:23:29 +02:00
Tim Mahrt	2dcb92217d	BUGFIX: ASCII installation files with unicode caused PIP problems	2016-03-22 12:21:21 +01:00
Tim Mahrt	a36d7c8d17	REFACTOR: Gave the tonic vowel tier a more representative name	2016-03-16 12:00:19 +01:00
Tim Mahrt	65ac652dea	DOCUMENTATION: Version changed to 1.3 in the setup.py file	2016-03-16 11:17:16 +01:00
Tim Mahrt	ee08c347d5	DOCUMENTATION: Bolding text	2016-03-15 17:51:01 +01:00
Tim Mahrt	c16c68a6ac	DOCUMENTATION: Added to the acknowledgements.	2016-03-15 17:48:24 +01:00
Tim Mahrt	bc4f19c74c	FEATURE: Index to stressed vowel; marking of stressed vowels on textgrids - the index to the stressed syllable was provided in the past. Now the library also includes the index to the stressed vowel. This is provided with relation to the phones in the syllable and all phones in the word. - the code that marks the stressed syllables in the textgrids also now marks the stressed vowels - several variables renamed to be more informative	2016-03-15 17:42:33 +01:00
Tim Mahrt	c19cde7165	DOCUMENTATION: The link in the last update didn't work.	2016-02-18 14:21:06 +01:00
Tim Mahrt	38ebc7f3f9	BUGFIX: Python 3.x compability Changed xrange -> range Also added some documentation and changed the version number.	2016-02-18 14:17:49 +01:00
Tim Mahrt	102e8a7488	DOCUMENTATION: Removed duplicated text	2016-01-25 13:26:40 +01:00
Tim Mahrt	6b786cd00a	DOCUMENTATION: Bolding	2016-01-25 13:25:33 +01:00
Tim Mahrt	fb1e638cb8	DOCUMENTATION: Fixed link	2016-01-25 13:19:55 +01:00
Tim Mahrt	e5acdfce30	DOCUMENTATION: Corrected islex reference, bolded grant numbers.	2016-01-25 13:18:20 +01:00
Tim Mahrt	d47c312de7	DOCUMENTATION: Added requirements text about Python 3 to readme file.	2016-01-25 13:05:32 +01:00
Tim Mahrt	303d9bfcf2	DOCUMENTATION: Added revision information to pysle and more acknowledgements	2016-01-25 13:02:57 +01:00
Tim Mahrt	9c0ccd5748	DOCUMENTATION: Acknowledgements and citing information added	2016-01-25 12:39:43 +01:00
timmahrt	393182500e	REFACTOR: Syncronized changes with the praatio library Optional textgrid functionality requires praatio 2.1.0 or greater.	2015-07-28 14:30:20 -05:00
timmahrt	985d68da6c	REFACTOR: Change print statement to print function	2015-06-19 17:29:19 -05:00
timmahrt	0e53ed654e	REFACTOR: PEP 8 compliance and minor bugfix For bugfix, see last change in pronunciationtools.py	2015-06-18 19:56:15 -05:00
timmahrt	ce633d0590	BUGFIX: Reflect changes in praatio library	2015-06-16 02:27:46 -05:00
timmahrt	e2a2025f5b	Merge remote-tracking branch 'origin/master'	2015-06-11 15:46:36 -05:00
timmahrt	c10e3cf05f	BUGFIX: Was unable to read islev2.txt with trailing newline My custom islev2.txt did not have a trailing newline.	2015-06-11 15:43:27 -05:00
timmahrt	06222bf176	REFACTOR: PEP 8 compliance	2015-06-11 15:00:26 -05:00
Tim	6353e0172e	Update README.rst	2015-06-01 15:01:29 -05:00
timmahrt	fad0dd2902	SPEED BOOST: Now word lookup ~65 times faster. Used to iterate through the isle text file for each search. Now builds a dictionary of the form{word:pronunciation list,}	2015-01-29 23:02:13 -06:00
timmahrt	475053eee2	DOCUMENTATION: Moved the project description up.	2014-10-23 15:53:57 -05:00
timmahrt	08f8e859cc	DOCUMENTATION: Added link to praatio. Added table of contents. Also added some clarification about the requirements.	2014-10-23 15:51:35 -05:00
timmahrt	9cd6a7e68b	DOCUMENTATION: Added/cleaned up the readme file Added a new section 'common use cases' since I get that question a lot.	2014-10-23 15:41:02 -05:00
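
The most recent commit in the table reports the error "insert takes 2 arguments". The line that commit changed is not shown in this section, so the following is only a generic sketch of how that error arises and the two usual fixes; the list contents and variable names are made up for illustration. `list.insert` needs both an index and an item, while `list.append` takes only the item and always adds it at the end.

```python
pron = ["k", "t"]

# Calling insert with only the item raises TypeError on Python 2.7 and 3.x
try:
    pron.insert("''")
    raised = False
except TypeError:
    raised = True

# Fix 1: pass an explicit index to place the filler at a chosen position
padded = list(pron)
padded.insert(1, "''")

# Fix 2: use append when the filler belongs at the end of the list
appended = list(pron)
appended.append("''")

print(raised)    # → True
print(padded)    # → ['k', "''", 't']
print(appended)  # → ['k', 't', "''"]
```

Which fix is right depends on where the filler element is meant to go; the commit message itself notes that choosing append was an assumption.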