In my previous post, Highlighting Duplicate Sentences with PHP, I described a method for highlighting any sentences that appear multiple times within a string. Now it’s time to explore highlighting duplicate phrases. This tutorial assumes that you have a basic understanding of PHP, HTML, and CSS.
While the difference between a phrase and a sentence may seem minimal, the distinction actually adds another dimension of complexity. First of all, the definition of a phrase is not as concrete as the definition of a sentence. For our purposes, we will define a phrase as 3 to 10 consecutive words. As we will see, the smaller the range of possible phrase sizes, the faster the algorithm will run, because fewer candidate phrases have to be generated and compared. Another complication is that phrases, unlike sentences, can overlap. Also, a phrase cannot span a period. For instance, the string “My name is Asa. I like bikes” is not a single seven-word phrase. Because of the period, “My name is Asa” and “I like bikes” are separate phrases.
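To make the definition concrete, here is a minimal sketch of how candidate phrases could be enumerated under these rules. The function name `extractPhrases` and its parameters are hypothetical, chosen only for illustration, and the sketch assumes that periods are the only sentence boundaries and that words are separated by whitespace.

```php
<?php
// Hypothetical helper: returns every phrase of $min to $max consecutive
// words in $text. Phrases may overlap, but none crosses a period.
function extractPhrases(string $text, int $min = 3, int $max = 10): array
{
    $phrases = [];

    // Split on periods first so that no phrase spans two sentences.
    foreach (explode('.', $text) as $sentence) {
        // Break the sentence into words, ignoring repeated whitespace.
        $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
        $count = count($words);

        // Slide a window of every allowed length over every start position.
        for ($start = 0; $start < $count; $start++) {
            for ($len = $min; $len <= $max && $start + $len <= $count; $len++) {
                $phrases[] = implode(' ', array_slice($words, $start, $len));
            }
        }
    }

    return $phrases;
}
```

Calling `extractPhrases('My name is Asa. I like bikes')` yields four phrases: “My name is”, “My name is Asa”, “name is Asa”, and “I like bikes”. The first three overlap, and nothing combines words from both sides of the period. The inner loop also shows why a narrower size range runs faster: each start position produces at most `$max - $min + 1` candidates.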