e-Sword compound / complex search [Regular Expression]



#11 Josh Bond

Posted 26 July 2011 - 09:19 AM

I've prepared a thorough explanation of how Regular Expressions work with e-Sword, based on all public knowledge, for the User's Guide. I've crossed out some things that don't work in e-Sword and added my comments below:

Regex Search

Regex is a computer term for “regular expression”.

The most important elements[1] to remember are:
  • Term | Symbol | Action[1]
  • Dot | . | Match any single character
  • Include | [-] | Match any listed character
  • Negated | [^] | Match any unlisted character
  • Wild-card | ? | Not "match optional single character" but a quantifier: match zero or one times. Ex: abc? will match ab, abc
  • Wild-card | * | Not "match any number of characters" but a quantifier: match zero or more times. Ex: abc* will match ab, abc, abcc, abccc, etc.
  • Plus | + | Match at least once, that is, one or more times. Ex: abc+ will match abc, abcc, abccc, etc.
  • Caret | ^ | Match at start of line. Note: technically correct, but for e-Sword's purposes to an end user, "line" should be "verse".
  • Dollar | $ | Match at end of line. Note: technically correct, but for e-Sword's purposes to an end user, "line" should be "verse".
  • Backslash less-than | \< | Match at start of word. Note: e-Sword does not seem to support this, unless it uses an unorthodox implementation of it.
  • Backslash greater-than | \> | Match at end of word. Note: e-Sword does not seem to support this, unless it uses an unorthodox implementation of it.
  • Bar | | | Match either side
  • Parentheses | ( ) | Limit the search to terms within the parentheses
  • Backslash | \
  • Backslash lower case b | \b | Word boundary (anchor)
  • Backslash lower case s | \s | White space (character class)
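
The quantifier examples in the list can be checked outside e-Sword. This is only a rough sketch using Python's re module as a stand-in; e-Sword's own regex engine may behave differently (it does not seem to support \< and \>, for instance), and the helper name matches_whole is made up for this illustration.

```python
import re

def matches_whole(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

candidates = ["ab", "abc", "abcc", "abccc"]

optional = matches_whole(r"abc?", candidates)  # "c" zero or one time
star = matches_whole(r"abc*", candidates)      # "c" zero or more times
plus = matches_whole(r"abc+", candidates)      # "c" one or more times

print(optional)  # ['ab', 'abc']
print(star)      # ['ab', 'abc', 'abcc', 'abccc']
print(plus)      # ['abc', 'abcc', 'abccc']
```

In each pattern the quantifier applies only to the "c" directly before it, which is why none of them behaves like a wild-card.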

Example One: St Paul

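Search for all the times St Paul is mentioned in the Bible. The search term is ([S|P]aul)[2]. Save the results to a verse list. Note that in standard regex syntax a | inside square brackets is just another listed character, so [SP]aul is the stricter form of the same search.

Regex searches are case sensitive. A search for “paul” will not list any verses that mention “Paul”.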
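
To see the bracket and case behaviour in Example One, here is a small sketch in Python's re module. It is a stand-in, not e-Sword's engine; the sample lines are invented for the illustration and the helper name hits is hypothetical. re.IGNORECASE is a Python flag and says nothing about what e-Sword offers.

```python
import re

# Made-up sample lines for illustration only; they are not quoted verses.
lines = [
    "Saul set out on the road",
    "Paul wrote a letter",
    "a note about paul in lower case",
    "a stray |aul token",
]

def hits(pattern: str, flags: int = 0) -> list[str]:
    """Return the sample lines in which the pattern is found."""
    return [line for line in lines if re.search(pattern, line, flags)]

guide_form = hits(r"[S|P]aul")              # the | inside [] is a literal bar
plain_form = hits(r"[SP]aul")               # S or P, then "aul"
any_case = hits(r"[SP]aul", re.IGNORECASE)  # Python-only case-insensitive flag

print(guide_form)  # the Saul, Paul and "|aul" lines
print(plain_form)  # the Saul and Paul lines only
print(any_case)    # the Saul, Paul and lower-case paul lines
```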

Example Two: Compound / Complex Search
Search for all occurrences of either tax* or tribut*.
The obvious, and wrong, search term is “tax* | tribut*”.
The problems with that search term include: it ignores instances when the word is capitalized; it does not look at word boundaries.
The guide gives the search term as \b[Tt]ax*\b|\b[Tt]ribut*\b. The search term should be \b[Tt]ax|\b[Tt]ribut if the word should begin with tax or tribut.
The guide breaks its own search term down as follows; comments on it are marked "Note:".
  • The initial “\b” is a word boundary, and says to start with the next letter;
  • The “[” indicates there is a choice of letters;
  • The “Tt” are the letters that can be used;
  • The “]” indicates the choice of letters has ended;
  • The “ax” are required letters;
  • The “*” indicates any characters may follow;
  • The “\b” indicates white space ends the word; Note: \W comes close to this. \W is any non-word character, anything that is not a-z, A-Z, 0-9, or the _ character.
  • The “|” indicates this is an alternative to search for;
  • The initial “\b” is a word boundary, and says to start with the next letter; Note: if it's the opening word boundary, yes. If it's the closing one, no.
  • The “[” indicates there is a choice of letters;
  • The “Tt” are the letters that can be used;
  • The “]” indicates the choice of letters has ended;
  • The “ribut” are required letters;
  • The “*” indicates any characters may follow; Note: asterisks do not work this way, they act as a quantifier.
  • The “\b” indicates white space ends the word; Note: see the \W comment above for the same explanation.
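
The difference between the guide's term and the corrected one shows up quickly on a handful of words. This is a sketch with Python's re module standing in for e-Sword's engine, which may differ; the word list is made up for the illustration.

```python
import re

# Made-up word list for illustration.
words = ["tax", "taxes", "Taxation", "tribute", "Tributary", "syntax", "attribute"]

# The guide's term: "*" only repeats the letter before it, and the closing
# \b then demands that the word ends right there.
guide_term = re.compile(r"\b[Tt]ax*\b|\b[Tt]ribut*\b")

# The corrected term: a word boundary, then the required opening letters.
corrected_term = re.compile(r"\b[Tt]ax|\b[Tt]ribut")

guide_hits = [w for w in words if guide_term.search(w)]
corrected_hits = [w for w in words if corrected_term.search(w)]

print(guide_hits)      # ['tax']
print(corrected_hits)  # ['tax', 'taxes', 'Taxation', 'tribute', 'Tributary']
```

Neither term matches "syntax" or "attribute", because the opening \b requires the word to start at the "t".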

Example Three: People

How many people in the Bible have a name with a first letter of “S” and a last letter of “n”? The search term is [\bS*n\b]. Save the result to a verse list. You will have to read each verse to see their names[3]. Note: asterisks don't work like this in the e-Sword implementation.
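
One possible way to write the Example Three search is with \w* between the two letters. The sketch below uses Python's re module, not e-Sword, so treat it as an assumption that e-Sword accepts \w the same way. The list of names is only sample text, and the variable names are made up.

```python
import re

# A made-up list of names for illustration.
text = "Samson, Solomon, Simon, Saul and Stephen"

# The guide's term without its outer brackets: zero or more "S", then "n".
guide_result = re.findall(r"\bS*n\b", text)

# A possible alternative: "S", then any run of word characters, then "n".
sketch_result = re.findall(r"\bS\w*n\b", text)

print(guide_result)   # []
print(sketch_result)  # ['Samson', 'Solomon', 'Simon', 'Stephen']
```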

1: Some of these symbols have a different meaning, when used as Boolean Search Operators.
2: [S|s|P|p] would cover both letters, in both cases (in standard regex syntax the bars are not needed inside brackets: [SsPp]). However, since both Paul and Saul are proper names, only the upper case letters need to be used.
3: Alternatively, one could look at people_places.dctx looking at all names that begin with “S”.


An e-Sword specific tutorial on Regex searching can be found at http://estudysource...._shares.aspx#40
Note: a third of this is wrong as well, possibly because the definitions of commands aren't correct. I'm not sure some of the commands ever worked exactly like the author says.


jonathon



#12 JPG

Posted 26 July 2011 - 09:57 AM

Josh Bond wrote: "Caret ^ Match at start of Line. Technically correct, but for e-Sword's purposes to an end user, "line" should be "verse"."



It is worth noting that each verse in e-Sword actually starts with a space, so a regex search in KJV+ for ^Jesus returns 0 matches. You have to remember to add the space: ^ Jesus


Each line/verse also ends with a space, so this needs to be accounted for when using $.
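
The effect of the padding spaces is easy to reproduce. This is a rough sketch in Python's re module, assuming a verse is stored with one leading and one trailing space as described above. Whether e-Sword accepts \s* in exactly this way is an assumption, and the helper name found is made up.

```python
import re

# Sample text padded with one leading and one trailing space,
# following the description of how e-Sword stores a verse.
verse = " Jesus wept. "

def found(pattern: str) -> bool:
    """Report whether the pattern is found anywhere in the sample verse."""
    return re.search(pattern, verse) is not None

start_bare = found(r"^Jesus")      # False: the verse starts with a space
start_space = found(r"^ Jesus")    # True: the space is written out
start_flex = found(r"^\s*Jesus")   # True: any leading white space allowed
end_bare = found(r"wept\.$")       # False: a space follows the full stop
end_space = found(r"wept\. $")     # True: the trailing space is written out
end_flex = found(r"wept\.\s*$")    # True: any trailing white space allowed

print(start_bare, start_space, start_flex)  # False True True
print(end_bare, end_space, end_flex)        # False True True
```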

Jon


