 |
chamenas Wizard

Joined: 26 Mar 2008 Posts: 1547
|
Posted: Wed Jun 18, 2008 5:48 pm
Pattern Discrepancy |
CMUD says that: Yasan tells you (New Thalosian) 'ooc: test'
does not match:
Code: |
#REGEX "Tell Capture" {^([\w']+ |A masked swashbuckler |\(An Imm\) ){1,2}tells you (?:\((\w+){1,2}\) )?'(.*)'$} {#CAP Tells} {General Captures}
|
The Regex Coach, however, says it does match. Why? Is there something I'm missing here? |
|
|
 |
Fang Xianfu GURU

Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Wed Jun 18, 2008 5:59 pm |
Your construct (?:\((\w+){1,2}\) ) is wrong. It repeats \w+ once or twice, but because nothing between the repetitions can match a space, it fails when the text in the parentheses is two words (try matching a two-word string with (\w+){1,2} on its own). If you put a space inside the repeated group, though, it still fails to match, because there is no space after Thalosian. The simplest fix is probably to change it to (?:\([\w ]+\) )?
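To see the difference outside CMUD, here is a minimal sketch using Python's re module. That is an assumption on my part: CMUD uses its own regex engine, but these particular constructs should behave the same way. The variable names are just for illustration.

```python
import re

line = "Yasan tells you (New Thalosian) 'ooc: test'"

# Original pattern: the optional group repeats \w+ once or twice with
# nothing in between, so it cannot get past the space in "New Thalosian".
original = re.compile(
    r"^([\w']+ |A masked swashbuckler |\(An Imm\) ){1,2}"
    r"tells you (?:\((\w+){1,2}\) )?'(.*)'$"
)

# Suggested change: one character class that accepts word characters and spaces.
fixed = re.compile(
    r"^([\w']+ |A masked swashbuckler |\(An Imm\) ){1,2}"
    r"tells you (?:\([\w ]+\) )?'(.*)'$"
)

original_matches = original.match(line) is not None
fixed_matches = fixed.match(line) is not None
# A one-word language tag, checked against the changed pattern as well.
single_word_matches = fixed.match("Yasan tells you (elvish) 'blah'") is not None

print(original_matches, fixed_matches, single_word_matches)
```

If CMUD gives a different result for the same line, that would point at something other than the optional group.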
|
|
|
 |
chamenas Wizard

Joined: 26 Mar 2008 Posts: 1547
|
Posted: Thu Jun 19, 2008 1:29 am |
The new pattern seems to ignore single words, for example:
Yasan tells you (elvish) 'blah' |
|
|
 |
Vijilante SubAdmin

Joined: 18 Nov 2001 Posts: 5187
|
Posted: Thu Jun 19, 2008 8:32 am |
The only reason to use a word construct with nested repeats is backtracking control. The idea is that it grabs a limited number of words and then tests the next part, which is "tells" in this case. If "tells" fails to match, it gives back an entire word at once and tests again. Since the range {1,2} requires at least one word, a second failure to match the subsequent "tells" immediately causes the entire pattern to fail. It is a speed optimization.
That optimization is unnecessary when you have both a beginning and an ending boundary. Since your ending boundary character ')' cannot be part of the text, it makes more sense to match everything that is not that character.
Full list of changes: made every opening parenthesis non-capturing, moved the wildcard word match from the first position to the last position of the alternation, moved the repeat count from outside the alternation to inside it so that it applies only to the wildcard word, and changed the matching for the bounded portion.
Code: |
#REGEX "Tell Capture" {^(?:A masked swashbuckler |\(An Imm\) |(?:[\w']+ ){1,2})tells you (?:\([^\)]+\) )?'(?:.*)'$} {#CAP Tells} {General Captures} |
|
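A quick way to sanity-check the revised pattern is to run it over a few lines. This is only a sketch in Python's re module, which I am assuming handles these constructs the same way CMUD does. The two "Yasan" lines come from this thread; the other sample lines are hypothetical ones made up from the alternatives in the pattern.

```python
import re

# The revised pattern, split over two string literals for readability.
tell = re.compile(
    r"^(?:A masked swashbuckler |\(An Imm\) |(?:[\w']+ ){1,2})"
    r"tells you (?:\([^\)]+\) )?'(?:.*)'$"
)

samples = [
    "Yasan tells you (New Thalosian) 'ooc: test'",  # from the thread
    "Yasan tells you (elvish) 'blah'",              # from the thread
    "A masked swashbuckler tells you 'hello'",      # hypothetical
    "(An Imm) tells you 'hello'",                   # hypothetical
    "Yasan says 'hello'",                           # hypothetical non-tell
]

# True where the whole line is recognised as a tell.
results = [tell.match(s) is not None for s in samples]

for sample, matched in zip(samples, results):
    print(matched, sample)
```

Only the last line should be rejected, since it never reaches "tells you".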
|
|
|
 |
chamenas Wizard

Joined: 26 Mar 2008 Posts: 1547
|
Posted: Thu Jun 19, 2008 6:42 pm |
Alright, so [^\)]+ means "match a run of one or more characters, none of which is )"?
|
|
|
 |
Fang Xianfu GURU

Joined: 26 Jan 2004 Posts: 5155 Location: United Kingdom
|
Posted: Thu Jun 19, 2008 7:07 pm |
Yes.
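A small sketch of that in Python's re module, assuming it treats the negated class the same way CMUD does. The capturing group around the class is added here only so the matched text can be printed; it is not part of the pattern posted above.

```python
import re

# [^\)]+ : one or more characters, none of which is ')'.
inside_parens = re.compile(r"\(([^\)]+)\)")

# The class consumes everything up to, but not including, the first ')'.
language = inside_parens.search(
    "Yasan tells you (New Thalosian) 'ooc: test'"
).group(1)

# '+' needs at least one character, so empty parentheses do not match.
empty = inside_parens.search("Yasan tells you () 'ooc: test'")

print(language)
print(empty)
```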
|
|
|