07-01-2015 02:39 AM - edited 07-01-2015 02:40 AM
Say I have an input like this: uv tk ab cd ef ab gh jk ab lm op
I would like to extract the content between the first pair of ab's. With Match Pattern and the expression ab .* ab, I get the content between the first ab and the last ab, which is not what I want.
Any suggestions? I am trying to use Match Pattern rather than Match Regular Expression for performance reasons...
07-01-2015 03:17 AM
Correction:
My input string is uv tk ab cd ef abc gh jk abc lm op, and I would like to get the string between the first ab and the first abc that follows it.
07-01-2015 03:48 AM
Something like this?
07-01-2015 03:50 AM - edited 07-01-2015 03:51 AM
Simple approach:
- Search for the first "ab" in your string.
- Then search for the next "ab" in the string, starting the search after the first "ab".
- Get the String Subset between both occurrences…
Surely not as elegant as a regex or Match Pattern, but implemented very quickly…
Edit: Munna already shows how to do it!
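For readers who think in text languages, the three steps above can be sketched roughly as follows. This is a Python analogue, not LabVIEW code; the function name `between_first_pair` is made up for illustration.

```python
def between_first_pair(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    first = text.find(start)          # step 1: locate the first delimiter
    if first == -1:
        return None
    begin = first + len(start)        # continue searching just past it
    stop = text.find(end, begin)      # step 2: locate the next delimiter
    if stop == -1:
        return None
    return text[begin:stop]           # step 3: take the subset in between


# Original input, and the corrected input with "ab" ... "abc"
print(repr(between_first_pair("uv tk ab cd ef ab gh jk ab lm op", "ab", "ab")))
print(repr(between_first_pair("uv tk ab cd ef abc gh jk abc lm op", "ab", "abc")))
```

Both calls should give the text between the first pair, surrounding spaces included.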
07-01-2015 01:00 PM - edited 07-01-2015 01:01 PM
Yet another possibility would be to use "Scan String For Tokens", a very powerful but under-used function. Just define "ab" as the delimiter and you get all the sections between the "ab"s quickly. If you know it is always the second one, replace the while loop with a FOR loop, set the number of iterations accordingly, and don't use autoindexing on the output. (top)
You could even use "Spreadsheet String To Array" with a delimiter of "ab". (middle)
For the smallest match, you could also use your code, but with a pattern of "ab[^ab]+ab". Of course, the matched string will also contain the ab's, which you would need to trim afterwards. There are probably better ways, so some of the regex wizards will hopefully chime in. (bottom)
Here is the mentioned code:
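As a rough textual analogue of the delimiter idea (this is Python, not the LabVIEW diagram itself), splitting on "ab" and taking the second piece gives the section between the first pair. The helper name `section_between_first_pair` is hypothetical.

```python
def section_between_first_pair(text, delimiter="ab"):
    """Split on the delimiter and return the piece between its first two occurrences."""
    parts = text.split(delimiter)
    # Two delimiters produce at least three pieces; index 1 is the one in between.
    return parts[1] if len(parts) > 2 else None


print(repr(section_between_first_pair("uv tk ab cd ef ab gh jk ab lm op")))
```

Like the FOR-loop variant described above, this assumes the wanted section is always the second one.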
07-01-2015 01:51 PM - edited 07-01-2015 01:54 PM
@sdfsdfsdfadgadf wrote:
Say, I have an input like this : uv tk ab cd ef ab gh jk ab lm op
I would like to extract content between the first ab pairs. So with Match pattern and expression is ab .* ab , I got the content between the first ab and the last ab which is not I want.
Any suggestions ? I try to use match pattern and not match regex for performance reasons...
You can use the following expression: ab[^(ab)]*ab
Altenbach, I think you have to put () around ab in the negated character class to avoid a match on a single a or b.
Ben64
07-01-2015 01:54 PM
@ben64 wrote:
You can use the following expression: ab[^(ab)]*ab
That still includes the ab's. He apparently only wants the stuff in-between.
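A small experiment in Python's `re` module may help here. It is only an analogue: Match Pattern has its own syntax, so treat the results as a hint to verify in LabVIEW. In Python, a bracketed class works one character at a time, so `[^ab]` excludes any single a or b, and parentheses inside the brackets are just two more excluded characters. A capture group returns only the part in between. The helper `inner` is made up for this sketch.

```python
import re

text = "uv tk ab cd ef ab gh jk ab lm op"


def inner(pattern, s):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, s)
    return m.group(1) if m else None


# The whole match still contains the delimiters...
print(repr(re.search(r"ab[^ab]*ab", text).group(0)))
# ...while the capture group holds only the text in between.
print(repr(inner(r"ab([^ab]*)ab", text)))

# A lone "a" or "b" between the delimiters blocks the match,
# because the class excludes single characters, not the sequence "ab".
print(inner(r"ab([^ab]*)ab", "ab xa yb ab"))

# Parentheses inside the class are literal characters in Python,
# so they become excluded too instead of grouping "ab".
print(inner(r"ab([^(ab)]*)ab", "ab (x) ab"))
print(repr(inner(r"ab([^ab]*)ab", "ab (x) ab")))
```

For the sample input both expressions work because the text between the delimiters contains no a, b, or parentheses.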
07-01-2015 01:56 PM
07-01-2015 02:12 PM
ben64 wrote: Altenbach, I think you have to put () around ab in the negated character class to avoid a match on a single a or b.
Thanks! Most likely you are right. I don't typically do these things...
(Hey, I am a graphical programmer and don't handle syntax very well. :D)
07-01-2015 03:11 PM
This is apparently the way to go, since .* is greedy and attempts to match the longest string. Unfortunately, I see it is 4 times slower than Spreadsheet String To Array, although it looks more straightforward.
So in conclusion, there is no equivalent operator that attempts a non-greedy match (like .+ or something, matching zero or more).
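For comparison, the greedy versus non-greedy distinction can be seen in engines that do offer a lazy quantifier. The sketch below uses Python's `re` module, where `.*?` is the lazy form of `.*`. It only illustrates the concept and says nothing about Match Pattern syntax or LabVIEW performance; the function names are made up.

```python
import re


def greedy_between(text, start, end):
    """Capture as much as possible between `start` and the last `end`."""
    m = re.search(re.escape(start) + r"(.*)" + re.escape(end), text)
    return m.group(1) if m else None


def lazy_between(text, start, end):
    """Capture as little as possible between `start` and the next `end`."""
    m = re.search(re.escape(start) + r"(.*?)" + re.escape(end), text)
    return m.group(1) if m else None


s = "uv tk ab cd ef abc gh jk abc lm op"
print(repr(greedy_between(s, "ab", "abc")))  # runs on to the last "abc"
print(repr(lazy_between(s, "ab", "abc")))    # stops at the first "abc"
```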