I am matching this pattern
<.*>
in this string
<a href="hello world">Hi Baby</a>
Now, there are several possible matches.
<a href="hello world">
is a match
<a href="hello world">Hi Baby</a>
is also a match.
However, that's very confusing. I thought regular expressions were evaluated with deterministic finite automata.
So I would imagine that the deterministic finite automaton would go through the letters one by one. However, it would somehow have to branch: the first > could be the closing > in the pattern, but it could also be just another character matched by the . in the pattern.
So how does it decide?
In VB.NET, it seems that the match that gets returned is the second one. That is why I have to replace the pattern with
<[^>]*>
if I want the pattern to match the first one (say I want to strip all HTML tags from a string).
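To make the difference between the two patterns concrete, here is a small sketch. It uses Python's `re` module rather than VB.NET (an assumption made for illustration; as far as I know both engines treat `*` the same way for these patterns), and the variable names are made up for the example.

```python
import re

# The sample string from the question.
text = '<a href="hello world">Hi Baby</a>'

# <.*> : the dot may also consume '>' characters, so the match
# stretches from the first '<' to the last '>' on the line.
greedy = re.search(r'<.*>', text).group(0)

# <[^>]*> : the character class cannot consume '>', so the match
# has to stop at the first '>' after the opening '<'.
negated = re.search(r'<[^>]*>', text).group(0)

# Stripping every tag with the negated-class pattern.
stripped = re.sub(r'<[^>]*>', '', text)

print(greedy)    # <a href="hello world">Hi Baby</a>
print(negated)   # <a href="hello world">
print(stripped)  # Hi Baby
```

The negated class works here because it removes the ambiguity altogether: there is only one place where the closing > can match.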
And why is that? What does VB actually do to select the second string as the match for the pattern?
I've heard that VB is "greedy": it returns the longest string that matches the pattern instead of the first one that works. So is this inherently ambiguous, or is there a way to know how this is actually implemented?
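One way to picture the implementation question: the .NET regex engine is, as far as I understand it, a backtracking matcher rather than a DFA, so the ambiguity is resolved by the order in which alternatives are tried. The toy function below is only a sketch of that idea for the fixed pattern `<.*>` (the function name and the `greedy` flag are hypothetical, not anything from .NET); it is cross-checked against Python's `re`, which behaves the same way here.

```python
import re

def match_lt_dotstar_gt(text, greedy=True):
    """Toy backtracking match for <.*> (or <.*?> when greedy=False)."""
    for start in range(len(text)):       # try start positions left to right
        if text[start] != '<':
            continue
        # Furthest position .* could reach ('.' does not match a newline).
        limit = start + 1
        while limit < len(text) and text[limit] != '\n':
            limit += 1
        # 'end' is the index where the closing '>' would have to sit.
        # Greedy: try the longest .* first, then give characters back.
        # Lazy: try the shortest .* first, then take one more each time.
        if greedy:
            candidates = range(limit, start, -1)
        else:
            candidates = range(start + 1, limit + 1)
        for end in candidates:
            if end < len(text) and text[end] == '>':
                return text[start:end + 1]   # first success wins
    return None

text = '<a href="hello world">Hi Baby</a>'
greedy_result = match_lt_dotstar_gt(text)
lazy_result = match_lt_dotstar_gt(text, greedy=False)

print(greedy_result)  # <a href="hello world">Hi Baby</a>
print(lazy_result)    # <a href="hello world">
```

So the result is not ambiguous: both strings are valid matches, but the engine reports whichever one it reaches first, and the quantifier decides the order in which candidates are tried.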
* is greedy, i.e. it tries to match as much as possible. *? is lazy and tries to match as little as possible. – CodesInChaos Nov 09 '15 at 08:32
* matches as many repetitions as possible. The . and > are completely irrelevant for this. – CodesInChaos Nov 09 '15 at 09:35
Add ? to the quantifier for non-greedy matching. E.g. *? means 0 or more times, non-greedy. – Brandin Nov 09 '15 at 14:07
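For reference, the lazy quantifier from the comments can be tried out like this. Again this is a sketch in Python's `re` rather than VB.NET (an assumption for illustration; the `*?` syntax is the same in both), with made-up variable names.

```python
import re

text = '<a href="hello world">Hi Baby</a>'

# *? matches as few characters as possible, so the first match
# ends at the first '>' instead of the last one.
first_tag = re.search(r'<.*?>', text).group(0)

# Scanning the whole string finds each tag separately.
all_tags = re.findall(r'<.*?>', text)

# Removing the tags leaves only the text between them.
without_tags = re.sub(r'<.*?>', '', text)

print(first_tag)     # <a href="hello world">
print(all_tags)      # ['<a href="hello world">', '</a>']
print(without_tags)  # Hi Baby
```

On this input `<.*?>` and `<[^>]*>` give the same matches; they can differ on other inputs (for example when a tag spans a line break, since `.` does not match a newline by default).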