Simplifying backreferences

Question

This is a theoretical example, kept simple on purpose. I use literal values in the regex so that almost anyone can read it easily, since lots of {, (, [, and \ characters jammed together can be a headache to read.

Suppose that I have the following code.

<input type="text" value="test" name="thing">
<input type="text" name="thing" value="test">

and I want to extract the name and value attributes from each input, along with the value of each of those attributes.

I could do this:

(?:(name)="(thing)" (value)="(test)"|(value)="(test)" (name)="(thing)")

For the first line, the backreferences match as follows:

$5: value
$6: test
$7: name
$8: thing

For the second line, the backreferences match as follows:

$1: name
$2: thing
$3: value
$4: test
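To make the numbering concrete, here is a minimal Java sketch (the class and method names are hypothetical, chosen for illustration) that runs the alternation pattern above against both lines and lists only the groups that actually took part in the match. Groups belonging to the branch that did not match come back as null, which is why one line reports $1 to $4 and the other $5 to $8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // The question's alternation: groups 1-4 live in the name-first branch,
    // groups 5-8 in the value-first branch.
    static final Pattern TAG = Pattern.compile(
        "(?:(name)=\"(thing)\" (value)=\"(test)\"|(value)=\"(test)\" (name)=\"(thing)\")");

    // Returns "$n: text" for every group that participated in the first match.
    // Groups from the branch that did not match are null and are skipped.
    static List<String> participatingGroups(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    out.add("$" + g + ": " + m.group(g));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Value-first line: prints $5 through $8
        participatingGroups("<input type=\"text\" value=\"test\" name=\"thing\">")
            .forEach(System.out::println);
        System.out.println();
        // Name-first line: prints $1 through $4
        participatingGroups("<input type=\"text\" name=\"thing\" value=\"test\">")
            .forEach(System.out::println);
    }
}
```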

Combining the results from the two branches is easy enough when there are only two of them:

{$1$5: "$2$6", $3$7: "$4$8"}
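Done in code rather than in a replacement string, the same "$1$5" idea amounts to checking which branch matched and reading the groups at the matching offset. This is a minimal Java sketch under that assumption; the class and method names are hypothetical, and it only handles the first match on each line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationPairs {
    // Same 8-group pattern as in the question.
    static final Pattern TAG = Pattern.compile(
        "(?:(name)=\"(thing)\" (value)=\"(test)\"|(value)=\"(test)\" (name)=\"(thing)\")");

    // Returns attribute -> value for the first match, or an empty map.
    static Map<String, String> extract(String line) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = TAG.matcher(line);
        if (m.find()) {
            // Only one branch can participate, so exactly one of group(1)
            // and group(5) is non-null; that tells us which block of four to read.
            int base = (m.group(1) != null) ? 0 : 4;
            attrs.put(m.group(base + 1), m.group(base + 2));
            attrs.put(m.group(base + 3), m.group(base + 4));
        }
        return attrs;
    }

    public static void main(String[] args) {
        // Prints {value=test, name=thing}
        System.out.println(extract("<input type=\"text\" value=\"test\" name=\"thing\">"));
        // Prints {name=thing, value=test}
        System.out.println(extract("<input type=\"text\" name=\"thing\" value=\"test\">"));
    }
}
```

The offset trick only works because both branches have the same number of groups in corresponding positions, which is the same constraint the question runs into below.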

Is this the only way to do it? I'm mostly just curious. Since either the first four groups or the second four groups will match, is there some kind of flag or special syntax that makes the second set come back as $1, $2, $3, $4 rather than $5 through $8?

In this particular case, I could instead do this:

(?:(name|value)="([^"]*)" (name|value)="([^"]*)")

and then use:

{$1: "$2", $3: "$4"}
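With the generalized pattern, the attribute names always land in groups 1 and 3 regardless of order, so no branch check is needed. A minimal Java sketch (class and method names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GeneralizedPairs {
    // The question's generalized pattern: (name|value)="([^"]*)" (name|value)="([^"]*)"
    static final Pattern PAIR = Pattern.compile(
        "(name|value)=\"([^\"]*)\" (name|value)=\"([^\"]*)\"");

    // Equivalent of {$1: "$2", $3: "$4"} for the first match in the line.
    static Map<String, String> extract(String line) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        if (m.find()) {
            attrs.put(m.group(1), m.group(2)); // $1: "$2"
            attrs.put(m.group(3), m.group(4)); // $3: "$4"
        }
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(extract("<input type=\"text\" value=\"test\" name=\"thing\">"));
        System.out.println(extract("<input type=\"text\" name=\"thing\" value=\"test\">"));
        // Works for any quoted values, not just the literal ones
        System.out.println(extract("<input name=\"q\" value=\"hello world\">"));
    }
}
```

One trade-off: because both slots accept either attribute name, this version also matches name="a" name="b"; with a map, the second entry simply overwrites the first.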

I'm asking primarily out of curiosity, so that I can improve the regular expressions I write in the future.

I do realize that if there is any way to do this, each side of the pipe would probably need the same number of subexpressions, because many languages simply output a backreference literally when it refers to a group that doesn't exist. For example:

([A-Z]*)([0-9]*)

This pattern creates backreferences $1 and $2, so using those in a replacement substitutes the matched text. A reference to $3, however, is not replaced with nothing in many languages: it is either output literally as $3 or reported as an error.
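Java, one of the languages listed below, happens to fall on the "error" side, and it treats a group that exists but did not participate (like $5 to $8 when the first branch matched) as an empty string. The minimal sketch below illustrates both cases; the class and method names are hypothetical, and other engines may behave differently.

```java
import java.util.regex.Pattern;

class GroupReferenceDemo {
    static final Pattern TWO_GROUPS = Pattern.compile("([A-Z]*)([0-9]*)");

    // $1 and $2 exist, so they are substituted normally.
    static String swap(String s) {
        return TWO_GROUPS.matcher(s).replaceFirst("$2$1");
    }

    // A group that exists but did not take part in the match is replaced
    // with nothing, which is what makes "$1$5"-style templates work in Java.
    static String unmatchedGroup(String s) {
        return Pattern.compile("(?:([A-Z]+)|([0-9]+))").matcher(s).replaceFirst("[$1$2]");
    }

    // A group that does not exist at all ($3 here) is neither left as
    // literal text nor blanked: Java throws IndexOutOfBoundsException.
    static boolean missingGroupThrows(String s) {
        try {
            TWO_GROUPS.matcher(s).replaceFirst("$3");
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(swap("AB12"));               // 12AB
        System.out.println(unmatchedGroup("12"));       // [12]
        System.out.println(missingGroupThrows("AB12")); // true
    }
}
```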

As for the languages I use (since some support features that others don't): primarily ColdFusion (which runs on the Java platform), Java, and JavaScript/jQuery, and I also use regex frequently in Notepad++.

You appear to be using regular expressions to parse HTML elements. That is a well-known road to madness (http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Please think better of it. — Kilian Foth, Nov 15 '14 at 12:38
@KilianFoth Lol, I was simply creating a simple example. Virtually everyone reading stackoverflow can read html easily. I chose simple match criteria and a very simple (near uselessly simple, lol) regex so that anyone could read it. — Regular Jo, Nov 16 '14 at 03:58
