I have XML files formatted like this:

<User>
<FirstName>Foo Bar</FirstName>
<LastName>Blah</LastName>
<OtherStuff>...</OtherStuff>
<More>...</More>
<CompanyName>Foo</CompanyName>
<EmailAddress>[email protected]</EmailAddress>
</User>
<User>
...

I want to read through all the XML files and output <FirstName>,<CompanyName>,<EmailAddress> for each user, like so:

Foo Bar,Foo,[email protected]
Name,User2,[email protected]
FSds,Blah,[email protected]

I am using the following regex:

(?si)<FirstName>(.*?)</FirstName>.*?<CompanyName>(.*?)</CompanyName>\s*<EmailAddress>(.*?)</EmailAddress>

However, this also returns everything from the tags between FirstName and CompanyName.

What am I doing wrong?

Pr0no

2 Answers

Why not use XML processing?

C:\PS> $xml = [xml]@'
>>> <Users>
>>> <User>
>>> <FirstName>Foo Bar</FirstName>
>>> <LastName>Blah</LastName>
>>> <OtherStuff>...</OtherStuff>
>>> <More>...</More>
>>> <CompanyName>Foo</CompanyName>
>>> <EmailAddress>[email protected]</EmailAddress>
>>> </User>
>>> </Users>
>>> '@
C:\PS> "$($xml.Users.User.FirstName), $($xml.Users.User.CompanyName), $($xml.Users.User.EmailAddress)"
Foo Bar, Foo, [email protected]

You haven't shown the full XML document, so I'm guessing at the top-level nodes. You will need to adjust this to match the structure of your XML doc.
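
When the document holds more than one User element, $xml.Users.User is a collection, so each property lookup returns every user's value at once. A minimal sketch that emits one line per user instead, assuming the same guessed Users root element; the second user record and the example.com addresses are made up for illustration:

```powershell
# Sample document; the <Users> root, the second user, and the
# example.com addresses are assumptions for illustration only.
$xml = [xml]@'
<Users>
<User>
<FirstName>Foo Bar</FirstName>
<LastName>Blah</LastName>
<CompanyName>Foo</CompanyName>
<EmailAddress>foo@example.com</EmailAddress>
</User>
<User>
<FirstName>Name</FirstName>
<LastName>Other</LastName>
<CompanyName>User2</CompanyName>
<EmailAddress>name@example.com</EmailAddress>
</User>
</Users>
'@

# Build one comma-separated line per <User> node.
$lines = foreach ($user in $xml.Users.User) {
    $user.FirstName, $user.CompanyName, $user.EmailAddress -join ','
}
$lines
```

For several files, the same loop could probably run inside Get-ChildItem *.xml | ForEach-Object { ... }, with [xml](Get-Content $_.FullName) in place of the here-string, provided each file has a single root element.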

Keith Hill

I find multi-line regex can be easier if you build it in a here-string:

$String = @'
<User>
<FirstName>Foo Bar</FirstName>
<LastName>Blah</LastName>
<OtherStuff>...</OtherStuff>
<More>...</More>
<CompanyName>Foo</CompanyName>
<EmailAddress>[email protected]</EmailAddress>
</User>
'@

$regex = @'
(?ms).+?<FirstName>(.+?)</FirstName>.*?
<CompanyName>(.+?)</CompanyName>.*?
<EmailAddress>(.+?)</EmailAddress>.+?
'@

$string -match $regex > $null
$matches[1..3] -join ','



Foo Bar,Foo,[email protected]
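
Note that -match stops at the first hit, so with several users in one string only the first would be reported. A minimal sketch of collecting every hit with the .NET [regex]::Matches method; the single-line pattern, the second user record, and the example.com addresses are assumptions for illustration rather than part of the approach above:

```powershell
# Sample input; the second user and the example.com addresses are made up.
$text = @'
<User>
<FirstName>Foo Bar</FirstName>
<LastName>Blah</LastName>
<CompanyName>Foo</CompanyName>
<EmailAddress>foo@example.com</EmailAddress>
</User>
<User>
<FirstName>Name</FirstName>
<LastName>Other</LastName>
<CompanyName>User2</CompanyName>
<EmailAddress>name@example.com</EmailAddress>
</User>
'@

# (?s) lets . span line breaks; each lazy .*? skips the tags between the captured ones.
$pattern = '(?s)<FirstName>(.*?)</FirstName>.*?<CompanyName>(.*?)</CompanyName>.*?<EmailAddress>(.*?)</EmailAddress>'

# Groups[0] is the whole match, including the skipped tags,
# so only capture groups 1 to 3 are joined.
$rows = foreach ($m in [regex]::Matches($text, $pattern)) {
    $m.Groups[1].Value, $m.Groups[2].Value, $m.Groups[3].Value -join ','
}
$rows
```

Reading the whole match rather than the numbered groups would return the intermediate tags as well, which may be what the question is running into.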

If it's a big file and you don't want to read it all in at once, you can use the closing tag as a delimiter:

Get-Content xmlfile.xml -Delimiter '</User>' |
 foreach {
  if ($_ -match $regex)
   {$matches[1..3] -join ','
   }
 }

mjolinor