I'd use something like:
REGEX = /\.(?:mil|gov)$/
%w[
[email protected]
[email protected]
[email protected]
[email protected]
].each do |addr|
puts '"%s" %s' % [addr, (addr[REGEX] ? 'matches' : "doesn't match")]
end
# >> "[email protected]" matches
# >> "[email protected]" matches
# >> "[email protected]" matches
# >> "[email protected]" doesn't match
If you know the TLD you want is always at the end of the string, then a simple pattern that matches just that is fine.
This works because addr[REGEX] calls String's [] method, which applies the pattern to the string and returns the matched text, or nil if nothing matched:
'foo'[/oo/] # => "oo"
'bar'[/oo/] # => nil
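One caveat worth sketching: in Ruby, `$` anchors at the end of a line, not necessarily the end of the whole string, so input containing an embedded newline can still match. If that matters for your data, `\z` anchors at the very end of the string. The helper names and sample addresses below are made up for illustration:

```ruby
# Hypothetical helpers: the same pattern with two different end anchors.
def line_end_match(str)
  str[/\.(?:mil|gov)$/]   # $ matches at the end of any line
end

def string_end_match(str)
  str[/\.(?:mil|gov)\z/]  # \z matches only at the very end of the string
end

multi_line = "jane.doe@example.mil\nsomething else" # hypothetical input

line_end_match(multi_line)               # => ".mil"
string_end_match(multi_line)             # => nil
string_end_match('jane.doe@example.mil') # => ".mil"
```

For single-line addresses the two anchors behave the same, so the simpler `$` pattern is fine if you control the input.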
If you want to capture everything before the TLD:
REGEX = /(.+)\.(?:mil|gov)$/
%w[
[email protected]
[email protected]
[email protected]
[email protected]
].each do |addr|
puts addr[REGEX, 1]
end
# >> jane.doe@navy
# >> barak.obama@whitehouse
# >> [email protected]
# >>
Using it in a more "production-worthy" style:
SELECT_PATTERN = '\.(?:mil|gov)$' # => "\\.(?:mil|gov)$"
CAPTURE_PATTERN = "(.+)#{ SELECT_PATTERN }" # => "(.+)\\.(?:mil|gov)$"
SELECT_REGEX, CAPTURE_REGEX = [SELECT_PATTERN, CAPTURE_PATTERN].map{ |s|
Regexp.new(s)
}
SELECT_REGEX # => /\.(?:mil|gov)$/
CAPTURE_REGEX # => /(.+)\.(?:mil|gov)$/
addrs = %w[
[email protected]
[email protected]
[email protected]
[email protected]
].select{ |addr|
addr[SELECT_REGEX]
}.map { |addr|
addr[CAPTURE_REGEX, 1]
}
puts addrs
# >> jane.doe@navy
# >> barak.obama@whitehouse
# >> [email protected]
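Because `addr[CAPTURE_REGEX, 1]` already returns nil for addresses that don't match, the select and map steps could probably be folded into a single pass. A minimal sketch, assuming Ruby 2.7 or later for `filter_map`; the method name and sample addresses are hypothetical:

```ruby
# Hypothetical helper: capture everything before a .mil/.gov TLD,
# dropping addresses that don't match.
def capture_before_tld(addrs)
  capture_regex = /(.+)\.(?:mil|gov)$/
  # filter_map keeps only the truthy block results, so the nil
  # returned for a non-matching address is discarded.
  addrs.filter_map { |addr| addr[capture_regex, 1] }
end

capture_before_tld(%w[
  jane.doe@example.mil
  john.doe@example.gov
  bob@example.com
])
# => ["jane.doe@example", "john.doe@example"]
```

This trades the separate "select" pattern for one fewer pass over the list; whether that is clearer is a matter of taste.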
Similarly, you could do it without a regular expression:
TLDs = %w[.mil .gov]
%w[
[email protected]
[email protected]
[email protected]
[email protected]
].each do |addr|
puts '"%s" %s' % [ addr, TLDs.any?{ |tld| addr.end_with?(tld) } ]
end
# >> "[email protected]" true
# >> "[email protected]" true
# >> "[email protected]" true
# >> "[email protected]" false
And, to select the matching addresses and strip the TLD:
TLDs = %w[.mil .gov]
addrs = %w[
[email protected]
[email protected]
[email protected]
[email protected]
].select{ |addr|
TLDs.any?{ |tld| addr.end_with?(tld) }
}.map { |addr|
addr.split('.')[0..-2].join('.')
}
puts addrs
# >> jane.doe@navy
# >> barak.obama@whitehouse
# >> [email protected]
end_with? returns true or false depending on whether the string ends with the given substring, and is typically faster than the equivalent regular expression. any? walks the array and returns true as soon as the block returns a truthy value for some element, or false if it never does.
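As a side note, `end_with?` also accepts several suffixes at once and returns true if any of them matches, so the `any?` loop can be replaced with a splat. A small sketch; the method name and sample addresses are hypothetical:

```ruby
# Hypothetical helper name, for illustration only.
def official_tld?(addr, tlds = %w[.mil .gov])
  # end_with? takes multiple suffixes and is true if any one matches.
  addr.end_with?(*tlds)
end

official_tld?('jane.doe@example.mil') # => true
official_tld?('john.doe@example.gov') # => true
official_tld?('bob@example.com')      # => false
```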
If you have a long list of TLDs to check, a well-written regular expression can be very fast, possibly faster than any?. It depends on your data and the number of TLDs, so benchmark against a sample of your data to see which way to go.
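A rough sketch of how such a comparison might be set up, using `Regexp.union` to build the pattern from the list (it escapes the literal dots) and the standard `benchmark` library. The TLD list, sample addresses, iteration counts, and method names are all placeholders; substitute a sampling of your real data:

```ruby
require 'benchmark'

# Build one end-anchored pattern from a list of literal suffixes.
def tld_regex(tlds)
  /#{Regexp.union(tlds)}\z/
end

def select_with_regex(addrs, regex)
  addrs.select { |addr| addr.match?(regex) }
end

def select_with_end_with(addrs, tlds)
  addrs.select { |addr| addr.end_with?(*tlds) }
end

tlds  = %w[.mil .gov .edu]                                    # hypothetical list
addrs = %w[a@example.mil b@example.com c@example.edu] * 1_000 # hypothetical data
regex = tld_regex(tlds)

Benchmark.bm(10) do |x|
  x.report('regex')     { 100.times { select_with_regex(addrs, regex) } }
  x.report('end_with?') { 100.times { select_with_end_with(addrs, tlds) } }
end
```

The timings will vary with the Ruby version, the length of the TLD list, and how many addresses actually match, so treat any single run as a hint and not a verdict.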