4

I have a old table that has applicant info including the person who interviewed them Name|Phone|Email|Addr|Interv|IntervEmail

unfortunately the interviewer emails are a mess. Some blank some have phone numbers some have spaces or other special characters like [email protected] / 999-222-5555 home or home 999-222-5555, [email protected]

What I would like is be able to get all rows with interviewer emails from the table but only the email and not the other junk.

any help appreciated

example From this

user   | home_phone | applicant_email           | addr      | intrvwr_nm | intrvwr_email
-------+------------+---------------------------+-----------+------------+---------------------------------------------------------------------
Jane   | 1234567890 | [email protected] | 123 here  | Jeff       | [email protected] / 321589364
Larry  | 1234567890 | [email protected] | 123 there | Eddie      | [email protected] / 2321.654.9872
harry  | 1234567890 | [email protected] | 456 here  | Murray     | Admissions Field Representative/[email protected]/212-222-1111
Carrie | 1234567890 | [email protected] | 456 there | Del        | [email protected]
Joe    | 1234567890 | [email protected] | 123 here  | James      | 1234567890/sometext, [email protected]

To this

user   | home_phone | applicant_email           | addr      | intrvwr_nm | intrvwr_email
-------+------------+---------------------------+-----------+------------+---------------------------------------------------------------------
Jane   | 1234567890 | [email protected] | 123 here  | Jeff       | [email protected]
Larry  | 1234567890 | [email protected] | 123 there | Eddie      | [email protected]
harry  | 1234567890 | [email protected] | 456 here  | Murray     | [email protected]
Carrie | 1234567890 | [email protected] | 456 there | Del        | [email protected]
Joe    | 1234567890 | [email protected] | 123 here  | James      | [email protected]
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
Variant
  • 51
  • 6
  • 1
    Is this a one-time thing? How many rows are you looking at? This might be something more well suited for a platform that supports regular expressions. – squillman Feb 21 '20 at 20:12
  • Yes most likely one time deal so we can copy clean data to new table. it has 1981 rows atm – Variant Feb 21 '20 at 20:16
  • however, we might need it for other tables. It would probably be easier to put it in an excel sheet and do a regexp (which I would not know how to either now) but we might need something similar for all the old tables – Variant Feb 21 '20 at 20:22
  • 3
    Can you provide a [mcve]? – Sean Lange Feb 21 '20 at 20:25
  • 1
    Do all of the email addresses in the garbage rows have a delimiting character (such as a space) on either end of the address? If so, something in t-sql could probably be created fairly easily. If not, not even sure regexp could take care of it... – squillman Feb 21 '20 at 20:28
  • I added an example of what it looks like now and what we would like the output to be – Variant Feb 21 '20 at 20:50
  • 3
    Given the total trainwreck of that data and that you only have 1,981 rows I would just manually edit this. Any query you write is going to be fraught with issues because that data is a mess. There are just too many variables with this type of data. – Sean Lange Feb 21 '20 at 20:50
  • Agreed. Just from the sample data posted you have no way of programatically consistently distinguishing an email address. – squillman Feb 21 '20 at 20:54
  • 1
    I agree with Sean Lange here. According to https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address a slash is allowed in an email address, so it needs common sense to tell whether the email address is `'Representative/[email protected]'` or just `'[email protected]'`. – Thorsten Kettner Feb 21 '20 at 21:08
  • Old data slashes most likely not an issue – Variant Feb 21 '20 at 21:35
  • 1
    What's your timeline? This looks like an _awesome_ project for a summer intern.... – Eric Brandt Feb 21 '20 at 21:54
  • 2
    Suggest you read about this madness first - [the 100% way to validate](https://medium.com/hackernoon/the-100-correct-way-to-validate-email-addresses-7c4818f24643) – SMor Feb 21 '20 at 22:14

1 Answers1

3

Try running this query. It would catch most of the valid email address:

SELECT a.email 
FROM
(
SELECT value as email  
FROM applicant_Table  
    CROSS APPLY STRING_SPLIT(intrvwr_email, ' ') 
) as a
where a.email like '%_@__%.__%'

Please take into account that:

  1. This only works for SQL Server 2016 or above.
  2. The delimiter is a space.

You can add more columns to the query (like the row ID) so you get to know in which row is stored every email address you're getting.

  • that returns rows no emails at all. and SELECT * FROM applicants WHERE email LIKE '%_@__%.__%' returns the other junk with the emai I just need the rows with an email and the cell should only have the email itself not the phone or other stuff – Variant Feb 21 '20 at 20:19
  • that actually worked good except when there was no space between the email and junk returned [email protected]/(123) instead of [email protected]/(123) 456789 – Variant Feb 21 '20 at 21:07
  • 2
    @CrashWave Just a minor twist on the answer above (+1) https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=c81128c5f708f20ca6a702d6a048b9aa – John Cappelletti Feb 21 '20 at 21:59
  • 1
    @john-cappelletti Your minor twist soled our problem. THANK YOU – Variant Feb 27 '20 at 14:09