Delete string between two symbols, if both symbols appear in the string

Question

I want to delete the substring between a '+' and a '@' symbol, together with the '+' itself, if the '+' exists.

d = {'1' : '[email protected]', '2' : '[email protected]', '3' : '[email protected]', '4':'[email protected]'}

test_frame = pd.Series(d)

test_frame
Out[6]: 
1    [email protected]
2            [email protected]
3      [email protected]
4               [email protected]
dtype: object

So, the result should be:

s = {'1' : '[email protected]', '2' : '[email protected]', '3' : '[email protected]', '4':'[email protected]'}

test_frame_result = pd.Series(s)

test_frame_result
Out[10]: 
1    [email protected]
2      [email protected]
3    [email protected]
4         [email protected]
dtype: object

I tried it with split, but it fails because only some lines contain a '+'.

Is there an elegant solution that avoids an explicit loop over all the lines? The original dataset has quite a lot of them.

Thanks!

If you don't "loop through all the lines" how can you process all of them? — user202729, Feb 06 '18 at 15:24
Does [this](https://stackoverflow.com/questions/4444477/how-to-tell-if-a-string-contains-a-certain-character-in-javascript) solve your problem "only some lines contain a +"? — user202729, Feb 06 '18 at 15:24
Regarding the first comment: if I only wanted the first 5 letters, I could do that without an explicit loop: test_frame_result.str[:5] — maxtenzin, Feb 06 '18 at 15:27
What about [this](https://stackoverflow.com/questions/26577516/pandas-test-if-string-contains-one-of-the-substrings-in-a-list)? Also implicitly the slice operator is (most likely) implemented using loops. Just that a loop in C is (often) faster than a loop in a higher level language. — user202729, Feb 06 '18 at 15:28

score 1 · Accepted Answer · answered Feb 06 '18 at 15:59

Is this sufficient?

import pandas as pd
d = {'1' : '[email protected]', 
         '2' : '[email protected]', 
         '3' : '[email protected]', 
         '4':'[email protected]'}

test_frame = pd.Series(d)
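print(test_frame)

# Pick the rows that contain a '+', then delete everything from the '+'
# up to (but not including) the '@'. Recent pandas versions need regex=True here.
found = test_frame[test_frame.str.contains(r'\+')]
test_frame[found.index] = found.str.replace(r'\+[^@]*', "", regex=True)
print(test_frame)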

Output:

(Before)

1    [email protected]
2            [email protected]
3      [email protected]
4               [email protected]
dtype: object

(After)

1    [email protected]
2      [email protected]
3    [email protected]
4         [email protected]
dtype: object
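
A side note on the pattern itself: a regex substitution leaves strings without a match unchanged, so the pre-filtering with str.contains is not strictly required. The sketch below shows this on plain strings with the standard re module. The addresses are made up for illustration (the ones in the post are obscured), and strip_plus_tag is a hypothetical helper name, not part of pandas.

```python
import re

# Hypothetical sample addresses, invented for this illustration.
addresses = [
    "alice+news@example.com",
    "bob@example.com",
    "carol+a+b@example.org",
]

# '\+' matches a literal plus sign; '[^@]*' then consumes every character
# up to, but not including, the next '@'.
PLUS_TAG = re.compile(r"\+[^@]*")


def strip_plus_tag(address):
    """Remove the '+...' part before the '@'; strings without '+' pass through."""
    return PLUS_TAG.sub("", address)


cleaned = [strip_plus_tag(a) for a in addresses]
print(cleaned)
```

The same pattern should work in a single call on the whole Series, along the lines of test_frame.str.replace(r'\+[^@]*', '', regex=True). One caveat: the substitution removes every match, so a '+' appearing after the '@' would be stripped as well.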

Glad it was helpful. — Dmitry Duplyakin, Feb 06 '18 at 16:02

score 0 · Answer 2 · answered Feb 06 '18 at 15:59

Found a solution - probably not the most elegant though:

import pandas as pd

test_frame = pd.DataFrame({'email':['[email protected]','[email protected]','[email protected]','[email protected]']})

test_frame
Out[22]: 
                      email
0  [email protected]
1          [email protected]
2    [email protected]
3             [email protected]

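# Rows whose address contains a '+' (raw string avoids an invalid-escape warning).
mask = test_frame.email.str.contains(r'\+')
# Keep the text before the '+' and the text after the '@' that follows it.
local = test_frame.loc[mask, 'email'].str.partition('+')[0]
domain = test_frame.loc[mask, 'email'].str.partition('+')[2].str.partition('@')[2]
test_frame.loc[mask, 'email'] = local + '@' + domain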

test_frame
Out[24]: 
                email
0  [email protected]
1    [email protected]
2  [email protected]
3       [email protected]
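
For comparison, here is a rough sketch of the same partition idea on a single plain string. Python's built-in str.partition returns an empty separator when the symbol is missing, which makes the "only if both symbols appear" condition easy to express. drop_plus_tag is a hypothetical helper name and the addresses in the usage lines are invented.

```python
def drop_plus_tag(address):
    """Drop the text from the first '+' up to the following '@'.

    The string is returned unchanged unless a '+' is found and an '@'
    appears somewhere after it.
    """
    local, plus, rest = address.partition("+")
    if not plus:
        return address  # no '+' at all
    _, at, domain = rest.partition("@")
    if not at:
        return address  # '+' present, but no '@' after it
    return local + "@" + domain


# Made-up example inputs.
print(drop_plus_tag("alice+news@example.com"))
print(drop_plus_tag("bob@example.com"))
```

On a Series, such a function could be applied with test_frame.email.map(drop_plus_tag). That runs a Python-level call per element, so it is likely slower than the vectorised string methods on large data.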
