2

I have this column in MS Excel 2010 - which has a combination of 'zip code' and 'email-ids'

I am trying to extract these zip-codes(20530, 90012-3308 etc.) from this column.

    20530 [email protected]
    20530 [email protected]
    20530 [email protected]
    20530 [email protected]
    20004 [email protected]
    20530 [email protected]
    90012-3308  [email protected]
    90012-3308  [email protected]
    90012 [email protected]

I tried Python's re module.

import re


for i in range(1, 9):
     Cell(i, 4).value = re.findall(r'\d+', Cell(i, 1).value) #storing result in column4

I ran the regex on that column and I got this result:

[u'20530']
[u'20530']
[u'20530']
[u'20530']
[u'20004', u'9']
[u'20530', u'8']
[u'90012', u'3308']
[u'90012', u'3308']
[u'90012']

How can I extract the results, into the human readable zip-code form?

hky404
  • 1,039
  • 3
  • 17
  • 35
  • And what were the results of your regex experiment? Why not just `.split()[0]`? – jonrsharpe May 15 '14 at 19:04
  • `[u'20530'] [u'20530'] [u'20530'] [u'20530'] [u'20004', u'9'] [u'20530', u'8'] [u'90012', u'3308'] [u'90012', u'3308'] [u'90012']` – hky404 May 15 '14 at 22:49

3 Answers3

6

Why can't you just split?

>>> '20530 [email protected]'.split()
['20530', '[email protected]']

Then just grab the first element.

>>> '20530 [email protected]'.split()[0]
'20530'

For all your data:

l = ['20530 [email protected]',
     '20530 [email protected]',
     '20530 [email protected]',
     '20530 [email protected]',
     '20004 [email protected]',
     '20530 [email protected]',
     '90012-3308  [email protected]',
     '90012-3308  [email protected]',
     '90012 [email protected]']

[entry.split()[0] for entry in l]

Result

['20530', '20530', '20530', '20530', '20004', '20530', '90012-3308', '90012-3308', '90012']
Cory Kramer
  • 114,268
  • 16
  • 167
  • 218
  • Actually, I am working with 400 rows here, and manually adding commas and quotations around a particular list item is not feasible, I am talking about your **list l** – hky404 May 15 '14 at 21:35
  • You can take `str(Cell(i).value)` for example to turn it into a string. – Cory Kramer May 15 '14 at 22:42
2

The following regular expression will match each string and extract the postal code as group 1:

([\d\-]+)\s+[\w@\.]+

Here's the Python code to extract all of the postal codes at once:

import re
text = r'''    20530 [email protected]
    20530 [email protected]
    20530 [email protected]
    20530 [email protected]
    20004 [email protected]
    20530 [email protected]
    90012-3308  [email protected]
    90012-3308  [email protected]
    90012 [email protected]'''
re.compile(r'([\d\-]+)\s+[\w@\.]+').findall(text)
Jonathan Benn
  • 2,908
  • 4
  • 24
  • 28
1

just an additional note making answer specific to your original question on DataNitro.

Have done lots of DataNitro loopinfg like that and the most efficiant way of reading in a whole column is:

l = Cell("A1").vertical
# returns a list of all values starting in A1 going down to 1st blank cell

combining with @cyber's solution two liner will give you your answer:

l = Cell("A1").vertical
[entry.split()[0] for entry in l]

or if you prefer flexibility of regex Johnathan Benn answer becomomes:

l = Cell("A1").vertical
[re.compile(r'([\d\-]+)\s+[\w@\.]+').findall(entry) for entry in l]
Joop
  • 7,840
  • 9
  • 43
  • 58