Python - Help on Regular Expression problem
I wonder if it's possible to write a regex for the following data pattern:
'152: ashkenazi a, benlifer a, korenblit j, silberstein sd.'
string = '152: ashkenazi a, benlifer a, korenblit j, silberstein sd.'
I am using a regular expression (with Python's re module) to extract these names:
re.findall(r'(\d+): (.+), (.+), (.+), (.+).', string, re.M | re.S)
Result:
[('152', 'ashkenazi a', 'benlifer a', 'korenblit j', 'silberstein sd')]
Now, if I try a different number of names (fewer or more than 4), the pattern doesn't work anymore, because the regex expects to find exactly 4 of them:
(.+), (.+), (.+), (.+).
I can't find a way to generalize the pattern.
This should do the trick if you only want the stuff after the numbers:
re.findall(r'\d+: (.+)(?:, .+)*\.', string, re.M | re.S)
And if you want everything, including the number:
re.findall(r'(\d+): (.+)(?:, .+)*\.', string, re.M | re.S)
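To see what these two patterns actually return, here is a small sketch that runs them over a few lines. Only the first sample comes from the question; the other two lines and the variable names (samples, names_only, with_number) are made up for illustration. Note that the greedy (.+) swallows the whole comma-separated list, so all the names come back as one string rather than one group per name.

```python
import re

# First sample is from the question; the other two are invented test lines.
samples = [
    '152: ashkenazi a, benlifer a, korenblit j, silberstein sd.',
    '153: smith j, doe a.',
    '154: lee k.',
]

names_only = r'\d+: (.+)(?:, .+)*\.'     # one group: the whole name list
with_number = r'(\d+): (.+)(?:, .+)*\.'  # two groups: number, name list

for line in samples:
    # One capture group -> findall returns a list of strings.
    print(re.findall(names_only, line, re.M | re.S))
    # Two capture groups -> findall returns a list of tuples.
    print(re.findall(with_number, line, re.M | re.S))
```

Each line gives exactly one match no matter how many names it holds, which is why a second step is still needed to separate the names.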
And if you want them separated out into a list of matches, nest the regex calls:
re.findall(r'[^,]+,|[^,]+$', re.findall(r'\d+: (.+)(?:, .+)*\.', string, re.M | re.S)[0], re.M | re.S)
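One thing to watch with the nested findall: each piece keeps its trailing comma and leading space (for example 'ashkenazi a,' and ' benlifer a,'). A possibly simpler alternative, not the approach given above, is to match the line once and then split the captured list on commas. The sketch below assumes that names never contain a comma and that each entry ends with a period; parse_entry is a hypothetical helper name.

```python
import re

def parse_entry(line):
    """Return (number, [names]) for a line like '152: a b, c d.', else None.

    Assumes names contain no commas and the entry ends with a period.
    """
    # Greedy (.+) runs up to the last period in the line.
    match = re.match(r'(\d+): (.+)\.', line, re.S)
    if match is None:
        return None
    number, name_list = match.groups()
    # Split on commas and trim the surrounding whitespace of each name.
    names = [name.strip() for name in name_list.split(',')]
    return number, names

string = '152: ashkenazi a, benlifer a, korenblit j, silberstein sd.'
print(parse_entry(string))
```

This works for any number of names, since the splitting is done by str.split rather than by a fixed number of groups in the pattern.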