1. The Backslash Plague
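A minimal sketch of the problem this heading names, assuming a sample input `\section{intro}` chosen purely for illustration: Python processes backslash escapes in a regular string literal before `re` ever sees the pattern, so matching one literal backslash takes four backslashes in a regular string but only two in a raw string.

```python
import re

text = r"\section{intro}"   # hypothetical input containing one literal backslash

# Regular string literal: Python collapses "\\\\" to two backslashes,
# which the regex engine then reads as one escaped literal backslash.
plain = re.compile("\\\\section")

# Raw string literal: backslashes reach the regex engine unchanged.
raw = re.compile(r"\\section")

print(plain.pattern == raw.pattern)   # True
print(raw.match(text).group())        # \section
```

Writing patterns as raw strings is the usual way to keep them readable.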
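2. Methods for Performing Matches
match(): determines whether the RE matches at the beginning of the string
search(): scans the string for the first position where the RE matches (it need not start at the first character)
findall(): finds all substrings matched by the RE and returns them as a list
finditer(): finds all substrings matched by the RE and returns them as an iterator of match objects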
>>> import re
>>> p = re.compile('[a-z]+')
>>> print(p.match(""))
None
>>> m = p.match('tempo')
>>> m.group()
'tempo'
>>> m.start(), m.end()
(0, 5)
>>> m.span()
(0, 5)
>>> m = p.search('::: message')
>>> m.group()
'message'
>>> p = re.compile(r'\d+')
>>> it = p.finditer('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
>>> for match in it:
...     print(match.span())
...
(0, 2)
(22, 24)
(40, 42)
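findall() appears in the method list but not in the session above. As a small sketch, reusing the same sample sentence, the following contrasts it with finditer(): the former returns plain strings, the latter match objects that also carry positions.

```python
import re

p = re.compile(r"\d+")
text = "12 drummers drumming, 11 pipers piping, 10 lords a-leaping"

numbers = p.findall(text)                      # list of matched substrings
spans = [m.span() for m in p.finditer(text)]   # match objects expose positions

print(numbers)   # ['12', '11', '10']
print(spans)     # [(0, 2), (22, 24), (40, 42)]
```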
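'|': alternation ("or"); A|B matches either A or B
'^': matches at the beginning of a line. By default it matches only at the start of the string; in MULTILINE mode it also matches immediately after each newline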
>>> print(re.search('^From', 'From Here to Eternity'))
<re.Match object; span=(0, 4), match='From'>
>>> print(re.search('^From', 'Reciting From Memory'))
None
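'\A': matches only at the start of the string, even in MULTILINE mode
'$': matches at the end of a line. By default it matches only at the end of the string (or just before a trailing newline); in MULTILINE mode it also matches just before each newline
'\Z': matches only at the end of the string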
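To make the difference between these anchors concrete, here is a hedged sketch on a hypothetical two-line input (the text itself is an assumption, not from the original post). It counts where '^' and '\A' match, and shows what '$' and '\Z' see when the string ends with a newline.

```python
import re

text = "From: alice\nFrom: bob\n"   # hypothetical two-line input

starts_default = re.findall(r"^From", text)                   # start of string only
starts_multiline = re.findall(r"^From", text, re.MULTILINE)   # start of every line
starts_abs = re.findall(r"\AFrom", text, re.MULTILINE)        # \A ignores MULTILINE

ends_default = re.findall(r"\w+$", text)                      # before the final newline
ends_multiline = re.findall(r"\w+$", text, re.MULTILINE)      # before every newline
ends_abs = re.findall(r"\w+\Z", text)                         # very end of string only

print(len(starts_default), len(starts_multiline), len(starts_abs))  # 1 2 1
print(ends_default)     # ['bob']
print(ends_multiline)   # ['alice', 'bob']
print(ends_abs)         # []
```

The last result is empty because the string ends with a newline, and '\Z' does not step back over it the way '$' does.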
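'\b': word boundary, i.e. the position at the start or end of a word; a word ends at whitespace or at any non-alphanumeric character. It matches a position and consumes no characters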
>>> p = re.compile(r'\bclass\b')
>>> print(p.search('no class at all'))
<re.Match object; span=(3, 8), match='class'>
>>> print(p.search('the declassified algorithm'))
None
>>> print(p.search('one subclass is'))
None
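Without the r prefix, Python turns '\b' into a backspace character before the pattern reaches re, so the RE no longer matches as intended:
>>> p = re.compile('\bclass\b')
>>> print(p.search('no class at all'))
None
>>> print(p.search('\b' + 'class' + '\b'))
<re.Match object; span=(0, 7), match='\x08class\x08'>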
'\B': the opposite of '\b'; matches only where the current position is not a word boundary
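As a quick sketch of '\B', the same three sample sentences used for '\b' can be run against the opposite pattern. Only the sentence where "class" sits strictly inside a longer word should match.

```python
import re

p = re.compile(r"\Bclass\B")

standalone = p.search("no class at all")             # boundaries on both sides
inside = p.search("the declassified algorithm")      # word characters on both sides
suffix = p.search("one subclass is")                 # boundary after "class"

print(standalone)      # None
print(inside.span())   # (6, 11)
print(suffix)          # None
```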
4. Grouping
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
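Groups are numbered by counting opening parentheses from left to right, which is why the nested group above is number 2. A short sketch of two related match-object calls, groups() and multi-argument group(), on the same pattern:

```python
import re

m = re.match(r"(a(b)c)d", "abcd")

all_groups = m.groups()       # every captured group, starting from group 1
picked = m.group(2, 1, 2)     # several groups at once, in the order requested
inner_span = m.span(2)        # position of the nested group

print(all_groups)   # ('abc', 'b')
print(picked)       # ('b', 'abc', 'b')
print(inner_span)   # (1, 2)
```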
5. Non-capturing Groups
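A group whose opening parenthesis is followed by ?: is a non-capturing group. Use it when you need grouping but do not need to retrieve the group's contents afterwards.
>>> m = re.match("([abc])+", "abc")
>>> m.groups()
('c',)
>>> m = re.match("(?:[abc])+", "abc")
>>> m.groups()
()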
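One practical consequence, sketched here on a hypothetical URL-like string (an assumption for illustration only): making a group non-capturing keeps the numbering of the groups you do care about unchanged.

```python
import re

capturing = re.match(r"(https?)://(\w+)", "https://example")
non_capturing = re.match(r"(?:https?)://(\w+)", "https://example")

print(capturing.groups())       # ('https', 'example')
print(non_capturing.groups())   # ('example',)
print(non_capturing.group(1))   # example
```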
6. Named Groups
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search('(((( Lots of punctuation )))')
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
>>> p = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')
>>> p.search('Paris in the the spring').group()
'the the'
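Named groups can also be retrieved all at once as a dictionary. A minimal sketch, assuming a made-up "first last" name format that is not part of the original post:

```python
import re

p = re.compile(r"(?P<first>\w+) (?P<last>\w+)")
m = p.match("Jane Doe")   # hypothetical input

fields = m.groupdict()    # maps each group name to its matched text
print(fields["first"], fields["last"])   # Jane Doe
print(m.group("last") == m.group(2))     # True: named groups keep their numbers
```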
7. Lookahead Assertions
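(?=...) Positive lookahead: succeeds if the contained expression matches at the current position, without consuming any of the string.
(?!...) Negative lookahead: the opposite of the above; succeeds only if the contained expression does not match at the current position, again without consuming any of the string.
For example, to match filenames that have an extension while excluding those ending in .bat or .exe, use .*[.](?!bat$|exe$)[^.]*$ (the [^.]* keeps the test on the last extension when a name contains several dots).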
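The filename pattern is easier to trust after trying it on a few names. The sketch below uses the variant ending in [^.]*$, so that only the final extension is tested; the filenames are invented for illustration.

```python
import re

p = re.compile(r".*[.](?!bat$|exe$)[^.]*$")

# Hypothetical filenames covering the interesting cases.
names = ["autoexec.bat", "setup.exe", "sample.batch",
         "foo.bar", "archive.tar.gz", "foo.bar.bat", "README"]

accepted = [n for n in names if p.match(n)]
print(accepted)   # ['sample.batch', 'foo.bar', 'archive.tar.gz']
```

sample.batch is accepted because the trailing $ inside the lookahead requires the extension to be exactly bat, and README is rejected simply because the pattern requires a dot.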
8. Splitting Strings
>>> p = re.compile(r'\W+')
>>> p2 = re.compile(r'(\W+)')
>>> p.split('This... is a test.')
['This', 'is', 'a', 'test', '']
>>> p2.split('This... is a test.')
['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p.split('This is a test, short and sweet, of split().', maxsplit=3)
['This', 'is', 'a', 'test, short and sweet, of split().']
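The trailing empty string in the results above appears because the input ends with a delimiter. As a small sketch, one common way to drop such empty pieces is a list comprehension; the module-level re.split() is used here instead of a compiled pattern.

```python
import re

text = "This... is a test."

words = [w for w in re.split(r"\W+", text) if w]   # filter out empty strings
first_cut = re.split(r"\W+", text, maxsplit=1)     # split at the first delimiter only

print(words)       # ['This', 'is', 'a', 'test']
print(first_cut)   # ['This', 'is a test.']
```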
9. Search and Replace
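Given p = re.compile('...'), p.sub(replacement, string[, count=0]) returns the string obtained by replacing the leftmost non-overlapping matches of the RE with replacement; count limits the number of replacements (0 means replace all). p.subn() does the same but returns a tuple of the new string and the number of replacements made.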
>>> p = re.compile('(blue|white|red)')
>>> p.sub('colour', 'blue socks and red shoes')
'colour socks and colour shoes'
>>> p.sub('colour', 'blue socks and red shoes', count=1)
'colour socks and red shoes'
>>> p = re.compile('(blue|white|red)')
>>> p.subn('colour', 'blue socks and red shoes')
('colour socks and colour shoes', 2)
>>> p.subn('colour', 'no colours at all')
('no colours at all', 0)
>>> p = re.compile(r'section{ (?P<name> [^}]* ) }', re.VERBOSE)
>>> p.sub(r'subsection{\1}', 'section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<1>}', 'section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<name>}', 'section{First}')
'subsection{First}'
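The three replacement syntaxes above are interchangeable in that example, but \g<...> matters when a digit follows the reference. A hedged sketch, assuming a made-up "Last, First" input format:

```python
import re

p = re.compile(r"(?P<last>\w+), (?P<first>\w+)")

by_name = p.sub(r"\g<first> \g<last>", "Doe, Jane")   # reorder using group names
by_number = p.sub(r"\2 \1", "Doe, Jane")              # same reorder using numbers
with_digit = p.sub(r"\g<2>0", "Doe, Jane")            # group 2 followed by a literal 0

print(by_name)      # Jane Doe
print(by_number)    # Jane Doe
print(with_digit)   # Jane0
```

Writing the last replacement as r"\20" would be read as a reference to group 20 rather than group 2 followed by a zero.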
>>> def hexrepl(match):
...     "Return the hex string for a decimal number"
...     value = int(match.group())
...     return hex(value)
...
>>> p = re.compile(r'\d+')
>>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.')
'Call 0xffd2 for printing, 0xc000 for user code.'
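A replacement function can also decide per match whether to change anything. The sketch below is an assumption-laden illustration, not from the original post: the `prices` table, the `{name}` placeholder syntax, and the `fill` helper are all hypothetical.

```python
import re

prices = {"apple": 3, "pear": 5}   # hypothetical lookup table

def fill(match):
    """Replace {name} with its price, leaving unknown names untouched."""
    name = match.group(1)
    # Returning the whole match (group 0) leaves the placeholder as it was.
    return str(prices[name]) if name in prices else match.group(0)

result = re.sub(r"\{(\w+)\}", fill, "apple={apple}, kiwi={kiwi}")
print(result)   # apple=3, kiwi={kiwi}
```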
Using Regular Expressions in Python,布布扣,bubuko.com