Using the re module for regular expressions in Python

马谦 · 2017-10-14 23:33:14 · Python

1. Overview

`re` is the regular expression module in Python's standard library. Its most commonly used functions are:

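`re.match(pattern, string, flags=0)`

Tries to match `pattern` at the beginning of `string` and returns a match object, or `None` if the start of the string does not match.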

`re.search(pattern, string, flags=0)`

Scans `string` and returns a match object for the first location where `pattern` matches, or `None` if there is no match anywhere.
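The difference between the two functions is easiest to see side by side. The following is a minimal sketch with a sample string chosen for illustration: `re.match` only succeeds when the pattern matches at index 0, while `re.search` scans forward through the string.

```python
import re

text = "Hellomaqian"  # sample string chosen for illustration

# match() only tries position 0, so it fails here
m1 = re.match(r"maqian", text)

# search() scans forward and finds the first occurrence
m2 = re.search(r"maqian", text)

print(m1)          # None
print(m2.group())  # maqian
print(m2.span())   # (5, 11)
```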

`re.findall(pattern, string, flags=0)`

Finds all non-overlapping matches of `pattern` in `string` and returns them as a list (an empty list if nothing matches).
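What the list contains depends on whether the pattern has groups. The sketch below uses an illustrative sample string and shows the three cases: no groups, no match, and a pattern with two groups.

```python
import re

text = "123abc456def789"  # sample string chosen for illustration

# No groups in the pattern: a list of the matched strings
digits = re.findall(r"\d{3}", text)

# No match at all: an empty list, not None
nothing = re.findall(r"xyz", text)

# More than one group: each element is a tuple of the groups
pairs = re.findall(r"(\d{3})([a-z]{3})", text)

print(digits)   # ['123', '456', '789']
print(nothing)  # []
print(pairs)    # [('123', 'abc'), ('456', 'def')]
```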

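`re.finditer(pattern, string, flags=0)`

Finds all non-overlapping matches as well, but returns an iterator that yields one match object per match instead of a list of strings.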
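Because each item is a match object, `finditer` gives access to positions as well as text. A small sketch, again with an illustrative sample string; note that the iterator can only be consumed once.

```python
import re

text = "123abc456def789"  # sample string chosen for illustration

matches = re.finditer(r"\d{3}", text)  # lazy iterator of match objects

# Collect the matched text together with its (start, end) span
found = [(m.group(), m.span()) for m in matches]
print(found)  # [('123', (0, 3)), ('456', (6, 9)), ('789', (12, 15))]

# The iterator is now exhausted, so a second pass yields nothing
leftover = list(matches)
print(leftover)  # []
```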

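`re.sub(pattern, repl, string, count=0, flags=0)`

Replaces every substring of `string` that matches `pattern` with `repl` and returns the resulting string. `count` is the maximum number of replacements; the default, 0, replaces all matches.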

`re.subn(pattern, repl, string, count=0, flags=0)`

Works the same as `sub`, but returns a tuple containing the new string and the number of replacements made.

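`pos` and `endpos` are accepted only by the methods of a compiled pattern object, for example `p.match(string, pos, endpos)` with `p = re.compile(pattern)`, not by the module-level `re.*` functions. They are available for `match`, `search`, `findall`, and `finditer`, but not for `sub` or `subn`. They restrict matching to the index range [pos, endpos), with indices starting at 0; when omitted, the whole string is used.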
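As a rough illustration of how the range arguments behave on a compiled pattern (the sample string is an assumption made for this sketch), the example below also shows that passing `pos` is not the same as slicing the string: reported indices still refer to the full string.

```python
import re

p = re.compile(r"\d{3}")
text = "123abc456def789"  # sample string chosen for illustration

# Search only within indices [6, 12), i.e. "456def"
m = p.search(text, 6, 12)
print(m.group(), m.span())  # 456 (6, 9)

# findall limited to indices [0, 9) never sees the final "789"
limited = p.findall(text, 0, 9)
print(limited)  # ['123', '456']

# With pos, indices are relative to the full string; with slicing they are not
start_with_pos = p.match(text, 6).start()
start_with_slice = p.match(text[6:]).start()
print(start_with_pos, start_with_slice)  # 6 0
```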

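2. Match objects

On success, both `re.match()` and `re.search()` return a match object (shown as `<type '_sre.SRE_Match'>` in Python 2 and as `<class 're.Match'>` in current Python 3 releases). Its commonly used methods are: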

2.1 group()

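Returns the matched string. Called without arguments it returns the whole match; `group(n)` returns the text captured by the n-th group.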

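2.2 start() and end()

Return the start index and the end index of the match within the string. The end index is exclusive: it points one position past the last matched character.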

2.3 span()

Returns the start and end indices together as a tuple `(start, end)`.

2.4 groups()

Returns a tuple containing the text of every captured group; the tuple is empty if the pattern has no groups.
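The four methods can be tried together on one match. This is only a sketch: the pattern, its group name `year`, and the sample string are made up for illustration.

```python
import re

# Hypothetical pattern with one named group and one unnamed group
p = re.compile(r"(?P<year>\d{4})-(\d{2})")
text = "date: 2017-10"

m = p.search(text)
print(m.group())        # 2017-10 (the whole match, same as group(0))
print(m.group(1))       # 2017
print(m.group("year"))  # 2017 (a named group can be read by name)
print(m.group(2))       # 10
print(m.groups())       # ('2017', '10')
print(m.span())         # (6, 13)

# start() and end() can be used directly as slice bounds
same = text[m.start():m.end()] == m.group()
print(same)             # True
```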

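3. Examples

3.1 The match method and match objects

import re

p = re.compile(r"maqian")
t = "Hellomaqian"
rs = p.match(t)  # match() only tries the start of the string
if rs is not None:
    print(type(rs))
    print(rs.group())
else:
    print("no match")  # printed: "Hellomaqian" does not start with "maqian"

rs = p.match(t, 5)  # start matching at index 5
if rs is not None:
    print(rs.group())  # maqian
    print(rs.groups())  # () because the pattern has no groups
    print(rs.start(), rs.end())  # 5 11
    print(rs.span())  # (5, 11), returned as a tuple
else:
    print("no match")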

3.2 The search method

import re

p = re.compile(r"maqian")
t = "hellomaqian"
rs = p.search(t)  # search() scans the whole string
if rs is not None:
    print(rs.group())  # maqian
else:
    print("no match")

3.3 The findall method

import re

p = re.compile(r"\d{3}")
t = "123abc456def789"
rs = p.findall(t)  # always a list; it is empty when nothing matches
if rs:
    print(rs)  # ['123', '456', '789']
else:
    print("no match")

3.4 The finditer method

import re

p = re.compile(r"\d{3}")
t = "123abc456def789"
rs = p.finditer(t)  # an iterator of match objects, never None
found = False
for m in rs:
    found = True
    print(m.group())  # prints 123, 456 and 789 in turn
if not found:
    print("no match")

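3.5 Groups

import re

# (?P<id>...) defines a named group; (?P=id) must match the same text again
x = r"(?P<id>\d{3})(\w*).*(?P=id)"
p = re.compile(x)
t = "123abc456123"
rs = p.match(t)
if rs is not None:
    print(rs.group())  # the whole match: 123abc456123
    print(rs.groups())  # the captured groups: ('123', 'abc456')
else:
    print("no match")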

3.6 Substitution

import re

p = re.compile(r"\d{3}")
t = "123abc456def789ghi888"

# sub() always returns a string (unchanged if nothing matched), so no None check is needed
rs = p.sub("000", t)  # replace every match with 000
print(rs)  # the new string: 000abc000def000ghi000

rs = p.subn("000", t, 2)  # replace at most 2 matches
print(rs)  # the new string and the number of replacements: ('000abc000def789ghi888', 2)

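The replacement can also be a function. It is called once for each match with the match object as its argument, and its return value is used as the replacement text:

import re


def rep_func(match_obj):  # replacement function; must be defined before it is used
    match_str = match_obj.group()  # the argument is a match object; group() gives the matched text
    if match_str == "123":
        return "000"
    elif match_str == "456":
        return "111"
    elif match_str == "789":
        return "222"
    else:
        return "999"


p = re.compile(r"\d{3}")
t = "123abc456def789ghi888"
rs = p.subn(rep_func, t)  # the replacement is a function
print(rs)  # ('000abc111def222ghi999', 4)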
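When the replacement function is just a fixed mapping, as with the if/elif chain here, a dictionary lookup is one possible alternative. The sketch below is a variation for illustration, not part of the original example; the name `REPLACEMENTS` is made up.

```python
import re

# Same mapping as the if/elif chain, expressed as a lookup table (hypothetical name)
REPLACEMENTS = {"123": "000", "456": "111", "789": "222"}


def rep_func(match_obj):
    # Fall back to "999" for any match that is not in the table
    return REPLACEMENTS.get(match_obj.group(), "999")


p = re.compile(r"\d{3}")
t = "123abc456def789ghi888"
result, count = p.subn(rep_func, t)
print(result)  # 000abc111def222ghi999
print(count)   # 4
```

Adding a new case then only requires a new dictionary entry rather than another branch.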