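1. In general, a regular expression is a way of describing strings.
In other languages, \\ means "I want to insert a plain (literal) backslash into the regular expression; please don't give it any special meaning." In Java, \\ means "I am inserting a regular expression backslash, so the character that follows has a special meaning." For example, to indicate a single digit, the regular expression string should be \\d. To insert a literal backslash, write \\\\. Newlines, tabs, and the like need only a single backslash: \n\t.
? means the preceding character is optional. For example, -? means there may be a minus sign in front.
+ means one or more of the preceding expression.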
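The escaping rules above are easy to get wrong, so here is a minimal sketch that exercises them. The class name `EscapeDemo` and its two helper methods are made up for illustration; only `String.matches` comes from the JDK.

```java
class EscapeDemo {
    // The Java literal "-?\\d+" is the regex -?\d+ :
    // an optional minus sign followed by one or more digits.
    static boolean isInteger(String s) {
        return s.matches("-?\\d+");
    }

    // The Java literal "\\\\" is the regex \\ : exactly one literal backslash.
    static boolean isSingleBackslash(String s) {
        return s.matches("\\\\");
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-1234"));      // true
        System.out.println(isInteger("5678"));       // true
        System.out.println(isInteger("+911"));       // false: + is not allowed by -?
        System.out.println(isSingleBackslash("\\")); // true: the input is one backslash
        System.out.println("a\tb".matches("a\tb"));  // true: \t needs only a single backslash
    }
}
```

Note that `"\\"` as an input string is a single backslash character, while `"\\\\"` as a regex string is the two-character regex that matches it.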
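2. The String class has built-in regular expression tools:
1) matches: checks whether the entire string matches the regular expression
"-1234".matches("-?\\d+"); // true
2) split: splits the string wherever the regular expression matches (the matched parts are removed)
"you must do it".split("\\W+"); // [you, must, do, it]
3) replaceFirst, replaceAll: replacement
"you found it".replaceFirst("f\\w+", "located"); // "you located it"
"you found it".replaceAll("f\\w+", "located"); // "you located it"
3. Creating regular expressions (import java.util.regex)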
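The String helpers from section 2 are shortcuts over java.util.regex. The sketch below shows the equivalent Pattern calls, assuming you want to compile a pattern yourself; the class name `StringRegexShortcuts` and the `...ViaPattern` helpers are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class StringRegexShortcuts {
    // Same result as input.matches(regex): the whole input must match.
    static boolean matchesViaPattern(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Same result as input.split(regex).
    static String[] splitViaPattern(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    // Same result as input.replaceAll(regex, replacement).
    static String replaceAllViaPattern(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(matchesViaPattern("-?\\d+", "-1234"));
        // true
        System.out.println(Arrays.toString(splitViaPattern("\\W+", "you must do it")));
        // [you, must, do, it]
        System.out.println(replaceAllViaPattern("f\\w+", "you found it", "located"));
        // you located it
    }
}
```

When the same expression is applied many times, keeping the compiled Pattern in a variable avoids recompiling it on every call, which the String shortcuts do internally.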
Construct | Matches |
---|---|
Characters | |
x | The character x |
\\ | The backslash character |
\0n | The character with octal value 0n (0 <= n <= 7) |
\0nn | The character with octal value 0nn (0 <= n <= 7) |
\0mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) |
\xhh | The character with hexadecimal value 0xhh |
\uhhhh | The character with hexadecimal value 0xhhhh |
\x{h...h} | The character with hexadecimal value 0xh...h (Character.MIN_CODE_POINT <= 0xh...h <= Character.MAX_CODE_POINT) |
\t | The tab character ('\u0009') |
\n | The newline (line feed) character ('\u000A') |
\r | The carriage-return character ('\u000D') |
\f | The form-feed character ('\u000C') |
\a | The alert (bell) character ('\u0007') |
\e | The escape character ('\u001B') |
\cx | The control character corresponding to x |
Character classes | |
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z or A through Z, inclusive (range) |
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
[a-z&&[def]] | d, e, or f (intersection) |
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |
Predefined character classes | |
. | Any character (may or may not match line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\h | A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000] |
\H | A non-horizontal whitespace character: [^\h] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\v | A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029] |
\V | A non-vertical whitespace character: [^\v] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
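
A few rows of the table can be checked directly. This is a small sketch, not an exhaustive test; the class name `CharClassDemo` and the helper `m` are invented for illustration.

```java
class CharClassDemo {
    // Hypothetical helper: true if the whole string s matches regex.
    static boolean m(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p
        System.out.println(m("[a-d[m-p]]", "n"));   // true
        System.out.println(m("[a-d[m-p]]", "e"));   // false
        // Intersection: only d, e, or f
        System.out.println(m("[a-z&&[def]]", "e")); // true
        System.out.println(m("[a-z&&[def]]", "a")); // false
        // Subtraction: a through z, except b and c
        System.out.println(m("[a-z&&[^bc]]", "b")); // false
        System.out.println(m("[a-z&&[^bc]]", "d")); // true
        // Predefined classes (note the doubled backslash in Java source)
        System.out.println(m("\\w+", "snake_case9")); // true
        System.out.println(m("\\s", "\t"));           // true
        System.out.println(m("\\D", "7"));            // false
    }
}
```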

interface CharSequence {
    char charAt(int i);
    int length();
    CharSequence subSequence(int start, int end);
    String toString();
}
6. Pattern and Matcher
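import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("abc+");
Matcher m = p.matcher("abcabcac");
// find() advances to the next match on each call and returns false when none is left
while (m.find()) {
    System.out.println("Match \"" + m.group() + "\" at positions " + m.start() + "-" + (m.end() - 1));
    // Match "abc" at positions 0-2
    // Match "abc" at positions 3-5
}
Matcher also has matches,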
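
For comparison, here is a rough sketch of how find() differs from Matcher's matches() and lookingAt(), using the same "abc+" pattern. The class name `MatcherDemo`, the helper `findAll`, and the `text@start-end` output format are assumptions made for this example. The helper takes a CharSequence, so a StringBuilder works as input too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    // Collects every match as "text@start-end" (end index inclusive).
    static List<String> findAll(String regex, CharSequence input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group() + "@" + m.start() + "-" + (m.end() - 1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abc+", "abcabcac"));
        // [abc@0-2, abc@3-5]

        Matcher m = Pattern.compile("abc+").matcher("abcabcac");
        // matches() succeeds only if the entire input matches the pattern.
        System.out.println(m.matches());   // false
        // lookingAt() only requires a match at the beginning of the input.
        System.out.println(m.lookingAt()); // true

        System.out.println(Pattern.compile("abc+").matcher("abccc").matches());
        // true

        // Any CharSequence is accepted, not just String.
        System.out.println(findAll("abc+", new StringBuilder("xabcc")));
        // [abcc@1-4]
    }
}
```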
Original article: http://blog.csdn.net/libinjlu/article/details/23875201