Tuesday, July 29, 2008

Regular Expression


/^.+[^\.].*\.[a-z]{2,}$/i

This pattern checks an FQHN (fully qualified host name). It reads as follows:

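^ start of the string
.+ one or more characters
[^\.] one character that is not a '.' dot (together with .+ this requires at least two characters before the last dot, which is why a.ca is wrong and ab.ca is correct)
.* zero or more characters (skips any dots in the middle so that only the last one matters)
\. a dot (in effect the last dot)
[a-z]{2,} two or more letters of the alphabet
$ end of the string
i (flag after the closing slash) case-insensitive matching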

For instance, a.ca (NO), ab.ca (YES), ab.a.ca (YES)
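
As a quick check of the three cases above, here is a minimal Java sketch. The class name HostNameCheck and the method isFqhn are made up for illustration, and the /i flag is expressed here with Pattern.CASE_INSENSITIVE.

```java
import java.util.regex.Pattern;

class HostNameCheck {
    // Same pattern as above; Pattern.CASE_INSENSITIVE stands in for the /i flag
    private static final Pattern FQHN =
            Pattern.compile("^.+[^\\.].*\\.[a-z]{2,}$", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the whole name fits the pattern
    static boolean isFqhn(String name) {
        return FQHN.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"a.ca", "ab.ca", "ab.a.ca"}) {
            System.out.println(name + " -> " + (isFqhn(name) ? "YES" : "NO"));
        }
    }
}
```

This is only a shape check: the pattern accepts any characters before the last dot, so it does not verify that each label uses characters that are legal in a host name.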

A regexp is matched with backtracking: the engine retries until the whole pattern fits, rather than parsing the string just once


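import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroups {
    public static void main(String[] args) {
        String str = "serial 5555555555555555 roger test5 AP3610";
        Pattern pattern = Pattern.compile("serial[ ](\\S+)[ ](.*)[ ](\\S+)");
        Matcher matcher = pattern.matcher(str);

        if (matcher.find()) {
            // Group 0 is the whole match; groups 1..groupCount() are the captures
            for (int i = 0; i <= matcher.groupCount(); i++) {
                System.out.println("Group" + i + ":" + matcher.group(i));
            }
        }
    }
}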

// The result is:
// Group0:serial 5555555555555555 roger test5 AP3610
// Group1:5555555555555555
// Group2:roger test5
// Group3:AP3610

\S matches a non-whitespace character.
Note that group 2 can contain whitespace, because .* matches any character, spaces included.
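
To make the backtracking visible, the sketch below runs the same input through the greedy pattern above and through a variant whose middle group is reluctant (.*?). The class name GreedyVsReluctant and the helper groups are invented for this illustration. With the greedy group, .* first runs to the end of the string and then gives characters back until the final [ ](\S+) can match. With the reluctant group, it grows one character at a time and stops at the first fit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: returns groups 1..3 of the first match, or null if none
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String str = "serial 5555555555555555 roger test5 AP3610";

        // Greedy: .* takes everything, then backs off to leave " AP3610"
        String[] greedy = groups("serial[ ](\\S+)[ ](.*)[ ](\\S+)", str);
        System.out.println(String.join("|", greedy));
        // 5555555555555555|roger test5|AP3610

        // Reluctant: .*? stops at the first space that lets the rest match
        String[] reluctant = groups("serial[ ](\\S+)[ ](.*?)[ ](\\S+)", str);
        System.out.println(String.join("|", reluctant));
        // 5555555555555555|roger|test5
    }
}
```

Because find() does not require the match to reach the end of the string, the reluctant variant stops after test5 and leaves AP3610 unmatched.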
