Regular Expressions


Oracle’s implementation of the regular expression functionality is wrapped up in the following functions: REGEXP_LIKE, REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_COUNT, and they’re like the superhero alter egos of the more mundane LIKE, INSTR, SUBSTR, REPLACE and COUNT functions.


 REGEXP_LIKE
(source, regexp, modes)
 is probably the one you'll use most. You can use it in the WHERE and HAVING clauses of a SELECT statement. In a PL/SQL script, it returns a Boolean value. You can also use it in a CHECK constraint. The source parameter is the string or column the regex should be matched against. The regexp parameter is a string with your regular expression. The modes parameter is optional. It sets the matching modes.
SELECT * FROM mytable WHERE REGEXP_LIKE(mycolumn, 'regexp', 'i');
IF REGEXP_LIKE('subject', 'regexp') THEN /* Match */ ELSE /* No match */ END IF;
ALTER TABLE mytable ADD (CONSTRAINT mycolumn_regexp CHECK (REGEXP_LIKE(mycolumn, '^regexp$')));

REGEXP_SUBSTR(source, regexp, position, occurrence, modes) returns a string with the part of source matched by the regular expression. If the match attempt fails, NULL is returned. You can use REGEXP_SUBSTR with a single string or with a column. You can use it in SELECT clauses to retrieve only a certain part of a column. The position parameter specifies the character position in the source string at which the match attempt should start. The first character has position 1. The occurrence parameter specifies which match to get. Set it to 1 to get the first match. If you specify a higher number, Oracle will continue to attempt to match the regex starting at the end of the previous match, until it found as many matches as you specified. The last match is then returned. If there are fewer matches, NULL is returned. Do not confuse this parameter with backreferences. Oracle does not provide a function to return the part of the string matched by a capturing group. The last three parameters are optional.

SELECT REGEXP_SUBSTR(mycolumn, 'regexp') FROM mytable;
match := REGEXP_SUBSTR('subject', 'regexp', 1, 1, 'i')

REGEXP_REPLACE(source, regexp, replacement, position, occurrence, modes) returns the source string with one or all regex matches replaced. If no matches can be found, the original string is replaced. If you specify a positive number for occurrence (see the above paragraph) only that match is replaced. If you specify zero or omit the parameter, all matches are replaced. The last three parameters are optional. The replacementparameter is a string that each regex match will be replaced with. You can use the backreferences \1 through \9 in the replacement text to re-insert text matched by a capturing group. You can reference the same group more than once. There's no replacement text token to re-insert the whole regex match. To do that, put parentheses around the whole regexp, and use \1 in the replacement. If you want to insert \1 literally, use the string '\\1'. Backslashes only need to be escaped if they're followed by a digit or another backslash. To insert \\ literally, use the string '\\\\'. While SQL does not require backslashes to be escaped in strings, the REGEXP_REPLACE function does.

SELECT REGEXP_REPLACE(mycolumn, 'regexp', 'replacement') FROM mytable;
result := REGEXP_REPLACE('subject', 'regexp', 'replacement', 1, 0, 'i');

REGEXP_INSTR(source, regexp, position, occurrence, return_option, modes) returns the beginning or ending position of a regex match in the source string. This function takes the same parameters as REGEXP_SUBSTR, plus one more. Set return_option to zero or omit the parameter to get the position of the first character in match. Set it to one to get the position of the first character after the match. The first character in the string has position 1. REGEXP_INSTR returns zero if the match cannot be found. The last 4 parameters are optional.

SELECT REGEXP_INSTR(mycolumn, 'regexp', 1, 1, 0, 'i') FROM mytable;

REGEXP_COUNT(source, regexp, position, modes) returns the number of times the regex can be matched in the source string. It returns zero if the regex finds no matches at all. This function is only available in Oracle 11g and later.

SELECT REGEXP_COUNT(mycolumn, 'regexp', 1, 'i') FROM mytable;

[The possible match parameters are as follows: ‘c’, which stands for ‘case sensitive’; ‘i’, which stands for case insensitive; ‘m’, which means treat the source string as multiple lines; ‘n’ which means allow a period (.) to match new line; and ‘x’, which means ignore whitespace.]
The true power of regular expressions, however, lies not in these functions but in the amazing versatility of the available metacharacters.

. Matches any single character
x? Matches ‘x’ 0 or 1 times
x* Matches ‘x’ 0 or more times
x+ Matches ‘x’ 1 or more times
x|y Matches either ‘x’ or ‘y’
x{n} Matches ‘x’ exactly n times.
x{n,m} Matches ‘x’ at least n times, but no more than m times
x{n,} Matches ‘x’ at least n times
[xyz] Matches ‘x’,’y’ or ‘z’
[^xyz] Matches any single character other than ‘x’,’y’ and ‘z’
(xyz) Takes ‘xyz’ as a group, rather than individual characters
a-z Matches any character from ‘a’ to ‘z’

The regular expression matching information. It can be a combination of the following:

Value Description
^ Matches the beginning of a string. If used with a match_parameter of 'm', it matches the start of a line anywhere within expression.
$ Matches the end of a string. If used with a match_parameter of 'm', it matches the end of a line anywhere within expression.
* Matches zero or more occurrences.
+ Matches one or more occurrences.
? Matches zero or one occurrence.
. Matches any character except NULL.
| Used like an "OR" to specify more than one alternative.
[ ] Used to specify a matching list where you are trying to match any one of the characters in the list.
[^ ] Used to specify a nonmatching list where you are trying to match any character except for the ones in the list.
( ) Used to group expressions as a subexpression.
{m} Matches m times.
{m,} Matches at least m times.
{m,n} Matches at least m times, but no more than n times.
\n n is a number between 1 and 9. Matches the nth subexpression found within ( ) before encountering \n.
[..] Matches one collation element that can be more than one character.
[::] Matches character classes.
[==] Matches equivalence classes.
\d Matches a digit character.
\D Matches a nondigit character.
\w Matches a word character.
\W Matches a nonword character.
\s Matches a whitespace character.
\S matches a non-whitespace character.
\A Matches the beginning of a string or matches at the end of a string before a newline character.
\Z Matches at the end of a string.
*? Matches the preceding pattern zero or more occurrences.
+? Matches the preceding pattern one or more occurrences.
?? Matches the preceding pattern zero or one occurrence.
{n}? Matches the preceding pattern n times.
{n,}? Matches the preceding pattern at least n times.
{n,m}? Matches the preceding pattern at least n times, but not more than m times.
match_parameter

Optional. It allows you to modify the matching behavior for the REGEXP_LIKE condition. It can be a combination of the following:

Value Description
'c' Perform case-sensitive matching.
'i' Perform case-insensitive matching.
'n' Allows the period character (.) to match the newline character. By default, the period is a wildcard.
'm' expression is assumed to have multiple lines, where ^ is the start of a line and $ is the end of a line, regardless of the position of those characters in expression. By default, expression is assumed to be a single line.
'x' Whitespace characters are ignored. By default, whitespace characters are matched like any other character.

The list is much longer – this introductory article will not teach you all you need to know about regular expressions – and, as with every new skill, adeptness will only come with practice, through encountering real-world problems and surmounting them. Practically every string pattern can be matched using regular expressions.
An example: imagine you’re looking for a drink. However, you want to return every complexion of the word: drink, drank, drunk, drinking and drunkard. A regular expression is perfect for this, using the rules above:

SELECT col1
FROM table1
WHERE REGEXP_LIKE (col1,  ‘dr(ink(ing)?|(unk)|(ank)|(unkard))’,’i’);

The simple pattern above uses the following rules: it uses numerous pairs of parentheses to group characters, the ‘?’ to indicate that the grouping immediately before it – ‘(ing)’ – may appear 0 or 1 times, and the ‘|’ to indicate that the pattern should match any of the groupings. And the match parameter ‘i’ indicates that we want our pattern to be case insensitive.

However, before you run off and re-engineer your whole database and every application you have ever built, peppering them with regular expressions, it is worth mentioning that they are not the solution to every problem and they do have their drawbacks.

Firstly, there is their complexity. Believe me, no one is completely fluent in regular expressions; they will always be harder to read than simpler matching functions. So if LIKE, INSTR, SUBSTR and REPLACE meet your needs, you should consider sticking with them.

Also, be aware that like the functions they replace, Oracle’s regular expression functions can result in full-table scans that lead to horribly inefficient queries. Ordinary indexes cannot handle their awesomeness. Fortunately, it is possible to use regular expressions in function-based indexes, speeding up queries that use regular expressions – and even those that do not – considerably.

Oh, and by the way, the following regular expression will match those email addresses for you.

REGEXP_LIKE (email_address, '[a-zA-Z0-9._%-]+@[a-zA-Z0-9._%-]+\.[a-zA-Z]{2,4}');

And, how rude of me, I don’t think I properly introduced myself. My name is:

REGEXP_LIKE(first_name, 'Dav(e|id)');


More Examples:

Example - Match on more than one alternative

The first Oracle REGEXP_LIKE condition example that we will look at involves using the | pattern.

Let's explain how the | pattern works in the Oracle REGEXP_LIKE condition. For example:

SELECT last_name
FROM contacts
WHERE REGEXP_LIKE (last_name, 'Anders(o|e|a)n');

This REGEXP_LIKE example will return all contacts whose last_name is either Anderson, Andersen, or Andersan. The | pattern tells us to look for the letter "o", "e", or "a".

Example - Match on beginning

Next, let's use the REGEXP_LIKE condition to match on the beginning of a string. For example:

SELECT last_name
FROM contacts
WHERE REGEXP_LIKE (last_name, '^A(*)');

This REGEXP_LIKE example will return all contacts whose last_name starts with 'A'.

Example - Match on end

Next, let's use the REGEXP_LIKE condition to match on the end of a string. For example:

SELECT last_name
FROM contacts
WHERE REGEXP_LIKE (last_name, '(*)n$');

This REGEXP_LIKE example will return all contacts whose last_name ends with 'n'.



http://psoug.org/reference/regexp.html

https://oracle-base.com/articles/misc/regular-expressions-support-in-oracle