In JavaScript, regular expressions are represented by RegExp objects. RegExp objects can be created using the RegExp() constructor, but more often they are created using a special literal syntax. Just as string literals are specified as characters enclosed in quotation marks, regular expression literals are specified as characters enclosed in a pair of slash characters / .

/pattern/flags
new RegExp("pattern"[, "flags"])

Here pattern is the regular expression to search for (more on replacement later), and flags is a string made of any combination of the characters g (global search), i (case-insensitive search) and m (multi-line search). The first form is used often, the second only sometimes. For example, the two calls /ab+c/i and new RegExp("ab+c", "i") are equivalent.
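
As a minimal sketch of the two creation forms (the variable names are illustrative and not part of the original text), the literal and the constructor can be compared directly. Note that a backslash has to be doubled inside the string passed to the constructor:

```javascript
// Literal syntax and constructor syntax describing the same pattern.
const literal = /ab+c/i;
const constructed = new RegExp("ab+c", "i");

// Both objects expose the same pattern text and the same flags.
const samePattern = literal.source === constructed.source;
const sameFlags = literal.flags === constructed.flags;

// Inside a string literal, "\\d" is needed to produce the regex token \d.
const digitsLiteral = /\d+/g;
const digitsConstructed = new RegExp("\\d+", "g");

console.log(samePattern, sameFlags);
console.log("a1b22".match(digitsConstructed));
```

The constructor form is mainly useful when the pattern is only known at run time, for example when it is built from user input.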

Search options

When creating a regular expression, we can specify additional search options

Characters in JavaScript regular expressions
Symbol Correspondence
Alphanumeric characters Correspond to themselves
\0 NUL character (\u0000)
\t Tab (\u0009)
\n Line feed (\u000A)
\v Vertical tab (\u000B)
\f Form feed (\u000C)
\r Carriage return (\u000D)
\xnn Latin character specified by the hexadecimal number nn; for example, \x0A is the same as \n
\uxxxx Unicode character specified by the hexadecimal number xxxx; for example, \u0009 is the same as \t
\cX The control character ^X; for example, the sequence \cJ is equivalent to the newline character \n
\ For regular characters, makes them special. For example, the expression /s/ simply looks for the character "s", but with \ before s, /\s/ denotes a whitespace character. Conversely, if the character is special, for example *, then \ makes it a regular "asterisk" character. For example, /a*/ searches for 0 or more consecutive "a" characters. To find an "a" followed by an asterisk, "a*", put \ in front of the special character: /a\*/ .
^ Indicates the beginning of the input data. If the multiline search flag ("m") is set, it also matches at the start of each new line. For example, /^A/ will not find the "A" in "an A", but will find the first "A" in "An A."
$ Indicates the end of the input data. If the multiline search flag is set, it also matches at the end of each line. For example, /t$/ will not find "t" in "eater", but will find it in "eat".
* Indicates repetition 0 or more times. For example, /bo*/ will find "boooo" in "A ghost booooed" and "b" in "A bird warbled", but will find nothing in "A goat grunted".
+ Indicates repetition 1 or more times. Equivalent to {1,}. For example, /a+/ will match the "a" in "candy" and all the "a" in "caaaaaaandy".
? Indicates that the element may or may not be present. For example, /e?le?/ will match "el" in "angel" and "le" in "angle." If used immediately after one of the quantifiers *, +, ?, or {}, it specifies a "non-greedy" search (repeating the minimum number of times possible, up to the nearest next pattern element), as opposed to the default "greedy" mode, which maximizes the number of repetitions even if the next pattern element also matches. In addition, ? is used in lookahead assertions and non-capturing groups, which are described in the table under x(?=y), x(?!y), and (?:x).
. (Decimal point) represents any character other than a line terminator: \n, \r, \u2028 or \u2029 (you can use [\s\S] to search for any character, including newlines). For example, /.n/ will match "an" and "on" in "nay, an apple is on the tree", but not "nay".
(x) Finds x and remembers it. These are called "capturing parentheses". For example, /(foo)/ will find and remember "foo" in "foo bar." The found substring is stored in the search result array or in the predefined properties of the RegExp object: $1, ..., $9. In addition, the parentheses combine what is contained in them into a single pattern element. For example, (abc)* means: repeat abc 0 or more times.
(?:x) Finds x, but does not remember what it finds. These are called "non-capturing parentheses". The found substring is not stored in the results array or in the RegExp properties. Like all parentheses, they combine what is in them into a single subpattern.
x(?=y) Finds x only if x is followed by y. For example, /Jack(?=Sprat)/ will only match "Jack" if it is followed by "Sprat". /Jack(?=Sprat|Frost)/ will only match "Jack" if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" will appear in the search result.
x(?!y) Finds x only if x is not followed by y. For example, /\d+(?!\.)/ will only match a number if it is not followed by a decimal point. /\d+(?!\.)/.exec("3.141") will find 141, but not 3.141.
x|y Finds x or y. For example, /green|red/ will match "green" in "green apple" and "red" in "red apple."
{n} Where n is a positive integer. Finds exactly n repetitions of the preceding element. For example, /a{2}/ will not find the "a" in "candy," but will find both a's in "caandy," and the first two a's in "caaandy."
{n,} Where n is a positive integer. Finds n or more repetitions of an element. For example, /a{2,}/ will not find "a" in "candy", but will find all "a" in "caandy" and in "caaaaaaandy."
{n,m} Where n and m are positive integers. Finds from n to m repetitions of the element.
[xyz] Character set. Finds any of the listed characters. You can indicate a range by using a hyphen. For example, [abcd] is the same as [a-d]. Matches "b" in "brisket" and "a" and "c" in "ache".
[^xyz] Any character other than those specified in the set. You can also specify a range. For example, [^abc] is the same as [^a-c]. Finds "r" in "brisket" and "h" in "chop."
[\b] Finds the backspace character. (Not to be confused with \b.)
\b Finds a (Latin) word boundary, such as the position between a letter and a space. (Not to be confused with [\b].) For example, /\bn\w/ will match "no" in "noonday"; /\wy\b/ will find "ly" in "possibly yesterday."
\B Finds a position that is not a word boundary. For example, /\w\Bn/ will match "on" in "noonday", and /y\B\w/ will match "ye" in "possibly yesterday."
\cX Where X is a letter from A to Z. Indicates a control character in a string. For example, /\cM/ represents the Ctrl-M character.
\d Finds a digit. In JavaScript it is equivalent to [0-9]. For example, /\d/ or /[0-9]/ will match the "2" in "B2 is the suite number."
\D Finds a non-digit character. Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ will match the "B" in "B2 is the suite number."
\s Finds any whitespace character, including space, tab, newline, and other Unicode whitespace characters. For example, /\s\w*/ will match " bar" in "foo bar."
\S Finds any character except whitespace. For example, /\S\w*/ will match "foo" in "foo bar."
\v Vertical tab character.
\w Finds any word (Latin alphabet) character, including letters, digits and underscores. Equivalent to [A-Za-z0-9_]. For example, /\w/ will match "a" in "apple," "5" in "$5.28," and "3" in "3D."
\W Finds any non-word (Latin) character. Equivalent to [^A-Za-z0-9_]. For example, /\W/ and /[^A-Za-z0-9_]/ will equally match "%" in "50%."
Working with Regular Expressions in Javascript

Working with regular expressions in JavaScript is implemented using methods of the RegExp and String classes.

regexp.exec(str) - finds a match of the pattern in a string. Returns an array (if there is a match) and updates the lastIndex property of regexp, or returns null if nothing is found. With the g modifier, each call to this function returns the next match after the previous one; this is implemented by keeping the offset of the last search in lastIndex.

str.match(regexp) - finds part of a string using a pattern. If the g modifier is specified, match() returns an array of all matches or null (rather than an empty array). Without the g modifier, this function works like exec().

regexp.test(str) - checks whether a string matches a pattern. Returns true if there is a match, and false if there is none.

str.split(regexp) - splits the string it is called on into an array of substrings, using the argument as a delimiter.

str.replace(regexp, mix) - returns a modified string in accordance with the pattern (regular expression). The first parameter can also be a string rather than a regular expression. Without the g modifier, the method replaces only the first occurrence; with the g modifier, a global replacement occurs, i.e. all occurrences in the string are changed. mix is the replacement: it can be a string, a replacement template, or a function.
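
The behaviour of these methods can be sketched as follows. The helper name findAll is hypothetical (it is not a built-in), and the guards against a missing g flag and against empty matches are additions of this sketch:

```javascript
// Collect every match of a global pattern with repeated exec() calls.
function findAll(regexp, text) {
  if (!regexp.global) {
    // Without g, exec() always restarts at 0 and the loop would never end.
    throw new Error("findAll expects a pattern with the g flag");
  }
  regexp.lastIndex = 0;
  const found = [];
  let m;
  // exec() resumes from regexp.lastIndex and returns null when done.
  while ((m = regexp.exec(text)) !== null) {
    found.push({ match: m[0], index: m.index });
    // An empty match does not advance lastIndex, so step forward manually.
    if (m[0] === "") regexp.lastIndex++;
  }
  return found;
}

const hits = findAll(/o+/g, "foo boo");
const allAtOnce = "foo boo".match(/o+/g);   // String method, same matches
const hasMatch = /o+/.test("foo boo");      // boolean answer only
const parts = "a, b,c".split(/\s*,\s*/);    // commas with optional spaces

console.log(hits, allAtOnce, hasMatch, parts);
```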

Special characters in the replacement string

Replacement via function

If you specify a function as the second parameter, it is executed for each match. A function can dynamically generate and return a substitution string. The first parameter of the function is the found substring. If the first argument to replace is a RegExp object, then the next n parameters contain nested parentheses matches. The last two parameters are the position in the line where the match occurred and the line itself.
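
A small sketch of replacement via a function is shown here. The temperature conversion and the name toFahrenheit are invented for illustration; only the argument order (match, groups, offset, whole string) comes from the description above:

```javascript
// Replacement via a function: it is called once for every match.
// Arguments: the whole match, each captured group, the offset, the full string.
function toFahrenheit(text) {
  return text.replace(/(\d+)C\b/g, function (match, degrees, offset, whole) {
    const f = Number(degrees) * 9 / 5 + 32;
    return f + "F";
  });
}

const converted = toFahrenheit("0C and 100C");
console.log(converted);
```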

Regular expressions allow you to perform a flexible search for words and expressions in texts with the aim of removing, extracting or replacing them.

Syntax:

//The first option for creating a regular expression
var regexp=new RegExp(pattern, modifiers);
//Second option for creating a regular expression
var regexp=/pattern/modifiers;

pattern allows you to specify a pattern of characters to search for.

modifiers allow you to customize search behavior:

  • i - search without taking into account the case of letters;
  • g - global search (all matches in the string will be found, not just the first);
  • m - multi-line search.
Search for words and expressions

The simplest use of regular expressions is to search for words and expressions in various texts.

Here is an example of using search using modifiers:

//Set the regular expression rv1
rv1=/Russia/;
//Set the regular expression rv2
rv2=/Russia/g;
//Set the regular expression rv3
rv3=/Russia/ig;
//Text: Russia is the largest state in the world. Russia borders on 18 countries. RUSSIA is a successor state of the USSR.
//rv1 matches only the first "Russia".
//rv2 matches both occurrences of "Russia", but not "RUSSIA".
//rv3 matches all three: "Russia", "Russia" and "RUSSIA".
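
The effect of the modifiers can be checked directly with match(). This is only a sketch that reuses the sample sentence; the variable names are illustrative:

```javascript
const text = "Russia is the largest state in the world. " +
  "Russia borders on 18 countries. " +
  "RUSSIA is a successor state of the USSR.";

// No modifiers: one result object describing the first match only.
const first = text.match(/Russia/);
// g: every case-sensitive match.
const allExact = text.match(/Russia/g);
// i and g together: every match regardless of letter case.
const allAnyCase = text.match(/Russia/ig);

console.log(first.index, allExact.length, allAnyCase.length);
```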

Special characters

In addition to regular characters, special characters (metacharacters) can be used in regular expression patterns. Special characters with descriptions are shown in the table below:

Special character Description
. Matches any character except the end of line character.
\w Matches any word character: a letter, a digit or an underscore.
\W Matches any character that is not a letter, a digit or an underscore.
\d Matches digits.
\D Matches characters that are not digits.
\s Matches whitespace characters.
\S Matches non-whitespace characters.
\b Matches will only be found at word boundaries (beginning or ending).
\B Matches will be searched only on non-word boundaries.
\n Matches the newline character.

/* The reg1 expression will find all words starting with two arbitrary characters and ending with "vet". Since the words in the sentence are separated by spaces, we add the special character \s at the beginning and at the end */
reg1=/\s..vet\s/g;
txt=" privet civet velvet closet ";
document.write(txt.match(reg1) + "<br>");
/* The reg2 expression will find all words starting with three arbitrary characters and ending with "vet" */
reg2=/\s...vet\s/g;
document.write(txt.match(reg2) + "<br>");
txt1=" pri2vet privet pri 1vet ";
/* The reg3 expression will find all words that start with "pri" followed by 1 digit and end with "vet" */
var reg3=/pri\dvet/g;
document.write(txt1.match(reg3) + "<br>");
// The expression reg4 will find all the digits in the text
var reg4=/\d/g;
txt2="5 years of study, 3 years of sailing, 9 years of shooting.";
document.write(txt2.match(reg4) + "<br>");


Symbols in square brackets

Using square brackets, for example [keyu], you can specify a group of characters to search for.

The ^ symbol before a group of characters in square brackets, for example [^kvg], indicates that you need to search for all characters except the specified ones.

By using a dash (-) between characters in square brackets [a-z] you can specify a range of characters to search for.

You can also search for numbers using square brackets.

//Set the regular expression reg1
reg1=/\sco[tdg]\s/g;
//Set a text string txt1
txt1=" cot coat cod comb cog cover ";
//Using the regular expression reg1, search the string txt1
document.write(txt1.match(reg1) + "<br>");
reg2=/\sslo[^tg]/g;
txt2=" slot slow slog ";
document.write(txt2.match(reg2) + "<br>");
reg3=/[0-9]/g;
txt3="5 years of study, 3 years of swimming, 9 years of shooting";
document.write(txt3.match(reg3));


Quantifiers

A quantifier is a construct that allows you to specify how many times the preceding character or group of characters must appear in a match.

Syntax:

//The preceding character must occur exactly x times
{x}
//The preceding character must occur between x and y times inclusive
{x,y}
//The preceding character must occur at least x times
{x,}
//Indicates that the preceding character must occur 0 or more times
*
//Indicates that the preceding character must occur 1 or more times
+
//Indicates that the preceding character must occur 0 or 1 time
?

//Set the regular expression rv1
rv1=/ko{5}shka/g
//Set the regular expression rv2
rv2=/ko{3,}shka/g
//Set the regular expression rv3
rv3=/ko+shka/g
//Set the regular expression rv4
rv4=/ko?shka/g
//Set the regular expression rv5
rv5=/ko*shka/g
//Text: kshka koshka kooshka koooshka kooooshka koooooshka kooooooshka
//rv1 matches only koooooshka (exactly five "o")
//rv2 matches koooshka, kooooshka, koooooshka and kooooooshka (three or more "o")
//rv3 matches every word except kshka (one or more "o")
//rv4 matches kshka and koshka (zero or one "o")
//rv5 matches every word (zero or more "o")
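
The quantifiers can be verified with a short sketch. The test words are generated rather than typed by hand, and the helper name oCounts is hypothetical; it returns the number of "o" letters in each word that the pattern accepts:

```javascript
// Words "kshka", "koshka", "kooshka", ... with 0 to 6 letters "o".
const words = [0, 1, 2, 3, 4, 5, 6].map(n => "k" + "o".repeat(n) + "shka");

// For a given pattern, list how many "o" letters the matching words contain.
function oCounts(regexp) {
  return words
    .filter(w => regexp.test(w))
    .map(w => w.length - "kshka".length);
}

const exactlyFive = oCounts(/ko{5}shka/);
const threeOrMore = oCounts(/ko{3,}shka/);
const oneOrMore = oCounts(/ko+shka/);
const zeroOrOne = oCounts(/ko?shka/);
const zeroOrMore = oCounts(/ko*shka/);

console.log(exactlyFive, threeOrMore, oneOrMore, zeroOrOne, zeroOrMore);
```

The patterns are written without the g flag here because test() on a global pattern keeps state in lastIndex between calls.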

Please note: if you want to use any special character (such as . * + ? ( ) or { }) as a regular character, you must precede it with a \.

Using parentheses

By enclosing part of the regular expression pattern in parentheses, you tell the expression to remember the match found by that part of the pattern. The saved match can be used later in your code.

For example, the regular expression /(Dmitry)\sVasiliev/ will find the string “Dmitry Vasiliev” and remember the substring “Dmitry”.

In the example below, we use the replace() method to change the order of words in the text. We use $1 and $2 to access stored matches.

var regexp = /(Dmitry)\s(Vasiliev)/;
var text = "Dmitry Vasiliev";
var newtext = text.replace(regexp, "$2 $1");
document.write(newtext);


Parentheses can be used to group characters before quantifiers.
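
As a brief illustration of grouping before a quantifier (the sample strings are invented), compare a quantifier applied to a group with one applied to a single character:

```javascript
// With parentheses, + repeats the whole group "ha".
const grouped = "hahaha".match(/(ha)+/);
// Without parentheses, + repeats only the preceding character "a".
const ungrouped = "hahaha".match(/ha+/);
const stretched = "haaa".match(/ha+/);

console.log(grouped[0], ungrouped[0], stretched[0]);
```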

Modifiers

The minus symbol (-) placed next to a modifier (except for U) creates its negation.

Special characters Analogue Description
() subpattern, nested expression
[] wildcard (character class)
{a,b} number of occurrences from "a" to "b"
| logical "or"; in the case of single-character alternatives use []
\ escape special character
. any character except line feed
\d [0-9] decimal digit
\D [^\d] any character other than a decimal digit
\f form feed (page break)
\n line feed
\pL letter in UTF-8 encoding when using the u modifier (PCRE syntax; JavaScript writes this as \p{L} with the u flag)
\r carriage return
\s [ \t\v\r\n\f] whitespace character
\S [^\s] any character except whitespace
\t tab
\w [0-9A-Za-z_] any digit, letter or underscore
\W [^\w] any character other than a digit, letter, or underscore
\v vertical tab

Special characters within a character class

Position within a string
Symbol Example Sample text Description
^ ^a aaa aaa start of line
$ a$ aaa aaa end of line
\A \Aa aaa aaa (two lines) beginning of the text (PCRE; not available in JavaScript)
\z a\z aaa aaa (two lines) end of the text (PCRE; not available in JavaScript)
\b a\b, \ba aaa aaa word boundary; assertion: the previous character is a word character and the next one is not, or vice versa
\B \Ba\B aaa aaa not a word boundary
\G \Ga aaa aaa position of the previous successful search; the search stops at the 4th position, where there is no a (PCRE; not available in JavaScript)
Anchors

Anchors in regular expressions indicate the beginning or end of something, for example of lines or words. They are represented by certain symbols. For example, a pattern matching a string that starts with a number would look like this:

^[0-9]+

Here the ^ character denotes the beginning of the line. Without it, the pattern would match any string containing a digit.

Character classes

Character classes in regular expressions match a certain set of characters at once. For example, \d matches any digit from 0 to 9 inclusive, \w matches letters, digits and the underscore, and \W matches all other characters. A pattern identifying letters, digits and whitespace looks like this:

[\w\s]

POSIX

POSIX is a relatively new addition to the regular expression family. The idea, as with character classes, is to use shortcuts that represent some group of characters.

Assertions

Almost everyone has trouble understanding assertions at first, but as you become more familiar with them, you'll find yourself using them quite often. Assertions provide a way to say, "I want to find every word in this document that includes the letter "q" that is not followed by "werty"."

[^\s]*q(?!werty)[^\s]*

The above code starts by searching for any characters other than whitespace ([^\s]*) followed by q. The parser then reaches a lookahead assertion. This automatically makes the preceding element (character, group, or character class) conditional: it will match the pattern only if the assertion is true. In our case, the assertion is negative (?!), that is, it will be true if what is being sought in it is not found.

So, the parser checks the next few characters against the proposed pattern (werty). If they are found, then the assertion is false, which means the character q will be "ignored", that is, it will not match the pattern. If werty is not found, then the assertion is true, and everything is in order with q. Then the search continues for any characters other than whitespace ([^\s]*).
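
A quick way to see this assertion at work in JavaScript is to run the pattern over a few words. The sample sentence below is invented for the sketch:

```javascript
// Words containing "q" that is not immediately followed by "werty".
const pattern = /[^\s]*q(?!werty)[^\s]*/g;

const found = "qwerty quick Iraq".match(pattern);
const none = "qwerty".match(pattern);

console.log(found, none);
```

The word "qwerty" is skipped because its only "q" is followed by "werty", so the negative lookahead fails at that position.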

Quantifiers

Quantifiers allow you to define a part of a pattern that must be repeated several times in a row. For example, if you want to find out whether a document contains a string of 10 to 20 (inclusive) letters "a", then you can use this pattern:

a{10,20}

By default, quantifiers are "greedy". Therefore, the quantifier +, meaning "one or more times", will match as many characters as possible. Sometimes this causes problems, in which case you can tell the quantifier to stop being greedy (to become "lazy") by using a special modifier. Look at this code:

".*"

This pattern matches the text enclosed in double quotes. However, your source line could be something like this:

<a href="helloworld.htm" title="Hello World">Hello World</a>

The above template will find the following substring in this line:

"helloworld.htm" title="Hello World"

It turned out to be too greedy, grabbing the largest piece of text it could.

".*?"

This pattern also matches any characters enclosed in double quotes. But the lazy version (notice the ? modifier) looks for the smallest possible occurrence, and will therefore find each double-quoted substring individually:

"helloworld.htm" "Hello World"

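The difference between the greedy and the lazy pattern can be reproduced with a short sketch. It assumes the source line was an HTML link with the two quoted attributes shown in the results above:

```javascript
// Assumed source line: a link with two double-quoted attribute values.
const html = '<a href="helloworld.htm" title="Hello World">Hello World</a>';

// Greedy: .* runs to the last quote it can reach.
const greedy = html.match(/".*"/g);
// Lazy: .*? stops at the nearest closing quote.
const lazy = html.match(/".*?"/g);

console.log(greedy);
console.log(lazy);
```
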
Escaping in regular expressions

Regular expressions use some characters to represent various parts of a template. However, a problem arises if you need to find one of these characters in a string as a regular character. A dot, for example, in a regular expression means "any character other than a line break." If you need to find a dot in a string, you can't just use "." as a template: it would match almost anything. So you need to tell the parser that this dot should be treated as a regular dot and not as "any character". This is done using an escape character.

An escape character preceding a character such as a dot causes the parser to ignore its function and treat it as a normal character. There are several characters that require such escaping in most templates and languages. You can find them in the lower right corner of the cheat sheet (“Meta Symbols”).

The pattern for finding a dot is:

\.

Other special characters in regular expressions match unusual elements in text. Line breaks and tabs, for example, can be typed on the keyboard but are likely to confuse programming languages. The escape character is used here to tell the parser to treat the next character as a special character rather than a regular letter or number.

Special escaping characters in regular expressions

String substitution

String substitution is described in detail in the next paragraph, “Groups and Ranges,” but the existence of “passive” groups should be mentioned here. These are groups that are ignored during substitution, which is very useful if you want to use an "or" condition in a pattern, but do not want that group to take part in the substitution.

Groups and Ranges

Groups and ranges are very, very useful. It's probably easier to start with ranges. They allow you to specify a set of suitable characters. For example, to check whether a string contains hexadecimal digits (0 to 9 and A to F), you would use a range like this:

[A-Fa-f0-9]

To check the opposite, use a negative range, which in our case fits any character except numbers from 0 to 9 and letters from A to F:

[^A-Fa-f0-9]

Groups are most often used when an "or" condition is needed in a pattern; when you need to refer to part of a template from another part of it; and also when substituting strings.

Using "or" is very simple: the following pattern looks for "ab" or "bc":

(ab|bc)

If a regular expression needs to refer to one of the previous groups, use \n, where n is the number of the desired group. You may want a pattern that matches the letters "aaa" or "bbb" followed by a number and then the same three letters. This pattern is implemented using groups:

(aaa|bbb)[0-9]+\1

The first part of the pattern looks for "aaa" or "bbb", combining the letters found into a group. This is followed by a search for one or more digits ([0-9]+), and finally \1. The last part of the pattern references the first group and looks for the same thing: it matches the text already found by the first part of the pattern, not the pattern itself. So "aaa123bbb" will not satisfy the above pattern, since \1 will look for "aaa" after the number.
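
In JavaScript the backreference behaves as described. The following sketch assumes the digits are written as [0-9]+ and uses invented sample strings:

```javascript
// \1 must repeat the exact text captured by the first group.
const pattern = /(aaa|bbb)[0-9]+\1/;

const sameLetters = pattern.test("aaa123aaa");
const otherLetters = pattern.test("bbb7bbb");
const mixedLetters = pattern.test("aaa123bbb");
const captured = pattern.exec("aaa123aaa")[1];

console.log(sameLetters, otherLetters, mixedLetters, captured);
```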

One of the most useful tools in regular expressions is string substitution. When replacing text, you can reference the found group using $n . Let's say you want to highlight all the words "wish" in a text in bold. To do this, you should use a regular expression replace function, which might look like this:

Replace(pattern, replacement, subject)

The first parameter will be something like this (you may need a few extra characters for this particular function):

([^A-Za-z0-9])(wish)([^A-Za-z0-9])

It will find any occurrences of the word "wish" along with the previous and next characters, as long as they are not letters or numbers. Then your substitution could be like this:

$1<b>$2</b>$3

It will replace the entire string found by the pattern. We start the replacement with the first character found (which is not a letter or a number), referring to it as $1. Without this, we would simply remove this character from the text. The same goes for the end of the substitution ($3). In the middle we added the HTML tag for bold (of course, you can use CSS or <strong> instead), wrapping the second group found by the template ($2).
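
In JavaScript the same substitution can be sketched with String.prototype.replace. The function name highlightWish and the sample sentences are illustrative only:

```javascript
// Wrap every standalone "wish" in <b> tags, keeping the neighbouring characters.
function highlightWish(text) {
  return text.replace(/([^A-Za-z0-9])(wish)([^A-Za-z0-9])/g, "$1<b>$2</b>$3");
}

const highlighted = highlightWish("I wish you a wish.");
const untouched = highlightWish("a wishful thought");

console.log(highlighted);
console.log(untouched);
```

Because the pattern requires a character on both sides, a "wish" at the very start or end of the text is not matched; a word boundary (\b) would be one way around that.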

Template modifiers

Template modifiers are used in several languages, most notably Perl. They allow you to change how the parser works. For example, the i modifier causes the parser to ignore case.

Regular expressions in Perl are surrounded by the same character at the beginning and at the end. This can be any character (most often “/” is used), and it looks like this:

/pattern/

Modifiers are added to the end of this line, like this:

/pattern/i

Meta characters

Finally, the last part of the table contains meta characters. These are characters that have special meaning in regular expressions. So if you want to use one of them as a regular character, it needs to be escaped. To check for the presence of a parenthesis in the text, use the following pattern:

\(

The cheat sheet is a general guide to regular expression patterns without taking into account the specifics of any language. It is presented in the form of a table that fits on one printed sheet of A4 size. Created under a Creative Commons license based on a cheat sheet authored by Dave Child.


The syntax of regular expressions is quite complex and requires serious effort to learn. The best guide to regular expressions today is Jeffrey Friedl's book "Mastering Regular Expressions", which, in the author's words, allows you to "learn to think in regular expressions."

Basic Concepts

Regular expression is a means of processing strings or a sequence of characters that defines a text pattern.

Modifier - is intended to “instruct” the regular expression.

Metacharacters are special characters that serve as commands in the regular expression language.

A regular expression is set as a regular variable, only a slash is used instead of quotes, for example: var reg=/reg_expression/

By the simplest templates we mean those templates that do not require any special characters.

Let's say our task is to replace all letters "r" (lowercase and capital) with the capital letter "R" in the phrase Regular expressions.

Create a template var reg=/r/ and use the replace method to carry out our plan:

var str="Regular expressions"
var reg=/r/
var result=str.replace(reg, "R")
document.write(result)

As a result, we get the line RegulaR expressions: the replacement occurred only at the first occurrence of the letter "r", taking case into account.

But this result does not fit the conditions of our task. Here we need the modifiers "g" and "i", which can be used both separately and together. These modifiers are placed at the end of the regular expression pattern, after the slash, and have the following meanings:

modifier "g" - sets the search in the line as "global", i.e. in our case, the replacement will occur for all occurrences of the letter "r". Now the template looks like this: var reg=/r/g. Substituting it into our code

var str="Regular expressions"
var reg=/r/g
var result=str.replace(reg, "R")
document.write(result)

we get the string RegulaR expRessions.

modifier "i" - specifies a case-insensitive search in a string. By adding this modifier to our template, var reg=/r/gi, the capital "R" at the start of the phrase is matched as well, so every letter "r" in either case is replaced and we get the desired result of our task: RegulaR expRessions.

Special characters (metacharacters)

Metacharacters specify the type of characters of the searched string, the way the searched string is surrounded in the text, as well as the number of characters of a particular type in the viewed text. Therefore, metacharacters can be divided into three groups:

  • Metacharacters for searching for matches.
  • Quantitative metacharacters.
  • Positioning metacharacters.
Metacharacters for matching

Metacharacter - Meaning - Description and examples

\b - word boundary
specifies a condition under which the pattern must match at the beginning or end of a word
/\ber/ matches error, does not match hero or player
/er\b/ matches player, does not match hero or error
/\ber\b/ does not match hero, player or error; it can only match er as a separate word

\B - not a word boundary
specifies a condition under which the pattern must not match at the beginning or end of a word
/\Ber/ matches hero or player, does not match error
/er\B/ matches error or hero, does not match player
/\Ber\B/ matches hero, does not match player or error

\d - digit from 0 to 9
/\d\d\d\d/ matches any four-digit number

\D - not a digit
/\D\D\D\D/ will not match 2005 or 05.g or №126 etc.

\s - single whitespace character
matches the space character
/over\sbyte/ matches only over byte

\S - single non-whitespace character
any single character except whitespace
/over\Sbyte/ matches over-byte or over_byte, does not match over byte or over--byte

\w - letter, digit or underscore
/A\w/ matches A1 or AB, does not match A+

\W - not a letter, digit or underscore
/A\W/ does not match A1 or AB, matches A+

. - any character
any signs, letters, digits, etc.
/.../ matches any three characters: ABC or !@4 or 1 q

[] - character set
specifies a condition under which the pattern matches any one of the characters enclosed in square brackets
/[QA]WERTY/ matches QWERTY and AWERTY

[^] - set of excluded characters
specifies a condition under which the pattern must not match any of the characters enclosed in square brackets
/[^QA]WERTY/ does not match QWERTY or AWERTY

The characters listed in the "Metacharacters for matching" table should not be confused with the escape sequences used in strings, such as \t - tab, \n - newline, etc.
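
The word-boundary rows are easy to get wrong, so here is a small sketch that checks them against the three sample words. The helper name matching is hypothetical:

```javascript
const samples = ["error", "hero", "player"];

// Return the sample words in which the pattern finds a match.
function matching(regexp) {
  return samples.filter(word => regexp.test(word));
}

const startsWord = matching(/\ber/);      // "er" at the start of a word
const endsWord = matching(/er\b/);        // "er" at the end of a word
const notAtStart = matching(/\Ber/);      // "er" not at the start
const notAtEnd = matching(/er\B/);        // "er" not at the end
const strictlyInside = matching(/\Ber\B/); // "er" strictly inside a word

console.log(startsWord, endsWord, notAtStart, notAtEnd, strictlyInside);
```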

Quantitative metacharacters

Metacharacter - Number of matches - Example

* - zero or more times
/Ja*vaScript/ matches JvaScript, JavaScript or JaaavaScript, does not match JovaScript

? - zero or one time
/Ja?vaScript/ matches only JvaScript or JavaScript

+ - one or more times
/Ja+vaScript/ matches JavaScript, JaavaScript or JaaavaScript, does not match JvaScript

{n} - exactly n times
/Ja{2}vaScript/ matches only JaavaScript

{n,} - n or more times
/Ja{2,}vaScript/ matches JaavaScript or JaaavaScript, does not match JvaScript or JavaScript

{n,m} - at least n times, but not more than m times
/Ja{2,3}vaScript/ matches only JaavaScript or JaaavaScript

Each character listed in the Quantitative Metacharacters table applies to the one preceding character or metacharacter in the regular expression.

Positioning metacharacters

The last set of metacharacters are intended to indicate whether to look for (if important) the substring at the beginning of the line or at the end.

Some methods for working with templates

replace - we already used this method at the very beginning of the article; it searches for a pattern and replaces the found substring with a new substring.

exec - this method matches a string against the pattern. If pattern matching fails, null is returned. Otherwise, the result is an array: the first element is the part of the source string that satisfies the whole pattern, and the following elements are the substrings captured by the parenthesized groups.

For example:


var reg=/(\d+)\.(\d+)\.(\d+)/
var arr=reg.exec("I was born on 15.09.1980")
document.write("Date of birth: ", arr[0], "<br>")
document.write("Birthday: ", arr[1], "<br>")
document.write("Birth month: ", arr[2], "<br>")
document.write("Year of birth: ", arr[3], "<br>")

As a result, we get four lines:
Date of birth: 15.09.1980
Birthday: 15
Birth month: 09
Year of birth: 1980

Conclusion

The article does not show all the capabilities and delights of regular expressions; for a deeper study of this issue, I advise you to study the RegExp object. I also want to draw your attention to the fact that the basic syntax of regular expressions is largely the same in JavaScript and PHP. For example, a simple check of whether an e-mail is entered correctly can use the same regular expression in both languages: /[0-9a-z_]+@[0-9a-z_]+\.[a-z]{2,3}/i .

JavaScript - Lesson 14. Regular expressions The topic of regular expressions is quite extensive and cannot be covered in one lesson, but the purpose of our lessons is to give you a basic understanding of the javascript language and its capabilities, therefore it is impossible to ignore regular expressions.

First, let's figure out what it is.
A regular expression is an instruction, written in a specially developed language (RegExp), that describes what a string must look like to be considered a match for the pattern.

What is this for? For example:

  • To organize a search for something in the text.
  • To replace some substrings with others.
  • To check the correctness of user input (you've probably come across a situation more than once when you entered your email address into some form and received an error like "Invalid email").

We won't go into details, but let's look at how to define regular expressions. There are two ways; in this lesson we will look at one (creation in literal notation):

var p = /pattern/flags;

Where
pattern - the basis of a regular expression, which defines the string matching criteria. It consists of literals and metacharacters.
flags - flags (modifiers) that specify additional pattern-matching options.

For example:

var par = /[0-9a-z]+/i;

Here [0-9a-z]+ is the pattern, which literally means "any digits and letters, 1 or more times" (we'll see how to write a pattern below).

i is a flag indicating that character case does not matter.
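The second of the two ways mentioned above is the RegExp() constructor. As a rough sketch (the variable names and sample strings are illustrative), the literal and constructor forms below build equivalent patterns. Note that in the string form a backslash has to be doubled.

```javascript
// Literal notation: the pattern is written between slashes, flags follow.
const literal = /[0-9a-z]+/i;

// Constructor notation: pattern and flags are passed as strings.
const constructed = new RegExp("[0-9a-z]+", "i");

console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// Inside a string, "\\d" is needed to produce the \d metacharacter.
const digitsLiteral = /\d+/;
const digitsConstructed = new RegExp("\\d+");
console.log(digitsLiteral.test("abc123"));     // true
console.log(digitsConstructed.test("abc123")); // true
```

The constructor form is mainly useful when the pattern is assembled at run time, for example from user input.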

To make it clearer what we're talking about, let's look at an example. Let's assume that we have a form where the user enters an e-mail address and a password. We want the input to be checked for correctness when the "Register" button is clicked.

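The HTML page, titled "JavaScript regular expressions", contains a form headed "Registration form" with an e-mail field named mail, a password field named pas, and a "Register" button that calls the function prov_adress() with the form as its argument.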

So what should the function prov_adress() do? To begin, we need two variables in which we will place the values entered by the user:

function prov_adress(obj) {
  var adr = obj.mail.value;
  var par = obj.pas.value;
}

Now we need to set patterns (regular expressions) with which we will compare what the user entered. Here, I’ll just give them; we’ll talk about how to compose them later:

function prov_adress(obj) {
  var adr = obj.mail.value;
  var par = obj.pas.value;
  var adr_pattern = /[0-9a-z_]+@[0-9a-z_]+\.[a-z]{2,5}/i;
  var par_pattern = /[0-9a-z]+/i;
}

Now we check for pattern matching. To do this, we will use the method test object RegExp:

function prov_adress(obj) {
  var adr = obj.mail.value;
  var par = obj.pas.value;
  var adr_pattern = /[0-9a-z_]+@[0-9a-z_]+\.[a-z]{2,5}/i;
  var par_pattern = /[0-9a-z]+/i;
  var prov = adr_pattern.test(adr);
  var prov1 = par_pattern.test(par);
}

The line adr_pattern.test(adr) means the following: check whether the string adr contains a sequence matching the regular expression adr_pattern. The test method returns a boolean value (true or false).

All we have to do is indicate in our function what to do in case of a successful (or unsuccessful) check:

function prov_adress(obj) {
  var adr = obj.mail.value;
  var par = obj.pas.value;
  var adr_pattern = /[0-9a-z_]+@[0-9a-z_]+\.[a-z]{2,5}/i;
  var par_pattern = /[0-9a-z]+/i;
  var prov = adr_pattern.test(adr);
  var prov1 = par_pattern.test(par);
  if (prov == true && prov1 == true) {
    alert("You are registered!");
  } else {
    alert("The entered data is incorrect!");
  }
}
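To try the same check outside a browser, the logic can be separated from the form and from alert(). The following is only a sketch under that assumption: checkInput is a hypothetical name, and the patterns are written out as the lesson describes them (digits, letters and underscores around @, then a period and 2 to 5 letters; digits and letters for the password).

```javascript
// Hypothetical DOM-free version of the check: takes plain strings, returns a boolean.
function checkInput(adr, par) {
  const adrPattern = /[0-9a-z_]+@[0-9a-z_]+\.[a-z]{2,5}/i;
  const parPattern = /[0-9a-z]+/i;
  // test() returns true when the pattern is found somewhere in the string.
  return adrPattern.test(adr) && parPattern.test(par);
}

console.log(checkInput("user_1@example.com", "leopard")); // true
console.log(checkInput("user_1.example.com", "leopard")); // false: no @
console.log(checkInput("user_1@example.com", ""));        // false: empty password
```

A function that returns a boolean instead of calling alert() can be reused both in a form handler and in automated checks.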

Done, I hope you understand the essence of what we are doing. But before we check the operation of our script, let's see what our regular expressions consist of.

Let's take the regular expression for our password - /[0-9a-z]+/i:

  • /[0-9a-z]+/ - a pattern in which:
    • 0-9 - any digit.

    • a-z - any lowercase letter from a to z.

    • [ ] - square brackets mean that the pattern may contain any of the literals listed in them (in our case, digits and lowercase letters).

    • + - indicates that this part of the pattern (i.e., what is in square brackets) can be repeated one or more times.


  • i - a flag indicating that character case does not matter.

In other words, our regular expression specifies that the password consists of digits and letters repeated 1 or more times (i.e. it can consist of one digit, one letter, many digits, many letters, or digits and letters).

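For example, if the user enters "2", "a3b" or "leopard" in the password field, the password will be considered correct. We would like "ab&s" or "24?" to be rejected, because they contain special characters that the pattern does not allow. Note, however, that test() only checks whether a matching sequence occurs somewhere in the string, so the pattern as written still accepts them (it finds "ab" and "24"). To reject such input, the pattern must cover the whole string, which is done with the ^ and $ metacharacters described below: /^[0-9a-z]+$/i.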
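The difference between matching somewhere in the string and matching the whole string is easy to check. A small sketch, assuming the password pattern consists of digits and letters as described above; the variable names loose and strict are illustrative:

```javascript
// Unanchored: true if any run of digits/letters occurs anywhere in the string.
const loose = /[0-9a-z]+/i;
// Anchored: ^ and $ require the whole string to consist of digits/letters.
const strict = /^[0-9a-z]+$/i;

for (const password of ["2", "a3b", "leopard", "ab&s", "24?"]) {
  console.log(password, loose.test(password), strict.test(password));
}
// "ab&s" and "24?" pass the loose check but fail the strict one.
```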

I hope it is now clear how and why you can use regular expressions; all that remains is to learn how to compose them. Strictly speaking, the task of composing a regular expression comes down to creating its pattern. And the pattern, as you remember, can consist of literals and metacharacters.

Let's start with the simplest thing - literals:

  • An ordinary character represents itself. For example, /abc/ - only the string "abc" matches this pattern.

  • [a-z] - any lowercase letter from a to z. For example, /[a-z]/ - this pattern matches 26 strings: "a", "b", "c"... "z"

  • [A-Z] - any capital letter from A to Z.

  • [0-9] - any digit.

If we want to indicate that there can be several digits or letters, we have to use metacharacters:
  • * - indicates that the preceding character (or part of the pattern, if enclosed in square brackets) can be repeated 0 or more times. For example, /ab*c/ means the character a, then any number of b characters (possibly none), followed by the character c, i.e. strings such as "ac", "abc", "abbbbbbc", etc.

  • + - indicates that the preceding character (or part of the pattern, if enclosed in square brackets) can be repeated 1 or more times. For example, /ab+c/ means the character a, then any number of b characters (but not less than 1), followed by the character c, i.e. strings such as "abc", "abbbbbbc", etc.

  • . - indicates that this place can contain any single character except a newline character. For example, the pattern /ab.c/ matches the following strings: "ab6c", "abxc", "ab=c", etc.

  • ? - indicates that the preceding character (or part of the pattern, if enclosed in square brackets) can be repeated 0 or 1 time. For example, /ab?c/ means the character a, then at most one character b, followed by the character c, i.e. the strings "ac" and "abc".

  • {n} - indicates that the preceding character (or part of the pattern, if enclosed in square brackets) must be repeated exactly n times. For example, /ab{3}c/ means the character a, then 3 characters b, followed by the character c, i.e. the string "abbbc".

  • {n,} - indicates that the preceding character (or part of the pattern, if enclosed in square brackets) can be repeated n or more times. For example, /ab{3,}c/ means the character a, then 3 or more characters b, followed by the character c, i.e. strings such as "abbbc", "abbbbbbbc", etc.

  • {n,m} - indicates that the preceding character (or part of the pattern, if enclosed in square brackets) can be repeated from n to m times. For example, /ab{1,3}c/ means the character a, then 1 to 3 characters b, followed by the character c, i.e. the strings "abc", "abbc", "abbbc".

  • [ ] - such a pattern matches any single character belonging to the set defined in the brackets. A set is specified by an enumeration or by a range. For example, the pattern /[abc]/ matches the following strings: "a", "b", "c".

  • [^] - such a pattern matches any single character that does not belong to the set defined in the brackets. For example, the pattern /[^abc]/ matches the strings "f", "x", "Z", but does not match the strings "a", "b", "c".

  • ^ - indicates that the match must be at the beginning of the line. For example, the pattern /^abc/ matches the strings "abcd", "abcfh", but does not match the strings "dabc", "cbabc", etc.

  • $ - indicates that the match must be at the end of the line. For example, the pattern /abc$/ matches the strings "dabc", "fhabc", but does not match the strings "abcd", "abccb", etc.

  • | - indicates several alternative patterns. For example, the pattern /ab|c/ will match the following strings: "ab" and "c".

  • \ - serves for escaping, i.e. a backslash before an ordinary character indicates that it should be interpreted as special. For example:
    • \d - matches any digit from 0 to 9.

    • \D - matches anything except a digit.

    • \s - matches a whitespace character.

    • \S - matches anything except a whitespace character.

    • \w - matches a letter, digit or underscore.

    • \W - matches anything except a letter, digit or underscore.


  • For example, the pattern /x\d\d/ will match the strings: "x01", "x25", etc., but will not match the strings: "A15", "x0A"...

    The backslash is also used to make a special character literal. For example, if we need to find the string "a*b", then we will specify the following pattern /a\*b/.
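The metacharacters listed above can be checked one by one. The following sketch is only an illustration: the helper fullMatch and the sample strings are assumptions, and each pattern is wrapped in ^(?:...)$ so that the whole string has to match.

```javascript
// Hypothetical helper: true only if the entire string matches the pattern.
function fullMatch(pattern, text) {
  return new RegExp("^(?:" + pattern.source + ")$", pattern.flags).test(text);
}

console.log(fullMatch(/ab*c/, "ac"));         // true: zero b's are allowed
console.log(fullMatch(/ab+c/, "ac"));         // false: at least one b is required
console.log(fullMatch(/ab?c/, "abbc"));       // false: at most one b
console.log(fullMatch(/ab.c/, "ab=c"));       // true: . is any single character
console.log(fullMatch(/ab{3}c/, "abbbc"));    // true: exactly three b's
console.log(fullMatch(/ab{1,3}c/, "abbbbc")); // false: four b's is too many
console.log(fullMatch(/[^abc]/, "x"));        // true: x is not in the set
console.log(fullMatch(/ab|c/, "c"));          // true: second alternative
console.log(fullMatch(/x\d\d/, "x0A"));       // false: A is not a digit
console.log(fullMatch(/a\*b/, "a*b"));        // true: \* is a literal asterisk
```

The non-capturing group (?:...) keeps alternatives such as ab|c together, so the anchors apply to the whole pattern and not only to its first or last branch.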

Using the above literals and metacharacters, you can create any patterns you like (that is, regular expressions). Let's, for example, see what we wrote for the e-mail pattern in our example:

var adr_pattern = /[0-9a-z_]+@[0-9a-z_]+\.[a-z]{2,5}/i;

So, we indicated that an e-mail address consists of digits, letters, and underscores 1 or more times, followed by @, followed by digits, letters, and underscores 1 or more times, followed by a period, followed by letters 2 to 5 times. This is roughly what e-mail addresses look like.
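To see how each part of this pattern corresponds to a piece of an address, here is a sketch with capturing groups. The groups, the anchors ^ and $, and the sample addresses are additions for illustration and not part of the lesson's pattern.

```javascript
// Same structure as the lesson's e-mail pattern, with groups and anchors added.
const emailParts = /^([0-9a-z_]+)@([0-9a-z_]+)\.([a-z]{2,5})$/i;

const match = emailParts.exec("Ivan_99@mail.ru");
console.log(match[1]); // Ivan_99 (digits, letters, underscores before @)
console.log(match[2]); // mail    (digits, letters, underscores after @)
console.log(match[3]); // ru      (2 to 5 letters after the period)

// Addresses outside this simplified shape are rejected, even valid ones.
console.log(emailParts.exec("first.last@example.com")); // null: period before @
console.log(emailParts.exec("user@example.museum"));    // null: 6-letter ending
```

The last two lines show why the pattern describes only roughly what e-mail addresses look like: it is a simplification and not a full address validator.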

Now, knowing exactly what we specified in the pattern, you can check how the example works.

