
How to Find Pattern Matches Across Multiple Lines With grep

grep is a command-line text-search utility that finds patterns and strings in files and other input. Most patterns match within a single line, but it’s often useful to match across line breaks.

Matching Across Multiple New Lines With grep

grep struggles with multi-line matches because it processes its input one line at a time. awk and sed are often better tools for the job, since both support range patterns: two expressions separated by a comma select every line from the first line matching the start pattern through the next line matching the end pattern.

awk '/from/,/to/' file
sed -n '/from/,/to/p' file
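
To make the range behavior concrete, here is a minimal sketch that runs both commands against a small throwaway file. The file contents and the variable names (sample, awk_out, sed_out) are made up for illustration and are not part of the original commands.

```shell
# Hypothetical sample input, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'header' 'from here' 'middle' 'up to here' 'footer' > "$sample"

# Select every line from the first one matching /from/
# through the next one matching /to/.
awk_out=$(awk '/from/,/to/' "$sample")
sed_out=$(sed -n '/from/,/to/p' "$sample")

printf '%s\n' "$awk_out"
rm -f "$sample"
```

With this input, both commands should print the three lines "from here", "middle", and "up to here". Note that whole lines are printed, not just the text between the two patterns.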

It’s still possible to handle this in grep, but the command is clunky.

grep -Pz '(?s)from.*\n.*to' file

This does a few things:

  • -P turns on Perl-compatible regular expressions (PCRE).
  • -z treats the input as separated by “zero bytes” (NUL characters) instead of newlines. A text file normally contains no zero bytes, so grep processes the whole thing as one line.
  • (?s) turns on PCRE_DOTALL, which makes the . character match any character, including newlines.
  • from is the starting match.
  • .*\n.* matches everything up to to, which is the ending match, and requires at least one newline in between. Because .* is greedy, the match runs to the last to in the input.

Overall, this gets the job done for scripting purposes, but it’s a lot to remember if you’re typing it out yourself. In addition, because -z switches the output separator to a zero byte, using the -o flag to print just the match also prints a trailing zero byte, which can cause problems for whatever reads the output next.
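
One way to deal with the trailing zero byte is to translate it into a newline with tr. The following is a sketch only, assuming GNU grep built with PCRE support (the -P flag is not available in every grep); the sample file and the variable names sample and match are hypothetical.

```shell
# Hypothetical sample input, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'header' 'from here' 'middle' 'up to here' 'footer' > "$sample"

# -o prints only the matched text; with -z each match ends in a NUL byte,
# so tr converts those NULs to newlines before the text goes anywhere else.
match=$(grep -Pzo '(?s)from.*\n.*to' "$sample" | tr '\0' '\n')

printf '%s\n' "$match"
rm -f "$sample"
```

Unlike the awk and sed range commands, -o prints only the matched text, so with this input the last line of output should be "up to" rather than the full "up to here" line.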

Using pcre2grep Instead (Perl-Compatible grep)

Regular grep isn’t the best tool for the job. An alternative called pcre2grep supports Perl Compatible Regular Expressions out of the box and can match multi-line patterns much more easily.

It may already be installed on your system, but if it isn’t, you can get it from your package manager. On Debian and Ubuntu, for example:

sudo apt install pcre2-utils

Then, you just need to run it with the -M (multiline) parameter.

pcre2grep -M 'from(\n|.)*to' file

Note that this still requires you to match “newline or any character” manually with (\n|.)* . Alternatively, you can use the (?s) trick to turn on PCRE_DOTALL and make the dot character match newlines as well.

pcre2grep -M '(?s)from.*to' file


