Java Regular Expressions: A Beginner’s Guide


Introduction

Regular Expressions (Regex) are a powerful tool in Java for pattern matching and string manipulation. This tutorial is designed to guide you through the basics of Java Regex with simple explanations and practical examples.

What is Regex?

A Regular Expression (regex) is a sequence of characters that forms a search pattern. This pattern can be used to search, edit, or manipulate text. In Java, the java.util.regex package provides the classes required for regex operations.

Key Components of Java Regex

  1. Pattern: Defines the regex pattern.
  2. Matcher: Applies the pattern to a string and checks for matches.
  3. PatternSyntaxException: Thrown if the syntax of the regex pattern is invalid.
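As a rough illustration of how these three classes fit together, here is a minimal sketch. The class name RegexComponentsDemo and its helper methods are made up for this example and are not part of the java.util.regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexComponentsDemo {
    // Returns the first match of regex in text, or null if there is none.
    public static String firstMatch(String regex, String text) {
        Pattern pattern = Pattern.compile(regex); // throws PatternSyntaxException on bad syntax
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    // Returns true if the regex compiles, false if its syntax is invalid.
    public static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("[a-z]+", "123 abc 456")); // abc
        System.out.println(isValidRegex("[a-z"));                // false (unclosed character class)
    }
}
```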

Why Learn Regex?

Before diving into the syntax, let’s understand when you might use regex in real life:

Form Validation

  • Checking if an email address is valid
  • Verifying phone number formats
  • Ensuring passwords meet security requirements

Data Cleaning

  • Removing special characters from text
  • Standardizing date formats
  • Extracting specific information from logs

Text Processing

  • Finding and replacing specific patterns
  • Parsing structured data
  • Extracting information from strings

Data Masking

  • Hiding sensitive information in logs
  • Masking credit card numbers
  • Protecting personal identification numbers
  • Securing database outputs

Basic Syntax

Here are some commonly used symbols in regex:

Symbol    Description
.         Matches any character (except line terminators, by default)
*         Matches 0 or more repetitions
+         Matches 1 or more repetitions
?         Matches 0 or 1 repetition
\d        Matches any digit (0-9)
\w        Matches any word character (a-z, A-Z, 0-9, _)
\s        Matches any whitespace
^         Matches the start of the string
$         Matches the end of the string
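To see a few of these symbols in action, the short sketch below may help. The class BasicSymbolsDemo and its found() helper are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class BasicSymbolsDemo {
    // Returns true if the regex matches somewhere inside the text.
    public static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("c.t", "cat"));                   // true: . matches 'a'
        System.out.println(found("\\d+", "room 42"));              // true: one or more digits
        System.out.println(found("colou?r", "color"));             // true: the 'u' is optional
        System.out.println(found("^\\w+\\s\\w+$", "hello world")); // true: word, whitespace, word
        System.out.println(found("^\\d", "room 42"));              // false: text does not start with a digit
    }
}
```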

Simple Character Matching

Let’s start with the basics. In regex, most characters match themselves.

String pattern = "cat";
String text = "I have a cat";
boolean matches = text.matches(".*" + pattern + ".*"); // true

The .* means “match any character (.) zero or more times (*)” – we’ll cover this in detail later.

Character Classes

Square brackets [] let you match any single character from a set.

// Match any vowel
String pattern = "[aeiou]";

String text = "hello";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(text);

while (m.find()) {
    System.out.println("Found vowel: " + m.group());
}

// Output:
// Found vowel: e
// Found vowel: o

How to Use Regex in Java

Here’s the basic structure for working with regex in Java:

import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        // Define the regex pattern
        String pattern = "\\d+"; // Matches one or more digits
        String text = "The order number is 12345.";

        // Create a Pattern object
        Pattern compiledPattern = Pattern.compile(pattern);

        // Create a Matcher object
        Matcher matcher = compiledPattern.matcher(text);

        // Find matches
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group());
        }
    }
}

Output:

Found: 12345

Using Named Groups

To store a matched substring in a variable, give the capturing group a name with (?<name>...) and retrieve it with the group(String) method of the Matcher class:

import java.util.regex.*;

public class StoreMatchExample {
    public static void main(String[] args) {
        String text = "Your code is ABC123.";
        String pattern = "(?<code>[A-Z]{3}\\d{3})"; // Matches three letters followed by three digits and stores it in a named group 'code'

        Pattern compiledPattern = Pattern.compile(pattern);
        Matcher matcher = compiledPattern.matcher(text);

        if (matcher.find()) {
            String matchedWord = matcher.group("code"); // Store the match in a variable using the named group
            System.out.println("Matched Word: " + matchedWord);
        } else {
            System.out.println("No match found.");
        }
    }
}

Output:

Matched Word: ABC123

Common Patterns

Email Validation

// Loose check: a restricted set of characters before the @, anything after it
String emailPattern = "^[A-Za-z0-9+_.-]+@(.+)$";
String email = "user@example.com";
boolean isValid = email.matches(emailPattern);

Phone Number Validation

String phonePattern = "^\\(?(\\d{3})\\)?[-.\\s]?(\\d{3})[-.\\s]?(\\d{4})$"; 

String phone = "(123) 456-7890"; 
boolean isValid = phone.matches(phonePattern);

Replace All Digits with #

String text = "Account number: 123456";
String replaced = text.replaceAll("\\d", "#");
System.out.println(replaced);

Data Masking

String sensitiveData = "Credit Card: 1234-5678-9012-3456";

// Only the last block is captured, so it is referenced as $1 in the replacement
String maskedData = sensitiveData.replaceAll("\\d{4}-\\d{4}-\\d{4}-(\\d{4})", "****-****-****-$1");

System.out.println("Masked Data: " + maskedData);

Special Characters and Their Meanings

Quantifiers

Symbol    Description
*         Zero or more times
+         One or more times
?         Zero or one time
{n}       Exactly n times
{n,}      n or more times
{n,m}     Between n and m times
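The counted quantifiers are easiest to understand with a concrete format. The sketch below assumes an invented product code format (2 to 4 uppercase letters, a dash, exactly 3 digits); the class and method names are hypothetical.

```java
public class QuantifierDemo {
    // Assumed format for illustration: 2 to 4 uppercase letters, a dash, exactly 3 digits.
    public static boolean isProductCode(String s) {
        return s.matches("[A-Z]{2,4}-\\d{3}");
    }

    // {2,} means "two or more": the whole string must be at least two digits.
    public static boolean hasTwoOrMoreDigits(String s) {
        return s.matches("\\d{2,}");
    }

    public static void main(String[] args) {
        System.out.println(isProductCode("ABCD-007"));  // true
        System.out.println(isProductCode("A-123"));     // false: only one letter
        System.out.println(isProductCode("AB-1234"));   // false: four digits instead of three
        System.out.println(hasTwoOrMoreDigits("7"));    // false
        System.out.println(hasTwoOrMoreDigits("2024")); // true
    }
}
```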

Character Classes

Construct        Description
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z, or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
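Negation and subtraction are probably the constructs you will reach for most often. Here is a small sketch of both; CharacterClassDemo and its methods are illustrative names, not library code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // Subtraction: lowercase letters that are not vowels.
    private static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    // Counts the lowercase consonants in the text.
    public static int countConsonants(String text) {
        Matcher matcher = CONSONANT.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Negation: remove everything that is not an ASCII letter or digit.
    public static String stripSpecialCharacters(String text) {
        return text.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(countConsonants("hello world"));          // 7
        System.out.println(stripSpecialCharacters("Hi, there! 42")); // Hithere42
    }
}
```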

Boundary Matchers

Symbol    Description
^         The beginning of a line
$         The end of a line
\b        A word boundary
\B        A non-word boundary
\A        The beginning of the input
\G        The end of the previous match
\Z        The end of the input but for the final terminator, if any
\z        The end of the input
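Word boundaries are handy when you want to match a whole word rather than a fragment of a longer one. The following sketch (with the made-up class name BoundaryDemo) replaces the standalone word "cat" and leaves "concat" and "cats" alone.

```java
public class BoundaryDemo {
    // Replaces "cat" only where it stands as a whole word.
    public static String replaceWholeWord(String text) {
        return text.replaceAll("\\bcat\\b", "dog");
    }

    public static void main(String[] args) {
        System.out.println(replaceWholeWord("cat concat cats cat."));
        // dog concat cats dog.
    }
}
```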

Real-World Examples

Password Validation

public class PasswordValidator {
    public static boolean isValidPassword(String password) {
        // Regex for password that requires:
        // - At least 8 characters
        // - At least one uppercase letter
        // - At least one lowercase letter
        // - At least one number
        // - At least one special character from @#$%^&+=
        // - No whitespace
        String pattern = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{8,}$";
        return password.matches(pattern);
    }

    public static void main(String[] args) {
        String password = "Test@123";
        System.out.println("Is password valid? " + isValidPassword(password));
    }
}

Log File Parser

import java.util.regex.*;

public class LogParser {
    public static void parseLogLine(String logLine) {
        // Pattern to match log format: [DATE] [LEVEL] Message
        String pattern = "\\[(\\d{4}-\\d{2}-\\d{2})\\]\\s*\\[(INFO|ERROR|WARN)\\]\\s*(.*)";

        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(logLine);

        if (m.find()) {
            String date = m.group(1);
            String level = m.group(2);
            String message = m.group(3);

            System.out.println("Date: " + date);
            System.out.println("Level: " + level);
            System.out.println("Message: " + message);
        }
    }

    public static void main(String[] args) {
        String log = "[2024-01-18] [ERROR] Database connection failed";
        parseLogLine(log);
    }
}

Data Masking Examples (Credit Card Masking)

public class DataMasker {
    public static String maskCreditCard(String creditCard) {
        // Keep first 6 and last 4 digits; replace the middle with a fixed 8-asterisk mask
        String pattern = "(\\d{6})\\d+(\\d{4})";
        return creditCard.replaceAll(pattern, "$1********$2");
    }

    public static String maskEmail(String email) {
        // Show first 2 chars and domain, mask the rest
        String pattern = "([^@]{2})[^@]*(@.+)";
        return email.replaceAll(pattern, "$1***$2");
    }

    public static String maskPhoneNumber(String phone) {
        // Keep last 4 digits, mask the rest
        String pattern = "\\d(?=\\d{4})";
        return phone.replaceAll(pattern, "*");
    }

    public static void main(String[] args) {
        // Credit Card masking
        String creditCard = "4532015112830366";
        System.out.println("Masked CC: " + maskCreditCard(creditCard));
        // Output: 453201********0366

        // Email masking
        String email = "john.doe@example.com";
        System.out.println("Masked Email: " + maskEmail(email));
        // Output: jo***@example.com

        // Phone number masking
        String phone = "1234567890";
        System.out.println("Masked Phone: " + maskPhoneNumber(phone));
        // Output: ******7890
    }
}

Log Data Masking

public class LogMasker {
    public static String maskSensitiveData(String logMessage) {
        // Mask various sensitive data patterns
        String maskedMessage = logMessage
            // Mask SSN
            .replaceAll("\\d{3}-\\d{2}-\\d{4}", "XXX-XX-XXXX")
            // Mask Credit Cards (various formats)
            .replaceAll("\\b(?:\\d[ -]*?){13,16}\\b", "XXXX-XXXX-XXXX-XXXX")
            // Mask API Keys (example format)
            .replaceAll("api_key\\s*[:=]\\s*['\"](\\w+)['\"]", "api_key=\'***MASKED***\'")
            // Mask Passwords
            .replaceAll("password\\s*[:=]\\s*['\"](\\w+)['\"]", "password=\'***MASKED***\'");

        return maskedMessage;
    }

    public static void main(String[] args) {
        String log = "User data: SSN=123-45-6789, CC=4532015112830366, " +
                    "api_key='abc123xyz', password='secretpass123'";

        System.out.println("Original log: " + log);
        System.out.println("Masked log: " + maskSensitiveData(log));
        // Output: User data: SSN=XXX-XX-XXXX, CC=XXXX-XXXX-XXXX-XXXX, 
        // api_key='***MASKED***', password='***MASKED***'
    }
}

Conclusion

Regular expressions are incredibly powerful once you understand them. Start with simple patterns and gradually build up to more complex ones. Remember to:

  • Test thoroughly
  • Comment your regex patterns
  • Break complex patterns into smaller, manageable pieces
  • Use built-in pattern flags when needed (case insensitive, multiline, etc.)
  • Be cautious with sensitive data and always verify your masking patterns
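Two of these tips, commenting patterns and using flags, can be combined. The sketch below uses Pattern.COMMENTS (which ignores whitespace in the pattern and treats # as the start of a comment) together with Pattern.CASE_INSENSITIVE. The log level format and the FlagsDemo class are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {
    // With COMMENTS, whitespace is ignored and # starts a comment that runs to the end of the line.
    private static final Pattern LOG_LEVEL = Pattern.compile(
            "\\[                  # opening bracket\n"
          + "(info|warn|error)    # level name, matched in any letter case\n"
          + "\\]                  # closing bracket",
            Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    // Returns the level as written in the line, or null if no level is present.
    public static String levelOf(String line) {
        Matcher matcher = LOG_LEVEL.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(levelOf("[Error] disk full")); // Error
        System.out.println(levelOf("no level here"));     // null
    }
}
```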

Practice is key to mastering regex.


Frequently Asked Questions

What is a Regular Expression (Regex)?

A regex is a sequence of characters that defines a search pattern, used for matching, searching, or manipulating text.

What is the difference between find() and matches() in regex?

find() searches for the next match, while matches() checks if the entire string matches the pattern.
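A quick sketch of the difference, using a digits pattern. The class and method names here are invented for illustration.

```java
import java.util.regex.Pattern;

public class FindVsMatchesDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // find(): true if the pattern occurs anywhere in the text.
    public static boolean containsDigits(String text) {
        return DIGITS.matcher(text).find();
    }

    // matches(): true only if the entire text fits the pattern.
    public static boolean isAllDigits(String text) {
        return DIGITS.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsDigits("abc 123")); // true
        System.out.println(isAllDigits("abc 123"));    // false
        System.out.println(isAllDigits("123"));        // true
    }
}
```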

How do I extract a specific part of a match in Java?

Use capturing groups with parentheses () and retrieve them using the group() method.

What are named groups, and how do I use them?

Named groups assign a name to a capturing group for easier access. Syntax: (?<name>pattern)

Can regex handle multiline input?

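Yes. Use Pattern.MULTILINE to make ^ and $ match at the start and end of every line rather than only at the start and end of the whole input, and Pattern.DOTALL to let . match line terminators as well.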
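As a small sketch of what Pattern.MULTILINE changes, the example below pulls the lines that start with ERROR out of a multi-line string, with and without the flag. The log text and the MultilineDemo class are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDemo {
    // Collects every match of the pattern in the text.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // With MULTILINE, ^ and $ match at the start and end of every line.
    public static List<String> errorLines(String log) {
        return findAll(Pattern.compile("^ERROR.*$", Pattern.MULTILINE), log);
    }

    // Without the flag, ^ and $ match only at the start and end of the whole input.
    public static List<String> errorLinesWithoutFlag(String log) {
        return findAll(Pattern.compile("^ERROR.*$"), log);
    }

    public static void main(String[] args) {
        String log = "INFO start\nERROR disk full\nWARN slow\nERROR timeout";
        System.out.println(errorLines(log));            // [ERROR disk full, ERROR timeout]
        System.out.println(errorLinesWithoutFlag(log)); // []
    }
}
```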

How do I handle special characters in regex patterns?

Special characters need to be escaped with a backslash (\). In Java strings, you need two backslashes since the first one is for the string literal.
String pattern = "www\\.example\\.com";
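When the text to match comes from a variable, escaping by hand gets error-prone. One option is Pattern.quote(), which turns a string into a pattern that matches it literally. The sketch below compares quoted and unquoted behavior; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Treats the literal as plain text: every character must match itself.
    public static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    // Treats the same string as a regex, so each . matches any character.
    public static boolean containsAsRegex(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String host = "www.example.com";
        System.out.println(containsLiteral("visit www.example.com", host)); // true
        System.out.println(containsLiteral("wwwXexampleYcom", host));       // false
        System.out.println(containsAsRegex("wwwXexampleYcom", host));       // true
    }
}
```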
