Introduction
Regular Expressions (Regex) are a powerful tool in Java for pattern matching and string manipulation. This tutorial is designed to guide you through the basics of Java Regex with simple explanations and practical examples.
What is Regex?
A Regular Expression (regex) is a sequence of characters that forms a search pattern. This pattern can be used to search, edit, or manipulate text. In Java, the java.util.regex package provides the classes required for regex operations.
Key Components of Java Regex
- Pattern: Defines the regex pattern.
- Matcher: Applies the pattern to a string and checks for matches.
- PatternSyntaxException: Thrown if the syntax of the regex pattern is invalid.
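As a minimal sketch of how these three classes fit together, the following compiles a pattern, counts its matches, and catches the exception raised for an invalid pattern. The class name RegexComponentsDemo and its helper methods are illustrative only and are not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexComponentsDemo {
    // Returns true if the regex compiles, false if Pattern.compile rejects it.
    public static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Counts how many times the pattern is found in the text.
    public static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+"));                  // true
        System.out.println(isValidRegex("[a-z"));                    // false: unclosed character class
        System.out.println(countMatches("[a-z]+", "one two three")); // 3
    }
}
```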
Why Learn Regex?
Before diving into the syntax, let’s understand when you might use regex in real life:
Form Validation
- Checking if an email address is valid
- Verifying phone number formats
- Ensuring passwords meet security requirements
Data Cleaning
- Removing special characters from text
- Standardizing date formats
- Extracting specific information from logs
Text Processing
- Finding and replacing specific patterns
- Parsing structured data
- Extracting information from strings
Data Masking
- Hiding sensitive information in logs
- Masking credit card numbers
- Protecting personal identification numbers
- Securing database outputs
Basic Syntax
Here are some commonly used symbols in regex:
Symbol | Description |
---|---|
. | Matches any character except line terminators (unless Pattern.DOTALL is set) |
* | Matches 0 or more repetitions |
+ | Matches 1 or more repetitions |
? | Matches 0 or 1 repetition |
\d | Matches any digit (0-9) |
\w | Matches any word character (a-z, A-Z, 0-9, _) |
\s | Matches any whitespace |
^ | Matches the start of the string |
$ | Matches the end of the string |
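To see several of these symbols working together, here is a small sketch that checks for "a word, one whitespace character, then a number". The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class BasicSymbolsDemo {
    // ^      start of the string
    // \w+    one or more word characters
    // \s     exactly one whitespace character
    // \d+    one or more digits
    // $      end of the string
    private static final Pattern WORD_THEN_NUMBER = Pattern.compile("^\\w+\\s\\d+$");

    public static boolean isWordThenNumber(String text) {
        return WORD_THEN_NUMBER.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isWordThenNumber("order 42"));  // true
        System.out.println(isWordThenNumber("order  42")); // false: two spaces
        System.out.println(isWordThenNumber("order 42!")); // false: trailing '!'
    }
}
```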
Simple Character Matching
Let’s start with the basics. In regex, most characters match themselves.
String pattern = "cat";
String text = "I have a cat";
boolean matches = text.matches(".*" + pattern + ".*"); // true
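The .* means "match any character (.) zero or more times (*)"; we'll cover this in detail later. It is needed on both sides of the pattern because matches() succeeds only when the entire string matches.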
Character Classes
Square brackets [] let you match any single character from a set.
import java.util.regex.*;

// Match any vowel
String pattern = "[aeiou]";
String text = "hello";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("Found vowel: " + m.group());
}
// Output:
// Found vowel: e
// Found vowel: o
How to Use Regex in Java
Here’s the basic structure for working with regex in Java:
import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        // Define the regex pattern
        String pattern = "\\d+"; // Matches one or more digits
        String text = "The order number is 12345.";
        // Create a Pattern object
        Pattern compiledPattern = Pattern.compile(pattern);
        // Create a Matcher object
        Matcher matcher = compiledPattern.matcher(text);
        // Find matches
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group());
        }
    }
}
Output:
Found: 12345
Using Named Groups
To store a matched substring in a variable, give the capturing group a name with (?<name>...) and pass that name to the group() method of the Matcher class:
import java.util.regex.*;

public class StoreMatchExample {
    public static void main(String[] args) {
        String text = "Your code is ABC123.";
        // Matches three letters followed by three digits and stores it in a named group 'code'
        String pattern = "(?<code>[A-Z]{3}\\d{3})";
        Pattern compiledPattern = Pattern.compile(pattern);
        Matcher matcher = compiledPattern.matcher(text);
        if (matcher.find()) {
            // Store the match in a variable using the named group
            String matchedWord = matcher.group("code");
            System.out.println("Matched Word: " + matchedWord);
        } else {
            System.out.println("No match found.");
        }
    }
}
Output:
Matched Word: ABC123
Common Patterns
Email Validation
String emailPattern = "^[A-Za-z0-9+_.-]+@(.+)$";
String email = "user@example.com";
boolean isValid = email.matches(emailPattern);
Phone Number Validation
String phonePattern = "^\\(?(\\d{3})\\)?[-.\\s]?(\\d{3})[-.\\s]?(\\d{4})$";
String phone = "(123) 456-7890";
boolean isValid = phone.matches(phonePattern);
Replace All Digits with #
String text = "Account number: 123456";
String replaced = text.replaceAll("\\d", "#");
System.out.println(replaced);
Data Masking
String sensitiveData = "Credit Card: 1234-5678-9012-3456";
// Capture only the last four digits; $1 refers to that single group
String maskedData = sensitiveData.replaceAll("\\d{4}-\\d{4}-\\d{4}-(\\d{4})", "****-****-****-$1");
System.out.println("Masked Data: " + maskedData);
Special Characters and Their Meanings
Quantifiers
Symbol | Description |
---|---|
* | Zero or more times |
+ | One or more times |
? | Zero or one time |
{n} | Exactly n times |
{n,} | n or more times |
{n,m} | Between n and m times |
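The quantifiers above can be tried out with a few small checks. This is only a sketch, and the class and method names are hypothetical.

```java
public class QuantifierDemo {
    // {n}: exactly five digits
    public static boolean isFiveDigits(String s) {
        return s.matches("\\d{5}");
    }

    // {n,m}: between 3 and 16 word characters
    public static boolean isUsername(String s) {
        return s.matches("\\w{3,16}");
    }

    // ?: the 'u' is optional, so both spellings match
    public static boolean isColorSpelling(String s) {
        return s.matches("colou?r");
    }

    // {n,}: two or more digits
    public static boolean isAtLeastTwoDigits(String s) {
        return s.matches("\\d{2,}");
    }

    public static void main(String[] args) {
        System.out.println(isFiveDigits("12345"));      // true
        System.out.println(isFiveDigits("1234"));       // false
        System.out.println(isUsername("ab"));           // false: too short
        System.out.println(isColorSpelling("colour"));  // true
        System.out.println(isAtLeastTwoDigits("7"));    // false
    }
}
```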
Character Classes
Construct | Description |
---|---|
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z, or A through Z, inclusive (range) |
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
[a-z&&[def]] | d, e, or f (intersection) |
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |
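Union and intersection are the less familiar constructs in this table, so here is a short sketch that exercises both. The class name and helper methods are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // Intersection with a negated class: lowercase letters that are not vowels.
    private static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");
    // Union: a through d, or m through p.
    private static final Pattern UNION = Pattern.compile("[a-d[m-p]]");

    public static int countConsonants(String text) {
        Matcher m = CONSONANT.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static boolean inUnion(char c) {
        return UNION.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(countConsonants("regex")); // 3 (r, g, x)
        System.out.println(inUnion('b'));             // true
        System.out.println(inUnion('n'));             // true
        System.out.println(inUnion('f'));             // false
    }
}
```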
Boundary Matchers
Symbol | Description |
---|---|
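^ | The beginning of a line (by default, only the beginning of the input; every line with Pattern.MULTILINE) |
$ | The end of a line (by default, only the end of the input or just before its final line terminator; every line with Pattern.MULTILINE) |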
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final terminator, if any |
\z | The end of the input |
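A brief sketch of two of these matchers in use: \b for whole-word searches, and the difference between \Z and \z when the input ends with a newline. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // True if 'word' appears as a whole word, not as part of a longer word.
    public static boolean containsWholeWord(String text, String word) {
        // Pattern.quote makes the word literal, so characters like '.' are not special.
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    // \Z tolerates one final line terminator after the digits.
    public static boolean endsWithDigitsLenient(String text) {
        return Pattern.compile("\\d+\\Z").matcher(text).find();
    }

    // \z requires the digits to be the very last characters.
    public static boolean endsWithDigitsStrict(String text) {
        return Pattern.compile("\\d+\\z").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWholeWord("I have a cat.", "cat")); // true
        System.out.println(containsWholeWord("concatenate", "cat"));   // false
        System.out.println(endsWithDigitsLenient("id 123\n"));         // true
        System.out.println(endsWithDigitsStrict("id 123\n"));          // false
    }
}
```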
Real-World Examples
Password Validation
public class PasswordValidator {
    public static boolean isValidPassword(String password) {
        // Regex for password that requires:
        // - At least 8 characters
        // - At least one uppercase letter
        // - At least one lowercase letter
        // - At least one number
        // - At least one special character
        String pattern = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{8,}$";
        return password.matches(pattern);
    }

    public static void main(String[] args) {
        String password = "Test@123";
        System.out.println("Is password valid? " + isValidPassword(password));
    }
}
Log File Parser
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogParser {
    public static void parseLogLine(String logLine) {
        // Pattern to match log format: [DATE] [LEVEL] Message
        String pattern = "\\[(\\d{4}-\\d{2}-\\d{2})\\]\\s*\\[(INFO|ERROR|WARN)\\]\\s*(.*)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(logLine);
        if (m.find()) {
            String date = m.group(1);    // group 1: the date
            String level = m.group(2);   // group 2: the log level
            String message = m.group(3); // group 3: the rest of the line
            System.out.println("Date: " + date);
            System.out.println("Level: " + level);
            System.out.println("Message: " + message);
        }
    }

    public static void main(String[] args) {
        String log = "[2024-01-18] [ERROR] Database connection failed";
        parseLogLine(log);
    }
}
Data Masking Examples (Credit Card, Email, and Phone Masking)
public class DataMasker {
    public static String maskCreditCard(String creditCard) {
        // Keep first 6 and last 4 digits, mask the rest
        String pattern = "(\\d{6})\\d+(\\d{4})";
        return creditCard.replaceAll(pattern, "$1********$2");
    }

    public static String maskEmail(String email) {
        // Show first 2 chars and domain, mask the rest
        String pattern = "([^@]{2})[^@]*(@.+)";
        return email.replaceAll(pattern, "$1***$2");
    }

    public static String maskPhoneNumber(String phone) {
        // Keep last 4 digits, mask the rest
        String pattern = "\\d(?=\\d{4})";
        return phone.replaceAll(pattern, "*");
    }

    public static void main(String[] args) {
        // Credit Card masking
        String creditCard = "4532015112830366";
        System.out.println("Masked CC: " + maskCreditCard(creditCard));
        // Output: 453201********0366

        // Email masking
        String email = "john.doe@example.com";
        System.out.println("Masked Email: " + maskEmail(email));
        // Output: jo***@example.com

        // Phone number masking
        String phone = "1234567890";
        System.out.println("Masked Phone: " + maskPhoneNumber(phone));
        // Output: ******7890
    }
}
Log Data Masking
public class LogMasker {
    public static String maskSensitiveData(String logMessage) {
        // Mask various sensitive data patterns
        String maskedMessage = logMessage
            // Mask SSN
            .replaceAll("\\d{3}-\\d{2}-\\d{4}", "XXX-XX-XXXX")
            // Mask Credit Cards (various formats)
            .replaceAll("\\b(?:\\d[ -]*?){13,16}\\b", "XXXX-XXXX-XXXX-XXXX")
            // Mask API Keys (example format)
            .replaceAll("api_key\\s*[:=]\\s*['\"](\\w+)['\"]", "api_key='***MASKED***'")
            // Mask Passwords
            .replaceAll("password\\s*[:=]\\s*['\"](\\w+)['\"]", "password='***MASKED***'");
        return maskedMessage;
    }

    public static void main(String[] args) {
        String log = "User data: SSN=123-45-6789, CC=4532015112830366, " +
            "api_key='abc123xyz', password='secretpass123'";
        System.out.println("Original log: " + log);
        System.out.println("Masked log: " + maskSensitiveData(log));
        // Output: User data: SSN=XXX-XX-XXXX, CC=XXXX-XXXX-XXXX-XXXX,
        // api_key='***MASKED***', password='***MASKED***'
    }
}
Conclusion
Regular expressions are incredibly powerful once you understand them. Start with simple patterns and gradually build up to more complex ones. Remember to:
- Test thoroughly
- Comment your regex patterns
- Break complex patterns into smaller, manageable pieces
- Use built-in pattern flags when needed (case insensitive, multiline, etc.)
- Be cautious with sensitive data and always verify your masking patterns
Practice is key to mastering regex: the more patterns you write and test, the more comfortable you will become with the syntax and concepts.
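Two of the tips above, commenting patterns and using built-in flags, can be combined. The sketch below uses Pattern.COMMENTS, which ignores whitespace in the pattern and allows # comments, together with Pattern.CASE_INSENSITIVE. The log format and the names CommentedPatternDemo and levelOf are assumptions made for the example.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedPatternDemo {
    // With Pattern.COMMENTS, spaces in the pattern are ignored and '#' starts a comment.
    private static final Pattern LOG_LINE = Pattern.compile(
            "^ \\[ (info|warn|error) \\]   # log level in square brackets\n"
          + "  \\s+                        # separator\n"
          + "  (.*) $                      # message text\n",
            Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    // Returns the upper-cased level, or null if the line does not match.
    public static String levelOf(String line) {
        Matcher m = LOG_LINE.matcher(line);
        return m.matches() ? m.group(1).toUpperCase(Locale.ROOT) : null;
    }

    public static void main(String[] args) {
        System.out.println(levelOf("[Error] disk full")); // ERROR
        System.out.println(levelOf("[debug] verbose"));   // null
    }
}
```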
Frequently Asked Questions
What is a Regular Expression (Regex)?
A regex is a sequence of characters that defines a search pattern, used for matching, searching, or manipulating text.
What is the difference between find() and matches() in regex?
find() searches for the next match, while matches() checks if the entire string matches the pattern.
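A minimal sketch of that difference, using the same pattern with both methods (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

public class FindVsMatchesDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // find(): true if digits occur anywhere in the text.
    public static boolean containsDigits(String text) {
        return DIGITS.matcher(text).find();
    }

    // matches(): true only if the whole text consists of digits.
    public static boolean isAllDigits(String text) {
        return DIGITS.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsDigits("Order 12345")); // true
        System.out.println(isAllDigits("Order 12345"));    // false
        System.out.println(isAllDigits("12345"));          // true
    }
}
```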
How do I extract a specific part of a match in Java?
Use capturing groups with parentheses () and retrieve them using the group() method.
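For instance, a date can be split into its parts with three numbered groups. This is a sketch under the assumption of a yyyy-MM-dd format; the class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturingGroupDemo {
    // Group 1 = year, group 2 = month, group 3 = day.
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns {year, month, day} from the first date found, or null if there is none.
    public static String[] firstDateParts(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = firstDateParts("Released on 2024-01-18.");
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]); // 2024 / 01 / 18
    }
}
```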
What are named groups, and how do I use them?
Named groups assign a name to a capturing group for easier access. Syntax: (?<name>pattern)
Can regex handle multiline input?
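Yes. Pattern.MULTILINE makes ^ and $ match at the start and end of every line instead of only at the start and end of the whole input, and Pattern.DOTALL lets . match line terminators as well.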
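The effect of Pattern.MULTILINE on ^ can be seen in a short sketch that counts lines starting with a given prefix. The helper countLinesStartingWith is a hypothetical name used only here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDemo {
    // Counts matches of the prefix anchored with ^, with or without MULTILINE.
    public static int countLinesStartingWith(String prefix, String text, boolean multiline) {
        int flags = multiline ? Pattern.MULTILINE : 0;
        Matcher m = Pattern.compile("^" + Pattern.quote(prefix), flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String log = "ERROR disk full\nINFO started\nERROR timeout";
        System.out.println(countLinesStartingWith("ERROR", log, true));  // 2
        System.out.println(countLinesStartingWith("ERROR", log, false)); // 1
    }
}
```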
How do I handle special characters in regex patterns?
Special characters need to be escaped with a backslash (\). In Java strings, you need two backslashes since the first one is for the string literal.
String pattern = "www\\.example\\.com";
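When the text to match is not known in advance, escaping by hand is error-prone. One option is Pattern.quote(), which wraps a string so that every character is treated literally. The sketch below compares an unescaped pattern, a hand-escaped one, and a quoted one; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Matches the input against 'literal' taken as plain text, not as a regex.
    public static boolean matchesLiteral(String input, String literal) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unescaped: each '.' matches any character, so this is true.
        System.out.println("wwwXexampleXcom".matches("www.example.com"));     // true
        // Hand-escaped: '.' must be a literal dot, so this is false.
        System.out.println("wwwXexampleXcom".matches("www\\.example\\.com")); // false
        // Quoted: same effect without writing the backslashes yourself.
        System.out.println(matchesLiteral("wwwXexampleXcom", "www.example.com")); // false
        System.out.println(matchesLiteral("www.example.com", "www.example.com")); // true
    }
}
```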