UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

PERG-Rx : an FPGA-based pattern-matching engine with limited regular expression support for large pattern databases Ho, Johnny Tsung Lin


Pattern-matching is a fundamental technique found in applications like a network intrusion detection system (NIDS) and antivirus. By comparing an input data stream against a set of predefined byte-string patterns, malicious activities can be detected. Due to the increasing pattern database size, hardware engines are frequently being used to accelerate pattern-matching because software-based solution can no longer keep up. This thesis presents PERG, a FPGA-based pattern-matching engine targeted for the large pattern database found in Clam AntiVirus (ClamAV). Previous hardware pattern-matching engines have targeted the Snort database used in NIDS. PERG is the first hardware pattern-matching engine designed for the ClamAV pattern database, which is several times larger than Snort in both the number of patterns and the range of pattern lengths. Among hash-based pattern-matching hardware, PERG is the first to integrate limited regular-expression support. In ClamAV, this feature is needed for polymorphic virus detection. Finally among hash-based hardware pattern-matching engines, PERG is the first to implement Bloomier filters. With Bloomier filters, the process of exact matching to verify a false positive in PERG is much simpler and faster than other hash-based pattern-matching engines which use traditional Bloom filters. To reduce hardware resources, PERG uses a pattern compiler to transform the pattern database into an optimized hardware implementation. The compiler packs as many pattern strings as possible into each Bloomier filter unit, a hardware module, by consolidating several different pattern lengths into the same filter unit. To further save on hardware resources, limited regular-expression support for wildcard patterns is provided using a lossy scheme, which trades off a slight increase in false-positive probability to gain significant savings in hardware resources. False-negative probability in PERG remains zero. Using the ClamAV antivirus database, PERG fits 84,387 patterns containing over 8,645,488 characters into a single modest FPGA chip with a small (4 MB) off-chip memory. It uses just 26 filter units, resulting in roughly 24.7x improved density (characters per memory bit) compared to the next-best NIDS pattern-matching engine which fits only 1/126th the characters. PERG can scan at roughly 170MB/s and match the speed of most network or disk interfaces.

Item Media

Item Citations and Data


Attribution-NonCommercial-NoDerivatives 4.0 International