Lately I have been exploring multi-string matching and regular expression matching against large data sets. To put some of this exploration into practice, I've begun writing my own version of grep as a testbed for various matching algorithms, to see what I can come up with for my own requirements.

Here are a few papers I've come across in my search:

Aho, A. V., & Corasick, M. J. (1975). Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6), 333–340. doi:10.1145/360825.360855

Faro, S., & Külekci, M. O. (2012). Fast Multiple String Matching Using Streaming SIMD Extensions Technology. In Lecture Notes in Computer Science (Vol. 7608, pp. 217–228). Berlin, Heidelberg: Springer. doi:10.1007/978-3-642-34109-0_23