PDF text search – Windows – CYGWIN

You’ve likely been where I am, and probably more often than you’d like: looking for a specific PDF that contains information you need to reference, but you can’t find the darn thing! Unless you’re incredibly detail-oriented and file everything where you’ll find it 100% of the time, you’re bound to lose one eventually. However, if you’re on Windows and have CYGWIN installed, or you’re on Linux and just want to know how to do the same thing, allow me to show you how to use two tools (well, three if you think about it) to search your entire machine(s) for that specific file.

On a Windows machine you’ll want the following:

  • The Base packages, including findutils (which provides find and xargs)
  • pdfgrep

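So, you remember a specific keyword or phrase, and you just need to search all of your PDF files and print each match on the terminal along with its filename. I recommend using xargs in conjunction with find instead of find’s -exec option. Why? With -exec ... \;, find starts a separate pdfgrep process for every file it returns, which is slow and eats processor time; xargs instead passes as many filenames as will fit to each pdfgrep invocation, so only a handful of processes are started. (find’s -exec ... {} + form batches filenames the same way, if you prefer it.) The -print0 and -0 options separate the filenames with null characters, so paths containing spaces, such as “Program Files”, survive the pipe intact. So, let’s say you want to search your entire C: drive in CYGWIN: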

find /cygdrive/c -type f -name "*.pdf" -print0 | xargs -0 pdfgrep -H -i "symmetrical"

That’s it. It may take a long time, but every .pdf file that find locates is handed by xargs to pdfgrep, which searches its text for the string “symmetrical” (the -i makes the match case-insensitive). Your mileage will vary depending on your machine’s resources and how many PDFs you have, but if you close everything else and leave only the search running, say over lunch, it should finish in reasonable time and, hopefully, you’ll find your long-lost PDF document.
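If you run this search often, you might wrap it in a small shell function. The sketch below is one way to do it; the name pdfsearch is hypothetical. The options it uses: -iname also catches files named .PDF, -H makes pdfgrep print the filename even when xargs hands it a single file (like grep, pdfgrep omits the name when it searches only one file), -r (GNU xargs) skips running pdfgrep when no PDFs turn up, and sending stderr to /dev/null hides permission errors from system folders and complaints about damaged or encrypted PDFs.

```shell
#!/bin/sh
# pdfsearch DIRECTORY PHRASE  (hypothetical helper name)
# find: -iname matches .pdf, .PDF, .Pdf, ...; its stderr is discarded to
#       hide "Permission denied" noise from protected folders.
# xargs: -0 reads null-separated names, -r skips pdfgrep if none were found.
# pdfgrep: -H always prints the filename, -i ignores case, -- ends options
#          so a phrase starting with "-" is not treated as a flag.
pdfsearch() {
  find "$1" -type f -iname "*.pdf" -print0 2>/dev/null |
    xargs -0 -r pdfgrep -H -i -- "$2" 2>/dev/null
}
```

Usage: pdfsearch /cygdrive/c "symmetrical". If you only want the list of matching files rather than every matching line, swap -H for pdfgrep’s -l option.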
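If you want to check the -exec versus xargs point for yourself, the sketch below counts how many times each approach starts the search command. It substitutes sh -c 'echo run' for pdfgrep so it needs neither real PDFs nor pdfgrep; the temporary directory and empty .pdf files are purely illustrative, and one name deliberately contains a space.

```shell
#!/bin/sh
# Create three empty, illustrative .pdf files in a scratch directory.
demo=$(mktemp -d)
touch "$demo/a.pdf" "$demo/b.pdf" "$demo/c file.pdf"

# -exec ... \; starts the command once per file; each start prints one line.
per_file=$(find "$demo" -type f -name "*.pdf" -exec sh -c 'echo run' sh {} \; | wc -l)

# -exec ... + and xargs -0 pack many filenames into each invocation.
batched_exec=$(find "$demo" -type f -name "*.pdf" -exec sh -c 'echo run' sh {} + | wc -l)
batched_xargs=$(find "$demo" -type f -name "*.pdf" -print0 | xargs -0 sh -c 'echo run' sh | wc -l)

# Print one row per approach (printf reuses the format for each pair).
printf '%-14s %s run(s)\n' '-exec {} \;' "$per_file" '-exec {} +' "$batched_exec" 'xargs -0' "$batched_xargs"
rm -rf "$demo"
```

With GNU find and xargs (as in CYGWIN or most Linux distributions) it should report 3 runs for -exec {} \; and 1 run each for the other two, since three filenames easily fit on a single command line.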