Advanced search query parsing with Ruby

Making advanced search forms for your backend kinda sucks, right? I ran into this problem on a project and decided to come up with something similar to how Google handles search queries. No dropdowns, no checkboxes, no radios, none of that stuff. I just wanted to be able to type status:active user:rob [term] and get what I asked for. I came up with a pretty nice utility class that handles a few basic query syntaxes and gives you back the parsed fields (as single values or arrays) along with the leftover query.

With this class I can build links that run my basic “advanced” queries without littering the URL with a separate parameter for every filter. For example, if I wanted to ‘get a list of all guest users with the name “jake” in NJ, NY, or FL’ I could search for guest:true state:nj,ny,fl jake. The controller then picks that up from params and parses it with terms = SearchTerms.new(params[:q]), which breaks it down into its elements:

>> terms = SearchTerms.new("guest:true state:nj,ny,fl jake")
=> #<SearchTerms:0x007fc239072008 @query="jake", @parts={"guest"=>true, "state"=>["nj", "ny", "fl"]}, @split=true>
>> terms.guest
=> true
>> terms.state
=> ["nj", "ny", "fl"]
>> terms.query
=> "jake"
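Under the hood the breakdown is a single String#scan call: each match yields a [key, value] pair, and value is nil for a bare search word. The snippet below uses a trimmed-down pattern for illustration only (PAIR is a made-up name and it leaves out quoted values); the full SCANNER regex is in the class further down.

```ruby
# Trimmed-down illustration of the key:value scan (no quoted-value support).
PAIR = /([\w.]+)(?::([\w,\-]+))?/

pairs = "guest:true state:nj,ny,fl jake".scan(PAIR)

# Fielded terms keep their value; bare words come back with a nil value.
fields = pairs.reject { |_, value| value.nil? }.to_h
words  = pairs.select { |_, value| value.nil? }.map(&:first)

p pairs  # [["guest", "true"], ["state", "nj,ny,fl"], ["jake", nil]]
p words  # ["jake"]
# fields holds {"guest" => "true", "state" => "nj,ny,fl"}
```

At this stage everything is still a string; the class converts "true" to true and splits "nj,ny,fl" into an array afterwards.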

Then we can feed those values into ActiveRecord scopes, applying each one only when it was actually given:

@users = @users.registered(false) if terms['guest'] # terms.guest raises NoMethodError when guest: wasn't in the query
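Because each scope returns a new relation, conditions can be stacked one at a time and skipped when a term is missing. Here is the same conditional chaining in plain Ruby so it runs without Rails; the users array and its fields are made up for illustration, and select stands in for a scope such as registered(false).

```ruby
# Hypothetical in-memory data standing in for the users table.
users = [
  { name: "jake",  state: "nj", guest: true  },
  { name: "jake",  state: "ca", guest: true  },
  { name: "jakob", state: "ny", guest: false },
  { name: "rob",   state: "fl", guest: true  }
]

# What SearchTerms produces for "guest:true state:nj,ny,fl jake".
parts = { "guest" => true, "state" => ["nj", "ny", "fl"] }
query = "jake"

results = users
# Each step only runs when its term was supplied, like an optional scope.
results = results.select { |u| u[:guest] } if parts["guest"]
# Array() also covers a single value such as "state:nj" (a plain string).
results = results.select { |u| Array(parts["state"]).include?(u[:state]) } if parts["state"]
results = results.select { |u| u[:name].include?(query) } unless query.empty?

results.each { |u| puts "#{u[:name]} (#{u[:state]})" }  # prints: jake (nj)
```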

The Code

# Search term parser from https://gist.github.com/1477730
# Modified to allow periods (and other non-letter chars) in unquoted field values
# and field names.
#
# Helper class to parse more advanced search terms
# out of a form query
#
# Note: all hash keys are downcased, so ID:10 == {'id' => 10}
#       you can also access keys as methods, e.g. terms.id == terms['id'] == 10
#       this doesn't work for 'query', which is reserved for the left-over pieces,
#       and the method only exists when that key was in the query (use [] otherwise)
#
# Usage:
#   terms = SearchTerms.new('id:10 search terms here')
#   => @query="search terms here", @parts={"id"=>10}
#   => terms.query == 'search terms here'
#   => terms['id'] == 10
#
#   terms = SearchTerms.new('name:"support for spaces" state:pa')
#   => @query="", @parts={"name"=>"support for spaces", "state"=>"pa"}
#   => terms.query == ''
#   => terms['name'] == 'support for spaces'
#   => terms.name == 'support for spaces'
#
#   terms = SearchTerms.new('state:pa,nj,ca')
#   => @query="", @parts={"state"=>["pa","nj","ca"]}
#
#   terms = SearchTerms.new('state:pa,nj,ca', false)
#   => @query="", @parts={"state"=>"pa,nj,ca"}
#
# Useful to drive custom logic in controllers
class SearchTerms
  attr_reader :query, :parts

  # regex scanner for the parser
  SCANNER = %r{
    (?:
      ([\w\.]+)               # look for any word (periods allowed)
    )
    (?:                       # check if it has a value attached
      :                       # find the value delimiter
      (
        [\w.,\-]+             # match word-like values (periods, commas, hyphens allowed)
        |                     # -or-
        "[^"]*"               # match a quoted value, stopping at the first closing quote
      )
    )?
  }x

  # query:: this is what you want tokenized
  # split:: if you'd like to split values on "," then pass true
  def initialize(query, split = true)
    @query = query.to_s
    @parts = {}
    @split = split
    parse_query!
  end

  def [](key)
    @parts[key]
  end

  private

  def parse_query!
    words = []

    @query.scan(SCANNER).each do |key, value|
      if value.nil?
        words << key
      else
        key = key.downcase
        @parts[key] = clean_value(value)
        # Expose the field as a reader (terms.state), but never shadow an
        # existing method such as #query, #parts or #class.
        define_singleton_method(key) { @parts[key] } unless respond_to?(key, true)
      end
    end

    @query = words.join(' ')
  end

  def clean_value(value)
    return value.tr('"', '') if value.include?('"')
    return value.split(',') if @split && value.include?(',')
    return true if value == 'true'
    return false if value == 'false'
    return value.to_i if value =~ /\A[1-9][0-9]*\z/
    value
  end
end

if $0 == __FILE__
  require 'test/unit'

  class SearchTermsTest < Test::Unit::TestCase
    TEST_CASES = {
      "simple" => ["foo","foo",{}],
      "simple_field" => ["one:two","",{"one" => "two"}],
      "quotes" => [%{foo:"quoted value"}, "", {"foo" => "quoted value"}],
      "term_with_period" => ["1.5","1.5",{}],
      "multiple_fields" => ["one:two three:four","",{"one" => "two", "three" => "four"}],
      "int_parse" => ["id:123","",{"id" => 123}],
      "int_parse_leading_letter" => ["id:a01","",{"id" => "a01"}],
      "int_parse_leading_zero" => ["id:001","",{"id" => "001"}],
      "mixed_fields_terms" => ["one two:three four five:six","one four",{"two" => "three", "five" => "six"}]
    }

    TEST_CASES.each do |name, (input, query, parts)|
      define_method("test_#{name}") do
        terms = SearchTerms.new(input)
        assert_equal query, terms.query
        assert_equal parts, terms.parts
      end
    end
  end
end
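The quoted-value branch of SCANNER deserves a closer look when a query has more than one quoted field. Because .+ can also match a double quote, a pattern like "(?:.+|[^\"])*" keeps going until the last quote in the string, while "[^"]*" stops at the first closing quote. A quick comparison on a made-up query:

```ruby
query = %{name:"rob smith" city:"new york"}

greedy  = /"(?:.+|[^"])*"/  # .+ may swallow quotes, so the match runs to the last one
bounded = /"[^"]*"/         # stops at the first closing quote

puts query[greedy]   # prints: "rob smith" city:"new york"
puts query[bounded]  # prints: "rob smith"
```

With the bounded form, name and city each get their own value instead of name absorbing the rest of the query. The nested (?:.+)* quantifier can also backtrack heavily on an unclosed quote, which is one more reason to prefer the plain character class.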

Usage

# basic usage to search users from your #index action
class UsersController < ApplicationController
  def index
    # @users may already be loaded by a before_action; otherwise start
    # from a relation covering every user
    @users ||= User.all
    return if params[:q].blank?

    terms = SearchTerms.new(params[:q])

    # an explicit id: jumps straight to that user's page
    return redirect_to(user_path(terms['id'])) if terms['id']

    @users = @users.search_by_name(terms.query) unless terms.query.blank?
    @users = @users.with_role(terms['role']) if terms['role']
    @users = @users.registered(false) if terms['guest']
  end
end
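As the list of supported fields grows, the if chain in index could be swapped for a lookup table that maps each field name to a filter, which also makes it easy to ignore fields you don't support. This is a design alternative rather than the approach used above, and FILTERS, apply_filters, the field names and the sample data are all hypothetical, with select standing in for ActiveRecord scopes.

```ruby
# Hypothetical filter table: field name => lambda(scope, value) returning a narrowed scope.
FILTERS = {
  "role"  => ->(scope, value) { scope.select { |u| u[:role] == value } },
  "guest" => ->(scope, value) { value ? scope.select { |u| u[:guest] } : scope },
  "state" => ->(scope, value) { scope.select { |u| Array(value).include?(u[:state]) } }
}.freeze

# Apply only the fields we know about; anything else in parts is ignored.
def apply_filters(scope, parts)
  parts.reduce(scope) do |current, (field, value)|
    filter = FILTERS[field]
    filter ? filter.call(current, value) : current
  end
end

users = [
  { name: "ann", role: "admin",  guest: false, state: "pa" },
  { name: "bob", role: "editor", guest: true,  state: "nj" },
  { name: "cat", role: "editor", guest: true,  state: "ca" }
]

# "color" has no entry in FILTERS, so it is skipped rather than raising.
parts = { "role" => "editor", "state" => ["nj", "ny"], "color" => "blue" }
p apply_filters(users, parts).map { |u| u[:name] }  # ["bob"]
```

In a controller you would pass terms.parts and a starting relation instead of the array.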
