Tuesday, August 21, 2012

Solr Acceptance Tests: introducing rspec-solr (and sw_index_tests)

I've released the rspec-solr ruby gem, which applies RSpec custom matchers to Solr responses.  Rdoc is at http://rubydoc.info/github/sul-dlss/rspec-solr  and the source code is at https://github.com/sul-dlss/rspec-solr/ .

It's pretty simple:  once you have a ruby Solr response, you wrap it in RSpecSolr:
 
 resp = RSpecSolr::SolrResponseHash.new(yer_solr_resp)

and then you can make useful assertions for acceptance testing:
  
 resp.should include({'id'=>'81234'})
 resp.should include({'title'=>'Harry Potter'}).in_first(3).results
 resp.should include('111').before('222')

So you might write specs like this:

it "q of 'Buddhism' should get 8,500-10,500 results" do
  resp = solr_resp_doc_ids_only({'q'=>'Buddhism'})
  resp.should have_at_least(8500).documents
  resp.should have_at_most(10500).documents
end

 it "q of 'Two3' should have excellent results", :jira => 'VUF-386' do
   resp = solr_resp_doc_ids_only({'q'=>'Two3'})
   resp.should have_at_most(10).documents
   resp.should include("5732752").as_first_result
   resp.should include("8564713").in_first(2).results
   resp.should include("5732752").before("8564713")
   resp.should_not include("5727394")
   resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'two3'}))
   resp.should have_fewer_results_than(solr_resp_doc_ids_only({'q'=>'two 3'}))
 end

 it "Traditional Chinese chars 三國誌 should get the same results as simplified chars 三国志" do
   resp = solr_resp_doc_ids_only({'q'=>'三國誌'})  
   resp.should have_at_least(240).documents
   resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'三国志'})) 
 end
  


Note that these examples utilize a couple of helper methods.  See the README for more details.
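
For readers who want to see what assertions like in_first(n) and before(...) are checking, here is a plain-Ruby sketch of the underlying conditions.  It is an illustration only, not how rspec-solr implements its matchers; the helper names (doc_ids, in_first?, before?) are made up for this example, and I am assuming that "before" requires both ids to be present in the results.

```ruby
# Illustration only: rspec-solr's real matchers are implemented differently.

# Hypothetical helper: the document ids of a Solr response hash, in result order.
def doc_ids(resp)
  resp['response']['docs'].map { |doc| doc['id'] }
end

# Roughly the condition behind include(id).in_first(n).results
def in_first?(resp, id, n)
  doc_ids(resp).first(n).include?(id)
end

# Roughly the condition behind include(first_id).before(second_id)
# (assumption: both ids must appear in the results)
def before?(resp, first_id, second_id)
  ids = doc_ids(resp)
  first_pos = ids.index(first_id)
  second_pos = ids.index(second_id)
  !first_pos.nil? && !second_pos.nil? && first_pos < second_pos
end

# A response hash in the shape Solr returns with wt=ruby, trimmed to ids
sample = {
  'response' => {
    'numFound' => 3,
    'start' => 0,
    'docs' => [{ 'id' => '111' }, { 'id' => '222' }, { 'id' => '333' }]
  }
}

puts in_first?(sample, '222', 2)
puts before?(sample, '111', '222')
```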

The gem is only at release 0.1.0, but I'm finding it useful already.  You'll see some FIXME and TODO comments, and I suspect there's plenty that can be improved.   I'm happy to take your pull requests.

If it looks too much like code ...
If you can get non-coding colleagues to write your tests, then making the testing syntax easier for them is probably worthwhile.  You could certainly use Cucumber on top of rspec-solr  to write your Solr acceptance tests in more natural language.


Tip:

For most of my tests, I realized I don't check anything but the Solr document id in the results.   It makes it a lot easier to look through RSpec error messages when the Solr response doesn't have extraneous fields or the facet counts ... and it's also a much smaller http response.  So I rigged up a method that adds {'fl'=>'id', 'facet'=>'false'} to the request params I send to Solr.   My spec errors now read like this:

expected {"responseHeader"=>{"status"=>0, "QTime"=>10, "params"=>{"facet"=>"false", "fl"=>"id", "qt"=>"search_author", "wt"=>"ruby", "q"=>"契沖"}}, "response"=>{"numFound"=>3, "start"=>0, "docs"=>[{"id"=>"6675613"}, {"id"=>"6675393"}, {"id"=>"6274534"}]}} to include ["6675613", "6675393", "7191966", "6274534", "4783602"]
Diff:
@@ -1,2 +1,14 @@
-[["6675613", "6675393", "7191966", "6274534", "4783602"]]
+{"responseHeader"=>
+  {"status"=>0,
+   "QTime"=>10,
+   "params"=>
+    {"facet"=>"false",
+     "fl"=>"id",
+     "qt"=>"search_author",
+     "wt"=>"ruby",
+     "q"=>"契沖"}},
+ "response"=>
+  {"numFound"=>3,
+   "start"=>0,
+   "docs"=>[{"id"=>"6675613"}, {"id"=>"6675393"}, {"id"=>"6274534"}]}}
 

and they could have even less output, if I turned off "diffable" -- but I am currently finding it helpful.
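
The parameter-merging method described in this tip might look roughly like the sketch below.  The method name is hypothetical (the real helpers are in the README and in sw_index_tests); the only thing taken from the tip is the pair of params being added.

```ruby
# Hypothetical helper: add the "ids only, no facets" settings to whatever
# request params a spec supplies, before the request is sent to Solr.
def doc_ids_only_params(params = {})
  params.merge('fl' => 'id', 'facet' => 'false')
end

p doc_ids_only_params('q' => 'Buddhism')
```

Because the id-only settings are merged in last, they override any 'fl' or 'facet' value the caller passed; that seems the safer default for a helper whose whole purpose is a small response.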




Okay, but what good is this, really?

My current project is to improve search results for CJK (Chinese, Japanese and Korean) queries in SearchWorks.   It's nearly impossible for a CJK-ignorant coder such as myself to write good tests.  It's pretty darn hard for our non-coder CJK experts to write good tests, too.  So we have to iterate to figure out a set of acceptance tests.  Doing this without coding repeatable, automatable tests is ludicrous.**

We already have search tests, but our current search tests are slow.  They use Cucumber to mimic a user interacting with the web page to do a search, send the request to Solr, then the SearchWorks Blacklight Rails stack prepares the html that would be served by the application to present the search results from Solr.   The assertions are made against the html.   Given that for search acceptance testing, we don't care about the rails stack, this is a lot of extra processing. 

So it's time to take Rails out of the picture.  With some help from my colleague Chris Beer, we conceived a way to make it really simple -- let's write rspec style language on Solr response objects!   That spawned rspec-solr.

I am already using rspec-solr for our CJK acceptance tests.  All I needed was the rsolr gem, a spec_helper file, and some simple configuration stuff - 4 very small files.  (See rspec-solr README)

I've got CJK tests like this:

  it "should parse out 中国 (china)  经济 (economic)  政策 (policy)" do
    resp = solr_resp_doc_ids_only({'q'=>'中国经济政策'}) 
    resp.should have_at_least(85).documents
    resp.size.should be_within(5).of(solr_resp_doc_ids_only({'q'=>'中国  经济  政策'}).size) 
  end
 
  it "Traditional chars 三國誌 should get the same results as simplified chars 三国志" do
    resp = solr_resp_doc_ids_only({'q'=>'三國誌'}) 
    resp.should have_at_least(240).documents
    resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'三国志'}))
  end

  it "hangul  광주 should get results for hancha  光州" do
    resp = solr_resp_doc_ids_only({'q'=>'광주'})
    resp.should include(["7763372", "7773313"]) # hancha  光州
    resp.should have_at_least(110).documents
  end


I'm also migrating our cucumber search regression tests to the rspec-solr approach -- obviously, I want a full suite of regression tests as I make changes for CJK searching.

A sample regression test:
  it "q of 'Two3' should have excellent results", :jira => 'VUF-386' do
    resp = solr_resp_doc_ids_only({'q'=>'Two3'})
    resp.should have_at_most(10).documents
    resp.should include("5732752").as_first_result
    resp.should include("8564713").in_first(2).results
    resp.should_not include("5727394")
    resp.should have_the_same_number_of_results_as(solr_resp_doc_ids_only({'q'=>'two3'}))
    resp.should have_fewer_results_than(solr_resp_doc_ids_only({'q'=>'two 3'}))
  end
 

Both types are very much works in progress, but I've deliberately put the tests up on github as the sw_index_tests repository so you can leverage them however you see fit.

I think it's pretty slick.


** In fact, they already DID do this for our ILS without repeatable, automatable tests ... and without records of their manual tests ... so we're starting from scratch.  How annoying and wasteful!


Tuesday, March 13, 2012

Upgrading from Solr 1.4 to Solr 3.5 - hiccups

Stanford SearchWorks has been due for a Solr upgrade for a loooong time -- we've been using Solr 1.4 since ... well, forever.   Bob Haschart upgraded SolrMarc to work with Solr 3.5, so I figured I would upgrade Solr as I refactored SolrMarc for the stanford-solr-marc fork.  (See also previous blog entry).
  In the course of upgrading from Solr 1.4 to Solr 3.5, a number of our tests were failing.  Usually the problem was a mistake in my configuration files for Solr 3.5;  sometimes the tests were too brittle.  It took a pass or two to start using the ICU library for unicode normalization, rather than SolrMarc's unicodeNormalizer.  I managed to get most of the failing tests to pass, but a handful stumped me.

Here's what I learned:

1. Hyphens and WordDelimiterFilterFactory

Solr 3.2 (?) added a new setting for field analysis:  autoGeneratePhraseQueries, that defaults to "false".  In Solr 1.4, this setting was always true.  The difference is important for certain settings of WordDelimiterFilterFactory.  Let's say we have a query with a value of  "red-rose" (no quotes).

in Solr 1.4:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j"
        composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
        splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
        catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
</fieldtype>

With debugQuery=true, we find the following query fragment being generated by dismax:
   text_field:"red (rose redros)"

in Solr 3.5:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ICUFoldingFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
        splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
        catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldtype>

debugQuery=true shows us this query fragment:
   (text_field:red text_field:rose text_field:redros) -- including the parens.

Thus, a match on just "rose" is good enough in Solr 3.5, but not so in Solr 1.4's analysis.
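
To make the difference concrete, here is a toy Ruby model of the two query shapes.  It is not how Lucene evaluates queries; it treats a document as a simple ordered list of already-analyzed tokens, and the method names are invented for the illustration.

```ruby
# Toy model only: a "document" is an ordered array of analyzed tokens.

# Solr 1.4 shape, text_field:"red (rose redros)":
# "red" must be immediately followed by "rose" or "redros".
def phrase_match?(tokens)
  tokens.each_cons(2).any? do |first, second|
    first == 'red' && %w[rose redros].include?(second)
  end
end

# Solr 3.5 default shape, (text_field:red text_field:rose text_field:redros):
# any one of the three terms is enough.
def any_term_match?(tokens)
  (tokens & %w[red rose redros]).any?
end

rose_only = %w[rose garden]
red_rose  = %w[the red rose]

puts phrase_match?(rose_only)
puts any_term_match?(rose_only)
puts phrase_match?(red_rose)
```

In this model the "rose garden" document satisfies only the second query shape, which is the behavior change that broke our tests.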

How to fix this?

Add the attribute autoGeneratePhraseQueries="true" to the field type declaration:

  <fieldtype name="text" class="solr.TextField" positionIncrementGap="100"
       autoGeneratePhraseQueries="true">

2. StreamingUpdateServer and Binary Updates

In the most recent release of SolrJ (3.5), the streaming update server was not processing binary fields properly.  Two solutions:  1)  use the SolrJ jar provided in Bob Haschart's SolrMarc, as he has modified it to address this problem.  2) use a nightly jar, as this has been fixed in the SolrJ trunk and the SolrJ 3.6 branch.

3. Phrase Slop and Queries with Repeated Terms

Ultimately, I managed to get our tests passing except for two.  I couldn't figure out the difficulty - I looked at debugQuery results on Solr 1.4 and Solr 3.5;  I compared using the analysis debugger from the admin interface - nothing looked different.

Jonathan Rochkind pointed out that both phrases had repeated words;  these were both phrase searches as well.

It turns out that there was a bug in Lucene (that crept in sometime between Solr 1.4 and Solr 3.5).  If there was a non-zero slop setting in a phrase query with repeated terms, then results were incorrect.

https://issues.apache.org/jira/browse/LUCENE-3821

Thanks to Doron Cohen and Robert Muir, a fix was found and a patch was applied to Lucene, which was picked up in the Solr trunk and Solr 3.6 branch as of March 10, 2012.

Wednesday, February 15, 2012

stanford-solr-marc fork of SolrMarc

In the interests of reducing my ongoing work for Stanford's SearchWorks index, I have, with Bob Haschart's blessing, forked the SolrMarc code and made my fork available via the (new) SolrMarc space on github:

http://github.com/solrmarc/stanford-solr-marc


Specifics of how my fork diverges are below.


This is an experiment:  I believe my personal efforts will be reduced by using this pared down derivative of SolrMarc.  I am NOT committing to supporting all the use cases that Bob supports with SolrMarc.  Bob is doing a great job of juggling VuFind needs, Blacklight needs, UVa needs, less savvy consumers' needs, and maintaining backward compatibility with earlier versions of Solr.  I cannot make those kinds of commitments on Stanford's dollar or on my own time.   
One goal of the fork is to simplify the code and the build scripts for development purposes.  This creates a slightly higher expectation of users:  they will be presumed to have expertise to do what they need downstream.  (e.g. edit the build.properties file, set up analogous directories for their local site code and/or their local versions of Solr, substitute their own java customizations, set their own version up for bean shell, etc).


If anyone likes what I've done or any part of it, feel free to grab it, fork it, mimic it or whatever.   I am happy to add committers if they write test code for any changes they want to push up.

I have created hudson builds for the core code and the site specific code in stanford-solr-marc on the projectblacklight hudson server.  These builds will kick off after each commit to the stanford-solr-marc github repository, and they create javadoc and test coverage reports (see the hudson pages below for links to these).


http://hudson.projectblacklight.org/hudson/job/stanford-solr-marc%20CORE%20code/
http://hudson.projectblacklight.org/hudson/job/stanford-solr-marc%20SITE%20code/

I can add emails to the hudson build notifications, and can probably figure out how to have github send emails upon commits, if folks desire.

It would be awesome if the fork converges with SolrMarc future development to the point of re-combining the code base.  Meanwhile, as Bob and I have discussed, this fork may help Bob with some of his refactoring plans, and I can forge ahead with Stanford specific needs more easily.
Significant Differences between my fork and the SolrMarc on GoogleCode:
  1. git  
  2. reorg of the directory structures for clarity and to reduce nesting.
  3. complete rewrite of the ant builds.
    • a single build.xml file
      • no macros
    • a single build.properties file -- it should be straightforward to change build.properties as desired.
    • the build process does not result in a single jar, but instead creates a dist directory with all the files and folder structure as needed to execute the code.
  4. the wonderful scripts written by Bob are not "localized" by the build process
  5. strives to use "vanilla" versions of Solr and Marc4j, with version clearly indicated
  6. the utility class has been refactored into smaller pieces
  7. the only exemplar site code is Stanford SearchWorks
  8. functionality not used by Stanford is often stripped out, such as
    • bean shell scripting capability (it could be added back in easily, if desired)
    • notion of running under windows (could be added back in)
    • unused code placeholders, such as z39.50
  9. embedded solrj update options are not exercised - this code will be stripped out soon
  10. core tests have been largely rewritten to adhere to junit common practices:  ant calls a junit class which executes the java code and asserts the correct results.
  11. current intent is to move away from using java reflection to simultaneously support multiple versions of Solr -- I will create a tag/branch for a Solr version if a Solr upgrade isn't backwards compatible, and I make no promise to keep that branch up to date.
I have not written or rewritten the type of documentation available on the googlecode SolrMarc wiki - much of that documentation is directly applicable (settings for xxx_config.properties, settings for xxx_index.properties …).

Note that the SITE code for Stanford SearchWorks will lag behind our actual production code, as the copy of record is *not* the github repository.  
a.  avoids commit messages for every commit for local work
b.  allows our copy-of-record to be behind the Stanford firewall.
c.  I will update the github repository to the current Stanford production code from time to time.

Let me repeat:  I'm not promising to keep this project backwards compatible with older versions of Solr or of xx_index.properties files, as those progress.  The main audience for this codebase is me.  Others are welcome to the code, and will probably be welcomed as committers … but consumers of this codebase will be presumed to have enough expertise to do what they need downstream.  (e.g. substitute their own java customizations, or set their own version up for bean shell, or for a different version of Solr).

There is plenty more work to do.  Just a few examples:
  • More tests of core code
  • More refactoring of core code
  • Documentation

Thursday, December 22, 2011

How to Configure Hudson to Monitor Test Coverage Stats

Goal:   configure a Hudson project so it will squawk if the test coverage stats drop below the current coverage levels.

I researched this a while ago, and perhaps this will spare a few folks some effort.

It turns out there are two separate conditions that are related:

1.  job states:    successful / unstable / broken / disabled
 this is displayed as the color of the dot next to an individual build.

2.   job stability (weather icon):
"While a job may build to completion and generate the target artifacts without issue, Hudson will assign a stability score to the build (from 0-100) based on the post-processor tasks, implemented as plugins, that you have set up to implicitly evaluate stability."  These can include unit tests (JUnit, etc.), coverage (Cobertura, Rcov, etc.), and static code analysis (FindBugs). The higher the score, the more stable the build.

settings:
 bright sun (80-100)
 partly cloudy (60-79)
 cloudy (40-59)
 raining (20-39)
 stormy (0-19)
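
As a quick reference, the score ranges above can be written as a small Ruby lookup.  This only restates the table; it is not Hudson code, and the method name is made up for the example.

```ruby
# Map a Hudson stability score (0-100) to the weather icon ranges listed above.
def weather_icon(score)
  case score
  when 80..100 then 'bright sun'
  when 60..79  then 'partly cloudy'
  when 40..59  then 'cloudy'
  when 20..39  then 'raining'
  when 0..19   then 'stormy'
  else raise ArgumentError, "score must be between 0 and 100, got #{score}"
  end
end

puts weather_icon(85)
puts weather_icon(42)
```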


Now for the details about coverage metric settings:

 If you go into "configure" on your project, and have "Publish (coverage) report" turned on, you'll see there are rows (in Cobertura, for things like "classes" "methods" "lines") and then there are three columns.  Here's what they mean:

bright sun (left column):
 the minimum coverage level required for a bright sunny weather indicator on the dashboard.

stormy (middle column):
 the minimum coverage level to avoid stormy icon.

plain sun (rightmost column)
 the minimum test coverage required for a stable build.
 so you should put your current coverage HERE, and your build will be marked unstable if you go below your current coverage percentage.


My interpretation is the first two columns affect your weather icon (job stability), and the third column affects the job state (color of the dot by an individual build).


- Naomi

sources:

http://www.javaworld.com/javaworld/jw-12-2008/jw-12-hudson-ci.html?page=7

http://books.google.com/books?id=YoTvBpKEx5EC&pg=PA369&lpg=PA369&dq=hudson+setting+cobertura+coverage+metrics+targets&source=bl&ots=eJw1L5oit9&sig=6fnE54EDRICZsN6nNcYXKbF8cXQ&hl=en&ei=5wvCTOy3MYXEsAOn9dhB&sa=X&oi=book_result&ct=result&resnum=3&ved=0CCUQ6AEwAg#v=onepage&q&f=false

Friday, December 16, 2011

Stopwords in SearchWorks - to be or not to be?

We've been examining whether or not to restore stopwords to Stanford's SearchWorks index (http://searchworks.stanford.edu).

Stopwords are words ignored by a search engine when matching queries to results. Any list of terms can be a stopword list; most often the stopwords comprise the most commonly occurring words in a language, occasionally limited to certain functions (articles, prepositions vs. verbs, nouns).

The original usage of stopwords in search engines was to improve index performance (query matching time and disk usage) without degrading result relevancy (and possibly improving it!). It is common practice for search engines to employ stopwords; in fact Solr (http://lucene.apache.org/solr), the search engine behind SearchWorks, has English stopwords turned on as the default setting. We had no compelling reason to change most of the default Solr settings.  Thus, since SearchWorks's inception we have been using the following stopword list:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, s, such, t, that, the, their, then, there, these, they, this, to, was, will, with.

What follows is an analysis of how stopwords are currently affecting SearchWorks, and what might happen if we restore stopwords to SearchWorks, making every query term significant.

 

Executive Summary

We believe that restoring stopwords to SearchWorks could improve results in up to 18% of the searches, and will degrade results only in the small number of searches with more than 6 terms.

 

How Many Terms are there in User Queries?

Over 50% of the query strings for SearchWorks are 1 or 2 terms.
Over 75% of the query strings are 1, 2 or 3 terms.
Over 90% of the query strings for SearchWorks have 6 or fewer terms.

This is strictly query strings; it does not include facet values or other parameters.  Here is a histogram showing the number of terms in our queries for October 2011.  Note that single term queries are split into "alphanum" and "numeric".


Source: Google Analytics for Oct 2011, analyzed by Casey Mullin.

 

What Percentage of Query Strings have Stopwords?

In November 2011, there were 142,869 searches.  Stopwords appeared in 26,076 searches. Thus, stopwords appeared in roughly 18% of searches.



(Per analysis of November 2011 usage statistics by Casey Mullin, sent in email on Dec 14, 2011).

 

Do the Stopwords Currently Used in Queries Imply the Users are Trying Boolean Searches?

The 10 stopwords appearing most often in queries are (for November 2011):

Stopword    Occurrences in queries
the         7578
of          6582
and         4106
in          2298
a           1137
to          1033
for          695
on           685
an           289
with         231

"or" and "not" do not appear in many queries, while "and" is not the most frequent stopword, nor close to it in occurrences. I interpret this to mean stopwords in queries are NOT intended as boolean operators.

(per analysis of November 2011 usage statistics by Casey Mullin, sent in email on Dec 14, 2011).

 

What About Minimum Must Match?

Restoring stopwords could hugely degrade precision, since stopwords occur so often.  Solr's mm setting (minimum must match) gives us a way to mitigate this problem.  In our index employing stopwords, our mm threshold is 4:  queries with up to 4 terms must match all 4 terms;  for 5 or more query terms, 90% must match.   Given that over 90% of queries have 6 or fewer terms, 6 seems an appropriate threshold for an index that includes all words.
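
Here is a small Ruby sketch of that rule, to show how many terms must match at each threshold.  It assumes the percentage is rounded down, as the Solr documentation for mm describes; in Solr a rule like this would typically be written as something like mm=4<90%, but treat that exact string as an assumption, since our actual config is not shown here.  The method name is invented for the example.

```ruby
# How many query terms must match, for a rule of the form
# "up to `threshold` terms: all must match; more than that: `percent`% must match".
# Assumption: the percentage result is rounded down.
def required_matches(term_count, threshold: 4, percent: 90)
  return term_count if term_count <= threshold

  (term_count * percent) / 100 # integer division rounds down
end

(1..8).each do |n|
  puts "#{n} terms: threshold 4 needs #{required_matches(n)}, " \
       "threshold 6 needs #{required_matches(n, threshold: 6)}"
end
```

Under this sketch, a 5 term query needs only 4 matching terms with a threshold of 4, but all 5 with a threshold of 6; that one-term difference matters a great deal when several of the terms are stopwords.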

As it happens, increasing our mm threshold was proposed a while back, distinct from the idea of restoring stopwords to the index. 


What is Improved by Restoring Stopwords to the Index?

  1. Searches comprised only of stopwords now retrieve results (improved recall) 
    • to be or not to be (with or without quotes) 
  2. Precision is greatly improved for short searches that include stopwords 
    • pearl vs. the pearl
    • the one
    • A Zukofsky (author Zukofsky, title "A")
    • there will be blood  (3 stopwords, so huge improvement)
    • OR spectrum (a periodical)
    • Jazz: an Introduction
  3. Subject links distinguish "in" from "and", etc. 
    • Archaeology in Literature is no longer conflated with Archaeology and Literature
  4. Improved results for languages having words overlapping English stopwords
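
The examples in the list above are easy to reproduce with a toy stopword filter in Ruby.  This is only a sketch of the idea, using the stopword list given earlier; it is not Solr's analysis chain (no stemming, no diacritic folding), and the method name is made up.

```ruby
# The stopword list SearchWorks has been using (from earlier in this post)
STOPWORDS = %w[
  a an and are as at be but by for if in into is it no not of on or s such t
  that the their then there these they this to was will with
].freeze

# Toy stand-in for an index that drops stopwords: the terms that are left
# to match on once stopwords are removed from the query.
def significant_terms(query)
  query.downcase.scan(/[[:alnum:]]+/).reject { |term| STOPWORDS.include?(term) }
end

['to be or not to be', 'the pearl', 'there will be blood', 'A Zukofsky'].each do |q|
  puts "#{q.inspect} -> #{significant_terms(q).inspect}"
end
```

With stopwords removed, "to be or not to be" has nothing left to match, and "the pearl" is indistinguishable from "pearl".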

 

What is Degraded by Restoring Stopwords to the Index?

  1. long queries (over 6 terms) with a lot of stopwords have reduced precision ...  BUT the words occurring as a phrase do float to the top. 
    • Lectures on the Calculus of Variations and Optimal Control Theory

 

What Else Have Testers Reported?

  • Known Item Searches: 
    • restoring stopwords tied or improved our testers' known item searches. 
    • one exception: 
      • a search for dorothy and the wizard OF oz did not retrieve the desired title, which was actually dorothy and the wizard IN oz. 
  • Series Searches, and Uniform Title:
    • "A potential problem of the stopword change is that title access points (aka uniform title) constructed according to AACR2 are without initial articles. So, for instance, the access point for the series "The NASA history series" is "NASA history series". A query that includes the initial article will not affect the search result in current production SW because "the" is eliminated as a stopword, but will affect the search result when stopwords are treated as significant words. On searchworks-test, a phrase title search for "The NASA history series" retrieves 76 records. The same search on production retrieves 125 records. The test search still retrieves some of the records that belong to this series because the transcribed series statement, which is in the 490 field, includes the initial article, but not all of them do. The series access points in the 830 field are all without the initial article. [Symphony browse series retrieves 94 results.]"
    • my reaction: in the metadata advisory group, many of the records we examined had the "wrong" information in the field (it included the initial article, and it shouldn't have). Sooo … our data is dirty -- shocking, but true. It would also be nice to know how often the affected searches are exercised, especially by end-users.

 

Additional Comments

Everything is Imperfect. 
  • SearchWorks employing stopwords gives imperfect search results. 
  • SearchWorks restoring stopwords, so that every term is significant, gives different imperfect search results.
  • Socrates (our OPAC from our ILS, Sirsi) gives yet different imperfect search results. 
The back end algorithms for determining what results match a query will always be fairly opaque to the end users - the algorithms are complicated. Moreover, users will have typos and other mistakes in their queries no matter what we do, and it seems unlikely we can consistently rescue them from themselves.

Everything Can be Changed.

Solr gives us incredible control over our search engine's algorithm. There are many many knobs we can twiddle in our quest to improve the relevancy of search results. A few of the possibilities include:
  • mm -- require a higher percentage of matching terms when there are more than 6 terms in the query
  • phrase boosting -- this floats results with the query terms occurring close together (and presumably in the same order) to the top.  Currently it seems high enough, but we have never performed any empirical tests.
  • phrase slop -- how close words must occur to each other in the results.  Our current setting is 3; it is not clear to me exactly how phrase boosting and phrase slop interact.
  • adjust the relative boosting of fields -- give even more weight to title field matches, etc.  Again, we've never performed any empirical tests.
  • indexed string length doesn't always have to matter -- adjust the situations where the length of the indexed string affects the score of matches.  E.g. query "my cat" can score higher for title "my cat" than for "my cat and dog."

 

So Where Are We Now?

The data is in, and a decision will be made soon.  I'm guessing stopwords are going to be left in our past.

Tuesday, September 27, 2011

Cucumber Step Definition with inline comment

Have you ever wanted to put a comment on the same line as a cucumber step?

    And I should see "M666" # local_id
    And I should see "1977-1997" # create date


It just occurred to me that I could create a step definition to allow this:


  # 'I should see "text"' step  with comment at end of line
  Then /^I should see "([^"]*)"(?: +\#.*)$/ do |text|
      Given "I should see \"#{text}\"" 
  end

If your text could include escaped quotes, you can use this step definition:

   # 'I should see "text"' step  with comment at end of line
   Then /^I should see "(.*?)"(?: +\#.*)$/ do |text|
      text.gsub!(/\\"/, '"')
      assert page.has_content?(text)
   end
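
If you want to check the pattern outside of Cucumber, here is a standalone Ruby sketch that applies the same regular expression and the same unescaping to a few step lines.  The helper name is made up; only the regex and the gsub come from the step definition above.

```ruby
# Same pattern as the step definition above
STEP_WITH_COMMENT = /^I should see "(.*?)"(?: +\#.*)$/

# Hypothetical helper: the text the step would look for, or nil when the
# line has no trailing comment (so the ordinary step would handle it).
def text_to_see(step_line)
  match = STEP_WITH_COMMENT.match(step_line)
  return nil unless match

  match[1].gsub(/\\"/, '"') # turn escaped quotes back into plain quotes
end

puts text_to_see('I should see "M666" # local_id').inspect
puts text_to_see('I should see "say \"hi\"" # greeting').inspect
puts text_to_see('I should see "M666"').inspect
```

Note that the trailing comment is required by the pattern rather than optional, so this step definition does not collide with the ordinary 'I should see "text"' step.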




Tuesday, February 8, 2011

Expressing (Search Result) Expectations as Cucumber Scenarios

As many of you know, SearchWorks is Stanford's Blacklight instance providing a "next generation" User Interface for materials at the Stanford Library.  What follows is a document I wrote for internal use so that motivated staff could provide feedback in the form of cucumber scenarios.  This blog post might make more sense in the context of my presentation at Code4Lib 2011 ... but this seemed a worthy blog post nevertheless.  Don't be put off by the length - a lot of what follows is examples.   

   

How to Write SearchWorks Search Result Expectations as Cucumber Scenarios

Sometimes we ask folks to check something new in SearchWorks. (thanks for your help!)
Sometimes people notice problems and report them: via the feedback form (thanks!), or a direct JIRA ticket (thanks!) or via email (less optimal, thanks!)
Occasionally people tell us specifics of something that is working "correctly."
When you ask yourself questions like these, then you are doing a "manual" test:
  • "Is SearchWorks getting the right search results?"
    • "SearchWorks is getting the right results because ..."
    • "The results in SearchWorks aren't ordered correctly. They should be ..."
    • "I know SearchWorks is wrong because ..."
  • "Are things displaying correctly?"
  • "The vernacular title should be ..."
Whenever you test something ("manually") in SearchWorks, we would like to capture your expectations as a cucumber scenario so we can run the test repeatedly and automate it.
Benefits:
  1. we won't have to keep asking you to check the same things over and over. Imagine never having to perform a given test search again!
  2. we can ensure that applying a fix for one problem won't inadvertently break something we've already fixed.
  3. we can automate running a large suite of tests nightly so we keep checking that we haven't broken anything.
  4. as we add specific searches and expected results against our own (meta)data corpus, we are accruing relevancy tests for our own data, based on human review of search results.
Sadly, this does not mean we can make all tests pass – sometimes, we can't achieve the ideal. There may be unacceptable tradeoffs, or it might just be too difficult technically to warrant pursuit. We do have a way to hang on to tests that should pass but are not passing at the current time, so these sorts of tests are welcome as well.
We would still like JIRA issues filed for FAILING cuke tests, and the JIRA issue identifier put in the scenario description.
The tests are easy to write.
Here are some sample cucumber scenarios:
Scenario: Query for "cooking" should have exact word matches before stemmed ones (VUF-123)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "cooking"
  And I press "search"
  Then I should get ckey 4779910 in the first 1 result
  And I should get result titles that contain "cooking" as the first 20 results

Scenario: relevance sort explicitly selected: score, then pub date desc, then title asc (SW-175)
  Given a SOLR index with Stanford MARC data
  When I go to the home page
  And I follow "Newspaper"
  And I select "author" from "sort"
  And I press "sort_submit"
  And I select "relevance" from "sort"
  And I press "sort_submit"
  # alpha for 2007
  Then I should get ckey 7141368 before ckey 7097229
  # newer year (2007) before older year (2005)
  And I should get ckey 8214257 before ckey 5985299
There are more samples at the bottom of this page.

Basic concepts

We use Cucumber (http://cukes.info) to automatically test the behavior of SearchWorks. It matches specific language in the "scenario" (cucumber parlance) with actions to perform, like filling in the search box, and then hitting return. Or clicking a facet link. Or going to a particular record to ensure information is properly displayed.
Cucumber does string matching (using regular expressions) to turn the natural language expressing expected behavior into executable test code. But you don't need to worry about it - just follow the specific language rules and you'll be supplying tests to the grateful engineers.

Scenarios

Each cucumber test is called a "scenario." It can have multiple actions taken, and what is displayed after each step can be examined. The idea is to capture how you're interacting with the web page (clicking buttons, selecting from pull downs, typing in text boxes) and what you expect to be displayed.

Be as precise as possible, BUT

We want the tests to be as useful as possible. Relevancy of search results can sometimes be as clear as
  • "record 666 should be the first result"
  • "the first 4 results should be"
  • "record 777 should be before record 999"
    There are more possibilities given in the "statements" section below.

Try to leave wiggle room for changes to our collection.

If a test is too rigid in its expectations, then small changes can make the test fail. These are the sorts of questions that help determine if the test is too brittle:
  • Are we likely to get more resources of the exact title you expect as the first result?
  • Are we likely to get more resources for the subject heading?
  • Are we likely to get resources that are a better match to the search terms?

Statements

  1. The presence or absence of quotes in the statements below is important.
  2. When, Then, and And at the beginning of the statements are interchangeable.
  3. If you can't express your expectations with the statements below, please file a JIRA ticket telling us what you are writing a test for. We may be able to add more statements to enable the scenario.
(Step Definition Code for the following is available at
http://www.stanford.edu/~ndushay/code4lib2011/search_result_steps.rb)

All Scenarios must start:
Scenario: (free text description, keep it short) (JIRA issue identifier)
  Given a SOLR index with Stanford MARC data

Indicate Your Starting Point

  • When I go to the home page
    • Use this for searching scenarios.
  • When I go to the advanced search page
  • When I am on the show page for "________"
    • Use this when you are talking about a particular record.
    • Fill in the blank with an id (ckey).

You're At Your Starting Page; Now Do Something.

Fill in a Text Box
  • When I fill in "q" with "___________________"
    • Use: searches without quotes.
    • Fill in the blank with any string for the search text box (no quotes allowed).
      • When I fill in "q" with "gobblety gook"
      • When I fill in "q" with "under the sea-wind"
      • When I fill in "q" with "Shindy AND Delilah"
  • When I fill in the search box with "_________________"
    • Use: searches containing quotes.
    • Fill in the blank with any string for the search text box, and if there are quotes, escape them with a backslash
      • When I fill in the search box with "\"under the sea-wind\""
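
As a rough sketch of why the backslashes are needed: the statement's own double quotes mark where the search text starts and stops, so any quote inside the search text has to be escaped and then unescaped again before it is typed into the box. The SEARCH_BOX_STEP pattern and search_text helper below are hypothetical and only illustrate the idea; they are not taken from the real step definitions.

```ruby
# Hypothetical sketch: capture a search string that may contain
# backslash-escaped quotes, then drop the escaping backslashes.
SEARCH_BOX_STEP = /^I fill in the search box with "((?:\\.|[^"\\])*)"$/

# Returns the unescaped search text, or nil if the line does not match.
def search_text(step)
  m = SEARCH_BOX_STEP.match(step)
  return nil unless m
  m[1].gsub(/\\(.)/) { $1 } # \" becomes "
end

puts search_text('I fill in the search box with "\"under the sea-wind\""')
```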
Pressing a Button
  • And I press "________"
    • Use: pressing a button
      • And I press "search"
      • And I press "per_page_submit"
Selecting from a Pulldown
  • And I select "_____" from "______"
    • Use: selecting a value from a pull-down. (If you don't know the official name of the pulldown, we'll figure it out.)
    • Fill in the first blank with the selected value; fill in the second blank with the name of the pulldown.
      • And I select "Title" from "search_field"
      • When I select "author" from "sort"
      • And I select "100" from "per_page"
Following a Link
  • And I follow "____________"
    • Fill in the blank with the link text (NOT the url it goes to)
    • This is how we select facets in testing.
      • And I follow "Journal/Periodical"
      • And I follow "Hoover Library"
      • And I follow "Chinese"
Checkbox Selection and Un-Selection
  • I check "____________"
  • I uncheck "____________"
    • Fill in the blank with label of the checkbox (the text displayed next to it)
Radio Buttons
  • I choose "____________"
    • Fill in the blank with label of the selected radio button (the text displayed next to it)

Look at What You Got Back

  • Then I should get results
  • Then I should not get results
    • Use this only if
      1. you can't provide at least one id (ckey) OR
      2. you can't provide a ballpark number of expected results
  • Then I should get (at least|at most) ___ results
    • Use when the number of results is less than the default number per page (currently 20)
    • Pick the appropriate qualifier; fill in the blank with a number
      • Then I should get at least 2 results
      • Then I should get at most 19 results
  • Then I should get (at least|at most) ___ total results
    • Use when the number of results is more than the default number per page (currently 20)
    • Pick the appropriate qualifier; fill in the blank with a number
      • Then I should get at least 250 total results
      • Then I should get at most 50 total results
  • Then I should get ckey _______ in the results
  • Then I should not get ckey _______ in the results
    • Fill in the blank with a single ckey expected (or not expected) in the first page of results.
    • The latter can be used to exclude false positives.
  • Then I should get ckey _______ in the first ___ results
  • Then I should not get ckey _______ in the first ___ results
    • These are good statements when a particular record should be "above the fold" or should clearly be the first result, or when you want to ensure a particular false positive isn't polluting the top search results.
    • Fill in the first blank with a single ckey, and the second blank with a number lower than the default number per page (currently 20). The last word may be result or results.
      • Then I should get ckey 12345 in the first 1 result
      • Then I should get ckey 12345 in the first 3 results
  • Then I should get ckey _______ before ckey _______
    • Use this to specify result ordering, such as after a particular sort.
  • Then I should get (the same number of|fewer|more) results (than|as) (a|an) (.*)search for "_______"
    • Use: compare the number of results with that of a different search
    • "than" and "as" are interchangeable, as are "a" and "an"
    • "title", "author", or "subject" may be put before "search" to indicate a specialized search.
    • The query string may contain quotes, but they must be escaped with a backslash
      • Then I should get fewer results than a search for "wonderbread"
      • Then I should get more results than an author search for "\"James Herriot\""
      • Then I should get the same number of results as a title search for "jack in the beanstalk"
  • Then I should get at least ____ of these ckeys in the first ___ results: "______________"
    • Fill in the first two blanks with positive integers; fill in the last blank with a list of ckeys separated by comma-space: "1234, 23324, 1523"
      • Then I should get at least 4 of these ckeys in the first 4 results: "7637875, 336046, 6634054, 2130330"
  • Then I should get ckey _______ and ckey _______ within ___ positions of each other
    • Then I should get ckey 6974167 and ckey 5757985 within 2 positions of each other
  • Then I should get result titles that contain "______________" as the first ___ results
    • Use when you think a term or phrase in the title will be a less brittle test than ckeys. (originally used to detect if exact matches sort higher than stemmed matches.)
      • Then I should get result titles that contain "arabic" as the first 20 results
  • Then I should see "______________"
  • Then I should not see "______________"
  • Then I should see "______________" (at least|at most|exactly) ___ times
    • Use when you want to find visible text somewhere on the page. Generally too vague for search tests.
      • Then I should see "Carnoy, Martin"
      • Then I should see "Refine" exactly 2 times
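
For the curious, the ckey statements above boil down to simple checks against the ordered list of ids on the results page. Here is a minimal sketch in plain Ruby. The helper names are hypothetical, the id 5555555 is made up for the example, and requiring both ckeys to be present for the "before" and "within" checks is an assumption of this sketch, not necessarily how the real steps behave.

```ruby
# Hypothetical helpers; `ids` is the ordered list of ckeys on the results page.

# "I should get ckey X in the first N results"
def in_first?(ids, ckey, n)
  ids.first(n).include?(ckey)
end

# "I should get ckey X before ckey Y" (assumes both must be present)
def before?(ids, first_ckey, second_ckey)
  i = ids.index(first_ckey)
  j = ids.index(second_ckey)
  !i.nil? && !j.nil? && i < j
end

# "I should get at least MIN of these ckeys in the first N results: "a, b, c""
def at_least_of?(ids, min, n, ckey_list)
  wanted = ckey_list.split(', ')
  (ids.first(n) & wanted).size >= min
end

# "I should get ckey X and ckey Y within N positions of each other"
def within_positions?(ids, ckey_a, ckey_b, n)
  i = ids.index(ckey_a)
  j = ids.index(ckey_b)
  !i.nil? && !j.nil? && (i - j).abs <= n
end

ids = %w[7637875 336046 6634054 2130330 5555555]
p in_first?(ids, '7637875', 1)
p before?(ids, '336046', '2130330')
p at_least_of?(ids, 4, 4, '7637875, 336046, 6634054, 2130330')
p within_positions?(ids, '7637875', '6634054', 2)
```

Note that the "at least" check leaves room for one or two results to shift, which is what makes it less brittle than pinning every position.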

Facet Expectations

  • Then the facet "______________" should display
  • Then the facet "______________" should not display
    • Then the facet "Russian" should display
    • Then the facet "Choctaw" should not display
  • Then I should get facet "_____________" before facet "_____________"
    • Then I should get facet "Croatian" before facet "Czech"

Call Number ordering in show view

  • Then I should get callnumber "_____________" before callnumber "_____________"
    • Then I should get callnumber "505 .S343 V.20 1972" before callnumber "505 .S343 V.21:1 1973"

Example Scenarios

Examples: Simple Searches
Scenario: Query for "cooking" should have exact word matches before stemmed ones (VUF-321)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "cooking"
  And I press "search"
  Then I should get ckey 4779910 in the first 1 result
  And I should get result titles that contain "cooking" as the first 20 results

Scenario: Expect specific match and non-match for "french beans food scares" without quotes (VUF-123)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "french beans food scares"
  And I press "search"
  Then I should get ckey 7716344 in the first 1 result
  And I should not get ckey 6955556 in the results
Examples: Specialized Searches
Scenario: Single Author Title search matches Socrates results (SW-5)
  Given a SOLR index with Stanford MARC data
  When I go to the advanced search page
  And I fill in "author" with "McRae"
  And I fill in "title" with "Jazz"
  And I press "advanced_search_button"
  Then I should get at least 4 of these ckeys in the first 4 results: "7637875, 336046, 6634054, 2130330"

Scenario: Search for non-existent author should yield zero results (VUF-5)
  Given a SOLR index with Stanford MARC data
  When I go to the home page
  And I fill in "q" with "jill kerr conway"
  And I select "Author" from "search_field"
  And I press "search"
  Then I should get at most 0 results

Scenario: Stopwords in title searches should be ignored - 3 terms total (SW-14)
  Given I am on the home page
  When I fill in "q" with "alice in wonderland"
  And I select "Title" from "search_field"
  And I press "search"
  Then I should get at least 100 total results
  And I should get the same number of results as a title search for "alice wonderland"

Scenario: Thesis advisors (720 fields) should be included in author search (SW-3)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "Zare"
  And I select "Author" from "search_field"
  And I press "search"
  Then I should get at least 10 results
  And I should see "Thesis"
Example: Multi-Button Presses
Scenario: relevance sort explicitly selected: score, then pub date desc, then title asc (SW-666)
  Given a SOLR index with Stanford MARC data
  When I go to the home page
  And I follow "Newspaper"
  And I select "author" from "sort"
  And I press "sort_submit"
  And I select "relevance" from "sort"
  And I press "sort_submit"
  # alpha for 2007
  Then I should get ckey 7141368 before ckey 7097229
  # newer year (2007) before older year (2005)
  And I should get ckey 8214257 before ckey 5985299

Scenario: Call Number
  Given a SOLR index with Stanford MARC data
  When I am on the home page
  Then I should see "Archive of Recorded Sound"
  When I follow "Archive of Recorded Sound"
  Then I should see "[remove]"
  And I should get at least 10 results
Example: Non-Latin Script, Per Page selection
Scenario: Cyrillic (VUF-22)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "пушкин pushkin"
  And I select "Title" from "search_field"
  And I press "search"
  And I select "50" from "per_page"
  And I press "per_page_submit"
  Then I should get at least 12 results
  And I should get ckey 216398 in the results
  And I should get ckey 7898778 in the results
Example: Selecting Facet
Scenario: japanese journal of applied physics PAPERS - 780t, 785t indexed (VUF-11)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "japanese journal of applied physics papers"
  And I select "Title" from "search_field"
  And I press "search"
  Then I should get at least 7 of these ckeys in the first 8 results: "365562, 491322, 491323, 7519522, 7519487, 460630, 787934"
  When I follow "Journal/Periodical"
  Then I should get at least 5 of these ckeys in the first 5 results: "7519522, 365562, 491322, 491323, 7519522"
Examples: Call Number Sorting in Record
Scenario: The show view call numbers should be in volume reverse sort order for serials (VUF-666)
  Given a SOLR index with Stanford MARC data
  When I go to the show page for "370790"
  Then I should get callnumber "570.5 .N287 V.25-26 1935" before callnumber "570.5 .N287