Query Generator

Introduction

The query generator module contains functions for generating a set of queries that follow the heuristics proposed by Rainer and Williams in IST’18.

Usage

To use the query_generator module:

>>> import coast_search
>>> coast_search.query_generator.function(to_use)

or:

>>> from coast_search import query_generator
>>> query_generator.function(to_use)

Functions

This section contains the functions required to generate multiple queries from the config, given n dimensions and any constraints.

coast_search.query_generator.add_api_config_to_queries(generated_query_strings, search_engines)

Merges the two parameters and returns a list of dicts that include the API config.

If only one API key is provided, it is assumed to be valid for many searches and is used for all queries. If more than one key is provided, the number of keys must match the number of queries.

Args:
generated_query_strings: The output from the generate_query_strings function.
search_engines: The search engines list found in the api_config file. See the documentation for usage guidelines (http://coast_search.readthedocs.io/).
Returns:
result_list: Updated list of query data, now including search engine/API info.
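A minimal sketch of the API-key rule described above, for illustration only. It is not coast_search's implementation: `attach_keys` is a hypothetical helper, each query is assumed to be a plain dict, and the `api_key` field name is an assumption.

```python
from typing import Any


def attach_keys(queries: list[dict[str, Any]], api_keys: list[str]) -> list[dict[str, Any]]:
    """Hypothetical helper (not part of coast_search): pair each query with a key.

    A single key is reused for every query; otherwise the number of keys
    must equal the number of queries.
    """
    if len(api_keys) == 1:
        keys = api_keys * len(queries)  # reuse the one key for all queries
    elif len(api_keys) == len(queries):
        keys = api_keys  # one key per query, matched by position
    else:
        raise ValueError(f"expected 1 or {len(queries)} API keys, got {len(api_keys)}")
    # Build new dicts so the caller's query data is left unchanged.
    return [{**query, "api_key": key} for query, key in zip(queries, keys)]
```

Validating the key count before building any output means a mismatched configuration fails immediately instead of producing a partially keyed list.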

coast_search.query_generator.add_to_result_list(result_list, seg_id, logic, query)

coast_search.query_generator.check_length(seed, random, query_words, key_max)

Google limits searches to 32 words, so we need to make sure we won’t generate anything longer. The count needs to consider:
- the number of words in the seed
- the number of words in the random phrase
- the number of words in the lists from the query
It will raise an exception if there are too many words.

Args:
seed: the seed for segment 1.
random: the random query string for segment 0.
query_words: object with key=name of dimension, value=list of keywords to use in the query.
key_max: the maximum number of words (32 in Google’s case).
Returns:
bool: True for a correct number of words, False for too many.
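The word budget described above can be sketched as a simple whitespace word count. This is an illustration under assumptions, not coast_search's code: `check_length_sketch` is a hypothetical name, every phrase is split on whitespace, search operators (AND, minus signs, quotes) are not counted, and it returns False rather than raising.

```python
def check_length_sketch(seed: str, random_phrase: str,
                        query_words: dict[str, list[str]], key_max: int = 32) -> bool:
    """Hypothetical sketch: True if seed + random phrase + all keywords fit in key_max words."""
    total = len(seed.split()) + len(random_phrase.split())
    for phrases in query_words.values():
        # Each keyword may be a multi-word phrase, so count its words individually.
        total += sum(len(phrase.split()) for phrase in phrases)
    return total <= key_max
```

For example, a seed of "software", a three-word random phrase and the indicators "because" and "so that" add up to 7 words, well inside a 32-word limit.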

coast_search.query_generator.generate_query_strings_n_dimensions(dimensions_dict, seed='software', key_max=32)

Given the dimensions and their associated words, the segment 1 seed and the maximum query length, sets up and generates the query strings dynamically, depending on the number of dimensions.

Args:
dimensions_dict: dictionary containing the dimensions data (key=name, value=list of words).
seed: the segment 1 seed.
key_max: the maximum number of words (32 in Google’s case).
Returns:
result_data: an object containing data about each of the segments (id, logic, query). Returns None if check_length returns False.
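One way the setup step might look, sketched under assumptions: `prepare_dimensions` is hypothetical, and it assumes the setup splits `dimensions_dict` into parallel lists of dimension names and phrase lists, then collects every individual word (plus the seed) into a lower-cased stoplist of the kind get_random_query takes as `words_to_exclude`.

```python
def prepare_dimensions(dimensions_dict: dict[str, list[str]], seed: str = "software"):
    """Hypothetical setup step: return (names, phrase lists, sorted stoplist)."""
    names = list(dimensions_dict)  # dimension names, in insertion order
    data = [dimensions_dict[name] for name in names]  # phrase lists, same order as names
    stoplist = {word.lower() for word in seed.split()}
    for phrases in data:
        for phrase in phrases:
            # Split multi-word phrases so each word is excluded on its own.
            stoplist.update(word.lower() for word in phrase.split())
    return names, data, sorted(stoplist)


dims = {"reasoning": ["because", "so that"], "experience": ["I found"]}
names, data, stoplist = prepare_dimensions(dims)
# names    -> ['reasoning', 'experience']
# stoplist -> ['because', 'found', 'i', 'so', 'software', 'that']
```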

coast_search.query_generator.generate_result_list(dimensions_data, dimensions, seed, random)

Given the dimensions information, dynamically generates the logic and query for each segment.

Note: segment 0 will always contain the random query, and segment 1 will always contain the seed.

Args:
dimensions_data: a list of phrases (e.g. reasoning/experience indicators).
dimensions: a list of names of the given dimensions.
seed: the seed for segment 1.
random: the random phrase for segment 0.
Returns:
result_string: a string of all phrases AND’d together, ready for a search engine.
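The segment layout beyond segments 0 and 1 is not spelled out here, so the following is only an illustrative sketch under a stated assumption: each further segment combines the seed with every dimension either AND'd or negated, giving 2**n extra segments for n dimensions. The helper names (`pos`, `neg`, `segments_sketch`) and the `logic` string format are hypothetical.

```python
from itertools import product


def pos(phrases):
    # AND the quoted phrases together inside parentheses.
    return "(" + " AND ".join(f'"{p}"' for p in phrases) + ")"


def neg(phrases):
    # Quote each phrase and prefix it with '-' so the search engine excludes it.
    return " ".join(f'-"{p}"' for p in phrases)


def segments_sketch(dimensions_data, dimensions, seed, random_phrase):
    """Hypothetical layout: 0 = random, 1 = seed, then one segment per
    pos/neg combination of the n dimensions (2**n further segments)."""
    result = [
        {"id": 0, "logic": "random", "query": random_phrase},
        {"id": 1, "logic": "seed", "query": seed},
    ]
    combos = product(("pos", "neg"), repeat=len(dimensions))
    for seg_id, signs in enumerate(combos, start=2):
        parts = [seed]
        for sign, phrases in zip(signs, dimensions_data):
            parts.append(pos(phrases) if sign == "pos" else neg(phrases))
        logic = " ".join(f"{sign}:{name}" for sign, name in zip(signs, dimensions))
        result.append({"id": seg_id, "logic": logic, "query": " ".join(parts)})
    return result
```

Using `itertools.product` keeps the code independent of the number of dimensions, which is the property the description above emphasises.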

coast_search.query_generator.get_random_query(words_to_exclude)

Segment 0 uses a random query. This function creates that query using the random_words library and returns it.

Notes:
1. The random query returned won’t contain any word that exists in any topic string or indicator list.
2. The random query string will always be three words long.
Args:
words_to_exclude: the list of words from each of the dimensions to use as a stoplist.
Returns:
qs: The generated random query.
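The stoplist filtering and three-word rule can be sketched with the standard library's `random` module standing in for the random_words library the function actually uses. `random_query_sketch` and its `vocabulary` parameter are hypothetical, and case-insensitive matching against the stoplist is an assumption.

```python
import random


def random_query_sketch(words_to_exclude, vocabulary, n_words=3, rng=None):
    """Hypothetical: pick n_words distinct words from vocabulary that are not
    in the stoplist (compared case-insensitively) and join them with spaces."""
    rng = rng or random.Random()
    excluded = {word.lower() for word in words_to_exclude}
    # Deduplicate and sort so a seeded rng gives repeatable results.
    candidates = sorted({word for word in vocabulary if word.lower() not in excluded})
    if len(candidates) < n_words:
        raise ValueError("not enough words left after applying the stoplist")
    return " ".join(rng.sample(candidates, n_words))
```

Passing a seeded `random.Random` instance makes the generated query reproducible, which can help when a search needs to be rerun.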

coast_search.query_generator.neg_query_segment(phrase_list)

Given a list of phrases, returns a string of all the phrases negated, for example: -"but" -"because" -"however"

Args:
phrase_list: a list of phrases (e.g. reasoning/experience indicators).
Returns:
result_string: a string of all the phrases negated, ready for a search engine.
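A minimal sketch that reproduces the negation format described above. It is not necessarily the library's code: `neg_query_segment_sketch` is a hypothetical name, and quotes inside a phrase are not escaped.

```python
def neg_query_segment_sketch(phrase_list):
    """Hypothetical: quote each phrase, prefix it with '-', and join with spaces."""
    return " ".join(f'-"{phrase}"' for phrase in phrase_list)


print(neg_query_segment_sketch(["but", "because", "however"]))
```

With these inputs the call prints `-"but" -"because" -"however"`; an empty list yields an empty string.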

coast_search.query_generator.pos_query_segment(phrase_list)

Given a list of phrases, returns a string of all the phrases AND’d together, for example: ("but" AND "because" AND "however")

Args:
phrase_list: a list of phrases (e.g. reasoning/experience indicators).
Returns:
result_string: a string of all phrases AND’d together, ready for a search engine.
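A minimal sketch that reproduces the AND format described above. It is not necessarily the library's code: `pos_query_segment_sketch` is a hypothetical name, and quotes inside a phrase are not escaped.

```python
def pos_query_segment_sketch(phrase_list):
    """Hypothetical: quote each phrase, join with ' AND ', and wrap in parentheses."""
    return "(" + " AND ".join(f'"{phrase}"' for phrase in phrase_list) + ")"


print(pos_query_segment_sketch(["but", "because", "however"]))
```

With these inputs the call prints `("but" AND "because" AND "however")`. Note that an empty list yields `()`, which a caller may want to guard against before sending the query.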