Query Generator
Introduction
The query generator module contains functions for generating a set of queries that follow the heuristics proposed by Rainer and Williams in IST'18.
Usage
To use the query_generator module:
>>> import coast_search
>>> coast_search.query_generator.function(to_use)
or:
>>> from coast_search import query_generator
>>> query_generator.function(to_use)
Functions
This module contains the functions required for generating multiple queries from the config, given n dimensions and any constraints.
coast_search.query_generator.add_api_config_to_queries(generated_query_strings, search_engines)
Merges the two parameters and returns a list of dicts that include the API config.
If only one API key is provided, it is assumed to be valid for many searches and is used for all queries. If more than one key is provided, the number of keys must match the number of queries.
- Args:
- generated_query_strings: The output from the generate_query_strings function.
- search_engines: The search engines list found in the api_config file. See the documentation for usage guidelines (http://coast_search.readthedocs.io/).
- Returns:
- result_list: Updated list of query data, now including search engine/API info.
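The one-key-or-one-key-per-query rule can be illustrated with a minimal sketch. This is not the library's implementation: the function name `attach_api_config`, the dict keys (`seg_id`, `query`, `api_key`, `search_engine_id`, `search_engine`), and the choice to raise `ValueError` on a mismatch are all assumptions made for illustration.

```python
def attach_api_config(query_strings, search_engines):
    """Pair each query dict with a search-engine config (illustrative only)."""
    if len(search_engines) == 1:
        # A single key is assumed valid for many searches, so reuse it.
        engines = search_engines * len(query_strings)
    elif len(search_engines) == len(query_strings):
        # One key per query: pair them up in order.
        engines = search_engines
    else:
        # Assumed behaviour; the real function may handle this differently.
        raise ValueError("Provide either one API key or one key per query")
    return [
        {**query, "search_engine": engine}
        for query, engine in zip(query_strings, engines)
    ]


# Hypothetical input shapes, not taken from the library.
queries = [
    {"seg_id": 0, "query": "alpha beta gamma"},
    {"seg_id": 1, "query": '"software"'},
]
single_key = [{"api_key": "KEY-A", "search_engine_id": "ENGINE-A"}]
merged = attach_api_config(queries, single_key)
```

With a single key, every entry in `merged` carries the same search-engine config; with two keys, each query would receive its own.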
coast_search.query_generator.add_to_result_list(result_list, seg_id, logic, query)
coast_search.query_generator.check_length(seed, random, query_words, key_max)
Google limits searches to 32 words, so this function makes sure that nothing longer is generated. The count needs to consider:
- the number of words in the seed
- the number of words in the random phrase
- the number of words in the lists from the query
Raises an exception if there are too many words.
- Args:
- seed: the seed for segment 1
- random: the random query string for segment 0
- query_words: object with key=name of dimension, value=list of keywords to use in query
- key_max: the maximum number of words (32 in Google's case)
- Returns:
- bool: True for correct number of words, False for too many
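A word-budget check of this kind could look like the sketch below. It assumes every keyword across all dimensions counts towards the limit and that multi-word phrases count one word per whitespace-separated token; the function name `within_word_limit` is hypothetical and the library's own counting may differ.

```python
def within_word_limit(seed, random_phrase, query_words, key_max=32):
    """Return True if the combined word count fits within key_max (sketch)."""
    total = len(seed.split()) + len(random_phrase.split())
    for keywords in query_words.values():
        # Multi-word phrases contribute one word per token.
        total += sum(len(keyword.split()) for keyword in keywords)
    return total <= key_max


query_words = {
    "reasoning": ["because", "however"],
    "experience": ["i", "in my experience"],
}
# 1 (seed) + 3 (random phrase) + 2 (reasoning) + 4 (experience) = 10 words
fits = within_word_limit("software", "alpha beta gamma", query_words)
too_long = within_word_limit("software", "alpha beta gamma", query_words, key_max=9)
```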
coast_search.query_generator.generate_query_strings_n_dimensions(dimensions_dict, seed='software', key_max=32)
Given the dimensions and their associated words, the segment 1 seed, and the maximum query length, sets up and generates the query strings dynamically, depending on the number of dimensions.
- Args:
- dimensions_dict: dictionary containing the dimensions data (key=name, value=list of words)
- seed: the segment 1 seed
- key_max: the maximum number of words (32 in Google's case)
- Returns:
- result_data: an object containing data about each of the segments (id, logic, query). Returns None if check_length returns False.
coast_search.query_generator.generate_result_list(dimensions_data, dimensions, seed, random)
Given the dimensions information, dynamically generates the logic and query for each segment.
Note: segment 0 will always contain the random query, and segment 1 will always contain the seed.
- Args:
- dimensions_data: a list of phrases (e.g. reasoning/experience indicators)
- dimensions: a list of names of the given dimensions
- seed: the seed for segment 1
- random: the random phrase for segment 0
- Returns:
- result_string: a string of all phrases AND'd together, ready for a search engine.
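To make the idea of per-segment logic and queries concrete, here is a sketch under stated assumptions. Only two facts come from this page: segment 0 holds the random query and segment 1 holds the seed. Everything else is hypothetical, including the function name `build_segments`, the `+name`/`!name` logic labels, and the assumption that the remaining segments are one per include/exclude combination of the dimensions. The set of segments the library actually emits may differ.

```python
from itertools import product


def build_segments(dimensions_dict, seed, random_phrase):
    """Hypothetical sketch: one segment per include/exclude combination."""
    names = list(dimensions_dict)
    segments = [
        {"seg_id": 0, "logic": "random", "query": random_phrase},
        {"seg_id": 1, "logic": "seed", "query": f'"{seed}"'},
    ]
    for flags in product([True, False], repeat=len(names)):
        logic = ["seed"]
        query = [f'"{seed}"']
        for name, include in zip(names, flags):
            phrases = dimensions_dict[name]
            if include:
                # Positive segment: phrases AND'd together in parentheses.
                logic.append(f"+{name}")
                query.append("(" + " AND ".join(f'"{p}"' for p in phrases) + ")")
            else:
                # Negative segment: each phrase prefixed with a minus sign.
                logic.append(f"!{name}")
                query.append(" ".join(f'-"{p}"' for p in phrases))
        segments.append(
            {"seg_id": len(segments), "logic": " ".join(logic), "query": " ".join(query)}
        )
    return segments


dimensions = {"reasoning": ["because", "however"], "experience": ["i", "my"]}
segments = build_segments(dimensions, "software", "alpha beta gamma")
```

Under these assumptions, two dimensions yield six segments: the random query, the seed, and four combinations.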
coast_search.query_generator.get_random_query(words_to_exclude)
Segment 0 uses a random query. This function creates that query using the random_words library and returns it.
Notes:
1. The random query returned won't contain any word that exists in any topic string or indicator list.
2. The random query string will always be three words long.
- Args:
- words_to_exclude: the list of words from each of the dimensions to use as a stoplist
- Returns:
- qs: The generated random query.
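The two guarantees in the notes (no stoplisted words, always three words) can be sketched as follows. The library draws its words from the random_words package; this sketch swaps in a small hard-coded pool and the standard `random` module so that it stays self-contained. The function name `random_three_word_query`, the `word_pool` parameter, and the case-insensitive matching are assumptions.

```python
import random


def random_three_word_query(words_to_exclude, word_pool, rng=random):
    """Pick three distinct pool words that are not in the stoplist (sketch)."""
    stoplist = {word.lower() for word in words_to_exclude}
    candidates = [word for word in word_pool if word.lower() not in stoplist]
    if len(candidates) < 3:
        raise ValueError("Not enough candidate words outside the stoplist")
    return " ".join(rng.sample(candidates, 3))


# Hypothetical pool; the real function does not take one.
pool = ["river", "software", "candle", "because", "orchard", "lantern", "pebble"]
stop_words = ["software", "because"]
query = random_three_word_query(stop_words, pool, random.Random(0))
```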
coast_search.query_generator.neg_query_segment(phrase_list)
Given a list of phrases, returns a string of all the phrases negated, e.g. -"but" -"because" -"however"
- Args:
- phrase_list: a list of phrases (e.g. reasoning/experience indicators)
- Returns:
- result_string: a string of all phrases negated, ready for a search engine.
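A negated segment of the form shown in the example can be produced with a short join. This is a sketch that reproduces the documented output format, not the library's code; the name `negate_phrases` is hypothetical.

```python
def negate_phrases(phrase_list):
    """Quote each phrase and prefix it with a minus sign (sketch)."""
    return " ".join(f'-"{phrase}"' for phrase in phrase_list)


negated = negate_phrases(["but", "because", "however"])
# negated == '-"but" -"because" -"however"'
```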
coast_search.query_generator.pos_query_segment(phrase_list)
Given a list of phrases, returns a string of all the phrases AND'd together, e.g. ("but" AND "because" AND "however")
- Args:
- phrase_list: a list of phrases (e.g. reasoning/experience indicators)
- Returns:
- result_string: a string of all phrases AND'd together, ready for a search engine.
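The positive counterpart can be sketched the same way: quote each phrase, join with AND, and wrap the result in parentheses. As before, this only reproduces the documented output format, and the name `and_phrases` is hypothetical.

```python
def and_phrases(phrase_list):
    """Quote each phrase and join them with AND inside parentheses (sketch)."""
    return "(" + " AND ".join(f'"{phrase}"' for phrase in phrase_list) + ")"


joined = and_phrases(["but", "because", "however"])
# joined == '("but" AND "because" AND "however")'
```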