Search
Introduction
The search module contains functions for conducting searches against the Google Custom Search API and for parsing the results.
Usage
To use the search module:
>>> import coast_search
>>> coast_search.search.function(to_use)
or:
>>> from coast_search import search
>>> search.function(to_use)
Functions
Title: search_command.py
Author: Ashley Williams
Description: A collection of functions that can be used for running searches. This module is imported by the package's __init__, so there is no need to import it specifically. Refer to the documentation for details of how to use this module (http://coast_search.readthedocs.io/).
coast_search.search.deduplicate_urls(json_data)
Creates and returns a list of deduplicated URLs.
Args:
    json_data: JSON data resulting from the queries.
Returns:
    A list of deduplicated URLs. If there is duplication across segments, a warning is also returned.
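The deduplication step can be illustrated with a minimal sketch. It assumes, purely for illustration, that json_data maps each segment ID to a list of result dicts holding the URL under a "link" key; the name deduplicate_urls_sketch and the warning format are hypothetical and are not taken from the coast_search source.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def deduplicate_urls_sketch(
    json_data: Dict[str, List[dict]],
) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch, not the coast_search implementation.

    Assumes json_data maps a segment id to a list of result dicts,
    each holding its URL under a "link" key.
    Returns (unique URLs in first-seen order, cross-segment warnings).
    """
    segments_by_url: Dict[str, Set[str]] = defaultdict(set)
    unique_urls: List[str] = []
    for segment_id, results in json_data.items():
        for result in results:
            url = result["link"]
            if url not in segments_by_url:  # membership test does not create a key
                unique_urls.append(url)
            segments_by_url[url].add(segment_id)
    # A URL seen in more than one segment produces a warning.
    cross_segment_warnings = [
        f"URL {url} appears in segments {sorted(segments)}"
        for url, segments in segments_by_url.items()
        if len(segments) > 1
    ]
    return unique_urls, cross_segment_warnings


data = {
    "segment_1": [{"link": "https://a.example"}, {"link": "https://b.example"}],
    "segment_2": [{"link": "https://b.example"}, {"link": "https://c.example"}],
}
urls, cross_segment_warnings = deduplicate_urls_sketch(data)
print(urls)  # ['https://a.example', 'https://b.example', 'https://c.example']
```

A URL repeated within a single segment is dropped silently here; only duplication across segments produces a warning.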
coast_search.search.extract_search_results_from_JSON(json_data)
Given the JSON output of the search queries, extracts the results (i.e. the URLs and titles from the search results).
Args:
    json_data: The JSON output resulting from the searches.
Returns:
    A JSON object containing the relevant extracted data.
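As a rough illustration of the extraction step, the sketch below reads the "items" array of Custom Search JSON API responses, where each result carries "title" and "link" fields. The helper name extract_results_sketch and the shape of the output records are assumptions; the real function may return a different structure.

```python
from typing import Any, Dict, List


def extract_results_sketch(responses: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Hypothetical sketch: pull the title and URL out of raw API responses.

    Each result is assumed to be an entry of the response's "items" array
    with "title" and "link" fields. Responses with no results may omit
    "items", so a missing key is treated as an empty list.
    """
    extracted: List[Dict[str, str]] = []
    for response in responses:
        for item in response.get("items", []):
            extracted.append({"title": item.get("title", ""), "link": item.get("link", "")})
    return extracted


responses = [
    {"items": [{"title": "Coastal erosion", "link": "https://example.org/erosion", "snippet": "..."}]},
    {"searchInformation": {"totalResults": "0"}},  # a response without "items"
]
print(extract_results_sketch(responses))
# [{'title': 'Coastal erosion', 'link': 'https://example.org/erosion'}]
```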
coast_search.search.get_object_to_write(result)
Returns the constructed object containing the search results data, for later analysis.
Args:
    result: The output from running the query.
Returns:
    object_to_write: The constructed object containing the desired search result data.
coast_search.search.queryAPI(query, number_of_results, api_key, search_engine_id, segment_id)
Queries the API and returns the results as a list of JSON objects. Refer to the documentation for usage guidelines and descriptions of what each parameter means (http://coast_search.readthedocs.io/).
Args:
    query: The query string to run.
    number_of_results: The number of results you wish to be returned. Note: the free version of the Custom Search API is limited to 100 searches per day, and each search returns up to 10 results.
    api_key: The API key of the search engine, provided by Google.
    search_engine_id: The ID of the Custom Search Engine, provided by Google.
    segment_id: The segment to which the results belong.
Returns:
    results_list: The results from Google as a list of JSON objects.
Errors:
    In the event of an error, the error is printed to stdout.
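Because each request returns at most 10 results, asking for more results means paging through several requests. The sketch below only builds the request URLs for the Custom Search JSON API endpoint, using its key, cx, q, start (1-based) and num parameters; the helper name build_request_urls is hypothetical, and it is not a claim about how queryAPI is implemented.

```python
import math
from typing import List
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"
RESULTS_PER_REQUEST = 10  # the API returns at most 10 results per request


def build_request_urls(query: str, number_of_results: int,
                       api_key: str, search_engine_id: str) -> List[str]:
    """Hypothetical helper: one request URL per page of up to 10 results."""
    pages = math.ceil(number_of_results / RESULTS_PER_REQUEST)
    urls = []
    for page in range(pages):
        start = page * RESULTS_PER_REQUEST + 1  # `start` is 1-based
        # The last page only asks for the remaining results.
        num = min(RESULTS_PER_REQUEST, number_of_results - page * RESULTS_PER_REQUEST)
        params = {"key": api_key, "cx": search_engine_id, "q": query,
                  "start": start, "num": num}
        urls.append(f"{ENDPOINT}?{urlencode(params)}")
    return urls


urls = build_request_urls("coastal erosion", 25, "YOUR_API_KEY", "YOUR_ENGINE_ID")
print(len(urls))  # 3 requests: 10 + 10 + 5 results
```

Each URL counts as one search against the daily quota. Fetching one with urllib.request.urlopen and decoding the body with json.load would yield one JSON object of the kind collected in results_list.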
coast_search.search.run_all_queries(query_dict_list, number_of_runs, number_of_results, day, search_backup_dir)
Given a list of queries and configuration parameters, calls run_query for each query object in the list.
Args:
    query_dict_list: List of query data for all of the queries to be searched.
    number_of_runs: Number of desired runs (from the config file).
    number_of_results: Number of desired results (from the config file).
    day: Day number in the search process (number of days since the start date).
    search_backup_dir: Location in which to store the file output of the searches.
Returns:
    An object containing the results of all of the queries.
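Since number_of_runs and number_of_results come from the config file and apply to every query, it can help to check a configuration against the free quota before scheduling it. This worked calculation assumes that every run of every query issues ceil(number_of_results / 10) searches; the helper names are hypothetical.

```python
import math

FREE_DAILY_SEARCHES = 100  # free-tier limit quoted in this documentation
RESULTS_PER_SEARCH = 10    # results returned by one search


def searches_needed(num_queries: int, number_of_runs: int, number_of_results: int) -> int:
    """Estimate one day's searches, assuming each run pages through
    ceil(number_of_results / 10) requests per query."""
    searches_per_run = math.ceil(number_of_results / RESULTS_PER_SEARCH)
    return num_queries * number_of_runs * searches_per_run


def fits_free_quota(num_queries: int, number_of_runs: int, number_of_results: int) -> bool:
    """True if the estimated daily searches stay within the free limit."""
    return searches_needed(num_queries, number_of_runs, number_of_results) <= FREE_DAILY_SEARCHES


print(searches_needed(4, 2, 30))   # 24: 4 queries x 2 runs x 3 searches
print(fits_free_quota(5, 3, 100))  # False: 5 x 3 x 10 = 150 searches
```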
coast_search.search.run_daily_search(config_file, write_to_file_flag)
Runs a full daily search. This function can be set up as a cron job (or a scheduled task on Windows) to search over consecutive days. Refer to the documentation for usage guidelines and a description of how the config file should be structured (http://coast_search.readthedocs.io/).
Args:
    config_file: Path to a JSON file containing all relevant information for conducting the searches.
    write_to_file_flag: Boolean flag; if true, the results are written to file.
Returns:
    The results from the search.
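When run_daily_search is scheduled once per day, the day value passed on to run_all_queries is described in that function's entry as the number of days since the start date. A minimal sketch of that calculation follows, assuming the start date is stored in the config file as an ISO-format string such as "2018-06-01"; the helper name day_number and the choice to count the start date itself as day 0 are assumptions.

```python
from datetime import date
from typing import Optional


def day_number(start_date: str, today: Optional[date] = None) -> int:
    """Hypothetical helper: days elapsed since an ISO-format start date.

    The start date itself counts as day 0 (an assumption; the library
    may count from 1).
    """
    start = date.fromisoformat(start_date)  # requires Python 3.7+
    today = today or date.today()
    return (today - start).days


print(day_number("2018-06-01", today=date(2018, 6, 4)))  # 3
```

Passing today explicitly keeps the function deterministic for testing; in a scheduled job it would normally be left as the default.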
coast_search.search.run_query(query_string, number_of_runs, number_of_results, api_key, search_engine_id, segment_id, day, backup_dir)
Runs the query against the Google Custom Search API, writes the results to file and appends them to the extracted results list. Refer to the documentation for usage guidelines and descriptions of what each parameter means (http://coast_search.readthedocs.io/).
Args:
    query_string: The query string to run.
    number_of_runs: The number of times you wish to repeat the query each day.
    number_of_results: The number of results you wish to be returned.
    Note: the free version of the Custom Search API is limited to 100 searches per day, and each search returns up to 10 results, so number_of_runs and number_of_results both count against the daily quota.
    api_key: The API key of the search engine, provided by Google.
    search_engine_id: The ID of the Custom Search Engine, provided by Google.
    segment_id: The segment to which the results belong.
    day: The day of the search period from which the result originated.
    backup_dir: A directory that can be used for storing results as files.
Returns:
    extracted_results: A list of results.
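The repeat-and-collect behaviour of run_query can be sketched as a loop over runs. To keep the sketch offline and self-contained, the API call is replaced by a hypothetical fetch callable (which also stands in for api_key and search_engine_id), and the file backup step is omitted; the record fields are assumptions, not the library's format.

```python
from typing import Callable, Dict, List


def run_query_sketch(
    query_string: str,
    number_of_runs: int,
    number_of_results: int,
    segment_id: str,
    day: int,
    fetch: Callable[[str, int], List[dict]],
) -> List[Dict[str, object]]:
    """Hypothetical sketch of the run loop; file backups are omitted.

    `fetch` stands in for the real API call and must return a list of
    result dicts with "title" and "link" keys.
    """
    extracted_results: List[Dict[str, object]] = []
    for run in range(1, number_of_runs + 1):
        items = fetch(query_string, number_of_results)
        for rank, item in enumerate(items, start=1):
            # Tag each result with where and when it came from.
            extracted_results.append({
                "segment_id": segment_id,
                "day": day,
                "run": run,
                "rank": rank,
                "title": item.get("title", ""),
                "link": item.get("link", ""),
            })
    return extracted_results


def fake_fetch(query: str, n: int) -> List[dict]:
    """Offline stand-in for the API: returns up to n canned results."""
    canned = [{"title": "A", "link": "https://a.example"},
              {"title": "B", "link": "https://b.example"}]
    return canned[:n]


results = run_query_sketch("coastal erosion", 2, 2, "segment_1", 3, fake_fetch)
print(len(results))  # 4: two runs x two results
```

Tagging each record with its run and rank keeps results from repeated runs distinguishable when they are compared later.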
coast_search.search.write_to_file(name, result, directory, extension)
Writes the results to a file.
Args:
    name: The desired filename.
    result: The output from running the query.
    directory: A directory that can be used for storing results as files.
    extension: The desired file extension, e.g. json or txt.
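A minimal sketch of writing one result to disk, assuming the file is named <name>.<extension> inside directory and that JSON results are serialised with the standard json module; write_to_file_sketch is a hypothetical name, and the real function's naming and serialisation may differ.

```python
import json
import os
import tempfile
from typing import Any


def write_to_file_sketch(name: str, result: Any, directory: str, extension: str) -> str:
    """Hypothetical sketch: write `result` to <directory>/<name>.<extension>.

    JSON-serialises the result for the "json" extension and falls back to
    str() for anything else (e.g. "txt"). Returns the path written.
    """
    os.makedirs(directory, exist_ok=True)  # create the backup directory if needed
    path = os.path.join(directory, f"{name}.{extension}")
    with open(path, "w", encoding="utf-8") as handle:
        if extension == "json":
            json.dump(result, handle, indent=2)
        else:
            handle.write(str(result))
    return path


# Usage: write a result into a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    written = write_to_file_sketch("day3_segment1", {"items": []}, tmp_dir, "json")
    with open(written, encoding="utf-8") as handle:
        print(json.load(handle))  # {'items': []}
```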