OpenTREP 0.6.0
C++ Open Travel Request Parsing Library
OPENTREP::Filter Struct Reference

Class filtering out the words not suitable for indexing and/or searching, when part of greater strings. Hence, most of the methods take as parameter the "initial"/greater string.

#include <opentrep/bom/Filter.hpp>

Static Public Member Functions

static void trim (std::string &ioPhrase, const NbOfLetters_T &iMinWordLength=4)
 
static bool shouldKeep (const std::string &iPhrase, const std::string &iWord)
 

Detailed Description

Class filtering out the words not suitable for indexing and/or searching, when part of greater strings. Hence, most of the methods take as parameter the "initial"/greater string.

For instance, words of no more than 3 letters (e.g., "de", "a", "san"), when part of greater strings (e.g., "rio de janeiro", "san francisco"), should be neither indexed nor searched for.

Definition at line 21 of file Filter.hpp.

Member Function Documentation

static void OPENTREP::Filter::trim (std::string & ioPhrase, const NbOfLetters_T iMinWordLength = 4)

Trim all the non-relevant words from the given phrase.

The following rules are applied to the right and left outer words, iteratively until no more outer word can be stripped out:

  • If the left or right outer word has fewer than <iMinWordLength> letters (e.g., 'de', 'san' with the default of 4), it is stripped out
  • If the left or right outer word is part of the "black-list" (e.g., 'airport', 'intl', 'international'), it is stripped out
Parameters
std::string& The phrase to be amended (e.g., 'de san francisco', part of the 'aeroport de san francisco' global phrase).
const NbOfLetters_T& The minimum length of the words (default is 4 letters).

Definition at line 131 of file Filter.cpp.

References OPENTREP::createStringFromWordList(), OPENTREP::tokeniseStringIntoWordList(), and OPENTREP::trim().

Referenced by OPENTREP::Result::calculateCodeMatches().
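The outer-word stripping rules of trim() can be illustrated with a minimal, stand-alone sketch. It is not the OpenTREP implementation: the sketch namespace, the isStrippable() helper, the whitespace tokenisation and the three-entry black-list (taken from the examples above) are all assumptions made for illustration. It also assumes that a word is too short when it has fewer letters than iMinWordLength.

```cpp
#include <cstddef>
#include <deque>
#include <set>
#include <sstream>
#include <string>

namespace sketch {

// Hypothetical helper: an outer word can be stripped when it is too short
// or when it belongs to the (illustrative) black-list.
inline bool isStrippable(const std::string& iWord, std::size_t iMinWordLength) {
  static const std::set<std::string> kBlackList = {
    "airport", "intl", "international"};
  return iWord.size() < iMinWordLength || kBlackList.count(iWord) > 0;
}

// Strip the left and right outer words until neither can be removed.
// Inner words are never touched, even when they are short.
inline void trim(std::string& ioPhrase, std::size_t iMinWordLength = 4) {
  // Split the phrase on whitespace.
  std::istringstream lStream(ioPhrase);
  std::deque<std::string> lWords;
  for (std::string lWord; lStream >> lWord;) {
    lWords.push_back(lWord);
  }

  // Strip from the left, then from the right.
  while (!lWords.empty() && isStrippable(lWords.front(), iMinWordLength)) {
    lWords.pop_front();
  }
  while (!lWords.empty() && isStrippable(lWords.back(), iMinWordLength)) {
    lWords.pop_back();
  }

  // Re-assemble the remaining words, separated by single spaces.
  std::string lResult;
  for (const std::string& lWord : lWords) {
    if (!lResult.empty()) {
      lResult += ' ';
    }
    lResult += lWord;
  }
  ioPhrase = lResult;
}

}  // namespace sketch
```

Under these assumptions, 'de san francisco' is reduced to 'francisco', whereas 'aeroport de san francisco' is left unchanged, since both of its outer words are long enough and not black-listed.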

static bool OPENTREP::Filter::shouldKeep (const std::string & iPhrase, const std::string & iWord)

State whether the given word should be kept, as opposed to being filtered out as a non-indexable/non-searchable word.

The following rules are applied in sequence (if a rule applies, then the method returns, and the other rules are not processed/checked):

  • If the word is equal to the phrase (e.g., 'san'), it should be kept (not filtered out), as it was evidently entered intentionally
  • If the word has no more than 3 letters (e.g., 'de', 'san'), it should be filtered out
  • If the word is part of the "black-list" (e.g., 'airport', 'intl', 'international'), it should be filtered out
Parameters
const std::string& The initial phrase (e.g., 'san francisco airport').
const std::string& The word on which a decision has to be made.
Returns
bool true if the word should be kept, false if it should be filtered out.

Definition at line 144 of file Filter.cpp.

References OPENTREP::hasGoodSize(), and OPENTREP::isBlackListed().

Referenced by OPENTREP::addUnmatchedWord(), OPENTREP::Result::calculateCombinedWeights(), and OPENTREP::Result::fullTextMatch().
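The rule sequence of shouldKeep() can be sketched as follows. This is a stand-alone illustration, not the OpenTREP code: the sketch namespace and the blackList() helper are hypothetical, and the black-list holds only the three examples quoted above. The fixed 3-letter threshold is taken from the second rule.

```cpp
#include <set>
#include <string>

namespace sketch {

// Hypothetical black-list; the real list is not reproduced here.
inline const std::set<std::string>& blackList() {
  static const std::set<std::string> kList = {
    "airport", "intl", "international"};
  return kList;
}

// Apply the rules in sequence; the first rule that matches decides.
inline bool shouldKeep(const std::string& iPhrase, const std::string& iWord) {
  // Rule 1: the word is the whole phrase, so it is kept.
  if (iWord == iPhrase) {
    return true;
  }
  // Rule 2: the word has no more than 3 letters, so it is filtered out.
  if (iWord.size() <= 3) {
    return false;
  }
  // Rule 3: the word is black-listed, so it is filtered out.
  if (blackList().count(iWord) > 0) {
    return false;
  }
  // No rule filtered the word out, so it is kept.
  return true;
}

}  // namespace sketch
```

With this sketch, 'san' is kept when it is the whole phrase, but filtered out when it is part of 'san francisco'. The order of the rules matters: the equality check has to come first, otherwise a short stand-alone query would be rejected by the length rule.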


The documentation for this struct was generated from the following files: