Parses HTML documents for words and links.
Keyoti2.SearchEngine.Core (Module: Keyoti2.SearchEngine.Core) Version: 2010.4.1.609
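The following is a minimal, independent sketch of what "parsing an HTML document for words and links" can look like. It is written in Python with the standard library only and is not the Keyoti implementation; the names `WordAndLinkCollector` and `parse_html` are hypothetical and exist only for this illustration. It assumes that script and style content should not be treated as words, and that title words also count as document words.

```python
import re
from html.parser import HTMLParser


class WordAndLinkCollector(HTMLParser):
    """Hypothetical stand-in for an HTML word/link parser (not the Keyoti class)."""

    def __init__(self):
        super().__init__()
        self.words = []        # words in document order
        self.links = []        # href values taken from <a> tags
        self.title = ""        # text found inside <title>
        self._in_title = False
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # assumption: script/style text is never indexed
        if self._in_title:
            self.title += data
        # Assumption: a "word" is any run of letters, digits or underscores.
        self.words.extend(re.findall(r"\w+", data))


def parse_html(html):
    """Feed a complete HTML string through the collector and return it."""
    collector = WordAndLinkCollector()
    collector.feed(html)
    collector.close()
    return collector
```

Calling `parse_html(page_text)` returns an object whose `words`, `links` and `title` attributes loosely mirror the kind of result the members listed below describe (text, links and title), under the assumptions stated above.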
Creates a new instance of HtmlDocumentParser.
Gets the instance of the Configuration class that holds the settings to be used.(Inherited from Parser.)
Tries to find the encoding of an HTML file from the Content-type meta tag.
Attempts to return the title of the document, based on the documentBody.
The character encoding used in the document Stream, if applicable.(Inherited from Parser.)
Determines whether the specified Object is equal to the current Object.(Inherited from Object.)
Allows an Object to attempt to free resources and perform other cleanup operations before the Object is reclaimed by garbage collection.(Inherited from Object.)
Finds all ignore regions in documentBody.
Creates a footer with filename info from the Uri(Inherited from Parser.)
Serves as a hash function for a particular type.(Inherited from Object.)
Returns the next 'word' in rawBody. The call is iterative, so subsequent calls move to consecutive words.(Overrides Parser.GetNextWord(String).)
Gets the Type of the current instance.(Inherited from Object.)
Returns the words found in the document at the Uri, as strings in an ArrayList.(Inherited from Parser.)
Whether the word last returned by GetNextWord is in the title.(Overrides Parser.IsCurrentWordInTitle().)
Determines whether current word (at wordStart) is in an ignored region.(Inherited from Parser.)
Whether the parser would need a stream to be passed to it in order to perform a ReadText or ReadLinks operation.(Inherited from Parser.)
Creates a shallow copy of the current Object.(Inherited from Object.)
|ParseWords(String, ArrayList, WordCollection, StringBuilder, ArrayList)|
Parses rawBody into discrete Word objects and places them in readDocumentWords.(Inherited from Parser.)
Applies any required processing to a chunk of text that typically forms either a word or whitespace block.(Inherited from Parser.)
Processes the list of all words found in the document and returns a list that should be indexed.(Inherited from Parser.)
|Read(Stream, Uri, Encoding)|
Reads a document and returns an object holding its text and any links.(Overrides Parser.Read(Stream, Uri, Encoding).)
Returns the string read from 'stream'.
|ReadLinks(Stream, Encoding)|| Obsolete.|
Reads links to other pages.(Inherited from Parser.)
|ReadText(Stream, Uri, Encoding)|| Obsolete.|
Reads text and returns a list of words and the title.(Inherited from Parser.)
Resets the current word being processed.(Inherited from Parser.)
Returns a String that represents the current Object.(Inherited from Object.)
Removes repeated non-letters from word.(Inherited from Parser.)
The current word's end.(Inherited from Parser.)
The current word's start.(Inherited from Parser.)
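Two of the members described above are small text-processing steps that are easy to picture in isolation: finding the encoding from the Content-type meta tag, and removing repeated non-letters from a word. The sketch below shows one possible reading of each in Python, using only the standard library. It is not the Keyoti code. The function names `find_meta_charset` and `collapse_repeated_non_letters` are hypothetical, the 4096-byte scan window is an arbitrary assumption, and "non-letters" is interpreted here as punctuation only (digits and underscores are left alone).

```python
import re

# Matches any <meta ...> tag in raw, undecoded bytes.
_META_TAG = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
# Matches charset=NAME, with or without quotes around the value.
_CHARSET = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:\-]+)", re.IGNORECASE)


def find_meta_charset(raw_bytes, default=None):
    """Return the charset named in the first <meta> tag that declares one.

    Assumption: only the first 4096 bytes are inspected, since the meta
    tag is expected near the top of the document.
    """
    head = raw_bytes[:4096]
    for tag in _META_TAG.findall(head):
        match = _CHARSET.search(tag)
        if match:
            return match.group(1).decode("ascii").lower()
    return default


def collapse_repeated_non_letters(word):
    """Collapse a run of the same punctuation character to one occurrence.

    Assumption: "non-letter" means a character that is neither a word
    character (letter, digit, underscore) nor whitespace.
    """
    return re.sub(r"([^\w\s])\1+", r"\1", word)
```

For example, `find_meta_charset(b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">')` returns `"iso-8859-1"`, and `collapse_repeated_non_letters("re--enter")` returns `"re-enter"`. A caller would typically fall back to a default encoding when no meta tag declares one, which is why `default` is a parameter in this sketch.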