Apache Lucene scoring

12/16/2023

Each time you perform a search using Lucene, a score is applied to the results returned by your query.

| # | score | tags |
|---|---|---|
| 127 | 2.4824 | Movies, Kids, Animation, Movies |

In our index, # is the unique document number, score is the closeness of each hit to our query, and tags is a text field belonging to a document.

There are many methods Lucene can use to calculate scoring. By default, we use the DefaultSimilarity implementation of the Similarity abstract class. This class implements the commonly referenced TF-IDF scoring formula. If you’re new to Lucene (or even if you’re not!) this formula can be a bit to get your head around.

To get inside the formula for a given search result, Lucene provides an explanation feature, which we can call from code (C# example in Lucene.Net):

public List GetExplainerByRawQuery(string rawQuery, int doc = 0)

Calling searcher.Explain(query, match.doc) gives us a text output explanation of how the matched document scores against the query:

query: tags:movies|kids

Ok! But still, there’s a lot going on in there. 2.4824 is the total score for this single search result. As our query contained two terms, ‘movies’ and ‘kids’, Lucene breaks the overall query down into two subqueries. The scores of the two subqueries (1.4570 for ‘movies’ and 1.0255 for ‘kids’) are added to arrive at our total score.

For our first subquery, the ‘movies’ part, we arrive at the score of 1.4570 by multiplying queryWeight (0.7079) by fieldWeight (2.0581). Let’s go line by line:

1.4570 weight(tags:movies in 127), result of:

The total score for the ‘movies’ subquery is 1.4570. ‘tags:movies’ is the raw query, 127 is the individual document number we’re examining, and DefaultSimilarity is the scoring mechanism we’re using.

queryWeight (0.7079) is how rare the search term is within the whole index. In our case, ‘movies’ appears in 147 out of the 1000 documents in our index. This rarity is called inverse document frequency (idf), and is itself multiplied by a normalization factor (0.2432) called queryNorm. This normalization factor is the same for all results returned by our query and just stops the queryWeight scores from becoming too exaggerated for any single result.

fieldWeight (2.0581) is how often the search term (‘movies’) appears in the field we searched on, ‘tags’. The term (‘movies’) appears twice in the ‘tags’ field for document 127, so we get a term frequency of 2.0. We take the square root of the termFreq (2.0), which is 1.4142. This is multiplied by the idf which we calculated above (2.9105), and finally by a field normalization factor (0.5000), which tells us how many overall terms were in the field.
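The arithmetic for the ‘movies’ subquery can be checked with a few lines of Python. This is only a minimal sketch, not Lucene or Lucene.Net code: the function names are made up for illustration, and the idf formula 1 + ln(numDocs / (docFreq + 1)) is an assumption based on the classic TF-IDF similarity, chosen because it reproduces the 2.9105 quoted above. The ‘kids’ score of 1.0255 is taken as given.

```python
import math


def idf(doc_freq, num_docs):
    # Assumed classic TF-IDF idf: rarer terms get a larger value.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def tf(term_freq):
    # Term frequency is dampened by taking its square root.
    return math.sqrt(term_freq)


def term_score(term_freq, doc_freq, num_docs, field_norm, query_norm):
    # Hypothetical helper mirroring the two halves of the explanation.
    term_idf = idf(doc_freq, num_docs)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = tf(term_freq) * term_idf * field_norm    # tf * idf * fieldNorm
    return query_weight, field_weight, query_weight * field_weight


# Values quoted in the walkthrough for 'movies' in document 127.
query_weight, field_weight, movies_score = term_score(
    term_freq=2.0, doc_freq=147, num_docs=1000,
    field_norm=0.5, query_norm=0.2432,
)

kids_score = 1.0255  # taken directly from the article, not recomputed
total_score = movies_score + kids_score

print(f"idf          = {idf(147, 1000):.4f}")
print(f"queryWeight  = {query_weight:.4f}")
print(f"fieldWeight  = {field_weight:.4f}")
print(f"movies score = {movies_score:.4f}")
print(f"total score  = {total_score:.4f}")
```

The printed values should agree with the explanation to about three decimal places. Small differences in the last digit are expected, because the queryNorm of 0.2432 and the other inputs are themselves rounded for display.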
