
Tuesday, 19 February 2013

Monte Carlo resampling and a semantic similarity map for an abstract recommendation system


In this post I present the basic idea behind the article recommendation system of researchdiary.net.

The recommendation system is based on the Monte Carlo resampling method combined with a semantic similarity map. Monte Carlo resampling is a technique for building a subsample from a larger, principal sample. In our system, the principal sample is the full set of articles in a user's favorite list. From this sample we randomly choose a group of $n$ abstracts, with $n$ smaller than the total number of articles. This random selection is what the system uses to estimate the user's preferences.
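A minimal sketch of the subsampling step, assuming the favorite list is simply a Python list of abstract texts. The function name `monte_carlo_subsample` and the `seed` parameter are hypothetical and only illustrate drawing $n$ distinct abstracts at random:

```python
import random


def monte_carlo_subsample(favorites, n, seed=None):
    """Draw n distinct abstracts at random from the user's favorite list.

    Hypothetical helper: n must be positive and smaller than the number
    of favorites, as described in the post.
    """
    if not 0 < n < len(favorites):
        raise ValueError("n must be positive and smaller than the number of favorites")
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    return rng.sample(favorites, n)  # sampling without replacement


favorites = ["abstract A", "abstract B", "abstract C", "abstract D", "abstract E"]
subsample = monte_carlo_subsample(favorites, 3, seed=42)
print(subsample)
```

Sampling without replacement keeps each abstract at most once in the subsample, so the similarity map built from it never compares an abstract with a copy of itself.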
From the subsample generated by the Monte Carlo method, the system builds a semantic similarity map. To better understand this map, consider the following figure:
Semantic similarity map. The yellow circles represent the abstracts,
and $w_{ij}$ are the similarity weights.

The yellow circles represent the abstracts in the subsample, and the arrows are the connections between them. Each arrow carries a weight $w_{ij}$, where $i$ is the abstract index and $j$ is the connection index, that measures how semantically similar one article is to another. To compute the similarity between two articles, the system uses the API (application programming interface) of Vitalie Scurtu's project (http://www.scurtu.it/). The final degree of similarity of each abstract is the average of its connection weights:

$P_{i} = \frac{1}{M}\sum\limits_{j=1}^{M}{w_{ij}},$

where $M$ is the number of connections between abstract $i$ and the other abstracts in the similarity map.
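The formula for $P_{i}$ can be sketched as follows. This assumes, as a simplification, that the map is fully connected and that the weights are stored in a square matrix `weights[i][j]` (so each abstract has $M = n - 1$ connections); the weight values below are made up for illustration, not results from the Scurtu API:

```python
def similarity_scores(weights):
    """Return P_i = (1/M) * sum_j w_ij for every abstract i.

    Assumes a fully connected map stored as a square matrix; the diagonal
    (an abstract compared with itself) is skipped, so M = n - 1.
    """
    n = len(weights)
    scores = []
    for i in range(n):
        connections = [weights[i][j] for j in range(n) if j != i]
        scores.append(sum(connections) / len(connections))
    return scores


# Hypothetical symmetric similarity weights for three abstracts.
weights = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]
P = similarity_scores(weights)
print(P)
```

Here the second abstract has the largest average weight, so it is the one most similar, on average, to the rest of the subsample.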

The abstract with the greatest value of $P_{i}$ is chosen as the comparator used to select the recommended article from the new arXiv submissions.
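A minimal sketch of the selection step, under loud assumptions: the real system obtains similarities from the Scurtu API, which is not reproduced here, so `jaccard_similarity` (word-set overlap) is a hypothetical stand-in, and `recommend` is an illustrative helper that returns the single new article closest to the comparator:

```python
def jaccard_similarity(a, b):
    """Stand-in for the external similarity API: Jaccard index of word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def recommend(subsample, scores, new_articles):
    """Use the abstract with the largest P_i as comparator and return the
    new article most similar to it (hypothetical helper)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    comparator = subsample[best]
    return max(new_articles, key=lambda text: jaccard_similarity(comparator, text))


subsample = ["dark matter halo simulations", "galaxy rotation curves", "neural network training"]
scores = [0.6, 0.7, 0.5]  # illustrative P_i values
new_articles = ["rotation curves of spiral galaxy disks", "quantum error correction codes"]
print(recommend(subsample, scores, new_articles))
```

Swapping `jaccard_similarity` for a call to a real semantic similarity service would leave the selection logic unchanged.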

Cheers.

