Add BERTopic.
This commit is contained in:
@@ -0,0 +1,40 @@
|
||||
After having created a BERTopic model, you might end up with over a hundred topics. Searching through those
|
||||
can be quite cumbersome especially if you are searching for a specific topic. Fortunately, BERTopic allows you
|
||||
to search for topics using search terms. First, let's create and train a BERTopic model:
|
||||
|
||||
|
||||
```python
|
||||
from bertopic import BERTopic
|
||||
from sklearn.datasets import fetch_20newsgroups
|
||||
|
||||
# Create topics
|
||||
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
|
||||
topic_model = BERTopic()
|
||||
topics, probs = topic_model.fit_transform(docs)
|
||||
```
|
||||
|
||||
After having trained our model, we can use `find_topics` to search for topics that are similar
|
||||
to an input search_term. Here, we are going to be searching for topics that closely relate the
|
||||
search term "motor". Then, we extract the most similar topic and check the results:
|
||||
|
||||
```python
|
||||
>>> similar_topics, similarity = topic_model.find_topics("motor", top_n=5)
|
||||
>>> topic_model.get_topic(similar_topics[0])
|
||||
[('bike', 0.02275997701645559),
|
||||
('motorcycle', 0.011391202866080292),
|
||||
('bikes', 0.00981187573649205),
|
||||
('dod', 0.009614623748226669),
|
||||
('honda', 0.008247663662558535),
|
||||
('ride', 0.0064683227888861945),
|
||||
('harley', 0.006355502638631013),
|
||||
('riding', 0.005766601561614182),
|
||||
('motorcycles', 0.005596372493714447),
|
||||
('advice', 0.005534544418830091)]
|
||||
```
|
||||
|
||||
It definitely seems that a topic was found that closely matches "motor". The topic seems to be motorcycle
|
||||
related and therefore matches our "motor" input. You can use the `similarity` variable to see how similar
|
||||
the extracted topics are to the search term.
|
||||
|
||||
!!! note
|
||||
You can only use this method if an embedding model was supplied to BERTopic using `embedding_model`.
|
||||
Reference in New Issue
Block a user