Publications

You can also access our individual websites (via the Members page) for further information about our research and lists of our publications.

Citation

BibTeX format

@incollection{Altuncu:2021:10.1007/978-3-030-65351-4,
author = {Altuncu, T and Yaliraki, S and Barahona, M},
booktitle = {Complex Networks \& Their Applications IX},
doi = {10.1007/978-3-030-65351-4},
editor = {Benito and Cherifi and Cherifi and Moro and Rocha and Sales-Pardo},
pages = {154--166},
publisher = {Springer International Publishing},
title = {Graph-based topic extraction from vector embeddings of text documents: application to a corpus of news articles},
url = {http://dx.doi.org/10.1007/978-3-030-65351-4},
year = {2021}
}

RIS format (EndNote, RefMan)

TY  - CHAP
AB  - Production of news content is growing at an astonishing rate. To help manage and monitor the sheer amount of text, there is an increasing need to develop efficient methods that can provide insights into emerging content areas, and stratify unstructured corpora of text into ‘topics’ that stem intrinsically from content similarity. Here we present an unsupervised framework that brings together powerful vector embeddings from natural language processing with tools from multiscale graph partitioning that can reveal natural partitions at different resolutions without making a priori assumptions about the number of clusters in the corpus. We show the advantages of graph-based clustering through end-to-end comparisons with other popular clustering and topic modelling methods, and also evaluate different text vector embeddings, from classic Bag-of-Words to Doc2Vec to the recent transformer-based model BERT. This comparative work is showcased through an analysis of a corpus of US news coverage during the presidential election year of 2016.
AU  - Altuncu,T
AU  - Yaliraki,S
AU  - Barahona,M
DO  - 10.1007/978-3-030-65351-4
EP  - 166
PB  - Springer International Publishing
PY  - 2021///
SN  - 978-3-030-65351-4
SP  - 154
TI  - Graph-based topic extraction from vector embeddings of text documents: application to a corpus of news articles
T2  - Complex Networks & Their Applications IX
UR  - http://dx.doi.org/10.1007/978-3-030-65351-4
UR  - http://arxiv.org/abs/2010.15067v1
UR  - https://link.springer.com/chapter/10.1007/978-3-030-65351-4_13
UR  - http://hdl.handle.net/10044/1/97252
ER  - 