Conference Paper

Evaluation of an Event Detection Algorithm for Russian and Kazakh Languages

Year 2021, Volume: 16, 82-86, 31.12.2021
https://doi.org/10.55549/epstem.1068556

Abstract

Event detection is attracting increasing interest among researchers. The growth of social media data is driving both the emergence of new algorithms and the improvement of existing ones. In this paper, we propose to improve an existing event detection algorithm, SEDTWik (Segmentation-based Event Detection from Tweets using Wikipedia). The authors define an event as a set of similar segments within a given time window, where a segment is a word or phrase taken from the analyzed text data. SEDTWik uses Wikipedia as a “supervisor” to identify segments and to calculate each segment's bursty value and newsworthiness. We examined SEDTWik on our own data from the Telegram online social network. Because Twitter structures its messages differently from Telegram, we transformed the Telegram metadata to fit SEDTWik's requirements. Another highly relevant difference in our experiment is that our corpora contain messages in Russian and Kazakh. Our results show that SEDTWik depends strongly on the broad, unfocused Wikipedia data, and that this dependency reduces event detection accuracy. This result motivates our proposal to improve SEDTWik using segment probabilities calculated dynamically from the data streams being analyzed.
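The bursty value mentioned above, and the idea of replacing Wikipedia-derived segment probabilities with ones calculated from the stream, can be illustrated with a minimal sketch. It assumes a Twevent-style bursty probability (a sigmoid over how far a segment's frequency in window t exceeds its expected frequency N_t·p_s plus one standard deviation), which may differ in detail from SEDTWik's exact scoring. It also assumes messages are already segmented, and it estimates p_s as the fraction of messages in earlier windows that contain the segment; this is a hypothetical stand-in for the dynamic probabilities the authors propose, not their method. The function names, the 0.5 threshold, and the example segments are all illustrative.

```python
from __future__ import annotations

import math
from collections import Counter


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_probabilities(history: list[list[set[str]]]) -> dict[str, float]:
    """Hypothetical dynamic estimate of p_s from earlier windows of the stream.

    Each window is a list of messages; each message is the set of its segments.
    p_s is the fraction of all past messages that contain segment s.
    """
    counts = Counter()
    total = 0
    for window in history:
        for segments in window:
            counts.update(segments)
            total += 1
    if total == 0:
        return {}
    return {s: c / total for s, c in counts.items()}


def bursty_probability(f_st: int, n_t: int, p_s: float) -> float:
    """Twevent-style bursty probability of segment s in window t (assumed form).

    f_st: number of messages in window t that contain s
    n_t:  number of messages in window t
    p_s:  expected probability that a message contains s
    """
    expected = n_t * p_s
    sigma = math.sqrt(n_t * p_s * (1.0 - p_s))
    if sigma == 0.0:
        # Assumed guard for p_s of 0 or 1: bursty only if s exceeds its expectation.
        return 1.0 if f_st > expected else 0.0
    return sigmoid(10.0 * (f_st - (expected + sigma)) / sigma)


def bursty_segments(window: list[set[str]], p: dict[str, float],
                    threshold: float = 0.5) -> dict[str, float]:
    """Score every segment in the current window and keep those above threshold."""
    freq = Counter()
    for segments in window:
        freq.update(segments)
    n_t = len(window)
    scores = {s: bursty_probability(f, n_t, p.get(s, 0.0)) for s, f in freq.items()}
    return {s: score for s, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    # Two earlier windows of 10 messages each, then a current window of 10.
    history = [
        [{"алматы"}, {"погода"}] + [{"новости"}] * 8,
        [{"алматы"}] + [{"новости"}] * 9,
    ]
    current = [{"алматы"}] * 6 + [{"новости"}] * 4
    p = segment_probabilities(history)
    print(bursty_segments(current, p))  # prints {'алматы': 1.0}
```

In the example, "алматы" appears in 10% of past messages but in 6 of 10 current ones, so it is flagged as bursty, while "новости" appears less often than its stream-derived expectation and is dropped. Treating a segment never seen before (p_s = 0) as bursty is one possible design choice; additive smoothing of p_s would be another.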

References

  • Du, X., & Cardie, C. (2020). Event extraction by answering (almost) natural questions. arXiv preprint arXiv:2004.13625.
  • Hamborg, F., Breitinger, C., & Gipp, B. (2019). Giveme5W1H: A universal system for extracting main events from news articles. arXiv preprint arXiv:1909.02766.
  • Liu, X., Luo, Z., & Huang, H. (2018). Jointly multiple events extraction via attention-based graph information aggregation. arXiv preprint arXiv:1809.09078.
  • McMinn, A. J., Moshfeghi, Y., & Jose, J. M. (2013, October). Building a large-scale corpus for evaluating event detection on Twitter. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management (pp. 409-418).
  • Morabia, K., Murthy, N. L. B., Malapati, A., & Samant, S. (2019, June). SEDTWik: segmentation-based event detection from tweets using Wikipedia. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop (pp. 77-85).
  • Mussina, A. B., Aubakirov, S. S., & Trigo, P. (2020, November). An Architecture for Real-Time Massive Data Extraction from Social Media. In International Conference on Mathematical Modeling and Supercomputer Technologies (pp. 138-145). Springer. https://doi.org/10.1007/978-3-030-78759-2_11
  • Mussina, A., & Aubakirov, S. (2018). Dictionary extraction based on statistical data. Вестник КазНУ. Серия математика, механика, информатика [KazNU Bulletin. Mathematics, Mechanics, Computer Science Series], 94(2), 72-82.
  • Papers with Code. (n.d.). Event detection. Papers with Code. https://paperswithcode.com/task/event-detection
  • Sun, A. (n.d.). Wikipedia Keyphraseness. https://personal.ntu.edu.sg/axsun/datasets.html
  • Wikimedia Downloads. (n.d.). Data downloads. https://dumps.wikimedia.org/
There are 10 references in total.

Details

Primary Language: English
Subjects: Engineering
Section: Articles
Authors

Aigerim Mussina

Sanzhar Aubakirov

Paulo Trigo

Publication Date: 31 December 2021
Issue: Year 2021, Volume: 16

Cite

APA: Mussina, A., Aubakirov, S., & Trigo, P. (2021). Evaluation of an Event Detection Algorithm for Russian and Kazakh Languages. The Eurasia Proceedings of Science Technology Engineering and Mathematics, 16, 82-86. https://doi.org/10.55549/epstem.1068556