Web data analysis and classification: Automated text classification by linguistic norms, content and genre

Author(s)
Paschalidis, Christos
Date Issued
2020-11
Faculty
Faculty of Pure and Applied Sciences
Abstract
This dissertation extends efforts to make use of the largest collection of texts available: the Internet. Its purpose was to critically review methods for analyzing and classifying web data, and to build a deliverable system (Katigoriopoiitis) that uses the linguistic norms, content and genre of websites to improve how this data is presented. A literature review of Web Data Mining describes the techniques for collecting information from the web. Four machine learning algorithms (Naïve Bayes, Decision Trees, k-Nearest Neighbors and Support Vector Machines) are presented and compared in order to find the best fit for general-purpose content classification in the implementation of the classifier. The accuracy of the different classification models was tested on the same dataset. The dissertation concludes that efficient techniques exist for making adequate use of Internet information. Internet technologies and standards are becoming richer, which widens the options for data mining. Content classification can be achieved easily with simple model implementations, whereas more sophisticated models are needed to achieve high accuracy in sentiment analysis or in classification based on linguistic norms.
Publisher
Open University of Cyprus
Format
vi, 39 p. ; 30 cm.
Subjects

Web data -- Analysis ...

File(s)
Name
ΠΕΣ-2020-00318.pdf
Size
1.04 MB
Format
Adobe PDF
Checksum
MD5: 454f5c0bae1b1f919fc9c5173a9a9573