
    Web data analysis and classification: Automated text classification by linguistic norms, content and genre

    View/Open
    ΠΕΣ-2020-00318.pdf (1.036Mb)
    Date
    2020-11
    Author
    Paschalidis, Christos
    Abstract
    This dissertation aimed to extend efforts to make use of the largest collection of texts on the Internet. Its purpose was to critically examine methods for analysing and classifying web data, and to build a deliverable system (Katigoriopoiitis) that uses the linguistic norms, content and genre of websites to improve how this data is presented. A literature review of Web Data Mining describes the techniques for collecting information from the web. Four machine learning algorithms (Naïve Bayes, decision trees, k-nearest neighbors and support vector machines) are presented and compared in order to find the best fit for general-purpose content classification in the implementation of the classifier. The accuracy of the different classification models was tested on the same dataset. The dissertation concludes that efficient techniques exist for making adequate use of Internet information. Internet technologies and standards are becoming richer, which widens the options for data mining. Content classification can be achieved easily with simple model implementations. More sophisticated models are needed to achieve high accuracy in sentiment analysis or in classification based on linguistic norms.
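    The idea of testing several classification models on the same dataset can be illustrated with a minimal sketch. It is not the dissertation's Katigoriopoiitis system: the toy pages, the class labels ("sports", "tech") and the names NaiveBayes, KNearest and accuracy are all hypothetical, and only two of the four algorithms named in the abstract are shown, in simplified pure-Python form.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; not taken from the dissertation.
TRAIN = [
    ("football match goal team league", "sports"),
    ("team wins the cup final match", "sports"),
    ("player scores goal in league game", "sports"),
    ("new phone processor and camera review", "tech"),
    ("software update fixes processor bug", "tech"),
    ("laptop review battery and software", "tech"),
]
TEST = [
    ("league team scores late goal", "sports"),
    ("camera software update review", "tech"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()
        for text, label in docs:
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(text):
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class KNearest:
    """k-nearest neighbors by cosine similarity, with majority voting."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, docs):
        self.docs = [(Counter(tokenize(text)), label) for text, label in docs]
        return self

    def predict(self, text):
        query = Counter(tokenize(text))
        ranked = sorted(self.docs, key=lambda d: cosine(query, d[0]), reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


def accuracy(model, test):
    """Fraction of test documents whose predicted label matches the true one."""
    hits = sum(model.predict(text) == label for text, label in test)
    return hits / len(test)


# Train both models on the same data and score them on the same test set.
models = {"naive_bayes": NaiveBayes(), "knn": KNearest(k=3)}
results = {name: accuracy(m.fit(TRAIN), TEST) for name, m in models.items()}
for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

    Because both models are fitted and scored on identical splits, their accuracies are directly comparable. On a dataset this small the figures say nothing about real-world performance; a meaningful comparison would need a much larger labelled corpus.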
    URI
    http://hdl.handle.net/11128/4750
    Collections
    • Πληροφοριακά και Επικοινωνιακά Συστήματα (ΕΛΛ) / Information and Communication Systems (in Greek)

    Open University of Cyprus

    PO Box 12794,

    2252, Latsia

    Cyprus

    Tel.: +357 22 411600

    Fax.: +357 22 411601


    The eUniversity Project is co-funded by the European Regional Development Fund and National Funds in the Programming Period 2007-2013
