LowResourceTextClassification-CN (Thesis Project)

This repository contains the code and resources for the Master's thesis "Exploring different Chinese segmentation approaches: Benefits of radical-based segmentation in low-resource text classification" (winter semester 2022-2023, Eberhard-Karls-Universität Tübingen).

Resources used in this project

  1. Data for the TNews experiments: https://metatext.io/datasets/toutiao-text-classification-for-news-titles-(tnews)-(clue-benchmark)
  2. Data for the ChnSentiCorp experiments: https://ieee-dataport.org/open-access/chnsenticorp
  3. Data for the WU3D experiments: https://github.com/aidenwang9867/Weibo-User-Depression-Detection-Dataset
  4. Data for the SWSR experiments: https://zenodo.org/record/4773875
  5. The radical list: https://github.com/hankcs/sub-character-cws/blob/master/data/radical/radical.txt
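
For readers unfamiliar with the idea, the following is a minimal sketch of what mapping characters to radicals can look like. It is an illustration only and is not taken from the thesis code: the `TOY_RADICALS` table and the `to_radicals` helper are hypothetical names, and the table is hand-written rather than loaded from the radical list above, whose file format is not assumed here.

```python
# Hypothetical character-to-radical table, for illustration only.
# The radical list linked above may use a different format and coverage.
TOY_RADICALS = {
    "好": "女",
    "妈": "女",
    "河": "氵",
    "海": "氵",
}


def to_radicals(text, table=TOY_RADICALS):
    """Replace each character with its radical; keep unknown characters as-is."""
    return [table.get(ch, ch) for ch in text]


if __name__ == "__main__":
    # Characters that share a radical collapse to the same symbol.
    print(to_radicals("妈妈好"))  # ['女', '女', '女']
```

Characters that share a radical collapse to the same symbol, which is one reason such a representation might help when training data is scarce.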