r studio/ Python : In this assignment, we are working with manuscripts and their reviews from a famous CS conference, ICLR (International Conference on Learning Representations). This is a top conference in computer science on machine learning.(Python not

Question

Dataset: ICLR

In this assignment, we are working with manuscripts and their reviews from a famous CS conference, ICLR (International Conference on Learning Representations). This is a top conference in computer science on machine learning.

Each manuscript has 2 to 3 reviews. Each row in training.csv and test_contentonly.csv represents one review of a specific manuscript. Both files contain the following columns:

id: id of manuscript
reviewer_name: name of reviewer for this manuscript
title: title of the manuscript
abstract: abstract of the manuscript
comments: review texts of this manuscript by a specific reviewer
decision: final decision (1 if the manuscript was accepted or 0 otherwise).

The decision column is not included in test_contentonly.csv. Instead, it is provided in test_label.csv.
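Because each row is a single review, one reasonable first step is to group the review texts by manuscript id and attach the decision. The sketch below uses small in-memory stand-ins for the two test files; the sample rows are invented, and the column names of test_label.csv (`id`, `decision`) are an assumption, since the assignment does not list them.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows standing in for test_contentonly.csv.
CONTENT_CSV = """id,reviewer_name,title,abstract,comments
p1,AnonReviewer1,Paper A,Abstract A,Clear and novel contribution.
p1,AnonReviewer2,Paper A,Abstract A,Strong experiments.
p2,AnonReviewer1,Paper B,Abstract B,Limited novelty and weak baselines.
"""

# Stand-in for test_label.csv; its column names are an assumption.
LABEL_CSV = """id,decision
p1,1
p2,0
"""


def reviews_by_manuscript(rows):
    """Concatenate all review texts that share a manuscript id."""
    texts = defaultdict(list)
    for row in rows:
        texts[row["id"]].append(row["comments"])
    return {mid: " ".join(parts) for mid, parts in texts.items()}


def read_labels(rows):
    """Map manuscript id to its integer decision (1 accepted, 0 otherwise)."""
    return {row["id"]: int(row["decision"]) for row in rows}


docs = reviews_by_manuscript(csv.DictReader(io.StringIO(CONTENT_CSV)))
labels = read_labels(csv.DictReader(io.StringIO(LABEL_CSV)))

# One (id, combined review text, decision) triple per manuscript.
dataset = [(mid, docs[mid], labels[mid]) for mid in sorted(docs)]
```

With real files, the `io.StringIO(...)` arguments would be replaced by opened file handles.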

1. Supervised methods (60 pts)

In this task, you need to predict whether a manuscript is accepted (1) or rejected (0), based on the review texts.

1.1 Dictionary method (20 pts)

Use a dictionary method to predict whether manuscripts in the test data were accepted or rejected.

List the dictionaries you used.
Discuss how you constructed your dictionaries (e.g., by reading and summarizing, using embeddings, etc.).
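A dictionary method can be as simple as counting words from a positive list and a negative list and thresholding the difference. The following is a minimal sketch under that assumption; the two word lists and the threshold of 0 are hypothetical placeholders, not dictionaries supplied by the assignment.

```python
import re

# Hypothetical seed dictionaries; a real submission would list and justify its own.
POSITIVE = {"novel", "clear", "strong", "convincing", "solid", "interesting"}
NEGATIVE = {"weak", "unclear", "limited", "incremental", "lacks", "unconvincing"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_score(text):
    """Return (positive hits - negative hits) divided by the token count."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return (pos - neg) / len(tokens)


def predict(text, threshold=0.0):
    """Predict 1 (accept) when the score exceeds the threshold, else 0."""
    return 1 if dictionary_score(text) > threshold else 0
```

The continuous `dictionary_score` is also useful later, because an AUC needs a ranking score rather than a hard 0/1 prediction.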

1.2 Supervised methods (20 pts)

Use a supervised learning method to predict whether manuscripts in the test data were accepted or rejected, using training.csv as the training data.
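The assignment does not prescribe a particular classifier. As one illustration, the sketch below implements a small multinomial Naive Bayes model with add-one smoothing using only the standard library; the class name `NaiveBayes` and the toy training reviews are invented for the example, and any other supervised model would fit the task equally well.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for each class."""
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    count = self.word_counts[c][tok]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy reviews standing in for training.csv.
train_texts = [
    "novel idea with strong experiments",
    "clear writing and convincing results",
    "weak baselines and limited novelty",
    "unclear motivation and incremental contribution",
]
train_labels = [1, 1, 0, 0]
model = NaiveBayes().fit(train_texts, train_labels)
```

The difference `log_scores(text)[1] - log_scores(text)[0]` can serve as a ranking score when computing an ROC curve.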

1.3 Evaluation (20 pts)

Compare supervised learning’s performance with dictionary methods, based on the test data. The correct labels are provided in test_label.csv. Report the following:

Precision
Recall
F1 score
AUC score (of ROC curves).

Discuss whether supervised methods or dictionary methods yield better performance. What contributed most to achieving good prediction performance?
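The four metrics above can be computed directly from the true labels, the hard predictions, and a continuous score. The sketch below is one standard-library way to do so; the AUC is computed as the fraction of (positive, negative) pairs ranked correctly, with ties counted as one half. The example labels and scores are invented.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: five manuscripts.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
scores = [0.9, 0.2, 0.4, 0.8, 0.6]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise AUC is quadratic in the number of examples, which is acceptable for a test set of this kind but slower than a sort-based implementation on large data.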

2. Topic models (40 pts)

2.1 LDA (20 pts)

Run topic models on the abstracts. What topics did you find?
How many topics ($K$) did you use? How did you decide on these settings?
You can do this for the training data only, or you can combine the training and test data.
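To make the role of $K$ concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written with the standard library only. It is an illustration rather than a recommended implementation: the function names, the hyperparameters `alpha` and `beta`, the iteration count, and the toy "abstracts" are all assumptions, and an established library would normally be used on the real data.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    # Resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = word_id[w], assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    # Smoothed document-topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return vocab, topic_word, theta


def top_words(vocab, topic_word, n=3):
    """The n most frequent words assigned to each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Invented, pre-tokenized toy abstracts.
docs = [
    "graph neural network node embedding graph".split(),
    "policy gradient reward agent policy".split(),
    "graph embedding node classification".split(),
    "agent reward exploration policy".split(),
]
vocab, topic_word, theta = fit_lda(docs, n_topics=2)
topics = top_words(vocab, topic_word)
```

Rerunning with several values of `n_topics` and reading the top words for each topic is one informal way to settle on $K$.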

2.2 Adding covariates (20 pts)

Run topic models on the abstracts and compare accepted manuscripts with rejected ones. Do you see clear differences in their topics?
You can do this for the training data only, or you can combine the training and test data.
Note: I do not think there is a Python implementation of the STM package. So if you do not know how to use R, you could split the abstracts by decision and run a separate LDA for each group.
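Another simple comparison, which is neither STM nor the two-model approach described in the note, is to fit a single topic model and then average each abstract's topic proportions within the accepted and rejected groups. The sketch below assumes a document-topic matrix `theta` is already available from some fitted model; the matrix values and decisions shown are invented.

```python
def mean_topic_share(theta, decisions):
    """Average topic proportions separately for each decision value."""
    groups = {}
    for row, label in zip(theta, decisions):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


# Invented document-topic proportions for four abstracts and two topics.
theta = [
    [0.8, 0.2],
    [0.6, 0.4],
    [0.1, 0.9],
    [0.3, 0.7],
]
decisions = [1, 1, 0, 0]

shares = mean_topic_share(theta, decisions)
# Positive values mark topics that are more prevalent among accepted abstracts.
difference = [a - r for a, r in zip(shares[1], shares[0])]
```

Unlike STM, this comparison does not let the covariate influence how the topics themselves are estimated; it only describes prevalence after the fact.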
