NLP

Embedding-based Approaches to Hyperpartisan News Detection

In this report, we describe our models for determining whether a given news article can be considered hyperpartisan. Hyperpartisan news takes an extremely polarized political standpoint with the intention of creating political division among the public. We tried several approaches, including n-grams, sentiment analysis, and sentence and document representations from pre-trained ELMo. Our best model, pre-trained ELMo with a bidirectional LSTM, achieved an accuracy of around 83% under 10-fold cross-validation without much hyperparameter tuning.
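As an illustration of the evaluation protocol only, 10-fold cross-validation can be sketched in plain Python as follows. This is a minimal sketch, not the code used in the report: `k_fold_indices`, `cross_val_accuracy`, and `majority_baseline` are hypothetical names, the folds are contiguous and unshuffled, and the majority-class baseline merely stands in for the real ELMo + BiLSTM classifier.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous (train, test) index pairs.

    Assumption: no shuffling or stratification; real experiments usually
    shuffle (and often stratify by label) before splitting.
    """
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


def majority_baseline(train_articles, train_labels):
    """Hypothetical stand-in model: always predicts the most common label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda article: most_common


def cross_val_accuracy(articles, labels, fit, k=10):
    """Mean accuracy over k folds; `fit` returns a callable article -> label."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = fit([articles[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(model(articles[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Toy data (made up for illustration): 1 = hyperpartisan, 0 = not.
articles = [f"article {i}" for i in range(20)]
labels = [1] * 12 + [0] * 8
mean_accuracy = cross_val_accuracy(articles, labels, majority_baseline, k=10)
```

Any classifier can be plugged in through `fit`, as long as it returns a function mapping an article to a predicted label.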

Movie Rating Prediction

The Hollywood movie production business relies on a low-tech decision-making process, driven largely by instinct and personal contacts, to choose the portfolio of movies that a production house funds in any given year. Movie stars and their agents use the same kind of process to decide which projects to pursue and which to pass on. This leads to a high degree of variation in the success of projects (as measured by gross box office receipts and return on investment). Most production houses take a portfolio-driven approach and diversify their risk across a number of low-, medium-, and high-budget movies. I have attempted several data-centric ML approaches to this interesting predictive problem.

Tamil and Hindi Question Answering

In extractive question answering, we assume that the answer is a contiguous span of the context, so the task can be defined as span prediction, i.e. predicting a range of characters or tokens. The task can also be formulated as abstractive, where we only want to obtain the answer, which may be phrased differently from the context. In this project, Tamil and Hindi question answering was performed by fine-tuning pretrained Transformer (RoBERTa) models for this task. *** Ongoing Project ***
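To make the span-prediction formulation concrete, the decoding step can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's code: it assumes a model has already produced one start score and one end score per context token, and `best_span`, `max_len`, and the toy scores are hypothetical.

```python
def best_span(start_scores, end_scores, max_len=30):
    """Return (start, end, score) of the highest-scoring valid token span.

    A span is valid if start <= end and it covers at most max_len tokens.
    Its score is the sum of the start score and the end score.
    """
    best_start, best_end, best_score = 0, 0, float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start position.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_start, best_end, best_score = s, e, score
    return best_start, best_end, best_score


# Toy context with made-up scores, for illustration only.
tokens = ["Chennai", "is", "the", "capital", "of", "Tamil", "Nadu"]
start_scores = [0.1, 0.0, 0.2, 0.3, 0.1, 2.5, 0.4]
end_scores = [0.2, 0.0, 0.1, 0.2, 0.1, 0.6, 3.0]

start, end, score = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
```

In practice the token indices are mapped back to character offsets in the original context, which matters for Tamil and Hindi because subword tokenization rarely aligns with whitespace.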