Reports

Embedding-based Approaches to Hyperpartisan News Detection

In this report, we describe our models for determining whether a given news article can be considered hyperpartisan. Hyperpartisan news takes an extremely polarized political standpoint with the intention of creating political division among the public. We attempted several approaches, including n-grams, sentiment analysis, and sentence and document representations using pre-trained ELMo. Our best model, which combined pre-trained ELMo with a bidirectional LSTM, achieved an accuracy of around 83% under 10-fold cross-validation without much hyperparameter tuning.
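The 10-fold cross-validation used to report accuracy can be sketched as follows. This is a minimal illustration in plain Python, not the code from the report: the helper names `k_fold_indices`, `cross_validate`, and `majority_baseline` are hypothetical, and the `train_and_predict` callback stands in for any classifier, such as the ELMo and bidirectional LSTM model.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    if n_samples < k:
        raise ValueError("need at least one sample per fold")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_and_predict, k=10, seed=0):
    """Return the mean accuracy over k held-out folds.

    train_and_predict(train_x, train_y, test_x) is a hypothetical callback
    that trains a model and returns one predicted label per test sample.
    """
    accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held_out_set]
        preds = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in held_out],
        )
        correct = sum(p == labels[i] for p, i in zip(preds, held_out))
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def majority_baseline(train_x, train_y, test_x):
    """Toy stand-in model: always predict the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return [majority] * len(test_x)
```

Each article is held out exactly once, so the averaged accuracy reflects performance on data the model did not see during training for that fold.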

Federated Learning

Growing concerns about privacy and unrestricted access to data led to the development of a machine learning paradigm called Federated Learning (FL). FL borrows many ideas from distributed machine learning; however, because the models are trained on edge devices, it poses challenges that make it an interesting engineering problem. Google introduced it in 2016, and active research has since been carried out in areas such as federated optimization algorithms, model and update compression, differential privacy, robustness and attacks, federated GANs, and privacy-preserving personalization. Many challenges remain open in the development of federated machine learning systems. This project focuses on two of them, the communication bottleneck and non-IID data, and on their effect on model performance. We characterize these issues on a baseline model, evaluate model performance, and discuss ways to overcome them.
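To make the two issues concrete, the sketch below shows one common way to simulate non-IID clients (a label-skewed split) and a sample-weighted average of client model weights, as used in federated averaging. Both are assumptions for illustration: the project description does not name its partitioning scheme or aggregation rule, and the function names are hypothetical.

```python
def partition_by_label(labels, num_clients):
    """Label-skewed (non-IID) split.

    Sort sample indices by label, then cut the sorted list into contiguous
    shards, one per client, so each client sees only a few classes.
    Trailing clients may receive fewer samples when the split is uneven.
    """
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shard = -(-len(order) // num_clients)  # ceiling division
    return [order[c * shard:(c + 1) * shard] for c in range(num_clients)]


def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighting each by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Six samples with three classes, split across three clients:
# each client ends up holding a single class.
shards = partition_by_label([0, 1, 0, 1, 2, 2], num_clients=3)

# Two clients with 1 and 3 samples; the larger client dominates the average.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Every round of such averaging requires each client to upload its full weight vector, which is where the communication bottleneck arises; the label-skewed split shows why clients' local updates can pull the shared model in different directions.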

Movie Rating Prediction

The Hollywood movie production business relies on an instinct- and contact-driven, low-tech decision-making process to generate the portfolio of movies that a production house decides to fund in any given year. Movie stars and their agents use the same kind of process to decide which projects to pursue and which to pass on. This leads to a high degree of variation in the success rate of projects (as measured by gross box office receipts and return on investment). Most production houses take a portfolio-driven approach and diversify their risk across a number of low-, medium-, and high-budget movies. I have attempted several data-centric ML approaches to solve this interesting predictive problem.

Tamil and Hindi Question Answering

In extractive question answering, we assume that the answer is a span of the context, so the task can be defined as span prediction over a range of characters or tokens. The task can also be formulated as abstractive, where we only want to obtain the answer, which may be phrased differently from the context. In this project, Tamil and Hindi question answering was performed using pretrained Transformer (RoBERTa) models fine-tuned for this task. *** Ongoing Project ***
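Span prediction can be illustrated with a small sketch. An extractive model typically produces a start score and an end score for every token, and the answer is the span whose combined score is highest. The function name `best_span`, the `max_answer_len` parameter, and the scores below are made up for illustration and do not come from the project.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end) token indices maximizing
    start_scores[start] + end_scores[end], subject to
    start <= end < start + max_answer_len."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Hypothetical per-token scores for the context below.
tokens = ["Chennai", "is", "the", "capital", "of", "Tamil", "Nadu"]
start = [0.1, 0.0, 0.0, 0.2, 0.0, 2.5, 0.3]
end = [0.2, 0.0, 0.0, 0.1, 0.0, 0.4, 3.0]

s, e = best_span(start, end)
answer = " ".join(tokens[s:e + 1])
```

Constraining the end index to be at or after the start index, and capping the span length, rules out invalid or implausibly long answers that independent argmax choices could otherwise produce.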

Vehicle to Vehicle Communication using Light Fidelity

Dedicated Short-Range Communications (DSRC) is the current method of V2V communication. It is similar to 802.11a Wi-Fi and can be affected by multipath and Doppler signal distortion, albeit to a lesser degree. DSRC also has range limitations, which would render it nearly useless in an emergency with no other vehicles within range. The system proposed in this project uses Light Fidelity (Li-Fi) technology to overcome the drawbacks of DSRC. The driver remains in control at all times, and the vehicle will not automatically brake or steer. The system consists of Emergency Electronic Brake Lights (EEBL), Blind Spot Warning (BSW), Lane Change Warning (LCW), Forward Collision Warning (FCW), Do Not Pass Warning (DNPW), Intersection Movement Assist (IMA), and Right Turn Assist (RTA). Because the system is highly scalable, every vehicle on the road can communicate with every other vehicle, regardless of model or manufacturer, using sophisticated secure systems.