NLP

Autocorrect Spelling and Autocomplete using N-grams

Autocorrect is commonly used to suggest words on mobile keyboards while typing. Autocomplete is used to complete a search query or a sentence while writing an email.

The autocorrect pipeline preprocesses the data and computes word probabilities from the corpus, generates candidate words one and two edits away from the input, filters them against the vocabulary, and suggests the candidate with the highest probability.
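The autocorrect steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: the function names (`word_probabilities`, `edits1`, `edits2`, `suggest`), the lowercase a-z alphabet, and the toy corpus are all assumptions made for the example.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase English only


def word_probabilities(corpus):
    """Lowercase and tokenize the corpus, then return relative word frequencies."""
    words = re.findall(r"[a-z]+", corpus.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    swaps = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in ALPHABET}
    inserts = {l + c + r for l, r in splits for c in ALPHABET}
    return deletes | swaps | replaces | inserts


def edits2(word):
    """All strings two edits away from `word`."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def suggest(word, probs, n=3):
    """Return up to `n` in-vocabulary candidates, most probable first."""
    if word in probs:
        return [word]
    candidates = {w for w in edits1(word) if w in probs}
    if not candidates:  # fall back to two edits only when needed
        candidates = {w for w in edits2(word) if w in probs}
    return sorted(candidates, key=lambda w: (-probs[w], w))[:n]


probs = word_probabilities("the cat sat on the mat the cat ran")
print(suggest("teh", probs))  # ['the']
```

Preferring one-edit candidates over two-edit candidates is a design choice of this sketch; ranking all candidates together by probability is an equally reasonable variant.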

Document search using approximate k-nearest neighbor

Encoded every document in the corpus as a vector. Implemented locality-sensitive hashing (LSH) with multiple universes (different sets of random planes). Built document search using approximate k-nearest neighbors on top of LSH.
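The idea of hashing with several sets of random planes can be sketched as follows. This is a simplified, standard-library-only illustration under stated assumptions: the class name `LSHIndex`, the default numbers of planes and universes, and the use of cosine similarity to rank candidates are choices made for the example, not details taken from the project.

```python
import math
import random
from collections import defaultdict


def make_planes(dim, n_planes, rng):
    """One 'universe': a set of random hyperplanes through the origin."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def hash_vector(vec, planes):
    """Set bit i when the vector lies on the non-negative side of plane i."""
    bucket = 0
    for i, plane in enumerate(planes):
        if sum(v * p for v, p in zip(vec, plane)) >= 0:
            bucket |= 1 << i
    return bucket


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LSHIndex:
    """Hypothetical index: one hash table per universe of random planes."""

    def __init__(self, dim, n_planes=4, n_universes=5, seed=0):
        rng = random.Random(seed)
        self.universes = [make_planes(dim, n_planes, rng) for _ in range(n_universes)]
        self.tables = [defaultdict(list) for _ in range(n_universes)]
        self.vectors = {}

    def add(self, doc_id, vec):
        self.vectors[doc_id] = vec
        for planes, table in zip(self.universes, self.tables):
            table[hash_vector(vec, planes)].append(doc_id)

    def query(self, vec, k=3):
        # Collect documents that share a bucket with the query in any universe,
        # then rank only those candidates by exact cosine similarity.
        candidates = set()
        for planes, table in zip(self.universes, self.tables):
            candidates.update(table.get(hash_vector(vec, planes), []))
        ranked = sorted(candidates, key=lambda d: cosine(vec, self.vectors[d]), reverse=True)
        return ranked[:k]


index = LSHIndex(dim=3)
index.add("a", [1.0, 0.0, 0.0])
index.add("b", [0.0, 1.0, 0.0])
index.add("c", [-1.0, 0.0, 0.0])
print(index.query([2.0, 0.0, 0.0], k=1))  # ['a']
```

Using several universes increases the chance that a true neighbor shares a bucket with the query in at least one table, at the cost of more candidates to rank.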

End-to-End Natural Language Generation for Conversational Agents

Dataset: E2E NLG Challenge. Each instance consists of a dialogue-act-based meaning representation (MR) and up to 5 references in natural language (NL).
MR: name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate], customerRating[3/5], area[riverside], kidsFriendly[yes], near[Burger King]
NL:
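To work with MRs like the example in this section, it helps to turn the `slot[value]` string into a dictionary first. The sketch below assumes every slot follows the `slot[value]` pattern shown in that example; the function name `parse_mr` is hypothetical.

```python
import re

SLOT_PATTERN = re.compile(r"(\w+)\[([^\]]*)\]")  # matches slot[value]


def parse_mr(mr):
    """Parse an MR string such as 'name[The Eagle], food[French]' into a dict."""
    return {slot: value.strip() for slot, value in SLOT_PATTERN.findall(mr)}


mr = (
    "name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate], "
    "customerRating[3/5], area[riverside], kidsFriendly[yes], near[Burger King]"
)
slots = parse_mr(mr)
print(slots["name"])  # The Eagle
print(len(slots))     # 8
```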

Named Entity Recognition

Named Entity Recognition (NER) is a method to locate and identify important concepts within documents. I trained vanilla and GRU variants of a recurrent neural network to identify person, location, and organization entities in sentences.

Part-of-Speech (POS) Tagging using Hidden Markov Model (HMM) and Maximum Entropy Markov Model (MEMM)

Part-of-speech tagging is the process of assigning a part-of-speech tag (noun, verb, adjective, …) to each word in an input text. I trained two different models: an HMM, which is a generative model, and an MEMM, which is a discriminative model.
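For the HMM side, decoding the most likely tag sequence is typically done with the Viterbi algorithm. The sketch below is an illustration with hand-made toy probabilities, not the trained model from this project: the tag set, the probability tables, and the small `unk` emission probability for unseen words are all assumptions.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under an HMM (log space)."""
    if not words:
        return []

    def emit(tag, word):
        # Unseen (tag, word) pairs get a small assumed probability.
        return math.log(emit_p[tag].get(word, unk))

    # best[i][t]: best log probability of any tag path ending in tag t at word i
    best = [{t: math.log(start_p[t]) + emit(t, words[0]) for t in tags}]
    back = [{t: None for t in tags}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + math.log(trans_p[p][t]))
            best[i][t] = best[i - 1][prev] + math.log(trans_p[prev][t]) + emit(t, words[i])
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {
    "NOUN": {"dogs": 0.5, "cats": 0.4, "bark": 0.1},
    "VERB": {"bark": 0.6, "chase": 0.4},
}
print(viterbi(["dogs", "chase", "cats"], tags, start_p, trans_p, emit_p))
# ['NOUN', 'VERB', 'NOUN']
```

An MEMM can reuse the same dynamic program, replacing the product of transition and emission probabilities with a single conditional probability of the current tag given the previous tag and the observation.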

Word Embeddings based on the Continuous Bag-of-Words (CBOW) Model

Implemented the CBOW model to learn word embeddings with two different architectures. The first approach follows the neural language model of Bengio et al. The second approach follows "Efficient Estimation of Word Representations in Vector Space". Performed PCA on the word vectors and visualized the relations between a few words.
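CBOW predicts a center word from its surrounding context, so the first step is turning a token stream into (context, center) training pairs. The sketch below shows that step plus the context averaging used in the second approach; the window size of 2, the function names, and the toy embeddings are assumptions for illustration only, and the training loop itself is omitted.

```python
def cbow_pairs(tokens, window=2):
    """Build (context words, center word) pairs with a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # skip a lone token with no neighbors
            pairs.append((context, center))
    return pairs


def average_context(context, embeddings):
    """Average the context word vectors to form the input to the output layer."""
    dim = len(next(iter(embeddings.values())))
    return [sum(embeddings[w][d] for w in context) / len(context) for d in range(dim)]


tokens = "i am happy because i am learning".split()
pairs = cbow_pairs(tokens, window=2)
print(pairs[2])  # (['i', 'am', 'because', 'i'], 'happy')

toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}  # hypothetical 2-d vectors
print(average_context(["a", "b"], toy_embeddings))  # [0.5, 0.5]
```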