UWB CSS590 C: Special Topic in Computing:

Introduction to NLP

(Spring 2017)

 

Yuval Marton

When:  Monday/Wednesday from 8:00pm to 10:00pm, 3/27/2017 to 6/9/2017 (last class 5/31/2017)

Where: UW1 031
Canvas Discussions: discussion board
Canvas Assignments: homework submission dropbox
 

Instructor: Yuval Marton, PhD

Homepage: here and there

Contact: gmail: yuvalmarton or ymarton at_uw

Office hours: catch me before or after class. Or email me to set a time to talk. Skype: yuvalmarton

 

 

Course description

This class will cover the basics of Natural Language Processing (NLP), including linguistic diversity, part-of-speech tagging and syntactic parsing, semantic representation in vector space (e.g., word2vec), semantic similarity, paraphrasing, and machine translation. We will also cover some basics of Natural Language Understanding (NLU) and dialog systems (bots). We will explore combining linguistic knowledge with data-driven statistical methods, such as using syntactic parses in NLU; augmenting NLU models with synonyms, paraphrases, and/or other semantic information resources; and using different representations to capture linguistic generalizations (e.g., word2vec, GloVe). After this class, students will understand the basics of NLP/NLU; demonstrate hands-on capability in NLP/NLU coding; articulate approaches to improving an NLP/NLU task or application, such as parsing, paraphrasing, or a dialog system (bot); be able to understand, present, and critique NLP/NLU research papers (useful for both academic and industry roles); and potentially work on original improvements of, or solutions to, NLP/NLU problems, which can lead to academic publications.

Prereqs:

Recommended background (if you are rusty, it will help to refresh your memory or take a quick tutorial soon):

- 2+ years of programming experience. You will need to code. Python knowledge is useful but not mandatory. You can probably pick up what you need as you go.

- Basic statistical concepts (e.g., conditional probability, distributions, statistical significance tests)

- A Machine Learning course and/or background (but you don’t need to be an expert :) )

- Basic knowledge of data structures, algorithms, and systems-related issues (Linux, working with a computing cluster), and of reading, processing, and writing datasets (from a database or text files)

- Love of language! (Great if you also know some Linguistics)

Contact me if you are not sure you have sufficient background.

 

The term project (which may be done in pairs or teams of 3, depending on the number of registered students) will involve improving an existing baseline solution by applying a linguistic resource, representation, or method. While positive results (significant improvements) are always exciting, deep analysis of negative results (what went wrong and why) is also acceptable, and in fact recommended. You may choose a topic from the course’s pre-defined list, or come up with your own (a real-world issue, subject to instructor approval). More details will be given during class and on Canvas.

Project ideas: here; the list may be periodically updated. But don’t wait for me: start reading papers and further chapters in SLP (the textbook), consult with friends, etc.

 

Assignment and Project submission: Using Catalyst.

Late submission policy: up to 24 hours late: 10% score penalty. Up to 48 hours late: 20% score penalty. Later than that: score of 0. If an emergency (e.g., medical) prevents you from submitting on time, please contact me (before the deadline if you can!) to avoid a penalty.
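For those who like to see a policy as code, here is a minimal Python sketch of the late penalty. It is only an illustration, not the official grading script. The function names are made up for this example, and it assumes that the 24- and 48-hour boundaries are inclusive and that the penalty is taken as a percentage of the score you would otherwise have earned.

```python
def late_penalty_percent(hours_late: float) -> int:
    """Percent of the score deducted for a submission `hours_late` hours past the deadline.

    Assumption: the 24h and 48h cutoffs are inclusive.
    """
    if hours_late <= 0:
        return 0      # on time: no penalty
    if hours_late <= 24:
        return 10     # up to 24 hours late
    if hours_late <= 48:
        return 20     # up to 48 hours late
    return 100        # later than 48 hours: score of 0


def final_score(raw_score: float, hours_late: float) -> float:
    """Apply the late penalty to a raw score.

    Assumption: the penalty is a percentage of the raw score.
    """
    return raw_score * (100 - late_penalty_percent(hours_late)) / 100
```

For example, under these assumptions `final_score(100, 12)` gives 90.0 and `final_score(100, 72)` gives 0.0. Approved emergencies are handled by contacting the instructor, not by this function.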

Guidelines for paper presentations and discussions

Quick link to class schedule and assignments. (Note: deadlines on Canvas take precedence!)

Language in Ten Minutes (L10)

 

Slides and recordings

Slides and further materials are here.

Recordings: I will post a message on the discussion board.

 

Textbook

Recommended: Speech and Language Processing, 2nd Edition, by Jurafsky and Martin. Free draft of 3rd Edition here.

Also recommended:

·  Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing [link]

·  Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press. [link]

 

Office Hours, Questions, Problems: If you have any problem with this course, please talk to me as soon as possible. I would like to help in any way I can, but I have to know there is a problem. If you fall behind in this class, it will be difficult to catch up. I am usually available before or after class, and you can always contact me via email or Skype.

Disability, Access and Accommodations: If you have already established accommodations with Disability Resources for Students (DRS), please communicate your approved accommodations to me at your earliest convenience so we can discuss your needs in this course.

If you have not yet established services through DRS, but have a temporary health condition or permanent disability that requires accommodations (conditions include, but are not limited to: mental health, attention-related, learning, vision, hearing, physical or health impacts), you are welcome to contact DRS at 425-352-5307 or drs@uwb.edu. DRS offers resources and coordinates reasonable accommodations for students with disabilities and/or temporary health conditions. Reasonable accommodations are established through an interactive process between you, your instructor(s), and DRS. It is the policy and practice of the University of Washington to create inclusive and accessible learning environments consistent with federal and state law.