CSS 590C: NLP – Term Projects

 


 

Course Project

Overview

The project comprises four assignments:

1. Project plan

2. Project midterm report

3. Final project presentation

4. Final project write-up

For project ideas and resources, see below.

See also the possible alternative “Make your own Challenge” below.

Project Plan

-        Improve on a baseline NLP system; any of the papers you read can serve as a base to improve on.

-        Create or improve a resource for an NLP task – in a clever way (using linguistic knowledge, machine learning, etc.).

See more ideas below.

Submit project plan, including your evaluation plan for the success of your experiments.

-        What issue and/or NLP task are you addressing? What is the current pain point?

-        Why is your topic interesting / important? Who/what will it help?

-        What improvement are you planning?

-        How? (Selected language / resources / representation; describe the method you will be applying).

 


May be done in pairs. Teams of three are possible with prior explicit instructor permission.

Specify what each team member will do, including tasks you plan to do together.

All team members receive the same grade for this assignment.
~1 page long, single-spaced (or two pages double-spaced). Normal font size, spacing, paper, margins…

In class: 1-minute 1-slide project presentation. To streamline the process and use class time efficiently, I suggest each team record narration on top of their slide (one minute long, no more). Perhaps think about it as your “elevator pitch” to your potential future employer… :-)   What’s cool about what you are proposing? Why should your elevator co-rider be interested? What/who will this help?  I will string all single slides + narrations together for the entire class.

You probably want to incorporate some sort of linguistic knowledge or theory into some part of your project. You can also look into improvements that are more technical in nature (improved speed, memory utilization, parallelization, etc.) as long as you can argue how these improvements also have positive impact from a linguistic perspective.

Submit your write up and single slide via Canvas. See the dropbox for the current due date.

 

Project Midterm Report

Your midterm report should extend your project plan with literature review, initial results & (optional but recommended) initial analysis.

It should be no longer than 3 pages (single-spaced) + unlimited references, ACL short paper format. Please use the ACL style files.

You may reuse some of your project plan content, or change direction (which could happen once you go deeper into your topic).

 

Please submit these files:

  1. Your project midterm report.
  2. Your code and model settings, e.g., the file with tuned feature weights, learned vectors (if any), network weights. All in a single zip file / tarball.
  3. Your input (dev/test) data, output data, evaluation script output (unless you only conducted a human evaluation -- rare). All in another single zip / tarball.
  4. Instructions and a script for how to run it – and a pointer (URL) to where your project lives. I may or may not use it.

No need to submit large files such as the training set, or a big compiled executable.


Paper / report structure and format:

  • Abstract: Briefly state the task approached, the main innovation and main result. You can skip this section in the midterm version.
  • Introduction / overview: What is this paper about? Current state in the field, problem you attack, general solution (your novel contribution). May be based on your (revised) project plan.
  • Background and literature review: Describe with more detail the task you approached, and the previous state-of-the-art. Survey the related work that has been done so far, giving special focus to the relevance to your project, and how that previous work differs from yours.
  • Experiment: data, method, results. What data / resources you used; what was your methodology (added the following new features… / applied this novel annotation… / incorporated X in the following novel way … / handled special or tough cases like this… etc.) Initial results of your experiment(s). May (and probably should) be revised or extended later, in your final paper and presentation.
  • Discussion and error analysis: What succeeded, what went wrong, unexpected, etc. WHY? What does the error analysis suggest about the strengths and weaknesses of your approach? Also recommended: Interesting patterns in the data you generated, and perhaps also new insights about your training/tuning/test sets. You may keep this section short or partial for the midterm report – and revise / extend later, in your final paper and presentation.
  • Conclusion and future work: Summarize the paper (task approached, main innovation, results) and point the way forward for future work. You can skip this section in the midterm version.
  • Acknowledgments: For term projects done in teams, the acknowledgments section should indicate which team member did what for the project. You may also thank anyone who gave you significant help (code, data, and/or useful discussions or advice).
  • Bibliography 

Length: 3 pages + unlimited references. Use the ACL 2015 format and style sheets:

LaTeX: acl2015.tex, acl2015.sty, acl.bst, acl2015.pdf

MS Word: acl2015.dot, acl2015.pdf

 


Examples: You can get inspiration from real examples of short papers, e.g., here or here or here (the latter two have a mix of short and long papers).

Note: Your final project submission (at the end of the quarter) may be largely based on this midterm report, which you can then revise and extend according to your final findings and understanding.

May be done in same team as the project plan. You may change teams up to two days after the project plan deadline (no last minute changes). If I haven’t heard of a change by that time, I will assume the same team members continue. All team members receive the same grade for this assignment.

 

Submit your write up via Canvas. See the dropbox for the current due date.

 

 

Final Project and Presentation

Your final project write up should include your final results (perhaps improved from your midterm report), and updated, more elaborate analysis: What went as planned? What didn’t? Why? It is always nice to improve on the state-of-the-art; but it is OK if you didn’t, as long as you can provide a deep analysis and explanation of what failed your expectations, how, and why. If you had a chance to work further on this topic, what would be your future work? You might want to add a short Conclusions section.

 

Your write up should be of a quality that’s likely to get it accepted to a good NLP conference as a short paper. It should be 4-5 pages single spaced + unlimited references, similar to a short ACL paper. Please use the ACL style files.

 

Please submit these files:

  1. Your final project report -- written as a short ACL paper. (See format in project midterm item)
  2. Your code and model settings, e.g., the file with tuned feature weights. All in a single zip file / tarball.
  3. Your input (test) data, output data, evaluation script output (unless you only conducted a human evaluation -- rare). All in another single zip file / tarball.
  4. Instructions and a script for how to run it – and a pointer (URL) to where your project lives. I may or may not use it.
  5. Your presentation (slides) -- due on the day before last class (see separate item for that).

No need to submit large files such as the training set or a large compiled executable.

 

Presentation: Congratulations! Your short paper was accepted! :-)  In addition to the write up, please prepare a 10 minute presentation, to be presented in the conference. You will present it in our last class (in person or online). You may choose to pre-record your presentation if you have a speech impediment or a relevant medical condition (talk to me and let me know in advance if so).

 

May be done in same team as the midterm. You may change teams up to two days after the project midterm report deadline (no last minute changes). If I haven’t heard of a change by that time, I will assume the same team members continue. All team members receive the same grade for this assignment.

 

Submit your write up and slides via Canvas. Note that the write-up and slides may have separate deadlines. See the dropbox for the current due date for each.

 

Project Ideas

If you are still looking for ideas for how to improve your baseline NLP system or resource in some linguistically interesting way, here are some suggestions. Some are more general and some are more specific. You can also get inspired by any of the assignments and readings.

 

For thematic coherence, the top of this list is about the startup company that is developing a wine sales chatbot, which I mentioned in class. Let me repeat and make it absolutely clear that there is no expectation you choose a wine-related project or any project from the list below. Your grade will not be affected by such a choice. You may choose or suggest any project you like, as long as it is relevant to the class (NLP), and is reasonable in scope.

 

-        A startup company is developing a wine sales chatbot. Come up with a clever way to do the NLU when they have no (or very few) labels for intents and slots. It could be a clever way to generate a training (and test) set for them. Or it could be an algorithm, e.g., some neural net, which can learn and achieve competitive results with no or little labeled data (hard to do!). Or a better co-reference / pronoun resolution in a conversation.

-        Syntactic parsers are known to perform worse on genres and domains that are different from those they were trained on. Suggest a novel way to do parsing or shallow parsing (chunking) adaptation to user queries and dialog turns, e.g., for the wine sales chatbot mentioned above.

-        In order to answer users’ questions about wines, the sales chatbot may need a knowledge base about wines and their attributes. Come up with a novel way to do the required attribute extraction, perhaps with a neural network, perhaps discovering non-traditional yet realistic attributes (e.g., the color or theme on the bottle label, type of cork, container material (glass or carton or …), etc.).

-        Novel user sentiment analysis method (is the user satisfied / not; thanking / sarcastic / angry / …). Perhaps a user of a sales chatbot.

-        New way to do entity linking for products in informal text. E.g., which wine is trending now on Twitter?

-        New way to use word/phrase vectors for definitions: improve work that can unpack a definition of a word from its vector representation, in some interesting way, e.g., training your neural net to receive a second input specifying the emotional level of the desired definition (“fantastic!” vs. “tastes good”)

-        Informal text (e.g., when chatting with a bot or another human) may contain several sentences or intents, not clearly separated. E.g., “thanks no I want red wine” should probably be split into “thanks”, “no”, and “I want red wine”. Come up with a clever way of segmenting the text into sentences/intents.

-        Word2vec2vec: Representing words (and phrases?) as interpretable vectors, e.g., where each dimension i is the vector similarity between the represented word and word i. Compare with w2v and distributional vectors such as Marton et al. (2009). See also Levy, Goldberg and Dagan (2015).
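A minimal sketch of this mapping, using made-up toy embeddings and an assumed anchor-word list (none of the values below come from a real w2v model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def interpretable_vector(word, embeddings, anchor_words):
    """Dimension i of the new vector = similarity(word, anchor_words[i]),
    so each dimension can be read as 'how close this word is to anchor i'."""
    w = embeddings[word]
    return [cosine(w, embeddings[a]) for a in anchor_words]

# Toy 2-d embeddings and anchors (hypothetical values, for illustration only).
emb = {"wine": [0.9, 0.1], "grape": [0.8, 0.2], "cork": [0.1, 0.9]}
vec = interpretable_vector("wine", emb, ["grape", "cork"])
```

The interesting research questions start here: how to pick the anchor words, and how the resulting vectors compare to the original dense ones on similarity benchmarks.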

-        Paraphrasing over w2v: Paraphrase given words (or phrases) using a method similar to that of Marton (TIST 2010), but representing the words in the corpus as short vectors (or hierarchical clustering). I can provide python code, but you will have to update the code to python 3, including the pyrex/cython code (talk to me if interested in details). Evaluate on machine translation or paraphrasing test sets.

-        Paraphrasing using PanLex or BabelNet or any other linguistically interesting resource, in some clever way, or in a novel way exploiting specifics of a potentially under-utilized resource. For example, paraphrase by pivoting through entries in these multilingual resources. (What pivots are good? How far should we hop?)
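To make the pivoting idea concrete, here is a minimal one-hop sketch with toy, made-up phrase tables (in a real project you would extract these mappings from the multilingual resource):

```python
# Pivot paraphrasing sketch: translate a phrase into a pivot language and
# back, collecting everything that comes back other than the original phrase.
def pivot_paraphrases(phrase, e2f, f2e):
    """Return paraphrase candidates of `phrase` via one pivot hop."""
    candidates = set()
    for foreign in e2f.get(phrase, []):
        for back in f2e.get(foreign, []):
            if back != phrase:
                candidates.add(back)
    return candidates

# Toy phrase tables (hypothetical entries, for illustration only).
e2f = {"red wine": ["vin rouge"]}
f2e = {"vin rouge": ["red wine", "red"]}
paraphrases = pivot_paraphrases("red wine", e2f, f2e)
```

More hops or more pivot languages yield more candidates but also more noise, which is exactly the trade-off behind the questions in parentheses above.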

-        Generative Adversarial Networks for Text Generation. Suggested by Kevin Clark (PhD student) - kevclark@cs.stanford.edu in this Stanford course. Generative Adversarial Networks (GANs) have been very popular in the ML community recently. They have shown an amazing ability to generate realistic-looking images for example. But so far there hasn’t been much research on using GANs to generate text, which is more challenging than generating images because text is discrete (we can slightly adjust an image by changing the RGB values for some pixels, but there is not a clear way to slightly adjust a word). But there might be ways of getting around this, for example by having our model generate word vectors instead of words. This project would investigate ways of training a GAN to generate text and apply the GAN to a task like summarization or response generation for dialogue.

-        Neural Mention Detection for Co-reference Resolution. Suggested by Kevin Clark (PhD student) - kevclark@cs.stanford.edu in this Stanford course. The goal of coreference resolution is to identify which mentions in a document refer to the same entity (for example “Kevin,” “the PhD student in computer science,” and “him” could be coreferent). Most coreference systems solve this task in two steps: first identifying mentions and second deciding which mentions are coreferent with each other. Although deep learning has been successfully applied to the second step, state-of-the-art coreference systems still use simple rule-based approaches for the first step. And this first step is quite important for system accuracy - a system running on gold-standard mentions gets about 10 F1 points higher than a system using the rule-based mention detection (even a 1 F1 point improvement is considered a publishable advancement to coreference!). In this project you would train a neural network model to identify mentions in a document and evaluate its effectiveness in comparison to existing rule-based mention detection algorithms.

-        Query breaker for task completion bot (NLU): given a set of intents (and potentially also a set of slots), e.g., for the wine domain (for the above-mentioned startup), and given a user query (the user’s input text in their turn in a conversation), split the query into one or more sub-spans, each corresponding to a single intent. The decision where to break might be different than in other domains or genres. For example, given an intent set including an intent for declining and an intent for specifying the user’s preference (providing values for attributes/slots), and given the query “no, I want red wine” (with or without the comma), split the query into two: “no” and “I want red wine”, even though for other genres, say, restaurant reviews, this query would have a single span (no split).
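A rule-based baseline for such a splitter might key off a hand-picked list of standalone-intent cue words (the cue list here is an assumption for illustration, not from any real system); a learned, genre-sensitive model would then have to beat this baseline:

```python
CUE_INTENTS = {"yes", "no", "thanks"}  # assumed standalone-intent cue words

def split_intents(query):
    """Split a user query into sub-spans, one per (guessed) intent."""
    tokens = query.replace(",", " ").lower().split()
    spans, current = [], []
    for tok in tokens:
        if tok in CUE_INTENTS:
            if current:                  # close any open span first
                spans.append(" ".join(current))
                current = []
            spans.append(tok)            # a cue word is its own span
        else:
            current.append(tok)
    if current:
        spans.append(" ".join(current))
    return spans
```

Note that this baseline would over-split in genres like restaurant reviews, where “no” often sits inside a larger clause; handling that genre dependence is the interesting part of the project.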

Ideas below involve machine translation, so might not be relevant to many of you:

-        Using syntactic information (parses) to help the decoder pick better translation units (phrases). It could be (but doesn’t have to be) a variant on Marton and Resnik (2008).

-        Learning models of syntax-based distortion / reordering / pre-ordering in MT. It could be (but doesn’t have to be) a variant on Carpuat, Marton and Habash (2010a or b); or Li, Marton, Daume III and Resnik (2014), or many other papers.

-        Modify your baseline decoder to be more syntax-aware.

-        Augmenting a translation model with new entries generated by paraphrases of OOV phrases. It could be (but doesn’t have to be) a variant on Callison-Burch, Koehn and Osborne (2006), or Marton, Callison-Burch and Resnik (2009), etc.

-        Other smart ways to handle OOV (out of vocabulary) issues.

-        Using rich monolingual resources (e.g., in English) to improve translation into or from English.

-        Improving word alignment, e.g., implement a morphologically-aware alignment model.

-        New ways to do Named Entity Transliteration. (E.g., there are so many ways to spell the former Libyan ruler’s name: Qadafi, Kadhafy, Kaddafi, Gadaffy, etc.)

-        Use WordNet to match synonyms in an MT evaluation method (BLEU, METEOR, TERp, …), in a different way than was used before. Or enhance the evaluation with vector representations (word2vec, GloVe, …) to better assess semantic similarity. (See what evaluation measures other people have implemented.)
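As a toy illustration of synonym matching in evaluation, here is a unigram precision that also credits matches through a synonym table (the table below is hand-supplied for illustration; WordNet synsets or vector similarity would replace it in a real project):

```python
def synonym_unigram_precision(hyp, ref, synonyms):
    """Unigram precision where a hypothesis token also counts as correct
    if it is a listed synonym of an unmatched reference token."""
    remaining = ref.lower().split()
    matched = 0
    for tok in hyp.lower().split():
        for i, r in enumerate(remaining):
            if tok == r or tok in synonyms.get(r, set()):
                matched += 1
                del remaining[i]      # each reference token matches at most once
                break
    hyp_len = len(hyp.split())
    return matched / hyp_len if hyp_len else 0.0

# Toy synonym table (made up for illustration).
syn = {"quick": {"fast"}}
p = synonym_unigram_precision("the fast fox", "the quick fox", syn)
```

Extending this to higher-order n-grams, brevity penalties, or soft vector-based matches is where it starts to resemble METEOR-style metrics.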

-        Use a dependency parser to assess syntactic well-formedness in an MT evaluation method in a novel way.

-        Using MT methods to automatically annotate data (corpora) for morphology, syntax, etc.

-        Improve sentence alignment

-        Monolingual text-to-text generation, e.g., for summarization, or different style (e.g., more/less formal)

-        Look up papers in the ACL anthology/MT Archive/arXiv.org  about topics and languages that interest you.

-        Have a look at the Moses "get involved" section.


Resources and tools

·       WordNet

·       BabelNet

·       Wikipedia

·       Twitter

·       Various treebanks

·       Word2vec, GloVe, …


 

Parallel Corpora

  • LDC Linguistic Data Consortium – some LDC titles are available on our servers. Ask David (our sys admin) and CC me, if you can’t find a specific title.
  • Canadian Hansards
  • Europarl
  • OPUS - Contains many corpora, many many language pairs.
  • WMT shared tasks’ data in recent years (e.g., WMT 2016 data)
  • Acquis Communitaire
  • ELRA - various linguistic resources, some might be relevant to you.
  • UN transcripts
  • ….