Confirmed Sessions at NLP Day Texas 2017

We are just now beginning to announce the confirmed sessions. Check this page regularly for updates.

Chatbots from first principles

Jonathan Mugan - Deep Grammar

There are lots of frameworks for building chatbots, but those abstractions can obscure understanding and hinder application development. In this talk, we will cover building chatbots from the ground up in Python. This can be done with either classic NLP or deep learning. We will cover both approaches, but this talk will focus on how one can build a chatbot using spaCy, pattern matching, and context-free grammars.
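To make the pattern-matching idea concrete, here is a minimal sketch of a rule-based responder. It uses plain Python regex rather than spaCy so it stays self-contained; all patterns and replies are invented for illustration.

```python
import re

# A toy pattern-matching chatbot in the spirit of the talk. A real
# implementation would use spaCy's Matcher over token attributes;
# plain regex keeps this sketch dependency-free.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help?"),
    (re.compile(r"\bweather in (\w+)\b", re.I),
     "I don't have live data, but ask me about {0} anyway!"),
    (re.compile(r"\bbye\b", re.I), "Goodbye!"),
]

def respond(utterance: str) -> str:
    """Return the reply for the first matching rule, or a fallback."""
    for pattern, reply in RULES:
        m = pattern.search(utterance)
        if m:
            return reply.format(*m.groups())
    return "Sorry, I didn't understand that."
```

The same structure scales up naturally: swap the regex rules for spaCy token patterns, and the fallback for a context-free-grammar or learned model.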

Interpretability & Variational Methods

Christopher Moody - StitchFix

Interpretability divides machine learning folks in academia from those in industry. Academia tends to focus on the science of algorithms, but in industry we're focused on using algorithms to do science on our data. So this talk is about how to 1) understand what your model is doing (via t-SNE, k-SVD, and lda2vec) and 2) build models that can tell you what they don't know via variational methods. In the last half, we'll review traditional models like word2vec, factorization machines, and t-SNE and, with a little bit of code, upgrade them to variational versions.

Getting NLP started with AWS's Deep Learning AMI

Rob Munro - Amazon

Abstract forthcoming.

Toward portable NLP solutions for healthcare – the journey of CLAMP

Hua Xu - University of Texas Health Science Center at Houston

Much detailed patient information is embedded in narrative documents in Electronic Health Record (EHR) systems, making it difficult to use in clinical research and practice. This presentation will describe our recent development of CLAMP (Clinical Language Annotation, Modeling, and Processing), a comprehensive clinical natural language processing (NLP) system, which provides not only high-performance NLP components but also a user-friendly interface for building customized NLP pipelines. The advanced algorithms behind CLAMP and diverse use cases will also be discussed.

An Introduction to Topic Modeling

Jacob Su Wang - Ojo Labs / University of Texas

Topic models (TMs) are a class of generative models that aim to capture the latent distributions underpinning observational data (e.g. the word distribution in a collection of documents). In the context of Natural Language Processing (NLP), TMs are particularly useful for discovering the statistical regularities hidden in textual data in supervised, semi-supervised, and unsupervised settings.
In this talk I address the following questions:
- What is a TM?
- What are its potential applications (in particular in NLP)?
- How to formulate and learn a TM (to suit one's particular purposes)?
- What are the common inference algorithms for TM, and which one should you use?
I focus on a deep understanding of TMs, with the goal that attendees are able to make informed decisions when constructing TMs to suit their specific projects.
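The "generative model" framing above can be made concrete with the classic LDA-style story: each document draws a topic mixture, and each word draws a topic and then a word from that topic. Here is a toy sketch of that generative process; the vocabulary, topics, and probabilities are all invented.

```python
import random

random.seed(0)

def sample_dirichlet(alpha):
    """Draw from a Dirichlet distribution via normalized Gamma samples."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, items):
    """Draw one item with the given probabilities (inverse-CDF sampling)."""
    r, acc = random.random(), 0.0
    for p, item in zip(probs, items):
        acc += p
        if r < acc:
            return item
    return items[-1]

# Invented toy vocabulary and per-topic word distributions.
vocab = ["goal", "match", "vote", "senate", "gene", "cell"]
topics = [
    [0.45, 0.45, 0.025, 0.025, 0.025, 0.025],  # "sports"
    [0.025, 0.025, 0.45, 0.45, 0.025, 0.025],  # "politics"
    [0.025, 0.025, 0.025, 0.025, 0.45, 0.45],  # "biology"
]

def generate_document(n_words=8, alpha=(0.5, 0.5, 0.5)):
    """LDA generative process: mixture -> topic per word -> word."""
    theta = sample_dirichlet(alpha)  # per-document topic mixture
    words = []
    for _ in range(n_words):
        k = sample_categorical(theta, range(len(topics)))   # pick a topic
        words.append(sample_categorical(topics[k], vocab))  # pick a word
    return words
```

Inference (e.g. Gibbs sampling or variational inference, as discussed in the talk) runs this story in reverse: given only the documents, recover the topics and mixtures.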

Textual Analysis and High Finance: What Can We Learn From the Writing of Financial Contracts?

Malcolm Wardlaw - UT Dallas

Text analysis and natural language processing have recently seen an incredible amount of growth in application to financial economics, both in industry and academia. This talk will focus on how these tools are being used in academic research, particularly new research in the area of financial contracting. The talk will first provide a brief overview of the research at large, and then focus on its specific application to loan contracting, partially described in Dr. Wardlaw's recent paper with Bernhard Ganglmair. Particular attention will be paid to the implementation of topic models in this setting, along with a high-level technical overview of the challenges faced when trying to analyze contract documents in a systematic way.

Multichannel Event Detection in Twitter

Dr. Joseph A. Gartner III - Sotera Defense Solutions

Surveys of event detection in Twitter divide techniques into two broad categories, document-pivot and feature-pivot, distinguished by whether they focus on document features or temporal features in social streams. I propose to begin with a brief discussion of what makes working with tweet data difficult compared to traditional text, and the process of cleaning that data. I will then briefly touch upon some of the background work in event detection, such as burst detection of words and document clustering. From there, I will focus on my own work: a novel approach to multi-analytic document grouping that identifies both burst feature representation in hashtags and general text alignment using word2vec document projections. I will then discuss how these local topic groupings can be combined to form larger events. The talk will conclude with a presentation of some results that are available as open-source software.
Intended Audience: The broad task of event detection has application in a wide range of fields, from marketing to law enforcement. This talk aims to contrast the strengths and weaknesses of a few known techniques, as well as highlight open-source tools available for event detection.
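The burst-detection idea mentioned above can be illustrated in a few lines: flag a word as "bursting" when its count in the current time window sits far above its historical mean. The counts below are invented; real systems work over streaming windows of tweet tokens and hashtags.

```python
from statistics import mean, stdev

def bursting_words(history, current, threshold=2.0):
    """Flag words whose current-window count exceeds their historical
    mean by more than `threshold` standard deviations.

    history: {word: [count per past window, ...]}
    current: {word: count in the current window}
    """
    bursts = []
    for word, counts in history.items():
        mu, sigma = mean(counts), stdev(counts)
        if sigma == 0:
            continue  # no variation to measure a burst against
        if (current.get(word, 0) - mu) / sigma > threshold:
            bursts.append(word)
    return bursts

# Invented toy counts: "earthquake" spikes in the current window,
# while "coffee" stays at its usual background level.
history = {"earthquake": [1, 2, 1, 2, 1], "coffee": [5, 6, 5, 6, 5]}
current = {"earthquake": 40, "coffee": 6}
```

Document-pivot methods would instead cluster the tweets themselves (e.g. via word2vec projections) and look for fast-growing clusters.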

Using Text Analytics to improve Net Promoter Scores: Is a number worth a thousand words?

Anne-Marie Currie - The Advisory Board

In this talk, I highlight some best practices Net Promoter Systems should follow and introduce you to the value of going beyond the score to focus on the voices and words of the customers revealed within the Net Promoter Surveys. I will walk through examples of how unstructured text can lead to a more robust Net Promoter System that will enhance product quality, customer loyalty, and quality service. Identifying techniques to effectively target and reach out to consumers in sincere and meaningful ways at scale is critical to growing a business. Text analytics can provide a way to gather meaning, provide direction, predict a trend, and prescribe a target action beyond what a single number can provide.

Sockpuppets, Secessionists, and Breitbart

Jonathon Morgan - New Knowledge

This presentation is based on Jonathon's recent paper, Sockpuppets, Secessionists, and Breitbart, published on Medium.
From the paper: New evidence points to a highly orchestrated, large-scale influence campaign that infiltrated Twitter, Facebook, and the comments section of Breitbart during the run-up to the 2016 election. Tens of thousands of bots and hundreds of human-operated fake accounts acted in concert to push a pro-Trump, nativist agenda across all three platforms in the spring of 2016. Many of these accounts have since been refocused to support US secessionist movements and far-right candidates in upcoming European elections, all of which have strong ties to Moscow and suggest a coordinated Russian campaign. Jonathon will walk through the methodology used to uncover and detail this campaign.

Bootstrapping Knowledge-bases from Text

Garrett Eastham - Edgecase / Data Exhaust
Garrett will give an overview of types of knowledge bases (ontologies, graph structures, etc.). He will then go over examples of how knowledge bases can be, and are being, applied to improve common intelligent systems (specifically search and personalization engines). Next, he will discuss state-of-the-art approaches to information extraction and the challenges and opportunities in leveraging them when attempting to train a knowledge base from large text corpora. Finally, Garrett will walk through an example music knowledge base extracted from music reviews and user-submitted tags, focusing on the methods used and the challenges overcome.
Intended Audience: Applied machine learning engineers / scientists - ideally those who are working on improving the results of existing production search and/or recommendation pipelines
Technical Skills / Concepts: Familiarity with general NLP practices / techniques (specifically modern information extraction approaches), Apache Spark, Understanding of core search / personalization principles
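As a flavor of the bootstrapping idea, here is a toy Hearst-style pattern extractor that pulls "is-a" triples out of raw text. The sentences and extracted entities are invented, and real pipelines use much richer machinery (dependency parses, distant supervision, entity linking), but the triple representation is the common currency of knowledge bases.

```python
import re

# Toy pattern-based information extraction: find "X is a/an Y" and
# emit (subject, relation, object) triples for a knowledge base.
IS_A = re.compile(r"(\w[\w ]*?) is an? ([\w ]+?)[.,]")

def extract_triples(text):
    return [(subj.strip(), "is_a", obj.strip())
            for subj, obj in IS_A.findall(text)]

# Invented example corpus in the spirit of the music use case.
corpus = ("Kind of Blue is a jazz album. Miles Davis is a trumpeter, "
          "and he recorded it in 1959.")
```

Each triple can then be loaded into a graph store, where search and recommendation engines can traverse the relations.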

Words as Vectors - Introduction to Word Embedding

Erik Skiles - SparkCognition

Word embeddings have made a significant impact in NLP over the last few years. The goal of this workshop is to provide attendees with the understanding and the tools needed to create word embeddings and use them in various downstream NLP tasks such as classification. We will begin by examining the core concept of representing words as vectors. We will then do a deeper dive into what information word embeddings are learning. This will be followed by a survey of methods for creating word embeddings and some tips on selecting an algorithm or pre-built word embeddings, without getting bogged down in the details of any one implementation. Finally, we will explore a few NLP tasks that can benefit from word embeddings.
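The "words as vectors" concept in a nutshell: each word becomes a point in space, and similarity is the cosine of the angle between points. The vectors below are tiny hand-built counts invented for illustration; real embeddings (word2vec, GloVe, fastText) are learned from large corpora and have hundreds of dimensions.

```python
from math import sqrt

# Invented toy word vectors over three hand-picked dimensions.
vectors = {
    #          food  drink  royalty
    "pizza":  [9.0,  2.0,   0.0],
    "pasta":  [8.0,  1.0,   0.0],
    "wine":   [2.0,  9.0,   0.0],
    "queen":  [0.0,  1.0,   9.0],
}

def cosine(u, v):
    """Cosine similarity: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(word):
    """Nearest neighbour of `word` among the other toy vectors."""
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))
```

Downstream tasks such as classification typically average or concatenate these vectors to represent whole documents.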

Artificial Intelligence as a Tool (What makes or breaks natural language processing products)

Tad Turpen - NarrativeDX

Since the advent of artificial intelligence, inventors and scientists alike have been trying to monetize their efforts. Some have succeeded and some have failed. In this talk I will outline what makes or breaks natural language processing products as an example of artificial intelligence: namely, whether or not the products are treated as tools to get work done. I will go through several historical examples of natural language processing companies that fell into the trap of becoming one-off consultancies, and explain how to engineer a single product that generates recurring revenue. It is also important to consider artificial intelligence as a tool to get work done: consider what would have happened if the car had remained a toy for the rich. I will conclude that artificial intelligence can broadly fill the role of providing consulting services, but that there is also a place for a dedicated product that generates recurring revenue, if engineered correctly.