December 2014

December 2014 Meeting

When
December 17, 2014 | 12:00pm - 2:00pm

TOPIC
Why the graph representation is key to effectively analyzing and acting on your data

ABSTRACT
We take the stance that to get the most value from your big data, you must be able to work with your data in high dimensions and act on it in real time. While some domains are naturally high dimensional (e.g. text, images), in others, high dimensionality comes from extracting relationships between entities in your data. For instance, an online product recommendation system could benefit from extracting a joint relationship among the product, the user, the user's town, all other users, the user's friends, the time of the year, and so on. Similarly, a social network game recommendation system could benefit from extracting a joint relationship among the user, all other users, the user's friends, the friends of the user's friends, the game being played by the user, the user's town, and so on. An anti-money-laundering system could benefit from extracting the common money transfer paths taken by fraudsters.

Historically, graph-based data representations have been thought to be useful only for the small set of domains that naturally produce graphs (e.g. social networks). More recently, however, for the reasons set forth above, graph database technologies have started gaining traction and are seen as the next generation of NoSQL technology. Unlike relational technologies, graph database systems are especially suited to extracting the higher-dimensional features from your data that make true big data analysis and action possible. Beyond feature extraction, graphs also make a variety of high-dimensional supervised and unsupervised analyses possible (e.g. pattern mining, clustering) that allow you to gather powerful insights from your data. We will demonstrate this power through a set of challenging real-world applications from the e-commerce, advertising, and financial industries.

In addition to analyzing your data, acting on it in a timely fashion as updates arrive is key for many real-world applications. For instance, when a user purchases a set of products at a certain time of the year, identifying the next set of products to recommend based on that purchase requires the ability to act on the update in real time. We will show that graph databases are especially suited to this type of processing, enabling actions that are both complex and timely.
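The update-driven case can be sketched in a few lines, assuming a simple co-purchase model that is our illustration rather than the speaker's system. `CoPurchaseGraph`, `add_purchase`, and the product names are hypothetical, and a real deployment would run this logic inside a graph database rather than in a single Python process. Each purchase event updates only the edges incident to the newly bought product, then ranks that product's neighbors.

```python
from collections import defaultdict


class CoPurchaseGraph:
    """Hypothetical in-memory graph: user -> purchased products, plus
    weighted product-product edges counting how often two products
    were bought by the same user."""

    def __init__(self):
        self.purchases = defaultdict(set)                       # user -> products
        self.co_counts = defaultdict(lambda: defaultdict(int))  # product -> product -> count

    def add_purchase(self, user, product, top_k=3):
        """Apply one purchase event and return fresh recommendations.

        The update touches only the user's purchase history and the new
        product's neighbors, not the whole graph."""
        owned = self.purchases[user]
        if product not in owned:
            # Link the new product to everything this user bought before.
            for other in owned:
                self.co_counts[product][other] += 1
                self.co_counts[other][product] += 1
            owned.add(product)

        # Rank neighbors of the new product that the user does not own yet.
        candidates = {
            p: c for p, c in self.co_counts[product].items() if p not in owned
        }
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
        return [p for p, _ in ranked[:top_k]]
```

For example, after `u1` buys a tent and a stove and `u2` buys a tent and a lantern, calling `add_purchase("u3", "tent")` returns `["lantern", "stove"]`. Because each event is confined to the buyer's history and the new product's neighborhood, the work per update is bounded by that local neighborhood rather than by the size of the full graph.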

Speaker

Amol Ghoting

GraphSQL Inc.

Amol is currently a Senior Scientist at GraphSQL Inc. His interests lie at the intersection of data mining, machine learning, parallel and distributed computing, and high-performance computing. Prior to joining GraphSQL, Amol was a Director of Data Science at American Express, where he led a team of scientists and engineers in building the enterprise recommendation platform and machine-learning-driven data products that deliver never-before-seen insights. His team made three unique data products possible in less than a year. Before joining American Express, Amol was a Research Staff Member at IBM Research, where he focused on designing algorithms and infrastructure for analyzing large data sets on emerging data-intensive systems such as Apache Hadoop MapReduce. He was a founding member of the SystemML and NIMBLE projects at IBM Research, which developed declarative and imperative approaches, respectively, for implementing parallel analytics algorithms. These projects contributed significantly to IBM's BigInsights, Cognos, and SPSS product lines. He also designed and implemented several scalable analytics algorithms using these tools for problems in fault detection, social network analysis, genomics, and topic modeling.

Amol has published more than 35 research papers and has served on the organizing and program committees of top conferences in his field (e.g. ICDE, ICDM, ICML, KDD, SDM). His research has received numerous distinctions, including a Best Paper Award at the International Conference on Very Large Data Bases (2005), an IBM PhD Fellowship (2006), a Best of SIGMOD selection (2009), and an IBM Pat Goldberg Best Paper Award (2010).