I'm a Technology Entrepreneur and Data Scientist. I completed my PhD in Computer Science in 2012 and have an academic background in Machine Learning, Data Mining, Algorithm Design, Social Network Analysis, and Natural Language Processing. My research interest is in extracting interesting patterns and signals from big data that can be turned into valuable business and marketing actions. I was lucky to have Dr. Valerie King and Dr. Ali Shoja as my PhD supervisors.
I have three years of industry experience building advanced Machine Learning algorithms for predicting the Click-Through Rate (CTR) of online display ads, through real-time analysis of a very large volume of performance data for ads shown on websites.
I have more than three years of industry and research experience building complex NLP models for Topic Modeling, Named Entity Recognition, Sentiment Analysis, and Spam Detection. I also have more than seven years of experience designing and building predictive models to forecast the results of political elections, marketing campaigns, and flu outbreaks by mining and analyzing unstructured data from online social networks such as Twitter.
I have one year of experience in Online Fraud Detection, where I was responsible for designing and implementing Machine Learning algorithms that analyze data from Facebook, Twitter, Google+, LinkedIn, and other social networks to detect fake and fraudulent digital identities.
In Oct 2014, we published an article explaining the mathematics behind the Latent Dirichlet Allocation (LDA) model using the collapsed Gibbs sampling technique. We also pushed a Java implementation of LDA to a GitHub repository. Read the article here: Implementing LDA Model using Collapsed Gibbs Sampling
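For readers curious about the update at the heart of the sampler, here is a minimal pure-Python sketch of collapsed Gibbs sampling for LDA. Our released implementation is in Java; the function names and hyperparameter defaults below are illustrative only:

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc-topic counts, topic-word counts).
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                                 # tokens per topic
    z = []                                              # topic assignments
    for d, doc in enumerate(docs):                      # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove the current assignment from the count tables
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional: p(z_i = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta)
                           for t in range(n_topics)]
                r = rng.random() * sum(weights)
                k, acc = n_topics - 1, 0.0
                for t in range(n_topics):
                    acc += weights[t]
                    if r < acc:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw
```

After burn-in, the smoothed topic-word counts estimate the topic distributions, and the doc-topic counts estimate each document's topic mixture.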
In Aug 2014, we wrote an article describing how social marketers can optimize their Twitter campaigns by finding the optimal time for posting tweets. You can read the article here: How to Optimize Twitter Marketing?
In Aug 2014, we launched a Data Science blog called DeepMinds where we publish articles related to Machine Learning, Data Mining, and Online/Social Advertising.
In July 2014, we open-sourced our Java ML/NLP library, which we used for mining tweets and building predictive models for analyzing elections and mining public opinion on social media. You can pull the code from GitHub: Twitter Mining.
I finally documented my notes from the 2013 Knowledge Discovery and Data Mining (KDD) Conference in Chicago. Most of my notes cover talks in the advertising and outlier-detection spaces. I also posted notes about other interesting talks I attended, on optimization and its importance in machine learning, Google Trends and the possibility of "predicting the present", how to start a data company, and so on. Read the article here: My Takes from KDD 2013!
In Dec 2013, I started benchmarking different programming languages from a performance point of view. You can find the details here: Benchmarks!
In July 2013, I attended the hackwithus event @Victoria with three other hackers. We built a simple AI for the Snake game!
In July 2013 @Seeker, we submitted a paper to EMNLP 2013: Conference on Empirical Methods in Natural Language Processing. Our paper focused on analyzing the performance of generative models (e.g. HMMs) and discriminative models (e.g. CRFs) for extracting biomedical entities (i.e. diseases and treatments) in the presence of rarity.
In March 2013, I attended Morbify hack event. We built a fun game with a purpose (GWAP) to hunt images! Read more about our game from Image Hunt.
In January 2013, I attended the Firefox OS App Day in Vancouver. We built a web application that computes the likelihood that a person will catch the flu by collecting and analyzing data from Twitter. You can read more about this project from Predicting Flu!
In January 2013, I joined Seeker Solutions, a company focused on applying natural language processing and machine learning technologies to the health informatics domain. At Seeker, I'm involved in building software that uses ML algorithms to solve NLP problems.
Since Fall 2012, I have been doing extensive research in the area of computational advertising for the Red Brick Media company. My focus was on designing and developing algorithms to find and show ads with high conversion rates to web users. We formulated the ad-selection problem as a Multi-Armed Bandit problem, a classical paradigm in Machine Learning, and applied machine learning, data mining, probability, and statistics to analyze big advertisement data and devise efficient ad-selection strategies.
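To give a flavor of the bandit formulation, here is a minimal epsilon-greedy sketch in Python: each arm is an ad, and the reward is 1 on a click and 0 otherwise. Epsilon-greedy is just one standard bandit strategy; the class and parameter names below are illustrative and do not describe the strategies we actually deployed:

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy ad selection: explore a random ad with probability
    epsilon, otherwise exploit the ad with the best observed CTR."""

    def __init__(self, n_ads, epsilon=0.1, seed=0):
        self.n_ads = n_ads
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = [0] * n_ads    # impressions per ad
        self.clicks = [0] * n_ads   # clicks per ad

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_ads)   # explore
        ctr = [c / s if s else 0.0
               for c, s in zip(self.clicks, self.shows)]
        return max(range(self.n_ads), key=lambda a: ctr[a])  # exploit

    def update(self, ad, clicked):
        self.shows[ad] += 1
        self.clicks[ad] += int(clicked)
```

Over many rounds the policy concentrates impressions on the ad whose empirical CTR is highest, while the epsilon fraction of random exploration keeps the estimates for the other ads from going stale.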
In December 2012, I attended the AngelHack Fall 2012 event in Seattle. The conference was a mentorship and fundraising program for entrepreneurs. We designed and implemented a mobile crowdsensing application (for iPhone/Android) to detect, track, and display real-time events. The application had four major components: a sensing component (reading from GPS, audio, accelerometer, Bluetooth, etc.), a machine learning component, a real-time data-sharing component, and a data visualization component.
I attended a 24-hour AbeBooks Hackathon event on Friday, Sep 28, 2012. There, we built a web application (using Amazon Web Services) that applies machine learning and data mining to analyze big data.
I defended my PhD in August 2012. My thesis topic was on "Contact Prediction, Routing and Fast Information Spreading in Social Networks". You can download a pdf of my thesis from Jahanbakhsh_Kazem_PhD. You can also download my defence slides from phd slides.
During summer 2011, I did an internship at Proven.com in San Francisco, where our goal was to help tradespeople find jobs. We implemented ideas from the social networking area to efficiently connect employers to workers using CakePHP, MySQL, jQuery, and Ajax. I also built a Facebook application called HireProven to integrate Facebook's social features with the Proven website.
Real-Time Bus Tracking System: a mobile crowdsensing system (iPhone/Android) that we designed, implemented, and demoed at the AngelHack Hackathon in Seattle, for tracking bus locations in real time using machine learning algorithms. Click RTBTS to find more about this project.
Predicting US 2012 Presidential Election using Twitter: an ongoing research project for analyzing and mining 2012 US election conversations on Twitter. The main goal is to test the possibility of predicting election results using political tweets. Read more about this project from Predicting US 2012 Election Results.
Geo Crawler: a project for crawling and indexing places that are hard to find using the Google Maps service. Click Geo Crawler to read more about this project.
Twheat Map: a web application showing a real-time map of geo-tagged tweets with labels (positive/negative) computed by a sentiment analysis algorithm. This application was implemented at the AbeBooks Hackathon 2012 event in Victoria. Click here to find more about this application.
Mobile Social Trivia Game: a Twilio SMS powered trivia application developed in HackVan 2012 event in Vancouver. Enter a code and join a multi-player trivia SMS game. Click Trivia to find more about this project.
K-means Clustering: a Python implementation of the k-means algorithm. Click k-means to find more about the algorithm and download the code.
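For reference, here is a minimal pure-Python sketch of Lloyd's algorithm, the standard iteration behind k-means: alternate assigning each point to its nearest centroid and recomputing each centroid as its cluster mean. The implementation in the linked repository may differ in details such as initialization:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples of floats."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initialize from the data points
    clusters = [[] for _ in range(k)]
    dim = len(points[0])
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # update step: mean of each cluster (keep old centroid if empty)
        new = [tuple(sum(p[d] for p in clusters[j]) / len(clusters[j])
                     for d in range(dim)) if clusters[j] else centroids[j]
               for j in range(k)]
        if new == centroids:   # converged
            break
        centroids = new
    return centroids, clusters
```

On well-separated data the iteration converges in a handful of steps; in general k-means only finds a local optimum, so implementations often restart it from several random initializations.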
Drinking-Fountain Finder App: a web application that shows the closest drinking fountain to your current location. This application was developed at the Open Data Hackathon event in Vancouver. Click Fountains to find more about this application.
Social Community Detection: a Python implementation of the Girvan-Newman community detection algorithm for weighted graphs. You can find more about this project and download its source code from the Cmty link.
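The algorithm's core loop — repeatedly remove the edge with the highest betweenness until the graph splits into more components — can be sketched as follows. For brevity this sketch computes edge betweenness with Brandes' BFS-based algorithm on an unweighted graph; the linked implementation additionally handles edge weights:

```python
from collections import deque, defaultdict

def edge_betweenness(adj):
    """Brandes' edge betweenness for an undirected, unweighted graph.
    adj: dict mapping node -> set of neighbor nodes."""
    bc = defaultdict(float)
    for s in adj:
        dist = {s: 0}
        sigma = defaultdict(float); sigma[s] = 1.0   # shortest-path counts
        preds = defaultdict(list)
        order, q = [], deque([s])
        while q:                                     # BFS from s
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):                    # dependency accumulation
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bc[tuple(sorted((v, w)))] += c
                delta[v] += c
    # each undirected pair is counted from both endpoints, so halve
    return {e: b / 2.0 for e, b in bc.items()}

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w); comp.add(w); q.append(w)
        comps.append(comp)
    return comps

def girvan_newman_step(adj):
    """Remove highest-betweenness edges until the graph splits (in place)."""
    n0 = len(components(adj))
    while len(components(adj)) == n0:
        bc = edge_betweenness(adj)
        u, v = max(bc, key=bc.get)
        adj[u].discard(v); adj[v].discard(u)
    return components(adj)
```

On a graph of two triangles joined by a single bridge edge, the bridge carries every cross-community shortest path, so it has the highest betweenness and is removed first, splitting the graph into the two triangles.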
Flickr Crawler & Hometown Predictor: a two-layer crawler for collecting the friendship graph of people and the attributes of their uploaded photos from the Flickr website. The main goal of this project was to predict Flickr users' hometowns by exploiting the geotag information of their uploaded photos. You can download the source code and find more about the crawler from the Flickr link.
Reliable Datagram Protocol: a multi-threaded reliable transport layer implemented in C. It is an application-layer protocol that runs on top of UDP in order to make UDP as reliable as TCP. You can read more about this project and download its source code from the RDP link.
Language Detection: a Java applet that recognizes the language of an input sentence using a Naive Bayes classifier. Enter a sentence and find out its natural language. You can read more about this project and download its source code from the Language Recognition link.
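The underlying idea — a Naive Bayes classifier over character n-grams with Laplace smoothing — fits in a few lines. The applet itself is written in Java; this Python sketch with toy training data is illustrative only:

```python
import math
from collections import Counter

def char_ngrams(text, n=2):
    """Character bigrams of a lowercased sentence."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NaiveBayesLangID:
    """Naive Bayes over character bigrams with Laplace (add-one) smoothing."""

    def __init__(self):
        self.counts = {}        # language -> Counter of bigrams
        self.totals = {}        # language -> total bigram count
        self.docs = Counter()   # language -> number of training sentences

    def train(self, sentence, lang):
        grams = char_ngrams(sentence)
        self.counts.setdefault(lang, Counter()).update(grams)
        self.totals[lang] = self.totals.get(lang, 0) + len(grams)
        self.docs[lang] += 1

    def predict(self, sentence):
        vocab = {g for c in self.counts.values() for g in c}
        total_docs = sum(self.docs.values())
        best, best_lp = None, float("-inf")
        for lang, cnt in self.counts.items():
            lp = math.log(self.docs[lang] / total_docs)   # log prior
            for g in char_ngrams(sentence):               # log likelihoods
                lp += math.log((cnt[g] + 1)
                               / (self.totals[lang] + len(vocab)))
            if lp > best_lp:
                best, best_lp = lang, lp
        return best
```

With only a couple of training sentences per language, the classifier already separates languages whose character statistics differ, which is why this simple model works well for language identification.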
Soma-Cube Puzzle Solver: Java code for solving the seven-piece Soma cube puzzle using a recursive backtracking search. You can read more about the puzzle and download the solver's source code from the Soma Cube link.
Autonomous Flying Blimp: an embedded system developed for controlling an autonomous blimp. We developed both the hardware and the software to control our blimp. Two colleagues and I did this project in 2008 for a course called "Software for Embedded & Mechatronics Systems". You can find the design and source code for our flying blimp at Super Blimp! You can also click Flying Blimp to watch one of our demos.
Software Research Projects
Information Spreading/Advertising in Online Social Networks: an efficient and scalable program implemented in C for analyzing running times of rumor spreading algorithms in online social networks. Click Spread to find more about this project.
Social Networks Connectivity: C code for analyzing the detailed connectivity of online social networks such as Facebook. Click Connectivity to find more about this project.
Social-Sim Simulator: a comprehensive simulator written in C++ for studying the underlying properties of mobile social networks as well as evaluation of our proposed Social-Greedy routing algorithm. You can find more technical details about this project and download its source code from Social-Sim link.
Human Contact Predictor: Python code for inferring people's movements and contact patterns in real scenarios, such as conference or campus environments, by exploiting statistical properties of contact graphs. Visit Prediction for more information.
Diffusion of Virus in Social Networks: efficient C code for simulating how a virus/disease diffuses through social networks. You can find more about this code at Diffusion.
Distributed Computing (Parallel SIQS): a parallel, optimized program written in C using the Message Passing Interface (MPI) library for cracking large RSA keys. This project was part of my master's thesis, for which I also built and configured a 17-node Linux cluster to crack RSA keys. You can find more about my thesis and its code at PSIQS. You can also download my master's thesis presentation from master slides.