Company: University of Waterloo
Problem: To increase graduation and retention rates, the university would like to identify students who have a high propensity to abandon their studies. Using student demographic and academic data, a retention model was built to identify these students and provide them with personalized support.
Methods & Tools: logistic regression, R, Tableau
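As a rough illustration of how a retention model of this kind scores students, here is a minimal logistic regression sketch in plain Python. The project itself used R. The two features (GPA and number of failed courses), the toy records and the function names are assumptions made up for this example and are not taken from the university's data.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this student: predicted probability minus label.
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient of the log-loss.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def dropout_risk(x, weights, bias):
    """Estimated probability that a student abandons their studies."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical toy records: [GPA, failed courses]; label 1 means the student left.
rows = [[3.8, 0], [3.5, 0], [3.0, 1], [2.0, 2], [1.5, 3], [1.8, 2]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(rows, labels)

high_risk = dropout_risk([1.6, 3], weights, bias)
low_risk = dropout_risk([3.7, 0], weights, bias)
```

Students whose estimated risk exceeds a chosen cut-off would be the ones offered personalized support. Where to put that cut-off is a policy decision rather than something the model decides.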
Company: Slyce
Problem: Slyce's visual search product required an automated system to search retail images by colour and category. Image features such as colour, shape and texture were extracted using image processing techniques to cluster and classify images. Natural language processing was also used to extract text features that enhance object recognition.
Methods & Tools: multi-class classification, OpenCV, Scikit-Learn, Scikit-Image, NLTK
Company: The Globe and Mail
Problem: As the company transitioned to a subscriber-based business, the editorial team needed a more holistic metric for measuring content success than the traditional page view. A proprietary algorithm was developed to relate various engagement metrics to a monetary value. To serve these custom-designed metrics, a web application, Sophi, was built so that users could generate reports.
Methods & Tools: attribution, MongoDB, PHP, NodeJS, AngularJS
Company: The Globe and Mail
Problem: To transition to a subscriber-based business, a percentage of content is placed behind the paywall. Page views are lost as a result, but subscriptions are gained. To optimize this trade-off, a model was built to predict the success of a piece of content and automatically decide whether it should be marked as subscriber-exclusive.
Methods & Tools: stochastic gradient descent classification, NLP, topic extraction, NLTK, Scikit-Learn
Company: Scribd
Problem: Each title in Scribd's platform has an associated publisher royalty and an implicit value to the company, by way of retaining or acquiring users. Given a publisher royalty budget, which titles should be included in the catalogue to maximize the total value while keeping fees under budget? Using a combinatorial optimization approach, an optimized catalogue was found and was evaluated against a full catalogue by running simulations.
Methods & Tools: combinatorial optimization, Monte Carlo simulations, PPS sampling, Google optimization tools
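The catalogue question above has the shape of a 0/1 knapsack problem: each title has a cost (its royalty) and a value, and the total cost must stay within the budget. The sketch below solves a tiny instance with dynamic programming in plain Python. It is an illustration of the problem structure only, not the Google optimization tooling used in the project; the titles, royalties and values are made up, and royalties are assumed to be whole numbers.

```python
def select_catalogue(titles, budget):
    """Choose titles that maximize total value within the royalty budget.

    titles: list of (name, royalty, value) tuples; royalty is a positive integer.
    Returns (best_total_value, list_of_chosen_names).
    """
    # best[b] holds the best (value, chosen names) achievable with budget b.
    best = [(0.0, [])] * (budget + 1)
    for name, royalty, value in titles:
        # Walk the budget downwards so each title is used at most once.
        for b in range(budget, royalty - 1, -1):
            candidate = best[b - royalty][0] + value
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - royalty][1] + [name])
    return best[budget]

# Hypothetical titles: (name, royalty, value to the company).
titles = [("A", 4, 10.0), ("B", 3, 7.0), ("C", 2, 6.0), ("D", 5, 11.0)]
total_value, chosen = select_catalogue(titles, budget=9)
```

In this toy instance the pair with the highest individual values (A and D) uses the whole budget but is beaten by the three cheaper titles A, B and C together. A real catalogue has far more titles and uncertain values, which is where a dedicated solver and simulation-based evaluation come in.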
Company: The Globe and Mail
Lead Contributor: Natalie Ho
Problem: Editors wanted to be alerted to content that is performing well above or below average so that they can take appropriate action. An outlier detection model was built, along with a Slack bot that alerts users to content whose performance deviates sharply from the norm.
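The source does not say which outlier detection method was used. As one plausible sketch, the example below flags articles with a robust z-score based on the median and the median absolute deviation (MAD). The article names, page-view counts, the 3.5 threshold and the function name are all assumptions for illustration, and the Slack integration is left out.

```python
from statistics import median

def flag_outliers(page_views, threshold=3.5):
    """Return {article: "above" | "below"} for unusually performing content.

    page_views: dict mapping an article name to its page-view count.
    Uses the robust z-score 0.6745 * (x - median) / MAD.
    """
    med = median(page_views.values())
    mad = median(abs(v - med) for v in page_views.values())
    if mad == 0:
        # At least half the values equal the median; no usable spread estimate.
        return {}
    flagged = {}
    for article, views in page_views.items():
        score = 0.6745 * (views - med) / mad
        if abs(score) > threshold:
            flagged[article] = "above" if score > 0 else "below"
    return flagged

# Hypothetical page-view counts for seven articles.
page_views = {"a": 100, "b": 110, "c": 95, "d": 105, "e": 900, "f": 102, "g": 2}
alerts = flag_outliers(page_views)
```

The median and MAD are used instead of the mean and standard deviation because a single viral article would otherwise inflate the spread and hide other outliers. Each flagged entry is what a bot would turn into an alert message.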
Copyright © 2024 Jennifer Nguyen
All views expressed are mine and not my employer's