This is a continuing series in which I interview working data scientists to glean insights about getting into the field, the types of projects they’re working on, and what they think of the ‘big data’ trend. Below is an interview with Carlos Medina, Data Scientist at Charity Water.
1. Given that big data is a widely used term, how do you see big data in your day to day work?
I’ve always felt funny about the term. To be honest, I don’t know where “Regular Data” ends and “Big Data” begins (when one starts using Hadoop?)… Since I work with different teams across the organization, I’ve seen it in several places. From my work with the Finance and Marketing/Product teams: the analysis of donations, campaigns, donors, etc. From the Water Programs team: the data from the remote sensors that transmit live from our wells to track water consumption behavior and project functionality in the villages that we’ve helped.
2. What are some of the hardest problems you are working on?
Currently, there are two main ones:
- Analyzing the long-term dynamics of our different products (donations, subscriptions, campaigns, major donors, etc.) and how to “balance” them.
- Identifying factors that contribute to the functionality of a water project.
A little context on the second one (functionality): The funny thing is that the difficulties don’t come from a lack of tools; what makes it “hard” is the datasets themselves. Sometimes they come from our partners, and often from different non-unified sources, formatted in different ways and languages. It’s almost impossible to automate the cleaning process, and some of the most time-consuming work comes from this aspect.
3. What goal do you hope to accomplish working with data at a non-profit?
From the personal side: I really enjoy the idea that my work eventually allows other people to live a better life. From the technical side: To increase the size of the sector’s “Analytical Toolbox.” There are techniques that are heavily used in other sectors and areas (Physics, Sports, the Financial Industry, etc.) that are just as applicable to the non-profit world but, for some reason, are just not used (or known).
4. What draws you to this field? What tips or advice do you have for people interested in getting into data science?
It’s fun and challenging. As for advice: Get to know Complex Systems, or at least the main ideas behind the way they work. Not even kidding. Most of the major mess-ups I’ve seen in different sectors that rely on models come from not understanding that dynamical interactions between different agents can create non-linear responses. Also, give this guy a go: http://www.wired.com/2013/02/big-data-means-big-errors-people/
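To make the point about non-linear responses a little more concrete, here is a minimal sketch of a classic threshold model of collective behavior. It is an illustration I'm adding, not something Carlos described using, and the names (`cascade_size`, `uniform`, `perturbed`) are hypothetical. Each agent joins in once the number of participants reaches its personal threshold, so every agent's decision depends on what the others do.

```python
def cascade_size(thresholds):
    """Return how many agents end up participating.

    An agent participates once the current number of participants
    is at least its threshold. We iterate until nothing changes.
    """
    active = 0
    while True:
        new_active = sum(1 for t in thresholds if t <= active)
        if new_active == active:
            return active
        active = new_active


# 100 agents with thresholds 0, 1, 2, ..., 99: each joiner tips the next one.
uniform = list(range(100))

# Same population, except the agent with threshold 1 now needs 2 participants.
perturbed = uniform.copy()
perturbed[1] = 2

print(cascade_size(uniform))    # 100: everyone ends up participating
print(cascade_size(perturbed))  # 1: the cascade stalls after the first agent
```

Changing a single agent's threshold by one takes the outcome from 100 participants to 1. A model that looks only at the average agent would see two nearly identical populations and miss the difference entirely.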