Dr. Watson Says: "...the three to six man-months is more than you will spend cleaning supply chain data, but not much more..."
All the talk and hype around Big Data has put pressure on supply chain managers to do more with the data they have. But what often frustrates managers is that the data is not clean enough, or not in the right format, to answer the questions they want to ask.
I was recently reminded that the problems we face in answering supply chain questions pop up in other areas as well:
First, in a recent Freakonomics podcast (around the 18:00 mark), Steven Levitt, who is also a partner in a consulting firm that answers questions with data (not unlike supply chain questions), made two interesting points:
1. Companies that embrace data will dominate those that don’t. (Nothing new with this idea, but worth reminding ourselves.)
2. Companies just don’t have the data they need to answer important questions. He mentioned that his firm will spend three to six man-months just to put together a data set they can use for a basic analysis. The problem, he says, is that “the data are held in 27 different data sets that have different identifiers.”
The last point is the one we see when we pull together supply chain data. We’ll have different demand files for different divisions, different transportation data from different parts of the business, and different production data. And nothing will match up. Creating a coherent data set will take work: the three to six man-months is more than you will spend cleaning supply chain data, but not much more.
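To make the "different identifiers" problem concrete, here is a minimal sketch in Python of what matching up two files can look like. Everything in it is made up for illustration: the field names (`item_no`, `material`), the ID formats, and the `crosswalk` table that maps each source's local ID to one shared SKU. In a real study, building that crosswalk is often where much of the time goes.

```python
# Hypothetical records: each source labels the same products differently.
demand_rows = [
    {"item_no": " ab-100 ", "units": 40},   # stray spaces, lowercase
    {"item_no": "AB-200", "units": 15},
]
shipment_rows = [
    {"material": "000100", "cases": 4},
    {"material": "000300", "cases": 9},     # no known match in the crosswalk
]

# Hypothetical crosswalk: (source, local ID) -> one shared SKU.
crosswalk = {
    ("demand", "AB-100"): "SKU-100",
    ("demand", "AB-200"): "SKU-200",
    ("shipments", "000100"): "SKU-100",
}


def normalize(raw_id):
    """Remove surrounding whitespace and unify case before lookup."""
    return raw_id.strip().upper()


def to_canonical(rows, source, id_field):
    """Split rows into those that map to a shared SKU and those that do not."""
    matched, unmatched = [], []
    for row in rows:
        sku = crosswalk.get((source, normalize(row[id_field])))
        if sku is None:
            unmatched.append(row)           # keep for manual review
        else:
            matched.append({**row, "sku": sku})
    return matched, unmatched


demand_ok, demand_bad = to_canonical(demand_rows, "demand", "item_no")
ship_ok, ship_bad = to_canonical(shipment_rows, "shipments", "material")

# Combine both sources into one record per shared SKU.
combined = {}
for row in demand_ok:
    combined.setdefault(row["sku"], {"units": 0, "cases": 0})["units"] += row["units"]
for row in ship_ok:
    combined.setdefault(row["sku"], {"units": 0, "cases": 0})["cases"] += row["cases"]

print(combined)
print("Unmatched shipments:", ship_bad)
```

The sketch sets unmatched rows aside instead of dropping them. Those leftovers are usually the list of questions you take back to the business.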
Second, a New York Times article came out called “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.” The article discusses the importance of data cleaning (or “data wrangling” or “data janitor work”). Like the Steven Levitt quote, the article reminds us of the strategic value of data and of how difficult it is to clean. I liked the following sentence because it highlights that you have work to do before you get good answers:
“It’s an absolute myth that you can send an algorithm over raw data and have insights pop up,” said Jeffrey Heer, a professor of computer science at the University of Washington.
The above podcast and article remind us that the problems we have with cleaning supply chain data are the same ones faced by other people in different industries. In my experience, when doing a supply chain study, you should plan on spending about 60-70% of the total time developing a clean data set.
Final Thoughts: