
SCDigest Expert Insight: Supply Chain by Design

About the Author

Dr. Michael Watson, one of the industry’s foremost experts on supply chain network design and advanced analytics, is a columnist and subject matter expert (SME) for Supply Chain Digest.

Dr. Watson, of Northwestern University, was the lead author of the just-released book Supply Chain Network Design, co-authored with Sara Lewis, Peter Cacioppi, and Jay Jayaraman, all of IBM. (See Supply Chain Network Design – the Book.)

Prior to his current role at Northwestern, Watson was a key manager in IBM's network optimization group. In addition to his roles at IBM and now at Northwestern, Watson is director of The Optimization and Analytics Group.

By Dr. Michael Watson

March 3, 2015

Top Five Rules for Cleaning Data for a Strategic Analysis

Keep in Mind That Getting Clean Data for a Strategic Study is Never as Easy as it Should Be



I recently wrote an article on the difference between accounting data and data for a strategic study.   We followed that up with an interview on Supply Chain TV on the same topic.  The topic generated a lot of interest.  One reader wrote in with the following story:


 “I've experienced the thrill of building an operating model based on "Rock Solid" data provided by a Fortune 100 company only to have backed myself into a corner.

I wish I had heard your presentation prior to the self inflicted scars of taking data as good, without adequate scrutiny.”

Based on the topic's popularity, we thought it would be good to follow up with five rules for getting to a clean data set for a strategic project.  Ganesh Ramakrishna (my partner at Opex Analytics) came up with these rules.


Be patient.  Usually by the time the project starts, the management team wants (or has been promised) fast results.  It may do much more harm than good to show initial results without first coming up with a clean data set.  The person doing the project needs to set expectations that it will take time to develop a clean data set.  The management team needs to realize that if they skip the data cleaning step, they may regret it later.


Assume that the data is neither complete nor correct. No matter what you have been told or hope about the data, you should force the analysis to prove that it is correct. It is like the old saying in journalism, “if your mom says she loves you, check it out.” A rigorous check often reveals innocent mistakes, data that wasn’t entered correctly, or missing data. And, if the data turns out to be clean, a rigorous check doesn’t take very long.
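As a rough illustration of what such a check might look like, here is a minimal Python sketch. The records, field names, and the three checks (missing fields, non-positive weights, duplicate order IDs) are hypothetical and chosen only for illustration; a real study would tailor the checks to its own data.

```python
from collections import Counter

# Hypothetical shipment records; field names and values are illustrative only.
shipments = [
    {"order_id": "A100", "origin": "DC1", "dest_zip": "60601", "weight_lb": 1200.0},
    {"order_id": "A101", "origin": "DC1", "dest_zip": "", "weight_lb": 800.0},
    {"order_id": "A102", "origin": "DC2", "dest_zip": "10001", "weight_lb": -50.0},
    {"order_id": "A100", "origin": "DC1", "dest_zip": "60601", "weight_lb": 1200.0},
]

REQUIRED_FIELDS = ("order_id", "origin", "dest_zip", "weight_lb")


def audit(records):
    """Return a dict mapping each problem type to the affected row indexes."""
    problems = {
        "missing_field": [],
        "non_positive_weight": [],
        "duplicate_order_id": [],
    }
    id_counts = Counter(r.get("order_id") for r in records)
    for i, r in enumerate(records):
        # A required field that is absent or blank counts as missing.
        if any(r.get(f) in (None, "") for f in REQUIRED_FIELDS):
            problems["missing_field"].append(i)
        # A weight of zero or less is treated as an entry error.
        weight = r.get("weight_lb")
        if isinstance(weight, (int, float)) and weight <= 0:
            problems["non_positive_weight"].append(i)
        # Every row sharing an order ID with another row is flagged.
        if id_counts[r.get("order_id")] > 1:
            problems["duplicate_order_id"].append(i)
    return problems


report = audit(shipments)
for problem, rows in report.items():
    print(f"{problem}: rows {rows}")
```

If the data really is clean, a pass like this comes back empty within seconds, which is the point of the rule: the check is cheap when nothing is wrong.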




Cross check with other data sources and be as granular as possible. When validating and checking data, it is good to cross reference similar data from different sources. For example, the sales data from the demand planning system should match the shipment data and should match the financial data. Also, make sure you don’t just check the summary level data. We have found many cases where the summary level data matched, but there were many problems with the details.
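To make the summary-versus-detail point concrete, here is a small Python sketch. The two extracts, their keys, and all quantities are made up for illustration; they stand in for, say, a demand planning extract and a shipment extract keyed by customer and product.

```python
# Hypothetical extracts: (customer, product) -> units. All values are made up.
planning_sales = {("C1", "P1"): 100, ("C1", "P2"): 50, ("C2", "P1"): 70}
shipments = {("C1", "P1"): 120, ("C1", "P2"): 30, ("C2", "P1"): 70}


def totals_match(a, b):
    """Summary-level check: compare only the grand totals."""
    return sum(a.values()) == sum(b.values())


def detail_mismatches(a, b):
    """Detail-level check: return {key: (value_in_a, value_in_b)} where sources disagree.

    A key present in only one source shows up with None on the other side.
    """
    diffs = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if va != vb:
            diffs[key] = (va, vb)
    return diffs


summary_ok = totals_match(planning_sales, shipments)
mismatches = detail_mismatches(planning_sales, shipments)

print("Totals match:", summary_ok)
for key, (planned, shipped) in mismatches.items():
    print(f"{key}: planning={planned}, shipments={shipped}")
```

In this made-up data both sources total 220 units, so the summary check passes, yet two of the three customer-product rows disagree. Offsetting errors like these are exactly what a summary-only comparison hides.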



“Data Sushi Principle: Raw Data is Better.” (We saw this title at a talk at Strata this year and thought it was appropriate here too.) At the start of a strategic study, we are often asked for data templates. Whenever you ask for data in a specific format, you are introducing plenty of room for errors: the data gets rolled up in ways you don’t want, the data gets calculated in ways that you don’t want, and data gets left out that shouldn’t be. Instead, it is better to define the types of data needed and have the IT team pull the raw data. And, if you have the raw data, you can later go back and fix issues that unexpectedly pop up.
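A short Python sketch of why raw data helps: with the line-level records in hand, any roll-up can be rebuilt when a problem surfaces later. The order lines, the "transfer" line type, and the roll_up helper are all hypothetical and used only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical raw order lines as pulled from the source system (values are made up).
raw_lines = [
    {"date": "2015-01-05", "dc": "DC1", "type": "sale", "units": 100},
    {"date": "2015-01-19", "dc": "DC1", "type": "transfer", "units": 40},
    {"date": "2015-02-02", "dc": "DC1", "type": "sale", "units": 80},
    {"date": "2015-02-10", "dc": "DC2", "type": "sale", "units": 60},
]


def roll_up(lines, key_fn, keep=lambda line: True):
    """Sum units by an arbitrary key, optionally filtering lines first."""
    totals = defaultdict(int)
    for line in lines:
        if keep(line):
            totals[key_fn(line)] += line["units"]
    return dict(totals)


def dc_month(line):
    """Group key: distribution center plus year-month (first 7 chars of the date)."""
    return (line["dc"], line["date"][:7])


# First pass: monthly units per DC, as a data template might have requested.
by_dc_month = roll_up(raw_lines, dc_month)

# Later fix: suppose transfers turn out to inflate demand, so rerun without them.
sales_only = roll_up(raw_lines, dc_month, keep=lambda l: l["type"] == "sale")

print(by_dc_month)
print(sales_only)
```

Had only the pre-aggregated monthly totals been delivered, the 40 transfer units in January would be baked into the number with no way to separate them out.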



Know your data architects and work with them. There are IT/Business Analysts who understand the metadata better than anyone else in the company. They are not your business users, nor your SQL data extraction experts. It is good to have these people on the team. They will have good insight into how people are filling in the data and can likely help you anticipate possible problems.


Final Thoughts

When you see these discussions on cleaning data, you may wonder how any strategic projects get done or if it is even worth it.  Although there are always data problems, we’ve never run across a situation where we couldn’t get a clean set of data to work with.


