7 Tips for better data handling
If data is key to success, as we have learned over the past years, why do we still trust our instinct more than data? Is it that our instinct just feels right, while data may not seem right at first sight? Are we influenced by that time when we trusted some data and it turned out to be wrong? Or maybe we remember the expression “fake news” and start doubting the data we are looking at.
Data is everywhere, and it is becoming more accessible too. The same is true of poor-quality or incorrect data. I believe bad data actually spreads faster and wider than good data.
So what do we do about this? What do we do at a company-wide level, but also at a personal level? Here are a few tips that may be useful when handling data:
1. Examine the source: We should only take data from sources we trust, even if that means handling much less data than desired. It’s better to have less data of good quality than loads of data that can’t really be trusted. Remember what we say about predictive models: an average model fed with good data is much better than an excellent model fed with bad data.
2. Start small: It’s always tempting to be ambitious and think we can mix different sources and build that dream dataset which will provide golden insights. Yes, that is possible, and it may work once in a million times, but you have a higher chance of success if you start small and iterate. Start with a small dataset and build confidence in your data. Once that small step is successful, start thinking about adding more data, always taking one small step at a time.
3. Share your insights: Insights are not useful if you keep them to yourself. Sharing your insights starts a virtuous circle where other people benefit from your data, your data quality increases, you may get new ideas from users, etc.
4. Document your data: Nobody will use a dataset they don’t understand. Even you will not reuse a dataset that you haven’t properly documented. Documenting a dataset is good practice and helps ensure your data can be trusted.
5. Pay attention to the way your data is visualized: We are sometimes tempted to build very complex charts that nobody can understand. A chart should be as simple as possible, with a clear intention of showing something or providing a specific insight. Try to make simpler charts, and carefully choose the variables that are going to be part of each visualization.
6. Data expires: Your data is useful for a limited amount of time. Old, outdated data loses value. So unless you are prepared to keep your data up to date, make sure you get the value out of it before it becomes too old to provide any useful insight.
7. Adopt a format that is simple and easy to distribute and consume: There are some well established standards about data. Pick a standard, and try to stick to it. You will benefit from re-using scripts and tools that you may have built for other datasets, avoiding the frustration of re-work. You will also share that benefit with other consumers that use your data.
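To make tips 4, 6 and 7 a bit more concrete, here is a minimal Python sketch. It assumes a plain CSV file described by a small data dictionary, and it checks two things: whether every column is documented, and whether the dataset is past a freshness limit. The dataset name, the columns, the dates and the 90-day limit are all made up for illustration, so treat this as one possible starting point rather than a standard.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical data dictionary: every name and value here is illustrative.
DATA_DICTIONARY = {
    "dataset": "monthly_sales",
    "source": "internal order system",
    "last_updated": "2024-01-31",  # ISO 8601 date
    "columns": {
        "region": "Sales region name",
        "units": "Units sold in the month (integer)",
    },
}


def is_stale(last_updated: str, today: date, max_age_days: int = 90) -> bool:
    """Return True when the dataset is older than max_age_days (assumed limit)."""
    age = today - date.fromisoformat(last_updated)
    return age > timedelta(days=max_age_days)


def undocumented_columns(csv_text: str, dictionary: dict) -> list[str]:
    """List the CSV header columns that are missing from the data dictionary."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in dictionary["columns"]]


if __name__ == "__main__":
    # A tiny in-memory CSV with one column nobody has documented yet.
    sample = "region,units,discount\nNorth,120,0.1\n"
    print("Undocumented:", undocumented_columns(sample, DATA_DICTIONARY))
    print("Stale:", is_stale(DATA_DICTIONARY["last_updated"], date(2024, 6, 1)))
```

Running checks like these before sharing a dataset is a cheap way to catch an undocumented column or an expired extract early.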
I hope you are able to adopt these 7 simple tips and start building confidence in using trusted, good quality data for your analysis.
Thank you for reading! 😀