The Difference between ‘Structured’ and ‘Unstructured’ Data

Broadly speaking, there are two types of data. Structured data is highly organised and formatted so that it is easily searchable in databases. It follows a specific schema, meaning it is organised into predefined fields and tables. The best example is an Excel spreadsheet (data in rows and columns), but the category extends to relational databases (queried with SQL) and even online forms, where information like date of birth, age and address is captured in specific fields.

The second type of data is unstructured. Unstructured data lacks a predefined format or organisation, making it more complex to process and analyse. It is typically text-heavy, like a Word document or social media post, but photos and videos are equally good examples.

There is a third category though, and it's a bit of a half-way house between the two above. A good example is the humble email. It has structured data (the to, cc and bcc fields, the subject line and the date it was sent), but it also has unstructured data (the content of the email, unless that is a table, Excel attachment or other structured data - you get the gist). This group of data is called semi-structured because it's, well, semi-structured.
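To make the email example concrete, here is a minimal sketch using Python's standard-library email module. The message, addresses and the split_email helper are made up for illustration. It pulls the header fields into a dictionary (the structured part) and returns the body as free text (the unstructured part).

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message, used purely for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Cc: carol@example.com
Subject: Quarterly figures
Date: Mon, 04 Mar 2024 09:30:00 +0000

Hi Bob,

The numbers look better than expected, but I am still a little worried about Q3.
"""


def split_email(raw):
    """Separate the structured headers from the unstructured body."""
    msg = message_from_string(raw)
    # Structured: every header has a name and a predictable format.
    structured = {
        "from": msg["From"],
        "to": msg["To"],
        "cc": msg["Cc"],
        "subject": msg["Subject"],
        "sent": parsedate_to_datetime(msg["Date"]),
    }
    # Unstructured: the body is free text with no schema at all.
    unstructured = msg.get_payload()
    return structured, unstructured


structured, body = split_email(RAW_EMAIL)
print(structured["subject"], structured["sent"].date())
print(body)
```

The headers can go straight into a table or database column, whereas the body would need text analysis before you could say anything about its tone or content.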

Is the goal to convert all unstructured data into structured data?

Nope! Structured data is typically the easiest to work with: it is easy to integrate with other datasets, easy to store, index and retrieve efficiently, and it gives a standardised starting point for traditional data analysis techniques. However, unstructured data is better for preserving the context and nuances within the data, especially in text and multimedia content. And if your application needs real-time analysis, there may be no opportunity to convert the data first.

So the real objective is to understand the specific goal of the analysis (usually a single question that needs answering) whilst considering the nature of the available data, the available tools and how skilled the data analyst is. There has been rapid growth in the number of advanced analytics tools that can now handle unstructured data directly. The downside is that you really need to know what you are doing, and make sure that by using them you don’t lose valuable information in any conversion process.

Repaying ‘Digital Debt’

One of my favourite applications of these new tools is converting unstructured paper-based records into analytics-friendly structured data. This process has the added benefit of storing that data in a less perishable and more collaborative format. The forms are digitised by good ol’ scanning, straight out of the 1990s, but from there Machine Learning can be used to assess sentiment, context and nuance, as well as to convert the required information into a structured format. At all times the original document is available to the end user, to protect against the loss of context or nuance we discussed earlier. It also makes the whole thing future-proof - the next improved tool is just around the corner.
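As a rough illustration of the "keep the original alongside the structured fields" idea, here is a small Python sketch. It is not the tool described above: it assumes the scan has already been turned into text, and it swaps the Machine Learning step for simple regular expressions. The form layout, field names and the extract_record helper are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical text as it might come back from scanning a paper form.
SCANNED_TEXT = """\
REGISTRATION FORM
Name: Jane Smith
Date of birth: 14/02/1985
Address: 12 High Street, Anytown
Notes: Was unsure about the process at first but left happy.
"""

# Assumed field labels; a real form would need its own patterns.
FIELD_PATTERNS = {
    "name": re.compile(r"^Name:[ \t]*(.+)$", re.MULTILINE),
    "date_of_birth": re.compile(
        r"^Date of birth:[ \t]*(\d{2}/\d{2}/\d{4})$", re.MULTILINE
    ),
    "address": re.compile(r"^Address:[ \t]*(.+)$", re.MULTILINE),
}


def extract_record(text):
    """Pull known fields into a dict and keep the original text with them."""
    # The untouched source travels with the record, so context is never lost.
    record = {"source_text": text}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else None
    if record["date_of_birth"]:
        # Convert the day/month/year string into a proper date value.
        record["date_of_birth"] = datetime.strptime(
            record["date_of_birth"], "%d/%m/%Y"
        ).date()
    return record


record = extract_record(SCANNED_TEXT)
print(record["name"], record["date_of_birth"], record["address"])
```

Because source_text is stored next to the extracted fields, a better extraction tool can be re-run over the same records later without going back to the paper.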


To hear more about this tool, see a demo of it in action, or discuss any of your Data Strategy needs, get in touch! hello@missiondecisions.com
