In part three of their series, The Definitive Guide to Data Classification, Digital Guardian discusses the different approaches to data classification, with guidelines on choosing the right method for your organisation.
Starting at the most basic level, there are two ways to perform data classification: automated and manual. Automated classification can scale quickly, while a manual approach brings direct human judgment to the data. To be successful, your data classification should leverage both methods. Analyst firm Forrester has this to say:
“Dynamic data classification requires the integration of both manual processes involving employees as well as tools for automation and enforcement.”1
Within that spectrum, these three different approaches are the industry standard for data classification:
- Content-based classification
- Context-based classification
- User-based classification
Each method analyzes a document and assigns a classification level to it; this “tag” is what drives data protection decisions and actions. How each method arrives at that decision, however, varies.
Content-based classification inspects and interprets files, looking for sensitive information. This approach answers the question “What is in the document?” and relies on examining the information inside the file, using techniques such as regular expressions, fingerprinting, or Bayesian engines.
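As a rough illustration of the regular-expression technique, the sketch below scans text for card-number and Social Security number patterns. The tag names (`restricted`, `confidential`, `public`), the patterns, and the function names are assumptions made for this example, not part of any particular product; a real deployment would tune both patterns and levels to its own policy.

```python
import re

# Candidate card numbers: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# US Social Security number layout, e.g. 123-45-6789 (layout only, no validity check).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def classify_content(text: str) -> str:
    """Assign a hypothetical tag based only on what the text contains."""
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):  # the checksum filters out most random digit runs
            return "restricted"
    if SSN_PATTERN.search(text):
        return "confidential"
    return "public"


if __name__ == "__main__":
    print(classify_content("Card: 4111 1111 1111 1111 on file"))
    print(classify_content("Lunch menu for Friday"))
```

Pairing the pattern with a checksum is one common way to cut false positives, since a bare digit pattern will also match order numbers and phone lists.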
Context-based classification looks at application, location, or creator, among other variables, as indirect indicators of sensitive information. Context-based classification answers the questions: How is the data being used? Who is accessing it? Where are they moving it? When are they accessing it? If content looks inside the box, context looks at the shipping label.
Both content- and context-based classification involve varying levels of automation to drive rapid deployment, scalability, and accuracy.
Finally, user-based classification depends on a manual selection made by the end user for each document. It relies on user knowledge and discretion at creation, edit, review, or dissemination to flag sensitive documents.
Each of these three approaches delivers value, but to be most effective they need to align with the primary business need.
Is your challenge mainly protecting PCI/PII, PHI, or GDPR-protected data? Regulated data is often structured data with a consistent pattern. Leading with a content-based classification will provide the greatest ability to accurately classify PII, PHI, PCI, and GDPR data.
Contrast that with the “anything goes” nature typical of intellectual property (IP) data, which rarely follows a consistent pattern. To address this, context-based classification looks to other attributes of the document to assign a classification. For example, all documents created in AutoCAD likely contain proprietary engineering specifications.
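The AutoCAD example can be sketched as a small rule set that never opens the file and looks only at its surrounding attributes. The `FileContext` fields, the department names, and the tag names here are hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# File extensions commonly produced by AutoCAD.
CAD_EXTENSIONS = {".dwg", ".dxf"}
# Hypothetical departments whose output is treated as internal by default.
SENSITIVE_DEPARTMENTS = {"legal", "r&d"}


@dataclass(frozen=True)
class FileContext:
    """Attributes around a file; the contents are never inspected."""
    path: str
    application: str
    creator_department: str


def classify_context(ctx: FileContext) -> str:
    """Assign a hypothetical tag from indirect indicators only."""
    suffix = PurePosixPath(ctx.path).suffix.lower()
    if ctx.application.lower() == "autocad" or suffix in CAD_EXTENSIONS:
        return "confidential"  # engineering drawings are assumed proprietary
    if ctx.creator_department.lower() in SENSITIVE_DEPARTMENTS:
        return "internal"
    return "public"


if __name__ == "__main__":
    drawing = FileContext("/shares/eng/bracket.DWG", "AutoCAD", "Engineering")
    print(classify_context(drawing))
```

Rules like these are cheap to evaluate at scale, but they are only as good as the assumption behind them, which is why the text describes them as indirect indicators.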
When it comes to including the end user in your overall security program, user-based classification is ideal. Data owners should know their data best. A user-based classification approach allows them to apply this knowledge to improve classification accuracy.
Data classification drives valuable insights about your organization, but to realize them with accuracy you need to look for the right method. Content-, context-, and user-based approaches can each be right or wrong depending on the business need and data type. Automation helps with enterprise scalability, while manual approaches apply the human understanding of data that cannot easily be achieved any other way.
Many enterprises face each of the challenges above, and a mixed classification approach often delivers the most accuracy and visibility. Finally, it is important that any data protection solution you use can see and interpret each of these tags, understand what to do when there is a conflict between them, and apply protective measures based on classification levels.
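One simple way to handle a conflict between tags is to let the most restrictive one win and then look up the protective measures for that level. The sketch below assumes a four-level scheme and an action table invented for illustration; other policies, such as letting the data owner's tag override automated ones, are equally plausible.

```python
from typing import List, Mapping

# Hypothetical classification levels, ordered from least to most restrictive.
LEVELS = ["public", "internal", "confidential", "restricted"]
RANK = {name: position for position, name in enumerate(LEVELS)}

# Hypothetical protective measures applied at each level.
ACTIONS = {
    "public": [],
    "internal": ["log_access"],
    "confidential": ["log_access", "block_external_email"],
    "restricted": ["log_access", "block_external_email", "encrypt_at_rest"],
}


def resolve_tags(tags: Mapping[str, str]) -> str:
    """Return the most restrictive tag among those assigned by each method."""
    if not tags:
        raise ValueError("at least one classification tag is required")
    return max(tags.values(), key=lambda tag: RANK[tag])


def protective_actions(tags: Mapping[str, str]) -> List[str]:
    """Look up the measures for the resolved classification level."""
    return ACTIONS[resolve_tags(tags)]


if __name__ == "__main__":
    tags = {"content": "public", "context": "confidential", "user": "internal"}
    print(resolve_tags(tags))
    print(protective_actions(tags))
```

Choosing the highest level errs on the side of over-protection, which is a conservative default when the three methods disagree.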
Find out how you can create the perfect classification blend for your organisation by arranging a custom demonstration of Boldon James Classifier here.
This post was originally published on the Digital Guardian blog here.