
Data Classification: How Organizations Can Identify and Manage Data

In the third part of our series of articles on data under India's Personal Data Protection Bill, 2019 (PDPB 2019), we delved into the meaning and context of personal information and data. In this fourth part, we look at data classification: the process of identifying the types of data an organisation holds so that it can better manage and control that data.

My PI, My Safety, My Responsibility?

The responsibility for protecting personal information (PI) does not rest solely with organizations; it may be shared with the individuals to whom the data belongs. Companies may or may not be legally liable for the PI they hold.

However, according to most studies, almost half of users consider it the company's responsibility to protect their personal data, and a majority of users say they would be disappointed if that data were leaked or compromised. Given this public perception that organizations are responsible for PI, securing PI is widely accepted as a best practice.


Discovering and classifying personal data

Discovering personal data involves determining:

  • what types of data are collected (e.g., medical, financial, or personally identifying data such as an Aadhaar number);
  • where and how the data is collected;
  • where the data is stored;
  • who has access to the data, and where those people are physically located;
  • how the data is erased;
  • how the data flows within and across the different units of the organisation; and
  • how the data is transferred within and between geographies.
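As a rough illustration, the answers to the discovery questions above can be recorded as rows of a data inventory (sometimes called a data map). The sketch below is a minimal Python version under stated assumptions: the class name `DataInventoryEntry`, its fields, and the `crosses_border` helper are hypothetical, not part of the PDPB or of any particular tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataInventoryEntry:
    """One row of a hypothetical data inventory (data map)."""
    data_type: str                  # what is collected, e.g. "Aadhaar number"
    collected_from: str             # where and how it is collected
    storage_location: str           # where it is stored
    access_roles: List[str]         # who has access
    accessor_countries: List[str]   # where those people are physically located
    erasure_method: str             # how it is erased
    internal_flows: List[str] = field(default_factory=list)      # units it flows through
    transfer_countries: List[str] = field(default_factory=list)  # geographies it is sent to

    def crosses_border(self, home_country: str = "India") -> bool:
        """True if the data is accessed from, or transferred to, another country."""
        countries = set(self.accessor_countries) | set(self.transfer_countries)
        return any(country != home_country for country in countries)


entry = DataInventoryEntry(
    data_type="Aadhaar number",
    collected_from="customer onboarding web form",
    storage_location="CRM database",
    access_roles=["KYC team"],
    accessor_countries=["India"],
    erasure_method="secure deletion after the retention period",
    internal_flows=["Sales", "Compliance"],
    transfer_countries=["Singapore"],
)
print(entry.crosses_border())  # True
```

Keeping one row per data type makes it straightforward to filter the inventory later, for example to list every data type that leaves the country.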

What Is Data Classification?

Data classification categorizes data according to its type, its sensitivity, and its value to the organization, including the loss the organization would suffer if the data were altered, compromised, or deleted, whether deliberately or accidentally. It helps an organization understand the value of its data, determine whether the data is at risk, and implement controls to mitigate those risks.


Other reasons for classification include optimising data storage by identifying and segregating unused or infrequently used data, identifying sensitive files and trade secrets, and applying additional protection to business-critical data.
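Segregating unused or infrequently used data can start from something as simple as last-access timestamps. The following sketch assumes a hypothetical `segregate_by_last_access` function and an arbitrary one-year threshold; real thresholds would come from the organisation's own retention policy.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def segregate_by_last_access(
    files: Dict[str, datetime],
    now: datetime,
    stale_after: timedelta = timedelta(days=365),  # assumed threshold
) -> Tuple[List[str], List[str]]:
    """Split file paths into (active, stale) lists by last-access time."""
    active: List[str] = []
    stale: List[str] = []
    for path, last_access in sorted(files.items()):
        if now - last_access > stale_after:
            stale.append(path)   # candidate for archival or cheaper storage
        else:
            active.append(path)
    return active, stale


files = {
    "hr/payroll_2019.xlsx": datetime(2020, 5, 20),
    "archive/campaign_2016.pptx": datetime(2017, 1, 10),
}
active, stale = segregate_by_last_access(files, now=datetime(2020, 6, 1))
print(active)  # ['hr/payroll_2019.xlsx']
print(stale)   # ['archive/campaign_2016.pptx']
```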

Data classification also helps an organization comply with industry-specific requirements set by sectoral regulators such as the RBI, TRAI, SEBI, and IRDAI, to name a few.

What Type of Data is Included?

When assessing privacy issues, an organisation has to consider any personal information that is sensitive or could be used with malicious intent. Such information may relate to:

  • Personal data records: Records whose confidentiality is essential to maintaining the individual's privacy.
  • Geographic records: Sharing personal information such as a location or address online is a potential risk, so this data needs protection from unauthorized use.
  • Social presence: All personal data given out during online interactions, including posts, views, and comments on social media sites. Many sites have a privacy policy governing the use of data shared by or collected from users.

  • Financial: Any financial information shared online or offline is sensitive as it can be utilized to commit financial fraud.
  • Sexual: A growing number of activists consider information about a person's sex life and sexual orientation to be privileged.
  • Medical: Any detail of medical treatment and history is privileged information and cannot be disclosed to a third party without prior consent.
  • Religious and political beliefs: There is growing concern that these preferences should be treated as privileged information.

Data Sensitivity Levels

Data is classified according to its sensitivity level – high, medium, or low.
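One common way to apply sensitivity levels is a "high-water mark" rule: a record takes the highest level of any data category it contains. The sketch below assumes a hypothetical mapping from the categories listed earlier to levels; each organisation would define its own, and treating unknown categories as high is a cautious design choice rather than a regulatory requirement.

```python
from enum import IntEnum
from typing import Dict, Iterable


class Sensitivity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical mapping; each organisation would define its own.
CATEGORY_LEVEL: Dict[str, Sensitivity] = {
    "personal_records": Sensitivity.MEDIUM,
    "geographic": Sensitivity.MEDIUM,
    "social_presence": Sensitivity.MEDIUM,
    "financial": Sensitivity.HIGH,
    "sexual": Sensitivity.HIGH,
    "medical": Sensitivity.HIGH,
    "religious_political": Sensitivity.HIGH,
}


def classify(categories: Iterable[str]) -> Sensitivity:
    """High-water mark: a record takes the highest level of any category it holds.

    Unknown categories are treated as HIGH so unrecognised data is protected
    by default; a record with no personal-data categories is LOW.
    """
    levels = [CATEGORY_LEVEL.get(c, Sensitivity.HIGH) for c in categories]
    return max(levels, default=Sensitivity.LOW)


print(classify(["geographic", "social_presence"]).name)  # MEDIUM
print(classify(["geographic", "medical"]).name)          # HIGH
print(classify([]).name)                                  # LOW
```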

Types of Data Classification

Data classification can be performed based on content, context, or user selections:

  • Content-based classification – inspects and interprets the contents of files, looking for sensitive information, and assigns each file a class according to the importance and confidentiality of its content.
  • Context-based classification – classifies files based on metadata, such as the application that created the file (for example, CRM software), the person or team that created it (such as the Marketing department), or the location in which the file was authored or modified (such as the IT or Legal department).
  • User-based classification – relies on the judgement of a knowledgeable user. Individuals who work with documents specify how sensitive each one is, either when they create it, after a significant edit or review, or before releasing it.
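The three approaches can be combined in one pipeline. The sketch below is illustrative only: the regular expressions for Aadhaar numbers, PAN numbers, and email addresses are simplified assumptions (a real scanner would, for example, also verify the Verhoeff check digit of an Aadhaar number), the metadata rules and label names are hypothetical, and keeping the most restrictive label from all three sources is one possible design choice.

```python
import re
from typing import Dict, Optional

# Simplified, illustrative patterns (assumptions, not official validators).
PATTERNS = {
    "aadhaar": re.compile(r"\b[2-9]\d{3}\s?\d{4}\s?\d{4}\b"),
    "pan": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# Hypothetical context rules: (metadata key, value) -> label.
CONTEXT_RULES = {
    ("application", "CRM"): "confidential",
    ("department", "Legal"): "confidential",
    ("department", "Marketing"): "internal",
}

# Labels ordered from least to most restrictive.
RANK = {"public": 0, "internal": 1, "confidential": 2}


def content_based(text: str) -> Optional[str]:
    """Label the file 'confidential' if its content matches any PI pattern."""
    return "confidential" if any(p.search(text) for p in PATTERNS.values()) else None


def context_based(metadata: Dict[str, str]) -> Optional[str]:
    """Label the file from metadata such as the creating application or department."""
    labels = [CONTEXT_RULES[item] for item in metadata.items() if item in CONTEXT_RULES]
    return max(labels, key=RANK.__getitem__, default=None)


def classify_file(text: str, metadata: Dict[str, str],
                  user_label: Optional[str] = None,
                  default: str = "internal") -> str:
    """Combine all three approaches, keeping the most restrictive label found."""
    found = [label for label in (content_based(text), context_based(metadata), user_label) if label]
    return max(found, key=RANK.__getitem__, default=default)


print(classify_file("Customer PAN: ABCDE1234F", {"department": "Marketing"}))  # confidential
print(classify_file("Campaign plan", {"department": "Marketing"}, user_label="public"))  # internal
print(classify_file("Canteen menu", {}))  # internal
```

In the second call the user's "public" selection is overridden by the context rule; whether a user's judgement may lower a label is a policy decision each organisation has to make.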

Data States and Data Format

Two additional dimensions of data classification are:

Data states: Data exists in one of three states – at rest, in process, or in transit. A classification must hold across all three: data classified as confidential must remain confidential in every state.
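Because a classification must hold in every state, each state typically needs its own controls. The sketch below assumes a hypothetical set of controls for confidential data in each state; the control names are illustrative placeholders, not requirements drawn from the PDPB.

```python
from enum import Enum
from typing import Dict, Set


class DataState(Enum):
    AT_REST = "at rest"
    IN_PROCESS = "in process"
    IN_TRANSIT = "in transit"


# Hypothetical controls required for data labelled "confidential" in each state.
CONFIDENTIAL_CONTROLS: Dict[DataState, Set[str]] = {
    DataState.AT_REST: {"encryption_at_rest", "access_control"},
    DataState.IN_PROCESS: {"access_control", "audit_logging"},
    DataState.IN_TRANSIT: {"encryption_in_transit", "access_control"},
}


def missing_controls(state: DataState, applied: Set[str]) -> Set[str]:
    """Return the controls confidential data still needs in the given state."""
    return CONFIDENTIAL_CONTROLS[state] - applied


print(missing_controls(DataState.IN_TRANSIT, {"access_control"}))  # {'encryption_in_transit'}
print(missing_controls(DataState.AT_REST, {"encryption_at_rest", "access_control"}))  # set()
```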

Data format: Data can be either structured or unstructured. Structured data follows a predefined format, such as the rows and columns of a relational database, which makes it easy to search and index. Examples of structured data are database objects and spreadsheets. Unstructured data has no predefined data model and doesn't fit neatly into the traditional row-and-column structure of relational databases; examples include emails, audio/video files, web pages, source code, and social media messages.
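The format affects how classification is done: for structured data a scanner can often start from the schema, while unstructured data has to be scanned in full. The sketch below assumes a hypothetical set of column-name hints and a simplified PAN pattern.

```python
import re
from typing import List

# Hypothetical column names that suggest personal information.
PI_COLUMN_HINTS = {"name", "aadhaar", "pan", "email", "phone", "address", "date_of_birth"}
# Simplified PAN pattern: five letters, four digits, one letter.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")


def pi_columns(header: List[str]) -> List[str]:
    """Structured data: inspect the schema (column names) for likely PI."""
    return [col for col in header if col.strip().lower() in PI_COLUMN_HINTS]


def text_has_pan(text: str) -> bool:
    """Unstructured data: no schema to inspect, so scan the content itself."""
    return PAN_PATTERN.search(text) is not None


print(pi_columns(["Name", "Email", "Order ID"]))           # ['Name', 'Email']
print(text_has_pan("Please update my PAN to ABCDE1234F"))  # True
```

Column-name matching is only a starting point: a free-text "notes" column in a structured table can still hold PI, so its values may need the same content scanning as unstructured data.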

By Sameer Mathur, Founder and CEO, SM Consulting

President, Delhi-NCR Chapter of the Foundation of Data Protection Professionals in India

With inputs from Vijayashankar Nagaraj Rao
