Embrace the data sprawl, securely

Data is the lifeblood of every enterprise. So why does managing and securing rapidly expanding volumes and types of data against modern threats remain a formidable challenge for most organizations? According to the Identity Theft Resource Center (ITRC), 2023 set a new record for data breaches, with data compromises jumping 78% over 2022. Malicious actors clearly see the value of data. But without visibility into where data exists, who has access to it and how it affects the business, and without controls to match, a security program will slow the organization’s future growth while increasing security and compliance risks.

For many organizations, improving data-driven business outcomes while ensuring data security in the era of cloud, artificial intelligence (AI) and the Internet of Things (IoT) comes with inherent friction. To reduce that friction, it’s important to understand a few key issues: the challenges of unchecked data growth, how data backup and security needs have changed, how to evaluate the business impact of your data and why it’s imperative to secure data now and in the future.

Unchecked data growth

Data is not static: intuitively, people understand that it grows and changes. That growth reflects the increasingly widespread use of computers and advances in data collection and storage. More devices are connected to the internet than ever before thanks to the ever-expanding IoT, and all of them generate and transmit data. In addition, more businesses are undergoing digital transformation, shifting from paper-based and manual processes to digital ones. From user-generated content and social media to step tracking and stored shopping preferences, every activity adds to the volume of data.

Adding to the complexity of this growth are machine learning (ML) and AI, which both require large volumes of data to train algorithms and which produce new data through their operations and output. In addition, large language models (LLMs) pose unique challenges that cannot be ignored. Organizations must be vigilant about both the data entered into LLMs and the information they output, in part because these models can inadvertently generate detailed and sensitive information. To manage these risks, organizations must evaluate and possibly integrate solutions that offer control over LLM outputs. These solutions may include proxy services or plugins, although current technologies may provide only partial control.
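To make the proxy idea concrete, here is a minimal sketch of a filter that sits between a user and a model and redacts obvious identifiers in both directions. It is an illustration only: the `PATTERNS` table, `redact` and `guarded_call` are hypothetical names, the two regular expressions are simplistic stand-ins for a vetted detector, and the model is assumed to be any function that takes a prompt string and returns a string. As the paragraph above notes, a filter like this offers only partial control.

```python
import re
from typing import Callable

# Hypothetical, deliberately simple patterns (assumption: a real proxy
# would rely on a far more thorough sensitive-data detector).
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every match of a known pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text


def guarded_call(model: Callable[[str], str], prompt: str) -> str:
    """Filter the prompt before it reaches the model and the reply before it leaves."""
    return redact(model(redact(prompt)))
```

Any callable can stand in for the model during testing, which keeps the filter independent of a particular LLM vendor or SDK.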

Data can also be stored longer (and more economically) in the cloud, resulting in larger data volumes and potentially multiple copies of data in different states, since backups are kept both for restore purposes and for historical reference. Data is frequently updated or corrected based on the latest information or when errors are identified, and it’s easy for users to share and create new copies of documents, videos, databases and everything in between. For all these reasons, both structured and unstructured data are growing at an astonishing rate.

Modern data backup and security needs

In the past, data was important, but the primary focus was backup and recovery; backup vendors enabled resiliency. After power outages, natural disasters, disk failures or terrorist attacks, data backups were key to recovery. As ransomware attacks rose, backup vendors began to play a larger role in cybersecurity, enabling organizations to restore encrypted files and potentially avoid paying a ransom to regain access to them. Now, attackers don’t just encrypt the data; they exfiltrate it and threaten to release sensitive information unless a ransom is paid. While a backup might still help with recovery, it does nothing to keep sensitive data from being made public. As attacks evolved, the needs of organizations changed, making backup vendors more involved in security than ever before.

And while production backups are vital for recovery, most data breaches occur in lower environments, such as development, testing and staging. Unfortunately, during the normal course of business, bits and pieces of data are copied around and diffuse into these environments. This might occur when employees copy and paste from one document to another, or when a data scientist copies 10,000 rows from a database with a billion rows. To reduce the risk of exposing sensitive data in a lower-environment breach, organizations must implement advanced data detection technologies that can detect sensitive data even when it is mixed with unrelated data. In addition, organizations must adopt tools that can detect similarities in source code, even if variables or functions are renamed. This capability helps to identify unauthorized copies or modifications of proprietary software.
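One simple way to see how renamed code can still be matched is token normalization: replace every identifier and literal with a placeholder, then compare the resulting token sequences. The sketch below does this for Python source using the standard library. It is an assumption-laden illustration, not a description of how any commercial tool works; `normalized_tokens` and `similarity` are hypothetical names, and production systems typically use more robust fingerprinting.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token types that carry layout or comments rather than program structure.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, replacing names and literals with placeholders."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")    # any variable or function name
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")
        elif tok.type == tokenize.STRING:
            tokens.append("STR")
        else:
            tokens.append(tok.string)  # keywords, operators, punctuation
    return tokens


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized token streams."""
    matcher = SequenceMatcher(None, normalized_tokens(a), normalized_tokens(b),
                              autojunk=False)
    return matcher.ratio()
```

Because names are erased before comparison, a copy of a function with every variable renamed produces the same token stream as the original, while structurally different code scores lower.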

With huge volumes of data distributed across various locations, organizations need more than backup and recovery and traditional data security solutions can provide. Some organizations have turned to data security posture management (DSPM) solutions to gain visibility into all the data spread across their infrastructure. Unfortunately, many DSPMs provide only a bird’s-eye view of the data. Users can see that their organization has a huge volume of data, but visibility without analysis and classification isn’t enough. Users need to know what to focus on: what exactly the risks to each data set are, which machines hold what data, and which users have access to what data, where and why. This is a huge undertaking.

Understanding the business impact of the data

With data spread across all business environments, there are multiple copies of data, and all an adversary needs is access to a single copy of sensitive data. With so many attack surfaces, attack methods have changed, and malicious actors are striking with greater frequency and moving through networks faster than ever. When company infrastructure is breached and the organization has many users and a huge volume of requests, how do security professionals identify a few anomalous activities in this haystack of information? A fast response could mean the difference between blocking the attack early and suffering a significant incident.
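As a rough illustration of finding the needle in that haystack, the sketch below flags users whose request volume sits far above the rest of the population, using a robust score based on the median and the median absolute deviation (MAD). The function name, the input shape (a mapping of user to request count) and the 3.5 threshold are assumptions chosen for the example; real detection pipelines weigh many more signals than volume alone.

```python
from statistics import median


def flag_outliers(counts: dict[str, int], threshold: float = 3.5) -> list[str]:
    """Return users whose request count is unusually high for this population."""
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: a spread measure that a single extreme
    # value cannot distort the way it distorts a standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No measurable spread: anything above the median stands out.
        return [user for user, v in counts.items() if v > med]
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [user for user, v in counts.items()
            if 0.6745 * (v - med) / mad > threshold]
```

For example, given request counts of 120, 95, 110 and 105 for four users and 4,800 for a fifth, only the fifth user is flagged.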

It’s not just about protected health information (PHI), personally identifiable information (PII) or payment card information (PCI), either. While protecting PHI, PII and PCI is important, and failing to do so can result in fines, reputational damage and even harm to business partnerships, malicious access to other data also has a material impact on the business. Security leaders must also protect source code, financial statements, proprietary formulas and engineering drawings, as these are equally vulnerable to unauthorized access and leaks.

In every organization, there are also classes of data that become part of the core intellectual property (IP). Whether it’s information about buyer technology, model weights, investment strategies described in key files or something else, this IP is critical to the success of the business. To comprehensively protect this data, security leaders must develop a data classification strategy that precisely identifies and tags all the data. The main challenge here is the volume and variety of data; modern data lakes are multiple petabytes in size. Regardless of how much data an organization has, using, analyzing and protecting all of it is vital to creating and executing the organization’s long-term strategic plans.
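A classification pass can be pictured as a function that reads a piece of content and returns a set of tags. The minimal sketch below tags text as PII, PCI or IP. Everything in it is an assumption made for illustration: the tag names, the single email pattern standing in for PII detection, the Luhn checksum used to cut down false card-number matches, and the `IP_KEYWORDS` list, which simply borrows examples from this article. A rule set this small would not cope with petabyte-scale, mixed-format data on its own.

```python
import re

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Hypothetical keyword list; each organization would define its own.
IP_KEYWORDS = ("model weights", "proprietary formula", "investment strategy")


def luhn_valid(candidate: str) -> bool:
    """Check a digit string against the Luhn checksum used by payment cards."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def classify(text: str) -> set[str]:
    """Return the set of sensitivity tags that apply to the text."""
    tags = set()
    if EMAIL_RE.search(text):
        tags.add("PII")
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        tags.add("PCI")
    lowered = text.lower()
    if any(keyword in lowered for keyword in IP_KEYWORDS):
        tags.add("IP")
    return tags
```

Returning a set rather than a single label reflects the point made earlier in the article: sensitive data is often mixed with unrelated data, so one document can carry several tags at once.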

The urgency of protecting data now

Many organizations believe that the data security solutions they have in place are adequate to meet their needs. Unfortunately, rapid data growth means that if users can’t continuously discover, identify and classify all data now, the risk of a data breach within weeks is startlingly high. There are more nation-state actors than ever before, prepared to take advantage of any vulnerability or misconfiguration. Other malicious actors are becoming increasingly capable, using AI and ML themselves to make sophisticated attacks easier to carry out.

Organizations must evolve their approach to data security just as quickly as attackers have changed their approaches. Backup solution providers and legacy data security solutions weren’t designed to assess petabytes of data, analyze and identify risks, and then take the appropriate remediation steps. Anything less delays data security efforts and puts organizations at risk. It’s time to embrace data sprawl and enable the secure use of data to power business success.