Americans are increasingly aware of the vulnerability of their private data, as breaches exposing millions of records have disclosed not only credit card information but even more dangerous information such as social security numbers linked to birthdates, names, and addresses. Your name and address along with a birth date and social security number are a virtual ticket to complete identity theft, creating the risk of massive financial destruction. There are multiple pieces of data that are private to an individual, but the social security number (SSN) is the most critical and is the focus of this post. The same principles apply to securing the SSN as to any other private data.
How did we get to this point? Is this just the inevitable consequence of Internet access, distributed systems, and evolving technology? Or is much of this due to irresponsibility of those that hold the most confidential aspects of your data? In this post, I make the argument that it is due to carelessness of organizations that hold such data and that there are ways to ensure safeguarding of data protecting not only against external threats, but against the more powerful threats from within an organization.
This post first articulates the drivers behind the situation which can be divided into two main areas: wrong assumptions and lack of diligence. Next, we identify the architectural principles of the software and infrastructure that mitigate the risks for security breach and contrast that with what most organizations believe is adequate. Finally, we provide practical strategies to remediate the situation.
The most common wrong assumption is that security threats come primarily from outside an organization rather than from within. Studies have shown this is not the case. Even in the case of an external attack, there is often an insider involved, such as a disgruntled employee who gives away or even sells the username and password that provide access to confidential data. Organizations spend large amounts of money on firewalls and external network monitoring, but how much do they invest to protect the confidentiality of the data within their own network?
Many organizations run on the assumption that their employee usernames and passwords could never be compromised and do not consider the impact of an external user using a username/password to gain access to the network. Many organizations allow a simple username and password to access a VPN without any other physical validation such as a smart card or token device. But even those that have restrictions to minimize the risk of an external user entering the network with corporate credentials are often careless about private data, including the SSN and birthdate, and have overly loose restrictions on access, often with no auditing. A disgruntled user who turns dark has the potential to create just as much damage as any hacker from North Korea.
I find it interesting that virtually every break-in is blamed on sophisticated attacks from foreign entities or terrorists. However, the reality is that most of the break-ins that occur require very little sophistication and are probably mostly home-grown even if foreign entities are involved.
I have had the opportunity to work on database, architecture, and application development projects over the years. Many of those customers stored SSNs unencrypted and lacked effective safeguards to audit access to that data. If I were a criminal, I could have retired a long time ago. 10 million SSNs along with birthdates and complete private data with identifying information can be dumped from a database into a .CSV file in a matter of seconds. Such a file can be zipped up and encrypted, renamed to a .txt, and is small enough to be emailed as an attachment using a web mail client with virtually no detection. That is how easy it is to completely exfiltrate social security numbers within many organizations. At $25.00 per complete private identity record (street value) for 10 million records, that works out to a $250 million payoff.
Lack of Regulation
It amazes me that our government, which supposedly is so concerned about protecting the welfare of its citizens, has created virtually no regulatory standards for storage and retrieval of private data. Yes, there are requirements for secure external interfaces such as SSL, but that is only one piece of a secure infrastructure. I speak from experience in working with state and federal entities and have found that the vast majority of the databases that hold SSNs do not even store them encrypted. Worse, the SSN is stored right alongside all of the information needed to identify the person linked to the number, including complete name, address, and many times even the birthdate. The complacency in regard to social security numbers is startling given that an SSN cannot be changed and, once compromised, leaves a person at risk of identity theft for the rest of their life. At least when a password is compromised, it can be changed. There is no excuse for any organization to store social security numbers unencrypted in a database. This is a worse violation of security than storing a password in plain text.
Lack of Diligence
The fact that the government has not imposed regulations that require companies to store private data encrypted is no excuse for organizations. The necessary technology exists not only to encrypt private data, but to ensure that it remains encrypted through the lifecycle of its access and use. Even in organizations that have made attempts to secure this data, huge gaps are usually left. Consider the following ineffective strategies that are often employed:
1) Data at Rest Security: This is the concept that data as it exists on any file system should be encrypted so that even if a database is stolen, it cannot be moved to another system so that the data can be used. Microsoft SQL Server Transparent Data Encryption (TDE) is an example of this approach. While this is useful to protect against physical data theft, it is useless for protection from internal threats unless a framework of auditing and security has been placed around access to the system containing the data. Secure data at rest systems still reveal private data to those that are authorized through the interfaces that access the data.
2) Reversible encryption: The problem with this is in the name – it is reversible. SQL Server again provides the capability to encrypt particular data columns such as a SSN. I have seen this utilized, but the fact that the encryption is reversible implies that it will be decrypted at some point downstream. In systems that use this, often the private key information is included in the views or stored procedures so that the data is only secure in the case of direct table access. Even in the case where the decryption is not accomplished until further down in the application layer, it is still decrypted and available to anybody who has the key. Worse, the security of the data is totally dependent on some supposedly private key that has to be defined somewhere in code. While reversible encryption provides some protection, it is only as good as the secret of the encryption key and anybody with access to that information has the keys to go in and exploit the data.
3) Partial obfuscation. Even the last 4 digits commonly used to validate along with other PI data should not need to be stored in plain text or with reversible encryption. A partial number such as the last 4 is still a ticket to identity theft when combined with other data. Any portion of the SSN that is involved with identification must be non-reversibly encrypted to minimize identity theft risk.
What is really needed
Earlier, I suggested that the only real solution to protecting private data such as social security numbers is to treat them with the same respect as a password. This implies that they utilize non-reversible encryption. With non-reversible encryption, there is no way to decrypt the value. The only way to confirm a value protected by non-reversible encryption is to already know the value. When one knows the value, it can be encrypted and then matched against the stored encrypted value. But is this practical? There are many arguments for why social security numbers cannot be encrypted this way. Below is a list of them and why none of them are valid excuses:
1) The SSN is used to link data together. This is easily resolved through the use of surrogate keys.
2) Users need to search on a SSN or partial SSN to verify identity; for example, a credit agency will need the user’s social security number as one piece of data to verify that the person is authorized for the information requested. This is not a problem for a non-reversible encryption scheme. The full SSN as well as the last 4 digits are stored in the database using a non-reversible encryption scheme. The user enters the last 4 digits in a web form on the client over a secure connection. The web client then invokes the same encryption scheme and passes the result along with other identifying information to find a match. Note that with non-reversible encryption, whether it is an MD5 hash or some other technique, the algorithm itself does not need to be kept private. Since there is no way to reverse the encrypted values, knowing the scheme does not circumvent the security. One caveat: a 9-digit SSN has at most a billion possible values and the last 4 digits only 10,000, so an unkeyed hash could be defeated by hashing every candidate; combining the hash with a secret key or salt guards against that.
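To make the two answers above concrete, here is a minimal sketch of my own rather than a production design. It uses Python's standard sqlite3, hashlib, and hmac modules. The table and column names, the HASH_KEY constant, and the helper functions are all hypothetical. I have also swapped in keyed HMAC-SHA-256 in place of a bare MD5 hash, on the assumption that a secret key is available, because SSN values come from a small enough range that an unkeyed hash could be enumerated.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key for illustration; a real one would live in a key vault, not in code.
HASH_KEY = b"example-key-not-for-production"


def digits_of(value: str) -> str:
    """Strip dashes and spaces so '000-12-3456' and '000123456' hash identically."""
    return "".join(ch for ch in value if ch in "0123456789")


def one_way(value: str) -> str:
    """Keyed, non-reversible digest (HMAC-SHA-256) of the digits in value."""
    return hmac.new(HASH_KEY, digits_of(value).encode("ascii"), hashlib.sha256).hexdigest()


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id    INTEGER PRIMARY KEY,   -- surrogate key, carries no meaning
            last_name    TEXT NOT NULL,
            postal_code  TEXT NOT NULL,
            ssn_digest   TEXT NOT NULL UNIQUE,  -- digest of the full SSN
            last4_digest TEXT NOT NULL          -- digest of the last 4 digits
        );
        CREATE TABLE account (
            account_id INTEGER PRIMARY KEY,
            person_id  INTEGER NOT NULL REFERENCES person(person_id),
            balance    REAL NOT NULL
        );
        """
    )
    return conn


def add_person(conn, last_name: str, postal_code: str, ssn: str) -> int:
    """Store only digests of the SSN; return the surrogate key used to link other tables."""
    ssn_digits = digits_of(ssn)
    cur = conn.execute(
        "INSERT INTO person (last_name, postal_code, ssn_digest, last4_digest) "
        "VALUES (?, ?, ?, ?)",
        (last_name, postal_code, one_way(ssn_digits), one_way(ssn_digits[-4:])),
    )
    return cur.lastrowid


def find_person(conn, last_name: str, postal_code: str, last4: str):
    """Hash the presented last 4 digits and match on the digest plus other identifying data."""
    row = conn.execute(
        "SELECT person_id FROM person "
        "WHERE last_name = ? AND postal_code = ? AND last4_digest = ?",
        (last_name, postal_code, one_way(last4)),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_db()
    pid = add_person(conn, "Doe", "12345", "000-12-3456")  # obviously fake SSN
    conn.execute("INSERT INTO account (person_id, balance) VALUES (?, ?)", (pid, 10.0))
    print(find_person(conn, "Doe", "12345", "3456") == pid)
```

The account table joins to person through person_id alone, so nothing downstream ever needs the SSN as a key, and the lookup never touches a plain-text value.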
The bottom line is that even organizations that need to utilize SSNs for validation purposes have no excuse to store the data in plain text or even with a reversible encryption scheme. At no point should such data be unencrypted. Instead, such data should be permanently one-way encrypted (non-reversible). Validation using the SSN should always be done with partial strings, meaning the last 4 digits rather than the whole number. The full SSN does not need to be used since the last four digits can be encrypted separately. In this approach, the last part of the SSN is encrypted as soon as it is presented, and the matching record is looked up in the database using the encrypted last 4 digits along with some other personal identification, rather than the plain text value.
What is an organization to do when it finds itself with SSNs in plain text or under a reversible encryption scheme, especially when they are linked to other private data and there are applications dependent on these structures? The first step is to start to remediate the situation immediately. This needs to be a top priority for any business, even given the government’s failure to make it a priority. Consider the risks, costs, and litigation, not to mention the tarnished reputation, when a breach occurs. And it will occur, or worse, it may already have occurred and the business isn’t even aware of it. Few organizations actually audit external activities at the level of detail needed to even know if private data has been breached. They usually find out only after the data has already been utilized in a fraudulent fashion.
It may take months or years to remediate all the systems within an organization in a way that still allows the organization to do the business. In keeping with the principle of due diligence, what can an organization do until they reach the ideal state in securing data? The key is to go after the low hanging fruit to put safeguards in place even while mitigating the core issue. One way to do this is through implementation of a gatekeeper mentality. If you have to send an important package, you request tracking from the mail service. In the same way any confidential data must be tracked from end to end. That is the first step toward protection. It is not adequate, but it is a start.
The Gatekeeper Approach
Below are 5 steps to implementing a gatekeeper approach:
1) Ensure that all access to private data is captured. This includes the source from which the data was retrieved, the account used to access the data and everything that happens to the data afterwards. This means that all aspects of data movement must be tracked including not only the database access, but any user manipulation of the data after it has been retrieved. This means that both database auditing and network auditing must be implemented.
2) Define a repository for storing the complete lifecycle of data access corresponding to the actions captured from #1. For every piece of sensitive data, all access to the information must be recorded. There are ways to implement this framework in SQL Server through the use of custom profiler traces as well as extended events and logon triggers. I cannot speak to how to do this at the network level, but I know that it is possible.
3) Define a dictionary of rules that identify the type of accesses allowed to the data. As data movements are captured, exceptions will be encountered. Safeguards can be put in place to block movements that are not defined in the dictionary. A major part of this development is training such that the system learns from previously sanctioned activities so that it can recognize unauthorized accesses.
4) Stop movements that are not allowed by the dictionary. This may result in some pain as individuals with legitimate requests are blocked.
5) Put a learning framework in place with continuous feedback so that the systems can learn from mistakes and improve.
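The five steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the AccessEvent and Gatekeeper names, the (account, source, action) rule format, and the manual sanction step standing in for the learning framework are all hypothetical. A real implementation would sit on top of database and network auditing rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    account: str   # account used to access the data
    source: str    # where the data came from, e.g. a table or view
    action: str    # what was done with it, e.g. "read", "export", "email"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Gatekeeper:
    def __init__(self, allowed):
        self.allowed = set(allowed)   # step 3: dictionary of sanctioned movements
        self.audit_log = []           # step 2: repository of every access attempt
        self.pending_review = []      # step 5: blocked events awaiting feedback

    def request(self, account: str, source: str, action: str) -> bool:
        """Record the attempt (step 1) and block it if no rule allows it (step 4)."""
        event = AccessEvent(account, source, action)
        permitted = (account, source, action) in self.allowed
        self.audit_log.append((event, permitted))
        if not permitted:
            self.pending_review.append(event)
        return permitted

    def sanction(self, event: AccessEvent) -> None:
        """Step 5: a reviewer approves a blocked movement so later requests pass."""
        self.allowed.add((event.account, event.source, event.action))
        self.pending_review.remove(event)


if __name__ == "__main__":
    gk = Gatekeeper({("svc_billing", "person", "read")})
    print(gk.request("svc_billing", "person", "read"))   # sanctioned movement
    print(gk.request("jsmith", "person", "export"))      # not in the dictionary
```

Every request is logged whether or not it is allowed, which is the point of step 1: the blocked export still leaves a trail, and the legitimate user who was blocked can be unblocked through review rather than by loosening the rules up front.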
Databases: Any organization that is storing sensitive data non-encrypted or even with reversible encryption has been following a flawed data architecture. All databases and applications should be reviewed for compliance with principles of non-reversible encryption as well as complete tracking of all such data. Systems that are in violation need to be prioritized in terms of business impact due to downtime with changing them over to a secure approach.
Interfaces: In addition to databases, interfaces also need to be analyzed and modified to preserve the security of SSNs and other private data in flight. The service accounts associated with all processes must be linked to the interfaces using such techniques as IPsec so that compromise of the password for the service account does not create a risk. It is not enough to simply have secure interfaces; the data within the secure interface must be obfuscated so that if the interface is compromised, there is little to gain from the data being transmitted.
User interfaces: User applications that simply retrieve data including unencrypted SSN must be modified to work with non-reversible encrypted SSN and components need to be added to allow validating a SSN by re-encrypting the portion of it needed to check for a match rather than by matching in the database.
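As a rough sketch of what obfuscating data inside an already secure interface could look like, the snippet below replaces the SSN with a one-way digest before a record is serialized, and validates a presented SSN by re-hashing it. The field names, the HASH_KEY constant, and the to_wire and matches helpers are my own hypothetical choices, and keyed HMAC-SHA-256 is an assumed stand-in for whatever non-reversible scheme an organization adopts.

```python
import hashlib
import hmac
import json

# Hypothetical shared key for illustration only; not something to keep in source code.
HASH_KEY = b"example-key-not-for-production"
SENSITIVE_FIELDS = {"ssn"}


def one_way(value: str) -> str:
    """Keyed, non-reversible digest of the digits in value."""
    digits = "".join(ch for ch in value if ch in "0123456789")
    return hmac.new(HASH_KEY, digits.encode("ascii"), hashlib.sha256).hexdigest()


def to_wire(record: dict) -> str:
    """Serialize a record for transmission with sensitive fields replaced by digests."""
    safe = {}
    for name, value in record.items():
        if name in SENSITIVE_FIELDS:
            safe[name + "_digest"] = one_way(str(value))
        else:
            safe[name] = value
    return json.dumps(safe)


def matches(presented_ssn: str, received_digest: str) -> bool:
    """Validate by re-hashing what the user presented; nothing is ever decrypted."""
    return hmac.compare_digest(one_way(presented_ssn), received_digest)


if __name__ == "__main__":
    message = to_wire({"name": "Jane Doe", "ssn": "000-12-3456"})  # obviously fake SSN
    payload = json.loads(message)
    print("ssn" in payload)                                  # the plain value never leaves
    print(matches("000-12-3456", payload["ssn_digest"]))
```

Even if the transport is compromised, the captured message contains only the digest, and the receiving side can still confirm a match without ever holding the plain SSN.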
This post just scratches the surface of the data security issue. I hope to write more in the upcoming months. It takes a focused team of database experts, network experts, application developers, architects, and user-interface experts working together to remediate this issue. The time is now to insist that organizations, including government agencies, become accountable for protecting their customer data. Partial obfuscation with just the last 4 digits unencrypted is almost as bad as having the whole string unencrypted, since those digits provide a means of impersonation to acquire the full SSN.
As a citizen, I encourage you to lobby your congressman for regulations that require protection of private data. Developers, I encourage you to not even consider transmitting any piece of private data without using non-reversible encryption schemes. CEOs, I encourage you to place private data security at the top of the risk list and treat this with the urgency of the Y2K crisis. DBAs, I encourage you to educate your organization and insist on not storing private data without non-reversible encryption. Architects, do not even consider a design that does not include complete auditing of all sensitive data flows or that allows for transmission of SSNs in plain text, even within the context of an SSL interface.