Privacy preservation is critical to managing sensitive data. Two components of Skyflow’s infrastructure, the Privacy Preservation Engine (PPE) and Privacy Data Types (PDTs), allow the platform to preserve the privacy of your sensitive data.
The Privacy Preservation Engine uses inference and encryption algorithms to secure data and ensure that privacy is not compromised. Its features include Polymorphic Encryption, De-Identification, and Anonymization.
Encryption protects sensitive data at rest, in transit, and during processing. Traditional databases secure data at rest, but a query decrypts the data in memory before processing it, which leaves the data exposed. Processing data after decryption is a significant vulnerability for businesses: 46% of data breaches happen on in-memory data. Skyflow’s proprietary polymorphic encryption techniques allow data to be queried and processed while it remains encrypted, which preserves security and privacy during processing.
De-identification is used in conjunction with tokenization technology to help businesses reduce the amount of sensitive data they store on their servers by replacing it with tokens. We support several kinds of tokens, from format-preserving tokens to Data-Loss-Prevention (DLP) tokens and random tokens. Tokenization is a powerful technology that can help businesses reduce their compliance scope by limiting the amount of sensitive data that is stored.
Anonymization abstracts away the identifying details in data so that the data cannot be traced back to an individual. It enables use cases such as secure multi-party data sharing, which allows businesses to share data with their partners without compromising the integrity and privacy of the data, and analytics on privacy-sensitive data from hospitals.
Privacy Data Types (PDTs) are Skyflow-defined classifications of various kinds of sensitive data. One example of a PDT is Social Security Number. The PDT dictates how Skyflow treats the underlying data by default. Since SSNs are extremely sensitive information, the Social Security Number PDT specifies redaction of SSN data by default. PDTs apply industry-standard privacy and security rules to your data right out of the box.
To view information about the PDT for a particular field, click the arrow next to the field name in the Vault Browser and click View PDT.
Each PDT considers the identifiability and sensitivity of the underlying data and specifies a default DLP policy, a default token policy, the supported operations, and the supported validations for the data.
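As a rough illustration, the attributes of a PDT can be pictured as a small record. The sketch below is not Skyflow's data model: the class name, field names, and string values are hypothetical, and apart from the redact-by-default policy for SSNs described above, the values assigned to the example are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


@dataclass(frozen=True)
class PrivacyDataType:
    # Hypothetical field names that mirror the attributes a PDT specifies.
    name: str
    identifiability: Level
    sensitivity: Level
    default_dlp_policy: str    # e.g. "mask" or "redact"
    default_token_policy: str  # e.g. "random" or "format_preserving"
    operations: frozenset      # operations allowed on the data
    validations: tuple         # validations applied to input


ssn = PrivacyDataType(
    name="Social Security Number",
    identifiability=Level.HIGH,             # assumption
    sensitivity=Level.HIGH,                 # assumption
    default_dlp_policy="redact",            # from the text: SSNs are redacted by default
    default_token_policy="random",          # assumption
    operations=frozenset({"exact_match"}),  # assumption
    validations=("regular_expression",),    # assumption
)
```

Making the record immutable (`frozen=True`) reflects the idea that a PDT is a fixed classification rather than something edited per record.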
Identifiability is the ease with which an individual could be identified using data. The default DLP policy is derived from how easily an individual can be identified using a type of data. A DLP policy determines the level of exposure of data to the end-user.
The supported values are:
- Low - Cannot be easily identified. Some examples are Country, State and ZIP code.
- Moderate - Can be identified relatively easily when combined with other data but cannot uniquely identify the person, for example education information or employment information.
- High - Can uniquely identify the person, for example name, address, telephone number or email address.
The sensitivity of a data field is the level of damage it could cause to the person the data belongs to if compromised. Sensitivity is a factor in deriving the default DLP policy.
Skyflow uses the NIST (National Institute of Standards and Technology) guidelines to define its sensitivity levels.
The supported levels are:
- Low - Doesn’t cause harm more significant than an inconvenience, such as changing a telephone number.
- Moderate - Could result in financial loss due to identity theft or denial of benefits, public humiliation, discrimination and the potential for blackmail.
- High - Involves serious physical, social, or financial harm, resulting in potential loss of life, loss of livelihood or inappropriate physical detention.
A DLP policy is used for transformations on a privacy data type. It helps prevent unauthorized access to sensitive data by transforming sensitive data to reveal minimal or no information based on the data type.
Each privacy data type has a default DLP policy, which is derived from the identifiability and sensitivity levels of that data type.
The supported values are:
- Mask - Data is partially masked so that it becomes non-identifiable. For example, the email address email@example.com becomes ****@example.com.
- Redact - Completely obscures all the data. No part of the data is revealed. For example, the email address firstname.lastname@example.org becomes REDACTED.
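The two DLP policies can be sketched in a few lines of Python. This is only an illustration of the behavior described in the list above, not Skyflow's implementation; the function names and the fixed four-asterisk mask are assumptions.

```python
def mask_email(value: str) -> str:
    """Hide the local part of an email address and keep the domain."""
    local, sep, domain = value.partition("@")
    if not sep:
        # Not shaped like an email address: reveal nothing.
        return "****"
    # A fixed-width mask also hides the length of the local part.
    return "****@" + domain


def redact(value: str) -> str:
    """Reveal no part of the original value."""
    return "REDACTED"


def apply_dlp_policy(value: str, policy: str) -> str:
    """Apply a DLP policy by name (hypothetical policy names)."""
    if policy == "mask":
        return mask_email(value)
    if policy == "redact":
        return redact(value)
    raise ValueError(f"unknown DLP policy: {policy}")
```

For example, `apply_dlp_policy("email@example.com", "mask")` returns `****@example.com`, while the `redact` policy returns `REDACTED` for any input.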
Tokenization is the process of substituting sensitive data with a non-sensitive equivalent, referred to as a token, that has no extrinsic or exploitable meaning or value. The token is a reference (or an identifier) that maps back to the sensitive data and can be used instead of the actual sensitive data.
A token policy specifies the type of token generated for a privacy data type. Each privacy data type is assigned a default token policy.
The supported values are:
- Random Token - A randomly generated token, not derived from the input data, that is stored in your database as a placeholder for the sensitive data.
- Format Preserving Token - A token in the same format as the original data. For example, the clear-text email address email@example.com is tokenized as a different string that still has the form of an email address. The token retains the format of the original data but is unreadable.
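To make the difference between the two token policies concrete, here is a toy in-memory sketch. It is not a Skyflow API: the `TokenVault` class and its methods are hypothetical, and the format-preserving variant uses simple random character substitution rather than a standardized format-preserving encryption scheme.

```python
import secrets
import string


class TokenVault:
    """Toy token store that maps tokens back to the original values."""

    def __init__(self):
        self._token_to_value = {}

    def random_token(self, value: str) -> str:
        # The token is random and not derived from the input in any way.
        token = secrets.token_hex(16)
        self._token_to_value[token] = value
        return token

    def format_preserving_token(self, value: str) -> str:
        # Replace each ASCII digit or letter with a random character of the
        # same class; keep separators such as "@", "." and "-" in place.
        # A real system would also have to guard against token collisions.
        chars = []
        for ch in value:
            if ch in string.digits:
                chars.append(secrets.choice(string.digits))
            elif ch in string.ascii_lowercase:
                chars.append(secrets.choice(string.ascii_lowercase))
            elif ch in string.ascii_uppercase:
                chars.append(secrets.choice(string.ascii_uppercase))
            else:
                chars.append(ch)
        token = "".join(chars)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can map a token back to the sensitive value.
        return self._token_to_value[token]
```

With this sketch, a format-preserving token for an email address keeps its length and the positions of `@` and `.`, whereas a random token is a 32-character hexadecimal string regardless of the input.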
Skyflow lets users and developers run queries and operations on fully encrypted data. This preserves the security and privacy of the data while keeping it actionable. The privacy data type determines which operations can be performed. Users can perform these operations on a PDT:
- Exact Match - Exact match operations attempt to match the query to a value in the data exactly. An exact match operation can retrieve a record for a specific user with a particular email address, SSN, or another identifier.
- Aggregation - Aggregation operations combine values within a given query, for example to produce a SUM or an AVERAGE.
- Order - Order operations support range comparisons such as GREATER THAN or LESS THAN.
- All Ops - When added to a PDT, allows all of the above operations.
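The way a PDT gates operations, including the All Ops shorthand, can be sketched as a simple membership check. The operation names and the `is_operation_allowed` helper below are hypothetical and only illustrate the rule described in the list above.

```python
# Hypothetical identifiers for the operations a PDT can support.
OPERATIONS = {"exact_match", "aggregation", "order"}
ALL_OPS = "all_ops"


def is_operation_allowed(operation: str, pdt_operations: set) -> bool:
    """Return True if the PDT permits the requested operation."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    # "All Ops" on a PDT implies every individual operation.
    return ALL_OPS in pdt_operations or operation in pdt_operations
```

For instance, a PDT configured with only `{"exact_match"}` would reject an `order` query, while a PDT configured with `{"all_ops"}` would accept any of the three.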
You can perform the following validations to limit the values entered for a privacy data type.
- Regular Expressions - Some privacy data types have a specific format or allowed character set that input data must match. These constraints are checked using regular expression validation. For example, each SSN should be nine digits, in either a formatted or an unformatted pattern.
- Bounded Values - Some privacy data types should only accept values from a predefined set, which rules out invalid values. For example, users should only select a country from a list of possible country values.
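Both kinds of validation are straightforward to sketch. The pattern and the country list below are assumptions chosen for illustration: the regular expression accepts nine digits either unformatted or in the common 3-2-4 hyphenated layout, and the country set is a small sample rather than a complete list.

```python
import re

# Assumed SSN formats: "123456789" or "123-45-6789".
SSN_PATTERN = re.compile(r"[0-9]{9}|[0-9]{3}-[0-9]{2}-[0-9]{4}")

# Illustrative subset; a real list would contain every supported country.
ALLOWED_COUNTRIES = frozenset({"United States", "Canada", "India"})


def is_valid_ssn(value: str) -> bool:
    """Regular expression validation: the whole input must match."""
    return SSN_PATTERN.fullmatch(value) is not None


def is_valid_country(value: str) -> bool:
    """Bounded values validation: the input must be in the predefined set."""
    return value in ALLOWED_COUNTRIES
```

Using `fullmatch` rather than `search` matters here: it rejects inputs that merely contain a valid SSN somewhere inside a longer string.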