AI can help all of us be more successful by making our day-to-day operations more efficient. It can work through significant amounts of data, spot trends, and handle routine tasks, all while we get the sleep we need. But we need to be concerned about the unintended access AI might have to our sensitive data when we use either public or private instances of an AI engine for productivity. How will the world leverage the power of AI while maintaining data security and data privacy and limiting the exposure of sensitive information?
The answer lies in the difference between what AI sees and what we as humans see. AI's pattern matching actually works at a binary, or token, level, not at the level of words and meaning. It knows it should be looking for a series of characters, but it does not care whether those characters mean anything. This is where application-agnostic encryption comes into play.
In the same way we might protect our data using deterministic encryption (encryption that always produces the same ciphertext for the same input), which preserves the relationships in a relational database, this same methodology can be used to let AI build learning models and come to the same conclusions about the data even though the data is encrypted. AI can "learn" from the data without ever having access to the underlying source data.
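As a rough illustration of the idea (not OnData's implementation), the Python sketch below uses AES-SIV, a deterministic encryption mode, to show how two tables can still be joined on an encrypted column; the table and column names are invented for the example.
```python
# A minimal sketch of deterministic encryption preserving equality joins.
# AES-SIV with no nonce is deterministic: the same plaintext always yields
# the same ciphertext, so relationships between tables survive encryption.
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

key = AESSIV.generate_key(512)          # in practice, load from a key manager
siv = AESSIV(key)

def det_encrypt(value: str) -> bytes:
    """Same input -> same ciphertext, so joins still work on encrypted data."""
    return siv.encrypt(value.encode(), associated_data=None)

# Two illustrative tables that join on a customer email
orders    = [{"order_id": 1, "email": det_encrypt("alice@example.com")}]
customers = [{"email": det_encrypt("alice@example.com"), "segment": "premium"}]

# The join succeeds on ciphertext alone -- no plaintext ever reaches the model
joined = [
    {**o, "segment": c["segment"]}
    for o in orders
    for c in customers
    if o["email"] == c["email"]
]
print(joined)  # a model can learn "this customer bought X" without the email
```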
When developing GenAI applications, we need to ensure sensitive data is not shared with external GenAI engines, such as the ChatGPT API. Specifically, we need to evaluate two scenarios: 1) sensitive data provided by users in their prompts; and 2) sensitive data in the data sources integrated with your GenAI application. For prompts entered by users, the GenAI application should automatically detect sensitive data and apply redaction or data masking so that it is never sent to external GenAI engines. For internal data sources integrated with your GenAI application, the sensitive data should be de-identified or masked so that it is never sent to, or accessed by, external GenAI engines.
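A minimal sketch of prompt-side redaction is shown below; the two regex patterns and placeholder labels are illustrative stand-ins for a full detection service, and no actual GenAI API call is made here.
```python
import re

# Illustrative patterns only -- a production system would use a proper
# classification service, not two regexes.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace detected sensitive values with typed placeholders
    before the prompt leaves your boundary."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label} REDACTED]", prompt)
    return prompt

user_prompt = "Draft a letter to jane.doe@example.com about SSN 123-45-6789."
safe_prompt = redact(user_prompt)
print(safe_prompt)
# Only safe_prompt would be forwarded to the external GenAI engine
# (e.g. a ChatGPT API call); the original prompt stays inside your perimeter.
```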
OnData provides a mechanism for identifying and protecting sensitive data in your organization once and for all. It decrypts the sensitive information only for authorized users and only at the moment they need to see it. Using this same methodology, OnData enables organizations to leverage the power of GenAI while ensuring it never has access to any sensitive information. For authorized users only, the sensitive data is seamlessly made available through your GenAI applications.
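The sketch below illustrates the general decrypt-only-for-authorized-users pattern. It is not OnData's actual API; the user IDs and entitlement list are hypothetical, and AES-SIV stands in for whatever encryption the product applies.
```python
# A generic sketch of decrypt-on-authorization: ciphertext flows through the
# GenAI application, and plaintext is restored only for entitled users.
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

key = AESSIV.generate_key(512)                 # in practice, a managed key
siv = AESSIV(key)
AUTHORIZED_USERS = {"claims_adjuster_01"}      # hypothetical entitlement list

def protect(value: str) -> bytes:
    return siv.encrypt(value.encode(), associated_data=None)

def reveal(user_id: str, ciphertext: bytes) -> str:
    """Decrypt only for authorized users, only at the moment of display."""
    if user_id not in AUTHORIZED_USERS:
        return "[REDACTED]"
    return siv.decrypt(ciphertext, associated_data=None).decode()

token = protect("alice@example.com")
print(reveal("claims_adjuster_01", token))     # alice@example.com
print(reveal("anonymous", token))              # [REDACTED]
```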
The other area where we may need to protect sensitive information is AI's access to our unstructured data. Here we may have sensitive data in spreadsheets, documents, and other files that we simply want to keep off of AI's radar because it has no need to look at this data. As we use either public or private instances of the big providers' AI engines, we need to keep these documents out of the scope of what AI is learning from. The same methodology for identifying and protecting unstructured data provided with the OnData toolset can be used to keep these documents out of the AI learning process.
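One way to picture this is a small pre-ingestion filter like the sketch below, which scans files for sensitive patterns and simply leaves flagged documents out of the corpus handed to AI; the folder name, file type, and detection rules are all assumptions for illustration.
```python
import re
from pathlib import Path

# Illustrative detection rules -- a real deployment would use a richer scanner.
SENSITIVE = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),   # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # US SSNs
]

def is_sensitive(path: Path) -> bool:
    """Flag documents containing patterns we want kept off AI's radar."""
    text = path.read_text(errors="ignore")
    return any(p.search(text) for p in SENSITIVE)

def build_corpus(folder: str) -> list[Path]:
    """Hand only clean documents to the indexing/learning pipeline;
    flagged files are simply excluded from what AI ever sees."""
    return [f for f in Path(folder).rglob("*.txt") if not is_sensitive(f)]

print(build_corpus("shared_documents"))            # hypothetical folder name
```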
As AI evolves, it is likely to become more and more a part of our daily lives, accelerating and simplifying many of our standard operations. It does this by learning how things are done, looking at facts, figures, data, and documents quickly and perpetually. OnData provides a toolset to identify and protect sensitive data, keeping it out of the hands of anyone who might misuse it, and those same methodologies provide a safer path for leveraging AI without giving it access to that sensitive information.
