Monday, December 25, 2023

How usually the data security is handled with LLMs and its hosting providers

Will Azure OpenAI Models Store my Data and Train on those? 


No, In OpenAI that is secure within the Azure service boundary. Now, having said that, the Azure OpenAI model is hosted on shared Azure tenant, when any application wants to use the Azure OpenAI models, they will call the models via a dedicated and secure API. When we configure the API, we can use a checkbox to “opt out” of Azure saving any data for audit purpose (not training but simply audit). This will ensure that Azure will not save any usage data for logging or auditing. From a training perspective, Azure confirms and states in their service contract that no data used to interact with the model is ever used in training. This is similar to service contract to like usage of Azure Blob Storage or Snowflake database or Amazon S3 etc. where we store our data in Azure (or AWS etc.) cloud with the confidence that the data is not used for anything else.


Does one's model only serve them? 

Yes at least in Azure, only one person's LLM is serving only one. There is service boundary. So, how it works is — Azure will host a dedicated OpenAI model in Azure’s own environment. That environment is managed and controlled by Azure and within Azure network and service boundary. Within that setup, Azure will create a dedicated instance for you which can only be connected and integrated with your Azure Open AI service (which is running in your Azure account) thsough keys. This means that only the applications running in your Azure account can connect to the dedicated LLM model.


What are things need to be worried when moving LLMs to production? 


1) how to scale out; 

2) how to guarantee SLAs; 

3) how to handle hundreds of concurrent calls in parallel; 

4) how to handle the token limits in LLM’s; 

5) implement content moderations and safety breaks in the solution; 

6) responsible AI; six- audibility and logging; 

7) handle hallucinations and provide confidence in response; 

8) client data protection and role/ attribute based access control; 

9) handle security and vulnerability; 

10) high availability and disaster recovery; 

11) data aging 


 

references:

https://ai.plainenglish.io/faqs-and-frequent-questions-on-large-language-models-and-genai-dc004217f161

No comments:

Post a Comment