Few business leaders would argue against the importance of data-driven decision making. Data analytics gives businesses real-time insight, which allows them to react to changes in markets and customer requirements. These benefits have driven a rapid evolution of the data warehouse, a concept that has been around for decades. First the “big data” platform Hadoop emerged. Next came “data as a service” and “data lakes.” Google BigQuery, Amazon Redshift Spectrum and Snowflake then brought enormous scale to data operations by leveraging the cloud’s elastic, distributed nature to create huge repositories that combine data warehouses and data lakes.
These technologies developed in tandem with the use cases for data. The days of submitting a request and waiting for data to become available are long past. Data is now a self-service resource that data scientists can access whenever they need it. Many companies consider data operations an essential requirement. Just as DevOps revolutionized application development, DataOps has transformed how data is stored and accessed, especially in cloud-native environments.
These self-service data models are a boon for the business, but they also present new risks. Consolidating data in large repositories can create security issues that neither self-service access nor cloud technology raises on its own, and compliance and security problems can follow. Where there are no controls, people can gain access to personally identifiable information (PII), which could breach privacy policies, security policies or regulations. Even more concerning, security teams often lose sight of data flows within an organization, making it difficult to provide consistent governance.
Satori’s Method: Security and Agile Data Governance
Drawing on their prior experience securing large data systems, Satori co-founders Yoav Cohen and Eldad Chai understood the problem. They set out to create a privacy and security control layer for DataOps. The Satori Secure Data Access Platform allows organizations to manage data governance and security across cloud data stores, regardless of the type of data store.
These data governance functions are available on the Satori platform:
Satori allows for fine-grained access control policies based on user identities, groups, data types and schema. Satori’s console and APIs allow teams to manage policies as code. The platform includes policies for implementing the NIST Cybersecurity Framework and the Payment Card Industry Data Security Standard (PCI DSS), among others.
Satori offers sophisticated data protection: Satori can mask data dynamically in accordance with minimal-access policies or enforce a particular access pattern. For example, Satori can ensure that analysts retrieve only masked PII within a specified time limit. It can also trigger a workflow in real time based on data types and user identities.
Satori allows security teams to gather information about data flows within an organization. Administrators can create granular data maps and data access audits that let them examine each user’s access to the data store. Organizations can use these data flow maps for security and business analysis. The maps show data usage by users, groups and volume as well as tags and locations.
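To make the idea of policies managed as code and dynamic masking more concrete, here is a minimal Python sketch. It is not Satori’s API or policy format: the `Policy` structure, the group names, the data type tags and the masking rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical policy-as-code structure; illustrative only, not Satori's format.
@dataclass(frozen=True)
class Policy:
    group: str       # user group the rule applies to
    data_type: str   # classified data type, e.g. "email"
    action: str      # "allow", "mask" or "deny"

POLICIES = [
    Policy(group="analysts", data_type="email", action="mask"),
    Policy(group="analysts", data_type="ssn", action="deny"),
    Policy(group="admins", data_type="email", action="allow"),
]

def decide(groups, data_type, policies=POLICIES):
    """Return the action of the first matching rule; deny when none matches."""
    for policy in policies:
        if policy.group in groups and policy.data_type == data_type:
            return policy.action
    return "deny"

def mask(value):
    """Keep the first character and replace the rest with asterisks."""
    return value[:1] + "*" * (len(value) - 1) if value else value

def apply_policies(groups, row, column_types):
    """Apply the policy decision to every classified column of one result row."""
    out = {}
    for column, value in row.items():
        data_type = column_types.get(column)
        if data_type is None:
            out[column] = value          # unclassified column: passed through
            continue
        action = decide(groups, data_type)
        if action == "allow":
            out[column] = value
        elif action == "mask":
            out[column] = mask(value)
        # "deny": the column is dropped from the result
    return out
```

In this sketch an analyst querying a row with `email` and `ssn` columns would get the email masked and the SSN column removed, while unclassified columns such as an ID pass through unchanged. Because the rules are plain data, they could be kept in version control and reviewed like any other code.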
Architecturally Speaking
Satori is a transparent proxy service. It consists of two components: the Context Engine and the Policy Engine. The Context Engine inspects queries and results asynchronously, creating a map that shows how data flows and how the organization uses it. The Policy Engine evaluates the context of each data access and applies the policy for accessing that particular type of data.
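The division of labor between the two components can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Satori’s implementation: the `POLICY` table, the event fields and the function names are hypothetical, and each query is assumed to arrive with its access context already attached.

```python
import asyncio
from collections import Counter

# Hypothetical policy table: (group, data type) -> action. Illustrative only.
POLICY = {("analysts", "pii"): "mask", ("admins", "pii"): "allow"}

def policy_engine(context):
    """Pick the action for this access context; block when no rule matches."""
    return POLICY.get((context["group"], context["data_type"]), "block")

async def context_engine(events, flow_map):
    """Consume access events off the query path and build a data flow map."""
    while True:
        event = await events.get()
        if event is None:                # sentinel: no more events
            break
        flow_map[(event["group"], event["data_type"])] += 1

async def proxy(queries):
    """Decide on each query inline while flow mapping runs as a separate task."""
    events = asyncio.Queue()
    flow_map = Counter()
    worker = asyncio.create_task(context_engine(events, flow_map))
    decisions = []
    for query in queries:
        decisions.append(policy_engine(query))  # policy decision for this access
        events.put_nowait(query)                # hand off to mapping, no waiting
    events.put_nowait(None)
    await worker
    return decisions, flow_map
```

The point of the sketch is the separation: the flow map is built by a background task fed through a queue, so recording who accessed what does not sit in the path of the policy decision.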
Satori placed high importance on reliability and low latency when implementing these functions to ensure that the Secure Data Access Platform meets the business’s security and performance requirements. Satori achieves this goal by:
A reliable network proxy is essential for performance and reliability, but not every proxy design is the best choice. Application-layer proxies can be very effective, but they require organizations to add another component to their technology stack, which increases complexity and the risk of higher latency. Satori instead relies on Nginx, a reliable and proven network proxy. According to Netcraft, Nginx served or proxied 25.75% of the most visited websites in August 2020, as well as 36.45% of all active websites. Satori does not require organizations to add application proxy elements to their technology stacks. Instead, it can concentrate on query inspection, data mapping and policy application, all of which leverage Nginx’s reliability and performance.
Dynamic inlining is used to ensure low latency. As Nginx proxies queries, Satori’s Context Engine asynchronously analyzes the combination of who the user is, what data they request and what they wish to do with that data. Satori’s Policy Engine uses dynamic inlining to interrupt connections only when necessary, applying the policies set by security teams and developers and taking the actions those policies direct. Benchmarks show no additional latency for small to medium result sets (10MB and less) and around 5 percent additional latency for large result sets (over 100MB).
Integration with existing identity and access management systems: Data security policies are effective when they operate on user identities and the data those users wish to access. Satori uses existing identity and access management (IAM) systems to identify the user identities and attributes that drive access control policies. Satori currently works with Okta, and the company plans to add Active Directory support in the near future.
Real-time mapping and classification of data: Satori’s asynchronous architecture allows it to classify and map data flows in real time without affecting performance. Satori automatically classifies the data types within a result set based on the actual data and its metadata (e.g. column or field names). It can detect occurrences of sensitive data and adjust the policy accordingly. Data mapping, combined with classification, gives data operations and security teams a clear view of where data is moving within their organizations. Security teams can use data mapping to align governance programs with how people actually use data.
Rust programming language for safety and performance: Rust is rapidly becoming the preferred choice for safe, critical software components. It is fast and reliable, with no garbage collector or runtime. Its focus on memory safety also ensures greater security and reliability. Because of the component’s importance in data operations, Satori decided to implement the Data Access Controller in Rust to ensure both performance and security.
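The classification approach described above, which looks at both metadata and the actual values in a result set, might be sketched roughly as follows. The name hints, regular expressions and 80 percent threshold are assumptions made for this illustration; they are not Satori’s classifiers, and a production system would need far more thorough detection.

```python
import re

# Hypothetical detectors; illustrative only.
NAME_HINTS = {"email": "email", "e_mail": "email", "ssn": "ssn", "phone": "phone"}
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_column(name, values, threshold=0.8):
    """Return a data type tag for one column, or None if nothing matches."""
    # Metadata first: look for a known hint inside the column name.
    lowered = name.lower()
    for hint, tag in NAME_HINTS.items():
        if hint in lowered:
            return tag
    # Then the data itself: tag the column if enough string values match.
    samples = [v for v in values if isinstance(v, str) and v]
    if not samples:
        return None
    for tag, pattern in VALUE_PATTERNS.items():
        hits = sum(1 for v in samples if pattern.match(v))
        if hits / len(samples) >= threshold:
            return tag
    return None

def classify_result_set(columns, rows):
    """Map each column name to its tag using column names and row values."""
    return {
        name: classify_column(name, [row[i] for row in rows])
        for i, name in enumerate(columns)
    }
```

A column named `contact` that holds email addresses would be tagged from its values, while a column named `user_ssn` would be tagged from its name alone. Tags like these are what a policy layer could then key on when deciding whether to mask or block a column.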