Google Cloud made its way into the lakehouse arena today with the launch of BigLake, a new storage engine that melds the governance of its data warehousing offering, BigQuery, with the flexibility of open data formats and the ability to use open compute engines.
The company also used the opening of its Data Cloud Summit to announce a preview of BigBI, which extends Looker’s semantic data layer to other BI products.
Google Cloud is no stranger to data lakes, with its Google Cloud Storage offering, which offers nearly limitless storage for less-structured data in an S3-compatible object storage system. It is also a leader in data warehousing via BigQuery, which provides traditional SQL processing for structured data.
While Google Cloud has made progress in improving the scale and flexibility of both storage repositories, customers often gravitate to one storage environment or the other depending on the type of data they’re working with, according to Sudhir Hasbe, senior director of product management at Google Cloud.
“‘This is your orders and shipments in a retail environment,’” Hasbe said during a press conference on Monday. “Then semi-structured data with clickstream comes in, and then over a period of time you have unstructured data around product images and machine as well as IoT data that we’re getting collected. And so all of these different types of data are being stored across different systems, whether it’s in data warehouses for structured data or semi-structured, or it’s data lakes for…all the other types of data. And these provide different capabilities historically, and that actually creates a lot of data silos.”
Google is melding its data warehouse with its data lake with BigLake (Image courtesy Google Cloud)
These data silos, and the problems associated with them, begin to dissipate with BigLake, Hasbe said.
“Specifically, BigLake allows companies to unify the data warehouse and lakes to analyze data without worrying about the underlying storage format or systems,” Hasbe said. “The biggest advantage [is that] you don’t have to duplicate your data across two different environments and create data silos.”
With BigLake, Google Cloud has taken the governance, security, and performance-management capabilities it already developed in BigQuery and extended them to Google Cloud Storage, the company’s data lake environment. According to Hasbe, these capabilities have also been extended to data lakes offered by AWS and Microsoft Azure.
Two other important components of BigLake are support for open standards and support for open processing engines, both of which are part of BigLake’s support for Dataplex, Google Cloud’s data fabric solution, Hasbe said. BigLake customers will be able to store their data in popular open data formats, such as Parquet and ORC, in addition to emerging formats, such as Iceberg.
BigLake will enable customers to bring the data governance and performance management capabilities of BigQuery to bear against the large data sets they have stored in these formats, the Google product manager said. “So this way, you break down the silos, you get the innovation that Google has invested in for more than a decade, and you still keep your open formats and open standards that you love as an organization,” Hasbe said.