Unity Catalog XTable Integration
This document walks through the steps to register an Apache XTable™ (Incubating) synced Delta table in Unity Catalog.
Apache XTable provides cross-table omni-directional interoperability between Apache Hudi, Apache Iceberg, and Delta Lake.
Pre-Requisites
- Source table(s) (Hudi/Iceberg) already written to an external storage location such as S3, GCS, ADLS, or a local path. This guide uses an S3 example.
- Follow the XTable installation guide here
- Clone the Unity Catalog repository from here and build the project by following the steps outlined here
To sync a source Hudi or Iceberg table with XTable, create a configuration file (called my_config.yaml in the command further down) with the following content:
sourceFormat: HUDI|ICEBERG # choose only one
targetFormats:
  - DELTA
datasets:
  -
    tableBasePath: s3://path/to/source/data
    tableName: table_name
    partitionSpec: partitionpath:VALUE
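As a rough illustration of what "choose only one" means in practice, the sketch below writes a my_config.yaml for a Hudi source from the shell. The base path, table name, and partition spec are the same placeholders used in this guide, not real values; replace them with your own.

```shell
# Sketch: write my_config.yaml for a Hudi source table.
# All values below are placeholders taken from this guide.
cat > my_config.yaml <<'EOF'
sourceFormat: HUDI
targetFormats:
  - DELTA
datasets:
  -
    tableBasePath: s3://path/to/source/data
    tableName: table_name
    partitionSpec: partitionpath:VALUE
EOF
```

For an Iceberg source, the only change to this sketch would be the sourceFormat value.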
Now, from a terminal in the cloned Apache XTable™ (Incubating) directory, run the sync with the command below. This generates the Delta Lake metadata.
java -jar xtable-utilities/target/incubator-xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig my_config.yaml
Note: At this point, if you check your bucket path, you should see a _delta_log directory containing the JSON log files.
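The check described in the note can also be scripted. The sketch below assumes a table on the local filesystem and uses a hypothetical helper name, has_delta_log; for S3, listing the same prefix with the AWS CLI (assuming it is installed and configured) serves the same purpose.

```shell
# Hypothetical helper: succeeds if the table directory given as $1
# contains at least one JSON commit file under _delta_log/.
has_delta_log() {
  ls "$1"/_delta_log/*.json >/dev/null 2>&1
}

# Usage with a local table path (placeholder):
#   has_delta_log /path/to/source/data && echo "Delta metadata found"
# For an S3 table, list the same prefix instead:
#   aws s3 ls s3://path/to/source/data/_delta_log/
```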
Configure Server Property for using S3
The server configuration file is located at etc/conf/server.properties.
To enable the server to vend temporary AWS credentials for accessing S3 buckets, set the following parameters:
- s3.bucketPath.i: The S3 path of the bucket where the data is stored. Should be in the format s3://.
- s3.accessKey.i: The AWS access key, an identifier of the temporary credentials.
- s3.secretKey.i: The AWS secret key used to sign API requests to AWS.
- s3.sessionToken.i: The AWS session token, used to verify that the request is coming from a trusted source.
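One way to fill in these parameters is sketched below. It assumes that the trailing i is a numeric index (0 for the first bucket), that the file uses standard key=value properties syntax, and that the temporary credentials are already available in the standard AWS environment variables. The helper name s3_props and the bucket name are hypothetical.

```shell
# Hypothetical helper: print the four S3 properties for one bucket.
#   $1 = index (assumed to replace the trailing "i"), $2 = bucket path
# Credentials are read from the standard AWS environment variables.
s3_props() {
  idx="$1"
  bucket="$2"
  printf 's3.bucketPath.%s=%s\n' "$idx" "$bucket"
  printf 's3.accessKey.%s=%s\n' "$idx" "$AWS_ACCESS_KEY_ID"
  printf 's3.secretKey.%s=%s\n' "$idx" "$AWS_SECRET_ACCESS_KEY"
  printf 's3.sessionToken.%s=%s\n' "$idx" "$AWS_SESSION_TOKEN"
}

# Usage, from the root of the Unity Catalog checkout (placeholder bucket):
#   s3_props 0 s3://my-bucket >> etc/conf/server.properties
```

Generating the lines from environment variables keeps the secrets out of shell history, though they still end up in the properties file, so that file should be protected accordingly.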
Run the Unity Server
bin/start-uc-server
Register the XTable-synced table in the Unity Catalog
In a separate terminal, run the following commands to register the target table in Unity Catalog.
bin/uc table create --full_name unity.default.people --columns "id INT, name STRING, age INT, city STRING, create_ts STRING" --storage_location s3://path/to/source/data
Validating the results
You can now read the table registered in Unity Catalog with the command below.
bin/uc table read --full_name unity.default.people