Unity Catalog provides a unified governance solution for all data and AI assets in your lakehouse on any cloud. As customers adopt Unity Catalog, they want to do so programmatically and automatically, using an infrastructure-as-code approach. With Unity Catalog, there is a single metastore per region, which is the top-level container of objects in Unity Catalog. It stores data assets (tables and views) and the permissions that govern access to them.
This presents a new challenge for organizations that do not have centralized platform/governance teams to own the Unity Catalog administration function. Specifically, teams within these organizations now have to collaborate on a single metastore, i.e., decide how to govern access and perform auditing in full isolation from one another.
In this blog post, we will discuss how customers can leverage the support for Unity Catalog objects in the Databricks Terraform provider to effectively manage a distributed governance pattern on the lakehouse.
We present two solutions:
As a one-off bootstrap activity, customers need to create a Unity Catalog metastore per region they operate in. This requires an account administrator, which is a highly privileged role that is only used in break-glass scenarios, i.e., a username & password stored in a secret vault that requires approval workflows before being used in automated pipelines.
An account administrator needs to authenticate using their username & password on AWS:
provider "databricks" {
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
  username   = var.databricks_account_username
  password   = var.databricks_account_password
}
Or using their AAD token on Azure:
provider "databricks" {
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.databricks_account_id
  auth_type  = "azure-cli" # or azure-client-secret or azure-msi
}
The Databricks account admin needs to provision the metastore and its default data access configuration. The Terraform code will be similar to the below (AWS example):
resource "databricks_metastore" "this" {
  name          = "primary"
  storage_root  = var.central_bucket
  owner         = var.unity_admin_group
  force_destroy = true
}

resource "databricks_metastore_data_access" "this" {
  metastore_id = databricks_metastore.this.id
  name         = aws_iam_role.metastore_data_access.name
  aws_iam_role {
    role_arn = aws_iam_role.metastore_data_access.arn
  }
  is_default = true
}
Teams can choose not to use this default location and identity for their tables by setting a location and identity for managed tables per individual catalog, or at an even more fine-grained level, per schema. When managed tables are created, the data will be stored using the schema location (if present), falling back to the catalog location (if present), and only falling back to the metastore location if neither of the prior two locations has been set.
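As an illustrative sketch of this override hierarchy (the names team_catalog, team_schema, and the abfss URLs are placeholders, not from this post), a team could pin managed-table storage at the catalog and schema level like this:

```hcl
# Hypothetical example: override the metastore-level default managed location.
# Managed tables in this catalog land in the catalog storage_root, unless a
# schema below overrides it.
resource "databricks_catalog" "team_catalog" {
  metastore_id = databricks_metastore.this.id
  name         = "team_catalog"
  storage_root = "abfss://container@storageaccountname.dfs.core.windows.net/team_catalog"
}

# Schema-level managed location: takes precedence over the catalog location.
resource "databricks_schema" "team_schema" {
  catalog_name = databricks_catalog.team_catalog.name
  name         = "team_schema"
  storage_root = "abfss://container@storageaccountname.dfs.core.windows.net/team_schema"
}
```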
When creating the metastore, we nominated the unity_admin_group as the metastore administrator. To avoid having a central authority that can list and manage access to all objects in the metastore, we will keep this group empty:
resource "databricks_group" "admin_group" {
  display_name = var.unity_admin_group
}
Users can be added to the group for exceptional break-glass scenarios which require a highly powered admin (e.g., setting up initial access, or changing ownership of a catalog if the catalog owner leaves the organization).
resource "databricks_user" "break_glass" {
  for_each  = toset(var.break_glass_users)
  user_name = each.key
  force     = true
}

resource "databricks_group_member" "admin_group_member" {
  for_each  = toset(var.break_glass_users)
  group_id  = databricks_group.admin_group.id
  member_id = databricks_user.break_glass[each.value].id
}
Each team is responsible for creating their own catalogs and managing access to their data. Initial bootstrap actions are required for each new team to get the privileges needed to operate independently.
The account admin then needs to perform the following:
Create a group named team-admins
Grant CREATE CATALOG, CREATE EXTERNAL LOCATION, and optionally CREATE SHARE, CREATE PROVIDER, and CREATE RECIPIENT (if using Delta Sharing) to this group
resource "databricks_group" "team_admins" {
  display_name = "team-admins"
}

resource "databricks_grants" "sandbox" {
  metastore = databricks_metastore.this.id
  grant {
    principal  = databricks_group.team_admins.display_name
    privileges = ["CREATE_CATALOG", "CREATE_EXTERNAL_LOCATION", "CREATE_SHARE", "CREATE_PROVIDER", "CREATE_RECIPIENT"]
  }
}
When a new team onboards, place the trusted team admins in the team-admins group:
resource "databricks_user" "team_admins" {
  for_each  = toset(var.team_admins)
  user_name = each.key
  force     = true
}

resource "databricks_group_member" "team_admin_group_member" {
  for_each  = toset(var.team_admins)
  group_id  = databricks_group.team_admins.id
  member_id = databricks_user.team_admins[each.value].id
}
Members of the team-admins group can now easily create new catalogs and external locations for their team without any interaction from the account administrator or metastore administrator.
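For example (a sketch; the catalog name, bucket path, and credential name are illustrative, not from this post), a member of team-admins could self-serve a catalog and an external location:

```hcl
# Executed by a member of team-admins, not by the account admin.
resource "databricks_catalog" "team_sandbox" {
  name    = "team_sandbox"
  comment = "Catalog owned and managed by the team"
}

resource "databricks_external_location" "team_landing" {
  name            = "team_landing"
  url             = "s3://team-landing-bucket/path" # illustrative bucket
  credential_name = "team_credential"               # assumes a credential the team created earlier
  comment         = "Landing zone for the team"
}
```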
During the process of adding a new team to Databricks, initial actions from an account administrator are required so that the new team is free to set up their workspaces and data assets to their preference:
resource "databricks_group" "team_X_admins" {
  display_name = "team_X_admins"
}

resource "databricks_user" "team_X_admins" {
  for_each  = toset(var.team_X_admins)
  user_name = each.key
  force     = true
}

resource "databricks_group_member" "team_X_admin_group_member" {
  for_each  = toset(var.team_X_admins)
  group_id  = databricks_group.team_X_admins.id
  member_id = databricks_user.team_X_admins[each.value].id
}
resource "databricks_storage_credential" "external" {
  name = "team_X_credential"
  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.ext_access_connector.id
  }
  comment = "Managed by TF"
  owner   = databricks_group.team_X_admins.display_name
}

resource "databricks_metastore_assignment" "this" {
  workspace_id         = var.databricks_workspace_id
  metastore_id         = databricks_metastore.this.id
  default_catalog_name = "hive_metastore"
}
Some organizations may not want to make teams autonomous in creating assets in their central metastore. In fact, giving multiple teams the ability to create such assets can be difficult to govern: naming conventions cannot be enforced, and keeping the environment clean is hard.
In such a scenario, we suggest a model where each team files a request with a list of the assets they want admins to create for them. The team will be made owner of those assets so that they can be autonomous in assigning permissions to others.
To automate such requests as much as possible, we present how this can be done using CI/CD. The admin team owns a central repository in their preferred versioning system containing all the scripts that deploy Databricks in their organization. Each team is allowed to create branches on this repository to add the Terraform configuration files for their own environments using a predefined template (Terraform module). When the team is ready, they create a pull request. At this point, the central admin has to review the pull request (this can also be automated with the appropriate checks) and merge it to the main branch, which will trigger the deployment of the resources for the team.
This approach gives more control over what individual teams do, but it entails some (limited, automatable) actions on the central admin team's side.
In this scenario, the Terraform scripts below are executed automatically by the CI/CD pipelines using a service principal (00000000-0000-0000-0000-000000000000), which is made account admin. The one-off operation of making this service principal an account admin must be performed manually by an existing account admin, for example:
resource "databricks_service_principal" "sp" {
  application_id = "00000000-0000-0000-0000-000000000000"
}

resource "databricks_service_principal_role" "sp_account_admin" {
  service_principal_id = databricks_service_principal.sp.id
  role                 = "account admin"
}
When a new team wants to be onboarded, they need to file a request that will create the following objects (Azure example):
A group named team_X_admins, which contains the account admin service principal (to allow future modifications to the assets) plus the members of the team
resource "databricks_group" "team_X_admins" {
  display_name = "team_X_admins"
}

resource "databricks_user" "team_X_admins" {
  for_each  = toset(var.team_X_admins)
  user_name = each.key
  force     = true
}

resource "databricks_group_member" "team_X_admin_group_member" {
  for_each  = toset(var.team_X_admins)
  group_id  = databricks_group.team_X_admins.id
  member_id = databricks_user.team_X_admins[each.value].id
}
data "databricks_service_principal" "service_principal_admin" {
  application_id = "00000000-0000-0000-0000-000000000000"
}

resource "databricks_group_member" "service_principal_admin_member" {
  group_id  = databricks_group.team_X_admins.id
  member_id = data.databricks_service_principal.service_principal_admin.id
}
resource "azurerm_resource_group" "this" {
  name     = var.resource_group_name
  location = var.resource_group_region
}

resource "azurerm_databricks_workspace" "this" {
  name                = var.databricks_workspace_name
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  sku                 = "premium"
}

resource "azurerm_storage_account" "this" {
  name                     = var.storage_account_name
  resource_group_name      = azurerm_resource_group.this.name
  location                 = azurerm_resource_group.this.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = "true"
}

resource "azurerm_storage_container" "container" {
  name                  = "container"
  storage_account_name  = azurerm_storage_account.this.name
  container_access_type = "private"
}
resource "azurerm_databricks_access_connector" "this" {
  name                = var.databricks_access_connector_name
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  identity {
    type = "SystemAssigned"
  }
}
resource "azurerm_role_assignment" "this" {
  scope                = azurerm_storage_account.this.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.this.identity[0].principal_id
}
resource "databricks_metastore_assignment" "this" {
  metastore_id = databricks_metastore.this.id
  workspace_id = azurerm_databricks_workspace.this.workspace_id
}
resource "databricks_storage_credential" "storage_credential" {
  name = "mi_credential"
  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.this.id
  }
  comment = "Managed identity credential managed by TF"
  owner   = databricks_group.team_X_admins.display_name
}
resource "databricks_external_location" "external_location" {
  name = "external"
  url = format("abfss://%s@%s.dfs.core.windows.net/",
    "container",
    "storageaccountname"
  )
  credential_name = databricks_storage_credential.storage_credential.id
  comment         = "Managed by TF"
  owner           = databricks_group.team_X_admins.display_name
  depends_on = [
    databricks_metastore_assignment.this, databricks_storage_credential.storage_credential
  ]
}
resource "databricks_catalog" "this" {
  metastore_id = databricks_metastore.this.id
  name         = var.databricks_catalog_name
  comment      = "This catalog is managed by Terraform"
  owner        = databricks_group.team_X_admins.display_name
  storage_root = format("abfss://%s@%s.dfs.core.windows.net/managed_catalog",
    "container",
    "storageaccountname"
  )
}
Once these objects are created, the team is autonomous in developing their project, giving access to other team members and/or partners if necessary.
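For instance (a sketch; the group name team_X_users is hypothetical), the team, as catalog owner, can grant access on their own without involving the central admins:

```hcl
# Executed by the team (catalog owner), not by the central admins.
resource "databricks_grants" "team_catalog_grants" {
  catalog = databricks_catalog.this.name
  grant {
    principal  = "team_X_users" # hypothetical group of team members
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]
  }
}
```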
Teams are not allowed to modify assets in Unity Catalog autonomously, however. To do so, they file a new request with the central team by modifying the files they created and making a new pull request.
The same applies if they need to create new assets such as new storage credentials, external locations, and catalogs.
Above, we walked through some guidelines on leveraging built-in product features and recommended best practices to address enablement and ongoing administration hurdles for Unity Catalog.
Visit the Unity Catalog documentation [AWS, Azure] and our Unity Catalog Terraform guide [AWS, Azure] to learn more.