azure-datalake

datarootsio/azure-datalake/module


Terraform module for an Azure Data Lake

Install

```hcl
module "azure-datalake" {
  source  = "datarootsio/azure-datalake/module"
  version = "0.5.13"
}
```
README

Terraform module Azure Data Lake

This is a module for Terraform that deploys a complete and opinionated data lake network on Microsoft Azure.

[maintained by dataroots](https://dataroots.io) · [Terraform 0.13](https://www.terraform.io) · [Terraform Registry](https://registry.terraform.io/modules/datarootsio/azure-datalake/module/) · [tests](https://github.com/datarootsio/terraform-module-azure-datalake/actions) · [Go Report Card](https://goreportcard.com/report/github.com/datarootsio/terraform-module-azure-datalake)

Components

- Azure Data Factory for data ingestion from various sources
- Azure Data Lake Storage gen2 containers to store data for the data lake layers
- Azure Databricks to clean and transform the data
- Azure Synapse Analytics to store presentation data
- Azure CosmosDB to store metadata

Inputs (43)
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| key_vault_depends_on | string | Optionally set to a dependency for the Key Vault secrets (e.g. access policy) | required |
| data_warehouse_dtu | string | Service objective (DTU) for the created data warehouse (e.g. DW100c) | required |
| storage_replication | string | Type of replication for the storage accounts. See https://www.terraform.io/docs/… | required |
| sql_server_admin_password | string | Password of the administrator of the SQL server | required |
| data_factory_vsts_tenant_id | string | Optional tenant ID for the VSTS back-end for the created Azure Data Factory. You… | required |
| data_lake_name | string | Name of the data lake (has to be globally unique) | required |
| service_principal_client_id | string | Client ID of the existing service principal that will be used for communication… | required |
| service_principal_client_secret | string | Client secret of the existing service principal that will be used for communicat… | required |
| service_principal_object_id | string | Object ID of the existing service principal that will be used for communication… | required |
| resource_group_name | string | Name of the resource group where the resources should be created | required |
| sql_server_admin_username | string | Username of the administrator of the SQL server | required |
| region | string | Region in which to create the resources | required |
| use_key_vault | bool | Set this to true to enable the usage of your existing Key Vault | false |
| data_factory_vsts_project_name | string | Optional project name for the VSTS back-end for the created Azure Data Factory.… | "" |
| data_factory_vsts_repository_name | string | Optional repository name for the VSTS back-end for the created Azure Data Factor… | "" |
| data_factory_github_repository_name | string | Optional repository name for the GitHub back-end for the created Azure Data Fact… | "" |
| dl_acl | map(string) | Optional set of ACL to set on the filesystem roots inside the data lake. This is… | {} |
| provision_synapse | bool | Set this to false to disable the creation of the Synapse Analytics instance. | true |
| databricks_workspace_name | string | Due to changes in how Terraform modules can use provider configurations, the mod… | "" |
| provision_databricks_resources | bool | Set this to true to provision all Databricks related resources. | false |
| data_factory_github_branch_name | string | Optional branch name for the GitHub back-end for the created Azure Data Factory. | "" |
| databricks_max_workers | number | Maximum amount of workers in an active cluster | 4 |
| data_factory_vsts_account_name | string | Optional account name for the VSTS back-end for the created Azure Data Factory.… | "" |
| data_factory_vsts_root_folder | string | Optional root folder for the VSTS back-end for the created Azure Data Factory. Y… | "" |
| dl_directories | map(map(string)) | Optional root directories to be created inside the data lake. The value is a map… | {} |
| data_lake_filesystems | list(string) | A list of filesystems to create inside the storage account | ["raw", "clean", "curated", "in… |
| log_analytics_workspace_id | string | Optional Log Analytics Workspace ID where logs are stored | "" |
| databricks_cluster_node_type | string | Node type of the Databricks cluster machines | "Standard_F4s" |
| key_vault_id | string | ID of the optional Key Vault. The module will store all relevant secrets inside… | "" |
| data_factory_github_git_url | string | Optional Git URL (either https://github.mycompany.com or https://github.com) for… | "" |
| databricks_cluster_version | string | Runtime version of the Databricks cluster | "7.2.x-scala2.12" |
| databricks_workspace_resource_group_name | string | Due to changes in how Terraform modules can use provider configurations, the mod… | "" |
… and 3 more inputs
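Putting the required inputs above together, a minimal module call might look like the following sketch. All names and values here (the region, resource group, DTU tier, and the `var.*` references for credentials) are illustrative placeholders, not values prescribed by the module:

```hcl
module "azure-datalake" {
  source  = "datarootsio/azure-datalake/module"
  version = "0.5.13"

  # Basic naming and placement (illustrative values)
  data_lake_name      = "exampledatalake" # has to be globally unique
  region              = "westeurope"
  resource_group_name = "rg-datalake-example"
  storage_replication = "LRS"

  # Synapse / SQL settings; credentials assumed to come from your own variables
  data_warehouse_dtu        = "DW100c"
  sql_server_admin_username = "sqladmin"
  sql_server_admin_password = var.sql_admin_password

  # Existing service principal used by the module for communication
  service_principal_client_id     = var.sp_client_id
  service_principal_client_secret = var.sp_client_secret
  service_principal_object_id     = var.sp_object_id
}
```

Passing the secrets through `var.*` inputs (rather than hard-coding them) keeps credentials out of version control; they can then be supplied via a `.tfvars` file or environment variables.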
Outputs (9)
storage_dfs_endpoint — Primary DFS endpoint of the created storage account
storage_account_name — Name of the created storage account for ADLS
data_factory_name — Name of the created Data Factory
data_factory_id — Resource ID of the Data Factory
sql_dw_server_hostname — Name of the SQL server that hosts the Azure Synapse Analytics instance
sql_dw_server_database — Name of the Azure Synapse Analytics instance
data_factory_identity — Object ID of the managed identity of the created Data Factory
name — Name of the data lake
created_key_vault_secrets — Secrets that have been created inside the optional Key Vault with their versions
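The outputs above can be re-exported or wired into other resources in the calling configuration. A small sketch (the output names on the left are arbitrary; the module attribute names come from the list above):

```hcl
# Expose selected module outputs from the root configuration
output "datalake_storage_account" {
  value = module.azure-datalake.storage_account_name
}

output "datalake_dfs_endpoint" {
  value = module.azure-datalake.storage_dfs_endpoint
}

output "synapse_server_hostname" {
  value = module.azure-datalake.sql_dw_server_hostname
}
```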
Resources (23)
azurerm_cosmosdb_account, azurerm_cosmosdb_sql_container, azurerm_cosmosdb_sql_database, azurerm_data_factory, azurerm_data_factory_linked_service_data_lake_storage_gen2, azurerm_key_vault_access_policy, azurerm_key_vault_secret, azurerm_monitor_diagnostic_setting, azurerm_role_assignment, azurerm_sql_database, azurerm_sql_firewall_rule, azurerm_sql_server, azurerm_storage_account, azurerm_storage_data_lake_gen2_filesystem, azurerm_template_deployment, databricks_azure_adls_gen2_mount, databricks_cluster, databricks_instance_pool, databricks_secret, databricks_secret_scope, databricks_token, local_file, null_resource
Details
Framework: Terraform Module
Language: HCL
Version: 0.5.13
Stars: 30
Forks: 22
Total downloads: 1.9k
Inputs: 43
Outputs: 9
Resources: 23
License: MIT
Namespace: datarootsio