Enable cross-account sharing with direct IAM principals using AWS Lake Formation Tags

With AWS Lake Formation, you can build data lakes with multiple AWS accounts in a variety of ways. For example, you could build a data mesh, implementing a centralized data governance model and decoupling data producers from the central governance. Such data lakes enable the data as an asset paradigm and unleash new possibilities with data discovery and exploration across organization-wide consumers. While enabling the power of data in decision-making across your organization, it’s also crucial to secure the data. With Lake Formation, sharing datasets across accounts only requires a few simple steps, and you can control what you share.

Lake Formation has launched Version 3 capabilities for sharing AWS Glue Data Catalog resources across accounts. When moving to Lake Formation cross-account sharing V3, you get several benefits. When moving from V1, you get more optimized usage of AWS Resource Access Manager (AWS RAM) to scale sharing of resources. When moving from V2, you get a few enhancements. First, you don’t have to maintain AWS Glue resource policies to share using LF-tags because Version 3 uses AWS RAM. Second, you can share with AWS Organizations using LF-tags. Third, you can share to individual AWS Identity and Access Management (IAM) users and roles in other accounts, thereby providing data owners control over which individuals can access their data.

Lake Formation tag-based access control (LF-TBAC) is an authorization strategy that defines permissions based on attributes called LF-tags. LF-tags are different from IAM resource tags and are associated only with Lake Formation databases, tables, and columns. LF-TBAC allows you to define the grant and revoke permissions policy by grouping Data Catalog resources, and therefore helps in scaling permissions across a large number of databases and tables. LF-tags are inherited from a database to all its tables and all the columns of each table.

Version 3 offers the following benefits:

True central governance with cross-account sharing to specific IAM principals in the target account
Ease of use in not having to maintain an AWS Glue resource policy for LF-TBAC
Efficient reuse of AWS RAM shares
Ease of use in scaling to hundreds of accounts with LF-TBAC

In this post, we illustrate the new features of cross-account sharing Version 3 in a producer-consumer scenario using TPC datasets. We walk through the setup of using LF-TBAC to share data catalog resources from the data producer account to direct IAM users in the consumer account. We also go through the steps in the receiving account to accept the shares and query the data.

Solution overview

To demonstrate the Lake Formation cross-account Version 3 features, we use the TPC datasets available at s3://aws-data-analytics-workshops/shared_datasets/tpcparquet/. The solution consists of steps in both accounts.

In account A, complete the following steps:

As a data producer, register the dataset with Lake Formation and create AWS Glue Data Catalog tables.
Create LF-tags and associate them with the database and tables.
Grant LF-tag based permissions on resources directly to personas in consumer account B.

The following steps take place in account B:

The consumer account data lake admin reviews and accepts the AWS RAM invitations.
The data lake admin gives CREATE DATABASE access to the IAM user lf-business-analysts.
The data lake admin creates a database for the marketing team and grants CREATE TABLE access to lf-campaign-manager.
The IAM users create resource links on the shared database and tables and query them in Amazon Athena.

The producer account A has the following personas:

Data lake admin – Manages the data lake in the producer account

lf-producersteward – Manages the data and user access

The consumer account B has the following personas:

Data lake admin – Manages the data lake in the consumer account

lf-business-analysts – The business analysts in the sales team need access to non-PII data

lf-campaign-manager – The manager in the marketing team needs access to data related to products and promotions

Prerequisites

You need the following prerequisites:

Two AWS accounts. For this demonstration of how AWS RAM invites are created and accepted, you should use two accounts that are not part of the same organization.
An admin IAM user in both accounts to launch the AWS CloudFormation stacks.
Lake Formation mode enabled in both the producer and consumer accounts with cross-account Version 3. For instructions, refer to Change the default permission model.

Lake Formation and AWS CloudFormation setup in account A

To keep the setup simple, we have an IAM admin registered as the data lake admin.

Sign in to the AWS Management Console in the us-east-1 Region.
On the Lake Formation console, under Permissions in the navigation pane, choose Administrative roles and tasks.
Under Data lake administrators, choose Choose Administrators.
In the pop-up window Manage data lake administrators, under IAM users and roles, choose your IAM admin user and choose Save.
Choose Launch Stack to deploy the CloudFormation template:

Choose Next.
Provide a name for the stack and choose Next.
On the next page, choose Next.
Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
Choose Create.

Stack creation should take about 2–3 minutes. The stack establishes the producer setup as follows:

Creates an Amazon Simple Storage Service (Amazon S3) data lake bucket
Registers the data lake bucket with Lake Formation
Creates an AWS Glue database and tables
Creates an IAM user (lf-producersteward) who will act as producer steward
Creates LF-tags and assigns them to the created catalog resources as specified in the following table

| Database | Table | LF-Tag Key | LF-Tag Value | Resource Tagged |
|---|---|---|---|---|
| lftpcdb | - | Sensitivity | Public | DATABASE |
| lftpcdb | items | HasCampaign | true | TABLE |
| lftpcdb | promotions | HasCampaign | true | TABLE |
| lftpcdb | customers (columns c_last_name, c_first_name, c_email_address) | Sensitivity | Confidential | TABLECOLUMNS |
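The CloudFormation stack makes these tag assignments for you. For readers who prefer to see them as API calls, the following is a minimal sketch using the boto3 Lake Formation client (create_lf_tag and add_lf_tags_to_resource). It is not the template's actual implementation. The function names are hypothetical, and the full list of allowed values per tag key is an assumption based on the values used in this post.

```python
"""Sketch: the LF-tag assignments from the table above as boto3 requests."""

# LF-tag keys with their allowed values (value lists are an assumption).
LF_TAGS = {
    "Sensitivity": ["Public", "Confidential"],
    "HasCampaign": ["true"],
}

# Columns of the customers table that carry the Confidential override.
PII_COLUMNS = ["c_last_name", "c_first_name", "c_email_address"]


def build_tag_assignments(database="lftpcdb"):
    """Return one add_lf_tags_to_resource request per row of the table."""

    def tag(key, value):
        return [{"TagKey": key, "TagValues": [value]}]

    return [
        {"Resource": {"Database": {"Name": database}},
         "LFTags": tag("Sensitivity", "Public")},
        {"Resource": {"Table": {"DatabaseName": database, "Name": "items"}},
         "LFTags": tag("HasCampaign", "true")},
        {"Resource": {"Table": {"DatabaseName": database, "Name": "promotions"}},
         "LFTags": tag("HasCampaign", "true")},
        {"Resource": {"TableWithColumns": {"DatabaseName": database,
                                           "Name": "customers",
                                           "ColumnNames": PII_COLUMNS}},
         "LFTags": tag("Sensitivity", "Confidential")},
    ]


def apply_tags(region="us-east-1"):
    """Create the LF-tags and attach them (needs boto3 and admin credentials)."""
    import boto3  # lazy import so the request builder works without boto3

    lf = boto3.client("lakeformation", region_name=region)
    for key, values in LF_TAGS.items():
        lf.create_lf_tag(TagKey=key, TagValues=values)
    for request in build_tag_assignments():
        lf.add_lf_tags_to_resource(**request)


if __name__ == "__main__":
    # Print the requests only; nothing is sent to AWS here.
    for request in build_tag_assignments():
        print(request)
```

Keeping the request construction separate from the API calls lets you review exactly what would be tagged before running apply_tags() with data lake admin credentials.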

Verify permissions in account A

After the CloudFormation stack launches, complete the following steps in account A:

On the AWS CloudFormation console, navigate to the Outputs tab of the stack.

Choose the LFProducerStewardCredentials value to navigate to the AWS Secrets Manager console.
In the Secret value section, choose Retrieve secret value.
Note down the secret value for the password for IAM user lf-producersteward.

You need this to log in to the console later as the user lf-producersteward.

On the Lake Formation console, choose Databases on the navigation pane.
Open the database lftpcdb.
Verify the LF-tags on the database are created.

Choose View tables and choose the items table to verify the LF-tags.

Repeat the steps for the promotions and customers tables to verify the LF-tags assigned.

On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
Select the database lftpcdb and on the Actions menu, choose View Permissions.
Verify that there are no default permissions granted on the database lftpcdb for IAMAllowedPrincipals.
If you find any, select the permission and choose Revoke to revoke the permission.
On the AWS Management Console, choose the AWS CloudShell icon on the top menu.

This opens AWS CloudShell in another tab of the browser. Allow a few minutes for the CloudShell environment to set up.

Run the following AWS Command Line Interface (AWS CLI) command after replacing ${BUCKET_NAME} with the DataLakeBucket value from the stack output.

aws s3 cp s3://aws-data-analytics-workshops/shared_datasets/tpcparquet/ s3://${BUCKET_NAME}/tpcparquet/ --recursive

If CloudShell isn’t available in your chosen Region, run the same AWS CLI command from your preferred AWS CLI environment as the IAM admin user.

Verify that your S3 bucket has the dataset copied in it.
Log out as the IAM admin user.

Grant permissions in account A

Next, we continue granting Lake Formation permissions to the dataset as a data steward within the producer account. The data steward grants the following LF-tag-based permissions to the consumer personas.

| Consumer Persona | LF-tag Policy |
|---|---|
| lf-business-analysts | Sensitivity=Public |
| lf-campaign-manager | HasCampaign=true |

Log in to account A as user lf-producersteward, using the password you noted from Secrets Manager earlier.
On the Lake Formation console, under Permissions in the navigation pane, choose Data lake permissions.
Choose Grant.
Under Principals, select External accounts.
Provide the ARN of the IAM user in the consumer account (arn:aws:iam::<accountB_id>:user/lf-business-analysts) and press Enter.

Under LF-Tags or catalog resources, select Resources matched by LF-Tags.
Choose Add LF-Tag to add a new key-value pair.
For the key, choose Sensitivity and for the value, choose Public.
Under Database permissions, select Describe, and under Table permissions, select Select and Describe.

Choose Grant to apply the permissions.
On the Lake Formation console, under Permissions in the navigation pane, choose Data lake permissions.
Choose Grant.
Under Principals, select External accounts.
Provide the ARN of the IAM user in the consumer account (arn:aws:iam::<accountB_id>:user/lf-campaign-manager) and press Enter.
Under LF-Tags or catalog resources, select Resources matched by LF-Tags.
Choose Add LF-Tag to add a new key-value pair.
For the key, choose HasCampaign and for the value, choose true.

Under Database permissions, select Describe, and under Table permissions, select Select and Describe.
Choose Grant to apply the permissions.

Verify on the Data lake permissions tab that the permissions you have granted show up correctly.
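The console grants above can also be expressed as Lake Formation grant_permissions requests. The following is a minimal sketch with boto3, not the only way to script this. The account IDs are placeholders, the function names are hypothetical, and we assume the database-level and table-level permissions are sent as two separate requests per persona because an LF-tag policy resource names a single resource type.

```python
"""Sketch: LF-tag based cross-account grants to direct IAM principals."""

PRODUCER_ACCOUNT_ID = "111122223333"  # placeholder for account A
CONSUMER_ACCOUNT_ID = "444455556666"  # placeholder for account B

# Consumer persona -> (LF-tag key, LF-tag value), as in the policy table.
PERSONA_POLICIES = {
    "lf-business-analysts": ("Sensitivity", "Public"),
    "lf-campaign-manager": ("HasCampaign", "true"),
}


def lf_tag_grants(user_name, tag_key, tag_value):
    """Return the database-level and table-level requests for one persona."""
    principal = f"arn:aws:iam::{CONSUMER_ACCOUNT_ID}:user/{user_name}"

    def request(resource_type, permissions):
        return {
            "Principal": {"DataLakePrincipalIdentifier": principal},
            "Resource": {"LFTagPolicy": {
                "CatalogId": PRODUCER_ACCOUNT_ID,
                "ResourceType": resource_type,
                "Expression": [{"TagKey": tag_key, "TagValues": [tag_value]}],
            }},
            "Permissions": permissions,
        }

    return [request("DATABASE", ["DESCRIBE"]),
            request("TABLE", ["SELECT", "DESCRIBE"])]


def grant_all(region="us-east-1"):
    """Send every grant (needs boto3 and the lf-producersteward credentials)."""
    import boto3  # lazy import so building the requests does not need boto3

    lf = boto3.client("lakeformation", region_name=region)
    for user_name, (key, value) in PERSONA_POLICIES.items():
        for req in lf_tag_grants(user_name, key, value):
            lf.grant_permissions(**req)


if __name__ == "__main__":
    # Print the requests only; nothing is sent to AWS here.
    for user_name, (key, value) in PERSONA_POLICIES.items():
        for req in lf_tag_grants(user_name, key, value):
            print(req)
```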

AWS CloudFormation setup in account B

Complete the following steps in the consumer account:

Log in as an IAM admin user in account B and launch the CloudFormation stack:

Choose Next.
Provide a name for the stack, then choose Next.
On the next page, choose Next.
Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
Choose Create.

Stack creation should take about 2–3 minutes. The stack sets up the following resources in account B:

IAM users datalakeadmin1, lf-business-analysts, and lf-campaign-manager, with relevant IAM and Lake Formation permissions
A database called db_for_shared_tables with CREATE_TABLE permission granted to the lf-campaign-manager user
An S3 bucket named lfblog-athenaresults-<your-accountB-id>-us-east-1 with ListBucket and write permissions to lf-business-analysts and lf-campaign-manager

Note down the stack output details.

Accept resource shares in account B

After you launch the CloudFormation stack, complete the following steps in account B:

On the CloudFormation stack Outputs tab, choose the link for DataLakeAdminCredentials.

This takes you to the Secrets Manager console.

On the Secrets Manager console, choose Retrieve secret value and copy the password for the data lake admin user.
Use the ConsoleIAMLoginURL value from the CloudFormation template output to log in to account B with the data lake admin user name datalakeadmin1 and the password you copied from Secrets Manager.
Open the AWS RAM console in another browser tab.
In the navigation pane, under Shared with me, choose Resource shares to view the pending invitations.

You should see two resource share invitations from the producer account A: one for database-level share and one for table-level share.

Choose each resource share link, review the details, and choose Accept.

After you accept the invitations, the status of the resource shares changes from Pending to Active.
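If you prefer to accept the invitations programmatically, the following is a minimal sketch using the boto3 AWS RAM client (get_resource_share_invitations and accept_resource_share_invitation). The helper names and the account ID are hypothetical, and for brevity the sketch reads only the first page of invitations.

```python
"""Sketch: accept pending AWS RAM invitations sent by the producer account."""

PRODUCER_ACCOUNT_ID = "111122223333"  # placeholder for account A


def pending_invitation_arns(invitations, sender_account_id):
    """Pick the ARNs of invitations that are still pending from one sender."""
    return [
        inv["resourceShareInvitationArn"]
        for inv in invitations
        if inv.get("status") == "PENDING"
        and inv.get("senderAccountId") == sender_account_id
    ]


def accept_pending(sender_account_id=PRODUCER_ACCOUNT_ID, region="us-east-1"):
    """Accept each pending invitation (needs boto3 and admin credentials)."""
    import boto3  # lazy import so the filter above works without boto3

    ram = boto3.client("ram", region_name=region)
    # First page only; follow nextToken if you expect many invitations.
    response = ram.get_resource_share_invitations()
    arns = pending_invitation_arns(
        response["resourceShareInvitations"], sender_account_id)
    for arn in arns:
        ram.accept_resource_share_invitation(resourceShareInvitationArn=arn)
    return arns
```

Filtering on the sender account ID is a deliberate choice here: it avoids accepting shares from accounts you did not expect.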

Grant permissions in account B

To grant permissions in account B, complete the following steps:

On the Lake Formation console, under Permissions on the navigation pane, choose Administrative roles and tasks.

Under Database creators, choose Grant.

Under IAM users and roles, choose lf-business-analysts.
For Catalog permissions, select Create database.
Choose Grant.
Log out of the console as the data lake admin user.
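The database creator grant above corresponds to a CREATE_DATABASE permission on the Data Catalog itself. As a hedged sketch, the equivalent boto3 request could look like the following. The account ID is a placeholder and the function names are hypothetical.

```python
"""Sketch: make lf-business-analysts a database creator in account B."""

CONSUMER_ACCOUNT_ID = "444455556666"  # placeholder for account B


def database_creator_grant(user_name):
    """Build a grant_permissions request for CREATE_DATABASE on the catalog."""
    return {
        "Principal": {
            "DataLakePrincipalIdentifier":
                f"arn:aws:iam::{CONSUMER_ACCOUNT_ID}:user/{user_name}",
        },
        # An empty Catalog resource refers to the caller's own Data Catalog.
        "Resource": {"Catalog": {}},
        "Permissions": ["CREATE_DATABASE"],
    }


def grant_database_creator(user_name="lf-business-analysts", region="us-east-1"):
    """Send the grant (needs boto3 and data lake admin credentials)."""
    import boto3  # lazy import so the request builder works without boto3

    lf = boto3.client("lakeformation", region_name=region)
    lf.grant_permissions(**database_creator_grant(user_name))
```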

Query the shared datasets as consumer users

To validate the lf-business-analysts user’s data access, perform the following steps:

Log in to the console as lf-business-analysts, using the credentials noted from the CloudFormation stack output.
On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.

Select the database lftpcdb and on the Actions menu, choose Create resource link.

Under Resource link name, enter rl_lftpcdb.
Choose Create.
After the resource link is created, select the resource link and choose View tables.

You can now see the four tables in the shared database.

Open the Athena console in another browser tab and choose the lfblog-athenaresults-<your-accountB-id>-us-east-1 bucket as the query results location.
Verify data access using the following query (for more information, refer to Running SQL queries using Amazon Athena):

SELECT * FROM rl_lftpcdb.customers LIMIT 10;

The following screenshot shows the query output.

Notice that account A shared the database lftpcdb to account B using the LF-tag expression Sensitivity=Public. Columns c_first_name, c_last_name, and c_email_address in the customers table were tagged at the column level with Sensitivity=Confidential, which overrides the value inherited from the database. Therefore, these three columns are not visible to the user lf-business-analysts.
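To make the inheritance and override behavior concrete, the following toy model resolves the effective LF-tags for a column and checks them against a grant expression. It is an illustration of the rule described in this post, not Lake Formation's implementation. The function names are hypothetical, and the customers column list is an assumed subset chosen for the example.

```python
"""Toy model of LF-tag inheritance: database -> table -> column."""

DATABASE_TAGS = {"Sensitivity": "Public"}
TABLE_TAGS = {
    "items": {"HasCampaign": "true"},
    "promotions": {"HasCampaign": "true"},
}
COLUMN_TAGS = {
    ("customers", column): {"Sensitivity": "Confidential"}
    for column in ("c_first_name", "c_last_name", "c_email_address")
}

# Assumed subset of the customers columns, for illustration only.
CUSTOMER_COLUMNS = ["c_customer_id", "c_first_name", "c_last_name",
                    "c_email_address", "c_birth_country"]


def effective_tags(table, column=None):
    """Inherit tags top-down; a lower level overrides the same key."""
    tags = dict(DATABASE_TAGS)
    tags.update(TABLE_TAGS.get(table, {}))
    if column is not None:
        tags.update(COLUMN_TAGS.get((table, column), {}))
    return tags


def matches(tags, expression):
    """True when every key in the expression has one of the allowed values."""
    return all(tags.get(key) in allowed for key, allowed in expression.items())


def visible_columns(table, columns, expression):
    """Columns whose effective tags satisfy the grant expression."""
    return [c for c in columns if matches(effective_tags(table, c), expression)]


if __name__ == "__main__":
    policy = {"Sensitivity": ["Public"]}
    print(visible_columns("customers", CUSTOMER_COLUMNS, policy))
    # prints ['c_customer_id', 'c_birth_country']
```

The same model also shows why lf-campaign-manager sees only items and promotions: the customers table inherits no HasCampaign tag, so the expression HasCampaign=true does not match it.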

You can preview the other tables from the database similarly to see the available columns and data.
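The same preview can be run outside the console. The following is a minimal sketch using the boto3 Athena client, assuming the resource link rl_lftpcdb and the results bucket naming shown in this post. The account ID is a placeholder and the function names are hypothetical.

```python
"""Sketch: run the preview query through the Athena API."""

CONSUMER_ACCOUNT_ID = "444455556666"  # placeholder for account B
QUERY = "SELECT * FROM rl_lftpcdb.customers LIMIT 10"


def results_location(account_id, region="us-east-1"):
    """S3 URI of the query results bucket created by the account B stack."""
    return f"s3://lfblog-athenaresults-{account_id}-{region}/"


def athena_request(account_id, sql=QUERY, database="rl_lftpcdb"):
    """Build the start_query_execution request."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": results_location(account_id)},
    }


def run_query(account_id=CONSUMER_ACCOUNT_ID, sql=QUERY,
              region="us-east-1", poll_seconds=2):
    """Run the query and return rows (needs boto3 and consumer credentials)."""
    import time
    import boto3  # lazy import so the request builder works without boto3

    athena = boto3.client("athena", region_name=region)
    query_id = athena.start_query_execution(
        **athena_request(account_id, sql))["QueryExecutionId"]
    while True:  # poll until Athena reports a terminal state
        status = athena.get_query_execution(
            QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {status['State']}")
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # For a SELECT, the first row holds the column names.
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

When run as lf-business-analysts, the header row returned by run_query() should omit the three Confidential columns, matching what the console shows.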

Log out of the console as lf-business-analysts.

Now we can validate the lf-campaign-manager user’s data access.

Log in to the console as lf-campaign-manager using the credentials noted from the CloudFormation stack output.
On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.

Verify that you can see the database db_for_shared_tables shared by the data lake admin.

Under Data catalog in the navigation pane, choose Tables.

You should be able to see the two tables shared from account A using the LF-tag expression HasCampaign=true. The two tables show the Owner account ID as account A.

Because lf-campaign-manager received table level shares, this user will create table-level resource links for querying in Athena.

Select the promotions table, and on the Actions menu, choose Create resource link.

For Resource link name, enter rl_promotions.

Under Database, choose db_for_shared_tables for the database to contain the resource link.
Choose Create.
Repeat the table resource link creation for the other table items.

Notice that the resource links show account B as owner, whereas the actual tables show account A as the owner.

Open the Athena console in another browser tab and choose the lfblog-athenaresults-<your-accountB-id>-us-east-1 bucket as the query results location.
Query the tables using the resource links.

As shown in the following screenshot, all columns of both tables are accessible to lf-campaign-manager.
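The resource links created in the console steps of this walkthrough can also be created through the AWS Glue API: a database resource link is a database with a TargetDatabase, and a table resource link is a table with a TargetTable. The following is a minimal sketch with boto3. The account ID is a placeholder, the function names are hypothetical, and the link name rl_items is an assumption, since this post names only rl_promotions.

```python
"""Sketch: create database and table resource links with the Glue API."""

PRODUCER_ACCOUNT_ID = "111122223333"  # placeholder for account A


def database_link_request(link_name, shared_database):
    """create_database request for a link to a shared database."""
    return {"DatabaseInput": {
        "Name": link_name,
        "TargetDatabase": {"CatalogId": PRODUCER_ACCOUNT_ID,
                           "DatabaseName": shared_database},
    }}


def table_link_request(link_name, local_database, shared_database, shared_table):
    """create_table request for a link to a shared table."""
    return {
        "DatabaseName": local_database,  # database that holds the link
        "TableInput": {
            "Name": link_name,
            "TargetTable": {"CatalogId": PRODUCER_ACCOUNT_ID,
                            "DatabaseName": shared_database,
                            "Name": shared_table},
        },
    }


def create_campaign_manager_links(region="us-east-1"):
    """Create both table links (needs boto3 and lf-campaign-manager credentials)."""
    import boto3  # lazy import so the request builders work without boto3

    glue = boto3.client("glue", region_name=region)
    for link_name, table in (("rl_promotions", "promotions"),
                             ("rl_items", "items")):  # rl_items is assumed
        glue.create_table(**table_link_request(
            link_name, "db_for_shared_tables", "lftpcdb", table))


if __name__ == "__main__":
    # Print the requests only; nothing is sent to AWS here.
    print(database_link_request("rl_lftpcdb", "lftpcdb"))
    print(table_link_request("rl_promotions", "db_for_shared_tables",
                             "lftpcdb", "promotions"))
```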

In summary, you have seen how LF-tags are used to share a database and select tables from one account to another account’s IAM users.

Clean up

To avoid incurring charges on the AWS resources created in this post, you can perform the following steps.

First, clean up resources in account A:

Empty the S3 bucket created for this post by deleting the downloaded objects from your S3 bucket.
Delete the CloudFormation stack.

This deletes the S3 bucket, custom IAM roles, policies, and the Lake Formation database, tables, and permissions.

You can also undo the Lake Formation settings and add IAM access back from the Settings page of the Lake Formation console.

Now complete the following steps in account B:

Empty the S3 bucket lfblog-athenaresults-<your-accountB-id>-us-east-1 used as the Athena query results location.
Revoke permission to lf-business-analysts as database creator.
Delete the CloudFormation stack.

This deletes the IAM users, S3 bucket, Lake Formation database db_for_shared_tables, resource links, and all the permissions from Lake Formation.

If there are any resource links and permissions left, delete them manually in Lake Formation from both accounts.

Conclusion

In this post, we illustrated the benefits of using Lake Formation cross-account sharing Version 3 using LF-tags to direct IAM principals and how to receive the shared tables in the consumer account. We used a two-account scenario in which a data producer account shares a database and specific tables to individual IAM users in another account using LF-tags. In the receiving account, we showed the role played by a data lake admin vs. the receiving IAM users. We also illustrated how to override LF-tags at the column level so that PII columns are withheld while the rest of a table is shared.

With Version 3 of the cross-account sharing features, Lake Formation makes more modern data mesh models possible, where a producer can share directly to an IAM principal in another account instead of to the entire account. Data mesh implementation becomes easier for data administrators and data platform owners because they can scale to hundreds of consumer accounts using LF-tag-based sharing to organizational units or IDs.

We encourage you to upgrade your Lake Formation cross-account sharing to Version 3 and benefit from the enhancements. For more details, see Updating cross-account data sharing version settings.

About the authors

Aarthi Srinivasan is a Senior Big Data Architect with AWS Lake Formation. She likes building data lake solutions for AWS customers and partners. When not on the keyboard, she explores the latest science and technology trends and spends time with her family.

Srividya Parthasarathy is a Senior Big Data Architect on the AWS Lake Formation team. She enjoys building analytics and data mesh solutions on AWS and sharing them with the community.