Using Cloudera Navigator with Amazon S3
Amazon Simple Storage Service (S3) is a storage solution offered by Amazon Web Services (AWS) that provides highly available storage in the cloud. Clusters deployed in the AWS cloud and clusters deployed on-premises can both use Amazon S3 as persistent storage. Common use cases include backup and disaster recovery (BDR) and persistent storage for transient clusters deployed to the cloud, such as storage for ETL workload input and output.
This section provides conceptual information about Amazon S3 storage and shows you how to configure Cloudera Navigator to extract metadata and lineage from an Amazon S3 bucket.
To view Amazon S3 entities in Cloudera Navigator:
- Log in to the Cloudera Navigator console.
- Under the Source Type filter, click the S3 selector to display all S3 entities.
- Click the Region filter to display the AWS regions that can be selected (available only if the S3 entities are from more than one region).
- To remove implicit folders from the S3 entities displayed, enter implicit:false in the Search field. To display implicit entities, enter implicit:true instead.
- See S3 Properties for more information about entity properties displayed for Amazon S3 objects.
Amazon S3 Storage Characteristics
Amazon S3 is an object store rather than a file store or block store, so it does not have the directory hierarchy found in typical filesystems. Instead, Amazon S3 uses buckets as containers for objects. An object can be any kind of file, such as a text file, image, photo, graphic, video, or an ETL bundle to be ingested into a cluster.
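Because a bucket holds a flat set of object keys, anything that looks like a folder is only a shared key prefix. The following minimal Python sketch illustrates that idea by deriving folder-like prefixes from a list of keys. The function name implied_folders, the sample keys, and the "/" delimiter convention are assumptions made for illustration; this is not how Cloudera Navigator itself computes its implicit property.

```python
from typing import Iterable, Set


def implied_folders(keys: Iterable[str], delimiter: str = "/") -> Set[str]:
    """Return folder-like prefixes implied by the keys but not stored as objects."""
    keys = set(keys)
    folders = set()
    for key in keys:
        parts = key.split(delimiter)
        # Every proper prefix that ends in the delimiter looks like a folder.
        for i in range(1, len(parts)):
            folders.add(delimiter.join(parts[:i]) + delimiter)
    # A prefix that also exists as an explicit object is not merely implied.
    return folders - keys


# Hypothetical object keys in a single bucket.
object_keys = [
    "etl/input/2016/01/part-0000.csv",
    "etl/input/2016/01/part-0001.csv",
    "etl/output/summary.json",
    "backups/",          # explicit placeholder object
    "backups/db.dump",
]

for folder in sorted(implied_folders(object_keys)):
    print(folder)
```

In this sketch, "backups/" is excluded from the result because it exists as an object in its own right, while prefixes such as "etl/input/" exist only by implication from the keys beneath them.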
Files can be added to Amazon S3 through the AWS Management Console, by using the AWS CLI, or by using scripts that invoke the CLI.
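As one example of a script that invokes the CLI, the following Python sketch builds an aws s3 cp command as an argument list. The helper name build_s3_upload_command, the bucket name, and the file paths are hypothetical; running the command also assumes that the AWS CLI is installed and configured with credentials that can write to the bucket.

```python
import shlex
from typing import List


def build_s3_upload_command(local_path: str, bucket: str, key: str) -> List[str]:
    """Build the argument list for `aws s3 cp <local_path> s3://<bucket>/<key>`."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare bucket name")
    # Strip leading slashes so the object key does not start with an empty segment.
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key.lstrip('/')}"]


# Hypothetical local file, bucket, and key.
command = build_s3_upload_command(
    "daily extract.csv", "example-etl-bucket", "etl/input/daily extract.csv"
)

# Show the shell-quoted form of the command (shlex.join needs Python 3.8+).
print(shlex.join(command))

# To perform the upload, pass the list to subprocess.run(command, check=True).
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.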
Amazon S3 storage is highly available because Amazon replicates data across multiple servers within its data centers. It uses an eventual consistency model: an update to an object may not be visible to every subsequent access immediately, but all updates are eventually synchronized across servers. As a result, there can be a short delay between the time objects are uploaded to Amazon S3 and the time their metadata is available in Cloudera Navigator. This delay is expected behavior.
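A script that uploads objects and then expects to find their metadata right away may need to tolerate this delay. The following Python sketch shows one common approach, polling with exponential backoff until a check succeeds or a timeout expires. The function wait_until_visible and its parameters are hypothetical, and the is_visible callback stands in for whatever lookup the script performs; no Cloudera Navigator or AWS API is called here.

```python
import time
from typing import Callable


def wait_until_visible(
    is_visible: Callable[[], bool],
    timeout_s: float = 60.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 16.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_visible() with exponential backoff; return False on timeout."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if is_visible():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        # Double the delay after each failed check, up to the cap.
        delay = min(delay * 2, max_delay_s)


# Stand-in check that succeeds on the third attempt.
attempts = {"count": 0}


def fake_check() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until_visible(fake_check, timeout_s=5.0, initial_delay_s=0.01))
```

The sleep and clock parameters exist only so that the waiting behavior can be replaced in tests; in normal use the defaults are sufficient.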
For more information about Amazon S3, see Amazon S3 documentation.
©2016 Cloudera, Inc. All rights reserved