Cloudera Enterprise 5.15.x

Enabling Replication Between Clusters with Kerberos Authentication

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

To enable replication between clusters, additional setup steps are required to ensure that the source and destination clusters can communicate.

  Note: If either the source or destination cluster is running Cloudera Manager 4.6 or higher, then both clusters (source and destination) must be running 4.6 or higher. For example, cross-realm authentication does not work if one cluster is running Cloudera Manager 4.5.x and one is running Cloudera Manager 4.6 or higher.

Considerations for Realm Names

If the source and destination clusters each use Kerberos for authentication, use one of the following configurations to prevent conflicts when running replication jobs:
  • If the clusters do not use the same KDC (Kerberos Key Distribution Center), Cloudera recommends that you use a different realm name for each cluster. Additionally, if you are replicating across clusters in two different realms, see the steps for HDFS and Hive replication later in this topic to set up trust between those clusters.
  • You can use the same realm name if the clusters use the same KDC or different KDCs that are part of a unified realm, for example where one KDC is the master and the other is a slave KDC.
      Note: If you have multiple clusters that are used to segregate production and non-production environments, this configuration could result in principals that have equal permissions in both environments. Make sure that permissions are set appropriately for each type of environment.
  Important: If the source and destination clusters are in the same realm but do not use the same KDC, or the KDCs are not part of a unified realm, the replication job will fail.
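
The realm-name rules above can be summarized as a small decision function. The following Python sketch is illustrative only: the ClusterKerberos type, the check_realm_setup function, and the kdcs_unified flag are hypothetical names that are not part of Cloudera Manager, and the sketch assumes each cluster can be described by a single realm name and a single KDC host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterKerberos:
    """Hypothetical description of one cluster's Kerberos setup."""
    realm: str
    kdc: str


def check_realm_setup(src, dst, kdcs_unified=False):
    """Return (ok, reason) following the realm-name considerations.

    kdcs_unified is an assumed flag meaning the two KDCs belong to one
    unified realm (for example, a master KDC and a slave KDC).
    """
    if src.realm != dst.realm:
        # Different realms are the recommended setup for separate KDCs,
        # but cross-realm trust still has to be configured.
        return True, "different realms: set up cross-realm trust"
    if src.kdc == dst.kdc or kdcs_unified:
        return True, "same realm with a shared or unified KDC"
    return False, "same realm, separate non-unified KDCs: replication fails"


if __name__ == "__main__":
    src = ClusterKerberos("MYCO.COM", "src-kdc-1.src.myco.com")
    dst = ClusterKerberos("MYCO.COM", "dest-kdc-1.dest.myco.com")
    print(check_realm_setup(src, dst))
```

Realm names are compared exactly because Kerberos realm names are case-sensitive.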

HDFS Replication

  Note: These steps also enable Hive/Impala replication with Kerberos. The additional steps in Hive/Impala Replication are only required if the source and destination clusters use a Cloudera Manager version before 5.12.
  1. On the hosts in the destination cluster, ensure that the krb5.conf file (typically located at /etc/krb5.conf) on each host has the following information:
    • The kdc information for the source cluster's Kerberos realm. For example:
      [realms]
       SRC.MYCO.COM = {
        kdc = src-kdc-1.src.myco.com:88
        admin_server = src-kdc-1.src.myco.com:749
        default_domain = src.myco.com
       }
       DEST.MYCO.COM = {
        kdc = dest-kdc-1.dest.myco.com:88
        admin_server = dest-kdc-1.dest.myco.com:749
        default_domain = dest.myco.com
       }
    • Domain/host-to-realm mapping for the source cluster NameNode hosts. You configure these mappings in the [domain_realm] section. For example, to map hosts named hostname.src.myco.com to the realm SRC.MYCO.COM and hosts named hostname.dest.myco.com to the realm DEST.MYCO.COM, make the following mappings in the krb5.conf file:
      [domain_realm]
       .src.myco.com = SRC.MYCO.COM
       src.myco.com = SRC.MYCO.COM
       .dest.myco.com = DEST.MYCO.COM
       dest.myco.com = DEST.MYCO.COM
  2. On the destination cluster, use Cloudera Manager to add the realm of the source cluster to the Trusted Kerberos Realms configuration property:
    1. Go to the HDFS service.
    2. Click the Configuration tab.
    3. In the search field, type "Trusted Kerberos" to find the Trusted Kerberos Realms property.
    4. Enter the source cluster realm.
    5. Click Save Changes to commit the changes.
  3. In the search field, type "domain name".
  4. Enter the domain names for Kerberos.
  5. If domain_realm is configured in the Advanced Configuration Snippet (Safety Valve) for remaining krb5.conf, remove the entries for it.
  6. If your Cloudera Manager release is 5.0.1 or lower, restart the JobTracker to enable it to pick up the new Trusted Kerberos Realm settings. Failure to restart the JobTracker prior to the first replication attempt may cause the JobTracker to fail.
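
Before you run the first replication job, it can help to confirm that a host's krb5.conf contains the entries described in step 1. The following Python sketch is one possible check, not a Cloudera tool: parse_krb5 and missing_entries are hypothetical helper names, the parser handles only the simple "key = value" and "name = { ... }" layout shown in the examples above, and the sketch assumes that both the ".domain" and "domain" mappings are wanted, as in the [domain_realm] example.

```python
import re

# Sample text in the same layout as the krb5.conf examples in step 1.
SAMPLE = """\
[realms]
 SRC.MYCO.COM = {
  kdc = src-kdc-1.src.myco.com:88
  admin_server = src-kdc-1.src.myco.com:749
 }
[domain_realm]
 .src.myco.com = SRC.MYCO.COM
 src.myco.com = SRC.MYCO.COM
"""


def parse_krb5(text):
    """Parse krb5.conf-style text into nested dicts; values are lists."""
    conf, stack = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            stack = [conf.setdefault(header.group(1), {})]
        elif line == "}":
            if len(stack) > 1:
                stack.pop()  # close the innermost "{ ... }" block
        elif stack:
            key, _, value = (part.strip() for part in line.partition("="))
            if value == "{":
                stack.append(stack[-1].setdefault(key, {}))
            else:
                stack[-1].setdefault(key, []).append(value)
    return conf


def missing_entries(text, realm, domain):
    """List what is missing for the given realm and DNS domain."""
    conf = parse_krb5(text)
    problems = []
    realm_block = conf.get("realms", {}).get(realm, {})
    if not isinstance(realm_block, dict) or not realm_block.get("kdc"):
        problems.append(f"no kdc entry for realm {realm}")
    mapping = conf.get("domain_realm", {})
    for name in ("." + domain, domain):
        if realm not in mapping.get(name, []):
            problems.append(f"no [domain_realm] mapping {name} = {realm}")
    return problems


if __name__ == "__main__":
    print(missing_entries(SAMPLE, "SRC.MYCO.COM", "src.myco.com"))
```

To check a real host, read the file contents (for example, from /etc/krb5.conf) and pass the text to missing_entries together with the source cluster's realm and domain.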

Hive/Impala Replication

  1. Perform the procedure described in the previous section, including restarting the JobTracker.
      Note: If the source and destination clusters both run Cloudera Manager 5.12 or higher, you can skip steps 2 and 3 in this section. These additional steps are no longer required for Hive/Impala replication. When you complete the configuration steps in HDFS Replication, you also configure Hive/Impala replication.
  2. On the hosts in the source cluster, ensure that the krb5.conf file on each host has the following information:
    • The kdc information for the destination cluster's Kerberos realm.
    • Domain/host-to-realm mapping for the destination cluster NameNode hosts.
  3. On the source cluster, use Cloudera Manager to add the realm of the destination cluster to the Trusted Kerberos Realms configuration property.
    1. Go to the HDFS service.
    2. Click the Configuration tab.
    3. In the search field, type "Trusted Kerberos" to find the Trusted Kerberos Realms property.
    4. Enter the destination cluster realm.
    5. Click Save Changes to commit the changes.

    It is not necessary to restart any services on the source cluster.
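
The version rule in the note under step 1 can be expressed as a short check. This Python sketch is illustrative only; needs_source_side_steps and parse_version are hypothetical helper names, and the sketch assumes version strings of the form "major.minor[.patch]".

```python
def parse_version(version):
    """Return (major, minor) from a string such as '5.12.1' or '5.15.x'."""
    return tuple(int(part) for part in version.split(".")[:2])


def needs_source_side_steps(src_cm, dst_cm):
    """True if steps 2 and 3 of Hive/Impala Replication are still required.

    The steps can be skipped only when BOTH clusters run
    Cloudera Manager 5.12 or higher.
    """
    threshold = (5, 12)
    both_new = (parse_version(src_cm) >= threshold
                and parse_version(dst_cm) >= threshold)
    return not both_new


if __name__ == "__main__":
    print(needs_source_side_steps("5.11.2", "5.15.0"))
```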

Kerberos Connectivity Test

As part of Test Connectivity, Cloudera Manager tests for properly configured Kerberos authentication on the source and destination clusters that run the replication. Test Connectivity runs automatically when you add a peer for replication, or you can manually initiate Test Connectivity from the Actions menu.

This feature is available when the source and destination clusters run Cloudera Manager 5.12 or later. You can disable the Kerberos connectivity test by setting feature_flag_test_kerberos_connectivity to false with the Cloudera Manager API: api/<version>/cm/config.
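
As an illustration, the following Python sketch builds (but does not send) a request that sets the flag to false. It makes several assumptions that you should verify against the API documentation for your release: that the cm/config endpoint accepts an HTTP PUT with a JSON list of name/value items, and that HTTP Basic authentication is used. The host name, port, API version string, and credentials are placeholders, and build_disable_request is a hypothetical helper name.

```python
import base64
import json
import urllib.request

FLAG = "feature_flag_test_kerberos_connectivity"


def build_disable_request(base_url, api_version, user, password):
    """Build a PUT request for api/<version>/cm/config that disables the test.

    Assumption: the endpoint takes a body of the form
    {"items": [{"name": ..., "value": ...}]}.
    """
    url = f"{base_url}/api/{api_version}/cm/config"
    body = json.dumps({"items": [{"name": FLAG, "value": "false"}]})
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    request = urllib.request.Request(
        url, data=body.encode("utf-8"), method="PUT")
    request.add_header("Content-Type", "application/json")
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


if __name__ == "__main__":
    # Placeholder values; replace them with those for your deployment.
    req = build_disable_request(
        "http://cm-host.example.com:7180", "v19", "admin", "admin-password")
    print(req.get_method(), req.full_url)
    # To send the request: urllib.request.urlopen(req)
```

Building the request separately from sending it lets you inspect the URL and body before changing a live Cloudera Manager configuration.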

If the test detects any issues with the Kerberos configuration, Cloudera Manager provides resolution steps based on whether Cloudera Manager manages the Kerberos configuration file.

Cloudera Manager tests the following scenarios:
  • Whether both clusters have Kerberos enabled. If one cluster uses Kerberos but the other does not, replication is not supported.
  • Whether both clusters are in the same Kerberos realm. Clusters in the same realm must share the same KDC or the KDCs must be in a unified realm.
  • Whether clusters are in different Kerberos realms. If the clusters are in different realms, cross-realm trust must be configured on the destination cluster according to the following criteria:
    • Destination HDFS services must have the correct trusted realm configuration. Cloudera Manager can check if the trusted realm configuration is correct.
    • The krb5.conf file has the correct domain_realm mapping on all the hosts.
    • The krb5.conf file has the correct realms information on all the hosts.
  • Whether the local and peer KDC are running on an available port. The default port is 88.
After Cloudera Manager runs the tests, it makes recommendations to resolve any Kerberos configuration issues.
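
If you want to check KDC reachability yourself, a basic TCP connection attempt gives a rough equivalent of the last scenario. The following Python sketch is not how Cloudera Manager implements its test; kdc_port_open is a hypothetical helper name, the host name is a placeholder, and the sketch assumes the KDC accepts TCP connections on the port (KDCs commonly listen on both UDP and TCP).

```python
import socket


def kdc_port_open(host, port=88, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


if __name__ == "__main__":
    # Placeholder host name; use your local and peer KDC hosts.
    print(kdc_port_open("src-kdc-1.src.myco.com"))
```

A successful connection shows only that something is listening on the port. It does not confirm that the realm or trust configuration is correct.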

Kerberos Recommendations

If Cloudera Manager manages the Kerberos configuration file, Cloudera Manager configures Kerberos correctly for you and then provides the set of commands that you must run manually to finish configuring the clusters. The following screenshots show the prompts that Cloudera Manager provides in cases of improper configuration:

Configuration changes:

Steps to complete configuration:

If Cloudera Manager does not manage the Kerberos configuration file, Cloudera Manager provides the manual steps required to correct the issue. For example, the following screenshot shows the steps required to properly configure Kerberos:

Page generated May 18, 2018.