Cloudera Enterprise 5.15.x

Adding and Configuring an NFS Gateway

The NFSv3 gateway allows a client to mount HDFS as part of the client's local file system. The gateway machine can be any host in the cluster, including the NameNode, a DataNode, or any HDFS client. The client can be any NFSv3-client-compatible machine.
  Important:

HDFS does not currently provide ACL support for an NFS gateway.

After mounting HDFS on the local file system, a user can:
  • Browse the HDFS file system as though it were part of the local file system.
  • Upload files from the local file system to HDFS, and download files from HDFS to the local file system.
  • Stream data directly to HDFS through the mount point.

File append is supported, but random write is not.

Adding and Configuring an NFS Gateway Using Cloudera Manager

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

The NFS Gateway role implements an NFSv3 gateway. It is an optional role for a CDH 5 HDFS service.

Requirements and Limitations

  • The NFS gateway works only with the following operating systems and Cloudera Manager and CDH versions:
    • With Cloudera Manager 5.0.1 or higher and CDH 5.0.1 or higher, the NFS gateway works on all operating systems supported by Cloudera Manager.
    • With Cloudera Manager 5.0.0 or CDH 5.0.0, the NFS gateway only works on RHEL and similar systems.
    • The NFS gateway is not supported on versions lower than Cloudera Manager 5.0.0 and CDH 5.0.0.
  • The nfs-utils OS package is required for a client to mount the NFS export and to run commands such as showmount from the NFS Gateway.
  • If any NFS server is already running on the NFS Gateway host, it must be stopped before the NFS Gateway role is started.
  • There are two configuration options related to the NFS Gateway role: Temporary Dump Directory and Allowed Hosts and Privileges. The Temporary Dump Directory is automatically created by the NFS Gateway role and should be configured before starting the role.
  • The Access Time Precision property in the HDFS service must be enabled.
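The access-time requirement can be spot-checked from a shell on a cluster host. The sketch below assumes that the Access Time Precision property corresponds to dfs.namenode.accesstime.precision (the property used in the command-line instructions later on this page) and that the hdfs command is on the PATH. The function name accesstime_enabled is hypothetical. Because a value of 0 disables access times, the check passes only for a positive integer.

```shell
# Hypothetical helper: succeeds only if the given value is a positive integer,
# that is, access times are enabled (a value of 0 disables them).
accesstime_enabled() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;  # empty, or not a plain non-negative integer
    esac
    [ "$1" -gt 0 ]
}

# Example use (assumes the hdfs CLI is installed and reads the cluster configuration):
#   accesstime_enabled "$(hdfs getconf -confKey dfs.namenode.accesstime.precision)" \
#       || echo "Access time precision is disabled or unset" >&2
```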

Adding and Configuring the NFS Gateway Role

  1. Go to the HDFS service.
  2. Click the Instances tab.
  3. Click Add Role Instances.
  4. Click the text box below the NFS Gateway field. The Select Hosts dialog box displays.
  5. Select the host on which to run the role and click OK.
  6. Click Continue.
  7. Click the NFS Gateway role.
  8. Click the Configuration tab.
  9. Select Scope > NFS Gateway.
  10. Select Category > Main.
  11. Ensure that the requirements on the directory set in the Temporary Dump Directory property are met.

    To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  12. Optionally edit Allowed Hosts and Privileges.

    To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  13. Click Save Changes to commit the changes.
  14. Click the Instances tab.
  15. Check the checkbox next to the NFS Gateway role and select Actions for Selected > Start.

Configuring an NFSv3 Gateway Using the Command Line

  Important:
  • Follow these command-line instructions on systems that do not use Cloudera Manager.
  • This information applies specifically to CDH 5.15.0. See Cloudera Documentation for information specific to other releases.

The subsections that follow provide information on installing and configuring the gateway.

  Note: Install Cloudera Repository
Before using the instructions on this page to install or upgrade:
  • Install the Cloudera yum, zypper/YaST or apt repository.
  • Install or upgrade CDH 5 and make sure it is functioning correctly.
For instructions, see Installing the Latest CDH 5 Release and Upgrading Unmanaged CDH Using the Command Line.

Upgrading from a CDH 5 Beta Release

If you are upgrading from a CDH 5 Beta release, you must first remove the hadoop-hdfs-portmap package. Proceed as follows.

  1. Unmount existing HDFS gateway mounts. For example, on each client, assuming the file system is mounted on /hdfs_nfs_mount:
    $ umount /hdfs_nfs_mount
  2. Stop the services:
    $ sudo service hadoop-hdfs-nfs3 stop
    $ sudo service hadoop-hdfs-portmap stop
  3. Remove the hadoop-hdfs-portmap package.
    • On a RHEL-compatible system:
      $ sudo yum remove hadoop-hdfs-portmap
    • On a SLES system:
      $ sudo zypper remove hadoop-hdfs-portmap
    • On an Ubuntu or Debian system:
      $ sudo apt-get remove hadoop-hdfs-portmap
  4. Install the new version:
    • On a RHEL-compatible system:
      $ sudo yum install hadoop-hdfs-nfs3
    • On a SLES system:
      $ sudo zypper install hadoop-hdfs-nfs3
    • On an Ubuntu or Debian system:
      $ sudo apt-get install hadoop-hdfs-nfs3
  5. Start the system default portmapper service:
    $ sudo service portmap start
  6. Now proceed with Starting the NFSv3 Gateway, and then remount the HDFS gateway mounts.

Installing the Packages for the First Time

On RHEL and similar systems:

Install the following packages on the cluster host you choose as the NFSv3 Gateway machine (referred to as the NFS server from here on):
  • nfs-utils
  • nfs-utils-lib
  • hadoop-hdfs-nfs3
The first two items are standard NFS utilities; the last is a CDH package.

Use the following command:

$ sudo yum install nfs-utils nfs-utils-lib hadoop-hdfs-nfs3

On SLES:

Install nfs-utils on the cluster host you choose as the NFSv3 Gateway machine (referred to as the NFS server from here on):
$ sudo zypper install nfs-utils

On an Ubuntu or Debian system:

Install nfs-common on the cluster host you choose as the NFSv3 Gateway machine (referred to as the NFS server from here on):
$ sudo apt-get install nfs-common

Configuring the NFSv3 Gateway

Proceed as follows to configure the gateway.
  1. Add the following property to hdfs-site.xml on the NameNode:
    <property>
        <name>dfs.namenode.accesstime.precision</name>
        <value>3600000</value>
        <description>The access time for an HDFS file is precise up to this value. The default value is 1 hour.
        Setting a value of 0 disables access times for HDFS.</description>
    </property>
    
  2. Add the following property to hdfs-site.xml on the NFS server:
    <property>
      <name>dfs.nfs3.dump.dir</name>
      <value>/tmp/.hdfs-nfs</value>
    </property>
      Note:

    You should change the location of the file dump directory, which temporarily saves out-of-order writes before writing them to HDFS. This directory is needed because the NFS client often reorders writes, so sequential writes can arrive at the NFS gateway in random order and must be saved until they can be ordered correctly. After the out-of-order writes for any given file exceed 1 MB in memory, they are dumped to dfs.nfs3.dump.dir (the memory threshold is not currently configurable).

    Make sure the directory you choose has enough space. For example, if an application uploads 10 files of 100MB each, dfs.nfs3.dump.dir should have roughly 1GB of free space to allow for a worst-case reordering of writes to every file.

  3. Configure the user running the gateway (normally the hdfs user as in this example) to be a proxy for other users. To allow the hdfs user to be a proxy for all other users, add the following entries to core-site.xml on the NameNode:
    <property>
       <name>hadoop.proxyuser.hdfs.groups</name>
       <value>*</value>
       <description>
         Set this to '*' to allow the gateway user to proxy any group.
       </description>
    </property>
    <property>
        <name>hadoop.proxyuser.hdfs.hosts</name>
        <value>*</value>
        <description>
         Set this to '*' to allow requests from any hosts to be proxied.
        </description>
    </property>
  4. Restart the NameNode.
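The sizing guidance for dfs.nfs3.dump.dir in step 2 can be turned into a quick check. The following is a minimal sketch, not part of CDH: the function names required_dump_mb and available_mb are hypothetical, and /tmp stands in for whichever file system will hold your dump directory. It multiplies the number of files by their size to get the worst-case space and compares that with the free space reported by df.

```shell
# Hypothetical helpers for sizing dfs.nfs3.dump.dir.

# Worst-case space in MB: every file is fully buffered out of order.
# $1 = number of files written concurrently, $2 = size of each file in MB
required_dump_mb() {
    echo $(( $1 * $2 ))
}

# Free space in MB on the file system that holds the given directory.
# df -Pk prints one POSIX-format line per file system, in 1024-byte blocks.
available_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Example from the note in step 2: 10 files of 100 MB each.
need=$(required_dump_mb 10 100)
have=$(available_mb /tmp)   # assumption: the dump directory lives under /tmp
if [ "$have" -lt "$need" ]; then
    echo "Only ${have} MB free; worst case needs ${need} MB" >&2
fi
```

If the check reports too little space, point dfs.nfs3.dump.dir at a directory on a larger file system before starting the gateway.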

Starting the NFSv3 Gateway

Do the following on the NFS server.

  1. First, stop the default NFS services, if they are running:
    $ sudo service nfs stop
  2. Start the HDFS-specific services:
    $ sudo service hadoop-hdfs-nfs3 start

Verifying that the NFSv3 Gateway is Working

To verify that the NFS services are running properly, you can use the rpcinfo command on any host on the local network:
$ rpcinfo -p <nfs_server_ip_address>
You should see output such as the following:
program    vers    proto   port
100005     1       tcp     4242  mountd
100005     2       udp     4242  mountd
100005     2       tcp     4242  mountd
100000     2       tcp     111   portmapper
100000     2       udp     111   portmapper
100005     3       udp     4242  mountd
100005     1       udp     4242  mountd
100003     3       tcp     2049  nfs
100005     3       tcp     4242  mountd
To verify that the HDFS namespace is exported and can be mounted, use the showmount command.
$ showmount -e <nfs_server_ip_address>
You should see output similar to the following:
Exports list on <nfs_server_ip_address>:
/ (everyone)
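The rpcinfo check can also be scripted. The sketch below assumes the output format shown above, with the service name in the fifth column; the function name missing_rpc_services is hypothetical. It reads rpcinfo output on standard input and prints the name of each expected service (portmapper, mountd, nfs) that is not registered, so empty output means all three are present.

```shell
# Hypothetical helper: reads `rpcinfo -p` output on stdin and prints each of
# the three expected services that does not appear in the service column.
missing_rpc_services() {
    awk '
        # Data rows start with a numeric program number; skip the header.
        $1 ~ /^[0-9]+$/ && NF >= 5 { seen[$5] = 1 }
        END {
            n = split("portmapper mountd nfs", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}

# Example use, with the same placeholder as in the commands above:
#   rpcinfo -p <nfs_server_ip_address> | missing_rpc_services
```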

Mounting HDFS on an NFS Client

To import the HDFS file system on an NFS client, use a mount command such as the following on the client:
$ mount -t nfs -o vers=3,proto=tcp,nolock <nfs_server_hostname>:/ /hdfs_nfs_mount
  Note:

When you create a file or directory as user hdfs on the client (that is, in the HDFS file system imported using the NFS mount), the ownership may differ from what it would be if you had created it in HDFS directly. For example, ownership of a file created on the client might be hdfs:hdfs when the same operation done natively in HDFS would result in hdfs:supergroup. This is because in native HDFS, BSD semantics determine the group ownership of a newly created file: it is set to the same group as the parent directory where the file is created. When the operation is done over NFS, typical Linux semantics apply instead: the file is created with the group of the effective GID (group ID) of the process creating it, and this group is explicitly passed to the NFS gateway and HDFS.
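When the same mount has to be repeated on several clients, the mount command from this section can be generated from a host name and a mount point. This is only a convenience sketch: the function name hdfs_nfs_mount_cmd and the host name nfs-gw.example.com are hypothetical, and the function prints the command rather than running it.

```shell
# Hypothetical helper: prints the mount command for a given gateway host and
# local mount point (default /hdfs_nfs_mount). It does not run the command.
hdfs_nfs_mount_cmd() {
    host=$1
    mount_point=${2:-/hdfs_nfs_mount}
    printf 'mount -t nfs -o vers=3,proto=tcp,nolock %s:/ %s\n' "$host" "$mount_point"
}

# Example use (the host name is a placeholder):
#   sudo mkdir -p /hdfs_nfs_mount
#   sudo $(hdfs_nfs_mount_cmd nfs-gw.example.com)
```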

Page generated May 18, 2018.