Installing DataFu
Warning: DataFu has been in decline for a significant period of time and is now officially deprecated. Cloudera recommends that you replace the DataFu UDFs with Hive UDFs. Hive UDFs provide most of DataFu's functions and many additional functions. Moreover, Hive UDFs are more stable and well-supported. In an upcoming release, Apache Pig will support Hive UDFs. For more information about using Hive UDFs in CDH, see Managing UDFs.
DataFu is a collection of Apache Pig UDFs (User-Defined Functions) for statistical evaluation. The UDFs were developed by LinkedIn and are now open source under the Apache 2.0 license.
A number of usage examples and other information are available at https://github.com/linkedin/datafu.
To Use DataFu in a Parcel-deployed Cluster
If your cluster uses parcels, DataFu is installed for you. Before you use it, register the JAR file with the following command:
REGISTER /opt/cloudera/parcels/CDH/lib/pig/datafu.jar
To Use DataFu in a Package-deployed Cluster
- Install the DataFu package:
  Operating system      Install command
  Red-Hat-compatible    sudo yum install pig-udf-datafu
  SLES                  sudo zypper install pig-udf-datafu
  Debian or Ubuntu      sudo apt-get install pig-udf-datafu
This puts the DataFu JAR file (for example, datafu-0.0.4-cdh5.0.0.jar) in /usr/lib/pig.
- Register the JAR. Replace the <DataFu_version> and <CDH_version> placeholders with the DataFu and CDH version numbers of the installed JAR file.
REGISTER /usr/lib/pig/datafu-<DataFu_version>-cdh<CDH_version>.jar
For example:
REGISTER /usr/lib/pig/datafu-0.0.4-cdh5.0.0.jar
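After the JAR is registered, a DataFu UDF can be defined and called like any other Pig UDF. The following is a minimal sketch, not a definitive recipe. It assumes that the installed DataFu version provides the datafu.pig.stats.Quantile UDF and that the JAR is at the package path from the example above. The input file temperature.txt and all relation and field names are hypothetical.

```pig
-- Register the DataFu JAR (package-deployed path from the example above;
-- on a parcel-deployed cluster, use the parcel path instead).
REGISTER /usr/lib/pig/datafu-0.0.4-cdh5.0.0.jar;

-- Assumption: the Quantile UDF exists in this DataFu version.
-- The constructor arguments request the minimum, median, and maximum.
DEFINE Quantile datafu.pig.stats.Quantile('0.0', '0.5', '1.0');

-- Hypothetical tab-delimited input with one reading per line: id, temp.
temperature = LOAD 'temperature.txt' AS (id:chararray, temp:double);

grouped = GROUP temperature BY id;

temperature_quantiles = FOREACH grouped {
    -- Quantile expects its input bag to be sorted.
    sorted = ORDER temperature BY temp;
    GENERATE group AS id, Quantile(sorted.temp) AS quantiles;
}

DUMP temperature_quantiles;
```

If the DEFINE statement fails because Pig cannot resolve the class, check that the REGISTER path matches the JAR file that is actually installed on your cluster.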
Page generated May 18, 2018.