NiFi

Certified

This plugin extracts the following:

  • NiFi flow as DataFlow entity
  • Ingress, egress processors, remote input and output ports as DataJob entity
  • Input and output ports receiving remote connections as Dataset entity
  • Lineage information between external datasets and ingress/egress processors by analyzing provenance events

Current limitations:

  • Only a limited set of ingress/egress processors is supported:
    • S3: ListS3, FetchS3Object, PutS3Object
    • SFTP: ListSFTP, FetchSFTP, GetSFTP, PutSFTP

CLI based Ingestion

Install the Plugin

pip install 'acryl-datahub[nifi]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # sink configs
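
The same recipe can also be expressed as a Python dictionary and run in-process. The following is a minimal sketch rather than the only way to run ingestion: `build_nifi_recipe` and `run_recipe` are hypothetical helper names, and the `datahub-rest` sink pointing at `http://localhost:8080` is an assumption, since the recipe above leaves the sink unspecified. Running it requires the plugin to be installed.

```python
from typing import Any, Dict


def build_nifi_recipe(site_url: str, username: str, password: str) -> Dict[str, Any]:
    """Build a recipe dict mirroring the starter YAML (hypothetical helper)."""
    return {
        "source": {
            "type": "nifi",
            "config": {
                "site_url": site_url,
                "auth": "SINGLE_USER",
                "username": username,
                "password": password,
            },
        },
        # Assumption: a DataHub REST sink on localhost; replace with your own sink.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe in-process; needs `pip install 'acryl-datahub[nifi]'`."""
    # Imported lazily so that building a recipe does not require DataHub.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()  # raise if the run reported failures
```

A typical call would be `run_recipe(build_nifi_recipe("https://localhost:8443/nifi/", "admin", "password"))`, using the placeholder credentials from the starter recipe.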

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| site_url | string | URI of the NiFi site to connect to. | |
| auth | Enum | NiFi authentication mode. Must be one of: NO_AUTH, SINGLE_USER, CLIENT_CERT. | NO_AUTH |
| ca_file | string | Path to a PEM file containing the certificates for the root CA(s) of the NiFi server. | |
| client_cert_file | string | Path to a PEM file containing the public certificates for the user/client identity. Must be set for auth = "CLIENT_CERT". | |
| client_key_file | string | Path to a PEM file containing the client's secret key. | |
| client_key_password | string | The password to decrypt the client_key_file. | |
| password | string | NiFi password. Must be set for auth = "SINGLE_USER". | |
| provenance_days | integer | Time window, in days, over which provenance events are analyzed for external datasets. | 7 |
| site_name | string | Site name to identify this site with. Useful when using input and output ports receiving remote connections. | default |
| site_url_to_site_name | map(str,string) | | |
| username | string | NiFi username. Must be set for auth = "SINGLE_USER". | |
| env | string | The environment that all assets produced by this connector belong to. | PROD |
| process_group_pattern | AllowDenyPattern | Regex patterns for filtering process groups. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| process_group_pattern.allow | array(string) | | |
| process_group_pattern.deny | array(string) | | |
| process_group_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True |
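
Two rules in the field descriptions above lend themselves to a quick pre-flight check: which fields each auth mode requires, and how allow/deny patterns select process groups. The sketch below is illustrative and is not DataHub's own validation code; `missing_auth_fields` and `is_allowed` are hypothetical helpers. The filter assumes that deny patterns take precedence over allow patterns and that patterns are matched from the start of the name, so check the behaviour of your installed version before relying on it.

```python
import re
from typing import Any, Dict, List

# Fields that the descriptions above say must be set for each auth mode.
REQUIRED_BY_AUTH: Dict[str, List[str]] = {
    "NO_AUTH": [],
    "SINGLE_USER": ["username", "password"],
    "CLIENT_CERT": ["client_cert_file"],
}


def missing_auth_fields(config: Dict[str, Any]) -> List[str]:
    """Return required fields that are absent or empty (hypothetical helper)."""
    auth = config.get("auth", "NO_AUTH")  # NO_AUTH is the documented default
    if auth not in REQUIRED_BY_AUTH:
        raise ValueError(f"auth must be one of {sorted(REQUIRED_BY_AUTH)}, got {auth!r}")
    return [name for name in REQUIRED_BY_AUTH[auth] if not config.get(name)]


def is_allowed(name: str, allow: List[str], deny: List[str], ignore_case: bool = True) -> bool:
    """Assumed semantics: any deny match rejects; otherwise any allow match accepts."""
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)
```

For example, `missing_auth_fields({"auth": "SINGLE_USER", "username": "admin"})` returns `["password"]`, and under the stated assumptions `is_allowed("Ingest-S3", [".*"], ["ingest.*"])` is `False` because the deny pattern matches case-insensitively by default.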

Code Coordinates

  • Class Name: datahub.ingestion.source.nifi.NifiSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for NiFi, feel free to ping us on our Slack.