What is Apache NiFi? NiFi's website describes it as "an easy-to-use, powerful, reliable data processing and distribution system." In plain terms, Apache NiFi is built for data flows: it supports highly configurable directed graphs of data routing, transformation, and system mediation logic.
To describe NiFi more clearly, here is a brief introduction to its architecture, as shown in the figure below.
The individual components, translated from the descriptions on the official website, are as follows:
• Web Server: Its purpose is to host NiFi's HTTP-based command and control API.
• Flow Controller: This is the core of the operation, with the Processor as its processing unit. It provides the threads that extensions run on and manages the schedule on which extensions receive resources to execute.
• Extensions: The various types of NiFi extensions are described in other documents. The key point is that extensions operate and execute within the JVM.
• FlowFile Repository: This is where NiFi keeps track of the current state of each FlowFile that is active in the flow. Its implementation is pluggable. The default approach is a persistent write-ahead log located on a specified disk partition.
• Content Repository: This is where the actual content bytes of a given FlowFile live. Its implementation is pluggable. The default approach is a fairly simple mechanism that stores blocks of data in the file system.
• Provenance Repository: This is where all provenance event data is stored. Its implementation is pluggable. The default implementation uses one or more physical disk volumes, and the event data in each location is indexed and searchable.
2 Introduction to the NiFi Processor
The previous section used the architecture diagram to introduce NiFi's basic concepts. One of them is the Flow Controller, the core of NiFi. So what exactly is the Flow Controller? It acts as the broker for exchanging FlowFiles between Processors: it maintains the connections between multiple Processors and manages each of them, while the Processor is the unit that does the actual processing. Let us look at the NiFi UI to see which Processors NiFi contains.
As shown above, Processors come in many types, tagged for example amazon, attributes, hadoop, and so on. A Processor's purpose is usually easy to tell from its name prefix: Get and Fetch indicate retrieval (GetFile, GetFTP, FetchHDFS), and Execute indicates execution (ExecuteSQL, ExecuteProcess, ExecuteFlumeSink).
3 The NiFi Processor in Practice
With NiFi's architecture and Processors introduced, how do they work in practice? This section uses one of my own requirements as an example. The requirement is as follows: select a data processing and scheduling tool that can run server scripts on a custom schedule. The scripts involve environment variables, Oracle databases, and Hadoop ecosystem components. When a scheduled script finishes, the tool must return the script's run state and provide an interface for re-running failed jobs.
To meet this requirement I looked at several scheduling tools, including Apache Oozie, Azkaban, and Pentaho. After comparing their pros and cons, I decided to try Apache NiFi. Reading the NiFi Processor API showed that the ExecuteProcess Processor supports the remote operations we need well. The rest of this section works through the requirement.
3.1 Add and configure the Processor
1. Add and configure the Processor
2. Right-click ExecuteProcess and select Configure to open the Properties tab. Each configuration option comes with its own description, as shown below.
Some of the options in the figure above need explanation.
• Command: sh
• Command Arguments: -c;ssh user@ip sh js/job/job_hourly.sh `date
• Batch Duration: Leave unset, because we need to run on a fixed schedule rather than at intervals.
• Redirect Error Stream: Leave unset.
• Argument Delimiter: ; (the Command Arguments value is split on the semicolon)
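To make the role of the Argument Delimiter concrete, here is a minimal Python sketch of how a delimited argument string could become the final command line. It only illustrates the idea and is not NiFi's own code: `build_argv` is a hypothetical helper, and the date argument that follows the script name in the setting above is left out.

```python
from typing import List


def build_argv(command: str, arguments: str, delimiter: str = ";") -> List[str]:
    """Split the argument string on the delimiter and prepend the command."""
    # Each delimited piece becomes exactly one argument, so a piece that
    # contains spaces (the whole ssh command) reaches `sh -c` intact.
    return [command] + arguments.split(delimiter)


# Hypothetical call using the values from the list above (date argument omitted).
argv = build_argv("sh", "-c;ssh user@ip sh js/job/job_hourly.sh")
print(argv)
```

Splitting on `;` rather than on spaces is what lets the whole `ssh ...` string travel as a single argument to `sh -c`.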
3.2 Processor Scheduling
NiFi supports three scheduling strategies: Timer Driven, CRON Driven, and Event Driven (which cannot be selected here). For our requirement we choose CRON Driven. As I understand it, CRON Driven works much like crontab. The fields of the CRON expression are, in order: second, minute, hour, day of month, month, day of week, and year. They are used together with the special characters *, ?, and L: * means every value of the field is valid; ? means no specific value is given for the field; L stands for "last", for example the last occurrence of a weekday in the month. For example, "0 0 13 * * ?" triggers a run at 1 PM every day. The scheduling parameters are configured according to our requirement, as shown in the figure below.
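As a rough aid for reading such expressions, the following Python sketch labels each position of a CRON expression with its field name. It assumes the second-first field order described above with an optional trailing year; `label_cron_fields` is a hypothetical helper that only labels the fields and does not validate or evaluate them.

```python
from typing import Dict

# Field order assumed from the description above; the year field is optional.
FIELD_NAMES = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def label_cron_fields(expression: str) -> Dict[str, str]:
    """Map each whitespace-separated part of the expression to its field name."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    # zip stops at the shorter sequence, so "year" is simply absent
    # when the expression has only six fields.
    return dict(zip(FIELD_NAMES, parts))


fields = label_cron_fields("0 0 13 * * ?")
print(fields)
```

Read this way, the example expression has second 0, minute 0, and hour 13, with every day of the month, every month, and no specific day of the week.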
3.3 Operation State Monitoring
NiFi exposes a REST API that developers can use for scheduling. Here we use the Processor API to monitor the running state: reading the state parameter, and starting and stopping the Processor.
1. Reading the run state:
The command is: curl 'http://IP/nifi-api/processors/processorsID'. The response can be parsed with a JSON parser to read the state.
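The parsing step might look like the Python sketch below. It assumes the response is a processor entity whose state sits under `component.state`, which is how the NiFi 1.x REST API lays it out as far as I know; other versions may differ. The sample body is a hypothetical, heavily trimmed response and not real output.

```python
import json


def extract_state(response_body: str) -> str:
    """Return the processor run state from a processor entity JSON document."""
    entity = json.loads(response_body)
    # Assumption: the state lives at component.state (NiFi 1.x layout).
    return entity["component"]["state"]


# Hypothetical, trimmed-down response body for illustration only.
sample_body = '{"id": "processorsID", "component": {"id": "processorsID", "state": "RUNNING"}}'
print(extract_state(sample_body))
```

A monitoring script could run the curl command, pass its output to `extract_state`, and compare the result with the state it expects.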
2. Starting and stopping the Processor:
A NiFi Processor is started and stopped with the REST API's PUT method, whose main effect here is to change the run state. A NiFi Processor has three states: Running, Stopped, and Disabled.
The following two REST API commands, executed from the script, start and stop the Processor.
• Start command (using the REST API's PUT method):
curl -i -X PUT -H 'Content-Type:application/json' -d '
• Stop command (using the REST API's PUT method):
curl -i -X PUT -H 'Content-Type:application/json' -d '
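The request bodies of the two commands above are cut off in this article, so the Python sketch below is a guess at their shape and not the original payload. It assumes the NiFi 1.x convention of sending a `revision` object together with a `component` object carrying the processor id and the target state. `build_state_request` is a hypothetical helper, and the revision numbers are placeholders that would normally come from the GET response. The request is only built, never sent.

```python
import json
import urllib.request


def build_state_request(base_url, processor_id, state, revision_version):
    """Build (but do not send) a PUT request that changes a processor's state."""
    # Assumed payload shape (NiFi 1.x): revision plus component id and state.
    payload = {
        "revision": {"version": revision_version},
        "component": {"id": processor_id, "state": state},
    }
    return urllib.request.Request(
        url="%s/nifi-api/processors/%s" % (base_url, processor_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder host, id, and revision numbers, mirroring the curl examples.
start = build_state_request("http://IP", "processorsID", "RUNNING", 0)
stop = build_state_request("http://IP", "processorsID", "STOPPED", 1)
print(start.get_method(), start.full_url)
print(start.data.decode("utf-8"))
```

To send the request for real, pass it to `urllib.request.urlopen`. If the server checks revisions, a stale revision number will most likely be rejected, so it is safer to read the current one first.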
4 Summary and postscript
This article first introduced Apache NiFi, then used one of my own requirements as an example to show NiFi's core component, the Processor, in practice. NiFi has not been an Apache top-level project for long. It is very powerful, but the resources available on it are still limited. This article is only a starting point meant to prompt better contributions from others; NiFi has far more powerful data processing features than those covered here, and anyone interested is welcome to join the discussion.