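Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data. It lets you plug custom data senders into a logging system to collect data. It can also apply simple processing to that data and write it to a variety of receivers, such as text files, HDFS, and HBase.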
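A Flume agent is built from three core components:
* Source: collects the log data, wraps it into events, and delivers those events to the channel inside transactions.
* Channel: acts mainly as a queue, providing simple buffering for the data supplied by the source.
* Sink: takes data out of the channel and stores it in a file system or database, or forwards it to a remote server.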
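Two commonly used source types are:
* ExecSource: runs a Linux command and continuously emits its latest output, for example tail -F followed by a file name. With this approach the file to read must be named explicitly.
* SpoolSource: monitors a configured directory for newly added files and reads the data out of them.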
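As a minimal sketch of how these two source types might be configured, the fragment below uses Flume's exec and spooldir source types. The agent name a1, the component names r1, r2, and c1, and both paths are placeholders chosen for illustration; the channel c1 still has to be defined elsewhere in the same file.

```properties
# Hypothetical agent "a1" with two sources; all names and paths are placeholders.
a1.sources = r1 r2
a1.channels = c1

# ExecSource: follow one explicitly named file with tail -F
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Spooling directory source: pick up new files dropped into a directory
a1.sources.r2.type = spooldir
a1.sources.r2.spoolDir = /var/spool/flume/incoming
a1.sources.r2.channels = c1
```

With the spooling directory source, files should be complete when they are placed in the directory and should not be modified afterwards.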
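Channels come in several types: MemoryChannel, JDBC Channel, MemoryRecoverChannel, and FileChannel. MemoryChannel delivers high throughput but cannot guarantee data integrity. For MemoryRecoverChannel, the official documentation recommends using FileChannel instead. FileChannel guarantees data integrity and consistency. When configuring a FileChannel, it is advisable to put the FileChannel directories on a different disk from the one that holds the application's log files, to improve efficiency.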
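The disk-placement advice above could look roughly like the following fragment. It is only a sketch: the agent name a1, the channel name c1, and the /data1 paths are assumptions, standing in for a disk that is separate from the one holding the application logs.

```properties
# Hypothetical file channel; the directory paths are placeholders.
a1.channels = c1
a1.channels.c1.type = file
# Keep checkpoint and data directories on a disk separate from the application logs
a1.channels.c1.checkpointDir = /data1/flume/checkpoint
a1.channels.c1.dataDirs = /data1/flume/data
```

Swapping type = memory for type = file trades some throughput for durability: events buffered in the channel are persisted to disk rather than held only in memory.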
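Installing and configuring Flume is fairly simple. Download the Flume 1.5.0 binary package from http://www.apache.org/dyn/closer.cgi/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz and unpack it: tar -zvxf apache-flume-1.5.0-bin.tar.gz
Then create a configuration file, for example example.conf, with the following contents: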
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = echo 'hello'
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume: bin/flume-ng agent -f example.conf --name a1 -Dflume.root.logger=INFO,console
14/06/19 18:16:29 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
14/06/19 18:16:29 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:example.conf
14/06/19 18:16:29 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
14/06/19 18:16:29 INFO conf.FlumeConfiguration: Processing:k1
14/06/19 18:16:29 INFO conf.FlumeConfiguration: Processing:k1
14/06/19 18:16:29 WARN conf.FlumeConfiguration: Invalid property specified: conf
14/06/19 18:16:29 WARN conf.FlumeConfiguration: Configuration property ignored: mple.conf = A single-node Flume configuration
14/06/19 18:16:29 WARN conf.FlumeConfiguration: Agent configuration for 'mple' does not contain any channels. Marking it as invalid.
14/06/19 18:16:29 WARN conf.FlumeConfiguration: Agent configuration invalid for agent 'mple'. It will be removed.
14/06/19 18:16:29 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
14/06/19 18:16:29 INFO node.AbstractConfigurationProvider: Creating channels
14/06/19 18:16:29 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
14/06/19 18:16:29 INFO node.AbstractConfigurationProvider: Created channel c1
14/06/19 18:16:29 INFO source.DefaultSourceFactory: Creating instance of source r1, type exec
14/06/19 18:16:29 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
14/06/19 18:16:29 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
14/06/19 18:16:29 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1730d54 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
14/06/19 18:16:29 INFO node.Application: Starting Channel c1
14/06/19 18:16:29 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/06/19 18:16:29 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/06/19 18:16:29 INFO node.Application: Starting Sink k1
14/06/19 18:16:29 INFO node.Application: Starting Source r1
14/06/19 18:16:29 INFO source.ExecSource: Exec source starting with command:echo 'hello'
14/06/19 18:16:29 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/06/19 18:16:29 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/06/19 18:16:29 INFO source.ExecSource: Command [echo 'hello'] exited with 0
14/06/19 18:16:29 INFO sink.LoggerSink: Event: { headers:{} body: 27 68 65 6C 6C 6F 27 'hello' }