首页 > 其他 > 详细

Data Flow ->> Fuzzy Lookup & Fuzzy Grouping

时间:2015-06-13 18:18:19      阅读:201      评论:0      收藏:0      [点我收藏+]

这两个任务的作用是数据清洗(Data Cleansing)。

Fuzzy Lookup通过引用另外一张数据库表或者索引来进行相似值匹配。这种组件对于标准化和查找可能错误的客户端数据非常有用。例如像地址或者像城市名这种属性栏位非常有用。

Fuzzy Lookup不仅会输出它的匹配值,同时还会输出similarity和confidence两个属性列。similarity用一个0到1之间的浮点值来表示匹配对间值得相似度。比如Jerry Chan和Jerry Chen的相似度可能是0.89。而对于Confidence,它的值越高代表它可选的匹配对越少。

Fuzzy Lookup一共有4种选择来配置参考表(Reference Table):

1)Generate New Index:根据参考表的参考栏位在内存中建立一条临时索引用来做数据匹配,任务完成后把它删除;

2)Generate New Index + Store New Index选项:相当于建立一条索引在数据库中;

3)Generate New Index + Store New Index选项 + Maintain Stored Index选项:这种情况下勾了Maintain Stored Index选项将会在reference表建一个触发器来捕捉更新以同步更新到该新建的索引;

4)Use Existing Index

Fuzzy Lookup Transformation: Capable of joining to external data based on data similarity,
the Fuzzy Lookup Transformation is a core data cleansing tool in SSIS. This transformation
is perfect if you have dirty data input that you want to associate to data in a table in your
database based on similar values. Later in the chapter, you’ll take a look at the details of the
Fuzzy Lookup Transformation and what happens behind the scenes

Fuzzy Grouping Transformation: The main purpose is de-duplication of similar data. The
Fuzzy Grouping Transformation is ideal if you have data from a single source and you know
you have duplicates that you need to find.

Data Flow ->> Fuzzy Lookup & Fuzzy Grouping

原文:http://www.cnblogs.com/jenrrychen/p/4573726.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!