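C4.5 is a decision tree classification algorithm built on the ID3 algorithm. It is currently applied in fields such as clinical decision-making, manufacturing, document analysis, bioinformatics, and spatial data modeling. The algorithm takes class-labeled data as input and outputs decision rules in the form of a tree.
Improvements of C4.5 over ID3:
1) It selects attributes by information gain ratio, which overcomes the bias of plain information gain toward attributes with many distinct values;
2) It prunes the tree during construction;
3) It can discretize continuous attributes;
4) It can handle incomplete data.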
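To make the first improvement concrete, here is a minimal Python sketch of the gain ratio, i.e. the information gain divided by the entropy of the attribute's own value distribution (the "split information"). The function names and the toy data are assumptions made for illustration; they are not taken from the SQL implementation in this post.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # large when the attribute has many distinct values
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: both attributes separate the classes perfectly
# (information gain 1.0), but row_id has one distinct value per row.
labels = ["yes", "yes", "no", "no"]
row_id = ["a", "b", "c", "d"]
windy = ["false", "false", "true", "true"]

print(gain_ratio(row_id, labels))  # 0.5
print(gain_ratio(windy, labels))   # 1.0
```

Plain information gain cannot tell the two attributes apart, while the gain ratio prefers the two-valued attribute over the ID-like one.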
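Advantages of C4.5: the classification rules it produces are easy to understand, and its accuracy is relatively high.
Disadvantages of C4.5: building the tree requires multiple sequential scans and sorts of the data set, which makes the algorithm inefficient.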
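The sorting cost is easiest to see when a continuous attribute is split. The following is a simplified Python sketch, not the exact C4.5 procedure: it sorts the values, tries the midpoint between each pair of adjacent distinct values as a threshold, and keeps the one with the highest information gain. The function name `best_threshold` and the sample data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split `value <= threshold`."""
    pairs = sorted(zip(values, labels))  # this sort is repeated per attribute and per node
    total = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, total):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - len(left) / total * entropy(left) - len(right) / total * entropy(right)
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best

# Hypothetical sample: humidity readings and whether the game was played.
humidity = [65, 70, 75, 80, 90, 95]
play = ["yes", "yes", "yes", "no", "no", "no"]
print(best_threshold(humidity, play))  # (77.5, 1.0)
```

Because each candidate threshold also rescans the rows on both sides, the work grows quickly with the number of rows and continuous attributes.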
Algorithm procedure:
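As a rough illustration of the general procedure, the Python sketch below grows a tree recursively: stop when a node is pure or no attributes remain, otherwise split on the attribute with the highest gain ratio and recurse into each subset. It is a simplified sketch under stated assumptions: it handles categorical attributes only and leaves out pruning, continuous attributes, and missing values. The rows, attribute names, and function names are hypothetical and do not mirror the SQL procedure in this post.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())

def gain_ratio(rows, attr, target):
    """Gain ratio of splitting `rows` (a list of dicts) on categorical `attr`."""
    total = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, attrs, target):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # pure node, or nothing left to split on
    best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    if gain_ratio(rows, best, target) <= 0:
        return majority  # no attribute improves the split
    rest = [a for a in attrs if a != best]
    return {best: {value: build_tree([r for r in rows if r[best] == value], rest, target)
                   for value in sorted({r[best] for r in rows})}}

# Hypothetical weather-style rows.
rows = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "overcast", "windy": "true", "play": "yes"},
    {"outlook": "rain", "windy": "false", "play": "yes"},
    {"outlook": "rain", "windy": "true", "play": "no"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
print(tree)
```

On this toy data the root splits on `outlook`, and only the `rain` branch needs a second split on `windy`.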
Core code implementing the C4.5 algorithm in SQL:
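drop procedure if exists buildtree;
DELIMITER |
create procedure buildtree()
begin
    declare le int default 1;              -- base level that letemp is reset to
    declare letemp int default 1;          -- level written to levelnum for newly classified rows
    declare current_num int default 1;     -- next class number to assign
    declare current_class varchar(20);     -- attribute chosen for the current split
    declare current_gain double;           -- highest info_Gain among the available attributes
    declare current_table varchar(20) default 'weather';  -- table currently being split

    -- Mark every attribute in infoset as available.
    update infoset set state=1,statetemp=1;

    rr:while (1=1) do
        -- Fetch one unclassified row and one attribute still available on this branch.
        set @weather = (select play from weather where class is null limit 0,1);
        set @feature = (select x from infoset where statetemp=1 limit 0,1);
        if (@weather is null) then
            leave rr;                                 -- every row has a class: finished
        elseif (@feature is null) then
            update infoset set statetemp = state;     -- branch ran out of attributes: restore them
        end if;

        if (@weather is not null) then
        b:begin
            -- Choose the available attribute with the largest info_Gain.
            -- "statetemp=1" and "limit 0,1" keep the subquery to one row when values tie.
            set current_gain = (select max(info_Gain) from infoset where statetemp=1);
            set current_class = (select x from infoset where info_Gain = current_gain and statetemp=1 limit 0,1);

            -- aa holds the distinct values of that attribute among unclassified rows.
            drop table if exists aa;
            set @a=concat('create temporary table aa select distinct ',current_class,' as namee from weather where class is null');
            prepare stmt1 from @a;
            execute stmt1;

            tt:while (1=1) do
                set @x = (select namee from aa limit 0,1);
                if (@x is not null) then
                a0:begin
                    -- bb holds the unclassified rows of current_table whose attribute value is @x.
                    drop table if exists bb;
                    set @b=concat('create temporary table bb select * from ', current_table,' where ',current_class,' = ? and class is null');
                    prepare stmt2 from @b;
                    execute stmt2 using @x;
                    set @count = (select count(distinct play) from bb);

                    -- Pure subset: label its rows as a leaf.
                    if (@count =1) then
                    a1:begin
                        update weather set class = current_num,levelnum = letemp where id in (select id from bb);
                        set current_num = current_num+1;
                        if (current_table ='cc') then
                            delete from cc where id in (select id from bb);
                        end if;
                        -- Note: this query requires table cc to exist, even while current_table is still 'weather'.
                        set @f=(select play from cc limit 0,1);
                        if (@f is null) then
                            -- The subtree in cc is exhausted: go back to the full table.
                            set current_table='weather';
                            update infoset set statetemp=state;
                            set letemp =le;
                        end if;
                        delete from aa where namee = @x;
                    end a1;
                    end if;

                    -- Mixed subset: copy it to cc and split it one level deeper.
                    if (@count>1) then
                        drop table if exists cc;
                        create temporary table cc select * from bb;
                        set current_table = 'cc';
                        set letemp = letemp+1;
                        leave tt;
                    end if;

                    -- Empty subset: drop this attribute value.
                    if(@count=0) then
                        delete from aa where namee = @x;
                        set le = le+1;
                    end if;
                end a0;
                else
                    -- All values of this attribute are handled: retire the attribute.
                    update infoset set state=0 where x=current_class;
                    leave tt;
                end if;
            end while;

            -- Do not reuse this attribute further down the current branch.
            update infoset set statetemp=0 where x=current_class;
        end b;
        end if;
    end while;
end |
delimiter ;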
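After the procedure runs, the classification result is as shown in the figure below:
Here, the class attribute records the classification result, and the levelnum attribute records the level of each row in the decision tree.
Explanation of the tables used in the program: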
Original article: http://blog.csdn.net/iemyxie/article/details/39519767