
Splitting the Dataset

Date: 2021-03-27 12:21:23
import pandas as pd

# Load the tab-separated FAQ file and drop its first column
faqs = pd.read_csv('./data/FAQ.csv', sep='\t').iloc[:, 1:]
faqs

# In[3]
# Split the data into roughly 60% train, 20% dev, 20% test, stratified by label
from sklearn.model_selection import train_test_split

faqs_len = len(faqs)
print('len(faqs):', faqs_len)
X_train, X_dev_test, y_train, y_dev_test = train_test_split(
    faqs['question'].to_list(), faqs['label'].to_list(),
    test_size=0.4, random_state=6, stratify=faqs['label'].to_list())
X_dev, X_test, y_dev, y_test = train_test_split(
    X_dev_test, y_dev_test, test_size=0.5, random_state=6, stratify=y_dev_test)
print('train: ', len(X_train), len(y_train))
print('dev: ', len(X_dev), len(y_dev))
print('test: ', len(X_test), len(y_test))
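
The two test_size values above compose into roughly a 60/20/20 split: 40% is held out first, then that held-out part is halved into dev and test. Below is a minimal sketch of the arithmetic. It assumes scikit-learn rounds the held-out count up and gives the remainder to the other side; split_sizes is a hypothetical helper, not something from the original post.

```python
import math

def split_sizes(n_samples, first_test_size=0.4, second_test_size=0.5):
    """Return expected (train, dev, test) sizes for the two-stage split.

    Assumption: the held-out count is ceil(test_size * n) and the
    remainder goes to the other side of each split.
    """
    n_dev_test = math.ceil(first_test_size * n_samples)
    n_train = n_samples - n_dev_test
    n_test = math.ceil(second_test_size * n_dev_test)
    n_dev = n_dev_test - n_test
    return n_train, n_dev, n_test

print(split_sizes(1000))  # (600, 200, 200)
print(split_sizes(101))   # (60, 20, 21)
```

When the row count is not divisible evenly, the dev and test sizes can differ by one, so the printed lengths are worth comparing against this estimate.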

# In[3]
from sklearn.model_selection import train_test_split
# Save the train split
X_train_DataFrame = pd.DataFrame(X_train, columns=['question'])
y_train_DataFrame = pd.DataFrame(y_train, columns=['label'])
train_all = pd.concat([X_train_DataFrame, y_train_DataFrame], axis=1)
train_all.to_csv('./data/train.csv', sep='\t')

# In[4]
# Save the dev split
X_dev_DataFrame = pd.DataFrame(X_dev, columns=['question'])
y_dev_DataFrame = pd.DataFrame(y_dev, columns=['label'])
dev_all = pd.concat([X_dev_DataFrame, y_dev_DataFrame], axis=1)
dev_all.to_csv('./data/dev.csv', sep='\t')

# In[4]
# Save the test split
X_test_DataFrame = pd.DataFrame(X_test, columns=['question'])
y_test_DataFrame = pd.DataFrame(y_test, columns=['label'])
test_all = pd.concat([X_test_DataFrame, y_test_DataFrame], axis=1)
test_all.to_csv('./data/test.csv', sep='\t')
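
The three save cells follow the same pattern, and each to_csv call also writes the row index as an unnamed first column. That is presumably why a file saved this way is read back with .iloc[:, 1:]. The sketch below folds the pattern into one function and round-trips it through an in-memory buffer. The helper name save_split and the toy questions are assumptions made for illustration.

```python
import io

import pandas as pd

def save_split(questions, labels, target):
    """Write one split as a tab-separated table with 'question' and 'label' columns."""
    frame = pd.DataFrame({'question': questions, 'label': labels})
    # The row index is written too, as an unnamed first column
    frame.to_csv(target, sep='\t')
    return frame

# Toy data, purely illustrative; an in-memory buffer stands in for a file path
buffer = io.StringIO()
save_split(['how do I reset my password', 'where is my order'], [0, 1], buffer)

# Read it back and drop the index column that to_csv added
buffer.seek(0)
loaded = pd.read_csv(buffer, sep='\t').iloc[:, 1:]
print(list(loaded.columns))
```

Passing index=False to to_csv would avoid the extra column, but files written by the cells above would then need to be read without the .iloc[:, 1:] step.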

Original post: https://www.cnblogs.com/douzujun/p/14585218.html
