The STB dataset comes from the paper: A hand pose tracking benchmark from stereo matching.
Dataset contents:
Our stereo hand pose benchmark contains sequences with 6 different backgrounds, and every background has two sequences with counting and random poses. Every sequence has 1,500 frames, so there are 18,000 frames in total in our benchmark. Stereo and depth images were captured simultaneously from a Point Grey Bumblebee2 stereo camera and an Intel RealSense F200 active depth camera.
So there is one stereo pair (left/right RGB cameras) and one depth camera (color RGB + depth). The way papers generally use it:
STB is a real-world dataset containing 18,000 images with ground-truth 3D locations of the 21 hand joints and the corresponding depth images. The dataset is typically split into 15,000 training samples and 3,000 test samples.
Exactly how to split it for your own use is up to you; a minimal sketch follows below.
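For reference, a minimal split sketch, assuming the common convention of holding out both sequences of one background (2 x 1,500 = 3,000 test frames) and training on the remaining five backgrounds (10 x 1,500 = 15,000 frames); the BxCounting / BxRandom names follow the dataset's sequence naming:
test_seqs = ['B1Counting', 'B1Random']            # one held-out background
train_seqs = ['B%d%s' % (i, pose)                 # B2-B6, both pose types
              for i in range(2, 7)
              for pose in ('Counting', 'Random')]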
Many works use this dataset, for example:
Learning to Estimate 3D Hand Pose from Single RGB Images
3D Hand Shape and Pose Estimation from a Single RGB Image
...and many more papers use this data. I have been working on 2D and 3D hand pose estimation lately and will cover this in detail in a later survey post.
Point Grey Bumblebee2 stereo camera: baseline = 120.054, fx = 822.79041, fy = 822.79041, tx = 318.47345, ty = 250.31296
import numpy as np

# the left and right cameras differ only by the baseline translation; rotation is ignored
fx = 822.79041
fy = 822.79041
tx = 318.47345
ty = 250.31296
base = 120.054
# augmented (homogeneous) matrices make the computation convenient
R_l = np.asarray([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0]], dtype=np.float64)  # float dtype so the baseline below is not truncated
R_r = R_l.copy()
R_r[0, 3] = -base  # the baseline acts as the translation term
# intrinsic matrix (shared by both cameras)
K = np.asarray([
    [fx, 0, tx],
    [0, fy, ty],
    [0, 0, 1]])
# world-coordinate points: a 4x21 matrix of homogeneous [x, y, z, 1] columns, for convenience
points = XXX
# apply the translation, then the intrinsics, to project into each image
left_point = np.dot(np.dot(K , R_l), points)
right_point = np.dot(np.dot(K , R_r), points)
# divide by z to remove the scale
image_cood = left_point / left_point[-1, ...]
image_left = np.round(image_cood[:2, ...].T).astype(np.int32)  # round before casting to integer pixels
image_cood = right_point / right_point[-1, ...]
image_right = np.round(image_cood[:2, ...].T).astype(np.int32)
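As a quick sanity check, a minimal sketch reusing K, R_l, R_r, fx, and base from the snippet above (the test point is made up): the horizontal disparity between the two projections recovers the depth via z = fx * base / disparity.
# hypothetical point 600 mm in front of the left camera, as a homogeneous column
test = np.asarray([[50.0], [30.0], [600.0], [1.0]])
pl = np.dot(np.dot(K, R_l), test)
pr = np.dot(np.dot(K, R_r), test)
xl = pl[0, 0] / pl[2, 0]      # u coordinate in the left image
xr = pr[0, 0] / pr[2, 0]      # u coordinate in the right image
disparity = xl - xr           # equals fx * base / z for this rectified pair
print(fx * base / disparity)  # ~600.0, the depth we started from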
Intel RealSense F200 active depth camera: fx_color = 607.92271, fy_color = 607.88192, tx_color = 314.78337, ty_color = 236.42484; fx_depth = 475.62768, fy_depth = 474.77709, tx_depth = 336.41179, ty_depth = 238.77962; rotation vector = [0.00531, -0.01196, 0.00301] (use Rodrigues' rotation formula to turn it into a rotation matrix); translation vector = [-24.0381, -0.4563, -1.2326] (the rotation and translation transform coordinates relative to the color camera into coordinates relative to the depth camera).
fx = [822.79041,607.92271,475.62768]
fy = [822.79041,607.88192,474.77709]
tx = [318.47345,314.78337,336.41179]
ty = [250.31296,236.42484,238.77962]
# 0:Point Grey Bumblebee2 stereo camera
# 1:Intel Real Sense F200 active depth camera COLOR
# 2:Intel Real Sense F200 active depth camera DEPTH
index = 1
inter_mat = np.asarray([[fx[index], 0, tx[index], 0],
                        [0, fy[index], ty[index], 0],
                        [0, 0, 1, 0]])  # camera intrinsics as a 3x4 augmented matrix
#matrix_R, _ = cv2.Rodrigues(np.asarray([0.00531, -0.01196, 0.00301]))  # rotation matrix from the rotation vector
#matrix_R_inv = np.linalg.inv(matrix_R)  # for a rotation matrix, the inverse equals the transpose
# matrix_extrinsic combines matrix_R_inv with the translation;
# TODO note that the translation vector must be negated here
matrix_extrinsic = np.asarray([[ 0.99992395,  0.002904166, 0.01195165, +24.0381],
                               [-0.00304,     0.99998137,  0.00532784, +0.4563],
                               [-0.01196763, -0.00529184,  0.99991438, +1.2326],
                               [ 0,           0,           0,           1]])
# ground-truth points: a 4x21 matrix of homogeneous [x, y, z, 1] columns, for convenience
points = XXX
# matrix_extrinsic inverts the color -> depth transform given above,
# so it maps depth-frame coordinates into the color frame
# to project into the depth image, apply the depth intrinsics (index = 2) directly:
#   image_cood = np.dot(inter_mat, points)
# to project into the color image, change frames first,
# then apply the color intrinsics (index = 1):
image_cood = np.dot(inter_mat, np.dot(matrix_extrinsic, points))
# 3D to 2D: divide by z to remove the scale
image_cood = image_cood / image_cood[-1, ...]
image_cood = np.round(image_cood[:2, ...].T).astype(np.int32)
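Rather than hand-negating the translation (the TODO above), matrix_extrinsic can be derived exactly: build the 4x4 color -> depth transform from the given rotation vector and translation, then invert it. A minimal sketch, assuming OpenCV (cv2) is available; the exact inverse translation is -R^T t, which matches the hand-written values only because the rotation is nearly the identity.
import numpy as np
import cv2

rvec = np.asarray([0.00531, -0.01196, 0.00301])
tvec = np.asarray([-24.0381, -0.4563, -1.2326])
R, _ = cv2.Rodrigues(rvec)  # Rodrigues' formula: rotation vector -> 3x3 rotation matrix

# 4x4 transform taking color-frame coordinates to depth-frame coordinates
T_color_to_depth = np.eye(4)
T_color_to_depth[:3, :3] = R
T_color_to_depth[:3, 3] = tvec

# exact inverse (depth -> color), i.e. [R^T | -R^T t]
T_depth_to_color = np.linalg.inv(T_color_to_depth)
print(T_depth_to_color)  # agrees with matrix_extrinsic above up to small terms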
Dataset repository: icip17_stereo_hand_pose_dataset
Original post: https://www.cnblogs.com/wjy-lulu/p/12857249.html