인공지능로봇연구실

[3d object detection] 모델 선정 시각화

작성자이승현|작성시간25.05.14|조회수55 목록 댓글 0

1. SAFDNet

GitHub - zhanggang001/HEDNet: HEDNet (NeurIPS 2023) & SAFDNet (CVPR 2024 Oral)

HEDNet (NeurIPS 2023) & SAFDNet (CVPR 2024 Oral). Contribute to zhanggang001/HEDNet development by creating an account on GitHub.

github.com

목적: nuScenes 데이터셋으로 학습된 SAFDNet 모델을 그대로 활용해, Waymo 포인트클라우드에 대해 추론 및 시각화를 수행하고 모델의 일반화 성능을 평가

구성파일:safdnet_20e_nuscenes.yaml

CLASS_NAMES: 10개
- - POINT_FEATURE_ENCODING: absolute_coordinates_encoding (x,y,z,intensity + dummy timestamp)
  - DATA_CONFIG: nuScenes dataset 설정
  - MODEL: TransFusion + SparseHEDNet2D + SparseTransFusionHead 등

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
CLASS_NAMES: ['car','truck', 'construction_vehicle', 'bus', 'trailer',
              'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone']
 
DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/nuscenes_dataset.yaml
    POINT_CLOUD_RANGE: [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
 
    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
          - NAME: gt_sampling
            DB_INFO_PATH:
              - nuscenes_dbinfos_10sweeps_withvelo.pkl
 
            USE_SHARED_MEMORY: True
            DB_DATA_PATH:
              - nuscenes_10sweeps_withvelo_lidar.npy
 
            PREPARE: {
              filter_by_min_points: [
                  'car:5','truck:5', 'construction_vehicle:5', 'bus:5', 'trailer:5',
                  'barrier:5', 'motorcycle:5', 'bicycle:5', 'pedestrian:5', 'traffic_cone:5'
              ],
            }
 
            SAMPLE_GROUPS: [
              'car:2','truck:3', 'construction_vehicle:7', 'bus:4', 'trailer:6',
              'barrier:2', 'motorcycle:6', 'bicycle:6', 'pedestrian:2', 'traffic_cone:2'
            ]
 
            NUM_POINT_FEATURES: 5
            DATABASE_WITH_FAKELIDAR: False
            REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
            LIMIT_WHOLE_SCENE: True
 
          - NAME: random_world_flip
            ALONG_AXIS_LIST: ['x', 'y']
 
          - NAME: random_world_rotation
            WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]
 
          - NAME: random_world_scaling
            WORLD_SCALE_RANGE: [0.9, 1.1]
 
          - NAME: random_world_translation
            NOISE_TRANSLATE_STD: [0.5, 0.5, 0.5]
 
    DATA_PROCESSOR:
      - NAME: mask_points_and_boxes_outside_range
        REMOVE_OUTSIDE_BOXES: True
 
      - NAME: shuffle_points
        SHUFFLE_ENABLED: {
          'train': True,
          'test': True
        }
 
      - NAME: transform_points_to_voxels_placeholder
        VOXEL_SIZE: [0.3, 0.3, 8.0]
 
MODEL:
    NAME: TransFusion
 
    VFE:
        NAME: DynPillarVFE
        WITH_DISTANCE: False
        USE_ABSLOTE_XYZ: True
        USE_NORM: True
        NUM_FILTERS: [128, 128]
 
    BACKBONE_3D:
        NAME: SparseHEDNet2D
 
        SED_FEATURE_DIM: 128
        SED_NUM_LAYERS: 6
        SED_NUM_SBB: [2, 1, 1]
        SED_DOWN_STRIDE: [1, 2, 2]
        SED_DOWN_KERNEL_SIZE: [3, 3, 3]
 
        AFD_FEATURE_DIM: 128
        AFD_NUM_LAYERS: 1
        AFD_NUM_SBB: [4, 4, 4]
        AFD_DOWN_STRIDE: [1, 2, 2]
        AFD_DOWN_KERNEL_SIZE: [3, 3, 3]
 
        AFD: True
        FEATMAP_STRIDE: 2
        DETACH_FEATURE: True
        FG_THRESHOLD: 0.3
        GREOUP_POOLING_KERNEL_SIZE: [9, 15, 7, 7]   # NDS 70.7~71.1
        GROUP_CLASS_NAMES: [
            ['car', 'truck', 'construction_vehicle'],
            ['bus', 'trailer'],
            ['barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'],
        ]
 
    DENSE_HEAD:
        NAME: SparseTransFusionHead
        CLASS_AGNOSTIC: False
        USE_BIAS_BEFORE_NORM: False
        USE_TENSOR_MASK: True
 
        INPUT_FEATURES: 128
        NUM_PROPOSALS: 200  # set it to 300 when performing inference on the test set (inference only)
        HIDDEN_CHANNEL: 128
        NUM_CLASSES: 10
        NUM_HEADS: 8
        NMS_KERNEL_SIZE: 3
        FFN_CHANNEL: 256
        DROPOUT: 0.1
        BN_MOMENTUM: 0.1
        ACTIVATION: relu
 
        NUM_HM_CONV: 2
        SEPARATE_HEAD_CFG:
            HEAD_ORDER: ['center', 'height', 'dim', 'rot', 'vel']
            HEAD_DICT: {
                'center': {'out_channels': 2, 'num_conv': 2},
                'height': {'out_channels': 1, 'num_conv': 2},
                'dim': {'out_channels': 3, 'num_conv': 2},
                'rot': {'out_channels': 2, 'num_conv': 2},
                'vel': {'out_channels': 2, 'num_conv': 2},
            }
 
        TARGET_ASSIGNER_CONFIG:
            DATASET: nuScenes
            FEATURE_MAP_STRIDE: 2
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2
            HUNGARIAN_ASSIGNER:
                cls_cost: {'gamma': 2.0, 'alpha': 0.25, 'weight': 0.15}
                reg_cost: {'weight': 0.25}
                iou_cost: {'weight': 0.25}
 
        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                    'cls_weight': 1.0,
                    'bbox_weight': 0.25,
                    'hm_weight': 1.0,
                    'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2]
                }
            LOSS_CLS:
                use_sigmoid: True
                gamma: 2.0
                alpha: 0.25
            LOSS_IOU: False
            LOSS_IOU_REG: False
 
        POST_PROCESSING:
            SCORE_THRESH: 0.0
            POST_CENTER_RANGE: [-61.2, -61.2, -10.0, 61.2, 61.2, 10.0]
            USE_IOU_TO_RECTIFY_SCORE: False
            IOU_RECTIFIER: [0.5]
 
        NMS_CONFIG:
            NMS_TYPE: nms_gpu
            NMS_THRESH: 0.2
            NMS_PRE_MAXSIZE: 1000
            NMS_POST_MAXSIZE: 100
            SCORE_THRES: 0.
 
    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        SCORE_THRESH: 0.1
        OUTPUT_RAW_SCORE: False
        EVAL‎_METRIC: kitti
 
OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 2
    NUM_EPOCHS: 20
 
    OPTIMIZER: adam_onecycle
    LR: 0.003
    WEIGHT_DECAY: 0.05
    MOMENTUM: 0.9
 
    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001
 
    LR_WARMUP: False
    WARMUP_EPOCH: 1
 
    GRAD_NORM_CLIP: 35
    LOSS_SCALE_FP16: 4.0
 
HOOK:
    DisableAugmentationHook:
        DISABLE_AUG_LIST: ['gt_sampling']
        NUM_LAST_EPOCHS: 4  # yield similar results if set it to 5 following mmdetection3d
cs

추론: wsl에서 GUI를 못써서 추론결과를 .npy로 저장하여 window에서 시각화

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
import argparse
import glob
from pathlib import Path
import os
try:
    import open3d
    from visual_utils import open3d_vis_utils as V
    OPEN3D_FLAG = True
except:
    import mayavi.mlab as mlab
    from visual_utils import visualize_utils as V
    OPEN3D_FLAG = False
 
import numpy as np
import torch
 
from pcdet.config import cfg, cfg_from_yaml_file
from pcdet.datasets import DatasetTemplate
from pcdet.models import build_network, load_data_to_gpu
from pcdet.utils import common_utils
 
 
class DemoDataset(DatasetTemplate):
    def __init__(self, dataset_cfg, class_names, training=True, 
root_path=None, logger=None, ext='.bin'):
        """
        Args:
            root_path:
            dataset_cfg:
            class_names:
            training:
            logger:
        """
        super().__init__(
            dataset_cfg=dataset_cfg, class_names=class_names, training=training, 
root_path=root_path, logger=logger
        )
        self.root_path = root_path
        self.ext = ext
        data_file_list = glob.glob(str(root_path / f'*{self.ext}')) if self.root_path.is_dir() else [self.root_path]
 
        data_file_list.sort()
        self.sample_file_list = data_file_list
 
    def __len__(self):
        return len(self.sample_file_list)
 
    def __getitem__(self, index):
        if self.ext == '.bin':
            points = np.fromfile(self.sample_file_list[index], dtype=np.float32).reshape(-1, 4)
        elif self.ext == '.npy':
            points = np.load(self.sample_file_list[index])
        else:
            raise NotImplementedError
 
        input_dict = {
            'points': points,
            'frame_id': index,
        }
 
        data_dict = self.prepare_data(data_dict=input_dict)
        return data_dict
 
 
def parse_config():
    parser = argparse.ArgumentParser(description='arg parser')
    parser.add_argument('--cfg_file', type=str, default='cfgs/kitti_models/second.yaml',
                        help='specify the config for demo')
    parser.add_argument('--data_path', type=str, default='demo_data',
                        help='specify the point cloud data file or directory')
    parser.add_argument('--ckpt', type=str, default=None, help='specify the pretrained model')
    parser.add_argument('--ext', type=str, default='.bin'
, help='specify the extension of your point cloud data file')
 
    args = parser.parse_args()
 
    cfg_from_yaml_file(args.cfg_file, cfg)
 
    return args, cfg
 
 
def main():
    args, cfg = parse_config()
    logger = common_utils.create_logger()
    logger.info('-----------------Quick Demo of OpenPCDet-------------------------')
    demo_dataset = DemoDataset(
        dataset_cfg=cfg.DATA_CONFIG, class_names=cfg.CLASS_NAMES, training=False,
        root_path=Path(args.data_path), ext=args.ext, logger=logger
    )
    logger.info(f'Total number of samples: \t{len(demo_dataset)}')
 
    model = build_network(model_cfg=cfg.MODEL, num_class=len(cfg.CLASS_NAMES)
, dataset=demo_dataset)
    model.load_params_from_file(filename=args.ckpt, logger=logger, to_cpu=True)
    model.cuda()
    model.eval‎()
    os.makedirs('outputs', exist_ok=True)
    with torch.no_grad():
        for idx, data_dict in enumerate(demo_dataset):
            logger.info(f'Visualized sample index: \t{idx + 1}')
            data_dict = demo_dataset.collate_batch([data_dict])
            load_data_to_gpu(data_dict)
            pred_dicts, _ = model.forward(data_dict)
 
            
            pred_labels = pred_dicts[0]['pred_labels'].cpu().numpy()
            pred_labels = np.clip(pred_labels, 0, 9)
 
            # 저장코드 window에서 시각화 하기위해 
            np.save(f'outputs/{idx:04d}_points.npy', data_dict['points'][:, 1:].cpu().numpy())
            np.save(f'outputs/{idx:04d}_pred_boxes.npy', pred_dicts[0]['pred_boxes'].cpu().numpy())
            np.save(f'outputs/{idx:04d}_pred_scores.npy', pred_dicts[0]['pred_scores'].cpu().numpy())
            np.save(f'outputs/{idx:04d}_pred_labels.npy', pred_labels)
 
            print(f"Saved frame {idx:04d} to outputs/")
 
            # 시각화 (window에서 )
            # V.draw_scenes(
            #     points=data_dict['points'][0, :, 1:],  # batch dim 제거
            #     ref_boxes=pred_dicts[0]['pred_boxes'].cpu(),
            #     ref_scores=pred_dicts[0]['pred_scores'].cpu(),
            #     ref_labels=torch.from_numpy(pred_labels)  # 다시 텐서로
            # )
            # if not OPEN3D_FLAG:
            #     mlab.show(stop=True)
 
    
 
    logger.info('Demo done.')
 
 
if __name__ == '__main__':
    main()
Colored by Color Scripter
cs

시각화: SAFD모델

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
import numpy as np
from pathlib import Path
from tools.visual_utils import open3d_vis_utils as V
 
# nuScenes → Waymo 클래스 매핑
nuscenes_to_waymo_map = {
    0: 0,  # car → Vehicle
    1: 0,  # truck → Vehicle
    2: 0,  # bus → Vehicle
    3: 0,  # trailer → Vehicle
    4: 0,  # construction_vehicle → Vehicle
    5: 1,  # pedestrian → Pedestrian
    6: 2,  # motorcycle → Cyclist
    7: 2,  # bicycle → Cyclist
    8: None,  # traffic cone → skip
    9: None,  # barrier → skip
}
frame_id = "0008"
 
# 경로 설정
BASE_DIR = Path("C:/Users/C/outputs")
points_path = BASE_DIR / f"{frame_id}_points.npy"
boxes_path = BASE_DIR / f"{frame_id}_pred_boxes.npy"
scores_path = BASE_DIR / f"{frame_id}_pred_scores.npy"
labels_path = BASE_DIR / f"{frame_id}_pred_labels.npy"
 
# 파일 불러오기
points = np.load(points_path)         # (N, 3) → 이미 nuScenes 좌표계 
pred_boxes = np.load(boxes_path)
pred_scores = np.load(scores_path)
pred_labels = np.load(labels_path)
 
print(f"[INFO] 로드된 points shape: {points.shape}")
 
 
 
# 라벨 매핑
mapped_boxes, mapped_scores, mapped_labels = [], [], []
 
for i, cls in enumerate(pred_labels):
    mapped_cls = nuscenes_to_waymo_map.get(int(cls), None)
    if mapped_cls is not None:
        mapped_boxes.append(pred_boxes[i])
        mapped_scores.append(pred_scores[i])
        mapped_labels.append(mapped_cls)
 
if len(mapped_boxes) > 0:
    mapped_boxes = np.stack(mapped_boxes)
    mapped_scores = np.array(mapped_scores)
    mapped_labels = np.array(mapped_labels)
 
    print(f"[INFO] 시각화 대상 박스 수: {len(mapped_boxes)}")
 
    # 시각화
    V.draw_scenes(
        points=points,                 # nuScenes 좌표계 (N, 3)
        ref_boxes=mapped_boxes,
        ref_scores=mapped_scores,
        ref_labels=mapped_labels
    )
else:
    print("시각화할 박스가 없습니다.")
 
 
Colored by Color Scripter
cs

좌표계 변환전

좌표계 변환1

좌표계 변환2

2. HEDNET

목적: nuScenes 데이터셋으로 학습된 HEDNET 모델을 그대로 활용해, Waymo 포인트클라우드에 대해 추론 및 시각화를 수행하고 모델의 일반화 성능을 평가

구성파일: hednet_20e_nusences.yaml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
CLASS_NAMES: ['car','truck', 'construction_vehicle', 'bus', 'trailer',
              'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone']
 
DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/nuscenes_dataset.yaml
    POINT_CLOUD_RANGE: [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
 
    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
          - NAME: gt_sampling
            DB_INFO_PATH:
              - nuscenes_dbinfos_10sweeps_withvelo.pkl
 
            USE_SHARED_MEMORY: True
            DB_DATA_PATH:
              - nuscenes_10sweeps_withvelo_lidar.npy
 
            PREPARE: {
              filter_by_min_points: [
                  'car:5','truck:5', 'construction_vehicle:5', 'bus:5', 'trailer:5',
                  'barrier:5', 'motorcycle:5', 'bicycle:5', 'pedestrian:5', 'traffic_cone:5'
              ],
            }
 
            SAMPLE_GROUPS: [
              'car:2','truck:3', 'construction_vehicle:7', 'bus:4', 'trailer:6',
              'barrier:2', 'motorcycle:6', 'bicycle:6', 'pedestrian:2', 'traffic_cone:2'
            ]
 
            NUM_POINT_FEATURES: 5
            DATABASE_WITH_FAKELIDAR: False
            REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
            LIMIT_WHOLE_SCENE: True
 
          - NAME: random_world_flip
            ALONG_AXIS_LIST: ['x', 'y']
 
          - NAME: random_world_rotation
            WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]
 
          - NAME: random_world_scaling
            WORLD_SCALE_RANGE: [0.9, 1.1]
 
          - NAME: random_world_translation
            NOISE_TRANSLATE_STD: [0.5, 0.5, 0.5]
 
    DATA_PROCESSOR:
      - NAME: mask_points_and_boxes_outside_range
        REMOVE_OUTSIDE_BOXES: True
 
      - NAME: shuffle_points
        SHUFFLE_ENABLED: {
          'train': True,
          'test': True
        }
 
      - NAME: transform_points_to_voxels_placeholder
        VOXEL_SIZE: [0.3, 0.3, 8.0]
 
MODEL:
    NAME: TransFusion
 
    VFE:
        NAME: DynPillarVFE
        WITH_DISTANCE: False
        USE_ABSLOTE_XYZ: True
        USE_NORM: True
        NUM_FILTERS: [128, 128]
 
    BACKBONE_3D:
        NAME: HEDNet2D
 
        SED_FEATURE_DIM: 128
        SED_NUM_LAYERS: 6
        SED_NUM_SBB: [2, 1, 1]
        SED_DOWN_STRIDE: [1, 2, 2]
        SED_DOWN_KERNEL_SIZE: [3, 3, 3]
 
        DED_FEATURE_DIM: 128
        DED_NUM_LAYERS: 2
        DED_NUM_SBB: [2, 1, 1]
        DED_DOWN_STRIDE: [1, 2, 2]
 
    DENSE_HEAD:
        NAME: TransFusionHead
        CLASS_AGNOSTIC: False
        USE_BIAS_BEFORE_NORM: False
 
        INPUT_FEATURES: 128
        NUM_PROPOSALS: 200  # set it to 300 when performing inference on the test set (inference only)
        HIDDEN_CHANNEL: 128
        NUM_CLASSES: 10
        NUM_HEADS: 8
        NMS_KERNEL_SIZE: 3
        FFN_CHANNEL: 256
        DROPOUT: 0.1
        BN_MOMENTUM: 0.1
        ACTIVATION: relu
 
        NUM_HM_CONV: 2
        SEPARATE_HEAD_CFG:
            HEAD_ORDER: ['center', 'height', 'dim', 'rot', 'vel', 'iou']
            HEAD_DICT: {
                'center': {'out_channels': 2, 'num_conv': 2},
                'height': {'out_channels': 1, 'num_conv': 2},
                'dim': {'out_channels': 3, 'num_conv': 2},
                'rot': {'out_channels': 2, 'num_conv': 2},
                'vel': {'out_channels': 2, 'num_conv': 2},
                'iou': {'out_channels': 1, 'num_conv': 2},
            }
 
        TARGET_ASSIGNER_CONFIG:
            DATASET: nuScenes
            FEATURE_MAP_STRIDE: 2
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2
            HUNGARIAN_ASSIGNER:
                cls_cost: {'gamma': 2.0, 'alpha': 0.25, 'weight': 0.15}
                reg_cost: {'weight': 0.25}
                iou_cost: {'weight': 0.25}
 
        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                    'cls_weight': 1.0,
                    'bbox_weight': 0.25,
                    'hm_weight': 1.0,
                    'iou_weight': 0.5,
                    'iou_reg_weight': 0.5,
                    'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2]
                }
            LOSS_CLS:
                use_sigmoid: True
                gamma: 2.0
                alpha: 0.25
            LOSS_IOU: True
            LOSS_IOU_REG: True
 
        POST_PROCESSING:
            SCORE_THRESH: 0.0
            POST_CENTER_RANGE: [-61.2, -61.2, -10.0, 61.2, 61.2, 10.0]
            USE_IOU_TO_RECTIFY_SCORE: True
            IOU_RECTIFIER: [0.5]
 
        NMS_CONFIG:
            NMS_TYPE: nms_gpu
            NMS_THRESH: 0.2
            NMS_PRE_MAXSIZE: 1000
            NMS_POST_MAXSIZE: 100
            SCORE_THRES: 0.
 
    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        SCORE_THRESH: 0.1
        OUTPUT_RAW_SCORE: False
        EVAL‎_METRIC: kitti
 
OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 2
    NUM_EPOCHS: 20
 
    OPTIMIZER: adam_onecycle
    LR: 0.003
    WEIGHT_DECAY: 0.05
    MOMENTUM: 0.9
 
    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001
 
    LR_WARMUP: False
    WARMUP_EPOCH: 1
 
    GRAD_NORM_CLIP: 35
    LOSS_SCALE_FP16: 4.0
 
HOOK:
    DisableAugmentationHook:
        DISABLE_AUG_LIST: ['gt_sampling']
        NUM_LAST_EPOCHS: 4  # yield similar results if set it to 5 following mmdetection3d
cs

시각화 결과

hednet1

hednet2

3. Cross‐Dataset 결과

문제	원인 분석
일부 박스 위치 불일치	nuScenes 학습 시 사용된 sensor 위치·자세 정보 미반영
포인트 피처 차원 불일치	timestamp (추가 채널) → dummy데이터 0으로 처리
클래스 분류 성능 저하	학습된 클래스 분포와 Waymo 분포 차이

1)라이다 스캔 ,해상도 차이

nuScenes: 32-beam, 360° , 1Hz -> 포인트수 적음
Waymo: 64-beam, 최대 10Hz, 더 치밀한 스캔
→ 포인트 분포(density)·높낮이 분포(elevation)가 완전히 달라서, 같은 VFE(DynPillarVFE) 입장에선 “nuScenes 스타일의 포인트”만 보도록 학습돼 있습니다

2)좌표계·전처리 차이

nuScenes 모델의 cfgs/nuscenes_dataset.yaml 은 (x: 오른쪽, y: 앞, z: 위) 기준으로 학습
Waymo는 (x: 앞, y: 왼쪽, z: 위)

정확한 캘리 브레이션 정보없이 좌표계를 맞춰서 성능 저하가 발생한것 으로 추정

domain gap이 심함

다음검색

북마크

신고 센터로 신고

댓글 0
댓글쓰기
답글쓰기

댓글 리스트

CAFE

과제게시판

[3d object detection] 모델 선정 시각화

댓글

카페 검색