Research Objective
To revisit the classical problem of building geometric interpretations of images, using supervised deep learning to detect primitives layer by layer.
Research Findings
The proposed layered detection model outperforms traditional and other learning-based methods in both detection accuracy and reconstruction quality for primitive detection in images. It enables applications such as image editing and recognition-by-components. Future work should address its limitations with overlapping primitives and incorporate real-world data to improve generalization.
Limitations
The model cannot distinguish highly overlapping primitives within the same layer and may produce duplicate detections for intersecting primitives. It is trained only on synthetic data, which limits generalization to real-world images, for which annotated datasets are lacking.
1:Experimental Design and Method Selection:
The method modifies the YOLOv2 network to regress primitive parameters, adds an RNN to predict a variable number of parameters, and uses a layered detection model inspired by hierarchical image analysis. A novel circular loss function is introduced for RNN training to handle the variable number of control points in splines.
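The summary does not define the circular loss. One plausible reading, offered here only as an assumption, is that a closed spline's control points can be listed starting from any point on the curve, so the loss should not penalize a prediction that is merely a rotation of the ground-truth sequence. The sketch below takes the minimum mean squared error over all cyclic shifts; the function name `circular_loss` and the equal-length requirement are illustrative choices, not the authors' formulation.

```python
import numpy as np

def circular_loss(pred, target):
    """Minimum mean squared error over all cyclic shifts of `target`.

    pred, target: sequences of n control points, each an (x, y) pair.
    Assumption (not from the source): both sequences have the same
    length and the same orientation (e.g. both counter-clockwise).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    n = len(target)
    # Try every possible starting point of the closed curve.
    losses = [
        np.mean((pred - np.roll(target, shift, axis=0)) ** 2)
        for shift in range(n)
    ]
    return float(min(losses))
```

With this definition, a prediction that lists the same four corner points but starts one position later incurs zero loss, whereas a plain element-wise MSE would penalize it.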
2:Sample Selection and Data Sources:
Synthetic datasets of 150,000 images (416×416 pixels) are generated with primitives such as rectangles, triangles, ellipses, and closed spline curves. Variants with noise, textures, and affine transformations are included to test robustness.
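To make the data-generation step concrete, here is a minimal sketch of how one such synthetic sample might be produced. It covers only rectangles and ellipses with additive Gaussian noise; the helper names, size ranges, grey levels, and noise level are all assumptions for illustration, and the original generator (including triangles, splines, textures, and affine transformations) is not described in enough detail to reproduce.

```python
import numpy as np

SIZE = 416  # image side length stated in the summary

def draw_rectangle(img, cx, cy, w, h, value):
    """Fill an axis-aligned rectangle centred at (cx, cy)."""
    x0, x1 = max(0, int(cx - w / 2)), min(SIZE, int(cx + w / 2))
    y0, y1 = max(0, int(cy - h / 2)), min(SIZE, int(cy + h / 2))
    img[y0:y1, x0:x1] = value

def draw_ellipse(img, cx, cy, a, b, value):
    """Fill an axis-aligned ellipse with semi-axes a (x) and b (y)."""
    yy, xx = np.mgrid[0:SIZE, 0:SIZE]
    mask = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1.0
    img[mask] = value

def make_sample(rng, n_primitives=3, noise_std=0.05):
    """Return (image, labels) for one synthetic sample.

    Primitives are painted in order, so later ones occlude earlier
    ones; the labels keep that back-to-front order.
    """
    img = np.zeros((SIZE, SIZE), dtype=np.float32)
    labels = []
    for _ in range(n_primitives):
        kind = str(rng.choice(["rectangle", "ellipse"]))
        cx, cy = rng.uniform(80, SIZE - 80, size=2)
        w, h = rng.uniform(40, 150, size=2)
        value = rng.uniform(0.3, 1.0)  # grey level of the primitive
        if kind == "rectangle":
            draw_rectangle(img, cx, cy, w, h, value)
        else:
            draw_ellipse(img, cx, cy, w / 2, h / 2, value)
        labels.append((kind, float(cx), float(cy), float(w), float(h)))
    # Additive Gaussian noise as one simple robustness variation.
    img += rng.normal(0.0, noise_std, size=img.shape).astype(np.float32)
    return np.clip(img, 0.0, 1.0), labels
```

Calling `make_sample(np.random.default_rng(0))` yields one image and its list of primitive parameters; a full dataset would repeat the call with different seeds.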
3:List of Experimental Equipment and Materials:
GPU-equipped machines for training and testing the PyTorch-based implementation. No specific hardware is detailed beyond the software tools.
4:Experimental Procedures and Operational Workflow:
The backbone network (Darknet-19) processes images through convolutional layers and detection blocks. For layered detection, RoI pooling crops the feature maps according to the predicted bounding boxes. Training is end-to-end, with scheduled sampling for stability, and the RNN decoder is pre-trained separately.
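The RoI pooling step can be illustrated with a small NumPy sketch: a feature map is cropped to a predicted box and max-pooled to a fixed grid so that the next layer's detector always sees the same input size. The normalized box format `(x0, y0, x1, y1)`, the default 7×7 output, and the function name `roi_max_pool` are assumptions for illustration; the paper's implementation details may differ.

```python
import numpy as np

def roi_max_pool(features, box, out_size=(7, 7)):
    """Crop `features` (C, H, W) to `box` and max-pool to `out_size`.

    box: (x0, y0, x1, y1), normalised to [0, 1] relative to the image.
    """
    _, h, w = features.shape
    x0, y0, x1, y1 = box
    # Map the box to feature-map cells, keeping at least one cell.
    left = min(max(int(np.floor(x0 * w)), 0), w - 1)
    top = min(max(int(np.floor(y0 * h)), 0), h - 1)
    right = min(max(int(np.ceil(x1 * w)), left + 1), w)
    bottom = min(max(int(np.ceil(y1 * h)), top + 1), h)
    crop = features[:, top:bottom, left:right]

    out_h, out_w = out_size
    ys = np.linspace(0, crop.shape[1], out_h + 1)
    xs = np.linspace(0, crop.shape[2], out_w + 1)
    pooled = np.empty((features.shape[0], out_h, out_w), dtype=features.dtype)
    for i in range(out_h):
        r0 = int(np.floor(ys[i]))
        r1 = max(r0 + 1, int(np.ceil(ys[i + 1])))
        for j in range(out_w):
            c0 = int(np.floor(xs[j]))
            c1 = max(c0 + 1, int(np.ceil(xs[j + 1])))
            # Max over the bin, separately for every channel.
            pooled[:, i, j] = crop[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled
```

In a PyTorch pipeline the same role would typically be played by an existing operator such as `torchvision.ops.roi_pool`; the loop version here is meant only to show what the crop-and-pool step computes.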
5:Data Analysis Methods:
Performance is evaluated using precision, recall, mean average precision (mAP), and reconstruction loss (pixel-wise RMSE). Comparisons are made with traditional methods (contour detection, Hough transform) and learning-based baselines (CSGNet, flat model, recursive model).
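For reference, the listed metrics can be sketched as follows. The summary does not state how detections are matched to ground truth or which average-precision variant is used, so the non-interpolated AP below and the function names are assumptions; mAP would be the mean of `average_precision` over primitive classes.

```python
import numpy as np

def precision_recall(num_true_pos, num_false_pos, num_false_neg):
    """Precision and recall from detection counts."""
    detected = num_true_pos + num_false_pos
    actual = num_true_pos + num_false_neg
    precision = num_true_pos / detected if detected else 0.0
    recall = num_true_pos / actual if actual else 0.0
    return precision, recall

def average_precision(scores, is_true_positive, num_ground_truth):
    """Non-interpolated AP: mean precision at the rank of each hit."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_true_positive, dtype=bool)[order]
    cum_hits = np.cumsum(hits)
    ranks = np.arange(1, len(hits) + 1)
    precision_at_hits = (cum_hits / ranks)[hits]
    if num_ground_truth == 0:
        return 0.0
    return float(precision_at_hits.sum() / num_ground_truth)

def pixel_rmse(reconstruction, image):
    """Pixel-wise root mean squared error between two images."""
    diff = np.asarray(reconstruction, dtype=float) - np.asarray(image, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

The reconstruction loss compares the input image with an image re-rendered from the detected primitives, so a lower `pixel_rmse` indicates that the predicted parameters explain the image more faithfully.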