CMU-Perceptual-Computing-Lab/caffe_rtpose
This repository is no longer maintained and will eventually be closed. Please move to OpenPose!
C++ code repo for the ECCV 2016 demo, "Realtime Multiperson Pose Estimation", Zhe Cao, Shih-En Wei, Tomas Simon, Yaser Sheikh. Thanks to Ginés Hidalgo Martínez for restructuring the code.
The full project repo includes MATLAB and Python versions, as well as the training code.
This project is under the terms of the license.
1. Required: CUDA & cuDNN installed on your machine.
2. If you have installed OpenCV 2.4 on your system, go to step 3. If you are using OpenCV 3, uncomment the line `# OPENCV_VERSION := 3` in the file `Makefile.config.Ubuntu14.example` (for Ubuntu 14) and/or `Makefile.config.Ubuntu16.example` (for Ubuntu 15 or 16). In addition, OpenCV 3 does not incorporate the `opencv_contrib` module by default. Assuming you have manually installed it and you need to use it, append `opencv_contrib` at the end of the line `LIBRARIES += opencv_core opencv_highgui opencv_imgproc` in the `Makefile` file.
3. Build `caffe` & `rtpose.bin` and download the required Caffe models (script tested on Ubuntu 14.04 & 16.04; it uses all the available cores on your machine):
chmod u+x install_caffe_and_cpm.sh
./install_caffe_and_cpm.sh
./build/examples/rtpose/rtpose.bin --video video_file.mp4
./build/examples/rtpose/rtpose.bin
--help <--- It displays all the available options.
--video input.mp4 <--- Input video. If omitted, will use webcam.
--camera # <--- Choose webcam number (default: 0).
--image_dir path_to_images/ <--- Run on all jpg, png, or bmp images in path_to_images/. If omitted, will use webcam.
--write_frames path/ <--- Render images with this prefix: path/frame%06d.jpg
--write_json path/ <--- Output JSON file with joints with this prefix: path/frame%06d.json
--no_frame_drops <--- Don't drop frames. Important for making offline results.
--no_display <--- Don't open a display window. Useful if there's no X server.
--num_gpu 4 <--- Parallelize over this number of GPUs. Default is 1.
--num_scales 3 --scale_gap 0.15 <--- Use 3 scales, 1, (1-0.15), (1-0.15*2). Default is one scale=1.
(HD)
--net_resolution 656x368 --resolution 1280x720 (These are the default values.)
(VGA)
--net_resolution 496x368 --resolution 640x480
--logtostderr <--- Log messages to standard error.
Run on a video vid.mp4, render image frames as output/frame%06d.jpg and output JSON files as output/frame%06d.json, using 3 scales (1.00, 0.85, and 0.70), parallelized over 2 GPUs:
./build/examples/rtpose/rtpose.bin --video vid.mp4 --num_gpu 2 --no_frame_drops --write_frames output/ --write_json output/ --num_scales 3 --scale_gap 0.15
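The scale values quoted above (1.00, 0.85 and 0.70) follow from `--num_scales` and `--scale_gap`. As a minimal sketch of that arithmetic only, the hypothetical helper below (`scale_factors` is not part of rtpose) lists the multipliers, assuming the first scale is 1:

```python
def scale_factors(num_scales, scale_gap):
    """Return the multipliers 1, 1 - gap, 1 - 2 * gap, ... (hypothetical helper)."""
    return [1.0 - i * scale_gap for i in range(num_scales)]


# Same values as the command above: --num_scales 3 --scale_gap 0.15
scales = scale_factors(num_scales=3, scale_gap=0.15)
print([f"{s:.2f}" for s in scales])  # ['1.00', '0.85', '0.70']
```

With the defaults (`--num_scales 1`), the list contains the single scale 1.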
Each JSON file has a bodies array of objects, where each object has an array joints containing the joint locations and detection confidence formatted as x1,y1,c1,x2,y2,c2,..., where c is the confidence in [0,1].
{
"version":0.1,
"bodies":[
{"joints":[1114.15,160.396,0.846207,...]},
{"joints":[...]},
]
}
The joint order of the COCO parts is as follows (see src/rtpose/modelDescriptorFactory.cpp):
part2name {
{0, "Nose"},
{1, "Neck"},
{2, "RShoulder"},
{3, "RElbow"},
{4, "RWrist"},
{5, "LShoulder"},
{6, "LElbow"},
{7, "LWrist"},
{8, "RHip"},
{9, "RKnee"},
{10, "RAnkle"},
{11, "LHip"},
{12, "LKnee"},
{13, "LAnkle"},
{14, "REye"},
{15, "LEye"},
{16, "REar"},
{17, "LEar"},
{18, "Bkg"},
}
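To illustrate how the flat `x1,y1,c1,x2,y2,c2,...` layout maps onto the part names, here is a minimal Python sketch. It is not part of this repository: `parse_bodies` and `COCO_PARTS` are hypothetical names, the sample string is made up and shortened to two joints, and the sketch assumes that the files written by `--write_json` are strictly valid JSON and that the background entry (`Bkg`) does not appear in the `joints` array.

```python
import json

# Part order copied from the part2name table above. Index 18 ("Bkg") is
# assumed to be a background channel with no entry in the joints array.
COCO_PARTS = [
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist", "LShoulder",
    "LElbow", "LWrist", "RHip", "RKnee", "RAnkle", "LHip", "LKnee",
    "LAnkle", "REye", "LEye", "REar", "LEar",
]


def parse_bodies(text):
    """Turn the JSON text of one frame into a list of {part: (x, y, c)} dicts."""
    data = json.loads(text)
    bodies = []
    for body in data["bodies"]:
        flat = body["joints"]
        # Regroup x1,y1,c1,x2,y2,c2,... into (x, y, c) triples.
        triples = zip(flat[0::3], flat[1::3], flat[2::3])
        # zip stops at the shorter input, so a truncated array is tolerated.
        bodies.append(dict(zip(COCO_PARTS, triples)))
    return bodies


# Hypothetical sample: one body with only its first two joints.
sample = (
    '{"version": 0.1, "bodies": ['
    '{"joints": [1114.15, 160.396, 0.846207, 1100.0, 220.5, 0.91]}]}'
)
people = parse_bodies(sample)
print(people[0]["Nose"])  # (1114.15, 160.396, 0.846207)
```

Keying each body by part name avoids hard-coding indices such as `joints[3 * 4]` for the right wrist in downstream code.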
We modified and added several Caffe files in include/caffe and src/caffe. In case you want to use your own Caffe distribution, these are the files we added and modified:
- Added folders in `include/caffe` and `src/caffe`: `include/caffe/cpm` and `src/caffe/cpm`.
- Modified files in `include/caffe` (search for `// CPM extra code:` to find the modified code): `data_transformer.hpp`.
- Modified files in `src/caffe` (search for `// CPM extra code:` to find the modified code): `data_transformer.cpp`, `proto/caffe.proto` and `util/blocking_queue.cpp`.
- Replaced files: `README.md`.
- Added files: `install_caffe_and_cpm.sh`, `Makefile.config.Ubuntu14.example` (extracted from `Makefile.config.example`) and `Makefile.config.Ubuntu16.example` (extracted from `Makefile.config.example`).
- Other added folders: `model/`, `examples/rtpose`, `/include/rtpose` and `/src/rtpose`.
- Other modified files: `Makefile`.
- Optional - deleted Caffe files and folders (only to save space): `Makefile.config.example`, `data/`, `examples/` (do not delete `examples/rtpose`) and `models/`.
We created a few Caffe layers (located in include/caffe/cpm/layers and src/caffe/cpm/layers):
- ImResizeLayer: Only used for testing (backward pass not implemented). This layer performs a 2-D resize over the 4-D data. I.e., given a 4-D input of size (`num` x `channels` x `height_input` x `width_input`), the layer returns a 4-D output of size (`num` x `channels` x `height_output` x `width_output`). It is applied independently along the `num` and `channels` dimensions. Its parameters are:
  - `factor`: Scaling factor with respect to the input width and height. `factor` is the alternative to the pair of variables [`target_spatial_width`, `target_spatial_height`]. If `factor != 0`, the latter are ignored.
  - `scale_gap` and `start_scale`: These parameters are related and are used for scale search in testing mode. If `start_scale = 1` (default), the CNN input patch size is the net resolution (set with `--net_resolution`). `scale_gap` sets the difference between consecutive scales. These parameters are related to the flag `--num_scales`. For instance, using `--start_scale 1 --num_scales 3 --scale_gap 0.1` means using 3 scales: `1`, `1-0.1` and `1-2*0.1`, so the patch sizes are the net resolution multiplied by these scale values.
  - `target_spatial_height`: Alternative to `factor`. It sets the output height. Ignored if `factor != 0`.
  - `target_spatial_width`: Alternative to `factor`. It sets the output width. Ignored if `factor != 0`.
- NmsLayer: Only used for testing (backward pass not implemented). This layer performs 3-D Non-Maximum Suppression over the 4-D data. I.e., given a 4-D input of size (`num` x `channels` x `height` x `width`), it returns a 4-D output of size (`num` x `num_parts` x `max_peaks+1` x `3`). It is applied independently along the `num` dimension. The second dimension corresponds to the number of limbs (`num_parts`). The third dimension indicates the maximum number of peaks to be analyzed (`max_peaks+1`). Finally, the last one corresponds to the `x`, `y` and `score` values (`3`). Its parameters are:
  - `max_peaks`: The number of peaks to be considered. The last `total_peaks-max_peaks` peaks are discarded.
  - `num_parts`: The number of limbs to detect (e.g. 15 for MPI and 18 for COCO).
  - `threshold`: Any input value smaller than this threshold is set to 0.
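The shape bookkeeping of the two layers can be sketched in plain Python. This is an illustration under stated assumptions, not the layers' CUDA/C++ implementation: all function names are hypothetical, the resize sketch assumes scaled sizes are rounded to the nearest integer, the peak test is a simplified 4-neighbour 2-D local-maximum check per channel, peaks are kept in scan order, and the extra `+1` row is assumed to hold the peak count (the text above does not say what it stores).

```python
def im_resize_output_shape(input_shape, factor=0.0,
                           target_spatial_width=0, target_spatial_height=0):
    """Shape bookkeeping only: num and channels pass through unchanged."""
    num, channels, height, width = input_shape
    if factor != 0:
        # Assumption: scaled sizes are rounded to the nearest integer.
        return (num, channels, round(height * factor), round(width * factor))
    return (num, channels, target_spatial_height, target_spatial_width)


def nms_single_map(heatmap, threshold, max_peaks):
    """Return up to max_peaks (x, y, score) local maxima of one 2-D map."""
    rows, cols = len(heatmap), len(heatmap[0])

    def value(y, x):
        # Out-of-range positions and values below the threshold count as 0.
        if 0 <= y < rows and 0 <= x < cols and heatmap[y][x] >= threshold:
            return heatmap[y][x]
        return 0.0

    peaks = []
    for y in range(rows):
        for x in range(cols):
            v = value(y, x)
            neighbours = (value(y - 1, x), value(y + 1, x),
                          value(y, x - 1), value(y, x + 1))
            if v > 0 and all(v > n for n in neighbours):
                peaks.append((x, y, v))
    # Peaks beyond max_peaks (in scan order) are discarded.
    return peaks[:max_peaks]


def nms_layer_output(maps, threshold, max_peaks):
    """Pack per-part peaks into a num_parts x (max_peaks + 1) x 3 nested list."""
    output = []
    for heatmap in maps:
        peaks = nms_single_map(heatmap, threshold, max_peaks)
        # Assumption: row 0 stores the peak count; unused rows stay at 0.
        block = [[float(len(peaks)), 0.0, 0.0]]
        block += [[float(x), float(y), score] for x, y, score in peaks]
        block += [[0.0, 0.0, 0.0] for _ in range(max_peaks - len(peaks))]
        output.append(block)
    return output


# Hypothetical 57-channel, 46x82 input upscaled by a factor of 8.
print(im_resize_output_shape((1, 57, 46, 82), factor=8.0))  # (1, 57, 368, 656)

heat = [[0.0, 0.1, 0.0],
        [0.1, 0.9, 0.2],
        [0.0, 0.2, 0.05]]
out = nms_layer_output([heat], threshold=0.15, max_peaks=2)
print(len(out), len(out[0]), len(out[0][0]))  # 1 3 3
print(out[0][1])  # [1.0, 1.0, 0.9]
```

In the example, the single map yields one peak at `x = 1`, `y = 1`, so the block has the assumed count row, one peak row, and one zero-filled row, giving the `num_parts` x `max_peaks+1` x `3` layout described above.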
Please cite the paper in your publications if it helps your research:
@article{cao2016realtime,
title={Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
author={Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
journal={arXiv preprint arXiv:1611.08050},
year={2016}
}
@inproceedings{wei2016cpm,
author = {Shih-En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh},
booktitle = {CVPR},
title = {Convolutional pose machines},
year = {2016}
}