THOR Challenge
Course: Visual Learning and Recognition
GitHub: https://github.com/YuMao1993/DRL
Description:
The goal of the THOR Challenge is to navigate an agent through an indoor virtual environment using only visual input. Teamed up with Yu Mao and Jin Zhu, we implemented six reinforcement learning algorithms from scratch in TensorFlow: DQN, Double DQN, Dueling DQN, Policy Gradient, Actor-Critic, and A3C. We first deployed these algorithms on OpenAI Gym to test their performance, then picked two of them (Dueling DQN and A3C) and trained the target-driven network in the THOR environment.
Testing the algorithms on OpenAI Gym environments
Navigation in THOR (left: target image, right: navigation)
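As an illustration of one of the algorithms, here is a minimal sketch of a dueling Q-network in Keras-style TensorFlow. This is not our original code: the layer sizes and the CartPole-sized example are placeholders. The dueling architecture splits the network into a state-value stream V(s) and an advantage stream A(s, a), then recombines them as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    import tensorflow as tf

    class DuelingQNetwork(tf.keras.Model):
        def __init__(self, num_actions, hidden_units=128):
            super().__init__()
            self.hidden = tf.keras.layers.Dense(hidden_units, activation="relu")
            self.value = tf.keras.layers.Dense(1)                 # state-value stream V(s)
            self.advantage = tf.keras.layers.Dense(num_actions)   # advantage stream A(s, a)

        def call(self, states):
            h = self.hidden(states)
            v = self.value(h)
            a = self.advantage(h)
            # Subtracting the mean advantage keeps V and A identifiable:
            # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
            return v + a - tf.reduce_mean(a, axis=1, keepdims=True)

    # Example: a CartPole-sized network (4-dim state, 2 actions), as in the Gym tests.
    q_net = DuelingQNetwork(num_actions=2)
    q_values = q_net(tf.random.normal([1, 4]))  # Q-values for a batch of one state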
NIST TRECVid 2017 Video-to-Text Pilot Task
Course: Large-Scale Multimedia Analysis
GitHub: https://github.com/Jim61C/VTT_Show_Atten_And_Tell
Description:
Teamed up with Xiaohan Jin and Yifan Xing, we modified the Show-and-Tell model and built a video captioning system in TensorFlow. To improve performance, we experimented with a spatial attention model, multi-modal fusion (video and audio), semantic tags (video classification outputs from a DNN), and a GAN. We tested our system on the MSRVTT, MSVD, and TRECVid VTT datasets. Our best-performing system achieved state-of-the-art results on the MSVD dataset (METEOR: 0.34, CIDEr: 0.79).
Left: a person is folding paper. Right (GAN): Minecraft characters are talking to each other.
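For reference, the spatial attention we experimented with follows the usual soft-attention recipe: the decoder state scores each spatial location of the CNN feature map, and the context vector is the weighted sum of the location features. A hedged sketch in Keras-style TensorFlow (dimensions and names are illustrative, not taken from our repository):

    import tensorflow as tf

    class SpatialAttention(tf.keras.layers.Layer):
        def __init__(self, att_dim=256):
            super().__init__()
            self.w_feat = tf.keras.layers.Dense(att_dim)
            self.w_hidden = tf.keras.layers.Dense(att_dim)
            self.score = tf.keras.layers.Dense(1)

        def call(self, features, hidden_state):
            # features: (batch, regions, feat_dim) CNN feature map flattened over space
            # hidden_state: (batch, hidden_dim) current decoder LSTM state
            energy = self.score(tf.nn.tanh(
                self.w_feat(features) + self.w_hidden(hidden_state)[:, tf.newaxis, :]))
            alpha = tf.nn.softmax(energy, axis=1)              # weights over spatial regions
            context = tf.reduce_sum(alpha * features, axis=1)  # attended feature vector
            return context, alpha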
Vision for Sports
Course: MSCV Capstone Project with Disney Research
Description:
Teamed up with Nishant Agrawal, we worked on vision-based basketball game analysis. Specifically, we built a shot detector using Max-Margin Object Detection and RANSAC (precision: 72.9%, recall: 65.2%), a scoring detector based on intensity differences (accuracy: 86%), and an SVM-based timeout detector to capture these important events in a game. We also designed and implemented a dynamic programming (DP) algorithm that aligns the detected events to the ground truth, which lets us synchronize video time with game time.
Event alignment results
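The alignment itself is a classic sequence-alignment DP. The sketch below shows the general idea under assumed costs: matching a detection to a ground-truth event costs their time difference, while the miss and false-positive penalties here are placeholders, not the values we actually tuned.

    def align_events(detected, ground_truth, miss_cost=10.0, fp_cost=10.0):
        # detected, ground_truth: lists of event times (e.g., in seconds)
        n, m = len(detected), len(ground_truth)
        INF = float("inf")
        # dp[i][j]: min cost of aligning the first i detections with the first j GT events
        dp = [[INF] * (m + 1) for _ in range(n + 1)]
        back = [[None] * (m + 1) for _ in range(n + 1)]
        dp[0][0] = 0.0
        for i in range(n + 1):
            for j in range(m + 1):
                if dp[i][j] == INF:
                    continue
                if i < n and j < m:  # match detection i with ground-truth event j
                    cost = dp[i][j] + abs(detected[i] - ground_truth[j])
                    if cost < dp[i + 1][j + 1]:
                        dp[i + 1][j + 1], back[i + 1][j + 1] = cost, (i, j, "match")
                if i < n:            # detection i is a false positive
                    cost = dp[i][j] + fp_cost
                    if cost < dp[i + 1][j]:
                        dp[i + 1][j], back[i + 1][j] = cost, (i, j, "false_pos")
                if j < m:            # ground-truth event j was missed
                    cost = dp[i][j] + miss_cost
                    if cost < dp[i][j + 1]:
                        dp[i][j + 1], back[i][j + 1] = cost, (i, j, "miss")
        # Recover the matched pairs by walking back from dp[n][m].
        matches, i, j = [], n, m
        while back[i][j] is not None:
            pi, pj, op = back[i][j]
            if op == "match":
                matches.append((pi, pj))
            i, j = pi, pj
        return dp[n][m], list(reversed(matches))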
Instance-Level Image Segmentation
Course: Computer Vision
Description:
The goal of instance-level image segmentation is to distinguish different objects that share the same semantic label, e.g., different chairs in an image.
Teamed up with Tiffany Deng, we developed an instance-level image segmentation algorithm that combines semantic segmentation, SLIC superpixels, and graph cut. We applied the algorithm to the NYUv2 RGB-D dataset.
For details, please refer to our poster.
Experimental results of our algorithm.
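One building block of the pipeline can be sketched with scikit-image: compute SLIC superpixels and snap the per-pixel semantic labels to superpixel boundaries by majority vote, before graph cut separates the individual instances. Parameter values and names below are illustrative, not the settings from our project.

    import numpy as np
    from skimage.segmentation import slic

    def superpixel_majority_labels(image, semantic_map, n_segments=400):
        # image: (H, W, 3) RGB image; semantic_map: (H, W) integer class label per pixel
        superpixels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
        refined = np.zeros_like(semantic_map)
        for sp in np.unique(superpixels):
            mask = superpixels == sp
            labels, counts = np.unique(semantic_map[mask], return_counts=True)
            refined[mask] = labels[np.argmax(counts)]  # majority vote within the superpixel
        return superpixels, refined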
Low-Rank Matrix Recovery and Application in Computer Vision
Course: Math Fundamentals for Robotics
Description:
Teamed up with Tiffany Deng, we investigated low-rank matrix recovery and applied the algorithms to two computer vision applications.
First, we applied the robust principal component analysis (RPCA) algorithm proposed in this paper to background subtraction.
Second, we implemented the MC-Pos algorithm proposed in this paper and applied it to multi-label image classification on the MSRC dataset. We used dense SIFT features to build a bag-of-words dictionary and performed the classification task, achieving an area under the curve (AUC) above 0.91.
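For the background subtraction application, each vectorized video frame becomes a column of a data matrix D, and RPCA splits D into a low-rank background term L plus a sparse foreground term S. Below is a compact sketch of principal component pursuit solved with ADMM, using the standard default parameters rather than whatever we tuned for the project.

    import numpy as np

    def robust_pca(D, n_iter=200, tol=1e-7):
        # D: (pixels, frames) data matrix; returns low-rank L (background), sparse S (foreground)
        m, n = D.shape
        lam = 1.0 / np.sqrt(max(m, n))          # standard sparsity weight
        mu = m * n / (4.0 * np.abs(D).sum())    # common step-size heuristic
        S = np.zeros_like(D)
        Y = np.zeros_like(D)
        shrink = lambda X, tau: np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
        for _ in range(n_iter):
            # Low-rank update: singular value thresholding of D - S + Y/mu
            U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
            L = (U * shrink(sig, 1.0 / mu)) @ Vt
            # Sparse update: elementwise soft-thresholding
            S = shrink(D - L + Y / mu, lam / mu)
            # Dual update on the constraint D = L + S
            residual = D - L - S
            Y = Y + mu * residual
            if np.linalg.norm(residual) <= tol * np.linalg.norm(D):
                break
        return L, S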
3D Facial Model Construction
Course: Network and Multimedia Lab
Description:
Teamed up with two classmates, we developed Super Face, a cross-platform 3D face modeling mobile app, in Unity3D together with the 3D modeling software Maya. In the project, I was responsible for 3D face model morphing. I also implemented an image warping algorithm that transforms a user photo onto a pre-defined texture map.
For more information, please refer to the slides and demo video.
One of my teammates and his face model.
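The warping step can be sketched as a piecewise-affine warp driven by corresponding facial landmarks. The scikit-image version below only illustrates the idea; the landmark arrays and texture shape are placeholders, not the algorithm shipped in the app.

    import numpy as np
    from skimage import transform

    def warp_photo_to_texture(photo, photo_landmarks, texture_landmarks, texture_shape):
        # photo_landmarks, texture_landmarks: (N, 2) arrays of matching (x, y) points
        tform = transform.PiecewiseAffineTransform()
        # warp() uses the transform as an inverse map, so estimate texture -> photo coordinates
        tform.estimate(texture_landmarks, photo_landmarks)
        return transform.warp(photo, tform, output_shape=texture_shape)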
Projector Array 3D Display
Lab: Multimedia Processing and Communications Lab
Advisor: Prof. Homer H. Chen, National Taiwan University
Description:
Inspired by this project on 3D facial displays from USC ICT, my teammate and I implemented a glasses-free 3D display prototype with a projector array and a lenticular sheet. Five pico-projectors each projected an image composed of several views rendered from different angles. Because of the limited number of projectors, the display provided horizontal parallax within a narrow viewing band. Intuitively, vertical parallax could also be achieved with a two-dimensional projector array and a lens-array screen.
Images of the display when observed from different view angles.
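The per-projector images were composed by multiplexing several rendered views. The toy sketch below shows a simple column-wise interleaving scheme for illustration only; the actual composition depends on the projector and lenticular geometry.

    import numpy as np

    def interleave_views(views):
        # views: list of (H, W, 3) images rendered from different horizontal angles
        views = np.stack(views)                   # (V, H, W, 3)
        v, h, w, c = views.shape
        out = np.empty((h, w, c), dtype=views.dtype)
        for col in range(w):
            out[:, col] = views[col % v, :, col]  # cycle through the views column by column
        return out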
3D Third-person Tank Shooting Game
Course: Graphics and Interaction
Description:
Teamed up with two classmates, we programmed Sniper Shoot, a third-person shooting Windows app, in C# using the SharpDX library. It supports either mouse-and-keyboard control or touch-and-gyro control. In the project, I was responsible for the 3D models, explosion effects, shading, and the computer AI.