Projects

Over the years, I have undertaken several interesting projects, which are outlined on this page. Some were carried out at Carnegie Mellon University, others at DSO National Laboratories, either during internships or under the Young Defence Scientist Programme (YDSP) and Hwa Chong's Science and Math Talent Programme (SMTP).

SCARF: Stereo Cross-Attention Radiance Fields

This was part of a final project for the CMU 16-824 Visual Learning and Recognition class, in collaboration with Alex Strasser. We explored adding attention-based techniques to Stereo Radiance Fields, a neural-network-based method for rendering a 3D scene from a set of multi-view 2D images.
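
For readers unfamiliar with the idea, below is a minimal sketch of cross-attention between features from different views, the general kind of mechanism we experimented with. The module, tensor shapes and names are illustrative assumptions only, not our actual SCARF code.

    import torch
    import torch.nn as nn

    class CrossViewAttention(nn.Module):
        """Illustrative cross-attention block: target-ray features attend to reference-view features."""
        def __init__(self, dim=64, num_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, query_feats, ref_feats):
            # query_feats: (B, N, dim) features sampled along the target rays
            # ref_feats:   (B, M, dim) features gathered from the reference (stereo) views
            fused, _ = self.attn(query_feats, ref_feats, ref_feats)
            return fused

    # Toy usage with random tensors
    q = torch.randn(1, 128, 64)
    r = torch.randn(1, 256, 64)
    out = CrossViewAttention()(q, r)  # shape (1, 128, 64)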

Our best results (downsampled by 4) are shown in the gallery below. For more results and details, please refer to our web report linked below.

We wrote a web report that details our methodology and showcases the experimental results of our transformer-based approach; the links are found below.

Implementation of RadarSLAM in Python

Overview

This was part of a final project for the CMU 16-833 Localization and Mapping class, in collaboration with Kevin Liu, Brian Zhang and Alex Chen. In this project, we present the first open-source Python implementation of the RadarSLAM [1] [2] algorithm (odometry component only) and test it on the Oxford Radar RobotCar Dataset. Our reimplementation combines techniques from the 2020 and 2021 papers, and we also attempted various improvements over the original method.

Algorithm and Results

RadarSLAM is a feature-based algorithm. It extracts features from the 2D Cartesian radar image using a simple Hessian blob detector with adaptive non-maximum suppression (NMS) and a graph-based outlier rejection scheme, then tracks these features with the Kanade-Lucas-Tomasi (KLT) algorithm to obtain odometry measurements. These measurements are added to a keyframe-based pose graph, which is then bundle-adjusted with motion compensation.
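
As an illustration of the feature-tracking step, the sketch below detects features in one Cartesian radar frame and tracks them into the next with OpenCV's pyramidal KLT tracker. The Shi-Tomasi detector here is only a stand-in for the paper's Hessian blob detector with adaptive NMS, so this is not our full pipeline.

    import cv2
    import numpy as np

    def track_features(prev_img, curr_img):
        """prev_img, curr_img: consecutive 8-bit grayscale Cartesian radar images."""
        # Detect up to 500 salient points in the previous frame
        pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=500,
                                      qualityLevel=0.01, minDistance=10)
        # Pyramidal Lucas-Kanade (KLT) tracking into the current frame
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_img, curr_img, pts, None)
        good = status.ravel() == 1
        # Matched point pairs; these would then go through outlier rejection
        # and relative pose estimation to produce an odometry measurement
        return pts[good].reshape(-1, 2), next_pts[good].reshape(-1, 2)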

Diagrams and results are presented in the gallery below.

Evaluating a multi-view human pose estimation algorithm on the CMU Panoptic Studio and other datasets

Overview

From Spring 2020 to mid-Summer 2020, I undertook this small research project in Carnegie Mellon University's Human And Robot Partners (HARP) Lab. I briefly evaluated various state-of-the-art methods for multi-view 3D human pose estimation and sought to adapt the most suitable one for a dataset the lab had previously collected. The end goal of the project was to automatically and accurately generate ground-truth data for this dataset, for use in a larger HARP Lab project.

After an initial literature review, I found the learnable triangulation algorithm by Iskakov et al. [1] to be the most suitable, as it was the most accurate algorithm at the time (April 2019) in terms of average MPJPE. It uses an interesting and novel multi-stage approach that unprojects 2D joint heatmaps into 3D to increase accuracy.
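
To give a flavour of the triangulation at the core of [1], here is a minimal sketch of confidence-weighted algebraic triangulation (DLT) of a single joint from several calibrated views. The authors additionally make this step differentiable and propose a volumetric variant that unprojects heatmaps into a 3D volume, neither of which is shown here; the function and variable names are mine, not theirs.

    import numpy as np

    def triangulate_joint(proj_mats, points_2d, confidences):
        """proj_mats:   list of (3, 4) camera projection matrices, one per view
           points_2d:   (V, 2) detected 2D joint locations
           confidences: (V,) per-view weights, e.g. heatmap peak values"""
        rows = []
        for P, (u, v), w in zip(proj_mats, points_2d, confidences):
            # Each view contributes two weighted DLT constraints
            rows.append(w * (u * P[2] - P[0]))
            rows.append(w * (v * P[2] - P[1]))
        A = np.stack(rows)
        # Homogeneous least-squares solution: right singular vector with the
        # smallest singular value, then dehomogenise
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]
        return X[:3] / X[3]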

Testing on the CMU dataset (using a network pretrained on H36M)

As the authors of the paper only provided code for testing and training on the Human3.6M dataset (H36M), and not the CMU Panoptic Studio dataset, the bulk of the project involved porting the code to evaluate and train on the CMU Panoptic Studio dataset. This was no easy task, as it required an understanding not only of the code base but also of the algorithm itself. In fact, multiple members of the computer vision community had asked the authors to provide code for testing and evaluation on the CMU dataset, but none was provided.

Eventually, I was able to successfully test Iskakov et al.'s [1] volumetric triangulation algorithm on the CMU Panoptic Studio dataset, using weights pretrained on the Human3.6M dataset. Evaluation results are shown on my GitHub repository here, with some images provided below.

Notably, due to the limitations of the Human3.6M dataset (which does not contain views where part of the body is occluded), the network pretrained on H36M is not robust to partial-view occlusions. I hypothesised that if I were to train the network on the CMU dataset and purposely include partially occluded views, the algorithm would become robust to occlusions in some views. This is the next stage of the project: training on the CMU dataset.

Training on the CMU dataset

Currently training...

References and Acknowledgements

[1] K. Iskakov, E. Burkov, V. Lempitsky, and Y. Malkov, “Learnable triangulation of human pose,” in International Conference on Computer Vision (ICCV), 2019

This was a semester-long research project done under the mentorship of Mr Abhijat Biswas and the faculty direction of Prof Henny Admoni.

Optically illuminated directional sensing for guidance and alignment systems

Abstract

Optical guidance and alignment systems have commercial applications in smart-home, elderly-aid and prosthetic systems. Conventional solutions that employ digital image processing for guidance and alignment are resource-inefficient and thus costly to implement.

This project explored the feasibility of using a common laser pointer for low-power, cost-effective optical guidance. To this end, a compact electronic sensing system, capable of navigating a small land robot using a beam of laser light, was developed as a proof of concept. Primary emphasis was placed on the characterization, design and implementation of the electrical circuitry to facilitate accurate and reliable detection and processing of an optical signal from a laser point on the wall. The system comprised a laser pointer modulated at 511 Hz using a discrete 555-timer chip; four low-cost photodiode sensors to detect the modulated light signature; and a quadrature lock-in amplifier circuit to filter out ambient noise and undesired light interference. The demodulated analog signal was then digitized and sent to a microcontroller, where a self-developed algorithm actuated the robot in the direction of the reflected light.
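
As a rough software analogue of the quadrature lock-in stage, the sketch below recovers the strength of the 511 Hz component from a noisy photodiode sample. The sample rate and names are assumptions for illustration; the actual system performs this demodulation in analog circuitry.

    import numpy as np

    FS = 10_000   # assumed sample rate (Hz)
    F_REF = 511   # laser modulation frequency (Hz)

    def lockin_magnitude(samples):
        """samples: 1-D array of photodiode readings taken at FS."""
        t = np.arange(len(samples)) / FS
        i = samples * np.cos(2 * np.pi * F_REF * t)   # in-phase mixing
        q = samples * np.sin(2 * np.pi * F_REF * t)   # quadrature mixing
        # Averaging acts as a low-pass filter, rejecting ambient light and
        # interference away from the reference frequency
        return 2 * np.hypot(i.mean(), q.mean())

    # Comparing the magnitudes from the four photodiodes gives the direction
    # in which the robot should turn.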

The hardware prototype, containing the sensors, circuits, motors and a battery pack, was integrated onto a compact 500 g, 15 x 15 x 20 cm omnidirectional robotic platform for maneuverability. The final system responded successfully to the direction of reflected light in our tests, with acceptable sensitivity and robust noise rejection. With some enhancements, the system could potentially be adapted for guiding and directing smart-home robots, prosthetic devices, motorized wheelchairs or vehicle parking.

This project was done in collaboration with DSO National Laboratories, under the kind mentorship of Dr Wee Keng Hoong and Mr Lee Jin Yu.

Awards

This project won the Gold award at the Singapore Science and Engineering Fair.

It was also selected to represent Singapore at the 2016 Intel International Science and Engineering Fair (ISEF), held in Arizona from 8 to 13 May 2016. Photos of my ISEF experience can be found below.

Video

The video above shows the main features of the electrical circuit as demonstrated on a simple land robot. The robot is able to follow the pulsing (red) laser pointer with accuracy and speed. Unlike systems based on digital image processing, it can also distinguish the user's laser from other laser interference.

In case you were wondering why it's called Yoda's Lightchaser, it's because, first of all, it's (mostly) green like Yoda and his lightsaber. Secondly, one of the possible applications of the circuit is for elderly-friendly robots. Lastly, controlling the robot seems like one is using "The Force", especially if the circuit is adapted for use with an infrared laser.

ISEF Experience

Below are: the poster used for the ISEF 2016 competition; the link to my reflections on the experience; and a photo gallery!

Analysis of multimodal interactions for simultaneous spatial and cognitive tasks

Abstract

With the advent and increasing popularity of new Human-Computer Interaction (HCI) methods such as eye-tracking, speech and gesture input comes the question of what benefits these input modes actually bring. This project aims to study how multimodal interaction methods can reduce the overall workload and improve the performance of the user whilst interacting with complex systems.

The objective of the project was to compare several combinations of input methods and their effect on the overall workload and performance of the user whilst simultaneously performing 2 spatial tasks and 1 cognitive task.

To achieve this, a simulation was developed as the experimental platform. The simulation recreates a combat situation requiring users to defend themselves and attack hostile targets, both of which involve spatial localization. Changing ammunition types, a cognitive task, was also added to increase the difficulty of multitasking.

An experiment involving 12 participants across different age groups was conducted with 3 distinct interaction modes, assigned in random order. The first interaction mode is operated via mouse and keyboard; the second employs touch-screen, keyboard, and speech input; and the final interaction mode involves eye-tracking, keyboard, and speech input.

This project is a component of the Research@YDSP module and is a collaborative effort between a team of Hwa Chong Institution students (including myself) and DSO National Laboratories.

Awards

We were selected to present our project to the then Minister of State for Defence, Mr Maliki Osman, who thoroughly enjoyed the live demonstration!

The preliminary project also obtained a High Distinction at the Hwa Chong Projects Competition.

Experiment Video

As seen in the video, an experiment was conducted involving 12 participants across different age groups.

There were 3 distinct interaction modes, assigned in random order:
  1. Mouse and Keyboard
  2. Touch-screen, Keyboard, and Speech input
  3. Eye-tracking, Keyboard, and Speech input

Experiment Game

As part of the experiment, the participants were required to play a Flash game as shown below.

The game involved heavy multitasking, with several tasks the user needed to do at once:

  1. Aiming the turret
  2. Shooting
  3. Changing ammo
  4. Defending with the shield

Preliminary Project

This project was an extension of a preliminary project, also done in collaboration with DSO National Laboratories. This preliminary project, "I-Focus", was awarded a High Distinction at the Hwa Chong Projects Competition 2014.

We conducted this project to explore the effectiveness, efficiency and intuitiveness of eye-tracking as a replacement for the mouse. To do this, we created a website specifically suited for eye-tracking (the template I coded for it has since been adapted for my personal website). I also made a downloadable Chrome Extension which turns any website into an eye-tracking-friendly one!

We also created a Flash game (programmed by myself) to compare the efficiency of eye-tracking with that of other interaction methods such as the mouse. It can be played below in a browser that supports Flash:

The Experience

Below is the poster highlighting our key observations, which we presented to the then Minister of State for Defence, Mr Maliki Osman, at the 2015 YDSP Congress.

There is also a photo gallery below highlighting some key moments at the YDSP Congress event.
