Postdoctoral fellow • Carnegie Mellon University
Email • GitHub • CV • Google Scholar
I am a postdoctoral researcher working with Fernando De la Torre at Carnegie Mellon University. Before this, I completed my PhD at UW–Madison, working with Yong Jae Lee. My past work has explored generative models for computer vision tasks. In general, my philosophy is to view this class of algorithms as a tool that helps us go outside the training distribution and generate something we don't already have. Below is a list of the projects I've worked on so far (* denotes equal contribution):
Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee
Most work on image editing has focused on editing a single image at a time. We present a method for interactive batch image editing: given an edit specified by the user on an example image, our method automatically transfers that edit to other test images, so that irrespective of their initial state, they all arrive at the same final state.
Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee
Text-conditioned models have emerged as a powerful tool for image editing. However, in many situations, language can be ambiguous and ineffective at describing specific image edits. We present a method for image editing via visual prompting: given "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be applied to unseen images.
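As a rough illustration of the idea (a toy sketch, not the diffusion-based editor used in the paper), one can optimize a learnable conditioning vector so that a differentiable editor maps the "before" image to the "after" image; the ToyEditor class and all tensor shapes below are made up for the example:

```python
import torch
import torch.nn.functional as F

# A toy differentiable "editor" standing in for a real text-conditioned
# editing model; it applies a learned, instruction-conditioned color shift.
class ToyEditor(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(64, 3)

    def forward(self, image, c):
        return image + self.proj(c).view(1, 3, 1, 1)

editor = ToyEditor()
before = torch.rand(1, 3, 32, 32)
after = before + torch.tensor([0.2, -0.1, 0.0]).view(1, 3, 1, 1)  # the "edit"

c = torch.zeros(1, 64, requires_grad=True)   # learnable edit direction
opt = torch.optim.Adam([c], lr=1e-2)
for _ in range(300):
    loss = F.mse_loss(editor(before, c), after)
    opt.zero_grad(); loss.backward(); opt.step()

# The learned direction c now transfers the same edit to unseen images:
new_image = torch.rand(1, 3, 32, 32)
edited = editor(new_image, c)
```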
Utkarsh Ojha*, Yuheng Li*, Anirudh Sundara Rajan*, Yingyu Liang, Yong Jae Lee
When a student network tries to mimic a teacher while classifying an image, we see an improvement in its performance. But what happens in the background? Does the student really inherit teacher-specific properties that it would otherwise not have obtained? In what ways can we study those properties? In this paper, we attempt to shed some light on this dark knowledge that the student inherits during the distillation process.
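For context on the setup above: in standard knowledge distillation (Hinton et al.), the student matches the teacher's temperature-softened outputs alongside the ground-truth labels. A minimal PyTorch sketch, with T and alpha as illustrative hyperparameter choices:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradients stay comparable across temperatures
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy with the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```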
Utkarsh Ojha*, Yuheng Li*, Yong Jae Lee
The past few years have seen the birth of a plethora of generative models. This work attempts to build systems that can detect fake images across different breeds of generative models. We show why explicitly training a neural network for real/fake classification is not a good idea, and consequently show the surprising effectiveness of a feature space not explicitly trained for this task.
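A minimal sketch of this flavor of detector, assuming a frozen CLIP image encoder as the feature space and a simple nearest-neighbor rule (the model choice and file lists here are placeholders, not the paper's exact protocol):

```python
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)  # frozen; never trained on real/fake

@torch.no_grad()
def embed(paths):
    # Encode images and L2-normalize, so cosine similarity is a dot product
    imgs = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths]).to(device)
    feats = model.encode_image(imgs).float()
    return feats / feats.norm(dim=-1, keepdim=True)

# real_paths / fake_paths: small labeled reference sets (hypothetical file lists)
real_bank, fake_bank = embed(real_paths), embed(fake_paths)

@torch.no_grad()
def is_fake(path):
    # Nearest-neighbor decision in the frozen feature space
    q = embed([path])
    return (q @ fake_bank.T).max() > (q @ real_bank.T).max()
```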
Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zhang
If you have thousands of images from a domain (e.g., human faces), you can typically train a big neural network to generate images resembling that domain. What if you don't have that luxury? What if you only have, say, 10 paintings from an artist, and want to generate more? That is the goal of this work: model the bigger distribution of a domain using extremely few training images from it.
Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee
Let's say you have data that contains images from not one but multiple object categories (e.g., dogs and cars). Can you learn a generative model that still disentangles an object's shape from its appearance? We proposed a method for this task: upon learning such a model, we can take the appearance of a furry dog and transfer it onto a car to create a new species of furry cars.
Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee
When your data has discrete object categories, a typical assumption for the discrete latent factors is a uniform multinomial distribution. What happens when the data has a class imbalance? We highlight the shortcomings of existing work in such scenarios, and propose a method that disentangles the discrete factors much more accurately, without access to the ground-truth distribution.
Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
Let's say you captured two pictures: one of a red sparrow, and another of a white swan. You're feeling creative, and want to imagine how that white swan would look with the red sparrow's appearance over it. MixNMatch does precisely that: it takes in real images, extracts each object's shape and appearance independently, and combines them to create a hybrid bird: a red swan.
Krishna Kumar Singh*, Utkarsh Ojha*, Yong Jae Lee
Imagine a collection of natural bird images. The goal of this project was a model that not only generates realistic images, but also learns to control their different properties. For example, the proposed method learns to control object shape, appearance, pose, and background, all without any supervision. We can now borrow the appearance of a colorful hummingbird and put it over the body of a seagull.
Konda Reddy Mopuri*, Utkarsh Ojha*, Utsav Garg, R. Venkatesh Babu
A universal adversarial perturbation is an image-agnostic noise pattern which, when added to any natural image, will fool a neural-network-based classifier. We proposed a method to generate not one, but a whole distribution of such noise patterns for a given network. These were much stronger in terms of fooling not just the targeted classifier, but also many unseen ones.
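To make the setup concrete, here is a minimal sketch of applying and evaluating one such perturbation (assumptions: L-infinity-bounded noise on images in [0, 1]; the eps value and fooling-rate metric follow common convention rather than the paper's exact settings):

```python
import torch

def apply_uap(images, delta, eps=10 / 255):
    # Constrain the universal perturbation to an L-infinity ball of radius eps
    delta = delta.clamp(-eps, eps)
    # The same noise pattern is added to every image in the batch
    return (images + delta).clamp(0, 1)

def fooling_rate(model, images, delta):
    with torch.no_grad():
        clean = model(images).argmax(dim=1)
        adv = model(apply_uap(images, delta)).argmax(dim=1)
    # Fraction of predictions flipped by the image-agnostic noise
    return (clean != adv).float().mean().item()
```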