It is a Steward Observatory and Department of Astronomy tradition to spend valuable[citation needed] grad student time concocting plans to amuse, vex, or embarrass the principal investigator.
Note to P.I.: This also means any embarrassing mistakes you’ve seen me make have been absolutely intentional.
We call these pranks, though I’m not sure that’s entirely accurate. In any case, we cannot hope to rival the time someone used computer administrator access to bamboozle a CNN-addicted advisor with a fake homepage. I think of them more as artistic expressions of the self, mediated through the constraints of graduate school and the cult of personality inherent in any advising relationship.
There was that one time that priceless works of art appeared to decorate the office while its occupant was abroad in Chile, and, more recently, the Merry MagAO-Xmas display. Both of these relied on having a group of graduate students with Photoshop™ skills to render 2D images that reveal the essential nature of the subject.
For the next iteration, we had to step things up. Kick it up a notch. Take things into a whole new dimension. Could we photoshop our advisor into… a movie? Haha, just kidding! Even a short clip would be many hundreds of frames. Unless…
What if there were a tool that leveraged image processing, GPU programming, and machine learning to automate this for us? We’re high-contrast imagers; we know these things. I immediately set to work on a literature review.
It just so happened that a fellow graduate student had (unknowingly) answered our prayers in “Motion-supervised Co-Part Segmentation” by Aliaksandr Siarohin et al. from ICPR 2021. Or, more importantly, the associated open-source code. Armed with a bottom-shelf NVIDIA GPU and a refurbished Dell workstation, I dug into the code. It seemed like I’d be able to get a good “face swap,” but there was one nagging problem.
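For the curious, the plumbing is less mysterious than it sounds: pull the target clip apart into frames, run the model on each frame against a source photo of the face you want pasted in, and stitch the results back into a video. The sketch below shows just that loop using imageio; `swap_face` is a stand-in for whatever entry point the motion-cosegmentation code actually provides (consult that repo’s README for the real interface), and the file names are made up.

```python
# Minimal sketch of the frame-in, frame-out plumbing for a face swap.
# NOTE: swap_face() is a placeholder for the model call provided by the
# motion-cosegmentation code, not its actual API; paths are hypothetical.
import imageio

def swap_face(frame, source_face):
    """Placeholder: run the co-part segmentation model to paste the
    source face onto the person detected in this frame."""
    raise NotImplementedError("wire this up to the actual model")

source_face = imageio.imread("advisor_source_photo.jpg")   # hypothetical path

reader = imageio.get_reader("godfather_clip.mp4")          # hypothetical path
fps = reader.get_meta_data()["fps"]

with imageio.get_writer("deepfake_result.mp4", fps=fps) as writer:
    for frame in reader:        # even a "short" clip is hundreds of frames
        writer.append_data(swap_face(frame, source_face))
```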
What does my advisor’s face look like?
In pre-COVID times, one would have simply ambushed him with a camera and sprinted off before he realized what happened. Confined to my home, I was forced to rely on the collective memory of the research group: in other words, this very blog.
I quickly discovered that the meek Dr. Males was camera-shy. How else does one explain his tendency to shrink into the backs of group photos? Or to grace us with only a partial mug? It’s almost as if he doesn’t even want a deep-fake model trained on his appearance! Nevertheless, I found a handful of suitable photos among the thousands, and I moved on to the next question:
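If you’re wondering how one sifts through years of blog photos without going blind, a face detector does most of the drudgery. Here is a rough sketch of the sort of script I mean, using the off-the-shelf face_recognition package; the directory names are hypothetical, and it only finds faces, so deciding which crops are actually Jared remains a manual (and morally dubious) step.

```python
# Rough sketch: scan a folder of downloaded blog photos and crop out any
# faces found, to be sorted by hand afterwards. Paths are hypothetical.
from pathlib import Path

import face_recognition
import imageio

photo_dir = Path("blog_photos")      # hypothetical: years of group photos
out_dir = Path("face_crops")
out_dir.mkdir(exist_ok=True)

for path in sorted(photo_dir.glob("*.jpg")):
    image = face_recognition.load_image_file(path)
    # Each detection is (top, right, bottom, left) in pixel coordinates.
    for i, (top, right, bottom, left) in enumerate(
            face_recognition.face_locations(image)):
        crop = image[top:bottom, left:right]
        imageio.imwrite(out_dir / f"{path.stem}_face{i}.png", crop)
```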
Into which clip shall I face-swap my advisor?
After discounting Top Gun (for a lack of suitable pithy quote clips on YouTube), I eventually settled on this one:
“You look terrible. I want you to eat. I want you to rest well.”
Don Corleone
Who wouldn’t want to hear that from their advisor? (Maybe we don’t want to hear the first part, but let’s not lie to ourselves.)
Source material in hand, I fired up the deepfake machine, and…
Yikes. Undaunted, I continued my analysis of the archival image data.
It turned out that Jaredification performed better when the Jared used was clean-shaven, limiting us to vintage blog photography. I found what I was looking for in this post from 2012 and gave it another go.
“It’s too late; they start shooting in a week…”
“I’m gonna make him an offer he can’t refuse.”
Ultimately, I wouldn’t say this was an unqualified success (except in that I’m “unqualified” to do deep learning on videos). There didn’t seem to be any rhyme or reason to which photos segmented well and which did not, and I couldn’t acquire additional data without tipping off the subject to what I was doing.
Further investigation is needed, promising directions have been identified, funding priorities elucidated, etc. Until then, it helps if you just kind of squint at it.