With the support of Google’s Cyber NYC Institutional Research Program (IRP), a team of NYU Tandon School of Engineering researchers is developing an interactive system that issues intelligent challenges to differentiate real from deepfake audio and video during live calls.
The researchers have already demonstrated an approach that successfully addresses reliability problems with conventional forensic analysis in complex distribution channels. Their approach, which performs end-to-end, joint optimization of a forensic analysis network and a neural imaging pipeline, achieved significant improvements in photo manipulation detection, raising accuracy from approximately 45 percent to over 90 percent.
The system uses what is called a “challenge-response” approach, which aims to “arm people with tools to avoid scams and other duplicitous acts.”
Chinmay Hegde, an associate professor in NYU Tandon’s Computer Science and Engineering and Electrical and Computer Engineering departments, and several of his colleagues have published two papers that introduce and validate new techniques for real-time detection of deepfake audio and video.
The NYU Tandon School of Engineering project, called Real-Time Deepfake Detection: Interactive, Multimodal, and Future-Proof, is one of several supported by Google’s Cyber NYC Institutional Research Program, which in June 2023 allocated $12 million to fund cybersecurity training, education, and cutting-edge research at NYU, City University of New York, Columbia University, and Cornell Tech.
The two papers published by Hegde and his colleagues are “GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response” and “PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response.”
In the first paper, Hegde and his colleagues said that AI-enabled Real-Time Deepfakes (RTDFs) “have now made it feasible to replace an imposter’s face with their victim in live video interactions,” and that “such advancement in deepfakes also coaxes detection to rise to the same standard.”
The researchers said that “existing deepfake detection techniques are asynchronous and hence ill-suited for RTDFs,” and that “to bridge this gap, we propose a challenge-response approach that establishes authenticity in live settings. We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines.”
The team of researchers explained that they evaluated “representative examples from the taxonomy by collecting a unique dataset comprising eight challenges, which consistently and visibly degrades the quality of state-of-the-art deepfake generators. These results are corroborated both by humans and a new automated scoring function, leading to 88.6 percent and 80.1 percent AUC, respectively.”
They said their “findings underscore the promising potential of challenge-response systems for explainable and scalable real-time deepfake detection in practical scenarios.”
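The challenge-response flow described above can be sketched as a simple verification loop. This is an illustrative sketch only: the challenge names, the pluggable scoring function, and the pass threshold below are hypothetical placeholders, not the paper’s actual taxonomy or automated scorer.

```python
import random

# Illustrative challenge set. The paper evaluates eight challenges from
# its taxonomy; these names are placeholders, not the actual ones.
CHALLENGES = [
    "turn_head",
    "occlude_face_with_hand",
    "press_finger_to_cheek",
    "remove_glasses",
]

def verify_live_caller(capture_response, score_response,
                       threshold=0.5, n_challenges=3, rng=random):
    """Issue randomly chosen challenges during a live call.

    capture_response(challenge) -> recorded frames of the caller's attempt.
    score_response(frames, challenge) -> fidelity in [0, 1]; low scores
        mean the response looks visibly degraded (a deepfake pipeline
        struggling to render the challenge in real time).
    Returns True if the average fidelity clears the threshold.
    """
    chosen = rng.sample(CHALLENGES, n_challenges)
    scores = [score_response(capture_response(c), c) for c in chosen]
    return sum(scores) / len(scores) >= threshold
```

In a real deployment, `score_response` would be a trained network like the paper’s automated scoring function, and challenges would be drawn to target the specific weaknesses of real-time deepfake generators.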
In a separate study of the use of RTDFs in social engineering attacks that use real-time voice impersonation to bypass conventional enrollment-based authentication, the researchers proposed what they call PITCH, which they said is “a robust challenge-response method to detect and tag interactive deepfake audio calls.”
The team said they “developed a comprehensive taxonomy of audio challenges based on the human auditory system, linguistics, and environmental factors, yielding 20 prospective challenges. These were tested against leading voice-cloning systems using a novel dataset comprising 18,600 original and 1.6 million deepfake samples from 100 users. PITCH’s prospective challenges enhanced machine detection capabilities to 88.7 percent AUROC score on the full unbalanced dataset, enabling us to shortlist ten functional challenges that balance security and usability.”
“For human evaluation and subsequent analyses, we filtered a challenging, balanced subset,” Hegde and his team said. “On this subset, human evaluators independently scored 72.6 percent accuracy, while machines achieved 87.7 percent. Acknowledging that call environments require higher human control, we aided call receivers in making decisions with them using machines. Our solution uses an early warning system to tag suspicious incoming calls as ‘Deepfake-likely.’”
The researchers reported that “integrating human intuition with machine precision offers complementary advantages. Our solution gave users maximum control and boosted detection accuracy to 84.5 percent. Evidenced by this jump in accuracy, PITCH demonstrated the potential for AI-assisted pre-screening in call verification processes, offering an adaptable and usable approach to combat real-time voice-cloning attacks.”
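The AI-assisted tagging workflow the team describes, in which a machine pre-screens the call and the receiver keeps final control, might look roughly like the following. The threshold value and the decision logic are illustrative assumptions; only the “Deepfake-likely” tag comes from the source.

```python
def tag_incoming_call(machine_score, warn_threshold=0.5):
    """Early-warning tagging in the spirit of PITCH: a detector scores
    the incoming call and attaches an advisory label.

    machine_score is a hypothetical detector output in [0, 1], where
    higher means more likely synthetic; the threshold is illustrative.
    """
    return "Deepfake-likely" if machine_score >= warn_threshold else "No warning"

def receiver_decision(machine_score, human_judgment):
    """Combine the machine tag with the human's own call.

    human_judgment(tag) -> bool: the receiver decides whether to trust
    the call, informed (but not overridden) by the advisory tag.
    """
    tag = tag_incoming_call(machine_score)
    return {"tag": tag, "accept_call": human_judgment(tag)}
```

The design choice mirrors the study’s finding: the machine supplies precision, the human retains intuition and control, and the combination outperforms either alone.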
Earlier this year, another NYU Tandon study exposed the failures of existing measures to prevent illegal content generation by text-to-image AI models. In a paper presented at the Twelfth International Conference on Learning Representations in Vienna in May, “Circumventing Concept Erasure Methods For Text-To-Image Generative Models,” the research team demonstrated how techniques that claim to “erase” the ability of models like Stable Diffusion to generate explicit, copyrighted, or otherwise unsafe visual content can be circumvented through simple attacks.
“Text-to-image models have taken the world by storm with their ability to create virtually any visual scene from just textual descriptions,” Hegde said. “But that opens the door to people making and distributing photo-realistic images that may be deeply manipulative, offensive and even illegal, including celebrity deepfakes or images that violate copyrights.”
In another study, published in the proceedings of the IEEE International Joint Conference on Biometrics, Hegde and his colleagues demonstrated an AI technique they developed that can change a person’s apparent age in images while maintaining the subject’s unique identifying features.
The NYU Center for Cybersecurity at NYU Tandon determined the allocation of funds for NYU’s faculty-led research projects.
In its first year, NYU Tandon unveiled a slate of projects under the Google Cyber NYC IRP research umbrella that are intended to help safeguard people from deepfakes and to build trust in the novel innovations that power industries and shape people’s lives.
“Our vision with our Google Cyber NYC IRP research roster is to foster a dynamic ecosystem that balances ambitious, long-term explorations with targeted, practical studies,” said Senior Vice Dean of NYU Tandon Eray Aydil. “We actively encouraged interdisciplinary collaborations across departments, universities, and research organizations to leverage diverse expertise in tackling complex challenges from multiple angles.”
Google Cyber NYC IRP’s target research areas this year are Trusted Computing, Trustworthy AI, AI for Cybersecurity Defense, and Human and Social Sciences.
In year two, Google Cyber NYC IRP aims to continue its focus on collaborative research in privacy, security, and safety. NYU’s Center for Cybersecurity issued another school-wide call for proposals, inviting faculty to submit their research ideas.