NLP Newsletter: Detecting AI-Generated Text, Text-to-4D, ML Papers Explained, MusicLM,...


Feb 01, 2023

Hi all. Welcome back to our regular NLP Newsletter issue. I am excited to relaunch the newsletter to keep you informed on the latest in NLP and ML.

Detecting AI-Generated Text

Given how popular text-based generative applications have gotten over the past few months, several tools and frameworks have emerged that help detect content generated by language models. A few recent efforts include:

WaterMarking of LLMs

A human-written passage of this length would be expected to contain about 9 “green” tokens, but the watermarked text contains 28. A count that high is extremely unlikely for human-written text, so the passage is almost certainly machine-generated. Source: Kirchenbauer et al. 2023

Kirchenbauer et al. recently proposed a new watermarking framework for proprietary language models. The watermark needs to be algorithmically detectable, but it is just as important that it doesn’t degrade the text quality.

Ideally, the watermark can be detected algorithmically without access to the LLM API, leading to the possibility of open-sourcing the detection algorithm and reducing costs that could emerge from loading or running the models.

The basic idea of the paper is to use whitelisting and blacklisting to restrict the model's next-token output. The watermark is then detected by counting the whitelist tokens (the tokens the LLM is allowed to use, shown as “green” tokens in the caption above): watermarked text contains far more of them than chance alone would produce. This is an effective approach that works well along with other design tricks, even for short text.
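To make the counting step concrete, here is a minimal sketch of how a detector could work. It is not the authors' implementation: the seeding scheme (hashing the previous token id), the whitelist fraction GAMMA = 0.25, and all function names are assumptions made for illustration. The test itself is a standard one-proportion z-test on the number of whitelist tokens.

```python
import hashlib
import math
import random
from typing import List, Set

GAMMA = 0.25  # assumed share of the vocabulary on the whitelist ("green" list)


def green_list(prev_token: int, vocab_size: int, gamma: float = GAMMA) -> Set[int]:
    """Return the whitelist for the position that follows prev_token."""
    # Hypothetical seeding scheme: hash the previous token id into an RNG seed.
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def count_green(tokens: List[int], vocab_size: int, gamma: float = GAMMA) -> int:
    """Count tokens that fall on the whitelist seeded by their predecessor."""
    return sum(
        tok in green_list(prev, vocab_size, gamma)
        for prev, tok in zip(tokens, tokens[1:])
    )


def z_score(green: int, scored: int, gamma: float = GAMMA) -> float:
    """One-proportion z-statistic against the 'no watermark' null hypothesis."""
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def p_value(z: float) -> float:
    """One-sided tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Note that detection only needs the token ids and the seeding rule, not the model itself. As a rough check against the caption above: assuming 36 scored tokens and GAMMA = 0.25 (so that 9 whitelist tokens are expected by chance), z_score(28, 36) comes out a little above 7, which corresponds to a vanishingly small p-value.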

Tom Goldstein @tomgoldsteincs

#OpenAI is planning to stop #ChatGPT users from making social media bots and cheating on homework by "watermarking" outputs. How well could this really work? Here's just 23 words from a 1.3B parameter watermarked LLM. We detected it with 99.999999999994% confidence. Here's how 🧵

DetectGPT

The proposed system compares the log probability under p of the original sample x with the log probabilities of perturbed versions of x obtained from a pre-trained model. Source: Mitchell et al. 2023

DetectGPT is an approach (by Mitchell et al.) for zero-shot machine-generated text detection. Unlike other methods that require training a classifier or watermarking the generated text, this work uses raw log probabilities from the LLM to determine if the passage was sampled from it.

As demonstrated in the diagram above, DetectGPT compares the log probability under p of the original sample x with the log probabilities of perturbations of x obtained from a pre-trained model like T5. If the perturbed versions score noticeably lower on average than the original, the passage was likely sampled from the model.
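The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: `log_prob` and `perturb` are hypothetical callables standing in for the scoring LLM and for a mask-filling model such as T5, and the toy versions below exist purely so the example runs without any model.

```python
import random
import statistics
from typing import Callable


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    n_perturbations: int = 20,
) -> float:
    """Log probability of the text minus the mean log probability of its perturbations."""
    perturbed_scores = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return log_prob(text) - statistics.mean(perturbed_scores)


# Toy stand-ins (assumptions, not real models): a unigram scorer and a
# perturbation that overwrites one random word with a rare word.
TOY_LOGP = {"the": -1.0, "cat": -2.0, "sat": -2.0}


def toy_log_prob(text: str) -> float:
    return sum(TOY_LOGP.get(word, -8.0) for word in text.split())


def make_toy_perturb(seed: int = 0) -> Callable[[str], str]:
    rng = random.Random(seed)

    def toy_perturb(text: str) -> str:
        words = text.split()
        words[rng.randrange(len(words))] = "zebra"
        return " ".join(words)

    return toy_perturb


score = perturbation_discrepancy("the cat sat", toy_log_prob, make_toy_perturb())
print(f"perturbation discrepancy: {score:.2f}")
```

A large positive discrepancy suggests the passage sits near a local maximum of the model's log probability, which is the signal DetectGPT looks for. In practice the score would be compared against a threshold chosen on held-out data.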

DetectGPT improved the detection of fake news articles generated by a 20B parameter GPT-NeoX from 0.81 AUROC (strongest zero-shot baseline) to 0.95 AUROC.

GPTZero

GPTZero is a platform that helps detect AI plagiarism. The system is based on properties like perplexity and burstiness of text.
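GPTZero's exact formulas are not described here, so the following is only a sketch of how these two properties are commonly computed. Perplexity is the exponential of the average negative log probability per token. Treating burstiness as the spread of per-sentence perplexity is an assumption made for illustration, as are the function names.

```python
import math
import statistics
from typing import List


def perplexity(token_log_probs: List[float]) -> float:
    """Exponential of the mean negative log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def burstiness(sentence_log_probs: List[List[float]]) -> float:
    """Assumed definition: standard deviation of per-sentence perplexity."""
    return statistics.pstdev(perplexity(s) for s in sentence_log_probs)
```

The intuition is that human writing tends to mix predictable and surprising sentences, while model output tends to be more uniformly predictable, so low perplexity combined with low burstiness points toward machine-generated text.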

AI Text Classifier by OpenAI

More recently, OpenAI also released a new tool to distinguish between AI-written and human-written text. Try the classifier here.

All of these techniques come with disadvantages. For instance, DetectGPT relies on the outputs of an LLM that may not be representative of the model that actually produced the text. Watermarking requires a strong algorithm that’s robust to potential attacks (e.g., text insertion, generative attacks, etc.) that aim to avoid detection.

We will keep a close eye on related developments and report progress as this becomes an important consideration when developing on top of LLMs.

This issue is brought to you by Monster API. I recently tried Monster API, a new platform based on decentralized computing offering generative AI models as a service. A word from them:

If you are building in the Generative AI space, you can relate to the pain of accessing cutting-edge ML models.

They are super expensive and often only available through centralized clouds like AWS or Google Cloud. But not anymore: Monster API gives you access to top-notch models, like DreamBooth, Stable Diffusion, and ChatGPT alternatives, through easy-to-use and scalable APIs powered by the disruptive force of decentralized computing. And for a limited time, we're giving you the chance to experience this game-changer for yourself with 5000 free API calls for the first 100 users.

This is the future of cloud computing, and it's available today, at a fraction of the cost. Unleash the full potential of your projects and change the world with Monster API.

Text-to-4D

Samples generated by Make-A-Video3D. Source: Singer et al. 2023

A new model called Make-A-Video3D (by Meta AI) is trained to generate 3D dynamic scenes from input text descriptions. This follows previous efforts such as Make-A-Scene and Make-A-Video.

The approach incorporates a 4D dynamic Neural Radiance Field (NeRF), optimized for scene appearance, density, and motion consistency by querying a Text-to-Video diffusion model. No 4D or 3D data is required. The Text-to-Video model is trained only on text-image pairs and unlabeled videos. Find interactive examples here.

Super-resolution fine-tuning is also used to improve the resolution of the model's output. The authors claim that this is the first AI system to generate 3D dynamic scenes given a text description.

Devi Parikh @deviparikh

Introducing Make-A-Video3D! Generating 3D dynamic (mini) scenes from input text. That is, text --> 4D! Needs no 4D data (i.e., no dynamic 3D data), no static 3D data, no paired text-video data. Paper: arxiv.org/abs/2301.11280 Website: make-a-video3d.github.io

Here is a nice Twitter thread by Jim Fan on some of the recent milestones in generative AI:

Jim Fan @DrJimFan

Generative AI is climbing the *Dimensional Ladder*. I made a figure to show the milestones! Text => 1D: MusicLM, VALL-E 2D: Stable Diffusion, DALL-E, MidJourney 3D (or 2+1D): Imagen-video, Phenaki 3D: Magic3D, DreamFusion, Point-E 4D (or 3+1D): Make-A-Video-3D What’s next? 🤔

MusicLM

Proposed approach architecture overview. Source: Agostinelli et al. 2023

Google Research introduces a new model, MusicLM, for generating high-fidelity music (24 kHz) from text descriptions. The system can be conditioned on both text and melody. The model can generate coherent music up to 5 minutes long.

They also release a new evaluation dataset, MusicCaps, consisting of 5.5k high-quality music captions written by musicians.

The field is moving so fast that there is already another method that performs text-to-music generation, this time with long-context latent diffusion. This approach, called Moûsai, can generate high-quality stereo music at 48 kHz from textual descriptions. Here is the open-source PyTorch-based library and samples to explore.

Audio generation continues to get better, but approaches are not as developed as in other areas like image and text generation. This repository contains a nice list of some of the latest AI models for audio generation.

Notable Mentions

This section includes notable mentions of other trending ML resources and papers.

Prompt Engineering Guide
Top ML Papers of the Week - every week we will publish a recap of the top trending ML papers. You can also keep track via Twitter or LinkedIn.
Open problems in applied deep learning
Google AI research recap for 2022
A new foundation model, ClimaX, for weather and climate
InstructPix2Pix is a method for editing images from human instructions
Watch out for prompt injections!
ML Papers Explained
Can you replace the backend with an LLM?
LeCun argues that ChatGPT is “not particularly innovative”

If you are interested in sponsoring a future newsletter issue, reach out at ellfae@gmail.com or Twitter.