Image Forensics

Eight client-side instruments for reading image provenance — from EXIF metadata to frequency analysis, edge coherence, colour distribution, and texture statistics.

How do you tell if an image is AI-generated? For the human eye, this is not a trivial problem, and every new image model makes it harder.

For a computer, though, measuring the statistical regularities of an image is relatively easy.

Rather than teaching "how to tell if an image is AI" -- a losing battle, because the techniques need updating with every new model -- it is more useful to learn the principles of image forensics.

To understand image forensics, we need to understand how a computer sees an image.

All analysis runs in your browser. No image data is uploaded or transmitted.

How a computer sees an image

To a computer, an image is not what it looks like to you -- it is a grid of numbers.

A 1920×1080 photo is 2,073,600 pixels, each storing three integers: Red, Green, Blue (0-255).

The computer has no concept of "this looks like a face" or "this seems real." It only has those numbers. Every analysis on this page is a mathematical operation on that grid.
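As a minimal sketch of the "grid of numbers" idea, the snippet below builds a tiny image as nested lists of RGB triples and collapses it to a single brightness channel. The helper names and the Rec. 601 brightness weights are assumptions made for illustration; the page does not say how the tool converts colour to brightness.

```python
# A tiny 3x2 "image": two rows of (R, G, B) tuples, each channel 0-255.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]


def pixel_count(width, height):
    """Number of pixels in a width x height grid."""
    return width * height


def to_grey(img):
    """Collapse each RGB triple to one brightness value.

    Uses Rec. 601 luma weights, which is an assumption for this sketch.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in img
    ]


print(pixel_count(1920, 1080))  # pixels in a 1920x1080 photo
print(to_grey(image))           # one brightness number per pixel
```

The later sketches on this page operate on a brightness grid of this shape: a list of rows, each row a list of numbers.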

Example: Camera

A physical camera sensor does not record perfectly. Light arrives as individual photons (discrete particles), which introduces unavoidable randomness.

Every sensor pixel responds slightly differently because of manufacturing tolerances, and the readout amplifier adds electronic noise. These imperfections give camera photographs a distinctive noise texture that classifiers can pick up. That noise is:

Spatially random (different every exposure)
Statistically predictable in aggregate (follows Poisson and Gaussian distributions)
Unique per camera model, even per individual unit (PRNU)
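The "statistically predictable in aggregate" point can be illustrated with a small simulation. This is a sketch under stated assumptions: photon arrival is modelled as a Poisson draw (Knuth's sampling method), read noise as a Gaussian, and the function names and parameter values are hypothetical. For such a model the variance of a uniformly lit patch should land near the mean photon count plus the squared read-noise deviation.

```python
import math
import random
from statistics import mean, pvariance


def poisson(mean_value, rng):
    """Draw one Poisson sample with Knuth's method (fine for modest means)."""
    limit = math.exp(-mean_value)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_flat_patch(mean_photons, read_noise_sd, n, seed=0):
    """Simulate n sensor pixels looking at a uniformly lit surface."""
    rng = random.Random(seed)
    return [
        poisson(mean_photons, rng) + rng.gauss(0.0, read_noise_sd)
        for _ in range(n)
    ]


samples = simulate_flat_patch(mean_photons=50, read_noise_sd=2.0, n=5000)
# Shot noise contributes variance ~50, read noise ~4, so expect roughly 54.
print(round(mean(samples), 2), round(pvariance(samples), 2))
```

Each run with a different seed gives a different grid (spatially random), while the mean and variance stay close to the same values (predictable in aggregate).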

What makes an AI image statistically distinct:

Most AI-generated images are produced by a diffusion model.

A diffusion model works by starting with pure random noise and progressively denoising it toward a target description.

Unlike cameras, there is no sensor. There is no physical light path. The output pixel values come from learned weights, not photons. This produces images that are:

Artificially smooth at the finest scale (the model learned to remove high-frequency noise from training images)
Regular at coarser scales (the denoising process operates in frequency bands)
Constrained by training distribution (if the training set skewed toward certain lighting, colour palettes, or edge styles, the output inherits those biases)

The instruments on this page measure precisely these statistical differences.

We don't ask "does this look real?" We ask "do these numbers behave like photons hitting a sensor, or like a learned probability distribution?"

What each instrument reads

Instrument	What the computer measures	Real photograph	AI / screenshot
Metadata	EXIF/XMP tags embedded by the camera at capture	Make, Model, ISO, shutter speed, GPS, all written by the camera firmware at capture	Empty or software-only fields. The absence of camera EXIF is the single strongest provenance signal, though stripped or screenshotted files also lack it.
Noise Profile	High-frequency residual after subtracting a blurred copy	Fine grain from photon shot noise and thermal noise — histogram spread evenly	Near-zero residual — the denoiser erased fine variation. Histogram spikes near zero.
Block Variance	Local contrast in 8×8 pixel blocks, then coefficient of variation (CV)	Uneven — sky blocks near zero variance, textured surfaces high. CV > 1.2 typical	Uniform complexity — AI fills every region with "moderate" detail. CV 0.4–0.8 typical.
FDA — Frequency	Variance at each downsampling level (100%, 50%, 25%, 12.5%)	Steep drop — fine grain disappears rapidly when averaged down	Flatter curve — artificial detail added at generation time persists across scales
EC — Edges	Sobel gradient magnitude and orientation histogram	Edges from physical scene geometry — high entropy, all directions roughly equal	Edges from learned priors — may cluster in characteristic directions for a generation style
CCC — Colour	How many of 192 HSV colour buckets are occupied; saturation distribution	Wide gamut — natural scenes span many hues and saturations	Constrained palette — models trained on filtered datasets inherit colour biases
GLCM — Texture	How often each brightness-level pair appears side by side (co-occurrence matrix)	Diagonal-biased matrix — smooth gradients dominate; high homogeneity	Pattern depends on generation model and content — each architecture has a characteristic texture fingerprint
Feature Profile	All of the above as raw numbers, grouped for side-by-side comparison	—	—

How AI images produce statistical tells

Understanding why the instruments find anything requires understanding what goes wrong during generation:

The denoising smoothness problem. Diffusion models are trained to remove noise. They do this so effectively that the output has less high-frequency variation than any real photograph. The noise tab measures this directly: an AI image's residual histogram spikes near zero because the model has erased the photon-level randomness that a camera always produces.
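A minimal sketch of the residual measurement described above, assuming a 3×3 box blur and a brightness grid stored as a list of rows. The blur radius, edge handling, and function names are assumptions for illustration, not the tool's actual settings.

```python
import random


def box_blur(grey, radius=1):
    """Average each pixel with its neighbours inside a square window."""
    h, w = len(grey), len(grey[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += grey[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def noise_residual(grey, radius=1):
    """High-pass filter: original minus its blurred copy."""
    blurred = box_blur(grey, radius)
    return [[g - b for g, b in zip(grow, brow)]
            for grow, brow in zip(grey, blurred)]


def mean_abs_residual(grey, radius=1):
    """Average magnitude of the residual over the whole image."""
    values = [abs(v) for row in noise_residual(grey, radius) for v in row]
    return sum(values) / len(values)


rng = random.Random(1)
flat = [[100] * 16 for _ in range(16)]                     # perfectly smooth
grainy = [[100 + rng.gauss(0, 10) for _ in range(16)] for _ in range(16)]
print(mean_abs_residual(flat), mean_abs_residual(grainy))
```

The perfectly smooth patch leaves a residual of zero, while the grainy patch leaves a clearly non-zero one; a histogram of the residual values would show the same contrast as a spike versus a spread.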

The frequency scale problem. A real photograph has strong texture variation at the original scale that falls off naturally as you zoom out (averaging removes detail). AI images add detail synthetically at every scale the model operates on, which can produce an unusually flat or even increasing variance curve across scales. The FDA tab shows this by measuring variance at four zoom levels.
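The four-level measurement can be sketched as repeated 2×2 averaging with a variance reading at each level. This assumes downsampling by block averaging and whole-image variance; the tool may resample differently, and the function names are hypothetical.

```python
import random
from statistics import pvariance


def halve(grey):
    """Downsample by averaging each 2x2 block into one pixel."""
    h = len(grey) // 2 * 2
    w = len(grey[0]) // 2 * 2
    return [
        [(grey[y][x] + grey[y][x + 1] + grey[y + 1][x] + grey[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def variance_by_scale(grey, levels=4):
    """Brightness variance at 100%, 50%, 25%, 12.5% (for levels=4)."""
    out = []
    current = grey
    for _ in range(levels):
        out.append(pvariance([v for row in current for v in row]))
        current = halve(current)
    return out


rng = random.Random(0)
grain = [[rng.gauss(128, 10) for _ in range(64)] for _ in range(64)]
ramp = [[4 * x for x in range(64)] for _ in range(64)]  # smooth gradient only

print([round(v, 1) for v in variance_by_scale(grain)])  # falls steeply
print([round(v, 1) for v in variance_by_scale(ramp)])   # stays nearly flat
```

Pure pixel-level grain loses roughly three quarters of its variance with every halving, because averaging four independent values divides the variance by four. Large-scale structure such as the ramp barely changes, which is why the shape of the curve, not any single value, carries the information.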

The edge coherence problem. Real edges come from physical scene geometry: an object boundary, a cast shadow, a specular highlight. Across varied scene content these produce edges in all directions roughly equally. AI models learn edge patterns from training data and may overrepresent certain orientations (smooth skin transitions in portrait models; right-angle architecture in photorealism models). The EC tab measures orientation entropy: an even, near-circular radar indicates a natural distribution; a spiked radar indicates orientation bias.
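Orientation entropy can be sketched as follows: run a Sobel filter, bin each gradient direction into 8 compass sectors, and take the Shannon entropy of the histogram divided by its maximum. Weighting each vote by gradient magnitude and centring the bins on the compass directions are assumptions of this sketch, as are the function names.

```python
import math
import random


def orientation_entropy(grey, bins=8):
    """Normalised entropy (0..1) of Sobel gradient directions."""
    h, w = len(grey), len(grey[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sobel kernels: horizontal and vertical brightness change.
            gx = (grey[y - 1][x + 1] + 2 * grey[y][x + 1] + grey[y + 1][x + 1]
                  - grey[y - 1][x - 1] - 2 * grey[y][x - 1] - grey[y + 1][x - 1])
            gy = (grey[y + 1][x - 1] + 2 * grey[y + 1][x] + grey[y + 1][x + 1]
                  - grey[y - 1][x - 1] - 2 * grey[y - 1][x] - grey[y - 1][x + 1])
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # no edge here, no vote
            angle = math.atan2(gy, gx) % (2 * math.pi)
            index = int(angle / (2 * math.pi) * bins + 0.5) % bins
            hist[index] += magnitude  # assumption: magnitude-weighted vote
    total = sum(hist)
    if total == 0:
        return 0.0
    probs = [v / total for v in hist if v > 0]
    return sum(-p * math.log(p) for p in probs) / math.log(bins)


step = [[0 if x < 5 else 200 for x in range(10)] for _ in range(10)]
rng = random.Random(2)
noise = [[rng.randrange(256) for _ in range(24)] for _ in range(24)]
print(orientation_entropy(step), round(orientation_entropy(noise), 3))
```

A single vertical edge puts every vote in one sector and scores 0, the fully "spiked" case. Unstructured noise spreads votes across all sectors and scores close to 1.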

The colour distribution problem. Real-world photography spans a wide range of illumination conditions, colour temperatures, and scene types. AI models trained on curated internet images inherit their training set's colour biases — often slightly over-saturated, with certain hue clusters overrepresented. The CCC tab measures how many of 192 HSV cells are occupied; a narrow cluster suggests a constrained generation palette.
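Counting occupied HSV cells might look like the sketch below. The page states 192 cells but not how they are divided, so the 12 hue × 4 saturation × 4 value split used here is an assumption, and the function name is hypothetical. `colorsys.rgb_to_hsv` is part of the Python standard library and expects channels scaled to 0..1.

```python
import colorsys

# Assumed split: 12 hue x 4 saturation x 4 value = 192 cells.
H_BINS, S_BINS, V_BINS = 12, 4, 4


def occupied_hsv_cells(pixels):
    """Count distinct HSV cells hit by an iterable of (R, G, B) pixels (0-255)."""
    cells = set()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # min(...) keeps the top edge (value exactly 1.0) inside the last bin.
        cell = (
            min(int(h * H_BINS), H_BINS - 1),
            min(int(s * S_BINS), S_BINS - 1),
            min(int(v * V_BINS), V_BINS - 1),
        )
        cells.add(cell)
    return len(cells)


narrow = [(200, 30, 30), (210, 35, 28), (190, 25, 33)]   # similar reds
wide = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (128, 128, 128)]
print(occupied_hsv_cells(narrow), occupied_hsv_cells(wide))
```

A palette of near-identical reds falls into very few cells, while pixels spread across hue, saturation, and brightness occupy many.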

The texture regularity problem. The GLCM co-occurrence matrix captures how brightness levels relate to their immediate neighbors. Physical sensors produce certain characteristic transition patterns (photon noise creates predictable local statistics). AI generators produce transition patterns shaped by their architecture — convolution kernels, attention windows, and upsampling methods all leave fingerprints in the co-occurrence statistics.
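A minimal co-occurrence sketch, assuming brightness is quantised to 16 levels and only horizontal neighbours are paired, with the matrix made symmetric and normalised to sum to 1. The neighbour direction, the symmetry, and the function names are assumptions; the four summary statistics follow their usual textbook definitions.

```python
import math


def glcm(grey, levels=16):
    """Normalised, symmetric co-occurrence matrix of horizontal neighbours."""
    matrix = [[0.0] * levels for _ in range(levels)]
    for row in grey:
        quantised = [min(int(v) * levels // 256, levels - 1) for v in row]
        for a, b in zip(quantised, quantised[1:]):
            matrix[a][b] += 1
            matrix[b][a] += 1  # count the pair in both directions
    total = sum(map(sum, matrix))
    if total == 0:
        return matrix
    return [[c / total for c in r] for r in matrix]


def haralick(p):
    """Energy, contrast, homogeneity, and entropy of a normalised GLCM."""
    cells = [(i, j, c) for i, r in enumerate(p) for j, c in enumerate(r) if c > 0]
    return {
        "energy": sum(c * c for _, _, c in cells),
        "contrast": sum((i - j) ** 2 * c for i, j, c in cells),
        "homogeneity": sum(c / (1 + abs(i - j)) for i, j, c in cells),
        "entropy": sum(-c * math.log2(c) for _, _, c in cells),
    }


flat = [[100] * 8 for _ in range(8)]
checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
print(haralick(glcm(flat)))
print(haralick(glcm(checker)))
```

A flat patch puts all weight in one diagonal cell (energy 1, contrast 0), while a black-and-white checkerboard puts all weight far off the diagonal and scores the maximum possible contrast for 16 levels.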

The fundamental limit. Every model generation corrects for previously discovered tells. Midjourney v6, DALL-E 3, and SD3+ were each trained with adversarial feedback from detectors. The noise and frequency artifacts of early diffusion models are largely gone in current models. What remains reliable is metadata (written by the capture device and editing software, not by the model) and, to some extent, very fine-grained texture statistics that are difficult to eliminate adversarially without reducing image quality.

Limitations

Limitation	What it means
Preset "real" images are screenshots	Labs thumbnails are browser-rendered UI exports — no sensor noise, no camera EXIF. They show digital rendering vs AI photorealism, not camera vs AI. Upload a smartphone photo for the strongest signal across all instruments.
Modern diffusion models defeat pixel analysis	Adversarial training against detectors has eliminated most frequency and noise artifacts. Texture statistics (GLCM) and colour distribution are the remaining discriminatory signals — but they require training data to interpret reliably.
JPEG recompression changes noise	Each re-save (Twitter, Slack, CMS upload) accumulates quantization artifacts that mimic sensor grain. The noise residual reflects compression history, not sensor origin. Lossless PNG gives cleaner signal.
Digital art is not AI	Human-made digital illustrations have no sensor noise and may have constrained colour palettes — the same statistical profile as AI images on several instruments. These tools cannot distinguish AI from human digital art.
No trained classifier means no verdict	The feature measurements are real. Mapping them to "AI vs real" requires a trained classifier calibrated to specific model families and image types. Without that, the numbers describe the image — they do not classify it.

What should I look for?

There is no threshold to cross. Anyone who tells you "above a CV of X it's real" is overfitting to a specific image generation model that will be obsolete in six months.

Forensic principles:

Principle	What it means in practice
Converge across instruments	No single instrument is conclusive. Metadata, noise, variance, frequency, edge, and colour must agree. Three signals pointing the same direction is a composite case. One anomalous reading out of six is noise.
Establish provenance chains	The reliable question is not "does this look real?" but "can its origin be traced?" A smartphone photo has a chain: device → capture → file → metadata. An AI image has a gap where the sensor should be.
Work with the earliest copy	Every JPEG re-save changes the noise profile. Work from the original file, not a social media repost.
Never rely on visual inspection	AI images are designed to be visually indistinguishable. These instruments measure below the perceptual threshold. "It just looks off" is a different category from forensics.

Reading each tab:

Instrument	What to compare	Rough guide
Metadata	Camera fields present vs absent	Make/Model/ISO present = real camera provenance. Empty = screenshot, AI, or stripped. Both empty = metadata tells you nothing.
Noise	Histogram shape, not just mean	Tight spike near zero = AI or vector. Spread histogram = sensor grain. Mean < 3 = unusually smooth. Mean > 12 = real sensor.
Block Variance	CV number, not mean	CV > 1.2 = uneven complexity (sky + texture mix, real photo pattern). CV < 0.6 = uniform complexity (AI or flat digital render).
FDA	Shape of the variance drop	Steep monotonic drop = natural fine-to-coarse falloff. Flat or bumpy curve = synthetic detail added at multiple scales.
Edges	Radar shape + entropy value	Near-circular radar + entropy near 1.0 = edges in all directions (natural scene). Spiked radar = orientation bias (architectural or portrait-style model artifacts).
Colour	Number of occupied cells + palette shape	Many scattered cells = wide natural gamut. Clustered cells in narrow hue range = constrained generation palette.
GLCM	Contrast and entropy values	High contrast + high entropy = complex, varied texture. Low contrast + high energy = smooth, uniform generation output.
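The Block Variance number in the table above can be sketched as the variance inside each 8×8 block, followed by the coefficient of variation (standard deviation ÷ mean) of those block variances. Non-overlapping blocks, dropping partial blocks at the border, and the function names are assumptions of this sketch.

```python
from statistics import mean, pstdev, pvariance


def block_variances(grey, block=8):
    """Brightness variance inside each non-overlapping block x block tile."""
    h, w = len(grey), len(grey[0])
    out = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            values = [grey[y][x]
                      for y in range(by, by + block)
                      for x in range(bx, bx + block)]
            out.append(pvariance(values))
    return out


def coefficient_of_variation(values):
    """Standard deviation divided by mean; 0.0 if the mean is zero."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


# Every block equally textured: a checkerboard across the whole image.
uniform = [[100 * ((x + y) % 2) for x in range(16)] for y in range(16)]
# Left half flat ("sky"), right half textured.
mixed = [[100 * ((x + y) % 2) if x >= 8 else 50 for x in range(16)]
         for y in range(16)]

print(coefficient_of_variation(block_variances(uniform)))
print(coefficient_of_variation(block_variances(mixed)))
```

When every block has the same amount of texture the CV is 0; mixing flat and textured blocks raises it. As the text above stresses, the resulting number describes the image and is not a verdict on its own.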

Glossary

How images are stored

Term	Definition
Pixel	The smallest unit of an image — a single coloured dot. A 1920×1080 image has 2,073,600 pixels.
RGB	Red, Green, Blue — the three colour channels stored per pixel, each 0–255. The combination encodes any visible colour.
HSV	Hue (colour angle 0–360°), Saturation (colour purity 0–1), Value (brightness 0–1). A different way to encode the same colour that is more intuitive for colour analysis.
JPEG	A lossy image format. Compresses by discarding high-frequency detail in 8×8 blocks. Each re-save discards more detail and adds quantization artifacts.
EXIF	Exchangeable Image File Format — metadata written into JPEG/TIFF by the camera at capture. Contains sensor settings, lens focal length, GPS, timestamps. Absent from most AI-generated images.
XMP	Adobe's XML metadata format, embeddable in JPEG/PNG/PDF. Sometimes written by generation software. Less diagnostic than camera EXIF.
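Whether a JPEG carries EXIF or XMP at all can be checked at the byte level, since both live in APP1 marker segments near the start of the file. The sketch below only detects presence; reading individual fields such as Make or ISO would need a full TIFF/IFD parser, which is out of scope here. The function name is hypothetical, and the walker assumes every segment before the image data carries a length field, which holds for typical files.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"
XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"


def jpeg_metadata_kinds(data):
    """Return a set containing 'exif' and/or 'xmp' if such APP1 segments exist."""
    found = set()
    if data[:2] != b"\xff\xd8":          # not a JPEG (missing SOI marker)
        return found
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):       # start of scan or end of image
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:                   # malformed segment, stop walking
            break
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1:               # APP1 holds EXIF or XMP
            if payload.startswith(EXIF_HEADER):
                found.add("exif")
            elif payload.startswith(XMP_HEADER):
                found.add("xmp")
        pos += 2 + length
    return found


def make_segment(marker, payload):
    """Build one JPEG marker segment (used only for the demo below)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


with_exif = b"\xff\xd8" + make_segment(0xE1, EXIF_HEADER + b"II*\x00") + b"\xff\xd9"
bare = b"\xff\xd8\xff\xd9"
print(jpeg_metadata_kinds(with_exif), jpeg_metadata_kinds(bare))
```

An empty result matches the "Empty = screenshot, AI, or stripped" case in the reading guide: it says the metadata is missing, not why.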

How cameras work

Term	Definition
Shot noise	Random pixel variation from discrete photon capture. Individual photons arrive randomly — Poisson statistics — creating characteristic grain in real photographs. Cannot be faked cleanly.
PRNU	Photo Response Non-Uniformity — a unique noise fingerprint from manufacturing imperfections in the sensor. Every camera has one. Absent from AI images. Links a photograph to a specific physical device.
Sensor noise	The combined effect of shot noise, thermal noise, and read noise from the amplifier. Creates the grainy texture measured by the noise residual tab.

What the instruments compute

Term	Definition
High-pass filter	Any operation that removes smooth gradients and keeps fine detail. The noise tab uses box-blur subtraction: subtract a blurred copy from the original to isolate grain.
Noise residual	The result of the high-pass filter — what's left after smooth areas are removed. Camera photos have a spread residual; AI images spike near zero.
Block variance	Statistical variance (spread of pixel brightness) within a small region. High = complex texture. Low = smooth or uniform.
CV	Coefficient of variation — standard deviation ÷ mean. Measures relative spread. CV = 0: all blocks identical. CV = 1: spread equals mean. High CV = uneven scene complexity (real photos). Low CV = uniform complexity (AI).
FDA	Frequency Domain Analysis — here computed by measuring variance at multiple downsampled scales. Shows how texture energy decays with scale. Steep = natural. Flat = synthetic.
Spectral slope	The rate at which variance falls across scales. A negative slope = energy falls as expected. A shallow or positive slope = artificial detail at small scales.
Sobel filter	A convolution kernel that estimates the image gradient — how rapidly brightness changes across each pixel. High gradient = edge present. The EC tab runs this to find edges.
Orientation entropy	How evenly edge directions are distributed across 8 compass angles. Maximum (1.0) = edges point in all directions equally (natural scenes). Low = edges cluster in one direction (stylistic bias).
GLCM	Gray-Level Co-occurrence Matrix — records how often pairs of brightness levels appear side by side. A 16×16 matrix: cell (i, j) = frequency of level-i pixel adjacent to level-j pixel. Captures local texture structure.
Haralick features	Four summary statistics computed from the GLCM: Energy (texture uniformity), Contrast (local brightness variation), Homogeneity (similarity to smooth gradients), Entropy (texture complexity). Named after Robert Haralick (1973).
CCC	Colour Cluster Count — the number of distinct HSV colour buckets occupied by an image. High count = wide natural gamut. Low count = constrained or desaturated palette.
Quantization artifact	Structured error from JPEG compression. JPEG rounds DCT frequency coefficients in 8×8 blocks. Repeated saves accumulate these and can mimic sensor grain in the noise residual.

AI generation

Term	Definition
Diffusion model	The dominant AI image generation architecture (Midjourney, DALL-E, Stable Diffusion). Starts with random noise and iteratively denoises it guided by a text prompt. No physical sensor — no photons, no shot noise.
Latent space	The compressed internal representation a diffusion model works in. Generation happens in this lower-dimensional space, then decoded to pixels. Upsampling at decode can introduce characteristic frequency artifacts.
Adversarial training	Training a model to fool a detector while generating. Modern models were trained against the same statistical detectors used here, which is why pixel-level analysis has diminishing returns on current outputs.
ELA	Error Level Analysis — re-compresses at known JPEG quality, measures difference. Edited or composited regions show different error levels. Not implemented here but complementary to the noise residual.