Image Forensics
Eight client-side instruments for reading image provenance — from EXIF metadata to frequency analysis, edge coherence, colour distribution, and texture statistics.
How do you tell if an image is AI-generated? For the human eye, this is not a trivial problem, and every new image model makes it harder.
But for a computer, measuring the statistical regularities of an image is a relatively easy problem.
Rather than teaching "how to know if an image is AI", which is a losing battle because the techniques need constant updating, it is more important to learn about image forensics.
To understand image forensics, we need to understand how a computer sees an image.
All analysis runs in your browser. No image data is uploaded or transmitted.
How a computer sees an image
To a computer, an image is not what it looks like to you -- it is a grid of numbers.
A 1920×1080 photo is 2,073,600 pixels, each storing three integers: Red, Green, Blue (0-255).
The computer has no concept of "this looks like a face" or "this seems real." It only has those numbers. Every analysis on this page is a mathematical operation on that grid.
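To make the "grid of numbers" idea concrete, here is a minimal sketch of a 2×2 image held as nested lists of RGB tuples, plus a brightness conversion. The names `image`, `luminance`, and `to_gray` are illustrative, and the Rec. 601 luma weights are an assumption about how brightness is derived; the tool on this page may use a different formula.

```python
# A tiny 2x2 "image": each pixel is an (R, G, B) tuple of integers in 0..255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128)],
]

def luminance(pixel):
    """Approximate brightness of one pixel using Rec. 601 luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def to_gray(img):
    """Convert an RGB grid into a grid of brightness values."""
    return [[luminance(p) for p in row] for row in img]

gray = to_gray(image)
pixel_count = len(image) * len(image[0])
```

Most of the instruments described later work on a brightness grid like `gray` rather than on the three colour channels separately.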
Example: Camera
A physical camera sensor does not record perfectly. Light arrives as individual photons (discrete particles) which introduces unavoidable randomness.
Every sensor pixel responds slightly differently because of manufacturing tolerances, and the readout amplifier adds electronic noise. These imperfections give camera photographs a distinctive noise texture that classifiers can pick up. That noise is:
- Spatially random (different every exposure)
- Statistically predictable in aggregate (follows Poisson and Gaussian distributions)
- Unique per camera model, even per individual unit (PRNU)
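The "random per exposure, predictable in aggregate" property can be sketched with a toy sensor pixel: a Poisson photon count plus Gaussian read noise. This is a simplified model, not a calibrated sensor simulation. The mean of 50 photons and the read-noise standard deviation of 2 are made-up values, and `poisson_sample` uses Knuth's multiplication method because Python's standard library has no Poisson sampler.

```python
import math
import random

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_pixel(rng, mean_photons, read_noise_sd):
    """One reading: Poisson photon count plus Gaussian read noise."""
    return poisson_sample(rng, mean_photons) + rng.gauss(0.0, read_noise_sd)

rng = random.Random(0)  # fixed seed so the run is repeatable
readings = [simulate_pixel(rng, 50.0, 2.0) for _ in range(5000)]
mean = sum(readings) / len(readings)
variance = sum((x - mean) ** 2 for x in readings) / len(readings)
```

Individual readings differ every time, but the variance settles close to the photon mean plus the squared read noise (about 50 + 4 here), which is the kind of aggregate regularity a forensic instrument can look for.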
What makes an AI image statistically distinct:
Most AI-generated images are produced by a diffusion model.
A diffusion model works by starting with pure random noise and progressively denoising it toward a target description.
Unlike cameras, there is no sensor. There is no physical light path. The output pixel values come from learned weights, not photons. This produces images that are:
- Artificially smooth at the finest scale (the model learned to remove high-frequency noise from training images)
- Regular at coarser scales (the denoising process operates in frequency bands)
- Constrained by training distribution (if the training set skewed toward certain lighting, colour palettes, or edge styles, the output inherits those biases)
The instruments on this page measure precisely these statistical differences.
We don't ask "does this look real?" We ask "do these numbers behave like photons hitting a sensor, or like a learned probability distribution?"
What each instrument reads
| Instrument | What the computer measures | Real photograph | AI / screenshot |
|---|---|---|---|
| Metadata | EXIF/XMP tags embedded by the camera at capture | Make, Model, ISO, shutter speed, GPS: written by the camera firmware at capture | Empty or software-only fields. The absence of camera EXIF is the single strongest provenance signal, though it can also mean the metadata was stripped after capture. |
| Noise Profile | High-frequency residual after subtracting a blurred copy | Fine grain from photon shot noise and thermal noise — histogram spread evenly | Near-zero residual — the denoiser erased fine variation. Histogram spikes near zero. |
| Block Variance | Local contrast in 8×8 pixel blocks, then coefficient of variation (CV) | Uneven — sky blocks near zero variance, textured surfaces high. CV > 1.2 typical | Uniform complexity — AI fills every region with "moderate" detail. CV 0.4–0.8 typical. |
| FDA — Frequency | Variance at each downsampling level (100%, 50%, 25%, 12.5%) | Steep drop — fine grain disappears rapidly when averaged down | Flatter curve — artificial detail added at generation time persists across scales |
| EC — Edges | Sobel gradient magnitude and orientation histogram | Edges from physical scene geometry — high entropy, all directions roughly equal | Edges from learned priors — may cluster in characteristic directions for a generation style |
| CCC — Colour | How many of 192 HSV colour buckets are occupied; saturation distribution | Wide gamut — natural scenes span many hues and saturations | Constrained palette — models trained on filtered datasets inherit colour biases |
| GLCM — Texture | How often each brightness-level pair appears side by side (co-occurrence matrix) | Diagonal-biased matrix — smooth gradients dominate; high homogeneity | Pattern depends on generation model and content — each architecture has a characteristic texture fingerprint |
| Feature Profile | All of the above as raw numbers, grouped for side-by-side comparison | — | — |
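The Block Variance row can be sketched as follows: compute brightness variance in each 8×8 tile, then take the coefficient of variation of those variances. Non-overlapping tiles and population variance are assumptions here, and the two test images (`mixed`, `uniform`) are synthetic stand-ins, not real photographs or AI outputs.

```python
def block_variances(gray, block=8):
    """Brightness variance inside each non-overlapping block x block tile."""
    height, width = len(gray), len(gray[0])
    out = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            vals = [gray[y][x]
                    for y in range(top, top + block)
                    for x in range(left, left + block)]
            mean = sum(vals) / len(vals)
            out.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return out

def coefficient_of_variation(values):
    """Standard deviation divided by mean; 0.0 when the mean is zero."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return var ** 0.5 / mean

def checker(x, y):
    """Pixel-level checkerboard: a stand-in for busy texture."""
    return 200.0 if (x + y) % 2 else 0.0

# Left half flat, right half textured: uneven complexity.
mixed = [[100.0 if x < 8 else checker(x, y) for x in range(16)] for y in range(16)]
# Textured everywhere: uniform complexity.
uniform = [[checker(x, y) for x in range(16)] for y in range(16)]

cv_mixed = coefficient_of_variation(block_variances(mixed))
cv_uniform = coefficient_of_variation(block_variances(uniform))
```

The half-flat image gives a CV of 1.0 and the evenly textured one gives 0.0, which mirrors the "uneven versus uniform complexity" contrast in the table.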
How AI images produce statistical tells
Understanding why the instruments find anything requires understanding what goes wrong during generation:
The denoising smoothness problem. Diffusion models are trained to remove noise. They can do this so effectively that the output has less high-frequency variation than a typical camera photograph. The noise tab measures this directly: the residual histogram of such an image spikes near zero because the model has erased the photon-level randomness that a camera always produces.
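A minimal sketch of the residual measurement described above: blur the brightness grid with a box filter, subtract, and average the absolute difference. The 3×3 window, the cropped-window border handling, and the use of absolute values are assumptions, and the `flat` and `grainy` grids are synthetic stand-ins rather than real images.

```python
import random

def box_blur(gray, radius=1):
    """Mean of each pixel's neighbourhood; the window is cropped at the borders."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += gray[ny][nx]
                    count += 1
            out[y][x] = total / count
    return out

def mean_residual(gray, radius=1):
    """Average absolute difference between the image and its blurred copy."""
    blurred = box_blur(gray, radius)
    diffs = [abs(gray[y][x] - blurred[y][x])
             for y in range(len(gray)) for x in range(len(gray[0]))]
    return sum(diffs) / len(diffs)

rng = random.Random(1)  # fixed seed so the run is repeatable
flat = [[100.0] * 16 for _ in range(16)]
grainy = [[100.0 + rng.gauss(0.0, 10.0) for _ in range(16)] for _ in range(16)]

flat_mean = mean_residual(flat)      # nothing left after subtraction
grainy_mean = mean_residual(grainy)  # grain survives the subtraction
```

A perfectly smooth grid leaves a residual of zero, while simulated grain leaves a clearly non-zero one; a histogram of the per-pixel differences would show the same spike-versus-spread contrast.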
The frequency scale problem. A real photograph has strong texture variation at the original scale that falls off naturally as you zoom out (averaging removes detail). AI images add detail synthetically at every scale the model operates on, which can produce an unusually flat or even increasing variance curve across scales. The FDA tab shows this by measuring variance at four zoom levels.
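The four-level variance measurement can be sketched by halving the image repeatedly with 2×2 averaging and recording the variance each time. Averaging as the downsampling method is an assumption. The two inputs are synthetic extremes, not real photos or AI outputs: pure random grain, and a smooth ramp with no fine detail at all.

```python
import random

def variance(gray):
    """Population variance of every value in the grid."""
    vals = [v for row in gray for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def downsample_2x(gray):
    """Average each 2x2 tile into one pixel (odd trailing row or column dropped)."""
    h, w = len(gray) // 2, len(gray[0]) // 2
    return [[(gray[2 * y][2 * x] + gray[2 * y][2 * x + 1]
              + gray[2 * y + 1][2 * x] + gray[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def variance_by_scale(gray, levels=4):
    """Variance at 100%, 50%, 25%, 12.5% of the original size."""
    out, current = [], gray
    for level in range(levels):
        if level > 0:
            current = downsample_2x(current)
        out.append(variance(current))
    return out

rng = random.Random(2)  # fixed seed so the run is repeatable
grain = [[rng.gauss(128.0, 10.0) for _ in range(32)] for _ in range(32)]
ramp = [[4.0 * x for x in range(32)] for _ in range(32)]

grain_curve = variance_by_scale(grain)
ramp_curve = variance_by_scale(ramp)
```

For independent grain, each halving averages four pixels and cuts the variance to roughly a quarter, giving the steep drop. The ramp's variance barely changes, because its structure lives at coarse scales. Real images sit somewhere between these extremes.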
The edge coherence problem. Real edges come from physical scene geometry: an object boundary, a cast shadow, a specular highlight. Across varied scene content, these edges point in all directions roughly equally. AI models learn edge patterns from training data and may overrepresent certain orientations (smooth skin transitions in portrait models; right-angle architecture in photorealism models). The EC tab measures orientation entropy: a near-circular radar indicates a natural distribution; a spiked radar indicates orientation bias.
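Orientation entropy can be sketched as a Sobel pass followed by an 8-bin direction histogram, normalised so that 1.0 means perfectly even. Weighting each bin by gradient magnitude, skipping weak gradients, and centring bins on compass directions are all assumptions. The `stripes` and `cone` images are synthetic test patterns.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def apply_kernel(gray, kernel, x, y):
    """Weighted sum of the 3x3 neighbourhood centred on pixel (x, y)."""
    return sum(kernel[j][i] * gray[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def orientation_histogram(gray, bins=8, min_magnitude=1.0):
    """Magnitude-weighted histogram of gradient directions (interior pixels)."""
    hist = [0.0] * bins
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            gx = apply_kernel(gray, SOBEL_X, x, y)
            gy = apply_kernel(gray, SOBEL_Y, x, y)
            magnitude = math.hypot(gx, gy)
            if magnitude < min_magnitude:
                continue  # flat region: no edge direction to record
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # The +0.5 centres each bin on a compass direction.
            hist[int(angle / (2 * math.pi) * bins + 0.5) % bins] += magnitude
    return hist

def normalized_entropy(hist):
    """Shannon entropy of the histogram divided by its maximum, log(bins)."""
    total = sum(hist)
    if total == 0:
        return 0.0
    probs = [h / total for h in hist if h > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(hist))

# Vertical stripes: every edge points left or right.
stripes = [[255.0 if (x // 4) % 2 else 0.0 for x in range(16)] for _ in range(16)]
# Cone: brightness grows with distance from the centre, so edges point every way.
cone = [[8.0 * math.hypot(x - 16, y - 16) for x in range(33)] for y in range(33)]

stripe_entropy = normalized_entropy(orientation_histogram(stripes))
cone_entropy = normalized_entropy(orientation_histogram(cone))
```

The stripes land in only two of the eight bins and score low, while the cone spreads across all eight and scores close to 1.0.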
The colour distribution problem. Real-world photography spans a wide range of illumination conditions, colour temperatures, and scene types. AI models trained on curated internet images inherit their training set's colour biases — often slightly over-saturated, with certain hue clusters overrepresented. The CCC tab measures how many of 192 HSV cells are occupied; a narrow cluster suggests a constrained generation palette.
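Counting occupied HSV cells can be sketched with the standard library's `colorsys` module. The text gives a total of 192 cells but not how they are divided, so the 12 hue × 4 saturation × 4 value split below is an assumption chosen only because it multiplies to 192. Both palettes are synthetic.

```python
import colorsys

H_BINS, S_BINS, V_BINS = 12, 4, 4  # assumed split: 12 * 4 * 4 = 192 cells

def bucket(pixel):
    """Map an (R, G, B) pixel in 0..255 to its (hue, sat, value) cell."""
    r, g, b = pixel
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # min() keeps a value of exactly 1.0 inside the last bin.
    return (min(int(h * H_BINS), H_BINS - 1),
            min(int(s * S_BINS), S_BINS - 1),
            min(int(v * V_BINS), V_BINS - 1))

def occupied_buckets(pixels):
    """Number of distinct HSV cells that contain at least one pixel."""
    return len({bucket(p) for p in pixels})

# Narrow palette: only shades of pure red.
narrow = [(r, 0, 0) for r in range(128, 256, 8)]
# Wide palette: a coarse sweep of the whole RGB cube.
steps = (0, 64, 128, 192, 255)
wide = [(r, g, b) for r in steps for g in steps for b in steps]

narrow_count = occupied_buckets(narrow)
wide_count = occupied_buckets(wide)
```

The red-only palette occupies just two cells (one hue, one saturation, two brightness bands), while the RGB sweep occupies many more.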
The texture regularity problem. The GLCM co-occurrence matrix captures how brightness levels relate to their immediate neighbors. Physical sensors produce certain characteristic transition patterns (photon noise creates predictable local statistics). AI generators produce transition patterns shaped by their architecture — convolution kernels, attention windows, and upsampling methods all leave fingerprints in the co-occurrence statistics.
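A minimal sketch of a 16-level co-occurrence matrix and four summary features. Using only horizontal neighbours, counting each pair symmetrically, and defining homogeneity as 1 / (1 + (i − j)²) are assumptions; other offsets and definitions are common. The two inputs are synthetic extremes.

```python
import math

def glcm(gray, levels=16):
    """Normalised co-occurrence matrix for horizontally adjacent pixels."""
    counts = [[0.0] * levels for _ in range(levels)]
    for row in gray:
        # Quantise 0..255 brightness into `levels` bands.
        q = [min(int(v * levels / 256), levels - 1) for v in row]
        for a, b in zip(q, q[1:]):
            counts[a][b] += 1
            counts[b][a] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[c / total for c in r] for r in counts]

def haralick(p):
    """Energy, contrast, homogeneity, and entropy of a normalised GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }

flat = [[128.0] * 8 for _ in range(8)]
stripes = [[255.0 if x % 2 else 0.0 for x in range(8)] for _ in range(8)]

flat_features = haralick(glcm(flat))
stripe_features = haralick(glcm(stripes))
```

The flat grid puts all its weight in one diagonal cell (energy 1, contrast 0), while the alternating stripes put it in the two far corners of the matrix (contrast 225 for levels 0 and 15).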
The fundamental limit. Every model generation corrects for previously discovered tells. Midjourney v6, DALL-E 3, and SD3+ were each trained with adversarial feedback from detectors. The noise and frequency artifacts in early diffusion models are largely gone in current models. What remains reliable is metadata (stored in the file container, not produced by the model) and, to some extent, very fine-grained texture statistics that are difficult to adversarially eliminate without reducing image quality.
Limitations
| Limitation | What it means |
|---|---|
| Preset "real" images are screenshots | Labs thumbnails are browser-rendered UI exports — no sensor noise, no camera EXIF. They show digital rendering vs AI photorealism, not camera vs AI. Upload a smartphone photo for the strongest signal across all instruments. |
| Modern diffusion models defeat pixel analysis | Adversarial training against detectors has eliminated most frequency and noise artifacts. Texture statistics (GLCM) and colour distribution are the remaining discriminatory signals — but they require training data to interpret reliably. |
| JPEG recompression changes noise | Each re-save (Twitter, Slack, CMS upload) accumulates quantization artifacts that mimic sensor grain. The noise residual reflects compression history, not sensor origin. Lossless PNG gives cleaner signal. |
| Digital art is not AI | Human-made digital illustrations have no sensor noise and may have constrained colour palettes — the same statistical profile as AI images on several instruments. These tools cannot distinguish AI from human digital art. |
| No trained classifier means no verdict | The feature measurements are real. Mapping them to "AI vs real" requires a trained classifier calibrated to specific model families and image types. Without that, the numbers describe the image — they do not classify it. |
What should I look for?
There is no threshold to cross. Anyone who tells you "above X% CV it's real" is overfitting to a specific image generation model that will be obsolete in six months.
Forensic principles:
| Principle | What it means in practice |
|---|---|
| Converge across instruments | No single instrument is conclusive. Metadata, noise, variance, frequency, edge, and colour must agree. Three signals pointing the same direction is a composite case. One anomalous reading out of six is noise. |
| Establish provenance chains | The reliable question is not "does this look real?" but "can its origin be traced?" A smartphone photo has a chain: device → capture → file → metadata. An AI image has a gap where the sensor should be. |
| Work with the earliest copy | Every JPEG re-save changes the noise profile. Work from the original file, not a social media repost. |
| Never rely on visual inspection | AI images are designed to be visually indistinguishable. These instruments measure below the perceptual threshold. "It just looks off" is a different category from forensics. |
Reading each tab:
| Instrument | What to compare | Rough guide |
|---|---|---|
| Metadata | Camera fields present vs absent | Make/Model/ISO present = real camera provenance. Empty = screenshot, AI, or stripped. Both empty = metadata tells you nothing. |
| Noise | Histogram shape, not just mean | Tight spike near zero = AI or vector. Spread histogram = sensor grain. Mean < 3 = unusually smooth. Mean > 12 = real sensor. |
| Block Variance | CV number, not mean | CV > 1.2 = uneven complexity (sky + texture mix, real photo pattern). CV < 0.6 = uniform complexity (AI or flat digital render). |
| FDA | Shape of the variance drop | Steep monotonic drop = natural fine-to-coarse falloff. Flat or bumpy curve = synthetic detail added at multiple scales. |
| Edges | Radar shape + entropy value | Near-circular radar + entropy near 1.0 = edges in all directions (natural scene). Spiked radar = orientation bias (architectural or portrait-style model artifacts). |
| Colour | Number of occupied cells + palette shape | Many scattered cells = wide natural gamut. Clustered cells in narrow hue range = constrained generation palette. |
| GLCM | Contrast and entropy values | High contrast + high entropy = complex, varied texture. Low contrast + high energy = smooth, uniform generation output. |
Glossary
How images are stored
| Term | Definition |
|---|---|
| Pixel | The smallest unit of an image — a single coloured dot. A 1920×1080 image has 2,073,600 pixels. |
| RGB | Red, Green, Blue — the three colour channels stored per pixel, each 0–255. The combination encodes any visible colour. |
| HSV | Hue (colour angle 0–360°), Saturation (colour purity 0–1), Value (brightness 0–1). A different way to encode the same colour that is more intuitive for colour analysis. |
| JPEG | A lossy image format. Compresses by discarding high-frequency detail in 8×8 blocks. Each re-save discards more detail and adds quantization artifacts. |
| EXIF | Exchangeable Image File Format — metadata written into JPEG/TIFF by the camera at capture. Contains sensor settings, lens focal length, GPS, timestamps. Absent from most AI-generated images. |
| XMP | Adobe's XML metadata format, embeddable in JPEG/PNG/PDF. Sometimes written by generation software. Less diagnostic than camera EXIF. |
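As a rough illustration of where EXIF lives in a JPEG, the sketch below walks the header segments looking for an APP1 segment whose payload starts with `Exif\0\0`. It only detects presence and does not parse any tags. The helper `segment` and the two byte strings are hand-built test data, not real photos, and the walker ignores edge cases such as padding bytes between segments.

```python
def has_exif(jpeg_bytes):
    """Return True if the JPEG header contains an APP1 segment tagged 'Exif'."""
    if jpeg_bytes[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return False  # not positioned on a marker: give up
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            return False
        # The 2-byte big-endian length counts itself but not the marker.
        length = int.from_bytes(jpeg_bytes[pos + 2:pos + 4], "big")
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False

def segment(marker, payload):
    """Build one marker segment (test helper for this sketch)."""
    return bytes([0xFF, marker]) + (len(payload) + 2).to_bytes(2, "big") + payload

with_exif = (b"\xff\xd8" + segment(0xE0, b"JFIF\x00")
             + segment(0xE1, b"Exif\x00\x00dummy") + b"\xff\xd9")
without_exif = b"\xff\xd8" + segment(0xE0, b"JFIF\x00") + b"\xff\xd9"

found = has_exif(with_exif)
missing = has_exif(without_exif)
```

A `False` result only says the segment is absent from this copy of the file; it does not say why.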
How cameras work
| Term | Definition |
|---|---|
| Shot noise | Random pixel variation from discrete photon capture. Individual photons arrive randomly — Poisson statistics — creating characteristic grain in real photographs. Cannot be faked cleanly. |
| PRNU | Photo Response Non-Uniformity — a unique noise fingerprint from manufacturing imperfections in the sensor. Every camera has one. Absent from AI images. Links a photograph to a specific physical device. |
| Sensor noise | The combined effect of shot noise, thermal noise, and read noise from the amplifier. Creates the grainy texture measured by the noise residual tab. |
What the instruments compute
| Term | Definition |
|---|---|
| High-pass filter | Any operation that removes smooth gradients and keeps fine detail. The noise tab uses box-blur subtraction: subtract a blurred copy from the original to isolate grain. |
| Noise residual | The result of the high-pass filter — what's left after smooth areas are removed. Camera photos have a spread residual; AI images spike near zero. |
| Block variance | Statistical variance (spread of pixel brightness) within a small region. High = complex texture. Low = smooth or uniform. |
| CV | Coefficient of variation — standard deviation ÷ mean. Measures relative spread. CV = 0: all blocks identical. CV = 1: spread equals mean. High CV = uneven scene complexity (real photos). Low CV = uniform complexity (AI). |
| FDA | Frequency Domain Analysis — here computed by measuring variance at multiple downsampled scales. Shows how texture energy decays with scale. Steep = natural. Flat = synthetic. |
| Spectral slope | The rate at which variance falls across scales. A negative slope = energy falls as expected. A shallow or positive slope = artificial detail at small scales. |
| Sobel filter | A convolution kernel that estimates the image gradient — how rapidly brightness changes across each pixel. High gradient = edge present. The EC tab runs this to find edges. |
| Orientation entropy | How evenly edge directions are distributed across 8 compass angles. Maximum (1.0) = edges point in all directions equally (natural scenes). Low = edges cluster in one direction (stylistic bias). |
| GLCM | Gray-Level Co-occurrence Matrix — records how often pairs of brightness levels appear side by side. A 16×16 matrix: cell (i, j) = frequency of level-i pixel adjacent to level-j pixel. Captures local texture structure. |
| Haralick features | Four summary statistics computed from the GLCM: Energy (texture uniformity), Contrast (local brightness variation), Homogeneity (similarity to smooth gradients), Entropy (texture complexity). Named after Robert Haralick (1973). |
| CCC | Colour Cluster Count — the number of distinct HSV colour buckets occupied by an image. High count = wide natural gamut. Low count = constrained or desaturated palette. |
| Quantization artifact | Structured error from JPEG compression. JPEG rounds DCT frequency coefficients in 8×8 blocks. Repeated saves accumulate these and can mimic sensor grain in the noise residual. |
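The rounding step behind quantization artifacts can be sketched on a handful of numbers. This is a deliberately simplified model: it treats a list of made-up values as frequency coefficients and skips the DCT itself, the per-frequency quantization tables, and the rounding back to pixel values that a real JPEG save performs. The step sizes 4 and 6 are arbitrary.

```python
def quantize(coeffs, step):
    """Round each coefficient to the nearest multiple of step."""
    return [round(c / step) * step for c in coeffs]

def max_error(original, approx):
    """Largest absolute difference between two coefficient lists."""
    return max(abs(a - b) for a, b in zip(original, approx))

coeffs = [13.0, -7.0, 2.0, 0.4]        # made-up frequency coefficients
first_save = quantize(coeffs, 4)       # coarse rounding at step 4
same_step = quantize(first_save, 4)    # same step again: nothing changes
second_save = quantize(first_save, 6)  # different step: values shift again

error_after_first = max_error(coeffs, first_save)
```

In this model the error after one save is bounded by half the step size, and small coefficients are rounded to zero. Saving again with a different step moves values that were already rounded, which is one way repeated saves at varying quality settings can layer structured error on top of the original signal.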
AI generation
| Term | Definition |
|---|---|
| Diffusion model | The dominant AI image generation architecture (Midjourney, DALL-E, Stable Diffusion). Starts with random noise and iteratively denoises it guided by a text prompt. No physical sensor — no photons, no shot noise. |
| Latent space | The compressed internal representation a diffusion model works in. Generation happens in this lower-dimensional space, and the result is then decoded to pixels. Upsampling at decode can introduce characteristic frequency artifacts. |
| Adversarial training | Training a model to fool a detector while generating. Modern models were trained against the same statistical detectors used here, which is why pixel-level analysis has diminishing returns on current outputs. |
| ELA | Error Level Analysis — re-compresses at known JPEG quality, measures difference. Edited or composited regions show different error levels. Not implemented here but complementary to the noise residual. |
Further reading
How cameras produce noise
- Janesick, J. R. — Scientific Charge-Coupled Devices (2001). The standard reference for CCD sensor physics including shot noise, read noise, and dark current — the physical sources measured by the noise residual tab.
- Healey, G., Kondepudy, R. — Radiometric CCD camera calibration and noise estimation (1994). Early paper on modelling sensor noise statistically.
Image forensics foundations
- Farid, H. — Digital Image Forensics (2016). Standard academic reference for image authentication. Covers clone detection, splicing detection, sensor noise fingerprinting, and compression analysis.
- Lukas, J., Fridrich, J., Goljan, M. — Digital camera identification from sensor pattern noise (2006). The paper that established PRNU as a reliable camera fingerprint.
- Haralick, R. M., Shanmugam, K., Dinstein, I. — Textural features for image classification (1973). The original paper for GLCM and the four Haralick features computed in the texture tab.
AI image detection
- Wang, S. Y., Wang, O., Zhang, R., Owens, A., Efros, A. A. — CNN-generated images are surprisingly easy to spot… for now (CVPR 2020). Early work on GAN spectral artifacts. The "for now" in the title aged correctly — adversarial training by newer models largely defeated these detectors.
- Corvi, R. et al. — On the detection of synthetic images generated by diffusion models (ICASSP 2023). More recent work showing that diffusion model artifacts differ substantially from GAN artifacts and require different detection approaches.
Accessible starting points
- The Hany Farid lab at UC Berkeley (farid.berkeley.edu) publishes readable summaries of image forensics methods and maintains tools for practitioners.
- Bellingcat's guides on open-source image verification — focused on investigative journalism: provenance tracing, reverse image search, shadow and lighting consistency checks that complement statistical analysis.
- MIT Media Lab's Detect Fakes project — explores the limits of human perception in distinguishing AI faces from real ones. Context for why instrument-based analysis matters.