Image Forensics

Eight client-side instruments for reading image provenance — from EXIF metadata to frequency analysis, edge coherence, colour distribution, and texture statistics.

How do you tell if an image is AI-generated? For human eyes, this is not a trivial problem, and every new image model makes it harder.

But for a computer, seeing the statistical regularities of an image is a relatively easy problem.

Rather than teaching "how to know if an image is AI", which is a losing battle because the techniques need constant updating, it is more important to learn about image forensics.

To understand image forensics, we need to understand how a computer sees an image.

All analysis runs in your browser. No image data is uploaded or transmitted.



How a computer sees an image

To a computer, an image is not what it looks like to you -- it is a grid of numbers.

A 1920×1080 photo is 2,073,600 pixels, each storing three integers: Red, Green, Blue (0-255).

The computer has no concept of "this looks like a face" or "this seems real." It only has those numbers. Every analysis on this page is a mathematical operation on that grid.
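As a minimal sketch of that grid, the snippet below builds a tiny 2×2 image by hand and collapses it to one brightness value per pixel. The Rec. 601 luma weights and the function names are assumptions for illustration; the page does not say which grayscale conversion its instruments use.

```python
# A tiny 2x2 "image": each pixel is an (R, G, B) triple of integers in 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128)],
]

def pixel_count(width: int, height: int) -> int:
    """Number of pixels in a width x height grid."""
    return width * height

def to_luma(rgb_rows):
    """Collapse RGB to one brightness number per pixel (Rec. 601 weights, assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_rows]

print(pixel_count(1920, 1080))  # 2073600
print(to_luma(image))           # [[76, 150], [29, 128]]
```

The later sketches on this page operate on a grayscale grid like the one `to_luma` returns.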

Example: Camera

A physical camera sensor does not record perfectly. Light arrives as individual photons (discrete particles) which introduces unavoidable randomness.

Every sensor pixel responds slightly differently because of manufacturing tolerances, and the readout amplifier adds electronic noise. These imperfections give camera photographs a distinctive noise texture that classifiers can pick up. That noise is:

  • Spatially random (different every exposure)
  • Statistically predictable in aggregate (follows Poisson and Gaussian distributions)
  • Unique per camera model, even per individual unit (PRNU)
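The "statistically predictable in aggregate" point can be illustrated with a small simulation. This is a sketch under simplifying assumptions: one pixel, a hypothetical mean of 50 photons per exposure, and Gaussian read noise with a standard deviation of 2. It ignores PRNU and thermal noise. For Poisson shot noise the variance equals the mean, so the total variance should land near 50 + 2² = 54.

```python
import math
import random
import statistics

def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's method: count uniform draws until their product falls below e^-mean."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_pixel(rng: random.Random, photons: float, read_sigma: float) -> float:
    """One exposure of one pixel: shot noise (Poisson) plus read noise (Gaussian)."""
    return poisson(rng, photons) + rng.gauss(0.0, read_sigma)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [simulate_pixel(rng, photons=50.0, read_sigma=2.0) for _ in range(5000)]
mean = statistics.fmean(samples)
variance = statistics.pvariance(samples)
print(f"mean={mean:.1f} variance={variance:.1f} (theory: 50 and 54)")
```

Each individual sample is unpredictable, but the mean and variance over many exposures are not, which is what makes sensor noise measurable.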

What makes an AI image statistically distinct:

Most AI-generated images are produced by a diffusion model.

A diffusion model works by starting with pure random noise and progressively denoising it toward a target description.

Unlike cameras, there is no sensor. There is no physical light path. The output pixel values come from learned weights, not photons. This produces images that are:

  • Artificially smooth at the finest scale (the model learned to remove high-frequency noise from training images)
  • Regular at coarser scales (the denoising process operates in frequency bands)
  • Constrained by training distribution (if the training set skewed toward certain lighting, colour palettes, or edge styles, the output inherits those biases)

The instruments on this page measure precisely these statistical differences.

We don't ask "does this look real?" We ask "do these numbers behave like photons hitting a sensor, or like a learned probability distribution?"


What each instrument reads

| Instrument | What the computer measures | Real photograph | AI / screenshot |
|---|---|---|---|
| Metadata | EXIF/XMP tags embedded by the camera at capture | Make, Model, ISO, shutter speed, GPS: written by sensor hardware | Empty or software-only fields. The absence of camera EXIF is the single strongest provenance signal. |
| Noise Profile | High-frequency residual after subtracting a blurred copy | Fine grain from photon shot noise and thermal noise; histogram spread evenly | Near-zero residual: the denoiser erased fine variation. Histogram spikes near zero. |
| Block Variance | Local contrast in 8×8 pixel blocks, then coefficient of variation (CV) | Uneven: sky blocks near zero variance, textured surfaces high. CV > 1.2 typical | Uniform complexity: AI fills every region with "moderate" detail. CV 0.4–0.8 typical. |
| FDA (Frequency) | Variance at each downsampling level (100%, 50%, 25%, 12.5%) | Steep drop: fine grain disappears rapidly when averaged down | Flatter curve: artificial detail added at generation time persists across scales |
| EC (Edges) | Sobel gradient magnitude and orientation histogram | Edges from physical scene geometry: high entropy, all directions roughly equal | Edges from learned priors: may cluster in characteristic directions for a generation style |
| CCC (Colour) | How many of 192 HSV colour buckets are occupied; saturation distribution | Wide gamut: natural scenes span many hues and saturations | Constrained palette: models trained on filtered datasets inherit colour biases |
| GLCM (Texture) | How often each brightness-level pair appears side by side (co-occurrence matrix) | Diagonal-biased matrix: smooth gradients dominate; high homogeneity | Pattern depends on generation model and content: each architecture has a characteristic texture fingerprint |
| Feature Profile | All of the above as raw numbers, grouped for side-by-side comparison | | |
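The Block Variance instrument described above can be sketched as follows. This is an illustrative version, not the page's implementation: it assumes non-overlapping 8×8 blocks over a grayscale grid and ignores partial blocks at the edges. The two test images are synthetic, with a flat left half standing in for sky.

```python
import random
import statistics

def block_variances(gray, block=8):
    """Brightness variance inside each non-overlapping block x block tile."""
    h, w = len(gray), len(gray[0])
    out = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            tile = [gray[y][x] for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            out.append(statistics.pvariance(tile))
    return out

def coefficient_of_variation(values):
    """Standard deviation divided by mean; 0.0 if the mean is zero."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean if mean > 0 else 0.0

rng = random.Random(3)

def textured():
    return 128.0 + rng.gauss(0.0, 20.0)

# Every block equally textured, versus a flat "sky" half next to a textured half.
uniform = [[textured() for _ in range(32)] for _ in range(32)]
mixed = [[textured() if x >= 16 else 200.0 for x in range(32)] for _ in range(32)]

cv_uniform = coefficient_of_variation(block_variances(uniform))
cv_mixed = coefficient_of_variation(block_variances(mixed))
print(f"uniform CV={cv_uniform:.2f}  mixed CV={cv_mixed:.2f}")
```

The mixed image scores a much higher CV because half its blocks have zero variance, which matches the "uneven complexity" reading in the table.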

How AI images produce statistical tells

Understanding why the instruments find anything requires understanding what goes wrong during generation:

The denoising smoothness problem. Diffusion models are trained to remove noise. They do this so effectively that the output has less high-frequency variation than any real photograph. The noise tab measures this directly: an AI image's residual histogram spikes near zero because the model has erased the photon-level randomness that a camera always produces.
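A minimal sketch of the box-blur subtraction behind the noise residual, assuming a 3×3 window (radius 1) with borders clipped; the page does not state its actual blur radius. The smooth gradient and the added Gaussian grain are synthetic stand-ins for a denoised image and a sensor image.

```python
import random

def box_blur(gray, radius=1):
    """Average each pixel with its neighbours in a square window, clipped at borders."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(window) / len(window)
    return out

def noise_residual(gray, radius=1):
    """High-pass filter: original minus its blurred copy."""
    blurred = box_blur(gray, radius)
    return [[g - b for g, b in zip(g_row, b_row)]
            for g_row, b_row in zip(gray, blurred)]

def mean_abs(residual):
    """Average absolute residual, a single-number summary of grain strength."""
    values = [abs(v) for row in residual for v in row]
    return sum(values) / len(values)

rng = random.Random(1)
smooth = [[2.0 * x for x in range(32)] for _ in range(32)]              # clean gradient
grainy = [[v + rng.gauss(0.0, 8.0) for v in row] for row in smooth]     # gradient + grain
print(mean_abs(noise_residual(smooth)), mean_abs(noise_residual(grainy)))
```

The clean gradient leaves almost nothing behind after subtraction, while the grainy copy leaves a residual roughly proportional to the noise level.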

The frequency scale problem. A real photograph has strong texture variation at the original scale that falls off naturally as you zoom out (averaging removes detail). AI images add detail synthetically at every scale the model operates on, which can produce an unusually flat or even increasing variance curve across scales. The FDA tab shows this by measuring variance at four zoom levels.
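The four-level measurement can be sketched as below. Two assumptions are baked in: downsampling is done by averaging 2×2 blocks, and variance is taken over the whole grid at each level. The page may compute either step differently. The input is synthetic uncorrelated grain, for which averaging four pixels should cut the variance to about a quarter at each step.

```python
import random
import statistics

def downsample_half(gray):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(gray) // 2, len(gray[0]) // 2
    return [[(gray[2 * y][2 * x] + gray[2 * y][2 * x + 1]
              + gray[2 * y + 1][2 * x] + gray[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def scale_variances(gray, levels=4):
    """Variance at 100%, 50%, 25%, 12.5% scale (for levels=4)."""
    variances, current = [], gray
    for _ in range(levels):
        variances.append(statistics.pvariance([v for row in current for v in row]))
        current = downsample_half(current)
    return variances

rng = random.Random(2)
grain = [[128.0 + rng.gauss(0.0, 10.0) for _ in range(64)] for _ in range(64)]
curve = scale_variances(grain)
print([round(v, 1) for v in curve])
```

A steep, monotonic drop like this is the "natural falloff" shape; an image whose detail is correlated across scales would give a flatter curve.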

The edge coherence problem. Real edges come from physical scene geometry — an object boundary, a cast shadow, a specular highlight. These produce edges in all directions roughly equally, depending on scene content. AI models learn edge patterns from training data and may overrepresent certain orientations (smooth skin transitions in portrait models; right-angle architecture in photorealism models). The EC tab measures orientation entropy: a flat radar = natural distribution; a spiked radar = orientation bias.
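Orientation entropy can be sketched with a plain Sobel pass. The choices here are assumptions rather than the page's exact method: eight bins centred on the compass directions, each edge pixel weighted by its gradient magnitude, and entropy normalised by log 8 so that 1.0 means perfectly even. The two test images are synthetic: a single vertical step edge, and a radial ramp whose edges point in every direction.

```python
import math

def orientation_entropy(gray, bins=8, min_magnitude=1.0):
    """Normalised entropy (0..1) of Sobel gradient directions, magnitude-weighted."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    step = 2 * math.pi / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            magnitude = math.hypot(gx, gy)
            if magnitude < min_magnitude:
                continue  # flat region, no edge to count
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(round(angle / step)) % bins] += magnitude
    total = sum(hist)
    if total == 0:
        return 0.0
    entropy = -sum((v / total) * math.log(v / total) for v in hist if v > 0)
    return max(0.0, entropy / math.log(bins))

# One vertical step edge: every gradient points the same way.
step_edge = [[0.0 if x < 8 else 255.0 for x in range(16)] for _ in range(16)]
# Brightness grows with distance from the centre: gradients point in all directions.
radial = [[10.0 * math.hypot(x - 16, y - 16) for x in range(33)] for y in range(33)]

print(orientation_entropy(step_edge), round(orientation_entropy(radial), 3))
```

The step edge scores 0 (a fully spiked radar) and the radial ramp scores close to 1 (a near-circular radar).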

The colour distribution problem. Real-world photography spans a wide range of illumination conditions, colour temperatures, and scene types. AI models trained on curated internet images inherit their training set's colour biases — often slightly over-saturated, with certain hue clusters overrepresented. The CCC tab measures how many of 192 HSV cells are occupied; a narrow cluster suggests a constrained generation palette.
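Counting occupied HSV cells is straightforward with the standard library. The page says there are 192 cells but not how they are divided, so the 12 hue × 4 saturation × 4 value split below is an assumption chosen only because it multiplies to 192.

```python
import colorsys

# Assumed split: 12 hue x 4 saturation x 4 value = 192 cells.
H_BINS, S_BINS, V_BINS = 12, 4, 4

def occupied_hsv_cells(pixels):
    """Count distinct HSV buckets hit by an iterable of (R, G, B) pixels in 0-255."""
    cells = set()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        cells.add((min(int(h * H_BINS), H_BINS - 1),
                   min(int(s * S_BINS), S_BINS - 1),
                   min(int(v * V_BINS), V_BINS - 1)))
    return len(cells)

primaries = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
reds = [(255, 0, 0), (250, 5, 5), (245, 10, 0)]
print(occupied_hsv_cells(primaries), occupied_hsv_cells(reds))
```

Three well-separated hues occupy three cells, while three near-identical reds collapse into one, which is the "narrow cluster" case described above.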

The texture regularity problem. The GLCM co-occurrence matrix captures how brightness levels relate to their immediate neighbors. Physical sensors produce certain characteristic transition patterns (photon noise creates predictable local statistics). AI generators produce transition patterns shaped by their architecture — convolution kernels, attention windows, and upsampling methods all leave fingerprints in the co-occurrence statistics.
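A co-occurrence matrix and its four summary statistics can be sketched as follows. Assumptions: brightness is quantised to 16 levels, only horizontal neighbours are counted, the matrix is made symmetric, and homogeneity uses the 1 / (1 + (i − j)²) weighting. The page's implementation may differ on any of these.

```python
import math

def glcm(gray, levels=16):
    """Normalised, symmetric co-occurrence matrix of horizontally adjacent pixels."""
    m = [[0.0] * levels for _ in range(levels)]
    for row in gray:
        q = [min(int(v) * levels // 256, levels - 1) for v in row]  # 0-255 -> 0..levels-1
        for a, b in zip(q, q[1:]):
            m[a][b] += 1
            m[b][a] += 1
    total = sum(map(sum, m))
    return [[c / total for c in r] for r in m] if total else m

def haralick(p):
    """Energy, contrast, homogeneity and entropy (bits) of a normalised GLCM."""
    cells = [(i, j, c) for i, r in enumerate(p) for j, c in enumerate(r) if c > 0]
    return {
        "energy": sum(c * c for _, _, c in cells),
        "contrast": sum((i - j) ** 2 * c for i, j, c in cells),
        "homogeneity": sum(c / (1 + (i - j) ** 2) for i, j, c in cells),
        "entropy": max(0.0, -sum(c * math.log2(c) for _, _, c in cells)),
    }

flat = [[90] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
print(haralick(glcm(flat)))
print(haralick(glcm(checker)))
```

A flat image puts all its mass in one diagonal cell (energy 1, contrast 0), while a black-and-white checkerboard puts it all in the two far corners (contrast 15² = 225).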

The fundamental limit. Every model generation corrects for previously discovered tells. Midjourney v6, DALL-E 3, and SD3+ were each trained with adversarial feedback from detectors. The noise and frequency artifacts in early diffusion models are largely gone in current models. What remains reliable is metadata (stored in the file by the capture device or software, not produced by the model) and, to some extent, very fine-grained texture statistics that are difficult to adversarially eliminate without reducing image quality.


Limitations

| Limitation | What it means |
|---|---|
| Preset "real" images are screenshots | Labs thumbnails are browser-rendered UI exports: no sensor noise, no camera EXIF. They show digital rendering vs AI photorealism, not camera vs AI. Upload a smartphone photo for the strongest signal across all instruments. |
| Modern diffusion models defeat pixel analysis | Adversarial training against detectors has eliminated most frequency and noise artifacts. Texture statistics (GLCM) and colour distribution are the remaining discriminatory signals, but they require training data to interpret reliably. |
| JPEG recompression changes noise | Each re-save (Twitter, Slack, CMS upload) accumulates quantization artifacts that mimic sensor grain. The noise residual reflects compression history, not sensor origin. Lossless PNG gives cleaner signal. |
| Digital art is not AI | Human-made digital illustrations have no sensor noise and may have constrained colour palettes, the same statistical profile as AI images on several instruments. These tools cannot distinguish AI from human digital art. |
| No trained classifier means no verdict | The feature measurements are real. Mapping them to "AI vs real" requires a trained classifier calibrated to specific model families and image types. Without that, the numbers describe the image; they do not classify it. |

What should I look for?

There is no threshold to cross. Anyone who tells you "above a CV of X it's real" is overfitting to a specific image generation model that will be obsolete in six months.

Forensic principles:

| Principle | What it means in practice |
|---|---|
| Converge across instruments | No single instrument is conclusive. Metadata, noise, variance, frequency, edge, and colour must agree. Three signals pointing the same direction is a composite case. One anomalous reading out of six is noise. |
| Establish provenance chains | The reliable question is not "does this look real?" but "can its origin be traced?" A smartphone photo has a chain: device → capture → file → metadata. An AI image has a gap where the sensor should be. |
| Work with the earliest copy | Every JPEG re-save changes the noise profile. Work from the original file, not a social media repost. |
| Never rely on visual inspection | AI images are designed to be visually indistinguishable. These instruments measure below the perceptual threshold. "It just looks off" is a different category from forensics. |

Reading each tab:

| Instrument | What to compare | Rough guide |
|---|---|---|
| Metadata | Camera fields present vs absent | Make/Model/ISO present = real camera provenance. Empty = screenshot, AI, or stripped. Both empty = metadata tells you nothing. |
| Noise | Histogram shape, not just mean | Tight spike near zero = AI or vector. Spread histogram = sensor grain. Mean < 3 = unusually smooth. Mean > 12 = real sensor. |
| Block Variance | CV number, not mean | CV > 1.2 = uneven complexity (sky + texture mix, real photo pattern). CV < 0.6 = uniform complexity (AI or flat digital render). |
| FDA | Shape of the variance drop | Steep monotonic drop = natural fine-to-coarse falloff. Flat or bumpy curve = synthetic detail added at multiple scales. |
| Edges | Radar shape + entropy value | Near-circular radar + entropy near 1.0 = edges in all directions (natural scene). Spiked radar = orientation bias (architectural or portrait-style model artifacts). |
| Colour | Number of occupied cells + palette shape | Many scattered cells = wide natural gamut. Clustered cells in narrow hue range = constrained generation palette. |
| GLCM | Contrast and entropy values | High contrast + high entropy = complex, varied texture. Low contrast + high energy = smooth, uniform generation output. |

Glossary

How images are stored

| Term | Definition |
|---|---|
| Pixel | The smallest unit of an image: a single coloured dot. A 1920×1080 image has 2,073,600 pixels. |
| RGB | Red, Green, Blue: the three colour channels stored per pixel, each 0–255. The combination encodes any visible colour. |
| HSV | Hue (colour angle 0–360°), Saturation (colour purity 0–1), Value (brightness 0–1). A different way to encode the same colour that is more intuitive for colour analysis. |
| JPEG | A lossy image format. Compresses by discarding high-frequency detail in 8×8 blocks. Each re-save discards more detail and adds quantization artifacts. |
| EXIF | Exchangeable Image File Format: metadata written into JPEG/TIFF by the camera at capture. Contains sensor settings, lens focal length, GPS, timestamps. Absent from most AI-generated images. |
| XMP | Adobe's XML metadata format, embeddable in JPEG/PNG/PDF. Sometimes written by generation software. Less diagnostic than camera EXIF. |
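The first question the Metadata instrument asks, whether a JPEG carries an EXIF segment at all, can be answered by walking the file's marker segments. The sketch below is deliberately minimal: it only looks for an APP1 segment whose payload starts with `Exif\0\0` before the image data begins, does not decode any tags, and the function name is hypothetical. Finding a segment shows that EXIF is present, not that a camera wrote it.

```python
import struct

def has_exif_segment(jpeg_bytes: bytes) -> bool:
    """True if an APP1 'Exif' segment appears before the JPEG image data."""
    if jpeg_bytes[:2] != b"\xff\xd8":          # not a JPEG (missing SOI marker)
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:            # lost marker sync, give up
            return False
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):             # start of scan / end of image
            return False
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length                      # length field counts its own 2 bytes
    return False

def segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment for the synthetic test files below."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload

with_exif = b"\xff\xd8" + segment(0xE1, b"Exif\x00\x00dummy") + b"\xff\xd9"
without_exif = b"\xff\xd8" + segment(0xE0, b"JFIF\x00") + b"\xff\xd9"
print(has_exif_segment(with_exif), has_exif_segment(without_exif))  # True False
```

To check a real file, pass the result of `open(path, "rb").read()`; reading fields such as Make or Model would need a full EXIF parser on top of this.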

How cameras work

| Term | Definition |
|---|---|
| Shot noise | Random pixel variation from discrete photon capture. Individual photons arrive randomly (Poisson statistics), creating characteristic grain in real photographs. Cannot be faked cleanly. |
| PRNU | Photo Response Non-Uniformity: a unique noise fingerprint from manufacturing imperfections in the sensor. Every camera has one. Absent from AI images. Links a photograph to a specific physical device. |
| Sensor noise | The combined effect of shot noise, thermal noise, and read noise from the amplifier. Creates the grainy texture measured by the noise residual tab. |

What the instruments compute

| Term | Definition |
|---|---|
| High-pass filter | Any operation that removes smooth gradients and keeps fine detail. The noise tab uses box-blur subtraction: subtract a blurred copy from the original to isolate grain. |
| Noise residual | The result of the high-pass filter: what's left after smooth areas are removed. Camera photos have a spread residual; AI images spike near zero. |
| Block variance | Statistical variance (spread of pixel brightness) within a small region. High = complex texture. Low = smooth or uniform. |
| CV | Coefficient of variation: standard deviation ÷ mean. Measures relative spread. CV = 0: all blocks identical. CV = 1: spread equals mean. High CV = uneven scene complexity (real photos). Low CV = uniform complexity (AI). |
| FDA | Frequency Domain Analysis, here computed by measuring variance at multiple downsampled scales. Shows how texture energy decays with scale. Steep = natural. Flat = synthetic. |
| Spectral slope | The rate at which variance falls across scales. A negative slope = energy falls as expected. A shallow or positive slope = artificial detail at small scales. |
| Sobel filter | A convolution kernel that estimates the image gradient: how rapidly brightness changes across each pixel. High gradient = edge present. The EC tab runs this to find edges. |
| Orientation entropy | How evenly edge directions are distributed across 8 compass angles. Maximum (1.0) = edges point in all directions equally (natural scenes). Low = edges cluster in one direction (stylistic bias). |
| GLCM | Gray-Level Co-occurrence Matrix: records how often pairs of brightness levels appear side by side. A 16×16 matrix: cell (i, j) = frequency of level-i pixel adjacent to level-j pixel. Captures local texture structure. |
| Haralick features | Four summary statistics computed from the GLCM: Energy (texture uniformity), Contrast (local brightness variation), Homogeneity (similarity to smooth gradients), Entropy (texture complexity). Named after Robert Haralick (1973). |
| CCC | Colour Cluster Count: the number of distinct HSV colour buckets occupied by an image. High count = wide natural gamut. Low count = constrained or desaturated palette. |
| Quantization artifact | Structured error from JPEG compression. JPEG rounds DCT frequency coefficients in 8×8 blocks. Repeated saves accumulate these and can mimic sensor grain in the noise residual. |

AI generation

| Term | Definition |
|---|---|
| Diffusion model | The dominant AI image generation architecture (Midjourney, DALL-E, Stable Diffusion). Starts with random noise and iteratively denoises it guided by a text prompt. No physical sensor: no photons, no shot noise. |
| Latent space | The compressed internal representation a diffusion model works in. Generation happens in this lower-dimensional space, then decoded to pixels. Upsampling at decode can introduce characteristic frequency artifacts. |
| Adversarial training | Training a model to fool a detector while generating. Modern models were trained against the same statistical detectors used here, which is why pixel-level analysis has diminishing returns on current outputs. |
| ELA | Error Level Analysis: re-compresses at known JPEG quality, measures difference. Edited or composited regions show different error levels. Not implemented here but complementary to the noise residual. |

Further reading

How cameras produce noise

  • Janesick, J. R. — Scientific Charge-Coupled Devices (2001). The standard reference for CCD sensor physics including shot noise, read noise, and dark current — the physical sources measured by the noise residual tab.
  • Healey, G., Kondepudy, R. — Radiometric CCD camera calibration and noise estimation (1994). Early paper on modelling sensor noise statistically.

Image forensics foundations

  • Farid, H. — Photo Forensics (MIT Press, 2016). Standard academic reference for image authentication. Covers clone detection, splicing detection, sensor noise fingerprinting, and compression analysis.
  • Lukas, J., Fridrich, J., Goljan, M. — Digital camera identification from sensor pattern noise (2006). The paper that established PRNU as a reliable camera fingerprint.
  • Haralick, R. M., Shanmugam, K., Dinstein, I. — Textural features for image classification (1973). The original paper for GLCM and the four Haralick features computed in the texture tab.

AI image detection

  • Wang, S. Y., Wang, O., Zhang, R., Owens, A., Efros, A. A. — CNN-generated images are surprisingly easy to spot… for now (CVPR 2020). Early work on GAN spectral artifacts. The "for now" in the title aged correctly — adversarial training by newer models largely defeated these detectors.
  • Corvi, R. et al. — On the detection of synthetic images generated by diffusion models (ICASSP 2023). More recent work showing that diffusion model artifacts differ substantially from GAN artifacts and require different detection approaches.

Accessible starting points

  • The Hany Farid lab at UC Berkeley (farid.berkeley.edu) publishes readable summaries of image forensics methods and maintains tools for practitioners.
  • Bellingcat's guides on open-source image verification — focused on investigative journalism: provenance tracing, reverse image search, shadow and lighting consistency checks that complement statistical analysis.
  • MIT Media Lab's Detect Fakes project — explores the limits of human perception in distinguishing AI faces from real ones. Context for why instrument-based analysis matters.