
Validation vs Sanitization — What's the Difference?

Validation and sanitization are both about handling untrusted input, but they do fundamentally different things. Confusing them — or skipping one — is a source of both data quality bugs and security vulnerabilities.

Definitions — the one-sentence version
Side-by-side comparison
Validation in depth
Sanitization in depth
Which comes first?
When you need both
Never sanitize structured identifiers
Checklist

1. Definitions — the one-sentence version

✅ Validation

Accept or reject input based on whether it meets defined rules. The original data is never modified — you either keep it or return an error.

✂️ Sanitization

Transform or clean input to make it safe for a specific context. The data may be modified — characters removed, encoded, or escaped.
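The contrast between the two definitions can be sketched in a few lines. This is a minimal illustration, not a production implementation: `validateUsername`, `sanitizeForLog`, the username rules, and the log-line context are all hypothetical names and assumptions chosen for the example.

```typescript
type ValidationResult = { valid: true } | { valid: false; error: string };

// Validation: inspects the value and answers accept or reject.
// The input string itself is never changed.
function validateUsername(input: string): ValidationResult {
  if (input.length < 3 || input.length > 20) {
    return { valid: false, error: "username must be 3 to 20 characters" };
  }
  if (!/^[A-Za-z0-9_]+$/.test(input)) {
    return { valid: false, error: "username may contain only letters, digits and underscores" };
  }
  return { valid: true };
}

// Sanitization: returns a modified copy that is safe for ONE context.
// Assumed context here: a single-line log entry, so line breaks are replaced
// to stop an attacker from forging extra log lines.
function sanitizeForLog(input: string): string {
  return input.replace(/[\r\n]+/g, " ");
}

const check = validateUsername("a b");            // rejected, nothing is "fixed"
const logLine = sanitizeForLog("ok\nFAKE ENTRY"); // "ok FAKE ENTRY"
```

Note the return types: the validator returns a verdict, while the sanitizer returns a new string.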

2. Side-by-side comparison

Dimension	Validation	Sanitization
Question it answers	"Is this data valid?"	"Is this data safe to use in context X?"
Output	Accept ✓ or reject ✗	Modified / cleaned data
Modifies input?	Never	Yes — removes or encodes characters
When to use	Before processing any input	Before outputting to HTML, SQL, shell, etc.
Failure action	Return error to caller	Strip / encode problematic characters
Primary concern	Data correctness	Security / injection prevention
Example tools	Zod, Joi, Pydantic, IsValid API	DOMPurify, parameterized queries, bleach

3. Validation in depth

Validation answers a binary question: does this data conform to the rules? There are two layers:

Format validation

Checks structure, type, length, and pattern.

import { z } from 'zod'

const PaymentSchema = z.object({
  iban: z.string().min(15).max(34).regex(/^[A-Z]{2}[0-9]{2}[A-Z0-9]+$/),
  amount: z.number().positive().max(1_000_000),
  currency: z.enum(['EUR', 'GBP', 'USD']),
})

Semantic validation

Checks whether the value is meaningful — a valid checksum, a registered entity, or a consistent combination of fields. This requires domain knowledge that schema validators don't have.

// Format says: "looks like an IBAN"
// Semantic says: "mod-97 passes, bank exists, is SEPA-eligible"
// (iv is assumed to be an initialised IsValid API client; body is the parsed request body)
const result = await iv.iban(body.iban)
if (!result.valid) throw new ValidationError('IBAN checksum failed')
if (!result.isSEPA) throw new ValidationError('IBAN is not SEPA-eligible')
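To make the "mod-97 passes" part of semantic validation concrete, here is a minimal local sketch of the IBAN checksum. It covers only the checksum; the bank-existence and SEPA checks in the snippet above need external data and are not attempted. The function name `ibanChecksumPasses` is hypothetical.

```typescript
// Returns true when the IBAN's mod-97 checksum is correct.
// Whitespace and letter case are ignored for the calculation only;
// the caller's original string is not modified.
function ibanChecksumPasses(iban: string): boolean {
  const compact = iban.replace(/\s+/g, "").toUpperCase();
  // Shape check first: country code, two check digits, 11 to 30 more characters.
  if (!/^[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}$/.test(compact)) return false;

  // Move the first four characters to the end.
  const rearranged = compact.slice(4) + compact.slice(0, 4);

  // Read the result as one long decimal number (A=10 ... Z=35) and reduce it
  // mod 97 one character at a time, so no big-integer type is needed.
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    const value = code >= 65 ? code - 55 : code - 48; // 'A' is 10, '0' is 0
    remainder = (remainder * (value >= 10 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}

const good = ibanChecksumPasses("GB29 NWBK 6016 1331 9268 19"); // true
const bad = ibanChecksumPasses("GB29 NWBK 6016 1331 9268 18");  // false
```

A passing checksum catches typos, but it does not prove the account exists, which is why the registry-backed checks above remain a separate step.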

4. Sanitization in depth

Sanitization is context-dependent. The same string might be safe in one context and dangerous in another.
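As a small illustration of that context dependence, the sketch below encodes one input string for two different destinations. `escapeHtml` is a hand-written helper assumed for this example (it is not from a library); `encodeURIComponent` is the standard JavaScript built-in.

```typescript
// Escape the five characters that are significant in HTML text and attribute values.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const userInput = "<b>Tom & Jerry</b>";

// HTML context: markup characters become entities, so the browser shows them as text.
const forHtml = escapeHtml(userInput);
// → "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"

// URL query context: reserved characters are percent-encoded instead.
const forUrl = "/search?q=" + encodeURIComponent(userInput);
// → "/search?q=%3Cb%3ETom%20%26%20Jerry%3C%2Fb%3E"
```

Neither output is safe in the other context, which is why the encoding is chosen at the point of output rather than at the point of input.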

HTML context — prevent XSS

import DOMPurify from 'dompurify'
// User-supplied HTML (e.g. rich text editor) — strip malicious tags
const safeHtml = DOMPurify.sanitize(userInput)

SQL context — prevent injection

⚠️ The correct answer to SQL injection is parameterized queries, not string sanitization. String-based sanitization is error-prone and easily bypassed.

// ❌ String sanitization — fragile, error-prone
const unsafe = `SELECT * FROM users WHERE id = '${id.replace(/'/g, "''")}'`

// ✅ Parameterized query — correct approach
const safe = await db.query('SELECT * FROM users WHERE id = $1', [id])
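What makes the parameterized form safe is that the SQL text and the values travel separately, so a value is never parsed as SQL. The sketch below shows that separation with a hypothetical `sql` template tag that produces the same text-plus-values pair passed to `db.query` above. It is an illustration of the idea, not any specific driver's API.

```typescript
interface ParameterizedQuery {
  text: string;      // SQL with $1, $2, ... placeholders
  values: unknown[]; // the values, kept out of the SQL text
}

// Hypothetical template tag: every interpolated value is replaced by a
// numbered placeholder and collected into the values array.
function sql(strings: TemplateStringsArray, ...values: unknown[]): ParameterizedQuery {
  let text = strings[0];
  for (let i = 0; i < values.length; i++) {
    text += "$" + (i + 1) + strings[i + 1];
  }
  return { text, values };
}

const id = "1' OR '1'='1"; // classic injection attempt
const query = sql`SELECT * FROM users WHERE id = ${id}`;
// query.text   → "SELECT * FROM users WHERE id = $1"
// query.values → ["1' OR '1'='1"]
```

The malicious string ends up only in `values`, where the database treats it as data to compare against, not as SQL to execute.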

Normalisation (not strictly sanitization)

Some identifiers have multiple valid representations. Normalisation converts them to a canonical form before storage — this is data quality, not security.

// IBAN: strip whitespace and uppercase before storage
const normalised = iban.replace(/\s+/g, '').toUpperCase()
// "GB29 NWBK 6016 1331 9268 19" → "GB29NWBK60161331926819"

// ETH address: EIP-55 checksum normalisation
// (getAddress is assumed to be the address helper exported by a library such as ethers)
const checksummed = getAddress(address.toLowerCase())
// "0x5aaeb6..." → "0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed"

5. Which comes first?

Validate before you sanitize — and sanitize for the output context, not the input.

Receive input: raw, untrusted data from the request body or query params

Format validation: type check, length, pattern; reject early if the shape is wrong

Semantic validation: checksum, registry lookup; reject if the value is not meaningful

Business logic: process the validated data

Context-specific sanitization: escape for HTML output, parameterize for SQL, etc.

Store / respond: persist clean data, return a safe response
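The ordering above can be sketched as a single handler. Everything here is assumed for illustration: the comment payload, the `registeredAuthors` list standing in for a registry lookup, and a redirect URL as the output context (so the context-specific encoder is the built-in `encodeURIComponent`).

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface Comment { author: string; text: string }

// Format validation: checks shape only and never modifies the input.
function validateFormat(raw: unknown): Result<Comment> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "body must be an object" };
  }
  const { author, text } = raw as Record<string, unknown>;
  if (typeof author !== "string" || !/^[a-z0-9_]{3,20}$/.test(author)) {
    return { ok: false, error: "author has the wrong format" };
  }
  if (typeof text !== "string" || text.length === 0 || text.length > 500) {
    return { ok: false, error: "text must be 1 to 500 characters" };
  }
  return { ok: true, value: { author, text } };
}

// Semantic validation: a hard-coded list stands in for a registry lookup.
const registeredAuthors = ["alice", "bob_42"];
function validateSemantics(comment: Comment): Result<Comment> {
  if (registeredAuthors.indexOf(comment.author) === -1) {
    return { ok: false, error: "unknown author" };
  }
  return { ok: true, value: comment };
}

function handleComment(raw: unknown): Result<{ stored: Comment; redirectUrl: string }> {
  const format = validateFormat(raw);
  if (format.ok === false) return { ok: false, error: format.error };

  const semantic = validateSemantics(format.value);
  if (semantic.ok === false) return { ok: false, error: semantic.error };

  // Business logic would run here; the validated text is stored unmodified.
  const stored = semantic.value;

  // Context-specific sanitization happens only where the data leaves the system.
  const redirectUrl =
    "/comments?author=" + encodeURIComponent(stored.author) +
    "&preview=" + encodeURIComponent(stored.text);

  return { ok: true, value: { stored, redirectUrl } };
}
```

Calling `handleComment({ author: "alice", text: "Tom & Jerry" })` keeps `"Tom & Jerry"` as the stored text and produces `/comments?author=alice&preview=Tom%20%26%20Jerry` for the redirect, while an unregistered author is rejected before any processing happens.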

6. When you need both

A user registration form typically needs both:

// 1. Validate — is this a structurally valid email with an active MX record?
const emailResult = await iv.email(body.email)
if (!emailResult.valid) throw new ValidationError('Invalid email address')

// 2. Sanitize for display — when rendering the email in a confirmation page
const safeEmail = escapeHtml(body.email)  // escapeHtml: your HTML-escaping helper; prevents XSS if reflected in HTML
// A value like "<script>…</script>@evil.com" is then rendered as inert text, even if it slipped past the format check

// 3. Normalise for storage — lowercase canonical form
const canonicalEmail = body.email.toLowerCase().trim()

💡 For free-text fields (comments, bios, descriptions): validate length and character set, then sanitize for the output context. For structured identifiers (IBANs, tax IDs, emails): validate semantically, then normalise to canonical form for storage.

7. Never sanitize structured identifiers

⚠️ Never sanitize a structured identifier like an IBAN, VAT number, or IMEI — validate it instead. Sanitizing (stripping characters) would turn an invalid identifier into something that looks valid but isn't.

// ❌ Wrong — sanitizing an IBAN destroys its meaning
const sanitized = iban.replace(/[^A-Z0-9]/g, '')
// "GB29 NWBK 60161331 WRONG" → "GB29NWBK60161331WRONG" — now looks valid, still wrong

// ✅ Correct — validate the IBAN, reject if invalid
const result = await iv.iban(iban)
if (!result.valid) return res.status(422).json({ error: 'Invalid IBAN' })

// ✅ Then normalise (not sanitize) for storage
const canonical = iban.replace(/\s+/g, '').toUpperCase()