Validation vs Sanitization — What's the Difference?

Validation and sanitization are both about handling untrusted input, but they do fundamentally different things. Confusing them — or skipping one — is a source of both data quality bugs and security vulnerabilities.

1. Definitions — the one-sentence version

Validation

Accept or reject input based on whether it meets defined rules. The original data is never modified — you either keep it or return an error.

Sanitization

Transform or clean input to make it safe for a specific context. The data may be modified — characters removed, encoded, or escaped.


2. Side-by-side comparison

| Dimension | Validation | Sanitization |
|---|---|---|
| Question it answers | "Is this data valid?" | "Is this data safe to use in context X?" |
| Output | Accept ✓ or reject ✗ | Modified / cleaned data |
| Modifies input? | Never | Yes: removes, encodes, or escapes characters |
| When to use | Before processing any input | Before outputting to HTML, SQL, shell, etc. |
| Failure action | Return an error to the caller | Strip or encode problematic characters |
| Primary concern | Data correctness | Security / injection prevention |
| Example tools | Zod, Joi, Pydantic, IsValid API | DOMPurify, parameterized queries, bleach |

3. Validation in depth

Validation answers a binary question: does this data conform to the rules? There are two layers:

Format validation

Checks structure, type, length, and pattern.

import { z } from 'zod'

const PaymentSchema = z.object({
  iban: z.string().min(15).max(34).regex(/^[A-Z]{2}[0-9]{2}[A-Z0-9]+$/),
  amount: z.number().positive().max(1_000_000),
  currency: z.enum(['EUR', 'GBP', 'USD']),
})

Semantic validation

Checks whether the value is meaningful — a valid checksum, a registered entity, or a consistent combination of fields. This requires domain knowledge that schema validators don't have.

// Format says: "looks like an IBAN"
// Semantic says: "mod-97 passes, bank exists, is SEPA-eligible"
// (iv is the IsValid API client and body the parsed request body; setup not shown)
const result = await iv.iban(body.iban)
if (!result.valid) throw new ValidationError('IBAN checksum failed')
if (!result.isSEPA) throw new ValidationError('IBAN is not SEPA-eligible')
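Here the semantic check is delegated to the API. The checksum part on its own is small enough to sketch locally: below is a minimal version of the mod-97 check IBANs use (ISO 13616 with ISO 7064 MOD 97-10). The function name ibanChecksumValid is hypothetical, and the sketch covers only the checksum, not bank existence or SEPA eligibility, which need registry data.

```typescript
// Minimal sketch of the IBAN mod-97 checksum (ISO 7064 MOD 97-10).
// Checks internal consistency only; it cannot tell whether the account exists.
function ibanChecksumValid(input: string): boolean {
  // Normalise first: drop whitespace, uppercase
  const iban = input.replace(/\s+/g, '').toUpperCase();

  // Country code, two check digits, then 11-30 alphanumerics (15-34 total)
  if (!/^[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}$/.test(iban)) return false;

  // Move the country code and check digits to the end
  const rearranged = iban.slice(4) + iban.slice(0, 4);

  // Map letters to numbers (A=10 ... Z=35) and reduce mod 97 as we go,
  // so the running value stays far below Number.MAX_SAFE_INTEGER
  let remainder = 0;
  for (const ch of rearranged) {
    const code = ch.charCodeAt(0);
    if (code >= 65) {
      // Letter: its value (10-35) contributes two decimal digits
      remainder = (remainder * 100 + (code - 55)) % 97;
    } else {
      // Digit: contributes one decimal digit
      remainder = (remainder * 10 + (code - 48)) % 97;
    }
  }

  // A valid IBAN leaves a remainder of exactly 1
  return remainder === 1;
}

console.log(ibanChecksumValid('GB29 NWBK 6016 1331 9268 19')); // true
console.log(ibanChecksumValid('GB29 NWBK 6016 1331 9268 18')); // false
```

A single mistyped digit always changes the remainder, which is why the checksum catches most typos; a passing checksum still says nothing about whether the bank or account is real.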

4. Sanitization in depth

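Sanitization is context-dependent: the same string can be safe in one context and dangerous in another. For example, "O'Brien" is harmless in HTML but breaks out of a single-quoted SQL string literal, while "<b>" is inert as a SQL parameter but becomes markup in HTML.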

HTML context — prevent XSS

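import DOMPurify from 'dompurify'
// User-supplied HTML (e.g. from a rich text editor): remove dangerous elements
// and attributes, keep safe markup. In Node.js, DOMPurify needs a DOM implementation such as jsdom.
const safeHtml = DOMPurify.sanitize(userInput)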
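DOMPurify is for input that is supposed to contain HTML. When a value should display as plain text (a name, a comment, an email address), escaping is simpler: encode the characters HTML treats specially so the browser renders them literally. A minimal sketch; the name escapeHtml is ours, and many templating engines already do this for interpolated values by default.

```typescript
// Minimal HTML escaping for untrusted text placed in element content
// or a quoted attribute value. Not sufficient for URLs, inline scripts,
// CSS, or unquoted attributes, which need their own encoders.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;') // must run first, or the entities below get double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```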

SQL context — prevent injection

⚠️ The correct answer to SQL injection is parameterized queries, not string sanitization. String-based sanitization is error-prone and easily bypassed.

// ❌ String sanitization — fragile, error-prone
const unsafe = `SELECT * FROM users WHERE id = '${id.replace(/'/g, "''")}'`

// ✅ Parameterized query — correct approach
const safe = await db.query('SELECT * FROM users WHERE id = $1', [id])

Normalisation (not strictly sanitization)

Some identifiers have multiple valid representations. Normalisation converts them to a canonical form before storage — this is data quality, not security.

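import { getAddress } from 'ethers' // ethers v6; viem also exports a getAddress

// IBAN: strip spaces before storage
const normalised = iban.replace(/\s+/g, '').toUpperCase()
// "GB29 NWBK 6016 1331 9268 19" → "GB29NWBK60161331926819"

// ETH address: EIP-55 checksum normalisation
// Lowercasing first discards any checksum in the input; calling getAddress(address)
// directly makes ethers reject a mixed-case address whose checksum is wrong
const checksummed = getAddress(address.toLowerCase())
// "0x5aaeb6..." → "0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed"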

5. Which comes first?

Validate before you sanitize — and sanitize for the output context, not the input.

1. Receive input: raw, untrusted data from the request body or query parameters.
2. Format validation: type, length, and pattern checks; reject early if the shape is wrong.
3. Semantic validation: checksum, registry lookup; reject if the value is not meaningful.
4. Business logic: process the validated data.
5. Context-specific sanitization: escape for HTML output, parameterize for SQL, and so on.
6. Store / respond: persist the validated (and normalised) data and return a safely encoded response.
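A compact sketch of the six steps for a hypothetical comment endpoint. Everything here is illustrative: the field names, the 280-character limit, the in-memory set standing in for a registry lookup, and the choice of a JSON response (an HTML response would use an escaping helper at step 5 instead). What it shows is the ordering: rejection happens before any processing, and the stored text is the validated input rather than an escaped copy.

```typescript
// Sketch of the validate → process → encode → store order.
// All names, limits, and the in-memory "registry" and "store" are hypothetical.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface CommentInput {
  threadId: string;
  text: string;
}

interface StoredComment extends CommentInput {
  id: number;
}

const knownThreads = new Set<string>(['t-1', 't-2']); // stand-in for a database lookup
const store: StoredComment[] = []; // stand-in for persistence

// Step 2: format validation (shape, type, length); reject early
function validateFormat(body: unknown): Result<CommentInput> {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'body must be a JSON object' };
  }
  const { threadId, text } = body as Record<string, unknown>;
  if (typeof threadId !== 'string' || typeof text !== 'string') {
    return { ok: false, error: 'threadId and text must be strings' };
  }
  if (text.length < 1 || text.length > 280) {
    return { ok: false, error: 'text must be 1-280 characters' };
  }
  return { ok: true, value: { threadId, text } };
}

// Step 3: semantic validation (does the referenced thread exist?)
function validateSemantics(input: CommentInput): Result<CommentInput> {
  return knownThreads.has(input.threadId)
    ? { ok: true, value: input }
    : { ok: false, error: `unknown thread: ${input.threadId}` };
}

function handleComment(body: unknown): Result<string> {
  // Steps 1-2: receive raw input and check its shape
  const format = validateFormat(body);
  if (!format.ok) return format;

  // Step 3: check that the value is meaningful
  const semantic = validateSemantics(format.value);
  if (!semantic.ok) return semantic;

  // Step 4: business logic (here, just assigning an ID)
  const record: StoredComment = { id: store.length + 1, ...semantic.value };

  // Step 5: encode for the output context; the response is JSON,
  // so JSON.stringify is the context-appropriate encoder
  const response = JSON.stringify(record);

  // Step 6: persist the validated text as received (not escaped) and respond
  store.push(record);
  return { ok: true, value: response };
}
```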

6. When you need both

A user registration form typically needs both:

// 1. Validate — is this a structurally valid email with an active MX record?
const emailResult = await iv.email(body.email)
if (!emailResult.valid) throw new ValidationError('Invalid email address')

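// 2. Sanitize for display: escape when rendering the email on a confirmation page
//    (escapeHtml stands for whatever HTML-escaping helper your templating layer provides)
const safeEmail = escapeHtml(body.email)  // prevents XSS if the value is reflected in HTML
// "<script>…</script>@evil.com" → inert text on the page, even if a lenient format check let it through

// 3. Normalise for storage: trimmed, lowercase canonical form
//    (RFC 5321 allows a case-sensitive local part; most providers ignore case, so confirm this fits your system)
const canonicalEmail = body.email.trim().toLowerCase()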

💡 For free-text fields (comments, bios, descriptions): validate length and character set, then sanitize for the output context. For structured identifiers (IBANs, tax IDs, emails): validate semantically, then normalise to canonical form for storage.
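For the free-text half of that tip, validation checks length and character set and leaves the text itself alone; escaping happens later, at output. A minimal sketch; the "bio" field, the 160-character limit, and the decision to allow tab, newline, and carriage return are assumptions to adapt.

```typescript
// Hypothetical rules for a free-text "bio" field. The text is never modified:
// it is either accepted as-is or rejected with a list of reasons.
const MAX_BIO_CHARS = 160;

// C0 and C1 control characters, except tab (\u0009), LF (\u000A) and CR (\u000D)
const CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]/;

function validateBio(bio: string): string[] {
  const errors: string[] = [];
  // Count code points rather than UTF-16 units, so a character such as 😀 counts once
  if (Array.from(bio).length > MAX_BIO_CHARS) {
    errors.push(`must be at most ${MAX_BIO_CHARS} characters`);
  }
  if (CONTROL_CHARS.test(bio)) {
    errors.push('must not contain control characters');
  }
  return errors;
}

console.log(validateBio('Hello <b>world</b>')); // [] (markup is allowed here; it gets escaped on output)
```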

7. Never sanitize structured identifiers

⚠️ Never sanitize a structured identifier like an IBAN, VAT number, or IMEI; validate it instead. Sanitizing (stripping characters) can turn an invalid identifier into something that looks valid but isn't.

// ❌ Wrong — sanitizing an IBAN destroys its meaning
const sanitized = iban.replace(/[^A-Z0-9]/g, '')
// "GB29 NWBK 60161331 WRONG" → "GB29NWBK60161331WRONG" — now looks valid, still wrong

// ✅ Correct — validate the IBAN, reject if invalid
const result = await iv.iban(iban)
if (!result.valid) return res.status(422).json({ error: 'Invalid IBAN' })

// ✅ Then normalise (not sanitize) for storage
const canonical = iban.replace(/\s+/g, '').toUpperCase()

8. Checklist

Validate all inputs on the server before processing
Use format validation (Zod/Pydantic) as first pass
Add semantic validation for structured identifiers
Reject invalid input — never silently modify it
Encode output for its context: escape or sanitize HTML (XSS), parameterize SQL
Never sanitize structured identifiers — validate them
Normalise identifiers to canonical form for storage
Return 422 with field-level errors for validation failures
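For the last item, one way to produce field-level errors is to check every field and collect failures into a map instead of stopping at the first. A minimal sketch mirroring the Zod schema from section 3 without the library; the response shape { errors: { field: message } } is an assumption, not a standard. With Zod 3, safeParse followed by error.flatten().fieldErrors yields a comparable per-field map.

```typescript
// Hypothetical payment validator mirroring the section 3 schema.
// Returns null when valid, or a 422 payload listing every failing field.
type PaymentField = 'iban' | 'amount' | 'currency';
type FieldErrors = Partial<Record<PaymentField, string>>;

interface ValidationFailure {
  status: number;
  body: { errors: FieldErrors };
}

const CURRENCIES = ['EUR', 'GBP', 'USD'];
// Same constraints as the Zod regex plus min(15)/max(34): 2 letters, 2 digits, 11-30 alphanumerics
const IBAN_FORMAT = /^[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}$/;

function validatePayment(input: Record<string, unknown>): ValidationFailure | null {
  const errors: FieldErrors = {};
  const { iban, amount, currency } = input;

  if (typeof iban !== 'string' || !IBAN_FORMAT.test(iban)) {
    errors.iban = 'must be an IBAN: 15-34 characters, country code and check digits first';
  }
  // !(amount > 0) also rejects NaN
  if (typeof amount !== 'number' || !(amount > 0) || amount > 1_000_000) {
    errors.amount = 'must be a positive number no greater than 1,000,000';
  }
  if (typeof currency !== 'string' || !CURRENCIES.includes(currency)) {
    errors.currency = `must be one of ${CURRENCIES.join(', ')}`;
  }

  return Object.keys(errors).length > 0 ? { status: 422, body: { errors } } : null;
}
```

In an Express-style handler like the one in section 7, this might be used as const failure = validatePayment(req.body); if (failure) return res.status(failure.status).json(failure.body).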
