Validation vs Sanitization — What's the Difference?
Validation and sanitization are both about handling untrusted input, but they do fundamentally different things. Confusing them — or skipping one — is a source of both data quality bugs and security vulnerabilities.
1. Definitions — the one-sentence version
Validation
Accept or reject input based on whether it meets defined rules. The original data is never modified — you either keep it or return an error.
Sanitization
Transform or clean input to make it safe for a specific context. The data may be modified — characters removed, encoded, or escaped.
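The two definitions above can be sketched as two function shapes. This is a minimal illustration rather than a recommended implementation: `validateUsername`, `sanitizeForLogLine`, and the rules they apply are hypothetical names and assumptions chosen for the example.

```typescript
// Hypothetical helpers: the point is the shape of each return value.

type ValidationResult =
  | { ok: true }
  | { ok: false; error: string };

// Validation: returns a verdict and leaves the input untouched.
function validateUsername(input: string): ValidationResult {
  if (input.length < 3 || input.length > 20) {
    return { ok: false, error: 'Username must be 3-20 characters long' };
  }
  if (!/^[a-z0-9_]+$/.test(input)) {
    return { ok: false, error: 'Username may contain only a-z, 0-9 and _' };
  }
  return { ok: true };
}

// Sanitization: returns a transformed copy that is safe for one context.
// Here the context is a single-line log entry, so line breaks and other
// control characters are replaced with spaces.
function sanitizeForLogLine(input: string): string {
  return input.replace(/[\u0000-\u001f\u007f]/g, ' ');
}

console.log(validateUsername('alice_01').ok);     // true
console.log(validateUsername('al').ok);           // false
console.log(sanitizeForLogLine('bob\nadmin=1'));  // "bob admin=1"
```

Note that the validator never returns a "fixed" value: a caller that gets `ok: false` has to reject the input or ask the user to correct it.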
2. Side-by-side comparison
| Dimension | Validation | Sanitization |
|---|---|---|
| Question it answers | "Is this data valid?" | "Is this data safe to use in context X?" |
| Output | Accept ✓ or reject ✗ | Modified / cleaned data |
| Modifies input? | Never | Yes — removes or encodes characters |
| When to use | Before processing any input | Before outputting to HTML, SQL, shell, etc. |
| Failure action | Return error to caller | Strip / encode problematic characters |
| Primary concern | Data correctness | Security / injection prevention |
| Example tools | Zod, Joi, Pydantic, IsValid API | DOMPurify, parameterized queries, bleach |
3. Validation in depth
Validation answers a binary question: does this data conform to the rules? There are two layers:
Format validation
Checks structure, type, length, and pattern.
    import { z } from 'zod'

    const PaymentSchema = z.object({
      iban: z.string().min(15).max(34).regex(/^[A-Z]{2}[0-9]{2}[A-Z0-9]+$/),
      amount: z.number().positive().max(1_000_000),
      currency: z.enum(['EUR', 'GBP', 'USD']),
    })
Semantic validation
Checks whether the value is meaningful — a valid checksum, a registered entity, or a consistent combination of fields. This requires domain knowledge that schema validators don't have.
    // Format says: "looks like an IBAN"
    // Semantic says: "mod-97 passes, bank exists, is SEPA-eligible"
    const result = await iv.iban(body.iban)
    if (!result.valid) throw new ValidationError('IBAN checksum failed')
    if (!result.isSEPA) throw new ValidationError('IBAN is not SEPA-eligible')
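To make the "mod-97 passes" part concrete, here is a minimal sketch of the IBAN checksum on its own. It assumes the input is already normalised (no spaces, upper case), and `ibanChecksumPasses` is a hypothetical helper name. It does not check country-specific lengths, whether the bank exists, or SEPA eligibility, which is why a lookup service is still needed for full semantic validation.

```typescript
// Minimal mod-97 check for a normalised IBAN (no spaces, upper case).
function ibanChecksumPasses(iban: string): boolean {
  // Cheap format gate first: country code, two check digits, then the BBAN.
  if (!/^[A-Z]{2}[0-9]{2}[A-Z0-9]+$/.test(iban)) return false;

  // Move the country code and check digits to the end.
  const rearranged = iban.slice(4) + iban.slice(0, 4);

  // Fold in one character at a time so the running remainder stays small
  // and never exceeds the safe integer range.
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    // Digits map to themselves; letters map to 10..35 (A = 10, Z = 35).
    const value = parseInt(rearranged.charAt(i), 36);
    // A letter contributes two decimal digits, a digit contributes one.
    remainder = (remainder * (value > 9 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}

console.log(ibanChecksumPasses('GB29NWBK60161331926819')); // true
console.log(ibanChecksumPasses('GB29NWBK60161331926818')); // false (last digit changed)
```

A value can pass the format regex and still fail this check, which is the gap between the two validation layers.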
4. Sanitization in depth
Sanitization is context-dependent. The same string might be safe in one context and dangerous in another.
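As a small illustration of that context dependence, the sketch below encodes one input for two different destinations. `escapeHtml` is a hand-written helper for the example, not a library function; `encodeURIComponent` is the standard JavaScript built-in.

```typescript
// Escape the five characters that are significant in HTML text and
// quoted attribute values. The ampersand must be replaced first.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const userInput = '<b>Tom & Jerry</b>';

// HTML text context: markup characters become entities.
const forHtml = escapeHtml(userInput);
// "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"

// URL query-parameter context: reserved characters are percent-encoded.
const forUrl = encodeURIComponent(userInput);
// "%3Cb%3ETom%20%26%20Jerry%3C%2Fb%3E"

console.log(forHtml);
console.log(forUrl);
```

Neither output is "the sanitized value" in general: each one is correct only for the destination it was produced for.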
HTML context — prevent XSS
    import DOMPurify from 'dompurify'

    // User-supplied HTML (e.g. from a rich text editor): strip malicious tags
    const safeHtml = DOMPurify.sanitize(userInput)
SQL context — prevent injection
    // ❌ String sanitization: fragile, error-prone
    const unsafe = `SELECT * FROM users WHERE id = '${id.replace(/'/g, "''")}'`

    // ✅ Parameterized query: correct approach
    const safe = await db.query('SELECT * FROM users WHERE id = $1', [id])
Normalisation (not strictly sanitization)
Some identifiers have multiple valid representations. Normalisation converts them to a canonical form before storage — this is data quality, not security.
    // IBAN: strip spaces and upper-case before storage
    const normalised = iban.replace(/\s+/g, '').toUpperCase()
    // "GB29 NWBK 6016 1331 9268 19" → "GB29NWBK60161331926819"

    // ETH address: EIP-55 checksum normalisation
    const checksummed = getAddress(address.toLowerCase())
    // "0x5aaeb6..." → "0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed"
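One practical consequence of storing a canonical form is that different spellings of the same identifier compare equal, and any human-friendly layout can be regenerated on output. The sketch below assumes that use case; `normaliseIban` and `formatIbanForDisplay` are hypothetical helper names.

```typescript
// Canonical form for storage and comparison: no whitespace, upper case.
function normaliseIban(input: string): string {
  return input.replace(/\s+/g, '').toUpperCase();
}

// Presentation form: groups of four characters separated by single spaces.
// The lookahead avoids adding a trailing space after the last group.
function formatIbanForDisplay(canonical: string): string {
  return canonical.replace(/(.{4})(?=.)/g, '$1 ');
}

const fromForm = normaliseIban('GB29 NWBK 6016 1331 9268 19');
const fromApi = normaliseIban('gb29nwbk60161331926819');

console.log(fromForm === fromApi);            // true
console.log(formatIbanForDisplay(fromForm));  // "GB29 NWBK 6016 1331 9268 19"
```

Normalisation is idempotent here: running `normaliseIban` on an already canonical value returns it unchanged.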
5. Which comes first?
Validate before you sanitize — and sanitize for the output context, not the input.
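One way to see why the order matters is to sanitize at input time and then again at output time. The sketch below is illustrative only: `escapeHtml` is a hand-written helper and `isValidDisplayName` is a hypothetical rule (non-empty, at most 50 characters).

```typescript
// Hand-written HTML escaper for the example.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical validation rule: non-empty, at most 50 characters.
function isValidDisplayName(value: string): boolean {
  const trimmed = value.trim();
  return trimmed.length > 0 && trimmed.length <= 50;
}

const rawName = 'Tom & Jerry';

// Wrong order: escape on input, store the escaped text, escape again on output.
const storedEscaped = escapeHtml(rawName);        // "Tom &amp; Jerry"
const renderedTwice = escapeHtml(storedEscaped);  // "Tom &amp;amp; Jerry"

// Right order: validate the raw value, store it unchanged,
// and escape only at the point where it is written into HTML.
if (!isValidDisplayName(rawName)) throw new Error('Invalid display name');
const storedRaw = rawName;
const renderedOnce = escapeHtml(storedRaw);       // "Tom &amp; Jerry"

console.log(renderedTwice);
console.log(renderedOnce);
```

Storing the raw, validated value also keeps it usable for non-HTML outputs such as JSON responses or CSV exports, each of which needs its own encoding.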
6. When you need both
A user registration form typically needs both:
    // 1. Validate: is this a structurally valid email with an active MX record?
    const emailResult = await iv.email(body.email)
    if (!emailResult.valid) throw new ValidationError('Invalid email address')

    // 2. Sanitize for display: when rendering the email in a confirmation page
    const safeEmail = escapeHtml(body.email) // prevent XSS if reflected in HTML
    // "<script>…</script>@evil.com" → safe to display even if validation passed format check

    // 3. Normalise for storage: lowercase canonical form
    const canonicalEmail = body.email.toLowerCase().trim()
7. Never sanitize structured identifiers
    // ❌ Wrong: sanitizing an IBAN destroys its meaning
    const sanitized = iban.replace(/[^A-Z0-9]/g, '')
    // "GB29 NWBK 60161331 WRONG" → "GB29NWBK60161331WRONG": now looks valid, still wrong

    // ✅ Correct: validate the IBAN, reject if invalid
    const result = await iv.iban(iban)
    if (!result.valid) return res.status(422).json({ error: 'Invalid IBAN' })

    // ✅ Then normalise (not sanitize) for storage
    const canonical = iban.replace(/\s+/g, '').toUpperCase()
8. Checklist