URL Validation in Node.js — Beyond Regex
A URL looks simple until you deal with encoding, unicode domains, protocol schemes, relative paths, and query string edge cases. Here's how to validate and parse URLs properly — with a single API call.
1. Why URL validation is harder than it looks
URLs appear deceptively simple — a protocol, a domain, maybe a path. In reality, the full URL specification (RFC 3986) covers a surprising number of edge cases that make naive validation unreliable:
Percent-encoding
Spaces become %20, special characters get encoded. A valid URL can contain sequences like %E2%9C%93 that look like garbage but represent valid UTF-8 characters.
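To see percent-encoding in action, the short sketch below uses JavaScript's built-in encodeURIComponent and decodeURIComponent. The sample strings are illustrative only.

```javascript
// Percent-encoding round trip using built-in globals (no imports needed).
const encodedSpace = encodeURIComponent('hello world');
console.log(encodedSpace); // 'hello%20world'

// %E2%9C%93 is the three-byte UTF-8 encoding of U+2713 (CHECK MARK).
const decodedCheck = decodeURIComponent('%E2%9C%93');
console.log(decodedCheck); // '✓'

// A stray '%' that is not followed by two hex digits is malformed
// and makes decodeURIComponent throw a URIError.
let malformedRejected = false;
try {
  decodeURIComponent('100%');
} catch (err) {
  malformedRejected = err instanceof URIError;
}
console.log(malformedRejected); // true
```

The last case is worth keeping in mind: decoding untrusted input should always sit inside a try/catch.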
Unicode and IDN domains
Internationalised domain names like xn--nxasmq6b.com (Punycode) or direct unicode domains like münchen.de are perfectly valid but break most simple validators.
Protocol schemes
URLs are not just http:// and https://. There are ftp://, mailto:, tel:, data:, custom-app:// schemes, and protocol-relative URLs starting with //.
Relative URLs
Paths like /about, ../images/logo.png, or ?q=search are valid relative URLs but have no protocol or domain — context determines their meaning.
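As an illustration of how context determines the meaning of a relative URL, the WHATWG URL constructor accepts a base URL as its second argument. The base https://example.com/blog/post below is a made-up value for this sketch.

```javascript
// Hypothetical page on which the relative references were found.
const base = 'https://example.com/blog/post';

const resolved = {
  absolutePath: new URL('/about', base).href,
  parentDir: new URL('../images/logo.png', base).href,
  queryOnly: new URL('?q=search', base).href,
  protocolRelative: new URL('//cdn.example.com/lib.js', base).href,
};

console.log(resolved.absolutePath);     // 'https://example.com/about'
console.log(resolved.parentDir);        // 'https://example.com/images/logo.png'
console.log(resolved.queryOnly);        // 'https://example.com/blog/post?q=search'
console.log(resolved.protocolRelative); // 'https://cdn.example.com/lib.js'
```

The same four inputs would throw if passed to new URL() without a base, which is why a validator has to know whether relative references are acceptable in its context.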
A proper URL validator needs to handle all of these while also decomposing the URL into its component parts — protocol, domain, port, path, query parameters, and fragment identifier.
2. The anatomy of a URL
A URL is composed of several distinct parts; the six shown below are the ones you will handle most often. Understanding these components is essential for proper validation and parsing:
https://example.com:8080/search?q=hello+world&lang=en#results
└─┬─┘   └────┬────┘ └─┬┘└──┬──┘ └─────────┬─────────┘ └──┬──┘
protocol  domain    port path           query        fragment
| Component | Example | Notes |
|---|---|---|
| Protocol | https | The scheme — http, https, ftp, mailto, etc. |
| Domain | example.com | The hostname — can be an IP, IDN, or standard domain |
| Port | 8080 | Optional — defaults to 80 (HTTP) or 443 (HTTPS) |
| Path | /search | The resource path — can contain encoded characters |
| Query | q=hello+world&lang=en | Key-value pairs after the ? delimiter |
| Fragment | results | Client-side anchor after the # — never sent to server |
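For comparison, Node's built-in URL class exposes the same components, though under different property names and with the delimiters left in place (protocol keeps its trailing colon, search its leading ?, hash its leading #). A minimal sketch:

```javascript
const parsed = new URL('https://example.com:8080/search?q=hello+world&lang=en#results');

console.log(parsed.protocol); // 'https:'
console.log(parsed.hostname); // 'example.com'
console.log(parsed.port);     // '8080'
console.log(parsed.pathname); // '/search'
console.log(parsed.search);   // '?q=hello+world&lang=en'
console.log(parsed.hash);     // '#results'

// The scheme's default port is normalised away, so port comes back empty.
const defaultPort = new URL('https://example.com:443/').port;
console.log(defaultPort === ''); // true
```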
3. Why regex fails for URLs
RFC 3986 defines the URL syntax, and the full specification is far too complex for a practical regex. Most regex-based validators fall into the same traps:
// ❌ Too strict: rejects valid URLs
const SIMPLE_REGEX = /^https?:\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(\/.*)?$/;

SIMPLE_REGEX.test('https://example.com?q=hello');     // false: query without a path
SIMPLE_REGEX.test('https://例え.jp');                  // false: IDN domain
SIMPLE_REGEX.test('ftp://files.example.com/doc.pdf'); // false: non-http scheme
SIMPLE_REGEX.test('https://localhost:3000');          // false: no TLD, and a port

// ❌ Too loose: accepts invalid URLs
const LOOSE_REGEX = /^https?:\/\/.+/;

LOOSE_REGEX.test('https://?q=1');       // true ✗ (no domain)
LOOSE_REGEX.test('https:// not a url'); // true ✗ (spaces in domain)
LOOSE_REGEX.test('https://...');        // true ✗ (empty labels)
The Punycode problem
Internationalised domain names are encoded as Punycode in DNS. The domain münchen.de becomes xn--mnchen-3ya.de. A regex that only allows ASCII letters, digits, and hyphens rejects the Unicode form, while one that also forbids consecutive hyphens rejects the Punycode form, which starts with the xn-- prefix.
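The conversion can be observed with the built-in URL class, which converts a Unicode host to its ASCII form while parsing. This sketch only demonstrates the mapping; the ASCII_HOST pattern is an illustrative assumption, not taken from any real validator.

```javascript
// The WHATWG URL parser converts a Unicode hostname to its Punycode form.
const unicodeHost = new URL('https://münchen.de/info').hostname;
console.log(unicodeHost); // 'xn--mnchen-3ya.de'

// The Punycode form passes through unchanged, so both spellings
// end up as the same hostname.
const punycodeHost = new URL('https://xn--mnchen-3ya.de/info').hostname;
console.log(unicodeHost === punycodeHost); // true

// An ASCII-only host pattern (illustrative) rejects the Unicode spelling.
const ASCII_HOST = /^[a-zA-Z0-9.-]+$/;
console.log(ASCII_HOST.test('münchen.de'));        // false
console.log(ASCII_HOST.test('xn--mnchen-3ya.de')); // true
```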
Query string complexity
Query strings can contain encoded special characters, nested brackets (e.g. filter[name]=value), empty values, duplicate keys, and plus signs as spaces. A regex cannot meaningfully parse these — it would need a full URL parser.
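To make these cases concrete, here is a small sketch using the built-in URLSearchParams parser. The query string itself is invented for the example.

```javascript
const params = new URLSearchParams('tag=a&tag=b&filter[name]=value&empty=');

// Duplicate keys: get() returns the first value, getAll() returns every value.
const firstTag = params.get('tag');
const allTags = params.getAll('tag');
console.log(firstTag); // 'a'
console.log(allTags);  // [ 'a', 'b' ]

// Brackets carry no special meaning here; they are simply part of the key.
const bracketValue = params.get('filter[name]');
console.log(bracketValue); // 'value'

// An empty value is an empty string, while an absent key is null.
const emptyValue = params.get('empty');
const missingValue = params.get('missing');
console.log(emptyValue === '');     // true
console.log(missingValue === null); // true
```

Note that nested-bracket conventions such as filter[name] are interpreted by frameworks, not by the URL parser, so two servers may read the same query string differently.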
Node.js URL constructor is not enough
Node.js provides new URL() which parses URLs well, but it throws on invalid input rather than returning a structured validation result. It also accepts many strings that are technically valid per the WHATWG URL spec but are not useful URLs in practice.
// new URL() accepts some surprising inputs
new URL('https:///example.com'); // valid: the extra slash is ignored
new URL('https://[::1]');        // valid: IPv6 loopback
new URL('blob:null/uuid');       // valid: blob URL

// And throws on others that seem reasonable
new URL('example.com');          // throws: no scheme
new URL('//cdn.example.com');    // throws: protocol-relative, no base URL
Using try { new URL(str) } catch { ... } as your only validation will accept blob URLs, data URIs, and other technically-valid-but-unusual URLs while rejecting protocol-relative URLs that are common in practice.
4. The right solution
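One way to soften both problems locally is to wrap the constructor so that it returns a result object instead of throwing, and to restrict the accepted schemes. The sketch below is an assumption about what such a wrapper might look like: the name checkUrl, the result shape, and the default scheme list are hypothetical and not part of Node.js or the IsValid API.

```javascript
// Hypothetical wrapper: returns a result object instead of throwing.
function checkUrl(input, allowedProtocols = ['http:', 'https:']) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return { valid: false, reason: 'unparseable' };
  }
  // URL.protocol includes the trailing colon, e.g. 'https:'.
  if (!allowedProtocols.includes(url.protocol)) {
    return { valid: false, reason: 'scheme not allowed' };
  }
  return { valid: true, url };
}

console.log(checkUrl('https://[::1]').valid);   // true
console.log(checkUrl('blob:null/uuid').reason); // 'scheme not allowed'
console.log(checkUrl('example.com').reason);    // 'unparseable'
```

This still inherits every leniency of the WHATWG parser, so it narrows the problem rather than solving it.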
The IsValid URL API validates and parses URLs in a single request. It returns a boolean validity flag plus all decomposed components — protocol, domain, path, query parameters as a structured object, port, and fragment.
Full parameter reference and response schema: URL Validation API docs →
5. Node.js code example
Using the IsValid SDK or the native fetch API.
import { createClient } from '@isvalid-dev/sdk';

const iv = createClient({ apiKey: process.env.ISVALID_API_KEY });

// ── Example usage ──
const result = await iv.url('https://example.com/search?q=hello+world&lang=en#results');

console.log(result.valid);    // true
console.log(result.protocol); // 'https'
console.log(result.isHttps);  // true
console.log(result.domain);   // 'example.com'
console.log(result.path);     // '/search'
console.log(result.query);    // { q: 'hello world', lang: 'en' }
console.log(result.hash);     // 'results'
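The same request can be made without the SDK using the native fetch API (global in Node.js 18 and later). This is a minimal sketch, assuming the endpoint and Bearer header used in the cURL examples of this guide. The helper names buildRequestUrl and validateUrl, the fetchImpl option, and the decision to throw on any non-2xx status are assumptions, not documented API behaviour.

```javascript
// Endpoint taken from the cURL examples in this guide.
const ENDPOINT = 'https://api.isvalid.dev/v0/url';

// Build the GET request URL. searchParams.set percent-encodes the value,
// which mirrors what curl's --data-urlencode does.
function buildRequestUrl(value) {
  const requestUrl = new URL(ENDPOINT);
  requestUrl.searchParams.set('value', value);
  return requestUrl;
}

// Hypothetical helper: resolves with the parsed JSON body and rejects
// when the HTTP request itself fails (assumed error handling).
async function validateUrl(
  value,
  { apiKey = process.env.ISVALID_API_KEY, fetchImpl = fetch } = {}
) {
  const response = await fetchImpl(buildRequestUrl(value), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!response.ok) {
    throw new Error(`URL validation request failed with status ${response.status}`);
  }
  return response.json();
}
```

The fetchImpl option exists only so the helper can be exercised with a stub in tests; in normal use the global fetch is picked up automatically.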
In a link-shortener or webhook handler — validate user-submitted URLs before storing:
// routes/links.js (Express)
// validateUrl is a helper that wraps the IsValid API call (SDK or fetch).
app.post('/shorten', async (req, res) => {
  const { url } = req.body;

  let check;
  try {
    check = await validateUrl(url);
  } catch {
    return res.status(502).json({ error: 'URL validation service unavailable' });
  }

  if (!check.valid) {
    return res.status(400).json({ error: 'Invalid URL' });
  }

  if (!check.isHttps) {
    return res.status(400).json({
      error: 'Only HTTPS URLs are accepted for security reasons.',
    });
  }

  // Proceed with URL shortening
  const shortLink = await createShortLink({
    originalUrl: url,
    domain: check.domain,
    path: check.path,
  });

  res.json({ shortUrl: shortLink });
});
Use the domain field to build allowlists or blocklists. For example, you can reject URLs pointing to known phishing domains without needing to parse the URL yourself.
6. cURL example
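Once validation has returned a domain, that value can be matched against a blocklist before the link is stored. A minimal sketch, assuming domain is a plain hostname string as in the response examples of this guide; isBlockedDomain and the blocklist entries are hypothetical placeholders.

```javascript
// Hypothetical blocklist; the entries are placeholders, not real threat data.
const BLOCKED_DOMAINS = ['phishing.example', 'malware.example'];

// True when the domain equals a blocked entry or is a subdomain of one.
function isBlockedDomain(domain, blocklist = BLOCKED_DOMAINS) {
  const host = domain.toLowerCase();
  return blocklist.some(
    (blocked) => host === blocked || host.endsWith('.' + blocked)
  );
}

console.log(isBlockedDomain('phishing.example'));       // true
console.log(isBlockedDomain('login.phishing.example')); // true
console.log(isBlockedDomain('notphishing.example'));    // false
console.log(isBlockedDomain('example.com'));            // false
```

Matching on a leading dot rather than a bare suffix keeps notphishing.example from being caught by the phishing.example entry.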
Validate a URL with query parameters and fragment:
curl -G -H "Authorization: Bearer YOUR_API_KEY" \
  --data-urlencode "value=https://example.com/search?q=hello+world&lang=en#results" \
  "https://api.isvalid.dev/v0/url"
Test with a URL that has a port:
curl -G -H "Authorization: Bearer YOUR_API_KEY" \
  --data-urlencode "value=https://api.example.com:8080/v1/users" \
  "https://api.isvalid.dev/v0/url"
Test with an invalid URL:
curl -G -H "Authorization: Bearer YOUR_API_KEY" \
  --data-urlencode "value=not-a-url" \
  "https://api.isvalid.dev/v0/url"
7. Understanding the response
Valid HTTPS URL with query parameters and fragment:
{
  "valid": true,
  "protocol": "https",
  "isHttps": true,
  "domain": "example.com",
  "path": "/search",
  "query": {
    "q": "hello world",
    "lang": "en"
  },
  "port": null,
  "hash": "results"
}
Valid URL with explicit port and no query or fragment:
{
  "valid": true,
  "protocol": "https",
  "isHttps": true,
  "domain": "api.example.com",
  "path": "/v1/users",
  "query": {},
  "port": "8080",
  "hash": null
}
Invalid URL:
{ "valid": false }
| Field | Type | Description |
|---|---|---|
| valid | boolean | Whether the URL is structurally valid |
| protocol | string | The URL scheme — e.g. "https", "http", "ftp" |
| isHttps | boolean | true if the protocol is HTTPS |
| domain | string | The hostname portion of the URL |
| path | string | The path component after the domain |
| query | object | Parsed query string as key-value pairs |
| port | string or null | The port number if explicitly specified, null otherwise |
| hash | string or null | The fragment identifier (without the # prefix), null if absent |
8. Edge cases
Internationalised domain names (IDN)
URLs with unicode domains like https://münchen.de/info are valid. They get encoded as Punycode (xn--mnchen-3ya.de) in DNS. The IsValid API handles both forms — you can submit either the unicode or Punycode version and get a valid parse.
// Both forms are accepted
const unicode = await iv.url('https://münchen.de/info');
const punycode = await iv.url('https://xn--mnchen-3ya.de/info');
// Both return valid: true with domain parsed correctly
Data URIs
Data URIs (data:text/html;base64,...) are technically valid URIs but are not network URLs. Depending on your use case, you may want to reject them after validation by checking that the protocol field is http or https.
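A post-validation check along those lines might look like the sketch below. The isWebUrl name is hypothetical, and the result objects are assumed to follow the response shape documented in this guide, where protocol has no trailing colon.

```javascript
// Accept only results that are valid and use a web scheme.
// `result` is assumed to have the shape shown in the response examples.
function isWebUrl(result) {
  return result.valid === true && ['http', 'https'].includes(result.protocol);
}

console.log(isWebUrl({ valid: true, protocol: 'https' })); // true
console.log(isWebUrl({ valid: true, protocol: 'data' }));  // false
console.log(isWebUrl({ valid: false }));                   // false
```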
Missing protocol
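Users often type example.com without a protocol. Without a scheme, this is not an absolute URL per RFC 3986; it parses only as a relative reference. If you want to be user-friendly, prepend https:// before validating:
function normalizeUrl(input) {
  const trimmed = input.trim();
  // Prepend https:// when the input does not start with a scheme.
  // Caveat: host:port input such as 'example.com:8080' also matches the
  // scheme pattern and is returned unchanged.
  if (!/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(trimmed)) {
    return 'https://' + trimmed;
  }
  return trimmed;
}

const result = await iv.url(normalizeUrl('example.com/path'));
// Validates https://example.com/path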
Query string encoding
The query object in the response contains decoded key-value pairs. Plus signs in query values are decoded as spaces (e.g. q=hello+world becomes { q: 'hello world' }). Percent-encoded characters are also decoded, so q=caf%C3%A9 becomes { q: 'café' }.
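The same decoding rules can be reproduced locally with URLSearchParams, which is handy for checking what to expect. This sketch illustrates the rules only and makes no claim about how the API implements them; decodeQuery is a hypothetical helper name.

```javascript
// Convert a raw query string into a plain object of decoded values.
// Note: Object.fromEntries keeps only the last value of a repeated key.
function decodeQuery(rawQuery) {
  return Object.fromEntries(new URLSearchParams(rawQuery));
}

console.log(decodeQuery('q=hello+world&lang=en')); // { q: 'hello world', lang: 'en' }
console.log(decodeQuery('q=caf%C3%A9'));           // { q: 'café' }

// A literal plus sign has to be sent as %2B to survive decoding.
console.log(decodeQuery('q=1%2B1'));               // { q: '1+1' }
```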
Validate and parse URLs instantly
Free tier includes 100 API calls per day. No credit card required. Full URL decomposition with protocol, domain, path, query, port, and fragment — under 10ms.