How to Extract Protocol, Domain and Port From URL?

Published July 6, 2024

Problem: Parsing URL Components

When working with URLs in programming, you often need to break them down into their parts. Extracting the protocol, domain, and port is a common task: it supports security checks, request routing, and debugging of network connections. Parsing these elements by hand is error-prone and slow, especially when URLs arrive in different formats.

Understanding URL Components

A URL (Uniform Resource Locator) is an address that points to a resource on the internet. It consists of several parts, including the protocol, the domain name, and sometimes a port number. Together, these parts tell a client where a resource lives and how to reach it.

The protocol (also called the scheme) defines how data moves between the client and server. Common protocols are HTTP (Hypertext Transfer Protocol) and HTTPS (HTTP Secure). The domain name is the address of a website, like "example.com". The port number, when included, identifies which port on the server receives the connection.

Extracting these parts from a URL is useful for:

  1. Security: Checking the protocol to see if a connection is secure (HTTPS) or not (HTTP).

  2. Routing: Using the domain name to send requests to the right server.

  3. Troubleshooting: Using the port number to find connection problems.

  4. Data processing: Splitting URL parts to work with web addresses more easily.

  5. API integrations: Using specific URL parts for authentication and for retrieving data from web services.
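As a minimal sketch of the security use case from the list above, the check below reports whether a URL uses HTTPS. The helper name isSecureUrl is hypothetical, and treating unparseable input as "not secure" is an assumption made for this example.

```javascript
// Hypothetical helper: true only when the URL parses and its protocol is HTTPS.
function isSecureUrl(url) {
  try {
    // The URL object lowercases the protocol and keeps the trailing colon.
    return new URL(url).protocol === 'https:';
  } catch (err) {
    // Assumption: input that cannot be parsed is treated as not secure.
    return false;
  }
}

console.log(isSecureUrl('https://example.com/login')); // true
console.log(isSecureUrl('http://example.com/login'));  // false
console.log(isSecureUrl('HTTPS://Example.com'));       // true
```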

Extracting Protocol from URL

The protocol is the first part of a URL and indicates how data moves between the client and server. JavaScript offers several ways to read it from a URL string.

Using JavaScript to Get Protocol

To extract the protocol from a URL in JavaScript, you can use the URL object:

function getProtocol(url) {
  const urlObject = new URL(url);
  return urlObject.protocol;
}

This function creates a URL object from the given URL string and returns the protocol. The protocol includes the trailing colon, like "http:" or "https:". Note that the URL constructor throws a TypeError if the string cannot be parsed as an absolute URL.
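Because the URL constructor throws on input it cannot parse, it can help to wrap the call when the input is not trusted. The following is a sketch only; the name tryGetProtocol and the choice to return null on failure are assumptions for illustration.

```javascript
// Hypothetical helper: like getProtocol, but returns null instead of throwing.
function tryGetProtocol(url) {
  try {
    return new URL(url).protocol;
  } catch (err) {
    // new URL() throws a TypeError for strings it cannot parse as absolute URLs.
    return null;
  }
}

console.log(tryGetProtocol('https://example.com/path')); // "https:"
console.log(tryGetProtocol('mailto:user@example.com'));  // "mailto:"
console.log(tryGetProtocol('example.com/path'));         // null
```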

For a basic approach, you can use string manipulation:

function getProtocolSimple(url) {
  return url.split(':')[0];
}

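This method splits the URL at each colon and takes the first part. The result is only correct when the URL actually starts with a protocol: for a string like "example.com/path" it returns the whole string, and for "localhost:3000" it returns "localhost".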

When handling different protocol types, remember:

  • HTTP and HTTPS are common for web URLs.
  • Other protocols include FTP, mailto, and file.
  • Some URLs omit the protocol, such as protocol-relative URLs that start with "//".

To handle various cases:

function getProtocolSafe(url) {
  if (url.startsWith('//')) {
    return 'https'; // Assumption: treat protocol-relative URLs as HTTPS
  }
  // A scheme starts with a letter, followed by letters, digits, "+", "-" or "."
  const match = url.match(/^([a-z][a-z0-9+.-]*):/i);
  return match ? match[1].toLowerCase() : null;
}

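This function checks for protocol-relative URLs, uses a regular expression to find the protocol, and returns it in lowercase without the trailing colon. If no protocol is found, it returns null. Because it only inspects the text before the first colon, a schemeless string such as "localhost:3000" is still reported as having the protocol "localhost".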
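An alternative to assuming HTTPS is to resolve the URL against a known base, which is how a browser treats protocol-relative URLs. The sketch below uses the two-argument form of the URL constructor; the helper name getProtocolWithBase and the default base URL are assumptions for illustration.

```javascript
// Hypothetical helper: resolve a possibly protocol-relative URL against a base.
// Assumption: the default base stands in for the page the URL was found on.
function getProtocolWithBase(url, base = 'https://example.com') {
  // Absolute URLs ignore the base; protocol-relative ones inherit its protocol.
  return new URL(url, base).protocol;
}

console.log(getProtocolWithBase('//cdn.example.com/lib.js'));                       // "https:"
console.log(getProtocolWithBase('//cdn.example.com/lib.js', 'http://example.com')); // "http:"
console.log(getProtocolWithBase('ftp://files.example.com/a.txt'));                  // "ftp:"
```

Unlike getProtocolSafe, this version keeps the trailing colon, because it returns the protocol property unchanged.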

Extracting Domain Name from URL

The domain name is the part of a URL that identifies the website. Extracting it is useful for many web tasks, and JavaScript offers several ways to do it.

JavaScript Techniques for Domain Extraction

Using URL object

The URL object in JavaScript helps extract the domain name:

function getDomain(url) {
  const urlObject = new URL(url);
  return urlObject.hostname;
}

This function creates a URL object and returns its hostname property. For example:

const url = 'https://www.example.com:8080/path?query=value';
console.log(getDomain(url)); // Output: www.example.com

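This method works with URLs that have subdomains or IP addresses. The hostname property excludes the port; the related host property includes it when one is present.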
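To illustrate how the URL object treats ports and IP addresses, here is a short sketch. The sample addresses are made up for the example.

```javascript
const withPort = new URL('https://www.example.com:8080/path?query=value');
console.log(withPort.hostname); // "www.example.com"
console.log(withPort.host);     // "www.example.com:8080"

// IPv4 addresses are returned as-is.
const ipv4 = new URL('http://192.168.1.10:3000/status');
console.log(ipv4.hostname); // "192.168.1.10"

// IPv6 addresses keep their square brackets in hostname.
const ipv6 = new URL('http://[::1]:3000/status');
console.log(ipv6.hostname); // "[::1]"
```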

Regular expression for domain extraction

For more control or when the URL object isn't available, you can use a regular expression:

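function getDomainRegex(url) {
  // Optional http(s) protocol, optional "user@" credentials, optional "www.",
  // then capture everything up to the first ":", "/", "?" or "#".
  const match = url.match(/^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/\n?#]+)/i);
  return match ? match[1] : null;
}

This regex-based function:

  • Works with or without the protocol
  • Skips credentials before an "@" and strips a leading 'www.' from the result
  • Extracts the domain name up to the first slash, colon, question mark, or hash

The credentials group excludes slashes so that an "@" later in the path or query string (for example "?email=a@b.com") is not mistaken for the end of a username.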

Example usage:

const url1 = 'https://subdomain.example.com/path';
const url2 = 'http://www.test-site.co.uk:8080/path';

console.log(getDomainRegex(url1)); // Output: subdomain.example.com
console.log(getDomainRegex(url2)); // Output: test-site.co.uk

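Both methods extract domain names from URLs in JavaScript. The URL object method is simple, follows the URL standard, and works for most cases, but it requires an absolute URL and throws otherwise. The regex approach also accepts strings without a protocol, but it is a heuristic and does not validate its input.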
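One way to combine the two ideas is to add a protocol when it is missing and then let the URL object do the parsing. This is a sketch under stated assumptions: the helper name getDomainLenient is hypothetical, and schemeless input is assumed to mean HTTPS.

```javascript
// Hypothetical helper: accept input with or without a protocol.
// Assumption: schemeless input such as "example.com/path" is read as HTTPS.
function getDomainLenient(input) {
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  // Drop a leading "//" from protocol-relative input before prefixing.
  const candidate = hasScheme ? input : `https://${input.replace(/^\/\//, '')}`;
  try {
    return new URL(candidate).hostname;
  } catch (err) {
    return null; // still not parseable after adding a protocol
  }
}

console.log(getDomainLenient('https://www.example.com:8080/path')); // "www.example.com"
console.log(getDomainLenient('example.com/path'));                  // "example.com"
console.log(getDomainLenient('//cdn.example.com/lib.js'));          // "cdn.example.com"
console.log(getDomainLenient('localhost:3000'));                    // "localhost"
```

Unlike the regex version, this sketch keeps a leading "www." because it returns the hostname exactly as the URL object reports it.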

Extracting Port Number from URL

The port number in a URL specifies the endpoint for communication on the server. It's an important part of network connections. Here are methods to get port information from URLs.

Methods to Get Port Information

To extract the port number from a URL, you can use JavaScript's URL object:

function getPort(url) {
  const urlObject = new URL(url);
  return urlObject.port || null;
}

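This function returns the port as a string if the URL specifies one, or null if it does not. The URL object also reports an empty port when the stated port is the protocol's default (for example, ":443" on an HTTPS URL), so the function returns null in that case too.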
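The following sketch shows that behavior of the port property directly. The sample URLs are made up for the example.

```javascript
const explicit = new URL('https://example.com:8443/');
const defaultStated = new URL('https://example.com:443/');
const mismatched = new URL('http://example.com:443/');

console.log(explicit.port);             // "8443"
console.log(defaultStated.port === ''); // true, 443 is the HTTPS default and is dropped
console.log(mismatched.port);           // "443", not the default for HTTP, so it is kept

// The port is a string; convert it when a number is needed.
console.log(Number(explicit.port)); // 8443
```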

For URLs without a port, you can get the default port based on the protocol:

function getPortWithDefault(url) {
  const urlObject = new URL(url);
  if (urlObject.port) {
    return urlObject.port;
  }
  switch (urlObject.protocol) {
    case 'http:':
      return '80';
    case 'https:':
      return '443';
    case 'ftp:':
      return '21';
    default:
      return null;
  }
}

This function returns the port if present, otherwise the default port for HTTP, HTTPS, or FTP, and null for any other protocol.

Default port numbers for common protocols:

  • HTTP: 80
  • HTTPS: 443
  • FTP: 21
  • SFTP: 22
  • SMTP: 25
  • POP3: 110
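A lookup table can replace the switch statement when more protocols need default ports. The sketch below is illustrative only: the names DEFAULT_PORTS and getNumericPort are hypothetical, the table covers just four entries from the list above, and returning a number instead of a string is a design choice for this example.

```javascript
// Hypothetical table keyed by the protocol string as reported by URL.protocol.
const DEFAULT_PORTS = {
  'http:': 80,
  'https:': 443,
  'ftp:': 21,
  'sftp:': 22,
};

// Hypothetical helper: returns the port as a number, or null if unknown.
function getNumericPort(url) {
  const { protocol, port } = new URL(url);
  if (port) {
    return Number(port); // an explicit, non-default port was given
  }
  return DEFAULT_PORTS[protocol] ?? null;
}

console.log(getNumericPort('sftp://files.example.com/upload')); // 22
console.log(getNumericPort('https://example.com:8443'));        // 8443
console.log(getNumericPort('mailto:user@example.com'));         // null
```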

For URLs with ports, the port is easy to extract:

const url1 = 'https://example.com:8080/path';
console.log(getPort(url1)); // Output: 8080

const url2 = 'http://localhost:3000';
console.log(getPort(url2)); // Output: 3000

When handling URLs without ports:

const url3 = 'https://example.com/path';
console.log(getPortWithDefault(url3)); // Output: 443

const url4 = 'http://example.org';
console.log(getPortWithDefault(url4)); // Output: 80

These methods let you extract port information from URLs, whether the port is stated or implied by the protocol.
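Putting the three pieces together, a single helper can return the protocol, domain, and port at once. This is a minimal sketch: the name parseUrlParts, the shape of the returned object, and the choice to strip the trailing colon from the protocol are assumptions for illustration.

```javascript
// Hypothetical helper: extract protocol, domain and port in one call.
function parseUrlParts(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (err) {
    return null; // not a valid absolute URL
  }
  return {
    protocol: parsed.protocol.replace(/:$/, ''), // "https:" becomes "https"
    domain: parsed.hostname,
    port: parsed.port || null, // null when absent or equal to the default
  };
}

console.log(parseUrlParts('https://www.example.com:8080/path?query=value'));
// { protocol: 'https', domain: 'www.example.com', port: '8080' }
console.log(parseUrlParts('http://example.org'));
// { protocol: 'http', domain: 'example.org', port: null }
console.log(parseUrlParts('not a url')); // null
```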