Decoding Short URLs

If there’s one thing that makes the Web unique, it’s the URL. Each URL is a unique address. They make the web. They’re used in hyperlinks to connect one page to another. Good URLs inform the user about the content of the page. In most cases, the title you see in the browser window matches the URL. Sometimes they can be quite long, particularly for blog posts.

Long URLs can sometimes be a problem. When used in email, social media, or IM, a long URL might wrap across multiple lines. When this happens, the user could get sent to the wrong place. To fix this, URL shorteners were created. A short URL is a machine-generated URL that points to a long URL. One popular service is bit.ly, but there are many others. Some companies allow you to use custom URLs that reflect your brand. Used properly, a short URL sends you directly to the desired page.

Do you know where the short URL will send you? It’s not always where you think!

There’s a dark side to short URLs. When a short URL is used, it obscures the real address. You can no longer know where it’ll take you. You hope it’ll take you to the content you want, but that’s not guaranteed. Often, you can be sent to another short URL. This happens because brands want you to use their shortening service, for traffic and name recognition. Want to learn how to get rid of all those layers of redirection? Keep reading.

NOTE: A lot of sites do this. I’m just using TechCrunch as an example.

Look at this tweet from TechCrunch. It’s a well-known brand, so you trust it. You believe the link will take you directly to the story. Let’s follow what really happens. To demonstrate, I’ll use the Terminal to run a command called cURL. cURL lets you make requests like a browser would. I’ll use curl to examine only the response headers and skip loading the page content.

$ curl --head tcrn.ch/1AVALnu

HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Tue, 07 Apr 2015 15:19:52 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 109
Connection: keep-alive
Cache-Control: private, max-age=90
Location: http://trib.al/zGMBvRe
Mime-Version: 1.0
Set-Cookie: _bit=5523f598-00243-01613-321cf10a;domain=.tcrn.ch;expires=Sun Oct 4 15:19:52 2015;path=/; HttpOnly

If you look at the Location line above, it’s another short URL. It’s also setting a cookie. This is one of those tricks marketers use to track your movements around the web. Let’s take that short URL and feed it back into cURL.

$ curl --head http://trib.al/zGMBvRe

HTTP/1.1 301 Moved Permanently
Content-Length: 169
Content-Type: text/html;charset=utf-8
Date: Tue, 07 Apr 2015 15:19:54 GMT
Location: http://feedproxy.google.com/~r/Techcrunch/~3/KCW5Ju_epK4/
Server: CherryPy/3.2.4
Set-Cookie: tribal="LaoHargaTeiXGiMzSsJEfw=="; expires=Fri, 06 Jan 2034 23:13:40 GMT; Path=/; Version=2
Connection: keep-alive

This result shows that trib.al will send you to feedproxy.google.com, and it too sets a cookie. The ‘TechCrunch’ name does appear in the path of the URL, so it’s probably related to TechCrunch.

$ curl --head http://feedproxy.google.com/~r/Techcrunch/~3/KCW5Ju_epK4/

HTTP/1.1 301 Moved Permanently
Location: http://techcrunch.com/2014/11/10/hacker-emails-testing-service-browserstacks-customers-says-company-lied-about-security/?ncid=rss&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29
Content-Type: text/html; charset=UTF-8
Date: Tue, 07 Apr 2015 15:19:54 GMT
Expires: Tue, 07 Apr 2015 15:19:54 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Server: GSE
Alternate-Protocol: 80:quic,p=0.5
Transfer-Encoding: chunked
Accept-Ranges: none
Vary: Accept-Encoding

The feedproxy response above shows that it redirects to the TechCrunch URL given in the Location header. After following that redirect, you see the 200 OK response below. This is the final destination.

HTTP/1.1 200 OK
Server: nginx
Date: Tue, 07 Apr 2015 15:19:55 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
Vary: Cookie
X-hacker: If you're reading this, you should visit automattic.com/jobs and apply to join the fun, mention this header.
X-Pingback: http://techcrunch.com/xmlrpc.php
Link: <http://wp.me/p1FaB8-4xb0>; rel=shortlink
X-ac: 4.sjc _dfw

Finally, we’ve arrived at techcrunch.com. Notice all those parameters after the ? in the Location of the last 301 redirect. Those are not necessary to look at this page. Their purpose is to provide TechCrunch with information about how you arrived on the site. You could copy that full URL and delete the question mark and everything after it. You’d still get the same content. However, for other sites, those query parameters might determine the content. Search engines like DuckDuckGo and Bing use query strings to determine the content.
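If you’d rather not trim the URL by hand, a small shell helper can do it. This is only a sketch: strip_query is a name I made up for illustration, and it blindly removes everything from the first ? onward (including any #fragment after it), so only use it on sites where the query string doesn’t determine the content.

```shell
# Hypothetical helper (not part of curl): print a URL without its query string.
strip_query() {
    # ${1%%[?]*} removes the longest suffix that starts at a literal "?",
    # i.e. everything from the first question mark to the end.
    printf '%s\n' "${1%%[?]*}"
}

# Example using the URL from the Location header above.
strip_query 'http://techcrunch.com/2014/11/10/hacker-emails-testing-service-browserstacks-customers-says-company-lied-about-security/?ncid=rss&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29'
# prints http://techcrunch.com/2014/11/10/hacker-emails-testing-service-browserstacks-customers-says-company-lied-about-security/
```

A URL with no question mark passes through unchanged.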

Why would you want to edit the full URL?

  • You don’t want to rely on foreign URL shorteners
  • Perhaps you are uncomfortable passing along marketing information
  • You want to verify the URL before sending it to someone else
  • Perhaps you want to use your own URL shortener

Going through the above process one step at a time was for education. There’s a shorter way to do the same thing. Let cURL do all the work for you by using the following:

curl --head --location http://tcrn.ch/1AVALnu
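
If you only care about where you end up, and not about every header along the way, curl’s --write-out option can print just the last URL it fetched. Here is a minimal sketch; final_url is a hypothetical name of my own, and it assumes every hop answers HEAD requests the same way it would answer a browser.

```shell
# Hypothetical helper: print only the final URL after following all redirects.
final_url() {
    if [ -z "${1:-}" ]; then
        echo 'Whoops! You forgot to specify a short URL' >&2
        return 1
    fi
    # --silent hides the progress meter, --output /dev/null discards the
    # headers, and --write-out prints the last URL curl requested.
    curl --silent --head --location --output /dev/null \
         --write-out '%{url_effective}\n' "$1"
}

# Usage (needs network access):
#   final_url http://tcrn.ch/1AVALnu
```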

In the curl manpage, under the -L, --location section, it states: “If used together with -i, --include or -I, --head, headers from all requested pages will be shown.” To make this simpler to use, add the traceurl function below to your .bashrc. If you have the latest Opal on your system, it’s already included.

$ traceurl http://tcrn.ch/1AVALnu

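function traceurl(){
    # Follow every redirect and print the headers returned at each hop.
    if [[ -n $1 ]]
    then
        # Quote the URL so characters like & and ? reach curl intact.
        curl --location --head "$1"
    else
        echo 'Whoops! You forgot to specify a short URL' >&2
        return 1
    fi
}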
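
The traceurl output can get long, since it prints every header from every hop. If you just want the chain of redirects, you can filter for the Location lines. The sketch below is one way to do it; list_hops is a name I invented, and it assumes each Location value is a single URL with no spaces in it.

```shell
# Hypothetical filter: read HTTP headers on stdin, print each Location value.
list_hops() {
    # Header names are case-insensitive, so compare in lowercase.
    # curl's header lines end in CR LF, so strip the trailing carriage return.
    awk 'tolower($1) == "location:" { sub(/\r$/, "", $2); print $2 }'
}

# Usage (needs network access):
#   curl --silent --head --location http://tcrn.ch/1AVALnu | list_hops
```

Each line printed is the next stop in the chain, in the order curl visited them.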
