Decoding Short URLs
If there’s one thing that makes the Web unique, it’s the URL. Each URL is a unique address. URLs make the Web: they’re used in hyperlinks to connect one page to another. Good URLs inform the user about the content of the page. In most cases, the title you see in the browser window matches the URL. Sometimes URLs can be quite long, particularly for blog posts.
Long URLs can sometimes be a problem. When used in email, social media, or IM, a long URL might wrap across multiple lines. When this happens, the user could get sent to the wrong place. To fix this, URL shorteners were created. A short URL is a machine-generated URL that points to a long URL. One popular service is bit.ly, but there are many others. Some companies allow you to use custom short URLs that reflect your brand. A short URL used properly sends you directly to the desired page.
Do you know where the short URL will send you? It’s not always where you think!
There’s a dark side to short URLs. When a short URL is used, it obscures the real address. You can no longer know where it’ll take you. You hope it’ll take you to the content you want, but that’s not guaranteed. Often, you can be sent to another short URL. This happens because brands want you to use their shortening service – for traffic and name recognition. Wanna learn how to get rid of all those layers of redirection? Keep reading.
NOTE: A lot of sites do this. I’m just using TechCrunch as an example.
Look at this tweet from TechCrunch. It’s a well-known brand, so you trust it. You believe it’ll take you directly to the story. Let’s follow what really happens. To demonstrate, I’ll use the Terminal to run a command called cURL. cURL lets you make HTTP requests like a browser would. I’ll use curl to examine the headers, and skip the loading of the page content.
$ curl --head tcrn.ch/1AVALnu
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Tue, 07 Apr 2015 15:19:52 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 109
Connection: keep-alive
Cache-Control: private, max-age=90
Location: http://trib.al/zGMBvRe
Mime-Version: 1.0
Set-Cookie: _bit=5523f598-00243-01613-321cf10a;domain=.tcrn.ch;expires=Sun Oct 4 15:19:52 2015;path=/; HttpOnly
If you look at the Location line above, it’s another short URL. The response is also setting a cookie. This is one of those tricks marketers use to track your movements around the web. Let’s take that short URL and feed it back into cURL.
$ curl --head http://trib.al/zGMBvRe
HTTP/1.1 301 Moved Permanently
Content-Length: 169
Content-Type: text/html;charset=utf-8
Date: Tue, 07 Apr 2015 15:19:54 GMT
Location: http://feedproxy.google.com/~r/Techcrunch/~3/KCW5Ju_epK4/
Server: CherryPy/3.2.4
Set-Cookie: tribal="LaoHargaTeiXGiMzSsJEfw=="; expires=Fri, 06 Jan 2034 23:13:40 GMT; Path=/; Version=2
Connection: keep-alive
This result shows it’ll send you to feedproxy.google.com, and this response sets a cookie too. The ‘Techcrunch’ name does appear in the path of the URL, so it’s probably related to TechCrunch.
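Reading the Location line by eye gets tedious after a couple of hops. Here is a minimal sketch of a helper that pulls it out of a block of response headers. The name next_hop is my own invention, not part of curl, and the sample input is a trimmed copy of the first response above, so nothing here touches the network.

```shell
# next_hop (hypothetical name): read raw HTTP response headers on stdin
# and print the value of the first Location header, if there is one.
next_hop() {
  # HTTP header lines end in CRLF, so strip the carriage returns first.
  # Header names are case-insensitive, hence the tolower() comparison.
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Canned sample based on the tcrn.ch response shown earlier.
printf 'HTTP/1.1 301 Moved Permanently\r\nServer: nginx\r\nLocation: http://trib.al/zGMBvRe\r\n\r\n' | next_hop
```

With a live connection you could pipe `curl --silent --head` into it instead of the canned sample.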
$ curl --head http://feedproxy.google.com/~r/Techcrunch/~3/KCW5Ju_epK4/
HTTP/1.1 301 Moved Permanently
Location: http://techcrunch.com/2014/11/10/hacker-emails-testing-service-browserstacks-customers-says-company-lied-about-security/?ncid=rss&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29
Content-Type: text/html; charset=UTF-8
Date: Tue, 07 Apr 2015 15:19:54 GMT
Expires: Tue, 07 Apr 2015 15:19:54 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Server: GSE
Alternate-Protocol: 80:quic,p=0.5
Transfer-Encoding: chunked
Accept-Ranges: none
Vary: Accept-Encoding
The feed proxy response above shows that it should redirect to the TechCrunch URL stated in the Location header. After following that redirect, you see the 200 OK response below. This is the final destination.
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 07 Apr 2015 15:19:55 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
Vary: Cookie
X-hacker: If you're reading this, you should visit automattic.com/jobs and apply to join the fun, mention this header.
X-Pingback: http://techcrunch.com/xmlrpc.php
Link: <http://wp.me/p1FaB8-4xb0>; rel=shortlink
X-ac: 4.sjc _dfw
Finally, we’ve arrived at techcrunch.com. Notice all those parameters after the ? in the Location header of the last 301 redirect. They aren’t needed to view this page. Their purpose is to provide TechCrunch with information about how you arrived on the site. You could copy that full URL, and delete the question mark and everything after it. You’d still get the same content. However, for other sites, those query parameters might determine the content. Search engines like DuckDuckGo and Bing use query strings to determine the content.
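Deleting the question mark and everything after it is easy to do in the shell. This is a rough sketch using plain parameter expansion. The function name strip_query is made up for this post, and it assumes the query string isn’t needed, which (as noted above) isn’t true for every site.

```shell
# strip_query (hypothetical name): print a URL without its query string.
strip_query() {
  # ${1%%[?]*} removes the longest suffix that starts with a literal "?".
  # A URL without a "?" passes through unchanged.
  printf '%s\n' "${1%%[?]*}"
}

strip_query 'http://techcrunch.com/2014/11/10/hacker-emails-testing-service-browserstacks-customers-says-company-lied-about-security/?ncid=rss&utm_source=feedburner&utm_medium=feed'
```

The single quotes around the URL matter: without them the shell would treat each & as "run this in the background".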
Why would you want to edit the full URL?
- You don’t want to rely on foreign URL shorteners
- Perhaps you are uncomfortable passing along marketing information
- You want to verify the URL before sending it to someone else
- Perhaps you want to use your own URL shortener
I went through the above process one step at a time for education. There’s a shorter way to do the same thing. Let cURL do all the work for you by using the following:
$ curl --head --location http://tcrn.ch/1AVALnu
In the curl manpage, under the -L, --location section, it states: "If used together with -i, --include or -I, --head, headers from all requested pages will be shown." To make this simpler to use, add the traceurl function below to your .bashrc. If you have the latest Opal on your system, it’s already included.
$ traceurl http://tcrn.ch/1AVALnu
function traceurl() {
  # Follow every redirect and print the headers from each hop
  if [[ -n $1 ]]
  then
    curl --location --head "$1"
  else
    echo 'Whoops! You forgot to specify a short URL'
  fi
}
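The full header dump from every hop is a lot to read when all you want is the chain of redirects. As a possible companion, here is a small filter that keeps only the status lines and Location headers. The name hops is hypothetical, and the sample input is a shortened stand-in for real output so the example runs without a network connection.

```shell
# hops (hypothetical name): read headers from one or more HTTP responses
# on stdin and keep only the status lines and the Location headers.
hops() {
  # Strip the carriage returns that end HTTP header lines, then filter.
  tr -d '\r' | grep -iE '^(HTTP/|Location:)'
}

# Canned two-hop sample; with a live connection you could instead run:
#   curl --silent --head --location http://tcrn.ch/1AVALnu | hops
printf 'HTTP/1.1 301 Moved Permanently\r\nServer: nginx\r\nLocation: http://trib.al/zGMBvRe\r\n\r\nHTTP/1.1 200 OK\r\nServer: nginx\r\n\r\n' | hops
```

The --silent flag in the commented command is there to hide curl’s progress meter when its output goes into a pipe.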