URL, URL, Little Do We Know Thee
By Razvan Peteanu for SecurityPortal
About Schemes and Men
Recently, many smiled and Microsoft got angry at a spoof of its Knowledge Base
articles posted on a URL starting with "http://www.microsoft.com."
Emails went around and people clicked on the link, possibly before looking closer at it. Surprised by the content, they may have checked the URL again, noticed the other "www"-like string in it, figured out that it must have something to do with the real host, forwarded the email to friends and then returned to their work.
Today we will look closer at URLs and the associated security implications.
"Interesting" ways of using them have been known to spammers for a
while, but now the KB spoof and the February issue of
Crypto-Gram have made the Internet community more aware of what URLs can do.
Although most Internet users will associate URLs with WWW addresses, or perhaps
FTP, Uniform Resource Locators are more general in scope. URLs are standardized
in RFC1738, and in their most generic form, they are defined
as
<scheme>:<scheme-specific-part>
The best-known form is the Common Internet Scheme Syntax, in which the <scheme>
is the name of a protocol and the <scheme-specific-part> is defined as:
//<user>:<password>@<host>:<port>/<url-path>
in which only the host part is mandatory. The ":" and "@"
characters have a special meaning, so the software handling the URL can parse the entire string unambiguously.
If a user and a password are provided, the host part only comes after the @
character. In the KB spoof mentioned earlier, the link was
http://www.microsoft.com&item=q209354@www.hwnd.net/pub/mskb/Q209354.asp
Understandably, it is no longer available. (In case you find a copy elsewhere,
be aware that the page uses strong language and might trigger some content scanners
as well.) As you have guessed, the real host of the page was www.hwnd.net. The
text before the "@" character, beginning with "www.microsoft.com", is in this case just a bogus username that is ignored
by the web server.
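To see the split for yourself, here is a minimal sketch using Python's standard urllib.parse module. It only parses the string quoted above and makes no network request; the variable names are mine and purely illustrative.

```python
from urllib.parse import urlsplit

# The link from the KB spoof, exactly as quoted in the article.
spoof = "http://www.microsoft.com&item=q209354@www.hwnd.net/pub/mskb/Q209354.asp"

parts = urlsplit(spoof)

# Everything before the "@" is user information, not the host.
print("user:", parts.username)
print("host:", parts.hostname)
print("path:", parts.path)
```

The host reported by the parser is www.hwnd.net, while the familiar-looking "www.microsoft.com&item=q209354" ends up in the username field.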
Although perfectly valid syntactically, the above usage has
security relevance. While no technological resource is affected, the
attack is targeted at the other (and often ignored) half of the picture: ourselves.
At the end of most Internet nodes, beyond network cards, modems and computers,
there are human users who, consciously or not, make security decisions every
time they decide to trust what they see on the screen.
Trust is a fundamental
security value. Crafting the URL as above exploits the trust we have in our
understanding of what a URL is like and in whoever provided us the link. It
also exploits the fact that our attention is focused on the content frame and
not on the location although they are equally important in a decision of trust.
In SSL-protected sites, the latter is in part taken care of by the browser, which
compares the domain with the information in the SSL certificate; otherwise mere
encryption would not provide much value if the destination is bogus.
Concealment
The URL analyzed above is just superficially hiding its real destination. Let
us look further into better ways of doing this. For some reason (probably caused
by the internal handling), some operating systems operate with IP addresses not
only in the form we are used to, aaa.bbb.ccc.ddd, but also as the decimal equivalent.
The above generic address can also be written as the decimal value of aaa*256^3+bbb*256^2+ccc*256+ddd.
Thus, 3633633987 is 216.148.218.195 (belonging to www.redhat.com). You can copy
and paste 3633633987 into your browser, and you will find yourself browsing Red Hat's
main site. The above works with Internet Explorer 5.x and also with Lynx on
Linux, but I have not tested all operating systems, so your mileage may vary.
Some applications may complain of invalid URLs if they parse the domain name
for periods, but if you experiment with a few applications, including standard
utilities like ping, you should be able to figure out whether the OS itself
supports this usage.
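The conversion between the two notations can be sketched in a few lines of Python. The function names below are hypothetical helpers of mine; the first applies the formula given above, the second relies on the standard ipaddress module to go back.

```python
import ipaddress


def dotted_to_decimal(dotted: str) -> int:
    """Apply aaa*256^3 + bbb*256^2 + ccc*256 + ddd to a dotted address."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d


def decimal_to_dotted(value: int) -> str:
    """Turn the single decimal number back into the familiar dotted form."""
    return str(ipaddress.IPv4Address(value))


print(dotted_to_decimal("216.148.218.195"))
print(decimal_to_dotted(3633633987))
```

Running it with the Red Hat address from the text gives the same pair of values, 3633633987 and 216.148.218.195.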
Thus more obfuscation could be obtained by creating a URL such as http://www.toronto.com:ontario@3633633987
which still goes to Red Hat. Surfers are used to seeing strings of digits in
a URL because many sites store the HTTP SessionID in the URL instead of in a
cookie, so the above would not appear particularly suspicious. The password
can be absent, so we end up having http://www.toronto.com@3633633987,
"easy to read, easy to misunderstand" at a first glance.
Now, for the final touch, we can use a bit of HTML knowledge: the anchor tag
allows the display text for a link to be different from the target itself, so
the above link can appear as http://www.toronto.com.
In IE 5.5, hovering over it with the mouse displays only the number in the status
bar, which is not very indicative of a wrong target, so only clicking on it would show
us the real target.
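A defensive tool could look for exactly this mismatch. The following is only a sketch, assuming the display text is itself written as a URL; the class and function names (LinkAuditor, mismatched_links) are my own inventions, built on Python's standard html.parser and urllib.parse modules.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAuditor(HTMLParser):
    """Collect (displayed text, href) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while we are inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def mismatched_links(html: str):
    """Return links whose visible text names one host but whose href names another."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    suspicious = []
    for text, href in auditor.links:
        shown = urlsplit(text).hostname    # None if the text is not a URL
        actual = urlsplit(href).hostname
        if shown and actual and shown != actual:
            suspicious.append((text, href))
    return suspicious


sample = '<a href="http://www.toronto.com@3633633987">http://www.toronto.com</a>'
print(mismatched_links(sample))
```

Links whose text is ordinary wording ("click here") are not flagged by this sketch, so it catches only the most blatant form of the trick.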
Yet another way of exploiting trust is by using the indirection provided by
genuine websites. A number of well-known sites track whether their visitors follow
external links by creating links of the form http://www.thisisarespectablesite.com/outsidelinks/http://externalsite,
trapping the request on the server side and then redirecting the user to the
real destination.
The problem with this approach is that anyone can use their
indirection, combined with URL obfuscation, in order to provide more legitimacy
to false URLs. What this can lead to depends both on the attacker and on the
victim. The HTTP REFERER field, limited as it is, can be of some value to reduce
abuses, but not all sites seem to use it.
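What such a REFERER check might look like on the redirecting site is sketched below. This is an assumption on my part, not a description of how any real site does it: OWN_HOST and should_redirect are hypothetical names, and since the header is supplied by the client it can be missing or forged, which is why it only reduces casual abuse.

```python
from urllib.parse import urlsplit

# Placeholder host taken from the example link in the text.
OWN_HOST = "www.thisisarespectablesite.com"


def should_redirect(referer):
    """Honor an outbound redirect only when the click came from our own pages."""
    if not referer:
        return False  # header absent: the request did not come from a link we served
    return urlsplit(referer).hostname == OWN_HOST


print(should_redirect("http://www.thisisarespectablesite.com/news.html"))
print(should_redirect("http://www.thisisarespectablesite.com@3633633987/"))
print(should_redirect(None))
```

Note that the comparison is made on the parsed host, so a REFERER that uses the username trick described earlier does not pass.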
And if the above was not enough, the characters in the real destination can
be obfuscated themselves through URL and Unicode encoding, so only
the hex codes will be visible. URL encoding is required for many special characters,
but can be applied to regular alphanumeric characters as well.
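As a small illustration, the sketch below hex-encodes every character of a host name, including the ordinary letters, and then decodes it again with the standard urllib.parse.unquote function. The helper name encode_every_char is mine; whether a given browser accepts such an encoded host is something I have not tested here.

```python
from urllib.parse import unquote


def encode_every_char(text: str) -> str:
    """Percent-encode each byte, even those that do not need it."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


hidden = encode_every_char("www.hwnd.net")
print(hidden)            # nothing but hex codes
print(unquote(hidden))   # decodes back to the readable name
```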
None of the above is new to knowledgeable spammers, but it will likely be quite
successful as an attack targeted at the average unsuspecting user.
One-click Attacks
Let's explore the security implications of the URL even further. One of the "standard"
attacks would be to cause a buffer overflow. As far as the browsers go, however,
by now this would be a very beaten path; many a hacker has tried to crash IE
or Netscape. What about other protocols? Indeed, what other protocols
are recognized on a machine?
To find out the answer for a Windows box, I turned to looking into the registry.
The following keys contain such information: HKEY_LOCAL_MACHINE\SOFTWARE\Classes\PROTOCOLS\Handler
and those keys under HKEY_CLASSES_ROOT\Shell that have a subkey named "URL
Protocol." (You will have to do some searching for those in the latter
category, but it does not take long.)
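The search can also be scripted. The sketch below uses Python's standard winreg module and assumes the layout Microsoft documents for registered schemes, namely a key directly under HKEY_CLASSES_ROOT that carries a value named "URL Protocol"; that is slightly different from the manual search described above, and the function name is my own. On anything other than Windows it simply returns an empty list.

```python
import sys


def registered_url_schemes():
    """List HKEY_CLASSES_ROOT keys that carry a "URL Protocol" value."""
    if sys.platform != "win32":
        return []  # the registry only exists on Windows
    import winreg  # Windows-only standard library module

    schemes = []
    index = 0
    while True:
        try:
            name = winreg.EnumKey(winreg.HKEY_CLASSES_ROOT, index)
        except OSError:
            break  # no more subkeys to enumerate
        index += 1
        try:
            with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, name) as key:
                winreg.QueryValueEx(key, "URL Protocol")
        except OSError:
            continue  # key unreadable, or it has no "URL Protocol" value
        schemes.append(name)
    return schemes


for scheme in registered_url_schemes():
    print(scheme + ":")
```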
The search results proved interesting:
apart from the expected ftp://, http://, https://, mailto://, news://, pnm://
and several others, I found some schemes I had never heard of before, such as
msee://. A quick experiment showed that it is the scheme used by Microsoft
Encarta, perhaps to refer to articles inside the encyclopedia. Whether Encarta
is safe from buffer overflows and, if not, whether they can be practically exploited,
well, this is something that would need investigation.
The story repeated with other URL schemes that were installed by various applications
(such as copernic:// owned by the Copernic
search tool). There have been other interesting discoveries, but have a look
for yourself.
Apart from the possibility of remote exploitation of applications that are
not otherwise remotely accessible, even more discomfort is caused by the absence
of any administrative interface allowing inspection of the associations between
a URL scheme and the application using it (apart from a very scope-limited dialog
in Internet Explorer under Tools/Options/Programs which only displays a handful
of standard protocols).
It turns out that registering a new URL scheme in Windows is trivial and the
change takes place immediately. It is done by adding the necessary registry
entries as described in this MSDN documentation.
Unfortunately, this also means this can be done by scripted viruses such as KakWorm
(which are executed by simply viewing an email on a vulnerable system).
Associating
a benign protocol with a dangerous command is, well, dangerous. Granted, this
is not a URL-specific attack. It can be done using file associations as well,
but the risk is still there, and the existence of other attack paths does not
mean this one will not be exploited. And, of course, nothing forces an attacker
to use only the techniques described here.
Until there are more mechanisms to inform and protect us from such attacks,
the best defense is to be cautious and not to follow directions in emails you
cannot trust. Sometimes, you just feel something isn't right.
Now, if you would only click this link for
some free advice :-) ... Did you?
References:
Bruce Schneier, Crypto-Gram, Feb 2001, "A Semantic Attack on URLs"
http://www.counterpane.com/crypto-gram-0102.html#7
RFC1738
http://www.securityportal.com/rfc/rfc1738.txt
MSDN, "Registering an Application to a URL Protocol"
http://msdn.microsoft.com/workshop/networking/pluggable/overview/appendix_a.asp