<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Quick and Dirty URL&#160;Validation</title>
	<atom:link href="http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/feed/" rel="self" type="application/rss+xml" />
	<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/</link>
	<description></description>
	<lastBuildDate>Wed, 10 Mar 2010 01:15:13 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Bob Aman</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68204</link>
		<dc:creator>Bob Aman</dc:creator>
		<pubDate>Thu, 12 Feb 2009 02:06:49 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68204</guid>
		<description>If I wanted to break this, I think I&#039;d write a custom &quot;webserver&quot; that responded to a HEAD request with an unending stream of headers. The server never closes the connection; it just keeps sending you headers until the client finally gives up, assuming it ever does.</description>
		<content:encoded><![CDATA[<p>If I wanted to break this, I think I&#8217;d write a custom &#8220;webserver&#8221; that responded to a HEAD request with an unending stream of headers. The server never closes the connection; it just keeps sending you headers until the client finally gives up, assuming it ever does.</p>
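<p>A per-read timeout alone does not stop this, because every trickle of header bytes restarts the clock (Net::HTTP documents its read_timeout as applying to each individual read). What follows is a minimal sketch of one workaround, assuming you are willing to drop down to a raw socket: give the whole header exchange a hard overall deadline and a byte cap. The helper name head_status and its deadline and max_bytes defaults are illustrative, not from any library.</p>

```ruby
require "socket"

# Hypothetical helper: send a bare HEAD request and return the status code,
# or nil if the headers don't finish within `deadline` seconds, grow past
# `max_bytes`, or the connection fails. A server that drips headers forever
# therefore can't stall the caller indefinitely.
def head_status(host, port, path = "/", deadline: 3.0, max_bytes: 16_384)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  stop_at = clock.call + deadline
  buffer = String.new # binary buffer for the raw response bytes

  Socket.tcp(host, port, connect_timeout: deadline) do |sock|
    sock.write("HEAD #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n")
    until buffer.include?("\r\n\r\n") # a blank line ends the header block
      remaining = stop_at - clock.call
      return nil if remaining <= 0 || buffer.bytesize > max_bytes
      next unless IO.select([sock], nil, nil, remaining) # never wait past the deadline

      chunk = sock.read_nonblock(4096, exception: false)
      return nil if chunk.nil? # connection closed before the headers ended
      buffer << chunk unless chunk == :wait_readable
    end
  end

  buffer[%r{\AHTTP/\d\.\d (\d{3})}, 1]&.to_i
rescue SystemCallError, SocketError, IOError
  nil # refused, reset, unresolvable, or the connect itself timed out
end
```

<p>For example, head_status("example.com", 80) should return an Integer status such as 200, or nil when the server stalls, drips headers past the deadline, or hangs up early. It speaks plain HTTP only; https would need an OpenSSL::SSL::SSLSocket wrapped around the TCP socket.</p>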
]]></content:encoded>
	</item>
	<item>
		<title>By: Trevor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68201</link>
		<dc:creator>Trevor</dc:creator>
		<pubDate>Wed, 11 Feb 2009 22:22:31 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68201</guid>
		<description>Hmm... yes... I think I got it now? I pasted it from running code, so I hope so :)</description>
		<content:encoded><![CDATA[<p>Hmm&#8230; yes&#8230; I think I got it now? I pasted it from running code, so I hope so :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: name</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68199</link>
		<dc:creator>name</dc:creator>
		<pubDate>Wed, 11 Feb 2009 21:49:39 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68199</guid>
		<description>There is still one slash left unescaped.</description>
		<content:encoded><![CDATA[<p>There is still one slash left unescaped.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Trevor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68198</link>
		<dc:creator>Trevor</dc:creator>
		<pubDate>Wed, 11 Feb 2009 20:58:54 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68198</guid>
		<description>This is my favorite URL format validator right now:

http://github.com/henrik/validates_url_format_of/tree/master

URI.regexp didn&#039;t catch a lot of invalid stuff I tried to throw at it.</description>
		<content:encoded><![CDATA[<p>This is my favorite URL format validator right now:</p>
<p><a href="http://github.com/henrik/validates_url_format_of/tree/master" rel="nofollow">http://github.com/henrik/validates_url_format_of/tree/master</a></p>
<p>URI.regexp didn&#8217;t catch a lot of invalid stuff I tried to throw at it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Trevor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68195</link>
		<dc:creator>Trevor</dc:creator>
		<pubDate>Wed, 11 Feb 2009 15:52:05 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68195</guid>
		<description>Ah yes. Fixed, thanks!</description>
		<content:encoded><![CDATA[<p>Ah yes. Fixed, thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: name</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68192</link>
		<dc:creator>name</dc:creator>
		<pubDate>Wed, 11 Feb 2009 07:11:58 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68192</guid>
		<description>Btw, the regexp is wrong. You should escape the slashes:
regexp = url.match(/https?:\/\/([^\/]+)(.*)/)</description>
		<content:encoded><![CDATA[<p>Btw, the regexp is wrong. You should escape the slashes:<br />
regexp = url.match(/https?:\/\/([^\/]+)(.*)/)</p>
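<p>A small sketch of the same idea in idiomatic Ruby, assuming the goal is to split an http(s) URL into host and path. %r{...} lets the pattern contain "/" without any escaping, and \A ... \z anchor it to the whole string. Unanchored, the pattern also matches a URL buried inside other text, and Ruby's ^ and $ only anchor to a line, so a value with an embedded newline could slip past them. The method name host_and_path is hypothetical.</p>

```ruby
# %r{...} lets the pattern contain "/" without escaping it.
# \A and \z anchor to the whole string; Ruby's ^ and $ only anchor to a line.
URL_PARTS = %r{\Ahttps?://([^/]+)(.*)\z}

# Hypothetical helper: returns [host (including any :port), path],
# or nil when the string is not a bare http(s) URL.
def host_and_path(url)
  m = URL_PARTS.match(url)
  m && [m[1], m[2]]
end
```

<p>With this, host_and_path("http://example.com/a?b=1") gives ["example.com", "/a?b=1"], while "see http://example.com" and "http:/sdf38830.com" both give nil.</p>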
]]></content:encoded>
	</item>
	<item>
		<title>By: Trevor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68191</link>
		<dc:creator>Trevor</dc:creator>
		<pubDate>Wed, 11 Feb 2009 02:13:47 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68191</guid>
		<description>Thanks, Walter. That looks like a good plugin to consider if you&#039;re worried about overkill but need more than quick and dirty.</description>
		<content:encoded><![CDATA[<p>Thanks, Walter. That looks like a good plugin to consider if you&#8217;re worried about overkill but need more than quick and dirty.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Walter McGinnis</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68190</link>
		<dc:creator>Walter McGinnis</dc:creator>
		<pubDate>Tue, 10 Feb 2009 21:51:58 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68190</guid>
		<description>Sorry, I meant to add a link to the source. It can be found here:

https://modzer0.cs.uaf.edu/repos/hank/code/http_url_validation_improved/</description>
		<content:encoded><![CDATA[<p>Sorry, I meant to add a link to the source. It can be found here:</p>
<p><a href="https://modzer0.cs.uaf.edu/repos/hank/code/http_url_validation_improved/" rel="nofollow">https://modzer0.cs.uaf.edu/repos/hank/code/http_url_validation_improved/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Walter McGinnis</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68189</link>
		<dc:creator>Walter McGinnis</dc:creator>
		<pubDate>Tue, 10 Feb 2009 21:51:16 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68189</guid>
		<description>This is the same basic concept as the older http_url_validation_improved plugin. It&#039;s not on GitHub, so perhaps that is why you missed it.

We use it in Kete (http://kete.net.nz) to achieve what you are after. It does look like it could benefit from that URI.regexp refactoring, though.

One nice thing is that it can be configured to check for allowed content types. It also gives more fine-grained feedback when validation fails.</description>
		<content:encoded><![CDATA[<p>This is the same basic concept as the older http_url_validation_improved plugin. It&#8217;s not on GitHub, so perhaps that is why you missed it.</p>
<p>We use it in Kete (<a href="http://kete.net.nz" rel="nofollow">http://kete.net.nz</a>) to achieve what you are after. It does look like it could benefit from that URI.regexp refactoring, though.</p>
<p>One nice thing is that it can be configured to check for allowed content types. It also gives more fine-grained feedback when validation fails.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Trevor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68187</link>
		<dc:creator>Trevor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 20:53:35 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68187</guid>
		<description>As an aside, found this easy way to do basic URL format validation:

http://www.ruby-doc.org/core/classes/URI.html#M004840

require &#039;uri&#039;
validates_format_of :uri, :with =&gt; URI.regexp

Very nice - no need for a custom regexp :)</description>
		<content:encoded><![CDATA[<p>As an aside, found this easy way to do basic URL format validation:</p>
<p><a href="http://www.ruby-doc.org/core/classes/URI.html#M004840" rel="nofollow">http://www.ruby-doc.org/core/classes/URI.html#M004840</a></p>
<p>require 'uri'<br />
validates_format_of :uri, :with => URI.regexp</p>
<p>Very nice &#8211; no need for a custom regexp :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Trevor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68186</link>
		<dc:creator>Trevor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 20:24:39 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68186</guid>
		<description>Hmm... yeah, I guess you could use Timeout to help, but it does seem like making requests to arbitrary websites may have unintended consequences.

I&#039;m not sure what you could do about receiving large header responses. I suppose a timeout could help there, too.

Here&#039;s where I would start looking in terms of doing timeout stuff:

http://www.ruby-doc.org/core/classes/Timeout.html
http://www.slashdotdash.net/2008/02/15/ruby-tidbit-timeout-code-execution/

Although I came across these not too long ago:

http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout
http://blog.headius.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html

So, I&#039;m not sure if timeout is safe to use or not :)</description>
		<content:encoded><![CDATA[<p>Hmm&#8230; yeah, I guess you could use Timeout to help, but it does seem like making requests to arbitrary websites may have unintended consequences.</p>
<p>I&#8217;m not sure what you could do about receiving large header responses. I suppose a timeout could help there, too.</p>
<p>Here&#8217;s where I would start looking in terms of doing timeout stuff:</p>
<p><a href="http://www.ruby-doc.org/core/classes/Timeout.html" rel="nofollow">http://www.ruby-doc.org/core/classes/Timeout.html</a><br />
<a href="http://www.slashdotdash.net/2008/02/15/ruby-tidbit-timeout-code-execution/" rel="nofollow">http://www.slashdotdash.net/2008/02/15/ruby-tidbit-timeout-code-execution/</a></p>
<p>Although I came across these not too long ago:</p>
<p><a href="http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout" rel="nofollow">http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout</a><br />
<a href="http://blog.headius.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html" rel="nofollow">http://blog.headius.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html</a></p>
<p>So, I&#8217;m not sure if timeout is safe to use or not :)</p>
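<p>One way to sidestep the Timeout.timeout question, sketched here as an assumption rather than a vetted recipe, is to lean on the limits Net::HTTP already exposes: open_timeout for opening the connection and read_timeout for each read. The helper name reachable? and the 2-second defaults are made up for illustration. Passing max_retries: 0 (Ruby 2.5+) stops Net::HTTP from automatically retrying an idempotent request like HEAD once after a read timeout, which would otherwise double the wait.</p>

```ruby
require "net/http"
require "uri"

# Hypothetical helper: HEAD the URL with explicit per-connection limits
# instead of wrapping the whole call in Timeout.timeout.
#   open_timeout - seconds allowed for opening the connection
#   read_timeout - seconds allowed for each individual read
def reachable?(url, open_timeout: 2, read_timeout: 2)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?

  response = Net::HTTP.start(uri.hostname, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: open_timeout,
                             read_timeout: read_timeout,
                             max_retries: 0) do |http| # don't silently retry the HEAD
    http.head(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue StandardError
  # Parse errors, DNS failures, refused connections, Net::OpenTimeout and
  # Net::ReadTimeout all count as "not reachable" for a quick check.
  false
end
```

<p>Because read_timeout is per read, a server that keeps trickling bytes can still hold this check open; bounding the total time is a separate problem.</p>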
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Connor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68185</link>
		<dc:creator>Tim Connor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 18:12:43 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68185</guid>
		<description>I like tenderlove&#039;s refinement of sending back a very large response when you finally do respond, to help chew up memory, too. (Is there somewhere you could cram base64-encoded video into an HTTP response header, maybe, to add illegal/copyrighted content problems on top?)</description>
		<content:encoded><![CDATA[<p>I like tenderlove&#8217;s refinement of sending back a very large response when you finally do respond, to help chew up memory, too. (Is there somewhere you could cram base64-encoded video into an HTTP response header, maybe, to add illegal/copyrighted content problems on top?)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacob Harris</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68184</link>
		<dc:creator>Jacob Harris</dc:creator>
		<pubDate>Tue, 10 Feb 2009 18:05:05 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68184</guid>
		<description>This does allow for a really simple denial-of-service attack on any site using it. Basically, you write a very simple server (you can do it in Sinatra if you like) that sleeps for a really long time on a HEAD request to a specific URL. Then you just submit the URL multiple times to the form validating this URL. Repeat until you preoccupy every Mongrel or cause Passenger to spawn enough Apache processes to thrash. Of course, you could add a timeout on the check, but I could also use Mechanize... ;)</description>
		<content:encoded><![CDATA[<p>This does allow for a really simple denial-of-service attack on any site using it. Basically, you write a very simple server (you can do it in Sinatra if you like) that sleeps for a really long time on a HEAD request to a specific URL. Then you just submit the URL multiple times to the form validating this URL. Repeat until you preoccupy every Mongrel or cause Passenger to spawn enough Apache processes to thrash. Of course, you could add a timeout on the check, but I could also use Mechanize&#8230; ;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Connor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68183</link>
		<dc:creator>Tim Connor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 17:59:05 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68183</guid>
		<description>Trevor, you may be entirely right; I do not know. My only point is that counting on everyone else on the internet to honor &quot;this MUST NOT have side effects&quot; or whatever the language is causes things like the Google Accelerator fiasco, where lots of people&#039;s data got deleted.</description>
		<content:encoded><![CDATA[<p>Trevor, you may be entirely right; I do not know. My only point is that counting on everyone else on the internet to honor &#8220;this MUST NOT have side effects&#8221; or whatever the language is causes things like the Google Accelerator fiasco, where lots of people&#8217;s data got deleted.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Connor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68182</link>
		<dc:creator>Tim Connor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 17:57:20 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68182</guid>
		<description>Ah, I forgot to add the merely annoying one:

http:/serverthattimesout.com</description>
		<content:encoded><![CDATA[<p>Ah, I forgot to add the merely annoying one:</p>
<p>http:/serverthattimesout.com</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Berger</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68175</link>
		<dc:creator>Daniel Berger</dc:creator>
		<pubDate>Tue, 10 Feb 2009 12:03:35 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68175</guid>
		<description>require &#039;uri&#039;
URI.parse(url).host</description>
		<content:encoded><![CDATA[<p>require 'uri'<br />
URI.parse(url).host</p>
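<p>Building on that, here is a minimal sketch of a format-only check that makes no network request, assuming only absolute http/https URLs that name a host should pass. URI::HTTPS is a subclass of URI::HTTP, so a single is_a? test covers both schemes, and a single-slash URL such as http:/sdf38830.com parses with a nil host. The helper name plausible_http_url? is hypothetical.</p>

```ruby
require "uri"

# Hypothetical helper: true only for absolute http/https URLs with a host.
def plausible_http_url?(str)
  return false unless str.is_a?(String)

  uri = URI.parse(str.strip)
  # URI::HTTPS inherits from URI::HTTP; host is nil for "http:/foo".
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false # e.g. strings containing spaces
end
```

<p>So plausible_http_url?("http://example.com/page") is true, while "http:/sdf38830.com", "ftp://example.com" and "not a url" are all false.</p>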
]]></content:encoded>
	</item>
	<item>
		<title>By: Trevor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68170</link>
		<dc:creator>Trevor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 02:21:33 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68170</guid>
		<description>Yeah, I think the HEAD request is harmless. Maybe I&#039;m wrong, though.

This technique I&#039;m talking about isn&#039;t a method for preventing spam or anything - it&#039;s just a quick way to validate that URLs are accessible (e.g. not http:/sdf38830.com or something nonsensical like that).</description>
		<content:encoded><![CDATA[<p>Yeah, I think the HEAD request is harmless. Maybe I&#8217;m wrong, though.</p>
<p>This technique I&#8217;m talking about isn&#8217;t a method for preventing spam or anything &#8211; it&#8217;s just a quick way to validate that URLs are accessible (e.g. not http:/sdf38830.com or something nonsensical like that).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Caden</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68169</link>
		<dc:creator>Caden</dc:creator>
		<pubDate>Tue, 10 Feb 2009 02:10:16 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68169</guid>
		<description>Well, that was completely bizarre. Try adding more foil, Tim. You need more foil.</description>
		<content:encoded><![CDATA[<p>Well, that was completely bizarre. Try adding more foil, Tim. You need more foil.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Connor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68168</link>
		<dc:creator>Tim Connor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 02:08:43 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68168</guid>
		<description>Umm, I didn&#039;t mean for those to actually be turned into links, sorry.</description>
		<content:encoded><![CDATA[<p>Umm, I didn&#8217;t mean for those to actually be turned into links, sorry.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Connor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68167</link>
		<dc:creator>Tim Connor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 02:07:36 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68167</guid>
		<description>My last attempt got swallowed by your spam filter, I think.

Not a problem in the sense of your server getting hacked, but more along these lines:

http:/alqueda.com
http:/www.kiddiepr0n.com

http:/www.wellsfargo (thanks for the loan of your fat pipes for my DDOS)

http:/www.vulnerableserver.com/troublesome_url (at least they got your IP as the one that brought them down).

Basically, while I think HEADs will be mostly harmless, this still does leave you as an anonymous proxy in at least one way, for people who may know what to actually exploit (unlike me).</description>
		<content:encoded><![CDATA[<p>My last attempt got swallowed by your spam filter, I think.</p>
<p>Not a problem in the sense of your server getting hacked, but more along these lines:</p>
<p>http:/alqueda.com<br />
http:/www.kiddiepr0n.com</p>
<p>http:/www.wellsfargo (thanks for the loan of your fat pipes for my DDOS)</p>
<p>http:/www.vulnerableserver.com/troublesome_url (at least they got your IP as the one that brought them down).</p>
<p>Basically, while I think HEADs will be mostly harmless, this still does leave you as an anonymous proxy in at least one way, for people who may know what to actually exploit (unlike me).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Trevor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68163</link>
		<dc:creator>Trevor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 01:28:39 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68163</guid>
		<description>Err... I&#039;m not sure how exactly this could be a security problem. Perhaps an example would help?</description>
		<content:encoded><![CDATA[<p>Err&#8230; I&#8217;m not sure how exactly this could be a security problem. Perhaps an example would help?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Connor</title>
		<link>http://almosteffortless.com/2009/02/09/quick-and-dirty-url-validation/comment-page-1/#comment-68162</link>
		<dc:creator>Tim Connor</dc:creator>
		<pubDate>Tue, 10 Feb 2009 01:22:30 +0000</pubDate>
		<guid isPermaLink="false">http://almosteffortless.com/?p=1155#comment-68162</guid>
		<description>Abusable? Sure, HEAD isn&#039;t *supposed* to do anything, but I&#039;ll bet money there are sites out there that have URLs you shouldn&#039;t be blindly hitting. I guess a simple timeout wouldn&#039;t be the end of the world, but it could be moderately annoying. Also, I could make your servers show connections to TERRORISTS or kiddie porn servers, if I knew where to connect to them (which I don&#039;t).

All in all, I&#039;m not sure you want your server blindly connecting to another one just to check it.</description>
		<content:encoded><![CDATA[<p>Abusable? Sure, HEAD isn&#8217;t *supposed* to do anything, but I&#8217;ll bet money there are sites out there that have URLs you shouldn&#8217;t be blindly hitting. I guess a simple timeout wouldn&#8217;t be the end of the world, but it could be moderately annoying. Also, I could make your servers show connections to TERRORISTS or kiddie porn servers, if I knew where to connect to them (which I don&#8217;t).</p>
<p>All in all, I&#8217;m not sure you want your server blindly connecting to another one just to check it.</p>
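<p>If part of the worry is the server being pointed at hosts it should never touch, such as machines on its own private network, one partial mitigation (an assumption, not something proposed in the post) is to resolve the host first and refuse loopback, private and link-local addresses. The range list and the helper names public_address? and public_host? are illustrative. A name can resolve differently between this check and the real request, so for stronger protection you would connect to the exact address you checked.</p>

```ruby
require "ipaddr"
require "socket"

# Hypothetical deny-list: "this network", private, loopback and link-local
# ranges for IPv4 and IPv6. Extend it to suit your own network.
BLOCKED_RANGES = %w[
  0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16 172.16.0.0/12 192.168.0.0/16
  ::1/128 fc00::/7 fe80::/10
].map { |cidr| IPAddr.new(cidr) }

# True if the address string falls outside every blocked range.
def public_address?(ip)
  addr = IPAddr.new(ip).native # treat ::ffff:127.0.0.1 as 127.0.0.1
  BLOCKED_RANGES.none? { |range| range.include?(addr) }
rescue ArgumentError # IPAddr::InvalidAddressError is an ArgumentError
  false
end

# True only if every address the host name resolves to is public.
def public_host?(host)
  addresses = Addrinfo.getaddrinfo(host, nil, nil, :STREAM).map(&:ip_address)
  !addresses.empty? && addresses.all? { |ip| public_address?(ip) }
rescue SocketError
  false # unresolvable names are rejected too
end
```

<p>Here public_address?("8.8.8.8") is true, while "127.0.0.1", "10.1.2.3", "169.254.169.254" and "::ffff:127.0.0.1" are all false.</p>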
]]></content:encoded>
	</item>
</channel>
</rss>
