This is my first tutorial on Ruby programming. I want a simple web document reader that gathers news titles from the web and saves them to a text file.
So the plan is: a Ruby-based client that downloads pages from the web, extracts the relevant parts and saves them to a text file. That involves more than can be crammed into one tutorial, so this one covers only the downloading part.
I assume you have Ruby installed. If not, see http://www.ruby-lang.org/en/downloads/
Once it is installed, go to the command prompt.
If you are on Windows XP, open Start > Run…
and type ‘cmd’ to open it.
When the command prompt opens, type ‘irb’ to enter ‘Interactive Ruby’:
...and we are ready to go!
Type: puts "hello web!"
irb(main):001:0> puts "hello web!"
hello web!
=> nil
irb(main):002:0>
So that’s our obligatory first test. Irb (Interactive Ruby) does what is expected. The ‘puts’ command tells Ruby to print that string of letters. As Ruby in Twenty Minutes tells us, nil “is Ruby’s absolutely-positively-nothing value.” Here it is simply what ‘puts’ returns after it has printed. Never mind the details for now; I want irb to print out the contents of a website.
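To see that the ‘=> nil’ line really is just the return value of ‘puts’, here is a small sketch you could paste into irb. The variable name result is my own choice for illustration.

```ruby
# puts prints its argument and then returns nil.
result = puts "hello web!"

# p shows the raw value instead of an empty line.
p result

# nil is a real object: it has a class and answers true to nil?
puts result.nil?
puts result.class
```

So nil is not an error here, it is only irb showing what the last expression returned.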
I’ll go to http://stdlib.rubyonrails.org/ and look for http:
net/http > Net::HTTP
Here is the link for the frame: http://stdlib.rubyonrails.org/libdoc/net/http/rdoc/classes/Net/HTTP.html
Let’s try the first example and put a real host and path into the URL placeholders:
require 'net/http'
Net::HTTP.get_print 'www.juy.fi', '/kurssit/index.html'
Copy and paste (or type) that into irb and it should print the contents of that HTML document to the console. If so, success!
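‘get_print’ writes the page straight to the console. If you would rather keep the page in a variable, a minimal sketch could use Net::HTTP.get, which returns the body as a string. The helper name fetch_page is hypothetical, and the rescue clause is only my guess at sensible error handling.

```ruby
require 'net/http'
require 'uri'

# URI() splits the address into the same two pieces get_print asked for.
uri = URI('http://www.juy.fi/kurssit/index.html')
puts uri.host   # the host part of the address
puts uri.path   # the path part of the address

# Hypothetical helper: returns the page body as a string,
# or nil if the download fails (no network, unknown host, ...).
def fetch_page(uri)
  Net::HTTP.get(uri)
rescue StandardError => e
  warn "Download failed: #{e.message}"
  nil
end
```

In irb, page = fetch_page(uri) followed by puts page.length if page should then show how many characters came back, assuming the site is reachable.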
By the way, pasting code into irb is not very user friendly in the Windows command prompt. Usually you can find a paste command in the right-click context menu, but here it only seems to work from the title bar: right-click the title bar and pick Paste under Edit.
That’s not too user friendly, but still more convenient than typing everything by hand. (Thinking of making an AutoHotkey script to simulate Linux shell functionality to remedy this... speaking of which, it is here)
Another thing to enhance the coding pleasure would be to adjust some command prompt settings. I think the default font size is too small. Go to Properties (the bottom item in that same title-bar menu) and make your preferred changes. Well... I guess there is (or should be) a nicer way to play with irb in Windows?
The other Net::HTTP documentation examples look promising, but for some reason I had trouble with them. I’m looking for a really simple snippet, so I resort to The Ruby Programming Language, the book by D. Flanagan and Y. Matsumoto. Here are 5 lines that work:
require "open-uri"                                 # Required library
f = open("http://www.juy.fi/kurssit/index.html")   # put a self promoting web address here
webpage = f.read                                   # Read it as one big string
f.close                                            # Don't forget to close!
puts webpage
If you get a bunch of HTML code running across your screen you can cheer Hurray! That is what is supposed to happen.
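One caveat: on newer Rubies (3.0 and later, as far as I know) a plain ‘open’ no longer accepts web addresses, and open-uri offers URI.open instead (available since Ruby 2.5). Here is a hedged sketch of the same download with that call, using the block form so the handle gets closed automatically. The method name read_page is my own invention.

```ruby
require "open-uri"

# Hypothetical helper: returns the page as one big string,
# or nil if anything goes wrong (no network, bad address, HTTP error).
def read_page(url)
  URI.open(url) { |f| f.read }   # the block form closes f for us
rescue StandardError => e
  warn "Could not read #{url}: #{e.message}"
  nil
end
```

Usage would be something like webpage = read_page("http://www.juy.fi/kurssit/index.html") and then puts webpage if webpage.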
The next step is to do something with this variable called webpage that contains the HTML code. For example, how do I extract the title tag? I would suppose there are a number of ways to do it in Ruby. Python has ‘Beautiful Soup’ and others like it; I’m sure Ruby must have something like that too. If this website is valid XHTML I should be able to parse it with some XML parsing tools too. But I will take the rudimentary string processing option.
So strings and Ruby... That will be the subject of the next post.
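Just to give a taste of what the string processing option might look like, here is a rough sketch that pulls the title out with a regular expression. The sample HTML string is made up so the snippet runs without a network connection, and a regex like this is only an assumption that holds for simple pages, not a real HTML parser.

```ruby
# Made-up sample standing in for the downloaded webpage string.
webpage = "<html><head><title> Example courses </title></head>" \
          "<body><h1>Hello</h1></body></html>"

# String#[] with a regexp and a group number returns that capture,
# or nil when nothing matches. i = ignore case, m = let . span newlines.
title = webpage[/<title[^>]*>(.*?)<\/title>/im, 1]

# strip removes the spaces around the title text before printing it.
puts title.strip if title
```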