Hello Website 2 and some Work with Strings Ruby tutorial

Continuing from the previous tutorial, we are now here:

require "open-uri"                         # Required library
# URI.open is needed on Ruby 3.0 and later; older versions also accepted plain open
f = URI.open("http://www.juy.fi/kurssit/index.html")  # Put a (self-promoting) web address here
webpage = f.read                           # Read it as one big string
f.close                                    # Don't forget to close!
puts webpage
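Since closing the handle is easy to forget, here is a minimal sketch of the same fetch in block form, where the handle is closed automatically when the block ends. The method name fetch_page is hypothetical (it is not part of open-uri), and the sketch assumes a Ruby version that has URI.open (2.5 or later).

```ruby
require "open-uri"

# Hypothetical helper: fetch a page and return its body as one big string.
# The block form of URI.open closes the handle when the block finishes.
def fetch_page(url)
  URI.open(url) { |f| f.read }
end

# Usage (needs network access):
#   webpage = fetch_page("http://www.juy.fi/kurssit/index.html")
#   puts webpage.class
```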

The aim is to extract the title tag from the webpage, which is now stored in a variable called ‘webpage’ (you can certainly try this with any web address you wish; I’ll continue with this slightly self-centered approach).

Let’s first check that we are dealing with a string type:

webpage.class

Here is the result:

irb(main):006:0> webpage.class
=> String
irb(main):007:0>

String class it is. Take a look at the methods this class has to offer:

webpage.methods

This outputs a long list of methods. ‘find’ from the list seems promising.
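The list is long, so it may help to filter it by name. A small sketch, assuming we are only interested in methods whose name contains "index"; the sample string is made up, and the exact list you get depends on your Ruby version:

```ruby
# Hypothetical sample string; any String instance has the same methods
sample = "hello"

# methods returns an array of method names; grep keeps those matching the pattern
matches = sample.methods.grep(/index/)
puts matches.sort.inspect
```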

Where do I go for reference?
Starting here:
http://www.ruby-lang.org/en/documentation/
That page lists an overwhelming number of links.

http://pine.fm/LearnToProgram/
looks promising as a general tutorial.

But this must be what I’m looking for:
http://www.ruby-doc.org/core/

Frames again! :(
Those with .c extensions look like something for C developers, so I’ll search the middle column.
Sure enough, there is a String section.
Let’s get out of the frames:
http://www.ruby-doc.org/core/classes/String.html

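But where is the find method? It is not listed on this page (find comes from the Enumerable module, which older Ruby versions mixed into String; since Ruby 1.9 String no longer has it). Be that as it may, let’s try the index method instead: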

webpage.index('<title>')

We get a number: the index at which our search string ‘<title>’ starts in the page.
Now query the length of the string:

webpage.length
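To see what these two calls return without downloading anything, here is a small sketch on a made-up page (the sample string is an assumption, not the real content of the site). Note that index returns nil when the search string is not found.

```ruby
# Hypothetical sample page, standing in for the downloaded string
sample = "<html><head><title>Courses</title></head></html>"

start = sample.index('<title>')   # position of the first character of the match
missing = sample.index('<h1>')    # nil, because the substring does not occur
total = sample.length             # number of characters in the string

puts start            # 12
puts missing.inspect  # nil
puts total            # 48
```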

Let’s try to fetch a substring:

webpage[0,20]

produces:

irb(main):135:0> webpage[0,20]
=> "<!DOCTYPE html PUBLI"
irb(main):136:0>
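The two numbers in the brackets are a start position and a character count. A short sketch on a made-up page (the sample string is an assumption) shows how that behaves near the edges of the string:

```ruby
# Hypothetical sample page, standing in for the downloaded string
sample = "<html><head><title>Courses</title></head></html>"

first_twenty = sample[0, 20]   # 20 characters starting at position 0
tag = sample[12, 7]            # 7 characters starting at position 12
clipped = sample[40, 100]      # fewer than 100 characters: the string ends first
beyond = sample[100, 5]        # nil, because the start is past the end of the string

puts first_twenty      # <html><head><title>C
puts tag               # <title>
puts clipped           # ></html>
puts beyond.inspect    # nil
```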

Next I’ll nest these things together:

webpage[webpage.index('<title>'),webpage.index('</title>')]

Not exactly the contents of the title here: the second argument is a length, not an end position, so the result runs past the closing tag. Another approach should do better:
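Here is what the nested call does on a made-up page (the sample string is an assumption). The position of the closing tag gets used as a character count:

```ruby
# Hypothetical sample page, standing in for the downloaded string
sample = "<html><head><title>Courses</title></head></html>"

a = sample.index('<title>')    # 12
b = sample.index('</title>')   # 26

nested = sample[a, b]          # 26 characters starting at position 12
puts nested                    # <title>Courses</title></he
puts nested.length             # 26
```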

titleStartIndex = webpage.index('<title>')
titleEndIndex = webpage.index('</title>')
titleLength = titleEndIndex - titleStartIndex
webpage[titleStartIndex,titleLength]

We create three variables to hold numerical values indicating text positions. titleStartIndex holds the position where ‘<title>’ begins in webpage. titleEndIndex holds the position where the closing tag begins. Subtracting the first from the second gives titleLength. Finally, webpage[titleStartIndex,titleLength] returns the requested sequence.
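Tracing the three variables on a made-up page (the sample string is an assumption) gives the following numbers:

```ruby
# Hypothetical sample page, standing in for the downloaded string
sample = "<html><head><title>Courses</title></head></html>"

titleStartIndex = sample.index('<title>')        # 12
titleEndIndex = sample.index('</title>')         # 26
titleLength = titleEndIndex - titleStartIndex    # 26 - 12 = 14
result = sample[titleStartIndex, titleLength]

puts result                                      # <title>Courses
```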

But it’s not quite there: the result still starts with the opening tag. We need to shift the start index to the end of the opening title tag and try again:

titleStartIndex = titleStartIndex + '</title>'.length
titleLength = titleEndIndex - titleStartIndex
webpage[titleStartIndex,titleLength]
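Running this attempt on a made-up page (the sample string is an assumption) shows what comes back:

```ruby
# Hypothetical sample page, standing in for the downloaded string
sample = "<html><head><title>Courses</title></head></html>"

titleStartIndex = sample.index('<title>')                # 12
titleEndIndex = sample.index('</title>')                 # 26
titleStartIndex = titleStartIndex + '</title>'.length    # 12 + 8 = 20
titleLength = titleEndIndex - titleStartIndex            # 26 - 20 = 6
result = sample[titleStartIndex, titleLength]

puts result                                              # ourses
```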

Off by one step: ‘</title>’ is one character longer than ‘<title>’, so the start index lands one character too far. The cure is to add the length of the opening tag instead:

titleStartIndex = webpage.index('<title>')
titleStartIndex = titleStartIndex + '<title>'.length
titleEndIndex = webpage.index('</title>')
titleLength = titleEndIndex - titleStartIndex
webpage[titleStartIndex,titleLength]
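The same steps can be wrapped into a small method. This is only a sketch: the name extract_title is hypothetical, and it assumes lowercase title tags without attributes. It uses a range (start...finish) instead of a start and a length, and returns nil when either tag is missing.

```ruby
# Hypothetical helper: returns the text between <title> and </title>, or nil
def extract_title(html)
  open_tag = '<title>'
  start = html.index(open_tag)
  return nil if start.nil?                 # no opening tag found

  start += open_tag.length                 # move past the opening tag
  finish = html.index('</title>', start)   # search only after the opening tag
  return nil if finish.nil?                # no closing tag found

  html[start...finish].strip               # up to, but not including, finish
end

# Hypothetical sample page, standing in for the downloaded string
sample = "<html><head><title>Courses</title></head></html>"
puts extract_title(sample)                 # Courses
```

With the page from the beginning of this tutorial, the call would be extract_title(webpage).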

And that’s it. The next step is going to be working with the actual news site and news titles.

About learnprogramruby

My name is Jukka Ylitalo. I live in the Helsinki metropolitan area in Finland. I'm a philosophy major, media artist and a travelling lecturer on "digital design". At the moment one of the things I want to do is to learn to program in Ruby.