Chapter 3. Programming

Table of Contents

1. The Apache Portable Runtime
1.1. APR Tables
1.2. APR Arrays
2. The Apache API
2.1. The Request Class
2.2. Queries and Parameters
2.3. Cookies
2.4. Redirection
2.5. Request Termination
2.6. Output: Buffering and Content Generation
2.7. Logging

You now understand how to configure Apache to route requests to your handlers. With that squared away, you need to actually code something in your handler(s), which leads us to the API, the subject of this chapter.

If you are familiar with the Apache C API, much of this material will look familiar. If you are not, well ... that's the point of this chapter. While it is easy to document an API, it can be difficult to figure out how to use an API to best implement complex (and even common) tasks. There are many common scenarios and issues that arise again and again in developing web applications. They mainly center around how to use an API to manipulate various aspects of the underlying Hypertext Transfer Protocol (HTTP): things like redirection, managing URL encoding, parsing query parameters, reading HTTP headers or POST parameters or CGI variables, setting the HTTP status codes or content type, setting cookies ... the list goes on. The best way to illustrate the API is by looking at concrete examples that accomplish such common tasks.

To that end, this section is a cookbook of sorts. It takes a bottom-up approach, starting with the design and construction of the core Apache classes as the basic ingredients, explaining how they work and fit together. It then reverses, taking a top-down approach using those ingredients to illustrate different recipes common to many web applications.

Through many practical examples, you will hopefully get a solid grasp of how everything fits together, and how best to use the API. One of the design goals of the framework was to create a useful level of abstraction, making things as simple as possible, but at the same time never to hide (or discourage the use of) the low-level components of the framework. As such, we will be spending a lot of time with the lowest levels of the framework, namely the Apache::Request object. We are going to look at how it and the rest of the framework manage the kinds of common tasks listed above.

1. The Apache Portable Runtime

Before we get started with the Apache API, we need to cover two important data structures that are used heavily within it, both of which are part of the Apache Portable Runtime (APR). These are tables (APR::Table) and arrays (APR::Array). They are the low-level data structures that the Apache C API frequently uses to organize groups of data.

1.1. APR Tables

The most important of these by far is the APR::Table class. This class is a little bit weird at first, but when you get the hang of it, you’ll start to see why it’s so useful. APR::Table basically works like both an array and a hash, depending on how you use it. There are methods that make it work as an array, and methods that make it work as a hash.

No matter how it is used, internally it stores all of its data as key/value pairs. What differentiates APR::Table from a normal Hash is how it handles collisions. A collision occurs when you add a value for a key that is already present in the Hash. For example:

table['Set-Cookie'] = 'Domain=.jujifruit.com'
table['Set-Cookie'] = 'Path=/'

In this example, a normal Hash will only have one key/value pair. The key is Set-Cookie, and the value is Path=/, which is the last assignment made. The previous value was overwritten. Hashes therefore manage collisions by overwriting the old value with the new one. We can say that in Hashes, keys are unique. You cannot have multiple key/value pairs that share the same key.
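For comparison, the overwrite behavior just described can be checked with a plain Ruby Hash. This is ordinary Ruby and does not involve APR at all; the variable name headers is only illustrative.

```ruby
# A plain Ruby Hash keeps keys unique: a second assignment to the
# same key replaces the earlier value instead of adding a new pair.
headers = {}
headers['Set-Cookie'] = 'Domain=.jujifruit.com'
headers['Set-Cookie'] = 'Path=/'

puts headers.size            # number of key/value pairs
puts headers['Set-Cookie']   # the last value assigned
```

Only one pair survives, and it holds the value from the second assignment.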

This is not the case with APR::Table. Its keys are not unique. You can have multiple key/value pairs with the same key. And the weird thing is, you can still access the data as if it were a Hash. The only catch is that when you access the data for a given key, you will only get the first match. If you want all of the associated values, then you have to iterate over the table like an array. Are you confused yet? Time for an example.

Let’s look at collision handling, iteration, and hash access all at once (assume this is in a file called test.rhtml):

table = APR::Table.new(APR::Pool.new())

# Dumps the contents of a Table
def dump(t)
  puts 'Dumping Table:'
  t.each do |key, value|
    puts "  #{key}=#{value}"
  end
end

table.add('Set-Cookie', 'lp=13ds4; Domain=.jujifruit.com; Path=/')
table.add('Set-Cookie', 'edge=453fd; Domain=.jujifruit.com; Path=/')

dump(table)

puts
puts "table['Set-Cookie']='#{table['Set-Cookie']}'"

We add values with the same key, just like before. When we iterate over the table, we get:

Dumping Table:
  Set-Cookie=lp=13ds4; Domain=.jujifruit.com; Path=/
  Set-Cookie=edge=453fd; Domain=.jujifruit.com; Path=/

table['Set-Cookie']='lp=13ds4; Domain=.jujifruit.com; Path=/'

See what happens? Both key/value pairs are retained. Now we have duplicate keys with unique values. When we iterate over the table like we would a normal hash, all of the key/value pairs come out. Yet if we access a single value, we only get the first match. Now you are starting to get the essence of the APR::Table: it can act like both an array and a hash at the same time.
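Since hash-style access returns only the first match, gathering every value for a key means iterating. The following is a minimal sketch of that idea, assuming only that the object's each() yields key/value pairs as shown above. The helper name values_for is hypothetical (it is not part of the API), the comparison is an exact string match, and a plain array of pairs stands in for a real table so the sketch runs outside Apache.

```ruby
# Collects every value stored under +key+ by walking the pairs in order.
# Works with any object whose each() yields key/value pairs.
# (values_for is a hypothetical helper, not part of APR::Table.)
def values_for(pairs, key)
  matches = []
  pairs.each do |k, v|
    matches << v if k == key
  end
  matches
end

# Stand-in data for illustration only (not a real APR::Table).
pairs = [
  ['Set-Cookie', 'lp=13ds4; Domain=.jujifruit.com; Path=/'],
  ['Content-Type', 'text/html'],
  ['Set-Cookie', 'edge=453fd; Domain=.jujifruit.com; Path=/']
]

values_for(pairs, 'Set-Cookie').each { |value| puts value }
```

The helper returns an empty array when the key is absent, which keeps calling code free of nil checks.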

But there’s more! APR::Table can handle collisions an alternate way. Notice we used the add() method. That method handles collisions by simply tacking on more key/value pairs. But there is also the merge() method. It handles collisions by keeping the key unique, but appending the new data to the current value. Consider the following example:

table = APR::Table.new(APR::Pool.new())

# Dumps the contents of a Table
def dump(t)
  puts 'Dumping Table:'
  t.each do |key, value|
    puts "  #{key}=#{value}"
  end
end

table.merge('Accept-Charset', 'ISO-8859-1')
table.merge('Accept-Charset', 'utf-8')

dump(table)

puts
puts "table['Accept-Charset']='#{table['Accept-Charset']}'"

Here is what we get:

Dumping Table:
  Accept-Charset=ISO-8859-1, utf-8

table['Accept-Charset']='ISO-8859-1, utf-8'

See what happens here? Same key, but the value keeps growing. merge() simply appends the new value on to the end. So you can decide how you want to handle collisions by which methods you use to set the data. You can choose array-ish behavior with add(), an aggregating hash behavior with merge(), and then there is the pure hash behavior with a third method: set(). So what you really have to adjust to is just the set/get methods of APR::Table. Here are the three methods again in review:

  • add(key, value): This method adds a new key/value pair to the table. In the case of a collision, add() will just tack on the new key/value pair, causing the table to take on array-like characteristics.

  • merge(key, value): This works similarly to add() for a new key/value pair: it will simply add it to the table. In the case of a collision, however, merge() will append value to the end of the current value of the (first) key/value pair already in the table. The operative word here is first. If there are already several key/value pairs added from add(), a subsequent call to merge() will operate only on the first match it finds for key. So keep in mind what happens when you mix these two.

  • set(key, value): This works like add() for new key/value pairs. In the event of a collision, it will replace the old value (scalar or array) with the new value. That is, even if there are multiple key/value pairs for the given value of key, set() will first wipe them all out.

So to be absolutely clear about how these methods work independently and together, let’s look at one final example:

table = APR::Table.new(APR::Pool.new())

# Dumps the contents of a Table
def dump(t)
  puts 'Dumping Table:'
  t.each do |key, value|
    puts "  #{key}=#{value}"
  end
end

table.add('Set-Cookie', 'lp=13ds4; Domain=.jujifruit.com; Path=/')
table.add('Set-Cookie', 'edge=453fd; Domain=.jujifruit.com; Path=/')
dump(table)

table.merge('Set-Cookie', 'ISO-8859-1')
table.merge('Set-Cookie', 'utf8')
dump(table)

table.set('Set-Cookie', 'wiped out')
dump(table)

Here is what we get:

Dumping Table:
  Set-Cookie=lp=13ds4; Domain=.jujifruit.com; Path=/
  Set-Cookie=edge=453fd; Domain=.jujifruit.com; Path=/
Dumping Table:
  Set-Cookie=lp=13ds4; Domain=.jujifruit.com; Path=/, ISO-8859-1, utf8
  Set-Cookie=edge=453fd; Domain=.jujifruit.com; Path=/
Dumping Table:
  Set-Cookie=wiped out

Additionally, APR::Table supports the []= operator as well, which is just another way of calling set() (as you would expect).
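To experiment with these collision rules outside of Apache, they can be modeled in a few lines of plain Ruby. The class below, TableSketch, is a hypothetical stand-in written for this illustration; it is not the real APR::Table and makes no claim about how the APR implements tables internally. It compares keys as exact strings and does not try to reproduce where set() places the surviving pair.

```ruby
# Hypothetical pure-Ruby model of the collision rules described above.
# It is NOT the real APR::Table; it only mimics add/merge/set/[] semantics.
class TableSketch
  include Enumerable

  def initialize
    @pairs = [] # ordered list of [key, value] pairs
  end

  # Always appends a new pair, even if the key already exists.
  def add(key, value)
    @pairs << [key, value]
  end

  # Appends to the value of the first matching pair, or adds a new pair.
  def merge(key, value)
    first = @pairs.find { |k, _| k == key }
    if first
      first[1] = "#{first[1]}, #{value}"
    else
      add(key, value)
    end
  end

  # Removes every pair for the key, then stores a single new pair.
  def set(key, value)
    @pairs.reject! { |k, _| k == key }
    add(key, value)
  end
  alias_method :[]=, :set

  # Returns the value of the first matching pair, or nil if none.
  def [](key)
    pair = @pairs.find { |k, _| k == key }
    pair && pair[1]
  end

  # Yields each key/value pair in insertion order.
  def each
    @pairs.each { |k, v| yield k, v }
  end
end

t = TableSketch.new
t.add('Set-Cookie', 'a=1')
t.add('Set-Cookie', 'b=2')
t.merge('Set-Cookie', 'c=3')
t.each { |key, value| puts "#{key}=#{value}" }

t['Set-Cookie'] = 'wiped out'
puts t['Set-Cookie']
```

Keeping the data as an ordered list of pairs, rather than a Hash, is what allows duplicate keys while still supporting first-match lookup.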

Now, you may still have the question “Why? What’s the point?” Well, the previous examples give some hint of that. Notice that they all involved HTTP headers. Some HTTP headers, like Set-Cookie, can have duplicate keys. In this case, APR::Table’s add() comes in handy. Other HTTP headers, like Accept-Charset, have a single key but multiple comma-delimited values. In this case, APR::Table’s merge() comes in handy. The Apache developers recognized this and said “To keep our code clean, we need something that can easily handle both situations elegantly.” And thus the APR table (known as the apr_table_t struct in the APR C API) was born. When you do much work with HTTP headers, it doesn’t take long for you to appreciate APR::Table’s flexibility.

You’ll probably be glad to know that the remaining methods of APR::Table are straightforward. They all follow the APR table C API. The complete documentation for APR::Table is in Section 6, “The APR Table Class”.

1.2. APR Arrays

Finally, a word on the APR::Array class. For all intents and purposes, it’s a read-only Ruby array. It has two methods: each() and size() (ideally, it also needs at least a [] operator). Currently, the only method in the API that returns an APR::Array is Request::content_languages().
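Because only each() and size() are available, one way to get indexed access is to copy the elements into a regular Ruby Array. The sketch below illustrates this under stated assumptions: ReadOnlyListSketch is a hypothetical stand-in that exposes the same two methods (it is not the real APR::Array), and to_ruby_array is a hypothetical helper name.

```ruby
# Hypothetical stand-in exposing only each() and size(), as described above.
# It is NOT the real APR::Array.
class ReadOnlyListSketch
  def initialize(items)
    @items = items.dup.freeze
  end

  # Yields each element in order.
  def each
    @items.each { |item| yield item }
  end

  # Returns the number of elements.
  def size
    @items.size
  end
end

# Copies the elements into a regular Ruby Array using only each().
def to_ruby_array(list)
  copy = []
  list.each { |item| copy << item }
  copy
end

languages = to_ruby_array(ReadOnlyListSketch.new(['en', 'fr']))
puts languages[0]    # indexed access works on the copy
puts languages.size
```

The copy is an independent Array, so changing it has no effect on the original list.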