2. The Apache API

2.1. The Request Class

The Apache::Request class is the principal class you will use to do most of the low-level work, as it contains the most common information (queries, POST parameters, cookies, etc.) which in turn comes primarily from the HTTP headers and/or the CGI environment.

2.1.1. HTTP Headers

As you now understand how APR::Tables work, let’s take an opportunity to explore how they and everything else fit together. We will look specifically at how HTTP headers are managed. If you understand this, everything else is easy. This demonstrates not only the subtleties of using APR tables, but also the core linkage between the APR classes and the Apache::Request classes. HTTP headers (in and out) are a large component of handling a request, second only to content, which we will look at later.

To start out, let’s look at part of the Ruby implementation of the Apache::Request class:

module Apache

class Request

  def setCookie( name, value,
                 days=0, minutes=0, path=nil )
      
    path = '/' if path == nil

    cookieString = "%s=%s;path=%s;domain=.%s" 
    cookieString = cookieString % [name, value, path, domainName()]
    
    if days != 0 or minutes != 0
      if days==-1 and minutes==-1
        cookieString += ";Expires=Sun, 17-Jan-2038 19:14:07 -0600"
      else
        cookieString += ";Expires=%s" % expirationDate(days, minutes)
      end
    end

    # Append this cookie to the array
    self.headers_out.add('Set-Cookie', cookieString)
  end

  # Redirect the client to url.
  def redirect(url)

    # Set the Apache req.status
    self.set_status(302)

    # Also set it in the headers, just in case
    self.headers_out['Status']   = "302"
    self.headers_out['Location'] = url

    # This will terminate subsequent request processing
    raise ModRuby::Redirect.new(url)
  end

end # class Request

end # module Apache

Look at setCookie(). Cookies are stored in the @headers_out member, which is where all HTTP headers are stored. This is an APR::Table object. All cookie values are stored in a list associated with the key "Set-Cookie". Similarly, the redirect(), expires(), and dontCache() methods all operate on the @headers_out table as well. Note that these are outgoing headers — response headers, headers sent back to the client.
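
The difference between adding a header and assigning one can be sketched as follows. MultiTable is a hypothetical stand-in written for this illustration, not the real APR::Table. It assumes only what is described above: add() appends another value under the same key, while []= leaves a single value for that key.

```ruby
# Hypothetical stand-in for a multi-valued header table (not APR::Table itself).
class MultiTable
  def initialize
    @entries = []            # ordered list of [key, value] pairs
  end

  # Append a value, keeping any existing values for the same key.
  def add(key, value)
    @entries << [key, value]
  end

  # Replace all existing values for the key with a single one.
  def []=(key, value)
    @entries.reject! { |k, _| k.casecmp?(key) }
    @entries << [key, value]
  end

  # Return every value stored under the key (header names are case-insensitive).
  def values(key)
    @entries.select { |k, _| k.casecmp?(key) }.map { |_, v| v }
  end
end

headers_out = MultiTable.new
headers_out.add('Set-Cookie', 'larry=1;path=/')
headers_out.add('Set-Cookie', 'curly=2;path=/')
headers_out['Location'] = 'http://example.com/a'
headers_out['Location'] = 'http://example.com/b'
```

In this sketch both cookies survive under "Set-Cookie", whereas the second Location assignment replaces the first. This is why setCookie() uses add() and redirect() uses assignment.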

So at some point these headers have to be shipped out — sent back to the client. And they have to be sent before any of the content. This happens in one of two ways:

Explicitly: Using the Request::flush() method will explicitly send them on their way.
Implicitly: Sending any kind of content will internally trigger Apache to first send the headers out.

Look at the implementation of flush(). It calls self.rflush(). This is where we cross into the C implementation of Apache::Request. This is basically a wrapper over the ap_rflush() function in the Apache C API. When this is called, the headers are automatically flushed and sent to the client.

Error Headers

Note also that the Apache::Request class has both a headers_out() and err_headers_out(). Each returns a reference to the APR::Table in the Apache request struct. They are two distinctly different tables, and it is important that you understand the difference. When an error or internal redirect takes place, the @headers_out are cleared out and only the values in @err_headers_out are retained.

The difference mainly centers on the HTTP status code sent back to the client. Error fields are sent back if the module aborts or returns an error status code. So if the status code is success (200), then the contents of @headers_out are sent back as the HTTP headers. However, for any other status code the @err_headers_out are used. This has very important implications. For example, in doing an HTTP redirect, you have to use the error headers rather than the regular headers because a redirect involves returning a 302 status code. The only exception to this is when you redirect by setting the Location header, which is a special case, and then you may use @headers_out. That is why the Apache::Request::copyErrorHeaders() method exists. It is a convenience function that copies all entries in @headers_out to @err_headers_out.

So just remember, if you manually set the HTTP status code, you need to make sure that you copy any relevant headers you want to send back to the error headers and not to the regular headers (as they will not be sent in this case), the only exception being if you set the location header.
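
The rule just described can be sketched as a toy model. Plain Ruby hashes stand in for the two APR tables, and both method names are hypothetical. This models only the rule as stated in this section, not Apache's actual implementation.

```ruby
# Toy model of the rule described above; plain Hashes stand in for APR tables.
def headers_to_send(status, headers_out, err_headers_out)
  status == 200 ? headers_out : err_headers_out
end

# Hypothetical equivalent of copyErrorHeaders(): copy every regular header
# into the error table so it survives a non-200 response.
def copy_error_headers(headers_out, err_headers_out)
  headers_out.each { |key, value| err_headers_out[key] = value }
  err_headers_out
end

headers_out     = { 'Set-Cookie' => 'sid=abc;path=/' }
err_headers_out = {}

# With a 404 status the cookie is lost unless it is copied first.
lost = headers_to_send(404, headers_out, err_headers_out).dup
copy_error_headers(headers_out, err_headers_out)
kept = headers_to_send(404, headers_out, err_headers_out)
```

Here lost is empty, while kept contains the cookie header because it was copied into the error table before the response was assembled.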

And that’s the story — HTTP headers. If you understand this, everything else is cake, as you can now see the pathway from the high-level Apache::Request object all the way into the APR::Tables of the Apache C API. Think of the Apache::Request class as one big API that contains the low-level API to do pretty much everything.

2.1.2. CGI Environment

Apache request objects have what is called an "environmental table." This is an internal table which holds the environmental variables set for the web server process. This happens to be the canonical way that web servers pass HTTP parameters to CGI processes. As such, the Apache::Request object contains both standard CGI environment variables and HTTP headers for convenience. As this behavior is optional in Apache, internally, the ModRuby module calls two Apache C API functions (ap_add_cgi_vars() and ap_add_common_vars()) which cause Apache to load CGI variables into the request’s environmental table. The high-level data structure containing this information is the Apache::Request::cgi member, which in turn is an APR::Table. The following code illustrates this:

<%
@request.cgi.each do |key, value|
  puts "%20s: %s" % [key, value]
end
%>

This example is in http/cgi.rhtml. It yields the following:

                VLOG: moduby
   GATEWAY_INTERFACE: CGI/1.1
     SERVER_PROTOCOL: HTTP/1.1
      REQUEST_METHOD: GET
        QUERY_STRING:
         REQUEST_URI: /http/cgi.rhtml
         SCRIPT_NAME: /http/cgi.rhtml
           HTTP_HOST: www.modruby.org
     HTTP_USER_AGENT: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3pre)
         HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
HTTP_ACCEPT_LANGUAGE: en-us,en;q=0.5
HTTP_ACCEPT_ENCODING: gzip,deflate
 HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.7
     HTTP_KEEP_ALIVE: 300
     HTTP_CONNECTION: keep-alive
        HTTP_REFERER: http://www.modruby.org/
         HTTP_COOKIE: test=1; sid=439sdkkfdjks; galations=6:1-10
  HTTP_CACHE_CONTROL: max-age=0
                PATH: /usr/local/bin:/usr/bin:/bin
    SERVER_SIGNATURE:
Apache/2.2.8 (Ubuntu) Server at www.modruby.org Port 80

     SERVER_SOFTWARE: Apache/2.2.8 (Ubuntu)
         SERVER_NAME: www.modruby.org
         SERVER_ADDR: 127.0.0.1
         SERVER_PORT: 80
         REMOTE_ADDR: 127.0.0.1
       DOCUMENT_ROOT: /var/www/modruby-example/content
        SERVER_ADMIN: webmaster@modruby.org
     SCRIPT_FILENAME: /var/www/modruby-example/content/http/cgi.rhtml
         REMOTE_PORT: 51183

There can be more or fewer variables depending on the OS and the nature of the request (whether there are cookies or not, etc.).
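
The HTTP_* entries in the listing above follow the usual CGI naming convention: the request header name is upcased, hyphens become underscores, and HTTP_ is prefixed. A minimal sketch of that mapping follows. The helper name is hypothetical, and it does not handle headers such as Content-Type that CGI exposes without the HTTP_ prefix.

```ruby
# Convert an HTTP request header name to its CGI environment variable name.
# (Hypothetical helper for illustration; not part of the ModRuby API.)
def cgi_variable_name(header_name)
  'HTTP_' + header_name.upcase.tr('-', '_')
end

user_agent_key = cgi_variable_name('User-Agent')
language_key   = cgi_variable_name('Accept-Language')
```

With this convention, the User-Agent header is found under HTTP_USER_AGENT and Accept-Language under HTTP_ACCEPT_LANGUAGE, matching the keys in the output above.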

2.2. Queries and Parameters

Queries and parameters are stored in @request.queries and @request.params, both of which are APR tables in the Apache::Request object. Again, you can iterate over APR tables just like a Ruby hash:

<form name="input" action="post.rhtml?queryarg=junior+mints" method="post">
<input type="text" name="text" value="jujifruit">
<input type="submit" value="Submit">
</form>

<pre>
<%
puts "Queries: "
queries = @request.queries()

queries.each do |key, value|
  puts "  #{key}=#{value}" 
end

puts "Params: "
params = @request.params

params.each do |key, value|
  puts "  #{key}=#{value}" 
end
%>

This example is in http/post.rhtml. Click the Submit button. It yields the following:

Queries:
  queryarg=junior mints
Params:
  text=jujifruit

Parameters that come from POST methods have content type application/x-www-form-urlencoded. For all other content types, the value of @request.params will be nil. This means you will have to access the raw content via @request.content() and process it yourself, be it XML, JSON, etc.
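
As a sketch of what processing the content yourself might look like, the following falls back to parsing the raw body as JSON when no form parameters are present. FakeRequest is a hypothetical stand-in with only the two members used here, and body_data is an illustrative helper, not a ModRuby method.

```ruby
require 'json'

# Hypothetical stand-in for the request object: only the two members used here.
FakeRequest = Struct.new(:params, :content)

# Return form parameters when present, otherwise try to parse the raw body as JSON.
def body_data(request)
  return request.params unless request.params.nil?

  JSON.parse(request.content)
rescue JSON::ParserError
  nil   # body was not valid JSON; the caller decides how to respond
end

json_request = FakeRequest.new(nil, '{"candy": "junior mints", "count": 2}')
data = body_data(json_request)
```

In a real page you would pass @request instead of the stand-in, assuming its params and content() members behave as described above.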

Additionally, the Apache::Request class has three shotgun methods called value(), values(), and hasValue?(). Their purpose is to provide a single interface with which to pull a value from any of the queries, parameters, or CGI variables (in that order). value() and values() are essentially the same except the former always returns the result in scalar form and the latter in array form. hasValue?() returns a boolean of whether or not there is any such value for a given name in any of the respective sources.
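
The lookup order can be sketched with plain hashes standing in for the three tables. lookup_value is a hypothetical helper that illustrates the order only. It is not the implementation of value().

```ruby
# Plain Hashes stand in for the three APR tables, checked in the order given above.
def lookup_value(name, queries, params, cgi)
  [queries, params, cgi].each do |table|
    next if table.nil?               # params may be nil for non-form content
    return table[name] if table.key?(name)
  end
  nil
end

queries = { 'queryarg' => 'junior mints' }
params  = { 'text' => 'jujifruit', 'queryarg' => 'shadowed' }
cgi     = { 'REQUEST_METHOD' => 'POST' }

from_query = lookup_value('queryarg', queries, params, cgi)
from_param = lookup_value('text', queries, params, cgi)
from_cgi   = lookup_value('REQUEST_METHOD', queries, params, cgi)
```

Because queries are consulted first, the query-string value of queryarg wins over the POST parameter of the same name in this sketch.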

Again, keep in mind that all of this information — queries, cookies and other various stuff (all but POST parameters) — is available straight from the CGI interface (@request.cgi). But it can be easier to use the high-level Apache::Request methods to get it.

2.3. Cookies

The Apache::Request class provides some convenience methods for cookie handling via the cookie(), setCookie() and clearCookie() methods. cookie() takes as a single argument the name of the cookie and returns its value, if a cookie by that name exists. clearCookie() takes the name of a cookie and clears it. setCookie() creates a cookie. It takes the following arguments:

name: The name of the cookie
value: The value of the cookie
days: The days until the cookie expires
minutes: The minutes until the cookie expires (added to days)
path: The path of the cookie

If days and minutes are both -1, then setCookie() will set the cookie expiration date to the maximum — the year 2038. By default, the expiration is set to the duration of the current browser session. setCookie() automatically handles setting the domain name for you based on the current virtual host the request is running under. By default, the path parameter is set to the document root (/).
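
How days and minutes combine into an expiration can be sketched as follows. Both helpers are hypothetical: the real expirationDate() is not shown in this section, and the date format used here is an assumption. The now parameter exists only to make the result reproducible.

```ruby
# Hypothetical sketch of an expirationDate(days, minutes) helper.
# `now` is a parameter only so the result is reproducible.
def expiration_date(days, minutes, now: Time.now)
  seconds = days * 24 * 60 * 60 + minutes * 60
  (now + seconds).utc.strftime('%a, %d %b %Y %H:%M:%S GMT')
end

# Build a cookie string in the same "%s=%s;path=%s;domain=.%s" shape as setCookie().
def cookie_string(name, value, domain, days: 0, minutes: 0, path: '/', now: Time.now)
  cookie = format('%s=%s;path=%s;domain=.%s', name, value, path, domain)
  cookie += ";Expires=#{expiration_date(days, minutes, now: now)}" if days != 0 || minutes != 0
  cookie
end

start = Time.utc(2020, 1, 1, 0, 0, 0)
session_cookie = cookie_string('larry', '1', 'example.org')
dated_cookie   = cookie_string('curly', '2', 'example.org', days: 1, minutes: 30, now: start)
```

With days and minutes both left at 0, no Expires attribute is added, which is what makes the cookie last only for the browser session.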

Consider the following example:

<%
@request = @env['request']
@request.setCookie('larry', '1')
@request.setCookie('mo',    @request.cookie('moe').to_i + 1)
@request.setCookie('curly', @request.cookie('curly').to_i + 1)

puts "Cookies: "
@request.cookies.each do |key, value|
  puts "  #{key}=#{value}" 
end
%>

This example is in http/cookies.rhtml. Load the page and then refresh your browser once. It yields the following:

Cookies:
  mo=1
  curly=1
  larry=1

Each time you refresh the page, curly will increase monotonically, as the code feeds off the previous value, incrementing the cookie’s value.

Note

Remember that cookies are transmitted in HTTP headers. Therefore, once the headers have been sent to the client, setCookie() will not work. Thus the first call to @request.flush() will effectively disable setCookie() as it will send the headers out. Therefore, the safest thing to do is to always try to set your cookie(s) and headers before any content. Apache will buffer content (forestalling sending out headers), but the amount of buffering it will do before flushing is not something you can know for sure or bank on.

2.4. Redirection

Redirection can be done in two ways: external or internal. An external redirect is done by sending the client an HTTP 302 response and providing the URL through the response headers. For example, if you want to send a redirect to Google, you can do this in your view as follows:

<%
@request.headers['status']   = "302"
@request.headers['location'] = "http://www.google.com"
%>

However, the Apache::Request class will do this for you: simply call @request.redirect(url).

Internal redirects are done using the Apache::Request::internal_redirect() method. This causes Apache to change the request internally and rerun the request handler, bypassing the trip back to the client. Apache just backs up a step and runs a new request. Remember that when this is done, the contents of the current @headers_out table will be cleared before processing the new, redirected request. If you need to transfer the contents, you can copy them to @err_headers_out and the subsequent request can get to them via @request.prev.err_headers_out().

The following is an example of both forms of redirection:

<%if @request.queries['type'] == 'external'%>
<meta http-equiv="refresh" content="5;url=http://www.ubuntu.com">
This is an external redirect. You will be sent to ubuntu.com in 5 seconds.
<%end%>

<%
if @request.queries['type'] == 'internal'
  @request.internal_redirect('/redirected.rhtml')
end
%>

<pre>
Click <a href="redirect.rhtml?type=internal">here</a> for internal redirect.
Click <a href="redirect.rhtml?type=external">here</a> for external redirect.

This example is in http/redirect.rhtml. The internal redirect goes to redirected.rhtml, and the external redirect to www.ubuntu.com.

Which method is preferable depends on what you are trying to do. If you have a request that will take a while and you want to put up a progress bar, external redirects may be the way to go. If you have to take evasive action or just altogether change to a different request handler (or controller), an internal redirect may be better.

2.5. Request Termination

There are times when you may want to cease all processing of a request but you can’t easily stop the flow of control without adding a bunch of additional messy code. Consider the following example:

  def ensureAuthorized()    

    return true if @accessToken != nil

    # Else we are not authorized. Redirect to the FB authorization page.

    # This is a redirect hack needed for iframes
    @request.puts "<script>top.location.href='#{redirect_uri}';</script>"

    # Indicates that we are not authorized and we need to stop further
    # processing of this request.
    return false

  end

  # Pull data from Facebook via graph API.
  def graph(path)

    # Make sure our FB accessToken is valid
    if ensureAuthorized() == false

      # Oops. Return false and hope that the caller knows what to do to properly
      # handle this error case ... and the caller above it, and above it ... all
      # the way up the stack.
      return false
    end

    # Code we would run if ensureAuthorized() succeeds
      .
      .
      .
  end

Here we have a method that needs to check whether we have the requisite access token from Facebook in order to proceed. It calls ensureAuthorized() to make this check. If that method sees that we are not authorized, the way to go about it is to send a chunk of JavaScript back to cause the browser to redirect, and then stop doing everything. The problem here is that when ensureAuthorized() returns, the calling code will continue running, and we now have to shift into handling an edge case. So we write a conditional to check the result, and then just pass the false value up to the caller. But then what about the caller? It has to deal with the false and pass it up to its own caller, if there is one, and now we have to hope that every caller in the stack is prepared to deal with this edge case. We end up writing code all over the place to check for an error condition that should really just be contained here, and all sorts of other code has to be aware of it, all the way up every possible path in the stack. What a pain.

Wouldn’t it be nice if we could just say "STOP processing right here!" in ensureAuthorized()? If the user is not authorized, we do our redirect and have the server just send what it’s got back to the client. Well, this is exactly what Request::terminate() is for. Internally it throws a unique exception (ModRuby::RequestTermination) which propagates up the stack to the ModRuby handler, which then catches it and does nothing but finalize the request. Whenever you call it, it just halts the code right there. Using it, we can rewrite our previous example as follows:

  def ensureAuthorized()    

    return true if @accessToken != nil

    # Else we are not authorized. Redirect to the FB authorization page.

    # This is a redirect hack needed for iframes
    @request.puts "<script>top.location.href='#{redirect_uri}';</script>"

    # Terminate the request. Code stops here -- does not return to graph() --
    # everything just shuts down right here and the request is sent back to the
    # client.
    @request.terminate()
  end

  # Pull data from Facebook via graph API.
  def graph(path)

    # Make sure our FB accessToken is valid.
    ensureAuthorized()

    # No longer have to worry about return value. If ensureAuthorized() is
    # false, request does not get this far ... it just stops.

    # Code we would run if ensureAuthorized() succeeds
      .
      .
      .
  end

Much simpler. Now our edge case is handled here and nobody else has to know about it or code for it.
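
The mechanism behind terminate() can be sketched in plain Ruby. Everything here is a hypothetical stand-in: SketchTermination plays the role of ModRuby::RequestTermination, and handle_request plays the role of the ModRuby handler. It only illustrates how an exception lets intermediate callers stay unaware of the edge case.

```ruby
# Hypothetical stand-ins: the real exception is ModRuby::RequestTermination and
# the real handler lives inside ModRuby. Names below are for illustration only.
class SketchTermination < StandardError; end

def ensure_authorized(token, output)
  return true unless token.nil?

  output << 'redirect script'
  raise SketchTermination          # unwinds every caller up to the handler
end

def graph(token, output)
  ensure_authorized(token, output)
  output << 'graph data'           # never reached when the token is missing
end

def controller(token, output)
  graph(token, output)
  output << 'rendered page'
end

# The handler is the only place that knows about the termination exception.
def handle_request(token)
  output = []
  begin
    controller(token, output)
  rescue SketchTermination
    # Nothing to do: just finalize with whatever output was produced so far.
  end
  output
end

unauthorized = handle_request(nil)
authorized   = handle_request('token')
```

One design consideration: an exception derived from StandardError can be swallowed by a bare rescue in intermediate code, whereas one derived directly from Exception cannot. Which base class ModRuby uses is not stated here.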

2.6. Output: Buffering and Content Generation

Within the context of RHTML and Ruby script pages, the content generated is collected in an output buffer managed by the handler handling the request. By default, anything from puts, print or whatever else that prints to standard out in Ruby is collected in that buffer. After an RHTML page or controller completes, the handler flushes the contents of the buffer through Apache, which in turn sends it on to the client.

Despite buffering, you still have access to the low-level output via the Apache::Request object. You can therefore subvert buffering if you wish and send content directly through the wire. The question is then why would you want buffering at all? The answer is in handling errors. There may be times when you don’t want half of a page to render correctly and then run into an error and then try to figure out what to do. Rather, you would want to be able to clear the generated content and perhaps send back a completely different page, such as an error page with a stack trace. Buffering allows you to completely change the outcome if and when you encounter an error.

Before the page is processed, the handler redirects Ruby standard out to a StringIO instance. This buffer is accessible via the @request.out member. Thus, at any point if you want to clear out the output buffer, you just use the StringIO::reopen() method. Consider the following example:

Theoretically, this should show up in the content.
<%
puts 'This should as well'

# But it won't because of this:
@request.out.reopen('', 'a')
%>

Only this should render on a page.

This example is in http/clearbuffer.rhtml. Only the content after the call to StringIO::reopen() is rendered.
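
The same technique can be tried outside of ModRuby with nothing but the standard library. The sketch below is an assumption about how such a handler might capture output, not the handler's actual code. render_page is a hypothetical helper that redirects $stdout to a StringIO for the duration of a block.

```ruby
require 'stringio'

# Hypothetical helper: capture everything printed to standard out while the
# block runs, and hand the block the buffer so it can clear it.
def render_page
  buffer = StringIO.new
  original_stdout = $stdout
  $stdout = buffer
  begin
    yield buffer
  ensure
    $stdout = original_stdout      # always restore the real standard out
  end
  buffer.string
end

output = render_page do |out|
  puts 'This is discarded'
  out.reopen(+'', 'a')             # start over with a fresh, unfrozen string
  puts 'Only this is kept'
end
```

After the block runs, output holds only the text printed after the call to reopen().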

When you want to bypass the default output buffer and write directly to Apache, you use the Apache::Request object as well. There are several methods you can use depending on what you want to do. The equivalent of puts is Request::puts. The equivalent of print is Request::print or Request::rputs (which calls the same underlying C function, ap_rputs()). You can send binary data using Request::write(), which requires both the data and the number of bytes. When you write with any of these, you completely bypass the output buffer and go directly to Apache. Consider the following example:

Theoretically, this line should come first, but it was buffered.

<%
@request.puts 'This was sent through Apache I/O'
@request.puts '<hr>'
%>

This example is in http/bypassbuffer.rhtml. The Apache I/O content comes first, followed by the page content.
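
The ordering can be pictured with a toy model in which an array records what reaches Apache and in what order. The names wire and page_buffer are hypothetical. The StringIO stands in for @request.out, and appending to wire stands in for @request.puts.

```ruby
require 'stringio'

# Toy model: `wire` records the order in which content would reach Apache.
wire = []
page_buffer = StringIO.new          # stands in for @request.out

# Page content goes into the page buffer, not onto the wire.
page_buffer.puts 'Theoretically, this line should come first, but it was buffered.'

# Direct writes skip the page buffer entirely.
wire << "This was sent through Apache I/O\n"
wire << "<hr>\n"

# When the page completes, the handler flushes the page buffer.
wire << page_buffer.string
```

The buffered line ends up last on the wire even though it was produced first, which matches the behavior of the example page.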

By default, even output in Apache is buffered up to a point. Normally, you most likely don’t care as long as all of the output gets back to the client (which it will eventually). There are times, however, when you want to ensure that the content does get to the client at specific times. Sometimes you might have a long-running request and you want to send JavaScript chunks which update the screen, and perhaps ultimately send a META REFRESH to redirect the client.

This is where chunked encoding comes in handy. To switch Apache into chunked encoding mode, all you have to do is call @request.flush(), which calls the low-level ap_rflush() function in the C API. The first call will send out the headers and any buffered content in Apache. From this point on, you will still have buffered writes (in Apache) but you can force them out with subsequent calls to @request.flush(). To reiterate, the buffer being referred to here is the buffering within Apache, not the ModRuby page buffer (@request.out).

There may be times when you want to send not just content through the wire, but perhaps entire files. Apache::Request has a method called send_file() which takes a single argument: the (relative or absolute) path of a file. It in turn calls the low-level Apache ap_send_file() which, on operating systems that support sendfile() or equivalent, sends the contents of the file back very efficiently. It can be called arbitrarily many times in a given request, sending the content of multiple files out in the order that the method is called. To see this in action, consider the following example:

<%
@request.puts "<pre>"

@request.puts "\n/etc/passwd:\n"
@request.send_file('/etc/passwd')

@request.puts "\n/etc/shadow:\n"
@request.send_file('/etc/shadow')

@request.puts "\n/etc/group:\n"
@request.send_file('/etc/group')
%>

This example is in http/sendfile.rhtml. The /etc/passwd and /etc/group files make it, but /etc/shadow does not, because Apache is not running under a privileged account (at least it had better not be! If you do see the contents, you may need to get a SysAdmin).

So to recap, the normal standard output in the framework is collected in a page buffer which is flushed after the request completes. This buffer is accessible to you via the @request.out member. It is a standard Ruby StringIO instance, so you can manipulate it however you want. Besides this, you can also use the native Apache I/O functions and send content directly over the wire. These functions completely bypass the request buffer. While they may be buffered to an extent in Apache, you can force them out over the network using the @request.flush() method.

2.7. Logging

ModRuby provides access to native Apache logging via the Apache::Request::log() method, which takes two arguments. The first is the log level, which can be one of the constants listed in Table 3.1, “Log Levels”. The second is a string containing the message text.

Table 3.1. Log Levels

Constant	Description	Example
`APLOG_EMERG`	Emergencies - system is unusable.	"Child cannot open lock file. Exiting"
`APLOG_ALERT`	Action must be taken immediately.	"getpwuid: couldn’t determine user name from uid"
`APLOG_CRIT`	Critical Conditions.	"socket: Failed to get a socket, exiting child"
`APLOG_ERR`	Error conditions.	"Premature end of script headers"
`APLOG_WARNING`	Warning conditions.	"child process 1234 did not exit, sending another SIGHUP"
`APLOG_NOTICE`	Normal but significant condition.	"httpd: caught SIGBUS, attempting to dump core in ..."
`APLOG_INFO`	Informational.	"Server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers)..."
`APLOG_DEBUG`	Debug-level messages	"Opening config file ..."

The error levels correspond to the LogLevel directive in the Apache configuration file. See the Apache documentation for more information.
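
How a LogLevel threshold relates to these constants can be sketched as below. The numeric values follow the syslog-style ordering of Apache's APLOG_* constants (0 for emergencies through 7 for debug). LOG_LEVELS and logged? are hypothetical names for this illustration, and the sketch ignores any special cases Apache itself applies.

```ruby
# Numeric values follow the syslog-style ordering Apache uses: lower is more severe.
LOG_LEVELS = {
  APLOG_EMERG: 0, APLOG_ALERT: 1, APLOG_CRIT: 2, APLOG_ERR: 3,
  APLOG_WARNING: 4, APLOG_NOTICE: 5, APLOG_INFO: 6, APLOG_DEBUG: 7
}.freeze

# Hypothetical helper: would a message at `level` be written when the
# configured LogLevel threshold is `threshold`?
def logged?(level, threshold)
  LOG_LEVELS.fetch(level) <= LOG_LEVELS.fetch(threshold)
end

error_shown = logged?(:APLOG_ERR, :APLOG_WARNING)
debug_shown = logged?(:APLOG_DEBUG, :APLOG_WARNING)
```

With a threshold of warning, an error-level message passes while a debug-level message is filtered out. This is worth keeping in mind when a message logged from a page does not appear in the error log.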