The Apache::Request class is the principal class you will use for most low-level work, as it contains the most commonly needed information (queries, POST parameters, cookies, etc.), which in turn comes primarily from the HTTP headers and/or the CGI environment.
Now that you understand how APR::Tables work, let’s take the opportunity to explore how they and everything else fit together. We will look specifically at how HTTP headers are managed. If you understand this, everything else is easy. This demonstrates not only the subtleties of using APR tables, but also the core linkage between the APR classes and the Apache::Request class. HTTP headers (in and out) are a large component of handling a request, second only to content, which we will look at later.
To start out, let’s look at part of the Ruby implementation of the Apache::Request class:
    module Apache
      class Request
        def setCookie( name, value, days=0, minutes=0, path=nil )
          path = '/' if path == nil
          cookieString = "%s=%s;path=%s;domain=.%s"
          cookieString = cookieString % [name, value, path, domainName()]
          if days != 0 or minutes != 0
            if days==-1 and minutes==-1
              cookieString += ";Expires=Sun, 17-Jan-2038 19:14:07 -0600"
            else
              cookieString += ";Expires=%s" % expirationDate(days, minutes)
            end
          end

          # Append this cookie to the array
          self.headers_out.add('Set-Cookie', cookieString)
        end

        # Redirect the client to url.
        def redirect(url)
          # Set the Apache req.status
          self.set_status(302)

          # Also set it in the headers, just in case
          self.headers_out['Status'] = "302"
          self.headers_out['Location'] = url

          # This will terminate subsequent request processing
          raise ModRuby::Redirect.new(url)
        end
      end
    end
Look at setCookie(). Cookies are stored in the @headers_out member, which is where all HTTP headers are stored. This is an APR::Table object. All cookie values are stored in a list associated with the key "Set-Cookie". Similarly, the redirect(), expires(), and dontCache() methods all operate on the @headers_out table as well. Note that these are outgoing headers: response headers, sent back to the client.
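To make the "list associated with a key" idea concrete, here is a minimal sketch of a multi-valued table. TinyTable is a hypothetical stand-in written for illustration; it is not the APR::Table API. It only mirrors the two operations used in the listing above: add appends a value, and []= replaces whatever is stored under the key.

```ruby
# Hypothetical stand-in for a multi-valued header table; not the APR::Table API.
class TinyTable
  def initialize
    @pairs = [] # ordered list of [key, value] pairs
  end

  # Append a value, keeping any existing values under the same key.
  def add(key, value)
    @pairs << [key, value]
  end

  # Replace every existing value under the key with a single value.
  def []=(key, value)
    @pairs.reject! { |k, _| k.casecmp?(key) }
    @pairs << [key, value]
  end

  # Return all values stored under the key (header names are case-insensitive).
  def values(key)
    @pairs.select { |k, _| k.casecmp?(key) }.map { |_, v| v }
  end
end

headers_out = TinyTable.new
headers_out.add('Set-Cookie', 'larry=1;path=/')
headers_out.add('Set-Cookie', 'curly=2;path=/')
headers_out['Location'] = 'http://example.org/a'
headers_out['Location'] = 'http://example.org/b'

puts headers_out.values('Set-Cookie').inspect
puts headers_out.values('Location').inspect
```

This is why the cookie code calls add rather than assigning by key: each cookie needs its own Set-Cookie entry, whereas a header such as Location should only ever have one value.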
So at some point these headers have to be shipped out — sent back to the client. And they have to be sent before any of the content. This happens in one of two ways:
Explicitly: Using the Request::flush() method will explicitly send them on their way.
Implicitly: Sending any kind of content will internally trigger Apache to first send the headers out.
Look at the implementation of flush(). It calls self.rflush(). This is where we cross into the C implementation of Apache::Request. This is basically a wrapper over the ap_rflush() function in the Apache C API. When this is called, the headers are automatically flushed and sent to the client.
Note also that the Apache::Request class has both a headers_out() and an err_headers_out() method. Each returns a reference to the corresponding APR::Table in the Apache request struct. They are two distinctly different tables, and it is important that you understand the difference. When an error or internal redirect takes place, the @headers_out are cleared out and only the values in @err_headers_out are retained.
The difference mainly centers around the HTTP status code sent back to the client. Error headers are sent back if the module aborts or returns an error status code. So if the status code is success (200), then the contents of @headers_out are sent back as the HTTP headers. However, for any other status code the @err_headers_out are used. This has very important implications. For example, in doing an HTTP redirect, you have to use the error headers rather than the regular headers because a redirect involves returning a 302 status code. The only exception to this is when you redirect by setting the location header, which is a special case, and then you may use @headers_out. That is why the Apache::Request::copyErrorHeaders() method exists. It is a convenience function that copies all entries in @headers_out to @err_headers_out.
So just remember, if you manually set the HTTP status code, you need to make sure that you copy any relevant headers you want to send back to the error headers and not to the regular headers (as they will not be sent in this case), the only exception being if you set the location header.
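The rule can be modelled in a few lines of plain Ruby. This is only a sketch of the behavior as stated in the text, using ordinary hashes. The function names are hypothetical, and Apache's real header handling has more cases than this.

```ruby
# Simplified model of the rule stated above. Names are hypothetical and
# Apache's real header handling has more cases than this.
def headers_for_response(status, headers_out, err_headers_out)
  return headers_out.dup if status == 200

  # Non-success status: only the error headers survive, plus Location.
  sent = err_headers_out.dup
  sent['Location'] = headers_out['Location'] if headers_out.key?('Location')
  sent
end

# Stand-in for copyErrorHeaders(): copy every regular header to the error table.
def copy_error_headers(headers_out, err_headers_out)
  headers_out.each { |key, value| err_headers_out[key] = value }
end

headers_out = { 'X-Trace' => 'abc', 'Location' => 'http://example.org/' }
err_headers_out = {}

# Without copying, X-Trace is lost on a 302.
puts headers_for_response(302, headers_out, err_headers_out).inspect

# After copying, it is retained.
copy_error_headers(headers_out, err_headers_out)
puts headers_for_response(302, headers_out, err_headers_out).inspect
```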
And that’s the story of HTTP headers. If you understand this, everything else is cake, as you can now see the pathway from the high-level Apache::Request object all the way into the APR::Tables of the Apache C API. Think of the Apache::Request class as one big API that contains the low-level API to do pretty much everything.
Apache request objects have what is called an "environmental table." This is an internal table which holds the environmental variables set for the web server process. This happens to be the canonical way that web servers pass HTTP parameters to CGI processes. As such, the Apache::Request object contains both standard CGI environment variables and HTTP headers for convenience. As this behavior is optional in Apache, the ModRuby module internally calls two Apache C API functions (ap_add_cgi_vars() and ap_add_common_vars()) which cause Apache to load CGI variables into the request’s environmental table. The high-level data structure containing this information is the Apache::Request::cgi member, which in turn is an APR::Table. The following code illustrates this:
    <%
    @request.cgi.each do |key, value|
      puts "%20s: %s" % [key, value]
    end
    %>
This example is in http/cgi.rhtml. It yields the following:
    VLOG: moduby
    GATEWAY_INTERFACE: CGI/1.1
    SERVER_PROTOCOL: HTTP/1.1
    REQUEST_METHOD: GET
    QUERY_STRING:
    REQUEST_URI: /http/cgi.rhtml
    SCRIPT_NAME: /http/cgi.rhtml
    HTTP_HOST: www.modruby.org
    HTTP_USER_AGENT: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3pre)
    HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    HTTP_ACCEPT_LANGUAGE: en-us,en;q=0.5
    HTTP_ACCEPT_ENCODING: gzip,deflate
    HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    HTTP_KEEP_ALIVE: 300
    HTTP_CONNECTION: keep-alive
    HTTP_REFERER: http://www.modruby.org/
    HTTP_COOKIE: test=1; sid=439sdkkfdjks; galations=6:1-10
    HTTP_CACHE_CONTROL: max-age=0
    PATH: /usr/local/bin:/usr/bin:/bin
    SERVER_SIGNATURE: Apache/2.2.8 (Ubuntu) Server at www.modruby.org Port 80
    SERVER_SOFTWARE: Apache/2.2.8 (Ubuntu)
    SERVER_NAME: www.modruby.org
    SERVER_ADDR: 127.0.0.1
    SERVER_PORT: 80
    REMOTE_ADDR: 127.0.0.1
    DOCUMENT_ROOT: /var/www/modruby-example/content
    SERVER_ADMIN: webmaster@modruby.org
    SCRIPT_FILENAME: /var/www/modruby-example/content/http/cgi.rhtml
    REMOTE_PORT: 51183
There can be more or fewer variables depending on the OS and the nature of the request (whether there are cookies or not, etc.).
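The HTTP_* entries in the listing follow the standard CGI naming convention for request headers: the header name is upper-cased, dashes become underscores, and HTTP_ is prefixed. A small sketch of that mapping follows. The cgi_name helper is hypothetical and is not part of ModRuby. The convention also has exceptions that this sketch ignores, such as Content-Type, which becomes CONTENT_TYPE.

```ruby
# Sketch of the CGI naming convention for request headers.
# cgi_name is a hypothetical helper, not part of ModRuby.
def cgi_name(header)
  'HTTP_' + header.upcase.tr('-', '_')
end

%w[Host User-Agent Accept-Language].each do |header|
  puts format('%20s: %s', header, cgi_name(header))
end
```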
Queries and parameters are stored in @request.queries and @request.params, both of which are APR tables in the Apache::Request object. Again, you can iterate over APR tables just like a Ruby hash:
    <form name="input" action="post.rhtml?queryarg=junior+mints" method="post">
      <input type="text" name="text" value="jujifruit">
      <input type="submit" value="Submit">
    </form>
    <pre>
    <%
    puts "Queries: "
    queries = @request.queries()
    queries.each do |key, value|
      puts " #{key}=#{value}"
    end

    puts "Params: "
    params = @request.params
    params.each do |key, value|
      puts " #{key}=#{value}"
    end
    %>
This example is in http/post.rhtml. Click the Submit button. It yields the following:
    Queries:
     queryarg=junior mints
    Params:
     text=jujifruit
POST parameters are only parsed when the request has the content type application/x-www-form-urlencoded. For all other content types, the value of @request.params will be nil. This means you will have to access the raw content via @request.content() and process it yourself, be it XML, JSON, etc.
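The same decoding can be reproduced outside the server with Ruby's standard library, which helps when checking what a given query string or form body should turn into. The sketch below uses URI.decode_www_form. The parse_params helper is hypothetical and only imitates the "nil for other content types" behavior described above. Converting to a hash keeps only the last value of a repeated key, which a multi-valued table would not do.

```ruby
require 'uri'

FORM_TYPE = 'application/x-www-form-urlencoded'

# Hypothetical helper: returns decoded pairs for form bodies, nil otherwise.
def parse_params(content_type, body)
  return nil unless content_type.to_s.start_with?(FORM_TYPE)

  URI.decode_www_form(body).to_h
end

queries = URI.decode_www_form('queryarg=junior+mints').to_h
params  = parse_params(FORM_TYPE, 'text=jujifruit')
other   = parse_params('application/json', '{"text":"jujifruit"}')

puts queries.inspect
puts params.inspect
puts other.inspect
```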
Additionally, the Apache::Request class has three shotgun methods called value(), values(), and hasValue?(). Their purpose is to provide a single interface with which to pull a value from any of the queries, parameters, or CGI variables (in that order). value() and values() are essentially the same except the former always returns the result in scalar form and the latter in array form. hasValue?() returns a boolean indicating whether there is any such value for a given name in any of the respective sources.
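The lookup order can be pictured with a small model. The Lookup class below is hypothetical, with plain hashes standing in for the three APR tables. It assumes that the first source containing the name wins, which the text implies but does not spell out.

```ruby
# Hypothetical model of the lookup order; plain hashes stand in for the APR
# tables. Assumes the first source that contains the name wins.
class Lookup
  def initialize(queries, params, cgi)
    @sources = [queries, params || {}, cgi] # searched in this order
  end

  # Values for name from the first source that has it, always as an array.
  def values(name)
    table = @sources.find { |t| t.key?(name) }
    table ? Array(table[name]) : []
  end

  # Same lookup, but in scalar form (nil when nothing is found).
  def value(name)
    values(name).first
  end

  def has_value?(name)
    !values(name).empty?
  end
end

lookup = Lookup.new({ 'id' => '7' },
                    { 'id' => '9', 'text' => 'jujifruit' },
                    { 'REQUEST_METHOD' => 'POST' })

puts lookup.value('id')                 # found in the queries first
puts lookup.values('text').inspect      # falls through to the params
puts lookup.value('REQUEST_METHOD')     # falls through to the CGI variables
puts lookup.has_value?('missing')
```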
Again, keep in mind that all of this information (queries, cookies and other various stuff, everything but POST parameters) is available straight from the CGI interface (@request.cgi). But it can be easier to use the high-level Apache::Request methods to get it.
The Apache::Request class provides some convenience methods for cookie handling: cookie(), setCookie() and clearCookie(). cookie() takes as a single argument the name of the cookie and returns its value, if a cookie by that name exists. clearCookie() takes the name of a cookie and clears it. setCookie() creates a cookie. It takes the following arguments:
name: The name of the cookie
value: The value of the cookie
days: The days until the cookie expires
minutes: The minutes until the cookie expires (added to days)
path: The path of the cookie
If days and minutes are both -1, then setCookie() will set the cookie expiration date to the maximum, the year 2038. By default, the expiration is set to the duration of the current browser session. setCookie() automatically handles setting the domain name for you based on the current virtual host the request is running under. By default, the path parameter is set to the document root (/).
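To see what these arguments produce, here is a standalone sketch of the cookie string, modelled on the setCookie() listing earlier in this section. The cookie_string and expiration_date functions are hypothetical. The real expirationDate() helper is not shown in the source, so the date format used here is an assumption, and the domain is passed in explicitly instead of being derived from the virtual host.

```ruby
# Hypothetical standalone version of the string built by setCookie().
# expiration_date is an assumed helper; the source does not show the real one.
def expiration_date(days, minutes, now = Time.now)
  (now + days * 86_400 + minutes * 60).utc.strftime('%a, %d-%b-%Y %H:%M:%S GMT')
end

def cookie_string(name, value, domain, days = 0, minutes = 0, path = nil, now = Time.now)
  path ||= '/'
  cookie = format('%s=%s;path=%s;domain=.%s', name, value, path, domain)
  if days == -1 && minutes == -1
    # Maximum expiration, as hard-coded in the listing above.
    cookie += ';Expires=Sun, 17-Jan-2038 19:14:07 -0600'
  elsif days != 0 || minutes != 0
    cookie += ";Expires=#{expiration_date(days, minutes, now)}"
  end
  cookie
end

puts cookie_string('larry', '1', 'example.org')          # session cookie
puts cookie_string('curly', '2', 'example.org', 1, 30)   # expires in 1 day, 30 minutes
puts cookie_string('moe', '3', 'example.org', -1, -1)    # maximum expiration
```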
Consider the following example:
    <%
    @request = @env['request']
    @request.setCookie('larry', '1')
    @request.setCookie('mo', @request.cookie('moe').to_i + 1)
    @request.setCookie('curly', @request.cookie('curly').to_i + 1)

    puts "Cookies: "
    @request.cookies.each do |key, value|
      puts " #{key}=#{value}"
    end
    %>
This example is in http/cookies.rhtml. Click the Submit button and then refresh your browser once. It yields the following:
    Cookies:
     mo=1
     curly=1
     larry=1
Each time you refresh the page, curly will increase monotonically, as the code feeds off the previous value, incrementing the cookie’s value.
Remember that cookies are transmitted in HTTP headers. Therefore, once the headers have been sent to the client, setCookie() will not work. Thus the first call to @request.flush() will effectively disable setCookie(), as it will send the headers out. Therefore, the safest thing to do is to always set your cookie(s) and headers before any content. Apache will buffer content (forestalling sending out headers), but the amount of buffering it will do before flushing is not something you can know for sure or bank on.
Redirection can be done in two ways: external or internal. An external redirect is done by sending the client an HTTP 302 response and providing the URL through the response headers. For example, if you want to send a redirect to Google, you can do this in your view as follows:
    <%
    @request.headers['status'] = "302"
    @request.headers['location'] = "http://www.google.com"
    %>
However, the Apache::Request class will do this for you if you simply call @request.redirect(url).
Internal redirects are done using Apache::Request::internal_redirect(). This causes Apache to change the request internally and rerun the request handler, bypassing the trip back to the client. Apache just backs up a step and runs a new request. Remember that when this is done, the contents of the current @headers_out table will be cleared before processing the new, redirected request. If you need to transfer the contents, you can copy them to @err_headers_out and the subsequent request can get to them via @request.prev.err_headers_out().
The following is an example of both forms of redirection:
    <%if @request.queries['type'] == 'external'%>
    <meta http-equiv="refresh" content="5;url=http://www.ubuntu.com">
    This is an external redirect. You will be sent to ubuntu.com in 5 seconds.
    <%end%>

    <%
    if @request.queries['type'] == 'internal'
      @request.internal_redirect('/redirected.rhtml')
    end
    %>

    <pre>
    Click <a href="redirect.rhtml?type=internal">here</a> for internal redirect.
    Click <a href="redirect.rhtml?type=external">here</a> for external redirect.
This example is in http/redirect.rhtml. The internal redirect goes to redirected.rhtml, and the external redirect to www.ubuntu.com.
Which method is preferable depends on what you are trying to do. If you have a request that will take a while and you want to put up a progress bar, external redirects may be the way to go. If you have to take evasive action or just altogether change to a different request handler (or controller), an internal redirect may be better.
There are times when you may want to cease all processing of a request but you can’t easily stop the flow of control without adding a bunch of additional messy code. Consider the following example:
    def ensureAuthorized()
      return true if @accessToken != nil

      # Else we are not authorized. Redirect to the FB authorization page.

      # This is a redirect hack needed for iframes
      @request.puts "<script>top.location.href='#{redirect_uri}';</script>"

      # Indicates that we are not authorized and we need to stop further
      # processing of this request.
      return false
    end

    # Pull data from Facebook via graph API.
    def graph(path)
      # Make sure our FB accessToken is valid
      if ensureAuthorized() == false
        # Oops. Return false and hope that the caller knows what to do to properly
        # handle this error case ... and the caller above it, and above it ... all
        # the way up the stack.
        return false
      end

      # Code we would run if ensureAuthorized() succeeds . . .
    end
Here we have a method that needs to check whether we have the requisite access token from Facebook in order to proceed. It calls ensureAuthorized() to make this check. If that method sees that we are not authorized, the way to go about it is to send a chunk of JavaScript back to cause the browser to redirect, and then stop doing everything. The problem here is that when ensureAuthorized() returns, the calling code will continue running, and we now have to shift into handling an edge case. So we write a conditional to check the result, and then just pass the false value up to the caller. But then what about the caller? It has to deal with the false and pass it up to its caller, if there is one, and now we have to hope that every caller in the stack is prepared to deal with this edge case. We now have to write code all over the place to check for an error condition that should really just be contained here, and all sorts of other code has to be aware of it, all the way up every possible path in the stack. What a pain.
Wouldn’t it be nice if we could just say "STOP processing right here!" in ensureAuthorized()? If the user is not authorized, we do our redirect and have the server just send what it’s got back to the client? Well, this is exactly what Request::terminate() is for. Internally it throws a unique exception (ModRuby::RequestTermination) which propagates up the stack to the ModRuby handler, which then catches it and does nothing but finalize the request. Whenever you call it, it just halts the code right there. Using it, we can rewrite our previous example as follows:
    def ensureAuthorized()
      return true if @accessToken != nil

      # Else we are not authorized. Redirect to the FB authorization page.

      # This is a redirect hack needed for iframes
      @request.puts "<script>top.location.href='#{redirect_uri}';</script>"

      # Terminate the request. Code stops here -- does not return to graph() --
      # everything just shuts down right here and the request is sent back to the
      # client.
      @request.terminate()
    end

    # Pull data from Facebook via graph API.
    def graph(path)
      # Make sure our FB accessToken is valid.
      ensureAuthorized()

      # No longer have to worry about return value. If ensureAuthorized() is
      # false, request does not get this far ... it just stops.

      # Code we would run if ensureAuthorized() succeeds . . .
    end
Much simpler. Now our edge case is handled here and nobody else has to know about it or code for it.
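The mechanism is ordinary Ruby exception propagation, and it can be tried outside the server. In the sketch below every name is a hypothetical stand-in: RequestTermination plays the role of ModRuby::RequestTermination, and handle_request plays the role of the ModRuby handler that catches it.

```ruby
# Hypothetical stand-ins; these are not the ModRuby classes themselves.
# Inheriting from Exception (not StandardError) means a bare `rescue` in
# application code will not swallow the termination by accident.
class RequestTermination < Exception; end

def terminate
  raise RequestTermination
end

def ensure_authorized(token, output)
  return true unless token.nil?

  output << "redirecting\n"
  terminate # never returns to the caller
end

def graph(token, output)
  ensure_authorized(token, output)
  output << "graph data\n" # reached only when authorized
end

# The outermost handler is the only place that knows about the exception.
def handle_request(token)
  output = +''
  begin
    graph(token, output)
  rescue RequestTermination
    # Nothing to do: fall through and finalize with what we have.
  end
  output
end

puts handle_request(nil)
puts handle_request('abc')
```

Note that graph() contains no error handling at all. The unauthorized case unwinds straight past it to the handler.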
Within the context of RHTML and Ruby script pages, the content generated is collected in an output buffer managed by the handler handling the request. By default, anything from puts, print or whatever else prints to standard out in Ruby is collected in that buffer. After an RHTML page or controller completes, the handler flushes the contents of the buffer through Apache, which in turn sends it on to the client.
Despite buffering, you still have access to the low-level output via the Apache::Request object. You can therefore subvert buffering if you wish and send content directly through the wire. The question is then why you would want buffering at all. The answer is in handling errors. There may be times when you don’t want half of a page to render correctly and then run into an error and then try to figure out what to do. Rather, you would want to be able to clear the generated content and perhaps send back a completely different page, such as an error page with a stack trace. Buffering allows you to completely change the outcome if and when you encounter an error.
Before the page is processed, the handler redirects Ruby standard out to a StringIO instance. This buffer is accessible via the @request.out member. Thus, at any point, if you want to clear out the output buffer, you just use the StringIO::reopen() method. Consider the following example:
    Theoretically, this should show up in the content.
    <%
    puts 'This should as well'

    # But it won't because of this:
    @request.out.reopen('', 'a')
    %>
    Only this should render on a page.
This example is in http/clearbuffer.rhtml. Only the content after the call to StringIO::reopen() is rendered.
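The buffering technique itself is plain Ruby and can be tried without Apache. The sketch below redirects standard out to a StringIO, clears it with reopen, and restores standard out afterwards. The render_page helper is hypothetical and only imitates what the text says the handler does. A fresh String.new is passed to reopen so the example also works where string literals are frozen.

```ruby
require 'stringio'

# Hypothetical helper: run a block with standard out redirected to a StringIO,
# as the text describes the handler doing, and return what was buffered.
def render_page
  page_buffer = StringIO.new
  original_stdout = $stdout
  $stdout = page_buffer
  begin
    yield page_buffer
  ensure
    $stdout = original_stdout # always restore the real standard out
  end
  page_buffer.string
end

html = render_page do |out|
  puts 'This line will be discarded.'
  out.reopen(String.new, 'a') # clear everything buffered so far
  puts 'Only this line survives.'
end

puts html
```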
When you want to bypass the default output buffer and write directly to Apache, you use the Apache::Request object as well. There are several methods you can use depending on what you want to do. The equivalent of puts is Request::puts. The equivalent of print is Request::print or Request::rputs (which calls the same underlying C function ap_puts()). You can send binary data using Request::write(), which requires both the data and the number of bytes. When you write with any of these, you completely bypass the output buffer and go directly to Apache. Consider the following example:
    Theoretically, this line should come first, but it was buffered.
    <%
    @request.puts 'This was sent through Apache I/O'
    @request.puts '<hr>'
    %>
This example is in http/bypassbuffer.rhtml. The Apache I/O content comes first, followed by the page content.
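The ordering follows directly from having two separate streams, one of which is only copied to the other at the end. A minimal model, with hypothetical names and two StringIO objects standing in for Apache's output and the page buffer:

```ruby
require 'stringio'

# Hypothetical model: `wire` stands in for Apache's output and `page` for the
# page buffer, which is only copied to the wire once the page completes.
def simulate_request
  wire = StringIO.new
  page = StringIO.new
  yield page, wire
  wire.write(page.string) # the handler flushes the page buffer at the end
  wire.string
end

result = simulate_request do |page, wire|
  page.puts 'Buffered line, written first.'
  wire.puts 'Direct line, written second.'
end

puts result
```

The direct line appears first in the result even though it was written second, just as in the RHTML example.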
By default, even output in Apache is buffered up to a point. Normally, you most likely don’t care as long as all of the output gets back to the client (which it will eventually). There are times, however, when you want to ensure that the content gets to the client at specific times. Sometimes you might have a long-running request and you want to send JavaScript chunks which update the screen, and perhaps ultimately send a META REFRESH to redirect the client.
This is where chunked encoding comes in handy. To switch Apache into chunked encoding mode, all you have to do is call @request.flush(), which calls the low-level ap_rflush() function in the C API. The first call will send out the headers and any buffered content in Apache. From this point on, you will still have buffered writes (in Apache) but you can force them out with subsequent calls to @request.flush(). To reiterate, the buffer being referred to here is the buffering within Apache, not the ModRuby page buffer (@request.out).
There may be times when you want to send not just content through the wire, but entire files. Apache::Request has a method called send_file() which takes a single argument: the (relative or absolute) path of a file. It in turn calls the low-level Apache ap_send_file() which, on operating systems that support sendfile() or equivalent, sends the contents of the file back very efficiently. It can be called arbitrarily many times in a given request, sending the content of multiple files out in the order that the method is called. To see this in action, consider the following example:
    <%
    @request.puts "<pre>"

    @request.puts "\n/etc/passwd:\n"
    @request.send_file('/etc/passwd')

    @request.puts "\n/etc/shadow:\n"
    @request.send_file('/etc/shadow')

    @request.puts "\n/etc/group:\n"
    @request.send_file('/etc/group')
    %>
This example is in http/sendfile.rhtml. The /etc/passwd and /etc/group files make it, but /etc/shadow does not, because Apache is not running under a privileged account (at least it had better not be; if you do see the contents, you may need to get a SysAdmin).
So to recap, the normal standard output in the framework is collected in a page buffer which is flushed after the request completes. This buffer is accessible to you via the @request.out member. It is a standard Ruby StringIO instance, so you can manipulate it however you want. Besides this, you can also use the native Apache I/O functions and send content directly over the wire. These functions completely bypass the request buffer. While they may be buffered to an extent in Apache, you can force them out over the network using the @request.flush() method.
ModRuby provides access to native Apache logging via Apache::Request::log(), which takes two arguments. The first is the log level, which can be one of the constants listed in Table 3.1, “Log Levels”. The second is a string containing the message text.
Table 3.1. Log Levels
Constant | Description | Example |
---|---|---|
APLOG_EMERG | Emergencies - system is unusable. | "Child cannot open lock file. Exiting" |
APLOG_ALERT | Action must be taken immediately. | "getpwuid: couldn’t determine user name from uid" |
APLOG_CRIT | Critical Conditions. | "socket: Failed to get a socket, exiting child" |
APLOG_ERR | Error conditions. | "Premature end of script headers" |
APLOG_WARNING | Warning conditions. | "child process 1234 did not exit, sending another SIGHUP" |
APLOG_NOTICE | Normal but significant condition. | "httpd: caught SIGBUS, attempting to dump core in ..." |
APLOG_INFO | Informational. | "Server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers)..." |
APLOG_DEBUG | Debug-level messages | "Opening config file ..." |
The error levels correspond to the LogLevel directive in the Apache configuration file. See the Apache documentation for more information.
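As a rough illustration of how the levels relate to LogLevel, the sketch below uses the numeric severities from Apache's http_log.h, where a lower number is more severe. The logged? helper is hypothetical and ignores special cases in Apache's own filtering. It only shows the basic rule that a message is written when it is at least as severe as the configured threshold.

```ruby
# Numeric severities as defined in Apache's http_log.h (lower is more severe).
LEVELS = {
  'APLOG_EMERG' => 0, 'APLOG_ALERT' => 1, 'APLOG_CRIT' => 2, 'APLOG_ERR' => 3,
  'APLOG_WARNING' => 4, 'APLOG_NOTICE' => 5, 'APLOG_INFO' => 6, 'APLOG_DEBUG' => 7
}.freeze

# Hypothetical helper: would a message at `level` pass the given threshold?
def logged?(level, threshold)
  LEVELS.fetch(level) <= LEVELS.fetch(threshold)
end

# With a threshold equivalent to "LogLevel warn":
%w[APLOG_ERR APLOG_WARNING APLOG_INFO APLOG_DEBUG].each do |level|
  puts format('%-14s %s', level, logged?(level, 'APLOG_WARNING') ? 'logged' : 'suppressed')
end
```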