Some background
I have some security needs on a shared hosting web server at work, and I ended up installing Apache-mpm-itk in place of my old vanilla Apache server. MPM-ITK is a piece of software (a set of patches, in fact) that you apply to the Apache source code to change its natural behavior.
Out of the box, Apache spawns a handful of httpd children belonging to user www:www or whatever your config is using. Only the parent httpd belongs to root.
Hence, every single httpd must be able to read (and sometimes write) web site files. Now imagine you exploit a vulnerability in a PHP CMS and successfully inject a PHP shell. Through this PHP shell, you are www on the server: you can do everything www can, and that's bad, because you can even hack the other web sites on the server that have no known vulnerability.
With MPM-ITK, Apache spawns a handful of master processes running as root, and according to your config files, each httpd serving a particular virtual host or directory will switch from root to a user:group of your choice. So, one httpd process currently serving files from web site "foo" cannot access files from web site "bar": an attacker exploiting a vulnerability in one particular web site won't be able to hack every other web site on the server.
More background
That's a world you could dream of. In the real world, it's not so simple. In particular, you'll start having trouble as soon as you make use of fancy features, especially when you fail to provide a dedicated virtual host per user.
On the shared server we host about 35 vhosts for 250 web sites, and we can't afford to provide every user with a dedicated vhost. The result is a given virtual host with a default value for the fallback user:group (say www:www), and each web site configured via Directory to use a different dedicated user.
When a client GETs a resource (web page, img, css...), it generally keeps the connection open with the httpd process. And it can happen that a resource linked from a web page sits in another directory, belonging to another user. The httpd process has already switched from root to user1 to serve the web page, so it can't switch to user2 to serve the linked image from user2's directory. So Apache drops the connection, spawns a new httpd process, switches to user2, and serves the requested resource.
When it happens, you can read things like this in your Apache error log:
[warn] (itkmpm: pid=38130 uid=1002, gid=80) itk_post_perdir_config():
initgroups(www, 80): Operation not permitted
[warn] Couldn't set uid/gid/priority, closing connection.
That's perfectly "legal" behavior; don't be afraid, unless you see hundreds of new warnings every minute.
If you host various web sites, belonging to various users, in the same vhost, you're likely to see many of these triggered by the /favicon.ico request.
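To tell "a few warnings" from "hundreds every minute", a rough count per minute can help. This is only a sketch: the function name is made up, and it assumes Apache 2.2-style log lines that start with a timestamp such as [Fri Mar 29 22:17:52 2013].

```shell
#!/bin/sh
# Count "Couldn't set uid/gid/priority" warnings per minute.
# Assumption: every log line starts with "[Day Mon DD HH:MM:SS YYYY]".
# count_itk_drops_per_minute is a hypothetical helper, not part of Apache or mpm-itk.
count_itk_drops_per_minute() {
    # $1: path to the Apache error log (depends on your install)
    grep -F "Couldn't set uid/gid/priority" "$1" |
        awk '{
            # Fields: "[Fri" "Mar" "29" "22:17:52" ... ; keep "Mar 29 22:17" as the key.
            minute = $2 " " $3 " " substr($4, 1, 5)
            if (!(minute in count)) order[++n] = minute
            count[minute]++
        }
        END {
            # Print in order of first appearance, which follows the log order.
            for (i = 1; i <= n; i++) print count[order[i]], order[i]
        }'
}
```

Call it with the path of your error log as the only argument; each output line is a count followed by the minute it belongs to.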
Where it just breaks
Things get ugly the moment a user tries to use one of your available mod_auth* variants to add some user authentication (think .htaccess). Remember, I host many web sites in a single vhost, each one in its own directory with its own user:group.
Suddenly every single visitor trying to access the protected directory or subdirectory is disconnected. Their HTTP client reports something like this:
the server unexpectedly dropped the connection...
and nothing else is available. The error, server-side, is the same initgroups error as above, and it does not help at all. How would you solve this? truss is your friend.
Where I fix it
One thing I love about FreeBSD is the availability of many powerful tools out of the box. When I need to track down strange software behavior, I feel very comfortable on FreeBSD (it doesn't mean I'm skilled). truss is one of my favorites: it's simple, straightforward and powerful.
What you need to use truss is the PID of your target. With Apache + MPM-ITK, processes won't stay around very long, and you can't tell in advance which one you will connect to. So the first step is to buy yourself some precious seconds so that you can get the PID of your target before the httpd process dies. Remember, it dies as soon as the .htaccess file is parsed. Being in production, I could not just kill everything and play alone with the server, so I chose another way. I created a PHP script that would run for a few seconds before ending. Server side, I prepared a shell command that would install the .htaccess file I need to test, and start truss while grabbing the PID of my target. On FreeBSD, something like this should do the trick:
cd /path/to/user1/web/site
mv .htaccess_inactive .htaccess && truss -p $(ps auxw|awk '/^user1/ {print $2}')
On the first HTTP GET request, the .htaccess file is not present: an httpd process switches from root to user1 and starts serving the PHP script. I launch my command server-side: it puts .htaccess in place, gets the PID of my httpd process, and starts truss.
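One caveat with the one-liner: the awk pattern /^user1/ also matches users like user10, and it hands several PIDs to truss if user1 owns more than one process. A slightly more careful variant could look like the sketch below. The helper name is made up, and it assumes ps is asked for exactly three columns (user, pid, command name).

```shell
#!/bin/sh
# Pick the one httpd PID owned by a given user, or fail when the choice is ambiguous.
# single_httpd_pid is a hypothetical helper; it reads "USER PID COMMAND" lines on stdin.
single_httpd_pid() {
    # $1: exact user name to match
    awk -v user="$1" '
        $1 == user && $3 ~ /httpd/ { pid[++n] = $2 }
        END {
            if (n != 1) exit 1   # zero or several candidates: refuse to guess
            print pid[1]
        }'
}
```

It could then be combined with the original command along these lines (the trace file path is arbitrary):

pid=$(ps -axo user,pid,comm | single_httpd_pid user1) && mv .htaccess_inactive .htaccess && truss -o /tmp/httpd.truss -p "$pid"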
The PHP script ends and returns its result. Client-side, I refresh immediately (second GET request), so that I stay on the same httpd process. My client is disconnected as soon as the httpd process has parsed the .htaccess file. At this point, truss should already be dead, and I have the complete trace of the event. The best approach is to read the trace backward from the point where the httpd process issues an error about changing UID or GID:
01: setgroups(0x3,0x80a8ff000,0x14,0x3,0x566bc0,0x32008)
ERR#1 'Operation not permitted'
02: getgid() = 80 (0x50)
03: getuid() = 8872 (0x22a8)
04: getpid() = 52942 (0xcece)
05: gettimeofday({1364591872.453335 },0x0) = 0 (0x0)
06: write(2,"[Fri Mar 29 22:17:52 2013] [warn"...,142) = 142 (0x8e)
07: gettimeofday({1364591872.453583 },0x0) = 0 (0x0)
08: write(2,"[Fri Mar 29 22:17:52 2013] [warn"...,85) = 85 (0x55)
09: gettimeofday({1364591872.453814 },0x0) = 0 (0x0)
10: shutdown(51,SHUT_WR) = 0 (0x0)
Line 01 is the one I'm looking for: the httpd process is trying to change groups and fails. In lines 02 to 05 it's gathering data for the log entry, and in line 06 it's writing the error to the log file. 07 & 08: same deal for the second line of log.
From that point in time, moving up shows that it tried to access an out-of-directory resource, and that resource is an HTML error page! Of course, it makes sense, and it's a hard slap on the head (RTFM!).
01: stat("/user/user1/public_html/bench.php",{
mode=-rw-r--r-- ,inode=4121,size=7427,blksize=7680 }) = 0 (0x0)
02: open("/user/user1/public_html/.htaccess",0x100000,00) = 53 (0x35)
03: fstat(53,{ mode=-rw-r--r-- ,inode=4225,size=128,blksize=4096 }) = 0 (0x0)
04: read(53,"AuthType Basic\nAuthName "Admin "...,4096) = 128 (0x80)
05: read(53,0x80a8efd88,4096) = 0 (0x0)
06: close(53) = 0 (0x0)
07: open("/user/user1/public_html/bench.php/.htaccess",0x100000,00)
ERR#20 'Not a directory'
08: getuid() = 8872 (0x22a8)
09: getgid() = 80 (0x50)
10: stat("/usr/local/www/apache22/error/HTTP_UNAUTHORIZED.html.var",{
mode=-rw-r--r-- ,inode=454787,size=13557,blksize=16384 }) = 0 (0x0)
11: lstat("/usr/local/www/apache22/error/HTTP_UNAUTHORIZED.html.var",{
mode=-rw-r--r-- ,inode=454787,size=13557,blksize=16384 }) = 0 (0x0)
12: getuid() = 8872 (0x22a8)
13: setgid(0x50,0x805d43d94,0x64,0x800644767,0x101010101010101,0x808080808080
8080) = 0 (0x0)
Line 13 shows the beginning of the setgid process, and lines 10/11 show the culprit. Up from here is the regular processing of the .htaccess file.
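Reading backward by hand works fine for one trace. For repeated hunts, the same idea can be roughly automated: remember the last path passed to stat, lstat or open, and print it when a set*id or setgroups call fails with ERR#1. The sketch below assumes a raw trace file with one syscall per line and the error on the same line as the call (no "NN:" prefixes, no wrapped lines, unlike the excerpts above). The function name is made up.

```shell
#!/bin/sh
# Print the last path seen in stat/lstat/open before a failed setgroups/setgid/setuid.
# culprit_before_setid_failure is a hypothetical helper, not a truss feature.
culprit_before_setid_failure() {
    # $1: trace file, for instance one written with "truss -o"
    awk '
        /^(stat|lstat|open)\("/ {
            # The path is the text between the first pair of double quotes.
            split($0, parts, "\"")
            last_path = parts[2]
        }
        /^set(groups|gid|uid)\(/ && /ERR#1 / {
            print last_path
        }' "$1"
}
```

On a trace like mine, this should point straight at the error document rather than at the .htaccess file.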
RTFM
When you use mod_auth* to present visitors with authentication, the server issues an error, and most of the time this error is sent to the client with a dedicated header and a dedicated HTML document (think "404"). When the error is about authentication (error 401), most clients hide the HTML part and present the user with an authentication popup.
But the HTML part is almost always a physical file somewhere in the server directory tree. And it's this particular file the httpd process was trying to reach, issuing an initgroups call, and dying for not being allowed to switch users.
In my Apache config I found the definition of ErrorDocument:
ErrorDocument 400 /error/HTTP_BAD_REQUEST.html.var
ErrorDocument 401 /error/HTTP_UNAUTHORIZED.html.var
ErrorDocument 403 /error/HTTP_FORBIDDEN.html.var
...
and replaced them all with a file-less equivalent, so Apache won't have any error file to read and will just send a plain ASCII error body (it saves bandwidth too):
ErrorDocument 400 "400 HTTP_BAD_REQUEST"
ErrorDocument 401 "401 HTTP_UNAUTHORIZED"
ErrorDocument 403 "403 HTTP_FORBIDDEN"
...
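To make sure no file-backed error document is left behind in some included file or vhost, a quick scan of the config files can help. This is a sketch with a made-up function name; it only looks at directives written on a single line and treats any target starting with "/" as a local path.

```shell
#!/bin/sh
# List ErrorDocument directives whose target is a local path (starts with "/").
# Quoted inline messages and commented-out lines are left out.
# file_backed_error_documents is a hypothetical helper.
file_backed_error_documents() {
    # $@: one or more Apache config files (which ones depends on your install)
    awk '
        tolower($1) == "errordocument" && $3 ~ /^\// { print $2, $3 }
    ' "$@"
}
```

An empty output means every ErrorDocument found in the given files is either an inline message or something other than a local path.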
I restarted Apache, and authentication from mod_auth* started to work as usual.
The same approach applies to almost any mpm-itk problem related to a connection loss with the "Couldn't set uid/gid/priority, closing connection" error in the log. You locate the resource that makes your server fail, and you find a way to fix the issue.