smithvoice.com

MONO/Ubuntu part 13 - Virtual Host Tweak: unmanaged cAse sensitiviTy

Tagged: ASP.Net, Linux

It may be the biggest annoyance a Microsoft developer hits when trying Mono. Windows at its core is case insensitive. Linux is not.

This useless carryover from the early days of OSes and compilers affects all code that interacts with the file system, and all code written in Visual Studio using the automatic Propercase style and then XCopied over to a 'nix box. For ASP.Net applications specifically, it is ready to bite you at any line that references a page, image, stylesheet, downloadable or AV streaming file, and on and on. With ASP.Net via Mono on a non-Windows server you have to be absolutely case-correct or your users will get 404 errors.

Because no managed web app is 100% managed, we have to address this at both the Apache and Mono layers, with both configuration settings and ongoing human effort.

Setting the baseline

If you've set up 404 FileNotFound error handling in your web.config and virtual host files, go in and comment those lines out temporarily so that you can see full errors from the services.

Open a client browser and try to call up a page that doesn't yet exist. I'm requesting the file "case.html" and as long as I'm at it, I'll also ask about a version using the managed extension.

As expected we get back FileNotFound 404s. Do notice the fine print... Apache responds to the unmanaged *.htm/l request and Mono responds to the request for the *.aspx. We'll get to the managed resource issues in the next part.

Log onto your server and create the "case.html" file - all lower case.

sudo nano /home/kilgore/sites/asdf/case.html

Just give it text to say who it is - case sensitively.

Test various casings to see the ramifications.

I think that's stupid, but I guess that's just me. In forums the issue is often defended, for the (sarcasm alert) completely logical reason that a web site creator can make as many files as they "need" with different casings. That is true; under Linux you can definitely do it.

Create another html file, this time all caps but the same characters.

sudo nano /home/kilgore/sites/asdf/CASE.HTML

In the file, just change the para text to "THIS IS CASE.HTML". Save, then list the folder.
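If you'd rather see the effect without touching the site folder, here is a minimal sketch that does the same thing in a throwaway directory. The mktemp directory and the lower-case paragraph text are my assumptions; only the two file names and the upper-case text come from the steps above.

```shell
# Work in a throwaway directory so nothing on the real site is touched.
demo_dir=$(mktemp -d)

# Two names that differ only by case are two separate files on Linux.
printf '<p>this is case.html</p>\n' > "$demo_dir/case.html"
printf '<p>THIS IS CASE.HTML</p>\n' > "$demo_dir/CASE.HTML"

# Both files show up in the listing.
ls -1 "$demo_dir"
```

On a case-insensitive file system the second write would simply overwrite the first file, which is exactly the difference this part of the series is about.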

And call both up.

The defenders are correct. Thanks to case sensitivity you can do it.

What a boon to solution development ;-).

Since I mentioned Linux forum posters, I've just gotta tell you this one.

Back when I was originally working on handling this "feature" I read a thread where one person told everyone that the topic was stupid to even bring up because: 'Even my granny knows that you have to type a url all lower case.'

Did you just fall out of your chair?

First, his statement took for granted that every developer always and without exception uses all lowercase for their file naming. To me, that statement can only lead to the conclusion that case-sensitivity itself is not a good thing. I mean, if 'intelligent people' just know to ONLY use all lower case, then shouldn't an intelligent OS - or at least an intelligent web server - always force file names to all lower case or ignore case altogether? Wouldn't that be a sensible productivity feature, since everyone always has to do it manually anyway?

Second, if this guy's granny had her Caps Lock on by mistake and because of that could not reach a product page and thus could not buy the product, then his logic is that it is granny's fault. Fine. She doesn't get what she wants and his company makes no profit, everybody wins.

 

Fix trick 1: mod_speling (yes, that is correctly spelled)

Case sensitivity, when you think about it, is a spelling issue. When you press a keyboard key you send a message (a key code) for that specific key to the OS. It's up to the OS and interested programs to figure out what that key means.

Unfortunately, because Linux puts no effort into context of a string of characters and instead considers the characters to be an array of unrelated stuff, just as "a" is a different character code from "b", Linux considers "A" to be a totally different thing from "a".
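To make the "different character code" point concrete, here is a tiny sketch you can paste into a shell. The helper name char_code is made up for the example; it leans on the POSIX printf rule that a numeric argument starting with a quote yields the code of the character that follows.

```shell
# Print the numeric code of a single character.
# POSIX printf: a numeric argument with a leading quote gives the
# code of the next character in the current codeset.
char_code() {
    printf '%d\n' "'$1"
}

char_code a   # prints 97
char_code A   # prints 65
```

Two different numbers, so as far as the file system is concerned, two different names.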

Apache's designers did try to get around this issue, and the advised fix is seen on a number of tip sites and forums as the phrase "just turn on mod_speling."

mod_speling is an easy-to-use Apache module; you enable it at the command line:

sudo a2enmod speling

And in your httpd.conf or Virtual Hosts file, add this line to switch it on (remember it is case-sensitive!) :

CheckSpelling on

Reload Apache and give it a try. Calling up our "http://www.asdf.com/case.html" and "http://www.asdf.com/CASE.HTML" still works fine, but when you try to mix it up, you get a hint of the module's downside.

mod_speling was intended - with good intentions - to help users. By the docs, its rules appear ok.

  • If a user types a resource name exactly then all is good
  • If the user types the url with a single wrong character (case or spelling) and a file does exist that is only off by that character then Apache helps by sending them the "correct" file
  • If the user types a request with a case error and/or spelling error and there are multiple files that could be a match then Apache shows the list.

Those three documented cases alone should make mod_speling acceptable, but there's another aspect of the module that's not as widely touted: file extensions are handled differently.

For a quick example, I've deleted the CASE.HTML file from the server, leaving just the lower-cased version, and then tried to call up the url "http://www.asdf.com/case" with NO file extension. I also made the mistake of forgetting the "ml" part of the "html" extension.

So, even if you make it a rule that you will never mix cases (thus will never name two files the same in a folder), your users can still get that generic Apache "suggestion" page.

For some sites, such as intranet ones, a user seeing the bland page is fine, but most web developers don't like the "magic" or the out of character look of the suggestion page. mod_speling's "help" is full on or full off and the suggestion page is out of your control to customize or override.

If you enabled mod_speling, you can get rid of it by removing the line "CheckSpelling on" from the /etc/apache2/httpd.conf or /sites-available/ Virtual Hosts file. Then run the command "sudo a2dismod speling" and reload Apache. Don't skip the a2dismod: Apache's documented default for CheckSpelling is off, so removing the line does stop the corrections, but the module stays loaded for no reason until you disable it.

A working fix... with two prongs

  1. The guy in the boxed anecdote above was right about one thing. Because of the way linUx is designed, to avoid 404s you and everyone on your team have to follow a naming convention for file resources. camelCase and ProperCase being too subjective, ALL UPPER and all lower are the safe choices, with lower being the tradition.
    • This naming convention has to be enforced in all directory and file names including those for images, styles, scripts, downloadables and streams.

      Accepting that this will at some point bite anyone who copies files over from the Propercase-defaulting Windows world is the first step to decreasing debugging session time when it happens.
    • The convention has to be enforced in all page code that references those resources.

      Accepting that this will at some point bite anyone who codes with the Propercase-defaulting VisualStudio is the second step to decreasing debugging session time when it happens.
  2. Have the web server alter user request strings to match the naming convention before it processes the file requests by name. In Windows IIS this is done with a custom ISAPI filter or the new IIS7 URL Rewrite module; on Apache the module ships with the standard distribution and is aptly named "mod_rewrite".
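
For the first prong, a small helper can do the policing for you. This is only a sketch: the function name find_mixed_case and the sample file names are made up for illustration, and it assumes a find that supports -mindepth (GNU and BSD find both do).

```shell
# List every file or directory below a site root whose own name contains
# an upper-case letter, i.e. everything that breaks the all-lower rule.
find_mixed_case() {
    # -mindepth 1 skips the root folder itself; -name tests only the
    # last path component, so each offender is reported on its own line.
    find "$1" -mindepth 1 -name '*[[:upper:]]*' | sort
}

# Demo against a scratch folder with two offenders in it.
site=$(mktemp -d)
mkdir -p "$site/Images"
touch "$site/default.aspx" "$site/styles.css" "$site/Images/Logo.PNG"

find_mixed_case "$site"
```

The demo reports the Images folder and the Logo.PNG file inside it. Pointed at a real site root such as /home/kilgore/sites/asdf, an empty result means every name already follows the convention.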

 

The Apache Kahuna: mod_rewrite

mod_rewrite is a powerhouse. It's the gateway to SEO-friendly and "pretty" URLs for dynamic sites. My site is using it right now. The url you see in your browser does not match any existing files on my server. Instead, as that url comes to my Ubuntu server, I use mod_rewrite to intercept the request, parse the URL pieces and use them as lookups to a database where the content physically exists. That way, the site really only has one physical page and site management is far easier.

What's that got to do with Case sensitivity? mod_rewrite lets you read, munge and mangle the incoming request string. And the easiest manglation of any string in most any code is a force to lower case.
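
Just to show how little work a force to lower case is, here it is in shell. The function name to_lower is invented for the example; this is not what mod_rewrite runs internally, only the same idea.

```shell
# Force a request path to lower case with tr.
to_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

to_lower '/Case.HTML'   # prints /case.html
```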

First, enable the module with the command:

sudo a2enmod rewrite

Now open the site's Virtual Hosts file ...

sudo nano /etc/apache2/sites-available/asdf

Getting it working takes four steps.

  1. Use the "RewriteEngine on" command to enable the rewrite engine in the Vhost
  2. Use the "RewriteMap" command to alias the module's built-in tolower function to a custom method name (call your alias anything, I'll use "manglelow" for the example)
  3. Use the "RewriteCond" command to specify when the rule should act. We'll test $1, the part of the request captured by the rule's pattern, and only act if it contains any [A-Z] letters.
  4. Lastly, do the substitution with a regex pattern for the input and the method alias in the output.
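
The vhost lines for those four steps are not reproduced in the text here, so what follows is a sketch pieced together from the directive names above, the RewriteRule shown in the logging section, and the patterns that appear in the rewrite log. "int:tolower" is mod_rewrite's built-in lower-case map function. The scratch file is my own addition so that you can review the lines before pasting them inside the VirtualHost block.

```shell
# Scratch file to hold the directives for review (hypothetical helper,
# not part of the original walkthrough).
snippet=$(mktemp)

# The quoted 'EOF' stops the shell from expanding $1 and ${...}.
cat > "$snippet" <<'EOF'
# 1. switch the rewrite engine on for this virtual host
RewriteEngine on
# 2. alias the built-in tolower function as "manglelow"
RewriteMap manglelow int:tolower
# 3. only act when the captured path ($1) contains a capital letter
RewriteCond $1 [A-Z]
# 4. capture everything after the leading slash and lower-case it
RewriteRule ^/(.*)$ /${manglelow:$1} [L]
EOF

cat "$snippet"
```

With only the [L] flag the rewrite happens inside the server, so the browser's address bar is left alone.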

Give that a try with a mixed case request and you see that you're almost there.

The browser retains the casing that the user typed. If you want to force the requester to know the "correct" url including the correct file name, which is all lower case, you just have to tack a 301 redirect flag onto the rule:

RewriteRule ^/(.*)$ /${manglelow:$1} [R=301,L]

The R=301 in brackets does the trick. The mixed case url requests will now show in the browser "correctly." (btw: the "L" after the comma is just a flag to tell the rewrite engine to stop processing if it matched the rule, best added to the last rule in a set even if there is only one rule.)

The above use of mod_rewrite gets past the Linux case sensitivity. Too bad you have to go to all that work just to get logical behaviour out of the OS, isn't it?

Logging rewrites

One last thing before we move on to the option of Mono-managed case-sensitivity - mod_rewrite is extremely powerful and extensively documented, but it's full of logic landmines that can do nasty things to your site usability. Be careful with it, do your best to test like a user and, until you are sure it's doing what you want, log its actions and check those logs.

Setting up a rewrite log only takes picking a location and adding a couple more lines to your VHost file.

RewriteRule ^/(.*)$ /${manglelow:$1} [R=301,L]
#log the rewrites during testing, remove before release
RewriteLog /home/kilgore/logs/rewrite.log
RewriteLogLevel 9
</VirtualHost>

"RewriteLog" serves two purposes: it points the logging to a path for the file (I've created a "logs" folder in /home/kilgore/, the file rewrite.log will be created automatically as needed), and the existence of the line itself turns on the logging.

The next line is "RewriteLogLevel". This sets the verbosity of the logging, and the values run from 0 (the default) to 9 (most verbose). Not adding this line is the same as adding it with a value of zero, which the Apache documentation describes as no logging at all. Be careful at the other end of the scale: the higher the level, the more work is done for every request, and that effort can be a drag on your site's performance.

Here's an example of the log content for one hit of our demo page, with the verbosity set to 9.

 

172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#b][rid#b/initial] (2) init rewrite engine with requested uri /Case.HTML
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#b][rid#b/initial] (3) applying pattern '^/(.*)$' to uri '/Case.HTML'
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#b][rid#b/initial] (4) RewriteCond: input='Case.HTML' pattern='[A-Z]' => matched
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#b][rid#b/initial] (5) map lookup OK: map=manglelow key=case.html -> val=case.html
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#b][rid#b/initial] (2) rewrite '/Case.HTML' -> '/case.html'
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#b][rid#b/initial] (2) explicitly forcing redirect with http://www.asdf.com/case.html
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#b][rid#b/initial] (1) escaping http://www.asdf.com/case.html for redirect
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#b][rid#b/initial] (1) redirect to http://www.asdf.com/case.html [REDIRECT/301]
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#c][rid#c/initial] (2) init rewrite engine with requested uri /case.html
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#c][rid#c/initial] (3) applying pattern '^/(.*)$' to uri '/case.html'
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#c][rid#c/initial] (4) RewriteCond: input='case.html' pattern='[A-Z]' => not-matched
172.121.0.16 [04/Apr/2009:09:19:06 --0700] [www.asdf.com/sid#c][rid#c/initial] (1) pass through /case.html

 

You can see every step of the rewrite process, including the added load of using the R=301 redirect which causes a second full hit to the site.
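
The two-hit flow in that log can be modelled in a few lines of shell. This is not how Apache implements it, just a sketch of the decision the rule set makes; the function name rewrite_decision and its "301" and "pass" output labels are invented for the example.

```shell
# Mimic the rule set: a path with any capital letter gets a 301 to its
# lower-cased form, anything else passes through untouched.
rewrite_decision() {
    case "$1" in
        *[[:upper:]]*)
            printf '301 %s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
            ;;
        *)
            printf 'pass %s\n' "$1"
            ;;
    esac
}

rewrite_decision /Case.HTML   # first hit:  prints 301 /case.html
rewrite_decision /case.html   # second hit: prints pass /case.html
```

The second call is the browser coming back with the corrected url, which is the extra request you pay for when you use R=301.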

These logged events are only related to our simple case.html page, which has no images or external styles or external javascript needs. Remember though that every such reference in a more typical web page is going to be a separate request made to the server and each one will be processed individually by your mod_rewrite code. Because of that, you'll want to hone your rewrite rules so that they run as efficiently as possible. Also, you will want to turn off the rewrite logging before you put your server into production so that your disk space isn't eaten up in the first day.

To turn logging off fully, remove both log-related lines (or comment them out by putting a "#" character in front of them), reload/restart Apache and hit a few pages to make sure that no new log lines are being added to the file.
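
If you'd rather not hand-edit, commenting the two lines out can be scripted. A hedged sketch: the function name disable_rewrite_log is made up, the demo works on a scratch copy rather than a real vhost file, and it assumes a sed that accepts -i with a backup suffix and -E (GNU and BSD sed both do).

```shell
# Put a "#" in front of any RewriteLog / RewriteLogLevel line,
# keeping a .bak copy of the file as it was.
disable_rewrite_log() {
    sed -i.bak -E 's/^([[:space:]]*)(RewriteLog(Level)?[[:space:]])/\1#\2/' "$1"
}

# Demo on a scratch file holding the lines from the example above.
vhost=$(mktemp)
cat > "$vhost" <<'EOF'
RewriteRule ^/(.*)$ /${manglelow:$1} [R=301,L]
RewriteLog /home/kilgore/logs/rewrite.log
RewriteLogLevel 9
EOF

disable_rewrite_log "$vhost"
cat "$vhost"
```

The RewriteRule line is left alone and lines that are already commented out do not match, so running it twice does no harm. You still need to reload Apache afterwards.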


Coming next: Managed cAse sensitiviTy.


