Indexes #announcement

Covered most of what I was doing last time, and 100% of the vote wanted to know more so here goes:

SQL Indexes, they're specifically ordered subsets of the data otherwise contained within the tables. You have a Numeric ID on a table 1-100, you most likely have an index that has nothing but that ID in it to reference that ID as fast as possible. In fact most IDs will have something like that because it makes the random crap developers need to do pretty quick, along with any simple functionality. These indexed columns are the columns your index runs on, when you're looking to make a decision based on certain fields having them indexed means you consult the index and check it in a pre-determined manner rather than checking every piece of data to see if it fits the criteria. The index does a fair bit of the work for you that way, it tells you which record you're looking for as fast as possible if setup correctly.

But that's not all you can do with them, like the old library catalogs you may be too young to remember, there can also be other information stored as an index entry. Summary, publisher, and a whole bunch of other information would be available when you found the index card in the catalog. SQL Indexes can do the same thing. See, in all but the most basic queries just finding the item you're looking for isn't enough, you also need to find information related to that item. Which means you can include columns which are not themselves indexed, but are immediately available from the index when an item is located.

That is what explains the difference in both sizes and effectiveness of the two indexes I was bandying about last post. The small index only contains the columns which are indexed, once the record is found the database engine then has to go consult the table and pull the remaining details. The large index contains ALL of the relevant columns, which happens to be all of the large columns in the comments table, and returns them immediately. As mentioned before, the immediate trade offs are size of the index and speed at which the intended query runs. The large index doubles the size of the table as it contains all the same information, but it runs twice as fast as the smaller index that is a very small fraction of the table size.

However there is one other concern that'll crop up real quick, maintaining the index on a live server. Like a library catalog, when entries are added or modified the index needs to be updated within the space allowed or you need to rebuild everything, including all columns stored within the index if they're part of the specifically indexed columns or not. Using the large index as an example this would mean that every time a comment is added space for the comment, author and various IDs would have to be allocated, plus any time a comment is modified an existing entry would have to be modified, possibly forcing it out of the current order and causing extra over head in pulling the index together. At that point you could rebuild or reorganize the index to effectiveness by rebuilding the index, but the data would be either unavailable or available at a severely reduced rate for the duration.

As it sits, I'm going with the smaller index because it'll hold together for far longer and can be rebuilt in seconds rather than around five minutes and it's still considerably faster than the existing query. Providing a consistent speed is often as important as anything else with databases, and it would do far better in that regard.

In theory I have time to poke at some of the other core code for the site this weekend, we'll see how that works out. If it works out I may be digging up a temp domain to try out a new host and beat the new code around before all that much longer.

Indexes #announcement fstdt.com blog

5 comments

Confused?