I've come across a table with about 200 columns. About 150 of these can be grouped into 5-10 tables that make real-world "sense", and since most of those values are never populated, I figured splitting them out would avoid storing a lot of NULLs and reduce the size of the database drastically.
For example, let's say the current main table has these entries:
Id | Person | DOB  | Address    | FaveColour | LeastFaveColour | MoreColourOpinions
---|--------|------|------------|------------|-----------------|-------------------
1  | Jim    | 1992 | Here       | null       | null            | null
2  | Bob    | 1991 | There      | Brown      | Orange          | I like purple
3  | Bill   | 1990 | Everywhere | null       | null            | null
As you might have guessed, I would split the columns relating to colour out into a separate table:
Id | Person | DOB  | Address
---|--------|------|-----------
1  | Jim    | 1992 | Here
2  | Bob    | 1991 | There
3  | Bill   | 1990 | Everywhere
PersonId | FaveColour | LeastFaveColour | MoreColourOpinions
---------|------------|-----------------|-------------------
2        | Brown      | Orange          | I like purple
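To make the split concrete, here's a rough sketch of the DDL I have in mind. The syntax is generic, the column types are guesses, and the table and column names just mirror my example above:

CREATE TABLE Person (
    Id      INT PRIMARY KEY,
    Person  VARCHAR(100),
    DOB     INT,
    Address VARCHAR(200)
);

-- One-to-one (really one-to-zero-or-one) table holding the rarely used colour columns;
-- a row only exists here when the person actually has colour opinions recorded.
CREATE TABLE ColourOpinions (
    PersonId           INT PRIMARY KEY REFERENCES Person (Id),
    FaveColour         VARCHAR(50),
    LeastFaveColour    VARCHAR(50),
    MoreColourOpinions VARCHAR(500)
);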
Now, I know it's perfectly fine to have one-to-one tables, but my question is about speed: what's the difference between querying the original gargantuan table and querying all the separate tables left-joined back together?
Let's say the table has half a million rows and I want to filter on one column from EVERY group, e.g.:
SELECT *
FROM Person p
LEFT JOIN ColourOpinions co ON p.Id = co.PersonId
-- ... plus another ten or more LEFT JOINs, one per split table
WHERE co.FaveColour = 'Brown'
-- ... AND another filter, one for each of the ten+ joined tables
I assume querying the original table will be faster, because there are no joins to perform; with all those joins I'm basically recreating the original table before querying it. But how much slower will the split version be? Is it a good idea to split this table up?
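For reference, the equivalent query against the current wide table is just a straight filter with no joins at all (assuming the wide table is also called Person; the "one filter per group" comment stands in for the other ten+ conditions):

SELECT *
FROM Person
WHERE FaveColour = 'Brown'
-- ... AND one filter per column group, same conditions as above, but no joins needed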
I'm thinking yes, because the speed of querying the smaller tables and joins individually, plus the reduction in database size, should more than offset the occasional query that needs to recreate the entire original table. Is that right?