
I'm building a service that needs to filter activities coming from different products. The actual data itself lives in another key-value store. Since the service needs to support filtering on the activities as well, I was planning to use MySQL for the filtering use case alone. Because the activities from different products can be different, and the fields that need to be indexed can also differ, I was thinking of creating a new table for each set of activities, containing only the columns that need to be indexed. Do you see any problems with creating tables on the fly for this kind of use case?

Since the service tries to be generic, it does not know the structure of the data beforehand. I imagine the schema for each type of activity would look something like this:

tenant_id | activity_id | filter_1 (actor_id) | filter_2 (group_id) | filter_3 (segment_id)

primary key (tenant_id, activity_id)  (activity_id is a time-based id, so sorting by it sorts activities by time)
index_1: tenant_id | filter_1 | activity_id
index_2: tenant_id | filter_2 | activity_id
index_3: tenant_id | filter_3 | activity_id

If an activity use case always queries certain fields together, we might replace the single-column indexes with a composite index (e.g. filter_2 and filter_3 are always present in the queries):
index_4: tenant_id | filter_2 | filter_3 | activity_id
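
To make the idea concrete, here is a sketch of the DDL I have in mind for one activity type (using SQLite in-memory as a stand-in for MySQL; the table and column names are illustrative, not final):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here

# One table per activity type, holding only the filterable columns.
conn.executescript("""
CREATE TABLE activity_type_x (
    tenant_id   TEXT NOT NULL,
    activity_id TEXT NOT NULL,  -- time-based id, so sorting by it sorts by time
    filter_1    TEXT,           -- e.g. actor_id
    filter_2    TEXT,           -- e.g. group_id
    filter_3    TEXT,           -- e.g. segment_id
    PRIMARY KEY (tenant_id, activity_id)
);
CREATE INDEX idx_1 ON activity_type_x (tenant_id, filter_1, activity_id);
CREATE INDEX idx_2 ON activity_type_x (tenant_id, filter_2, activity_id);
CREATE INDEX idx_3 ON activity_type_x (tenant_id, filter_3, activity_id);
-- If filter_2 and filter_3 always appear together, a composite index instead:
CREATE INDEX idx_4 ON activity_type_x (tenant_id, filter_2, filter_3, activity_id);
""")
```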

I'm also worried about the number of indexes we might end up creating with this approach. Is it OK to have so many indexes?

  • Any time you are considering creating tables like you describe, there is usually a basic design problem that should be addressed so you don't have to do that. You haven't really given us enough information, IMO, so adding details on the tables you have in mind would probably be helpful. – Dave Dec 04 '18 at 13:31
  • Updated the description. Let me know if it has enough info. – dakrpssngr Dec 04 '18 at 14:51
  • The only absolute in software development is that there are no absolutes. That said, creating database schemas within your own application is generally a technique of last resort, unless you're writing a tool specifically designed to do that (like a database management tool). – Robert Harvey Dec 04 '18 at 16:04
  • Downvotes are not for disliking an idea. – JimmyJames Dec 04 '18 at 16:23
  • "The actual data itself lies in another key value store": this sounds like a data warehousing OLAP problem space, am I understanding that correctly? – JimmyJames Dec 04 '18 at 16:28
  • @JimmyJames Not exactly an OLAP problem. The actual data is just a blob, so we store it in a KV store with the key being the activity_id. The MySQL tables are only for filtering the activities on certain fields specific to each activity use case we have. – dakrpssngr Dec 04 '18 at 17:35
  • @RobertHarvey I thought this might be a special case, as we were unable to create a schema that could accommodate all the types of use cases we might get. If this is bad practice then we might explore other options like ElasticSearch. – dakrpssngr Dec 04 '18 at 17:38
  • For better or worse, CRM tools often manage SQL schemas in this way. Though there is often a middle ground (maybe something involving a document store ...), the extreme alternative is to create an inner platform. There are no points along that "solution-gradient" that I'd ever rule out on principle ... – svidgen Dec 04 '18 at 17:52
  • @svidgen We did explore MongoDB, but we were not able to come up with a better design with that either. Since the data needs to be filterable on all specified fields, sometimes with AND or OR filters as well, all columns need to be indexed based on the query patterns. – dakrpssngr Dec 04 '18 at 18:10
  • Have you considered implementing an indexer? Something like ELKS or similar. Maybe a nosql db with full text search support. – Laiv Dec 04 '18 at 18:34
  • @Laiv yes that's the other approach we are considering. – dakrpssngr Dec 04 '18 at 20:35

1 Answer

2

The problems with generating tables dynamically are not use case specific. It just tends to be problematic in general. Here are some concerns off the top of my head:

  • Security: in order to create tables, a user needs a pretty high level of privileges. This is generally not something you want your code running with. You need to consider how you will mitigate this if you move forward with the idea.

  • Keywords and reserved words: You need to make sure you generate good DDL. There are lots of common words that you can't use as column names etc.

  • Changes to structure: At some point things will change and you need to consider how that will work. Do you create a new table for an existing product? Does continuity matter?

  • Dynamic SQL: If you generate tables dynamically, I presume you will need to generate SQL dynamically. Much has been written about the challenges with this (esp. security) and I'm not going to repeat it here.
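
To illustrate that last point: query placeholders only protect values, not identifiers, so a dynamically chosen table or column name has to be validated before it is interpolated into SQL. A minimal sketch (SQLite standing in for MySQL; the strict-pattern check shown is one common mitigation, not the only one):

```python
import re
import sqlite3

# Placeholders (?) cover values, but table/column names cannot be
# parameterized, so dynamic identifiers must be checked against a strict
# pattern (or a whitelist of known names) before string interpolation.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def fetch_activities(conn, table, filter_col, filter_val, tenant_id):
    if not (IDENT.match(table) and IDENT.match(filter_col)):
        raise ValueError("unsafe identifier")
    sql = (f"SELECT activity_id FROM {table} "
           f"WHERE tenant_id = ? AND {filter_col} = ? ORDER BY activity_id")
    return [r[0] for r in conn.execute(sql, (tenant_id, filter_val))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE act_x (tenant_id TEXT, activity_id TEXT, filter_1 TEXT)")
conn.execute("INSERT INTO act_x VALUES ('t1', 'a1', 'u42')")
print(fetch_activities(conn, "act_x", "filter_1", "u42", "t1"))  # ['a1']
```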

My first thought is that you should explore a normalized approach to this. The big downside I see is that indexing may not be as effective, but without building it and doing some analysis it's hard to say. A normalized design would avoid most, if not all, of the downsides above. I would suggest trying that first and, if you don't get reasonable performance, reconsidering your idea or looking at other options such as the ones you have mentioned.
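
For instance, a normalized version could keep every filterable field in one generic table, so adding a new product means inserting rows rather than creating tables. A sketch (SQLite standing in for MySQL; names are illustrative), including the indexing caveat that an AND across two fields becomes an intersection rather than a single composite-index scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table holds every filterable field for every activity type.
conn.executescript("""
CREATE TABLE activity_filters (
    tenant_id    TEXT NOT NULL,
    activity_id  TEXT NOT NULL,  -- time-based id
    filter_name  TEXT NOT NULL,  -- e.g. 'actor_id', 'group_id'
    filter_value TEXT NOT NULL,
    PRIMARY KEY (tenant_id, activity_id, filter_name)
);
CREATE INDEX idx_lookup
    ON activity_filters (tenant_id, filter_name, filter_value, activity_id);
""")

conn.executemany(
    "INSERT INTO activity_filters VALUES (?, ?, ?, ?)",
    [("t1", "a1", "actor_id", "u7"),
     ("t1", "a1", "group_id", "g3"),
     ("t1", "a2", "actor_id", "u7")],
)

# AND across two fields = intersect the matching activity_ids. This is the
# indexing trade-off: it needs a self-join or INTERSECT instead of one
# composite index per query pattern.
rows = conn.execute("""
    SELECT activity_id FROM activity_filters
    WHERE tenant_id = 't1' AND filter_name = 'actor_id' AND filter_value = 'u7'
    INTERSECT
    SELECT activity_id FROM activity_filters
    WHERE tenant_id = 't1' AND filter_name = 'group_id' AND filter_value = 'g3'
""").fetchall()
print(rows)  # [('a1',)]
```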

Another option is to have tooling that generates DDL from your data and to manage those schemas in a more traditional way. I suppose that if you feel the need to make it dynamic, you may be getting new products at too high a rate for that to be manageable.

JimmyJames
  • 27,287