Storing Data in a Web App Simply

Here are some thoughts I wrote a while back that might be useful to someone.

If you’re a developer building a web app of arbitrary complexity, how do you make it as technology independent as possible? You know that you are going to rewrite it in a different language at some point as technology changes.

You also want to make sure that your developers stay productive while building or changing the app.

For both of those problems, a good approach is abstraction. Abstraction allows a developer to focus only on what is necessary to complete the task, rather than having to worry about more specific parts of the application.

For example, a common way to abstract a database is through an ORM, or object-relational mapping. This allows a developer to focus on the application code instead of worrying about the SQL generation needed to actually interact with the database.

Every ORM has basically the same functionality:

  • connecting to a database
  • defining the data model
  • updating and creating the data model
  • creating a record
  • finding one or many records
  • updating a record
  • deleting a record

A common approach for an ORM is to generate code with the types of each database table. It gets more complicated than that, but that is the basic premise. The problem is that with a large data model the generated code gets very large. Increasing the size of any code base adds complexity, and in this case it is unnecessary. There is a better way.

Be declarative: define your data model in some sort of configuration file, then have one API for:

  • finding one or many records based on conditions
  • creating, updating, or deleting a record
  • starting, committing, and rolling back transactions
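
To make that concrete, here is a minimal sketch of what such an API could look like. It is only an illustration under my own assumptions: the DATA_MODEL dictionary stands in for the configuration file, the Store class and its method names are made up for this example, and SQLite (from the Python standard library) is used just because it needs no setup.

```python
import sqlite3
from contextlib import contextmanager

# Hypothetical declarative data model: table -> column -> SQL type.
# In a real app this could be loaded from a JSON or YAML file.
DATA_MODEL = {
    "users": {"id": "INTEGER PRIMARY KEY", "name": "TEXT", "email": "TEXT"},
}


class Store:
    """One small API for every table declared in DATA_MODEL."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row
        for table, columns in DATA_MODEL.items():
            cols = ", ".join(f"{name} {kind}" for name, kind in columns.items())
            self.conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
        self.conn.commit()

    def _check(self, table, fields):
        # Only names declared in the model are ever placed into SQL text.
        unknown = set(fields) - set(DATA_MODEL[table])
        if unknown:
            raise ValueError(f"unknown columns for {table}: {sorted(unknown)}")

    def _where(self, table, conditions):
        self._check(table, conditions)
        if not conditions:
            return "", []
        clause = " AND ".join(f"{column} = ?" for column in conditions)
        return f" WHERE {clause}", list(conditions.values())

    def find(self, table, **conditions):
        where, params = self._where(table, conditions)
        rows = self.conn.execute(f"SELECT * FROM {table}{where}", params)
        return [dict(row) for row in rows]

    def create(self, table, **values):
        self._check(table, values)
        columns = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        sql = f"INSERT INTO {table} ({columns}) VALUES ({marks})"
        return self.conn.execute(sql, list(values.values())).lastrowid

    def update(self, table, conditions, **values):
        self._check(table, values)
        where, params = self._where(table, conditions)
        assignments = ", ".join(f"{column} = ?" for column in values)
        sql = f"UPDATE {table} SET {assignments}{where}"
        self.conn.execute(sql, list(values.values()) + params)

    def delete(self, table, **conditions):
        where, params = self._where(table, conditions)
        self.conn.execute(f"DELETE FROM {table}{where}", params)

    @contextmanager
    def transaction(self):
        # sqlite3 opens a transaction implicitly before the first write;
        # this block decides whether it is committed or rolled back.
        try:
            yield self
            self.conn.commit()
        except Exception:
            self.conn.rollback()
            raise


store = Store()
with store.transaction():
    user_id = store.create("users", name="Ada", email="ada@example.com")
    store.update("users", {"id": user_id}, name="Ada L.")
print(store.find("users", id=user_id))
```

Adding a table here means adding an entry to DATA_MODEL, not generating another class. The amount of code stays the same no matter how large the data model gets.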

Let’s get a bit into database theory.

When dealing with a data model, there are two main ways of looking at the data: normalized and denormalized. What does normalized mean in this context?

Normalized data is a write-optimized form of data. It is data in its least redundant form, which makes creating and updating records easy. Normalized data is more fragmented but flexible across different circumstances.

Denormalized data is a read-optimized form of data. It is data combined into one coherent logical unit for displaying and understanding in certain contexts. Denormalized data is more coherent but less flexible across different circumstances.
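
A small sketch may help show the difference. The customers, products, and orders below are made-up sample data, and plain Python dictionaries stand in for database tables. The normalized form stores each fact once, and the denormalize function (a hypothetical helper) joins the pieces into one readable unit.

```python
# Normalized: each fact is stored once and records refer to each other by id.
customers = {1: {"name": "Ada"}}
products = {10: {"title": "Keyboard", "price_cents": 4500}}
orders = [{"id": 100, "customer_id": 1, "product_id": 10, "quantity": 2}]


def denormalize(order):
    """Build one self-contained, read-friendly record for a single order."""
    customer = customers[order["customer_id"]]
    product = products[order["product_id"]]
    return {
        "order_id": order["id"],
        "customer_name": customer["name"],
        "product_title": product["title"],
        "total_cents": product["price_cents"] * order["quantity"],
    }


# Denormalized: everything needed to display an order lives in one place.
order_views = [denormalize(order) for order in orders]
print(order_views[0])

# Renaming the customer is a single write in the normalized form...
customers[1]["name"] = "Ada Lovelace"
# ...but the denormalized copy is stale until it is rebuilt.
print(order_views[0]["customer_name"])
print(denormalize(orders[0])["customer_name"])
```

The last three lines show the trade-off: the write touched one place, while the read-optimized copy has to be regenerated to see it.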

The most common way of storing normalized data is in an ACID-compliant relational database, where ACID is short for atomicity, consistency, isolation, and durability. ACID is all about transactional databases. A transactional database is a database that must only ever have one state of the data, and that state must persist despite any problem that can happen. This type of data store is essential for financial accounting, asset management, account authentication, and account authorization.
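
Here is a minimal sketch of the atomicity part, using SQLite from the Python standard library. The accounts table, the open_ledger and transfer helpers, and the starting balances are all assumptions made up for this example. The credit is deliberately written before the debit, so a failed debit has something to undo.

```python
import sqlite3


def open_ledger():
    # Hypothetical accounts table; the CHECK constraint forbids overdrafts.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            # If this debit violates the CHECK, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ledger = open_ledger()
print(transfer(ledger, 1, 2, 30), balances(ledger))
print(transfer(ledger, 1, 2, 500), balances(ledger))
```

The first transfer succeeds. The second would overdraw account 1, so it is refused and both balances stay exactly as they were, which is the "only one state of the data" guarantee in action.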

There are some problems with the ACID relational database model when scaling it naively.

Scaling data storage is hard because:

The best way that I have found to scale data storage is to combine several different techniques. The first method is sharding. When one computer can’t store all of the data, you need multiple computers to store it. Sharding is figuring out which data should be stored on which computer. There are many ways to do this, but some of the most common are partitioning based on location, time, or custom logical separation. Any or all of these can be applied. Custom logical separation in this context means defining what data is often used together and what isn’t. There may be other ways of doing this, but these are the ones that make the most sense to me.
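
As a rough sketch, the three partitioning styles can each be written as a function that maps a record to a shard. The shard names, the region table, and the choice of a per-month shard are all hypothetical. For the custom logical separation I am assuming a simple stable hash of a group key, which is only one of several ways to keep related data together.

```python
import hashlib
from datetime import date

# Hypothetical shard names; in practice each would map to a database server.
REGION_SHARDS = {"eu": "db-eu", "us": "db-us"}
GROUP_SHARDS = ["db-0", "db-1", "db-2"]


def shard_by_location(region):
    # Location: keep a region's data on the server assigned to that region.
    return REGION_SHARDS[region]


def shard_by_time(day):
    # Time: one shard per calendar month.
    return f"events-{day.year}-{day.month:02d}"


def shard_by_group(group_key):
    # Custom logical separation: records sharing a group key (for example one
    # customer's data) always land together. A stable hash picks the shard.
    digest = hashlib.sha256(group_key.encode("utf-8")).digest()
    return GROUP_SHARDS[int.from_bytes(digest[:8], "big") % len(GROUP_SHARDS)]


print(shard_by_location("eu"))
print(shard_by_time(date(2024, 3, 5)))
print(shard_by_group("customer-42"))
```

The hash is taken from hashlib instead of Python's built-in hash() because the built-in is randomized per process for strings, and a shard choice has to be the same every time. One known weakness of the modulo step is that changing the number of shards moves most keys to a different shard.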