If you need to create a database, you need a data model first. Don't even think of looking at an existing database, without studying the data model first. The data model is the map of a database. It is the first user manual you have to look at before trying to understand the data in the database. The data model will tell you a lot about what you will find in the database, for example the tables, columns and other entities. No matter what type of database, relational or other types, you will need a data model to start with. This post is about the data models you find in relational databases like Oracle, MS SQL Server, DB2, MySQL and many others.
A data model is a diagram, a schematic representation of an understanding of the data requirements in a process. If you are in the business domain, your data models will describe the data structures and their relations with each other, in your business processes. For example customers and orders. Data models are the manuscripts- the plans of a database. Therefore, the data models are created before the databases and not the other way around. You have to grasp the requirement and the problem first before attempting to solve it with a data model and then a database. To create a data model you will need to have a problem to solve first, hence you will need a problem statement.
The creation of a data model doesn't happen all at once- it is created in stages. The process of its creation usually comes in three stages. After you have read the requirement in the problem statement, you first create a conceptual data model. Then you evolve it to a logical data model, which is sometimes called a relational schema. Lastly, the final stage is the physical data model, a model ready to be implemented in SQL and in relational databases.
Now, let's walk through on how to create a data model for a business problem statement, say in a commercial business which sells products to customers.
Problem statement
Customers buy products of the business, via orders. All orders must be recorded in the database indicating which product was bought by which customer.
Here are the possible data models which convey understanding of the above problem statement:
From left to right, you can see the evolution of the data model as it solves the problem statement. The conceptual model is very 'high level'. It barely records the entities of Customers, Orders and Products involved in the problem statement. The lines indicate some sort of relationships between the entities. The second stage, the logical model, puts the unique identifiers of future records in these entities- the primary keys. The lines indicate relationships a bit clearer and you can see the primary keys of Customers and Products posted in Orders as foreign keys to construct the relationship. Chen's notation is used to show the participation of the entities in the relationships here. For example, a product can be ordered several times. Finally, the physical data model is the one which is ready to be scripted and put in the database. It has physical attributes and their data types, such as customer name and product name of data type VARCHAR.
Below are some more techie comments on the three stages of data modelling:
What is conceptual data model?
It is the data model which records the data requirements of the business processes. It is technology independent, captures initial requirements in a business process. Say you are a business selling products to customers as seen above. The conceptual model, at a very high level, describes objects or entities which will store data to help you accomplish your business process. Conceptual data models are straightforward simple box-and-line diagrams.
What is logical data model?
This model is a bit more specific. The logical data model is still technology independent. The logical model tries to identify entities, their relationships with other entities and their primary keys and foreign keys. The logical data model is a bit more involved with the specifics of the problem statement and a single conceptual data model can require multiple logical data models.
What is a physical data model?
It is the data model for transforming the logical data model into physical data model. It is the data model which organises the data into tables and user accounts and security. The physical data model describes all entities and the data types of their specific attributes and columns. The physical data model can also have technology specific elements such as special data types and other constraints used only by a certain technology or even vendor. The physical model is implementation ready. Finalised to the greatest detail and ready to be picked up for implementation by DBAs or database developers.
The data models are living schemata. There is no final data model. They evolve and they extend with time. Data models are great instruments which help us understand the meaning of data and databases.