Data Modeling interview questions and answers 👇
Data Modelling Interview Questions
What is Normalization?↑
Normalization is a formal approach that applies a set of rules to associate attributes with entities.
What is Denormalization?↑
Denormalization is the process of adding precomputed redundant data to an otherwise normalized relational database to improve read performance of the database. Normalizing a database involves removing redundancy so only a single copy exists of each piece of information.
What is ERD?↑
An entity-relationship diagram (ERD) is a data modeling technique that graphically illustrates an information system's entities and the relationships between those entities. An ERD is a conceptual and representational model of data used to represent the entity framework infrastructure.
What is a Surrogate Key?↑
A surrogate key uniquely identifies each entity in the dimension table, regardless of its natural source key. This is primarily because a surrogate key generates a simple integer value for every new entity. Surrogate keys are necessary to handle changes in dimension table attributes.
What are the Critical Relationship Types in a Data Model?↑
There are 3 different types of relations in the database:
What is an Enterprise Data Model?↑
An enterprise data model is a type of data model that presents a view of all data consumed across the organization. It provides an integrated yet broad overview of the enterprise's data, regardless of the data management technology used.
What are the 2 Design Schemas?↑
The two popular dimensional data models are:
- Star schema: data is organized into facts (measurable items) and dimensions (reference information), where each fact is surrounded by its associated dimensions in a star-like pattern
- Snowflake schema: resembles the star schema but includes additional layers of associated dimensions, making the branching pattern more complex
What is Slowly Changing Dimension?↑
A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.
What is a Data Mart?↑
A data mart is a subject-oriented database that is often a partitioned segment of an enterprise data warehouse. The subset of data held in a data mart typically aligns with a particular business unit like sales, finance, or marketing.
What is Granularity?↑
In dimensional modeling, granularity refers to the level of detail stored in a table. For example, a dimension such as Date (with Year and Quarter hierarchies) has a granularity at the quarter level but does not have information for individual days or months.
What is Data Sparsity?↑
Data sparsity signifies how much data we have for a particular dimension/entity of the model.
What are Subtype and Supertype Entities?↑
A supertype is a generic entity type that has a relationship with one or more subtypes. A subtype is a sub-grouping of the entities in an entity type that is meaningful to the organization and that shares common attributes or relationships distinct from other subgroups.
Why is Metadata important?↑
Metadata management is critical for organizations looking to understand the context, definition and lineage of key data assets. Data models play a key role in metadata management, as many of the key structural and business definitions are stored within the models themselves.
What is forward and reverse engineering?↑
Forward engineering is the process of generating the physical database schema from a physical model. You use the Schema Generation dialog to forward engineer a model and generate the schema. The schema that you generate includes all options that are supported in your target server.
Reverse engineering is the process of creating a data model from a database or a script. The modeling tool creates a graphical representation of the selected database objects and the relationships between the objects. This graphical representation can be a logical or a physical model.
What are Recursive Relationships?↑
A recursive relationship is a non-identifying relationship between two entities or tables that represents the fact that one company can own another company. In this type of relationship, the parent entity or table and the child entity or table are the same.
What is a Conformed Dimension?↑
In data warehousing, a conformed dimension is a dimension that has the same meaning to every fact with which it relates. Conformed dimensions allow facts and measures to be categorized and described in the same way across multiple facts and/or data marts, ensuring consistent reporting across the enterprise.
In what way are NoSQL databases better than relational databases?↑
NoSQL databases don't need a predefined schema, allowing you to work more freely with “unstructured data.” Relational databases are vertically scalable, but usually more expensive, whereas the horizontal scaling nature of NoSQL databases is more cost-efficient.
What is Junk Dimension?↑
A Junk Dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables. The nature of these attributes is usually text or various flags, e.g. non-generic comments or just simple yes/no or true/false indicators.