Introduction to Relational Database

10 October 2017

This blog gives an introduction of Relational database-management system (RDBMS) and different topics which are frequently asked in interview questions.

A RDBMS is a computer-software application that interacts with end-users, other applications, and the database itself to capture and analyze data. A general-purpose DBMS allows the definition, creation, querying, update, and administration of databases.

MultiDimensional database(MDB)

It is a type of database which is optimized for data warehouse and online analytical processing(OLAP) applications. This database are normally created from the inputs obtained from existing relational databases.

Relational database can be accessed using a SQL query whereas a multidimensional database allows a user to answer analytical question from those relational database.Multidimensional database is mainly used to summarize business operations and trends.

Online Analytical Processing (OLAP)

OLAP tool allows user to analyze multidimensional data interactively from multiple perspectives. OLAP is part of the broader category of business intelligence, which also encompasses relational database, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such as agriculture.

Online transaction processing (OLTP).

It is used to manage transaction-oriented applications typically used for data entry and retrieving transactional processing on a relational database management system.

Natural/Business Key

A Natural (Business) Key is the column or column(s) in the source data that identify a unique row.It can be exposed on a business report and have business value in the company.

Surrogate Key

A Surrogate Key is an artificially produced value, most often a system-managed, incrementing counter whose values can range from one to n. It has no business meaning and should not be exposed in reports.

Surrogate vs Natural key

It is recommended to use the natural keys of the table throughout the ETL process.Assigning and maintaining a surrogate key is complex and leads to lot of errors.If invalid surrogate keys are assigned ,it might cause severe data integrity issue in the data ware house .Upon re-loading a table ,it is extremely difficult to guarantee the same Surrogate Key for a particular row.This complexity and risk can be eliminated by using natural keys and keeping the surrogate key from source system as data attribute .

Normalization

It is a process of organizing the data in database to avoid data redundancy, insertion anomaly, update anomaly & deletion anomaly.

Most commonly used normal forms:

  • First normal form(1NF)
  • Second normal form(2NF)
  • Third normal form(3NF)
  • Boyce & Codd normal form (BCNF)

Third normal form(3NF)

It is a normal form that is used in normalizing a database design to reduce the duplication of data and ensure referential integrity by ensuring that (1) the entity is in second normal form, and (2) all the attributes in a table are determined only by the candidate keys of that relation and not by any non-prime attributes.

Isolation level in DataBase

There are four isolation levels

  • Read uncommitted or Read Dirty Buffers
  • Read Committed or repeated reads
  • Phantom reads
  • Serializable

Synonyms in Database

A synonym is an alias or alternate name for a table, view, sequence, or other schema object. They are used mainly to make it easy for users to access database objects owned by other users. They hide the underlying object’s identity and make it harder for a malicious program or user to target the underlying object. Because a synonym is just an alternate name for an object, it requires no storage other than its definition There are two major uses of synonyms:

  • Object invisibility: Synonyms can be created to keep the original object hidden from the user.
  • Location invisibility: Synonyms can be created as aliases for tables and other objects that are not part of the local database.

Types of JDBC Statements

1.Statement -It is Used for SQL Queries

2.Prepared Statement

It is mainly used for repetitive SQL queries .We cannot use Stored procedure in Prepared Statement . Instances of Prepared Statement contain an SQL statement that has already been compiled. This is what makes a statement “prepared”

Because Prepared Statement objects are precompiled, their execution can be faster than that of Statement objects. The prepared statement is used to execute sql queries

3.Callable Statement

A Callable Statement object provides a way to call stored procedures in a standard way for all RDBMSs. A stored procedure is stored in a database; the call to the stored procedure is what a Callable Statement object contains. It is Used for Stored Procedures and functions

Junction Table

Junction table is a place to store attributes of the relationships between two lists of entities.

A junction table allows us to create the many-to-many relationship and prevents us from adding duplicate entries

Performance tuning of Queries

  • Do delete and update in batches
  • Tracing facility that enables you to identify bottlenecks in indexing and querying.
  • Add Optimizer hints Provide optimizer hints in the SQL that indicate:
    • which optimizer mode to use
    • whether or not to use available indexes
    • the order in which tables are to be joined
    • the method by which Oracle should join the tables

Database Performance Optimization

  • Optimizer mode
  • Use of Index
  • Type of Join and Driving Table
  • Order of Joins

Data Wrangling

It outstrips traditional data integration (DI), which tends to focus primarily on the selection, movement, and transformation of strictly-structured data from OLTP or other, similar sources.

Stages of Data Wrangling Steps/Stages

  • Discovering what is actually in each of the data sets to determine how they can be used as part of an analysis.
  • Assessing the data quickly, to ensure the fit and accuracy of data for analysis to avoid erroneous analytic results downstream.
  • Shaping the data to ensure that it fits the requirements and is at the appropriate level of aggregation for the specific analysis being performed.
  • Enriching the content of an individual data set by joining multiple data sources together to provide additional context, or create derived fields with calculations that highlight business opportunities or gaps.
  • Distilling the data into the format required by the analysis or the downstream analytics tool being used.

Database Cursors

Database cursor is a control structure that enables traversal over the records in a database. Cursors facilitate subsequent processing in conjunction with the traversal, such as retrieval, addition and removal of database records. A cursor can be viewed as a pointer to one row in a set of rows.

Database Index

An index in database creates an entry for each value which makes it faster to retrieve data. It is one of the performance tuning method in database that allows faster retrieval of records from the table.

Types of Indexes

There are 3 types of Indexes in Database

  • Unique Index
  • Clustered Index
  • NonClustered Index

Relationship in Database

Relationship in database defines how different tables are connected together.

There are mainly 4 types of relationship in database

  • One to One Relationship
  • One to Many Relationship
  • Many to One Relationship
  • Self-Referencing Relationship

User Defined function in Database

User defined functions are the custom functions that can be used in queries.

  • Scalar Functions.
  • Inline Table valued functions.
  • Multi statement valued functions.

Aggregate Functions in SQL

An aggregate function is used to perform a calculation on a set of values to return a single scalar value. We often use aggregate functions with the GROUP BY and HAVING clauses of the SELECT statement.

Below are the most commonly used SQL aggregate functions:

AVG – calculates the average of a set of values. COUNT – counts rows in a specified table or view. MIN – gets the minimum value in a set of values. MAX – gets the maximum value in a set of values. SUM – calculates the sum of values.

View in SQL

View is like a subset of table which are stored logically in a database. A view is a virtual table. It contains rows and columns similar to a real table. The fields in the view are fields from one or more real tables. Views do not contain data of their own.


CREATE VIEW <view-name> AS SELECT <column_name1,column_name2> FROM <table_name> WHERE
<condition>

Constraints in SQL

Constraints are rules you can place on columns or tables to limit the type of data that can be entered into a table. This prevents errors and can improve the accuracy and reliability of the database as a whole.

Common constraints include:

Constrains Definition
NOT NULL It prevents a column from having a NULL value
DEFAULT It specifies a default value for a column where none is specified
PRIMARY KEY It Uniquely identifies rows/records within a database table
FOREIGN KEY Uniquely identifies rows/records from external database tables
UNIQUE It makes sure that all values are unique
CHECK Checks values within a column against certain conditions
INDEX Used to quickly create and retrieve data from a database

WHERE vs HAVING in SQL

WHERE clause is used when there is a need of condition on individual rows .Where clause is used to fetch data from database that specifies a particular criteria .It cannot be used with aggregrate function.

HAVING clause is used when we need to introduce conditions on aggregrate results . Having clause is used along with ‘GROUP BY’ to fetch data that meets a particular criteria specified by the Aggregate functions. It can be used with aggregrate function

DELETE vs TRUNCATE in SQL

DELETE command is used to remove rows from the table, and WHERE clause can be used for conditional set of parameters. With the execution of DELETE command it is possible to commit and Rollback

TRUNCATE removes all rows from the table such that it cannot be rolled back.

Standard SQL commands

The basic SQL commands can be organized into the following categories:

1. Data Manipulation Language (DML)

Data manipulation Language or DML is used to access or manipulate data in the database.

  • INSERT: Creates records. The “Create” in CRUD.
  • SELECT: Retrieves/fetch records. The “Read” in CRUD.
  • UPDATE: Modifies records. The “Update” in CRUD.
  • DELETE: Deletes records. The “Delete” in CRUD.

2. Data Definition Language (DDL)

  • CREATE: Creates a new object.
  • ALTER: Alters an existing object.
  • DROP: Deletes an existing object.

3. Data Control Language: (DCL)

  • GRANT: Grants privileges to users.
  • REVOKE: Revokes privileges previously granted to a user.

Advantage of Normalization in Tables

  • Size of the database is decreased as duplicate data is eliminated by normalization
  • Performance is improved because of the fine-tuned tables with small size
  • Decrease in column size will lead to fewer indexes in a table which makes it easier for maintenance tasks such as index rebuilding
  • Joins are needed when data is needed across the tables

Disadvantage of Normalization

  • As data are spread out throughout the database there are more tables to join
  • As tables does not contain duplicate data ,joins are required to get the data across the tables .This increases the complexity of the SQL queries making it more slower to read from database.

Difference between CHAR and VARCHAR2 datatype in SQL

Even though both of these data types are used for characters ,VARCHAR2 is used for character strings of variable length whereas CHAR is used for character strings of fixed length

Sub query vs Coorelated Query

A subquery is a select statement that is embedded in a clause of another select statement. An Example

SELECT employeename ,salary FROM employeeTable WHERE salary > (SELECT salary FROM employeeTable WHERE employeename = 'Nitendra');

A Correlated subquery is a subquery that is evaluated once for each row processed by the outer query or main query. Execute the Inner query based on the value fetched by the Outer query all the values returned by the main query are matched. The INNER Query is driven by the OUTER Query.

An Correlated Query Example:

SELECT employeeNumber ,salary from branchid FROM employeeTable et WHERE salary =(SELECT AVG(salary)
FROM employeeTable WHERE branchid = et.brachnid);

First inner query executes and finds a value which is again used by outer query to execute .

Difference among UNION, MINUS and INTERSECT

  • UNION combines the results from 2 tables and eliminates duplicate records from the result set.
  • MINUS operator when used between 2 tables, gives us all the rows from the first table except the rows which are present in the second table.
  • INTERSECT operator returns us only the matching or common rows between 2 result sets.

Difference between clustered and a non-clustered index?

A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.

A non clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.

Transposing a table using SQL (changing rows to column or vice-versa)

In in SQL we can use CASE statement or DECODE statement to transpose a table

Advantages of using Stored Procedures

  • It can reduced network traffic and latency, boosting application performance.
  • Stored procedure execution plans can be reused, staying cached in SQL Server’s memory, reducing server overhead.
  • It help promote code reuse.
  • It can encapsulate logic. You can change stored procedure code without affecting clients. It provide better security to your data.

References

[1]Multi dimensional Database

[2] Online Analytical Processing

[3]Surrogate vs Natural Key

[4] Data Warehouse

[5] Union vs Join

[6] SQL Interview Questions

Share: Twitter Facebook Google+ LinkedIn
comments powered by Disqus