
Normalization

What is normalization?

Normalization is the process of designing a data model to store data efficiently in a database. The end result is that redundant data is eliminated, and only data that depends on the table's key is stored within the table.

For example, let's say we store City, State and ZipCode data for Customers in the same table as Other Customer data. With this approach, we keep repeating the City, State and ZipCode data for all Customers in the same area. Instead of storing the same data again and again, we could normalize the data and create a related table called City. The "City" table could then store City, State and ZipCode along with IDs that relate back to the Customer table, and we can eliminate those three columns from the Customer table and add the new ID column.
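As a rough sketch, the normalized design might look something like this in SQL (the table and column names are only illustrative, not part of the original example):

create table City (
    CityID   int          not null primary key,
    City     varchar(50)  not null,
    State    char(2)      not null,
    ZipCode  varchar(10)  not null
)

create table Customer (
    CustomerID    int           not null primary key,
    CustomerName  varchar(100)  not null,
    CityID        int           not null references City (CityID)  -- replaces the City, State and ZipCode columns
)

Each customer row now carries only a CityID, and each City/State/ZipCode combination is stored once.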

Normalization rules have been broken down into several forms. People often refer to the third normal form (3NF) when talking about database design. This is what most database designers try to achieve: In the conceptual stages, data is segmented and normalized as much as possible, but for practical purposes those segments are changed during the evolution of the data model. Various normal forms may be introduced for different parts of the data model to handle the unique situations you may face.

Whether you have heard about normalization or not, your database most likely follows some of the rules, unless all of your data is stored in one giant table. We will take a look at the first three normal forms and the rules for determining the different forms here.

Rules for First Normal Form (1NF)

Eliminate repeating groups. This table contains repeating groups of data in the Software column.

Computer    Software
1           Word
2           Access, Word, Excel
3           Word, Excel

To follow the First Normal Form, we store one type of software for each record.

Computer    Software
1           Word
2           Access
2           Word
3           Excel
3           Word
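A minimal sketch of the 1NF table as DDL (names are illustrative); the composite key guarantees one software title per row per computer:

create table ComputerSoftware (
    Computer  int          not null,
    Software  varchar(50)  not null,
    primary key (Computer, Software)  -- one software title per row
)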

Rules for Second Normal Form (2NF)

Meet the requirements of 1NF and eliminate redundant data. This table repeats the name of the software, which is redundant data.

Computer    Software
1           Word
2           Access
2           Word
3           Excel
3           Word

To eliminate the redundant storage of data, we create two tables. The first table stores a SoftwareID reference to our new second table, which holds a unique list of software titles.

Computer    SoftwareID
1           1
2           2
2           1
3           3
3           1

SoftwareID    Software
1             Word
2             Access
3             Excel
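Expressed as DDL, the two tables above might be sketched like this (illustrative names):

create table Software (
    SoftwareID  int          not null primary key,
    Software    varchar(50)  not null unique   -- each title stored exactly once
)

create table ComputerSoftware (
    Computer    int  not null,
    SoftwareID  int  not null references Software (SoftwareID),
    primary key (Computer, SoftwareID)
)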




Rules for Third Normal Form (3NF)

Meet the requirements of 1NF and 2NF, and eliminate columns that are not dependent on the key. This table mixes data about the computer with data about the user.

Computer    User Name    User Hire Date    Purchased
1           Joe          4/1/2000          5/1/2003
2           Mike         9/5/2003          6/15/2004

To eliminate columns not dependent on the key, we would create the following tables. Now the data stored in the computer table is only related to the computer, and the data stored in the user table is only related to the user.

Computer    Purchased
1           5/1/2003
2           6/15/2004

User    User Name    User Hire Date
1       Joe          4/1/2000
2       Mike         9/5/2003

Computer    User
1           1
2           2
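One way to sketch the 3NF design in DDL (table and column names are illustrative):

create table Computer (
    ComputerID  int       not null primary key,
    Purchased   datetime  not null        -- data about the computer only
)

create table Users (
    UserID        int          not null primary key,
    UserName      varchar(50)  not null,
    UserHireDate  datetime     not null   -- data about the user only
)

create table ComputerUser (
    ComputerID  int not null references Computer (ComputerID),
    UserID      int not null references Users (UserID),
    primary key (ComputerID, UserID)      -- links computers to their users
)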






What does normalization have to do with SQL Server?

To be honest, the answer here is nothing. SQL Server, like any other RDBMS, couldn't care less whether your data model follows any of the normal forms. You could store all of your data in one giant table, or you could create a lot of little, unrelated tables. SQL Server will support whatever you decide to do. The only limiting factor you might face is the maximum number of columns SQL Server supports for a table.

SQL Server does not force or enforce any rules that require you to create a database in any of the normal forms. You are able to mix and match any of the rules you need, but it is a good idea to try to normalize your database as much as possible when you are designing it. People tend to spend a lot of time up front creating a normalized data model, but as soon as new columns or tables need to be added, they forget about the initial effort that was devoted to creating a nice clean model.

To assist in the design of your data model, you can use the DaVinci tools that are part of SQL Server Enterprise Manager.

Advantages of normalization

1. Smaller database: By eliminating duplicate data, you will be able to reduce the overall size of the database.

2. Better performance:

a. Narrow tables: Having more finely tuned tables means each table has fewer columns, which allows you to fit more records per data page.

b. Fewer indexes per table mean faster maintenance tasks such as index rebuilds.

c. Only join tables that you need.

Disadvantages of normalization

1. More tables to join: By spreading out your data into more tables, you increase the need to join tables.

2. Tables contain codes instead of real data: Repeated data is stored as codes rather than meaningful data. Therefore, there is always a need to go to the lookup table for the value.

3. Data model is difficult to query against: The data model is optimized for applications, not for ad hoc querying.

Performance Advantages of Denormalization

The Reason for Denormalization

Only one valid reason exists for denormalizing a relational design - to enhance performance. However, there are several indicators which will help to identify systems and tables which are potential denormalization candidates. These are:

Many critical queries and reports exist which rely upon data from more than one table. Often, these requests need to be processed in an online environment.

Repeating groups exist which need to be processed in a group instead of individually.

Many calculations need to be applied to one or many columns before queries can be successfully answered.

Tables need to be accessed in different ways by different users during the same timeframe.

Many large primary keys exist which are clumsy to query and consume a large amount of disk space when carried as foreign key columns in related tables.

Certain columns are queried a large percentage of the time, causing very complex or inefficient SQL to be used.

Denormalization can improve performance by:

Minimizing the need for joins

Reducing the number of foreign keys on tables

Reducing the number of indexes, saving storage space, and reducing data modification time

Precomputing aggregate values, that is, computing them at data modification time rather than at select time (see the sketch after this list)

Reducing the number of tables (in some cases)
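For instance, a precomputed aggregate can be maintained when the data is modified instead of being summed at select time. This is only a hypothetical sketch; the Orders/OrderDetail tables and the OrderTotal column are assumptions for illustration:

-- Refresh a denormalized OrderTotal on Orders so reports avoid
-- joining to and summing OrderDetail at select time.
update Orders
set OrderTotal = (select isnull(sum(d.Quantity * d.UnitPrice), 0)
                  from OrderDetail d
                  where d.OrderID = Orders.OrderID)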

Splitting Tables

Sometimes, splitting normalized tables can improve performance. You can split tables in two ways:

Horizontally, by placing rows in two separate tables, depending on data values in one or more columns

Vertically, by placing the primary key and some columns in one table, and placing other columns and the primary key in another table.

Figure 2-14: Horizontal and vertical partitioning of tables

Splitting tables--either horizontally or vertically--adds complexity to your applications. There usually needs to be a very good performance reason.

Horizontal Splitting

Use horizontal splitting in the following circumstances:

A table is large, and reducing its size reduces the number of index pages read in a query. B-tree indexes, however, are generally very flat, and you can add large numbers of rows to a table with small index keys before the B-tree requires more levels. An excessive number of index levels may be an issue with tables that have very large keys.

The table split corresponds to a natural separation of the rows, such as different geographical sites or historical vs. current data. You might choose horizontal splitting if you have a table that stores huge amounts of rarely used historical data, and your applications have high performance needs for current data in the same table.

Table splitting distributes data over the physical media (there are other ways to accomplish this goal, too).

Generally, horizontal splitting adds a high degree of complication to applications. It usually requires different table names in queries, depending on values in the tables. This complexity alone usually far outweighs the advantages of table splitting in most database applications. As long as the index keys are short, and the indexes are used for queries on the table (rather than table scans being used), doubling or tripling the number of rows in the table may increase the number of disk reads required for a query by only one index level.

Figure 2-15 shows how the authors table might be split to separate active and inactive authors:

Figure 2-15: Horizontal partitioning of active and inactive data
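A sketch of how such a split could be done with select into (the status column used to decide active vs. inactive is an assumption here, not a column of the real sample table):

-- The database option 'select into/bulkcopy' must be enabled for select into.
select * into authors_active   from authors where status = 'A'
select * into authors_inactive from authors where status = 'I'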

Vertical Splitting

Use vertical splitting in the following circumstances:

Some columns are accessed more frequently than other columns.

The table has wide rows, and splitting the table reduces the number of pages that need to be read.

Vertical table splitting makes even more sense when both of the above conditions are true. When a table contains very long columns that are not accessed frequently, placing them in a separate table can greatly speed the retrieval of the more frequently used columns. With shorter rows, more data rows fit on a data page, so fewer pages can be accessed for many queries.
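A hypothetical sketch of a vertical split, keeping the frequently accessed columns in one table and a wide, rarely used column in another, both keyed on the same primary key:

create table EmployeeCore (
    EmployeeID  int          not null primary key,
    LastName    varchar(40)  not null,
    DeptID      int          not null
)

create table EmployeeResume (
    EmployeeID  int   not null primary key references EmployeeCore (EmployeeID),
    Resume      text  null    -- wide column, accessed infrequently
)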

SYBASE INTRODUCTION

SYBASE:

Sybase is a database server developed by Sybase Corp. and architected by Dr. Robert Epstein and Tom Haggin.

Sybase is an RDBMS (a database management system based on the relational model, which gives a data model grounded in predicate logic and set theory) and supports T-SQL.

T-SQL

To communicate with the SQL server and to manipulate the objects stored in it, client programs and stored procedures use a variant of Structured Query Language called T-SQL.

T-SQL extends SQL by adding program flow-control constructs, local variables and other capabilities that allow the DBA to build code-based objects such as stored procedures and triggers.
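A minimal sketch of those extensions (the Employee table name is illustrative): local variables, if/else flow control and print are all T-SQL additions to plain SQL.

declare @cnt int
select @cnt = count(*) from Employee
if @cnt > 0
begin
    print 'Employee table has rows'
end
else
begin
    print 'Employee table is empty'
end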

VERSIONS:

Sybase was originally created for the Unix platform. Microsoft and Sybase made an agreement and released SQL Server 1.0, but their agreement came to an end in 1993. In 1995 Sybase released Sybase SQL Server 11.0, which has a single-process, multi-threaded database engine and includes a query optimizer. The latest release is version 15.0.

SYSTEM DATABASES

Master Database

Controls the user databases and the operation of SQL Server as a whole. Known as master, it keeps track of such things as user accounts, ongoing processes, and system error messages.

If the master database fails or is damaged, the server becomes unavailable until you repair or restore it.

Model database

A template for new user databases. The buildmaster program and the installmodel script create model when SQL Server is installed. Each time the create database command is issued, SQL Server makes a copy of model and extends it to the requested size, if necessary.

It is the home for items you want available across all databases (rules, defaults, user-defined data types).

Sybsystemprocs database

Holds all of the system stored procedures in one modular database. (In version 4.2.1 of the server there were only three system databases; sybsystemprocs was added later.) When you execute a system stored procedure, the server checks sybsystemprocs for it.

Stored procedure:

A collection of SQL statements and optional control-of-flow statements stored under a name. SQL Server-supplied stored procedures are called system procedures.

System procedures:

Stored procedures that SQL Server supplies for use in system administration. These procedures are provided as shortcuts for retrieving information from the system tables, or mechanisms for accomplishing database administration and other tasks that involve updating system tables.

Examples: sp_help, sp_helptext, sp_dboption, sp_lock, etc.
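Typical usage (the object names passed in are illustrative):

exec sp_help Customer          -- structure of a table
exec sp_helptext CustomerView  -- source text of a view, procedure or trigger
exec sp_lock                   -- locks currently held on the server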

Temporary database

The temporary database in SQL Server, tempdb, provides a storage area for temporary tables and other temporary working storage needs (for example, intermediate results of group by and order by).

P-T-T: this kind of table exists in tempdb until the server is restarted or shut down.

T-T-T: this kind of table exists in tempdb only for the session or stored procedure in which it was created; this is the kind created by default.
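A sketch of the two kinds of temporary table described above, as typically written in Sybase (table and column names are illustrative):

-- Exists only for the session or stored procedure that creates it (T-T-T):
create table #ActiveEmps (EmployeeID int not null, EmpName varchar(50) not null)

-- Created directly in tempdb; stays until it is dropped or the server restarts (P-T-T):
create table tempdb..ActiveEmps (EmployeeID int not null, EmpName varchar(50) not null)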

System table

One of the data dictionary tables. The system tables keep track of information about the SQL Server as a whole and about each user database. The Master Database contains some system tables that are not in user databases.

Database-level system tables      Server-level system tables
syscomments                       syslogins
sysindexes                        sysdatabases
sysobjects                        sysservers
etc.                              etc.

SYBASE TOOLS:

ISQL: the standard interface provided by Sybase, used for batch program execution (it works like an editor).

BCP: the bulk copy utility, used to export data from the database to OS files and to import data from OS files into the database.

DEFNCOPY: used to export the creation text of objects (stored procedures, triggers, rules, views, defaults) from the database to OS files, and to import it from OS files back into a database.
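For example, a typical bcp invocation might look like this (the server name, login and password are placeholders; -c selects character format):

bcp pubs2..authors out authors.dat -c -Usa -Pmypassword -SMYSERVER
bcp pubs2..authors in  authors.dat -c -Usa -Pmypassword -SMYSERVER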

-------------------------------------------------------------------------------------------------------------------------

Follow the documents in the EIS application. There are four types of documents:

1. What to study in Sybase

2. Performance tuning

3. Versions

4. Exercises on queries

There are 2 tables, Employee and Department. There are a few records in the Employee table for which the department is not assigned. The output of the query should contain all the employees' names and their corresponding departments if the department is assigned, otherwise the employee name with a null value in place of the department name. What is the query?

1. Use an outer join to get your query.

Select E.ENAME, D.DNAME

from Employee E, Dept D

where D.DEPTNO (+)= E.DEPTNO

2.

2. An outer join is correct, but try using the ANSI standard join syntax, as it is more flexible with outer joins and with filtering of the inner tables:

select e.name as Employee, d.name as Dept

from employee e

left outer join department d on e.department_id = d.id

3. select employee.name, department.name from employee left join department on employee.department_id = department.id

4. Select DeptNo, DName, Count(1) FROM Emp a, Dept b WHERE a.deptno *= b.deptno Group By DeptNo, DName

5. select e.empname,d.dptname

from employee e,department d

where e.eid=d.eid(+);

6. Select a.ename, b.dname

from Emp a, Dept b

where a.deptno = b.deptno (+)

Given a table which contains some rows with duplicate keys, how would you remove the duplicates?

select distinct * into temp_table_1 from table_1

truncate table table_1

insert into table_1

select * from temp_table_1

This is one of the ways to eliminate duplicates.

How to find the 2 minimum salaries in a table?

select min(salary) from employee where salary > (select min(salary) from employee)

select salary, min(salary) from employee where salary > (select min(salary) from employee) group by salary
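As a sketch, a single query that returns both of the two lowest (distinct) salaries can be written with a correlated subquery:

select distinct salary
from employee e
where 2 > (select count(distinct salary)
           from employee
           where salary < e.salary)

For each salary, the subquery counts how many distinct salaries are smaller; only rows with zero or one smaller salary (the two lowest values) qualify.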

DDL (Data Definition Language) statements are used to define the database structure or schema. Some examples:

1. CREATE - to create objects in the database
2. ALTER - alters the structure of the database
3. DROP - delete objects from the database
4. TRUNCATE - remove all records from a table, including all space allocated for the records
5. COMMENT - add comments to the data dictionary
6. RENAME - rename an object

DML (Data Manipulation Language) statements are used for managing data within schema objects. Some examples:

1. SELECT - retrieve data from a database
2. INSERT - insert data into a table
3. UPDATE - updates existing data within a table
4. DELETE - deletes records from a table; the space for the records remains
5. MERGE - UPSERT operation (insert or update)
6. CALL - call a PL/SQL or Java subprogram
7. EXPLAIN PLAN - explain access path to data
8. LOCK TABLE - control concurrency

DCL (Data Control Language) statements control access to data. Some examples:

1. GRANT - gives users access privileges to the database
2. REVOKE - withdraw access privileges given with the GRANT command

Transaction Control

Manages the changes made by DML statements. These commands allow statements to be grouped together into logical transactions.

1. COMMIT - save work done
2. SAVEPOINT - identify a point in a transaction to which you can later roll back
3. ROLLBACK - restore the database to its state as of the last COMMIT
4. SET TRANSACTION - change transaction options such as the isolation level and which rollback segment to use

The Difference between ‘Count(column)’ and ‘Count(*)’

‘Count(column)’: Counts the number of non-null values in the column.
‘Count(*)’: Counts the number of rows in the table, including null values and duplicates.
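A quick illustration, assuming a hypothetical employee table where some rows have a NULL manager_id:

select count(*)          as total_rows,        -- counts every row
       count(manager_id) as rows_with_manager  -- ignores rows where manager_id is NULL
from employee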

The GROUP BY keywords were added to SQL because aggregate functions (like SUM) return the aggregate of all column values every time they are called. Without the GROUP BY functionality, finding the sum for each individual group of column values was not possible.
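For example, using illustrative employee/department tables like the ones in the exercises above, SUM on its own returns one grand total, while GROUP BY produces one sum per department:

select d.dname, sum(e.salary) as total_salary
from employee e, department d
where e.deptno = d.deptno
group by d.dname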


Dropping: (table structure + data are deleted), invalidates the dependent objects, drops the indexes

Truncating: (data alone deleted), performs an automatic commit, faster than delete

Delete: (data alone deleted), doesn't perform an automatic commit


What’s the difference between a primary key and a unique key?

Both a primary key and a unique key enforce the uniqueness of the column on which they are defined. But by default, a primary key creates a clustered index on the column, whereas a unique key creates a nonclustered index by default. Another major difference is that a primary key doesn't allow NULLs, but a unique key allows one NULL only.
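A small sketch of the difference (SQL Server syntax; the table and column names are illustrative):

create table Customer (
    CustomerID  int           not null primary key,  -- clustered index by default, NULLs not allowed
    Email       varchar(100)  null unique            -- nonclustered index by default, a single NULL is allowed
)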

Explanation: The query SELECT * FROM EMP ORDER BY ENAME DESC will display the output sorted on ENAME in descending order.

CREATE TABLE .. AS SELECT command
Explanation: To copy only the structure, the WHERE clause of the SELECT command should contain a FALSE condition, as in the following:

CREATE TABLE NEWTABLE AS SELECT * FROM EXISTINGTABLE WHERE 1=2;

If the WHERE condition is true, then all the rows, or the rows satisfying the condition, will be copied to the new table.

What operator performs pattern matching?

LIKE