Data Mining

Introduction: 
 
The core of Business Intelligence applications is data warehousing, and the core of data warehousing is data mining. Technically speaking, data mining is gathering knowledge from data, so Knowledge Discovery in Data (KDD) is also part of data mining.

So data mining is an important subject to know if we are working in data warehousing.

Definition of Data Mining:

Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, from data.

Knowledge discovery describes the steps of the process for getting useful data out of raw data.

The picture below illustrates the knowledge discovery process.


Step 1:
Data is collected from various sources such as spreadsheets, flat files, dump (.dmp) files, etc. This is called raw data.

Step 2:
The collected raw data is integrated and cleansed to build a data warehouse. Data integration involves collecting all the source data into a single unified system. Data cleansing involves removing or marking junk data and keeping only the data needed for the data warehouse.

Step 3: 
 
The real process of data mining starts from the data warehouse. After integration and cleansing, the data is organized into subject areas such as Sales, Purchase, Finance, etc.

Step 4:

We then apply domain knowledge to each subject area to produce specific output from the data.
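The four steps above can be sketched as a tiny pipeline. This is a minimal illustration only. The sources, field names and the `SALES_TARGET` rule are hypothetical, and a real warehouse would use ETL tools and a database rather than Python lists.

```python
from collections import defaultdict

# Step 1: raw data from two hypothetical sources (a spreadsheet and a flat file).
spreadsheet_rows = [
    {"region": "North", "subject": "Sales", "amount": "120"},
    {"region": "south ", "subject": "sales", "amount": "80"},
]
flat_file_lines = [
    "North|Purchase|50",
    "North|Sales|",      # junk row: amount is missing
    "South|Sales|30",
]

def integrate(spreadsheet, flat_lines):
    """Step 2a: bring every source into one common record format."""
    records = [dict(row) for row in spreadsheet]
    for line in flat_lines:
        region, subject, amount = line.split("|")
        records.append({"region": region, "subject": subject, "amount": amount})
    return records

def cleanse(records):
    """Step 2b: drop junk rows and normalize inconsistent values."""
    clean = []
    for r in records:
        if not r["amount"].strip():
            continue  # discard rows without an amount
        clean.append({
            "region": r["region"].strip().title(),
            "subject": r["subject"].strip().title(),
            "amount": float(r["amount"]),
        })
    return clean

def split_by_subject(records):
    """Step 3: organize warehouse rows into subject areas."""
    areas = defaultdict(list)
    for r in records:
        areas[r["subject"]].append(r)
    return dict(areas)

def total_by_region(rows):
    """Summarize one subject area per region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)

warehouse = cleanse(integrate(spreadsheet_rows, flat_file_lines))
areas = split_by_subject(warehouse)
sales_totals = total_by_region(areas["Sales"])

# Step 4: apply a (hypothetical) piece of domain knowledge: a per-region sales target.
SALES_TARGET = 115.0
below_target = [region for region, total in sales_totals.items() if total < SALES_TARGET]

print(sales_totals)  # {'North': 120.0, 'South': 110.0}
print(below_target)  # ['South']
```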



OLAP Vs OLTP

Introduction:

With the constant growth of data analysis and Business Intelligence applications, understanding OLAP and OLTP has become important.

The design of a data warehouse and of Online Analytical Processing (OLAP) cubes is fundamentally different from that of Online Transaction Processing (OLTP) systems.

OLTP:

Source of data: OLTP is the original source of data. It maintains the day-to-day transactions and processes of a business.

Purpose of data: to run and control fundamental business tasks and to record every event that occurs in them.

OLTP data reflects a snapshot of ongoing business processes.

Inserts and updates in OLTP are short and fast, and they are initiated by end users. OLTP databases are normalized to avoid duplicate (redundant) records.

In OLTP, queries are usually standardized and simple, and they return only a few records.

Processing is typically very fast. An OLTP database is relatively small (MB to GB) compared with OLAP, provided historical data is archived.

The database design of OLTP is highly normalized, with many tables.
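The OLTP points above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `customer`, `product` and `sales_order` tables are hypothetical. Each fact is stored once in a normalized table, a user action is a single short transaction, and the typical query is a simple lookup that returns a few rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized design: customer and product details live in their own tables,
# and orders refer to them by key instead of repeating names and prices.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    quantity    INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO product VALUES (10, 'Widget', 2.5)")

# A typical OLTP operation: one short transaction initiated by an end user.
with conn:  # commits on success, rolls back on error
    conn.execute(
        "INSERT INTO sales_order (customer_id, product_id, quantity) VALUES (?, ?, ?)",
        (1, 10, 4),
    )

# A typical OLTP query: simple, standardized, returns a few records.
rows = conn.execute(
    """SELECT o.order_id, c.name, p.name, o.quantity
       FROM sales_order o
       JOIN customer c ON c.customer_id = o.customer_id
       JOIN product  p ON p.product_id  = o.product_id
       WHERE o.customer_id = ?""",
    (1,),
).fetchall()
print(rows)  # [(1, 'Acme Ltd', 'Widget', 4)]
```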

Backup & Recovery of OLTP:
Operational data is critical to running the business, and data loss is likely to entail legal liability.

OLAP:

OLAP data comes from OLTP: it is consolidated from various OLTP sources.

The main purpose of this data is to support reporting, decision making, data analysis and problem solving.

This data provides multidimensional views of various kinds of business activities.

Data is inserted and updated by periodic, long-running batch processes, but retrieving data from OLAP is easy.

Queries in OLAP are typically long-running and often complex, involving aggregations.

Processing speed in OLAP depends on the amount of data involved. Batch data refreshes and complex queries may take hours, and query speed can be improved by creating views and indexes.

An OLAP database is very large (GB to TB) because it contains the consolidated data of OLTP plus aggregation structures and historical data.

The database design of OLAP is in most cases de-normalized, with fewer tables, using a star or snowflake schema.
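A star schema and an aggregation query can be sketched the same way with `sqlite3`. This is an illustration only, and the fact and dimension tables are hypothetical. A central fact table holds the measures, the dimension tables describe them, and the typical OLAP query summarizes many fact rows along dimensions instead of fetching a few records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Star schema: one fact table surrounded by (de-normalized) dimension tables.
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);
INSERT INTO dim_time VALUES (1, 2011, 'Q1'), (2, 2011, 'Q2'), (3, 2012, 'Q1');
INSERT INTO dim_product VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO fact_sales VALUES
    (1, 1, 100.0), (1, 2, 40.0),
    (2, 1, 150.0), (3, 1, 200.0), (3, 2, 60.0);
""")

# A typical OLAP query: aggregate the facts along two dimensions (year x category).
rows = conn.execute("""
    SELECT t.year, p.category, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_time t    ON t.time_id = f.time_id
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY t.year, p.category
    ORDER BY t.year, p.category
""").fetchall()

for row in rows:
    print(row)
# (2011, 'Electronics', 250.0)
# (2011, 'Furniture', 40.0)
# (2012, 'Electronics', 200.0)
# (2012, 'Furniture', 60.0)
```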






Data Warehousing

Data Warehousing & the Need for a DW:
 
Companies commonly use data warehousing to analyze trends over time. A data warehouse can also be used to view a company's day-to-day operations, but its primary function is to support better decision making based on long-term views of the data. For example, business process models, forecasts, business reports and projections can be produced from it. The main intention of a data warehouse is to provide an overview for reporting and for analyzing past process activities.

Technically, a data warehouse is a collection of data stored in a single database that describes past business processes in a way that supports the company's decision-making process. It is read-only data that does not change, but its size grows over time.

Data Warehouse:

The term "data warehouse" was coined by Bill Inmon in 1990. He defined it as follows: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."

Subject Oriented:

The data describes a particular subject of the business rather than all of the company's ongoing operations.
For example: Sales.

Integrated:

In a data warehouse, data is gathered from various kinds of sources and merged into a single integrated store.

For example, the sales data of a business may be gathered from Excel files, flat files, dump files, etc.

Time Variant:

All data in the data warehouse is associated with a particular time period, and the warehouse grows over time.

Non - Volatile:

Data in the data warehouse is stable and is never changed or removed. The warehouse grows only because new data is added over time.
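The time-variant and non-volatile properties can be modelled with a toy in-memory store. This is a sketch under simplifying assumptions, and the `SalesRecord` and `AppendOnlyWarehouse` names are hypothetical. Every row carries a time period, rows are frozen once created, and the only write operation is an append, so history only grows.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class SalesRecord:
    """One warehouse row; frozen=True means its fields cannot be reassigned."""
    period: date     # time-variant: every row belongs to a time period
    subject: str     # subject-oriented: e.g. "Sales"
    region: str
    amount: float

class AppendOnlyWarehouse:
    """Toy non-volatile store: rows can be added and read, but not updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Periodic load: new rows are appended, so history only grows."""
        self._rows.extend(rows)

    def as_of(self, day):
        """Return the rows whose period is on or before the given date."""
        return [row for row in self._rows if row.period <= day]

    def __len__(self):
        return len(self._rows)

wh = AppendOnlyWarehouse()
wh.load([SalesRecord(date(2012, 1, 31), "Sales", "North", 120.0)])
wh.load([SalesRecord(date(2012, 2, 29), "Sales", "North", 95.0)])
print(len(wh))                           # 2
print(len(wh.as_of(date(2012, 1, 31))))  # 1
```

A production warehouse enforces this through load processes and permissions rather than a Python class. The sketch only shows the idea of append-only, time-stamped history.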

A Typical Architecture of a Data Warehouse:






How to Create ODBC Connection

Before working with the Administration Tool, we need to know how to create an ODBC connection for our database. We use this connection to import the data sources into the physical layer of the RPD (repository) in the Administration Tool.

OBIEE supports most common data sources, such as Oracle, DB2, MS SQL Server, MySQL, XML, etc. The following example shows how to create a connection to an Oracle database.

1. Go to Control Panel -> Administrative Tools -> Data Sources (ODBC). The following screen appears.


2. On that screen, click the System DSN tab. A screen like the one below appears:


3. I'm going to connect to the built-in SH schema of an Oracle 11g database, so click Add on the above screen to open a new screen like the one below:

The "Create New Data Source" screen above lists the drivers for all installed data sources. Click Oracle 11g_home1 and then click Finish.

4. The "Oracle Driver Configuration" screen below appears. Enter the Data Source Name, Service Name and User Name, then click the "Test Connection" button.
The "Oracle ODBC Driver Connect" screen opens, where you enter the password for that schema.


5. Once you enter the user name and password of the database, a "Connection successful" window appears, as shown below.


6. Click "OK" on each screen to return to the main screen, where the newly created "SH" DSN is listed, as shown below.

  
The connection has been created successfully, and you are ready to import the data source and work with the Administration Tool.
In the same way, you can create connections for any other data sources you need.
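Programs other than the Administration Tool can also refer to the new DSN by name through a standard ODBC connection string built from the `DSN`, `UID` and `PWD` keywords. Below is a small sketch that builds such a string for the "SH" DSN. The user name and password are placeholders, and the brace quoting is a simplified version of the ODBC rule for values that contain special characters. The commented `pyodbc.connect` call assumes the third-party pyodbc package is installed.

```python
def quote_odbc_value(value):
    """Simplified ODBC quoting: wrap values containing ; { } in braces, doubling any '}'."""
    if any(ch in value for ch in ";{}"):
        return "{" + value.replace("}", "}}") + "}"
    return value

def dsn_connection_string(dsn, user, password):
    """Build an ODBC connection string that points at an existing DSN."""
    parts = {"DSN": dsn, "UID": user, "PWD": password}
    return ";".join(f"{key}={quote_odbc_value(value)}" for key, value in parts.items())

# Placeholder credentials for the SH schema; replace with your own.
conn_str = dsn_connection_string("SH", "sh", "your_password")
print(conn_str)  # DSN=SH;UID=sh;PWD=your_password

# Assumption: with pyodbc installed, the DSN could then be opened with
#   import pyodbc
#   connection = pyodbc.connect(conn_str)
```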

Keep Rocking!!!



Oracle BI Architecture Components

This post describes the components of OBIEE 10g and the relationships between them.

1. Clients
2. Oracle BI Presentation Services
3. Oracle BI Server
4. Oracle BI Repository
5. Data sources

A Simple Architecture:

We will discuss all the components one by one.

1. Clients: 
Clients provide access to Business Intelligence information through a web browser.
The main clients are Oracle BI Answers and Oracle BI Interactive Dashboards.

Oracle BI Answers is a set of graphical tools used to build, view and modify Oracle BI Requests.

Oracle BI Interactive Dashboards are where we display Answers requests and other items such as prompts, filters, briefing books, etc.

2. Oracle BI Presentation Services

Oracle BI Presentation Services is an extension to an existing Web server. It receives processing instructions from an Oracle BI client, retrieves the requested information from Oracle BI server, and then renders the information inside the requesting client. 

Oracle BI Presentation Services uses a catalog to store saved content, such as Oracle BI requests and Oracle BI Interactive dashboards.

3. Oracle BI Server

It is the core server behind Business Intelligence applications and provides efficient processing to access the physical data sources and structure information.

Oracle BI Server connects to databases natively or through ODBC.
It generates dynamic SQL queries to retrieve data from the physical data sources and provides the Business Intelligence data to Oracle BI Presentation Services.

4. Oracle BI Repository:

Oracle BI Server stores metadata in repositories. The Administration Tool has a graphical user interface that allows server administrators to set up these repositories. 

An Oracle BI Server repository consists of three layers. Each layer appears in a separate pane in the Administration Tool user interface and has a tree structure. You can expand each object to see a list of its components. These layers are not visible to the end user.
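In OBIEE the three layers are the Physical, Business Model and Mapping, and Presentation layers. The chain from a presentation column down to a physical column can be sketched with plain dictionaries. This is a hypothetical, heavily simplified model: a real RPD also stores joins, hierarchies, aggregation rules and security. The table and column names are borrowed from the Oracle SH sample schema for illustration only.

```python
# Physical layer: tables and columns as they exist in the data source.
PHYSICAL_LAYER = {
    ("SH", "SALES", "AMOUNT_SOLD"),
    ("SH", "TIMES", "CALENDAR_YEAR"),
}

# Business Model and Mapping layer: logical column -> physical column.
BUSINESS_MODEL_LAYER = {
    "Sales Facts.Revenue": ("SH", "SALES", "AMOUNT_SOLD"),
    "Time.Year": ("SH", "TIMES", "CALENDAR_YEAR"),
}

# Presentation layer: column names exposed to clients such as Answers -> logical column.
PRESENTATION_LAYER = {
    "Revenue": "Sales Facts.Revenue",
    "Year": "Time.Year",
}

def resolve(presentation_column):
    """Follow a presentation column down through the layers to a physical column."""
    logical_column = PRESENTATION_LAYER[presentation_column]
    physical_column = BUSINESS_MODEL_LAYER[logical_column]
    if physical_column not in PHYSICAL_LAYER:
        raise KeyError(f"{logical_column} maps to an unknown physical column")
    return physical_column

print(resolve("Revenue"))  # ('SH', 'SALES', 'AMOUNT_SOLD')
```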

5. Data Sources

Data sources are the physical sources where the data is stored. They can be in any format, including transactional databases, online analytical processing databases, text files, XMLA, spreadsheets, etc. A connection to the data source is created and then used by Oracle BI Server. The data source connection can be defined to use native drivers or ODBC.

SQL is generated by Oracle BI Server against the data sources using the data source connection, information from the repository, and database-specific parameters stored in a DBFeatures.INI file. Thus, Oracle BI Server is not just a SQL generator. It figures out the best source and the optimal way to access data. In some cases, Oracle BI Server takes on operations that are more efficient for it to do rather than the host data source.
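The idea of choosing "the best source" can be illustrated with a toy form of aggregate navigation. This sketch is not Oracle BI Server's actual algorithm. The table names, grain levels and row counts are hypothetical, and level names are used directly as column names. Among the tables whose grain covers every requested level, the sketch picks the smallest one and generates a `GROUP BY` query against it.

```python
# Hypothetical physical sources for the same logical "revenue" measure:
# (table name, grain levels the table stores, estimated row count)
SOURCES = [
    ("SALES_AGG_YEAR", {"year"}, 10),
    ("SALES_AGG_MONTH", {"year", "month"}, 120),
    ("SALES_DETAIL", {"year", "month", "day", "product"}, 1_000_000),
]

def choose_source(levels):
    """Pick the smallest table whose grain covers every requested level."""
    candidates = [s for s in SOURCES if set(levels) <= s[1]]
    if not candidates:
        raise ValueError(f"no source covers {levels}")
    return min(candidates, key=lambda s: s[2])[0]

def generate_sql(levels):
    """Generate an aggregate query against the chosen source."""
    table = choose_source(levels)
    cols = ", ".join(levels)
    return f"SELECT {cols}, SUM(revenue) AS revenue FROM {table} GROUP BY {cols}"

print(generate_sql(["year"]))
# SELECT year, SUM(revenue) AS revenue FROM SALES_AGG_YEAR GROUP BY year
print(generate_sql(["year", "month"]))
# SELECT year, month, SUM(revenue) AS revenue FROM SALES_AGG_MONTH GROUP BY year, month
```

Reading a 10-row summary table instead of a million-row detail table is the kind of saving the text describes. The real server bases its choices on repository metadata and database features.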

