Wednesday, March 30, 2016

Presentation and Visualization Methods



Hello,

Today’s world is a world of big data, which simply means very large volumes of data. To understand this data, one needs to analyze it, and to make it understandable to people who are not familiar with the data, one needs to present it. Hence, data visualization has gained prime importance in conveying information. Data visualization is the presentation of data in a pictorial or graphical format. Because it shows how different pieces of data relate to each other, it helps people understand complex problems. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.


The following are some examples of data visualization techniques in different lines of business:
1. Healthcare


The graph above uses a different color for each country and shows each country’s data by age bracket. This is an excellent way to represent the data because it gives the distribution by country, by age, and by cost, all in one graph.

2. Technology


The dashboard above shows different ways in which sales can be depicted. The bar graph conveniently shows both the monthly sales and the monthly goals, which makes it easy to compare sales against the goal (the sales-to-goal ratio). The map depicts sales by region, with color indicating density. A simple pie chart shows the sales breakdown by product, since only the percentages matter here.
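The two calculations behind the bar graph and the pie chart are simple. The sketch below is illustrative only: the monthly figures and product names are hypothetical, since the dashboard's actual data is not given here. It computes a sales-to-goal ratio per month and converts per-product sales into the percentages a pie chart displays.

```python
# Hypothetical monthly figures; the dashboard's real data is not available.
monthly = {
    "Jan": {"sales": 120_000, "goal": 100_000},
    "Feb": {"sales": 90_000, "goal": 110_000},
    "Mar": {"sales": 130_000, "goal": 125_000},
}

def sales_to_goal(month_data):
    """Return the sales-to-goal ratio for each month (1.0 means the goal was met exactly)."""
    return {m: d["sales"] / d["goal"] for m, d in month_data.items()}

def product_shares(sales_by_product):
    """Convert absolute sales per product into percentages, as a pie chart shows them."""
    total = sum(sales_by_product.values())
    return {p: 100 * s / total for p, s in sales_by_product.items()}

ratios = sales_to_goal(monthly)
shares = product_shares({"Laptops": 50_000, "Phones": 30_000, "Tablets": 20_000})

for month, r in ratios.items():
    print(f"{month}: {r:.2f}")
print(shares)
```

A ratio above 1.0 means the month beat its goal, which is exactly the comparison the side-by-side bars make visible at a glance.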

3. Customer relationship management



This is an excellent dashboard showing CRM functions. The bar graph gives a good comparison of different types of customers across periods. The pie chart shows the cases by subject. The line graph shows how the numbers of open and resolved cases rise or fall each month, which reveals the trend.
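A trend line like this one is usually built by counting events per month. The following is a minimal sketch, assuming each case record holds an opened date and an optional resolved date, and interpreting "open" as "opened during that month"; the records themselves are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical case records: (opened_on, resolved_on or None if still open).
cases = [
    (date(2016, 1, 5), date(2016, 1, 20)),
    (date(2016, 1, 18), date(2016, 2, 3)),
    (date(2016, 2, 9), None),
    (date(2016, 2, 14), date(2016, 2, 28)),
    (date(2016, 3, 1), None),
]

def monthly_trend(records):
    """Count cases opened and cases resolved in each (year, month)."""
    opened = Counter((o.year, o.month) for o, _ in records)
    resolved = Counter((r.year, r.month) for _, r in records if r is not None)
    months = sorted(set(opened) | set(resolved))
    # Counter returns 0 for a missing key, so months with no events still get a point.
    return [(m, opened[m], resolved[m]) for m in months]

for (year, month), n_open, n_resolved in monthly_trend(cases):
    print(f"{year}-{month:02d}: opened={n_open}, resolved={n_resolved}")
```

Plotting the two count columns against the month gives the two lines of the graph.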

In conclusion, visualization transforms data into information.




Tuesday, March 1, 2016

Structured and Unstructured Data

The volume of data is rising rapidly, and this growth gave birth to big data. However, with so much data available today, there is no guarantee that it will be in a form suitable for analysis. Data is currently available in two forms:

Unstructured Data
This generally refers to any kind of data that is not stored in a traditional table format and is raw and unorganized. The data exists in different forms and it is difficult to identify attributes and derive information from it. Some of the sources of unstructured data are:
  • Word Processing Files
  • PDF files
  • Emails
  • Digital Images
  • Audio
  • Video
  • Social Media Posts

These files are how most businesses communicate data. They do contain data, but their content has to be broken down and categorized before it can be understood.
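To make "broken down and categorized" concrete, here is a small sketch that pulls a few attributes out of a free-text email using regular expressions. The email and the patterns are hypothetical and deliberately simple. Real unstructured-data pipelines typically need far more robust parsing or natural-language processing.

```python
import re

# A hypothetical email body: the data is there, but not in labeled columns.
email = """Hi team,
Customer Jane Doe (jane.doe@example.com) called on 2016-02-14 about order #48213.
She was charged $129.99 instead of $99.99. Please refund the difference.
"""

def extract_fields(text):
    """Pull a few attributes out of free text using simple, illustrative regex patterns."""
    email_addr = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    order = re.search(r"order #(\d+)", text)
    when = re.search(r"\d{4}-\d{2}-\d{2}", text)
    amounts = re.findall(r"\$(\d+(?:\.\d{2})?)", text)
    return {
        "email": email_addr.group(0) if email_addr else None,
        "order_id": order.group(1) if order else None,
        "date": when.group(0) if when else None,
        "amounts": [float(a) for a in amounts],
    }

print(extract_fields(email))
```

Once extracted, these values could be stored as columns in a table, which turns them into structured data.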
Structured Data
Structured data is any data that is stored in some kind of table, be it a spreadsheet or a database. This data is organized in rows and columns with named headings, which makes it easy to sort and process with data mining tools. More emphasis is placed on deciding which fields of data will be stored and how they will be stored. Structured data is like a perfectly organized filing cabinet in which everything is identified, labeled, and easy to access. Some examples of structured data:
1. Machine Generated
  • Sensor Data - GPS data, manufacturing sensors, medical devices
  • Point-of-Sale Data - Credit card information, location of sale, product information
  • Call Detail Records - Time of call, caller and recipient information
  • Web Server Logs - Page requests, other server activity

2. Human Generated
  • Input Data - Any data entered into a computer: age, zip code, gender, etc.
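By contrast with free text, structured data can be processed directly because every column has a title. The sketch below uses hypothetical point-of-sale records in CSV form and Python's standard `csv` module to total sales per store by referring to columns by name.

```python
import csv
import io

# Hypothetical point-of-sale records; each column's meaning is given by its heading.
POS_CSV = """timestamp,store,product,amount
2016-03-01T09:15,Boston,Coffee,3.50
2016-03-01T09:20,Boston,Bagel,2.25
2016-03-01T10:02,Chicago,Coffee,3.75
"""

def total_by_store(csv_text):
    """Sum the sales amount per store by reading named columns."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["store"]] = totals.get(row["store"], 0.0) + float(row["amount"])
    return totals

print(total_by_store(POS_CSV))
```

No parsing rules are needed here: the header row already says which value is the store and which is the amount.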

This video explains more about the differences between structured and unstructured data:

Data in organizations:
·       A Dell survey shows that structured data represents at least 75 percent of the data under management for the majority of organizations
·       Nearly one-third of companies surveyed still don’t actively manage unstructured data
·       83 percent of organizations cite growth in transactional data (including e-commerce) as one of the most important sources of structured data growth within their organization, with 51 percent also citing growth in management data, such as ERP systems.
·       Although there is an increasing industry focus on the proliferation of social data, an increase in the creation of internally generated documents was seen as the top driver of unstructured data growth, identified by more than 50 percent of respondents.
The following figure shows an example of how data can be analyzed using a data warehouse:

This process is known as extract, transform, load (ETL): data is extracted from the original data sources, transformed so that data from the different sources fits a common format, and finally loaded into a separate database.
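The three ETL steps can be sketched in a few lines. This is a toy illustration, not a production pipeline. The two source systems, their field names, and the country-code mapping are all hypothetical, and an in-memory SQLite database stands in for the separate warehouse database.

```python
import sqlite3

# Two hypothetical source systems that describe the same kind of fact differently.
crm_rows = [{"cust": "C1", "country": "us", "revenue_usd": "1200"}]
erp_rows = [("C2", "United Kingdom", 950.0)]

COUNTRY_CODES = {"us": "US", "united kingdom": "GB"}  # assumed mapping table

def extract():
    """Extract: read raw rows from each source (in-memory stand-ins here)."""
    return crm_rows, erp_rows

def transform(crm, erp):
    """Transform: convert both sources to one schema (customer_id, country_code, revenue)."""
    unified = [(r["cust"], COUNTRY_CODES[r["country"].lower()], float(r["revenue_usd"]))
               for r in crm]
    unified += [(cid, COUNTRY_CODES[country.lower()], amount)
                for cid, country, amount in erp]
    return unified

def load(rows, conn):
    """Load: write the cleaned rows into a separate warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_revenue "
                 "(customer_id TEXT, country_code TEXT, revenue REAL)")
    conn.executemany("INSERT INTO fact_revenue VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
result = warehouse.execute(
    "SELECT country_code, SUM(revenue) FROM fact_revenue "
    "GROUP BY country_code ORDER BY country_code").fetchall()
print(result)
```

Even in this toy version, most of the code is in `transform`, which echoes the first limitation listed below.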

Limitations of data warehousing:
Some of the limitations of using a data warehouse are:
·       The transformations from individual data sources to the data warehouse usually represent about 50% of the total data warehouse effort
·       Data owners lose control over their data, raising ownership (responsibility and accountability), security and privacy issues
·       Initial implementation is time-consuming and expensive
·       Updating to add new data sources is time-consuming and expensive
·       Limited flexibility since it requires multiple separate data marts for multiple uses and types of users
·       Data is static and dated and changes cannot be monitored
·       No data drill-down capabilities

Future of data warehousing (supernova schema):
According to Kimball, data warehousing has never been more valuable and interesting than it is now, in the age of big data. Since dimensions are the most important part of a warehouse, they need to become more powerful to support advanced queries and analytics. In a supernova schema, the attributes of a dimension can be complex objects, not just simple text. This makes supernova dimensions much more flexible and extensible from one analysis to the next. The following figure shows how a supernova customer dimension differs from a traditional customer dimension.



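The contrast between the two kinds of dimension can also be sketched in code. This is a loose illustration of the idea, not Kimball's design. The attribute names (`monthly_spend`, `social_links`) and the helper `spend_trend` are hypothetical examples of "complex object" attributes and are not taken from the figure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TraditionalCustomer:
    """Classic dimension row: every attribute is a simple text or number value."""
    customer_key: int
    name: str
    city: str
    segment: str

@dataclass
class SupernovaCustomer:
    """Illustrative supernova-style row: some attributes are complex objects."""
    customer_key: int
    name: str
    city: str
    segment: str
    monthly_spend: List[float] = field(default_factory=list)  # a time series, not one value
    social_links: List[Tuple[int, str]] = field(default_factory=list)  # (other customer_key, relation)

    def spend_trend(self) -> float:
        """Derived on demand from a complex attribute: last month minus first month."""
        if len(self.monthly_spend) < 2:
            return 0.0
        return self.monthly_spend[-1] - self.monthly_spend[0]

flat = TraditionalCustomer(1, "Jane Doe", "Boston", "Retail")
rich = SupernovaCustomer(1, "Jane Doe", "Boston", "Retail",
                         monthly_spend=[120.0, 95.5, 180.0],
                         social_links=[(2, "referred")])
print(flat)
print(rich.spend_trend())
```

A new analysis can add or reuse such attributes without reshaping the flat columns, which is the flexibility the supernova idea aims for.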