Monday 25 February 2013

Basic ETL Concepts

Recently, I assigned to Data warehousing project to test the ETL (Extract,Transform,Load).

Lot of testers interested to know about ETL testing. So that i would like to share the article.

Here we go..

I hope this will give you some ideas to improve your ETL Knowledge and what we do to test ETL process.

ETL testing (Extract, Transform, and Load) means Extract the data from different sources table and transform by using some business logic. Finally load the data into target table.

Now we can see the different type of category:

Data warehouse testing is categorized into four different types:
  • New Data Warehouse Testing – New DW is built and verified from scratch. Getting input from customer and new data warehouse is build and verified with the help of ETL tools.
  • Migration Testing – Project customer will have an existing DW and the ETL is performing the job. However they are looking for new ETL tool in order to improve efficiency.
  • Change Request – In this type of project new data is added from different sources to an existing DW. However customer needs to change their existing business rule or they might integrate the new rule.
  • Report Testing – Report is main key factor for any Data Warehouse project. Report must be tested by validating layout, data in the report and calculation.
Lot of testers, misunderstanding that the Database testing and Data Warehouse is similar while the fact is that both hold different direction.

Now we can see how Database testing differs from Data Ware house testing
  • Database testing is done using smaller scale of data whereas Data ware house testing is done large volume of data.
  • Database testing is normally using with OLTP (Online transaction processing) type of databases whereas data warehouse testing normally with OLAP (online analytical processing) databases.
  • Database testing is consistently injected from uniform/Single sources whereas Data ware house testing inconsistently injected the data from different/Multiple sources.
  • Database testing performs only CRUD (Create, read, update and delete) operation whereas Data Ware house testing performs only SELECT operation (Read Only).
  • Database testing used Normalized databases whereas Data ware house testing used the DE Normalized databases.
  • Database uses only .sql files (Based on the DB platform) whereas Data Ware house testing can use any type of file (.txt, .xlsx) to import data. 
List of ETL Testing Techniques that are treated when we testing the ETL package:
  • Make sure that Data transformed correctly to Target table.
  • Make sure data is loaded into the DW table without any data loss and truncation.
  • Make sure that Rejects the invalid data if any and generates the reports/Logs for invalid data for reference.
  • Make sure that data is loaded within expected time frames. (Performance)
List of points to be considered as essential when we testing the ETL package:
  • Data transformation from source to destination works 
  • Expected data is added in target system
  • No of Records count should match.
  • Error Log should create for rejected data
  • NULL Value fields 
  • Duplicate data is not loaded
  • Focus is to ensure Data validation and integrity mainly.
  • Verify that data getting loaded without truncation.
ETL Testing Process:
All testing types like unit testing, integration testing, System testing and UAT testing lies under verification and validation process as same like ETL testing also lies under verification and validation process.
Now we can see the process of Data Warehouse /ETL process.
  •  Business and requirement understanding.
  • Validating the requirement.
  • Estimate for testing.
  • Preparing test plan based on test estimation and Requirement.
  • Preparing Use cases (Scenarios) and test case as per the inputs.
  • Once all the test cases are freezed and approved, testing team move forward to execute the test cases.
  • Execution is performed once the exit criteria are met.
  • Finally, prepared the completed summary report and Closure process is done.
ETL Testing Challenges which faced in my project: 

ETL testing is quite different from other testing. There are many challenges we faced while performing data warehouse testing.
Here I have listed few ETL challenges which most of the testers faced on the project:
  • Out of Memory exception occurs while executing ETL.
  • Taking more memory while writing huge volume of Data into the table.
  • Package goes to No response state when no memory to execute the package.e
  • Incompatible and duplicate data.
  • Loss of data during ETL process
  • Unfreeze data format
  • Don’t have access to run the ETL jobs.
  • Volume of data is huge. 





Tuesday 12 February 2013

Dareware House Testing Observation

I got chance to do the Dataware house testing  (BI) in my company.

In Data Integrity testing, I have used below approach : 

I have used VBExcel comparer (Macro) to validate the column values or data between source and destination.
It is useful for me to validate/compare large number of records in both source and destination.
 

Few observations (Issues) found while doing the Performance testing,

While executing the ETL with huge volume of date, Out of Memory exception occurred. For BI Testing, mainly focused to avoid the memory related issues...

 After some extent of analysis to provide the solution that,
  1. Release the the memory object through garbage collector after inserting the large volume of data. So Memory level has maintained at the same level if inserting huge data to the database.
  2. Avoid to use more connection to open and close. Since every connection takes memory to initialization. So that is also taking some memory. Reduce the connections.


Note : I have used SQL Monitor 3.2 to analysis the performance of the server. It has been used generating various reports like CPU Usage, Disk Read, Disk Write, Disk Transfer, SQL Server Total Memory...etc.