Bitmap indexes use bit arrays (bitmaps) to answer queries by performing bitwise logical operations. They are useful in data warehousing applications and have a significant space and performance advantage over other index structures for such data. Tables with few insert or update operations are good candidates. The advantages of bitmap indexes are:
- They have a highly compressed structure, making them fast to read.
- Their structure makes it possible for the system to combine multiple indexes, so the underlying table can be accessed faster.
The disadvantage of bitmap indexes is:
- The overhead of maintaining them is enormous.
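As a rough illustration (not tied to any particular database engine), the bitwise trick behind a bitmap index can be sketched in Python: each distinct column value maps to an integer bitmask over the row positions, and an AND of two bitmaps answers a two-column equality query. The table and column names are invented for the example.

```python
# Minimal bitmap-index sketch: one bitmap (a Python int) per distinct value.
# Bit i of a bitmap is 1 when row i holds that value.
rows = [
    {"region": "east", "status": "active"},
    {"region": "west", "status": "inactive"},
    {"region": "east", "status": "inactive"},
    {"region": "east", "status": "active"},
]

def build_bitmaps(rows, column):
    """Build {value: bitmap} for one column of the table."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps

region = build_bitmaps(rows, "region")
status = build_bitmaps(rows, "status")

# WHERE region = 'east' AND status = 'active' becomes a single bitwise AND.
match = region["east"] & status["active"]
matching_rows = [i for i in range(len(rows)) if (match >> i) & 1]
print(matching_rows)  # [0, 3]
```

The combined predicate never touches the table itself until the final row fetch, which is the performance advantage the answer above describes.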
Correct Answer: Data cleaning is also known as data scrubbing. It is a process that ensures a set of data is correct and accurate. Data accuracy, consistency, and integration are checked during data cleaning. Data cleaning can be applied to a single set of records or to multiple sets of data that need to be merged. It is performed by reading all records in a set and verifying their accuracy: typos and spelling errors are rectified, mislabeled data is relabeled and filed, and incomplete or missing entries are completed. Unrecoverable records are purged so that they do not take up space or cause inefficient operations.
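A toy data-scrubbing pass covering the steps above (fix typos, complete missing entries, purge unrecoverable records) might look like this Python sketch; the field names and the typo-correction map are invented for illustration:

```python
# Toy data-cleaning pass: rectify typos, fill missing entries, purge dead rows.
SPELLING_FIXES = {"Nwe York": "New York", "Chicgo": "Chicago"}  # hypothetical typo map

raw_records = [
    {"name": "Alice", "city": "Nwe York"},  # typo: rectify it
    {"name": "Bob", "city": ""},            # incomplete: fill a default
    {"name": "", "city": ""},               # unrecoverable: purge it
]

def clean(records, default_city="Unknown"):
    cleaned = []
    for rec in records:
        if not rec["name"] and not rec["city"]:
            continue                               # purge records with no usable data
        city = rec["city"] or default_city         # complete missing entries
        city = SPELLING_FIXES.get(city, city)      # rectify known spelling errors
        cleaned.append({"name": rec["name"], "city": city})
    return cleaned

print(clean(raw_records))
```

A real pipeline would add validation rules per field and log what was changed, but the read-verify-fix-or-purge loop is the same shape.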
2. Explain the use of lookup tables and aggregate tables.
Correct Answer: A lookup table is used when updating the data warehouse. When it is placed against the fact table or warehouse, keyed on the primary key of the target, the update takes place only for new or changed records, depending on the lookup condition. Aggregate tables are materialized views that contain summarized data. For example, to generate sales reports on a weekly, monthly, or yearly basis instead of a daily basis, date values are aggregated into week values, week values into month values, and month values into year values. An aggregate function is used to perform this process.
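The day-to-month roll-up described above can be sketched with a simple aggregate function in Python; the dates and sales figures here are made up:

```python
from collections import defaultdict

# Daily fact rows: (date 'YYYY-MM-DD', sales amount). Figures are made up.
daily_sales = [
    ("2023-01-05", 100.0),
    ("2023-01-20", 250.0),
    ("2023-02-03", 80.0),
]

def aggregate_by_month(rows):
    """SUM() sales grouped by the month part of the date key."""
    totals = defaultdict(float)
    for date, amount in rows:
        month = date[:7]        # keep only 'YYYY-MM'
        totals[month] += amount
    return dict(totals)

print(aggregate_by_month(daily_sales))
# {'2023-01': 350.0, '2023-02': 80.0}
```

An aggregate table is exactly this result stored persistently, so monthly reports read three rows instead of re-summing every daily fact.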
Correct Answer: A data mart is a data repository that serves a community of people who work with knowledge (also known as knowledge workers). Its data can come from enterprise resources or from a data warehouse.
4. Difference between ER Modeling and Dimensional Modeling.
Correct Answer: Dimensional modelling is very flexible from the user's perspective. A dimensional data model is mapped for creating schemas, whereas the ER model is not mapped for creating schemas and is not used to convert normalized data into a denormalized form. The ER model is used for OLTP databases in 1st, 2nd, or 3rd normal form, whereas the dimensional data model is used for data warehousing. The ER model contains normalized data, whereas the dimensional model contains denormalized data.
5. What is the difference between view and materialized view?
Correct Answer: View:
- A view provides a tailored representation of data accessed from its underlying tables.
- It has a logical structure and does not occupy storage space.
- Changes in the corresponding tables are reflected in the view.
Materialized view:
- Pre-calculated data persists in a materialized view.
- It occupies physical data space.
- Changes in the corresponding tables are not automatically reflected in it.
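The behavioural difference can be mimicked in plain Python as a loose analogy (not a database implementation): a "view" recomputes from the base data on every access, while a "materialized view" stores a snapshot that goes stale until it is refreshed.

```python
base_table = [1, 2, 3]

def view():
    """Like a view: recomputed from the base table on every access."""
    return sum(base_table)

materialized = sum(base_table)  # like a materialized view: a stored result

base_table.append(4)            # a change in the underlying table

print(view())        # 10 -- the view reflects the change immediately
print(materialized)  # 6  -- the materialized copy is stale
materialized = sum(base_table)  # explicit refresh
print(materialized)  # 10
```

The trade-off mirrors the answer above: the view costs nothing to store but recomputes every time, while the materialized view reads instantly but must be refreshed.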
6. What is the purpose of cluster analysis in Data Warehousing?
Correct Answer: Cluster analysis is used to group objects without given class labels. It analyzes the data present in the data warehouse and compares each cluster with clusters already formed. It performs the task of assigning a set of objects into groups, also known as clusters. It is used in data mining together with techniques such as statistical data analysis, and it draws on information and knowledge from many fields, including machine learning, pattern recognition, image analysis, and bioinformatics. Cluster analysis is an iterative process of knowledge discovery that involves trial and failure; pre-processing and parameter tuning are applied until the desired properties are achieved.
7. What are the different models used in cluster analysis?
Correct Answer: Many algorithms can be used to analyze the database and check the maintenance of the data sets already present. The different types of cluster models are as follows:
- Connectivity models: models that connect one cluster to another. An example is hierarchical clustering, which is based on the distance connectivity between models.
- Centroid models: models that find clusters using a single mean vector. An example is the k-means algorithm.
- Distribution models: models specified by statistical distributions, for example the multivariate normal distribution model.
- Density models: models that deal with clusters defined as densely connected regions in the data space.
- Group models: models that do not provide a refined model for the output and just give grouping information.
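As a concrete instance of a centroid model, here is a minimal k-means loop in Python on one-dimensional toy data (no library dependencies; the points and starting centroids are invented):

```python
# Minimal k-means on 1-D points: a toy centroid-model clustering.
def kmeans_1d(points, centroids, iterations=10):
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 10.0])
print(centroids)  # approximately [1.0, 8.0]
```

The two steps (assign to nearest mean vector, then recompute the means) are exactly what defines a centroid model, as opposed to the distance-linkage criterion of a connectivity model.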
8. What is Cascade and Drill Through? What is the difference between them?
Correct Answer: Cascade:
- The cascade process involves taking values from various other prompts.
- The result is a single report.
- It is used when a criterion is to be implemented.
Drill Through:
- The drill-through process is implemented when navigating from summary to detailed information.
- Drill through has a parent and a child report.
- Data of another report can be seen based on the current details of the data.
9. Explain the various caches available in Data Integrator.
Correct Answer: NO_CACHE: used when values should not be cached. PRE_LOAD_CACHE: the result column is preloaded into memory and compared there prior to executing the lookup; PRE_LOAD_CACHE is used when the table fits entirely in the available memory. DEMAND_LOAD_CACHE: the result column is loaded into memory and compared there as the function executes; DEMAND_LOAD_CACHE is suitable when looking up highly repetitive values in a small subset of the data.
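The DEMAND_LOAD_CACHE idea, caching a looked-up value the first time it is requested, is essentially memoization. A minimal Python sketch of that behaviour (the lookup table and keys are invented, and this is an analogy, not Data Integrator's implementation):

```python
# Demand-load cache sketch: values are fetched and cached on first request.
lookup_table = {"C001": "Gold", "C002": "Silver"}  # hypothetical lookup source
cache = {}
fetch_count = 0

def lookup(key):
    """Return the value for key, reading the source only on a cache miss."""
    global fetch_count
    if key not in cache:
        fetch_count += 1                 # simulate one expensive source read
        cache[key] = lookup_table.get(key)
    return cache[key]

for key in ["C001", "C001", "C002", "C001"]:
    lookup(key)
print(fetch_count)  # 2 -- highly repetitive keys hit the source only once each
```

This is why DEMAND_LOAD_CACHE pays off for repetitive lookups: the expensive read happens once per distinct key, not once per row.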
10. What are the prime responsibilities of Data Integration Administrator?
Correct Answer:
- Scheduling and executing batch jobs
- Configuring, starting, and stopping real-time services
- Configuring and managing adapters
- Managing repository usage and Job Server configuration
- Configuring the Access Server
- Publishing batch jobs
- Publishing real-time services through web services