- The process of combining data from different sources
- The combined data is presented to users as a unified view
- Information from different enterprise domains is integrated; this is known as Enterprise Information Integration
- Useful for merging information held in different technologies across enterprises
- The sub-areas of data integration are:
  1. Data Warehousing
  2. Data Migration
  3. Master Data Management
1. What are the prime responsibilities of Data Integration Administrator?
Correct Answer:
- Scheduling and executing batch jobs
- Configuring, starting, and stopping real-time services
- Configuring and managing adapters
- Repository usage and Job Server configuration
- Access Server configuration
- Batch job publishing
- Real-time services publishing through web services
2. Explain the various caches available in Data Integrator.
Correct Answer:
- NO_CACHE: Does not cache any values.
- PRE_LOAD_CACHE: Preloads the result column and the compare column into memory before the lookup executes. Use it when the table fits in the available memory.
- DEMAND_LOAD_CACHE: Loads the result column and the compare column into memory as the function executes. It suits lookups of highly repetitive values that touch only a small subset of the data.
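The trade-off between the three cache modes can be sketched in a few lines of Python. This is an illustration only, not the Data Integrator API: `CountingSource`, `make_lookup`, and `reads_for` are hypothetical names, and a dictionary stands in for the lookup table. Only the three mode names come from the answer above.

```python
from typing import Callable, Dict, Iterable, Optional


class CountingSource:
    """Hypothetical lookup table that counts how often it is read."""

    def __init__(self, table: Dict[int, str]) -> None:
        self._table = dict(table)
        self.reads = 0

    def fetch_one(self, key: int) -> Optional[str]:
        self.reads += 1
        return self._table.get(key)

    def fetch_all(self) -> Dict[int, str]:
        self.reads += 1
        return dict(self._table)


def make_lookup(source: CountingSource, mode: str) -> Callable[[int], Optional[str]]:
    """Return a lookup function that follows the given cache mode."""
    if mode == "NO_CACHE":
        # Every lookup goes back to the source.
        return source.fetch_one
    if mode == "PRE_LOAD_CACHE":
        # One bulk read before any lookup; the whole table must fit in memory.
        preloaded = source.fetch_all()
        return preloaded.get
    if mode == "DEMAND_LOAD_CACHE":
        # Values are read on first use and reused afterwards.
        cache: Dict[int, Optional[str]] = {}

        def lookup(key: int) -> Optional[str]:
            if key not in cache:
                cache[key] = source.fetch_one(key)
            return cache[key]

        return lookup
    raise ValueError(f"unknown cache mode: {mode}")


def reads_for(mode: str, keys: Iterable[int]) -> int:
    """Run the lookups and report how many times the source was read."""
    source = CountingSource({1: "Gold", 2: "Silver", 3: "Bronze"})
    lookup = make_lookup(source, mode)
    for key in keys:
        lookup(key)
    return source.reads


if __name__ == "__main__":
    repetitive_keys = [1, 1, 2, 1, 2, 1]
    for mode in ("NO_CACHE", "PRE_LOAD_CACHE", "DEMAND_LOAD_CACHE"):
        print(mode, reads_for(mode, repetitive_keys))
```

With six lookups over two distinct keys, the uncached mode reads the source six times, the preloaded mode once, and the demand-loaded mode twice, which is why demand loading suits highly repetitive lookups over a small subset.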
3. What is Cascade and Drill Through? What is the difference between them?
Correct Answer:
Cascade:
- The cascade process takes its values from various other prompts
- The result is a single report
- It is used when a criterion is to be implemented
Drill Through:
- The drill-through process is used to navigate from summary information to detailed information
- Drill Through has a parent report and a child report
- Data in another report can be viewed based on the details currently selected
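A rough Python sketch of the difference follows. It is not tied to any reporting tool: the `SALES` rows and all function names are hypothetical, and it assumes that "cascade" refers to cascading prompts, where the choices offered by one prompt depend on the value picked in another.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical sample data standing in for a report's query result.
SALES: List[dict] = [
    {"region": "East", "product": "Widget", "amount": 120},
    {"region": "East", "product": "Gadget", "amount": 80},
    {"region": "West", "product": "Widget", "amount": 200},
]


def cascade_product_choices(rows: List[dict], region: str) -> List[str]:
    """Cascading prompt: the product choices depend on the chosen region."""
    return sorted({r["product"] for r in rows if r["region"] == region})


def cascade_report(rows: List[dict], region: str, product: str) -> List[dict]:
    """Cascade result: a single report filtered by both prompt values."""
    return [r for r in rows if r["region"] == region and r["product"] == product]


def summary_report(rows: List[dict]) -> Dict[str, int]:
    """Parent report: total amount per region."""
    totals: Dict[str, int] = defaultdict(int)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def drill_through(rows: List[dict], region: str) -> List[dict]:
    """Child report: detail rows for the region selected in the parent."""
    return [r for r in rows if r["region"] == region]


if __name__ == "__main__":
    print(cascade_product_choices(SALES, "East"))
    print(cascade_report(SALES, "East", "Widget"))
    print(summary_report(SALES))
    print(drill_through(SALES, "East"))
```

The cascade functions end in one filtered report, while the drill-through pair keeps two reports and passes the selected region from the parent to the child.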
4. What are the different models used in cluster analysis?
Correct Answer: Many algorithms can be used to analyze the data sets already present in the database. The different types of cluster models are as follows:
- Connectivity models: build clusters from the distance connectivity between objects. Hierarchical clustering is an example.
- Centroid models: represent each cluster by a single mean vector. The k-means algorithm is an example.
- Distribution models: describe clusters with statistical distributions, for example the multivariate normal distribution.
- Density models: define clusters as densely connected regions of the data space.
- Group models: do not provide a refined model for the output and give only the grouping information.
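As an illustration of a centroid model, here is a minimal k-means sketch in plain Python. It assumes two-dimensional points and caller-supplied starting centroids so that the run is deterministic; production code would normally use a library implementation with better initialization. The function and variable names are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_index(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def k_means(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    max_iterations: int = 20,
) -> Tuple[List[Point], List[int]]:
    """Return the final centroids and the cluster label of each point."""
    centroids = [tuple(float(c) for c in centroid) for centroid in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)

        # Update step: move each centroid to the mean vector of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place

        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated

    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    centers, labels = k_means(data, initial_centroids=[(0, 0), (10, 10)])
    print(centers)
    print(labels)
```

Each cluster is summarized by one mean vector, which is what distinguishes a centroid model from the connectivity and density models in the list above.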
5. What is the purpose of cluster analysis in Data Warehousing?
Correct Answer: Cluster analysis groups objects without using class labels. It analyzes all the data present in the data warehouse and compares each cluster with the clusters that already exist. It assigns a set of objects to groups, which are also known as clusters. It is a data mining task that uses techniques such as statistical data analysis, and it draws on knowledge from many fields, including machine learning, pattern recognition, image analysis, and bioinformatics. Cluster analysis is an iterative process of knowledge discovery that involves trial and failure. Pre-processing and other parameters are adjusted until the result has the desired properties.
Correct Answer: Following are the benefits of data integration:
- Makes reporting, monitoring, and placing customer information across the enterprise flexible and convenient
- Data usage is efficient
- Cost effective
- Supports risk-adjusted profitability management, because it allows accurate data extraction
- Allows timely and reliable reporting, because data quality is the prime concern in meeting business challenges
7. What are the factors that are addressed to integrate data?
Correct Answer: Following are the data integration factors:
- The subset of the available data should be optimal
- Estimated levels of noise and distortion caused by sensing and processing conditions at the time of data collection
- Accuracy, spatial resolution, and spectral resolution of the data
- Data formats, storage, and retrieval mechanisms
- Efficiency of computation for integrating data sets to reach the goals
8. How do we measure progress in Data Integration?
Correct Answer: Look for the existence of the following items:
- Generic data models
- An enterprise data platform
- Identified data sources
- Selection of an MDM product
- Implementation of a Customer Master Index or an appropriate alternative
Correct Answer:
- Uniform Data Access Integration (UDAI) leaves the data in the source systems
- A set of views is defined to give clients and customers access to the unified view
- Data is propagated from the source systems with zero latency
- The consolidated data does not require separate storage space
- Data history and version management are limited and apply only to data of a similar type
- User access to the data places additional load on the source systems
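The view-based approach can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration: two tables in one in-memory database stand in for two separate source systems, and the table, column, and function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def build_sources() -> sqlite3.Connection:
    """Create two stand-in source tables and a unified view over them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-ins for two independent source systems.
        CREATE TABLE crm_customers (id INTEGER, name TEXT, email TEXT);
        CREATE TABLE billing_accounts (account_id INTEGER, full_name TEXT, contact_email TEXT);
        INSERT INTO crm_customers VALUES (1, 'Ada', 'ada@example.com');
        INSERT INTO billing_accounts VALUES (7, 'Grace', 'grace@example.com');

        -- The unified view stores no data of its own.
        CREATE VIEW unified_customers AS
            SELECT 'crm' AS source_system, id AS customer_id, name, email
            FROM crm_customers
            UNION ALL
            SELECT 'billing', account_id, full_name, contact_email
            FROM billing_accounts;
        """
    )
    return conn


def unified_rows(conn: sqlite3.Connection) -> List[Tuple[str, int, str]]:
    """Query the unified view; every call reads the source tables directly."""
    cursor = conn.execute(
        "SELECT source_system, customer_id, name "
        "FROM unified_customers ORDER BY source_system, customer_id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_sources()
    print(unified_rows(conn))
    # A change in a source table is visible through the view immediately.
    conn.execute("INSERT INTO crm_customers VALUES (2, 'Linus', 'linus@example.com')")
    print(unified_rows(conn))
```

Because the view is only a stored query, new source rows appear at once and no extra storage is used, but every read of the view runs against the source tables, which is the load concern listed above.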
Correct Answer:
- Physical Data Integration (PDI) creates a new system that replicates data from the source systems
- This is done so that the data can be managed independently of the original systems
- A data warehouse is an example of Physical Data Integration
- The benefits of PDI include data version management and the combination of data from various sources, such as mainframes, flat files, and databases
- A separate system is needed to handle the vast data volumes
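A minimal sketch of the physical approach, again using Python's standard library. It assumes a small CSV string as the flat-file source and an in-memory SQLite database as the database source; `dim_customer`, `load_batch`, and the function names are hypothetical, and a real warehouse load would add incremental extraction and error handling.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract.
FLAT_FILE = "id,name\n1,Ada\n2,Grace\n"


def build_source_db() -> sqlite3.Connection:
    """Stand-in for an operational database source."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    src.executemany("INSERT INTO customers VALUES (?, ?)", [(3, "Linus"), (4, "Barbara")])
    src.commit()
    return src


def load_warehouse(flat_file_text: str, source_db: sqlite3.Connection, batch_id: int) -> sqlite3.Connection:
    """Copy rows from both sources into a separate warehouse database."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE dim_customer ("
        "source_system TEXT, source_id INTEGER, name TEXT, load_batch INTEGER)"
    )
    # Extract from the flat file.
    file_rows = [
        ("flat_file", int(row["id"]), row["name"], batch_id)
        for row in csv.DictReader(io.StringIO(flat_file_text))
    ]
    # Extract from the database source.
    db_rows = [
        ("database", source_id, name, batch_id)
        for source_id, name in source_db.execute("SELECT id, name FROM customers")
    ]
    # Load: the warehouse now holds its own physical copy of the data.
    warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?)", file_rows + db_rows)
    warehouse.commit()
    return warehouse


def warehouse_count(warehouse: sqlite3.Connection) -> int:
    return warehouse.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]


if __name__ == "__main__":
    source = build_source_db()
    wh = load_warehouse(FLAT_FILE, source, batch_id=1)
    print(warehouse_count(wh))
    source.execute("DELETE FROM customers")  # later changes in the source...
    print(warehouse_count(wh))               # ...do not affect the replicated copy
```

The `load_batch` column hints at how a replicated copy can carry its own version information, at the cost of running and storing a separate system.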