Corporate Data Quality


1.4 The Framework for Corporate Data Quality Management

In practice, the requirements indicated above must be considered with respect to the special needs and capabilities of each company, so that data quality management can be successfully established throughout the company. Data quality does not mean quality at any price, but rather quality in accordance with the corporate strategy, the business processes, the structure of the organization and the information systems.

1.4.1 An Overview of the Framework

The framework for the quality management of corporate (master) data offers a solution for these design tasks by applying the Business Engineering approach to the company-wide management of data quality (see Figure 1-11). In general, Business Engineering is a method-oriented, model-based approach for designing companies in the information age (Österle and Winter 2003). The “Framework for Corporate Data Quality Management” proposes six design areas for developing artifacts at the strategic, organizational and information system levels (Otto 2011b; Otto et al. 2011). Each design area has its own types of results (documents).

1.4.2 Strategic Level

The data quality strategy aligns the management of data quality with the company’s goals (see Table 1-3).


Table 1-3: Results for the Data Quality Strategy


Figure 1-11: The Framework for Corporate Data Quality Management (Otto et al. 2011 p. 10)

One example of the connection between data quality management and a company’s goals can be found at DB Netz AG, which is responsible for the railway infrastructure in Germany. The railway infrastructure includes the network of rails, tunnels, bridges, train stations and so on. An agreement on performance and financing governs the allocation of funds from the Federal Republic of Germany to DB Netz AG as a subsidy for the repair and maintenance work performed on the railway infrastructure.

The amount of the annual subsidy depends directly (within certain limits) on the quality of the infrastructure register, which records the number, maintenance status and certain performance parameters (such as permitted speed) of all infrastructure systems. Therefore, a high level of consistency, actuality, completeness and availability of the master data for the infrastructure systems positively affects the funding of the entire company.

1.4.3 Organizational Level

The organizational level includes three design areas: the Management System for data quality management (also called the “data quality controlling system” or “quality assurance”), the DQM Organization and the Processes and Methods for DQM.

The management of data quality can only be targeted properly once the meaning of “good (master) data” has been quantified. To achieve this, the quality of the data must be measured. Key performance indicators (KPIs) for data quality are quantitative measures of data quality (Hüner 2011)[11]. The decisive factor when developing a key performance indicator system for data quality lies in ascertaining what should be measured and what can be measured. Key performance indicator systems for data quality must be geared to the business requirements and linked with the key performance indicators of the business processes to the greatest extent possible. Table 1-4 depicts the results for the Management System design area.
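To illustrate such a key performance indicator, the following minimal sketch computes a simple completeness KPI over a small set of customer records. It is a hypothetical example; the attribute names and the calculation rule are assumptions, not taken from the book.

```python
# Minimal, hypothetical sketch: a completeness KPI for customer master data,
# computed as the share of mandatory attribute values that are actually filled.
# The attribute names below are assumptions for illustration only.

MANDATORY_ATTRIBUTES = ["name", "country", "postal_code", "duns_number"]

def completeness_kpi(records):
    """Return the share of mandatory attribute values that are present (0.0 to 1.0)."""
    total = 0
    filled = 0
    for record in records:
        for attribute in MANDATORY_ATTRIBUTES:
            total += 1
            if record.get(attribute) not in (None, ""):
                filled += 1
    return filled / total if total else 1.0

customers = [
    {"name": "Acme GmbH", "country": "DE", "postal_code": "70173", "duns_number": "150483782"},
    {"name": "Globex AG", "country": "CH", "postal_code": "", "duns_number": None},
]
print(f"Completeness KPI: {completeness_kpi(customers):.0%}")  # Completeness KPI: 75%
```

In practice, such a KPI would be linked with the key performance indicators of the business processes in which the data is used, as described above.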


Table 1-4: Results for the Management System

Because the management of master data is an interdisciplinary issue, the tasks of managing the data must be coordinated across the boundaries between the company’s individual divisions and business areas. The DQM organization is intended to serve this purpose. In many companies it is a virtual organization, in which employees remain in their original, disciplinary reporting lines and are additionally assigned to a technical (subject-matter) reporting line (Table 1-5).

The organization of data quality management manifests itself in the DQM roles and the assignment of responsibilities to these roles. In practice, a variety of roles have evolved so that the tasks associated with company-wide data quality management can be carried out.


Table 1-5: Results for Organization

In addition to the identification and description of the roles, the responsibilities must be defined. Responsibilities indicate which task areas and rights (such as the rights to issue instructions, to plan, to make decisions and to be consulted) are assigned to a role in the management of the master data. For example, the development of a uniform data model for the business objects used across departments is one task area. The company data steward, who is also responsible for the development of the master data management system, bears primary responsibility for this. In addition to the company data steward, those responsible for the data, who have the necessary specialist and technical knowledge to approve the data model or request improvements, are often defined as additional roles. The data owner is responsible and accountable for certain data objects and is generally a management representative from a specialist area (such as the head of the Central Purchasing or Supply Chain Management department).

The fourth design area, DQM Processes and Methods, covers the management of the master data lifecycle as well as the processes by which employees manage the quality of the data (Table 1-6). One of the most important causes of poor data quality is the lack of end-to-end maintenance of the individual classes of master data. Companies are organized according to functions (such as purchasing or marketing), countries or markets, and business processes (such as order-to-cash or make-to-stock). For that reason, only a few companies have a unit that maintains an overview of the acquisition, modification, use and deletion of individual pieces of master data.

For this reason, analyzing the causes and effects of poor master data quality is very complex. As a rule, the causes are actions executed on data in application systems (such as creation, modification, supplementation or deletion). These actions in turn affect business processes, whose quality can be quantified by key performance indicators.


Table 1-6: Results for Processes and Methods

1.4.4 Information System Level

The system level covers two design areas, specifically the DQM architecture and the application systems for the management of data quality. Table 1-7 summarizes the results for the DQM Architecture design area. The architecture for the distribution and retention of data describes which data will be stored in which systems and indicates how data will flow between those systems.

The core business object model is a central result of data quality management, because it is the prerequisite for a uniform understanding of the data and thus for its intended use. Its development and maintenance must involve the specialist departments, because knowledge about the significance of the master data in the business processes is only available there. The core business object model is then transformed into a corporate data model for implementation by the IT department.
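To make the idea of a core business object model more concrete, the following sketch models a fragment of a hypothetical “Supplier” business object. The attributes and their meanings are assumptions for illustration only; in practice they would be agreed with the specialist departments before being transformed into a corporate data model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical fragment of a core business object "Supplier". The attribute
# names and meanings are assumptions for illustration, not from the book.

@dataclass
class Supplier:
    supplier_id: str                    # company-wide, unique identifier
    legal_name: str                     # official company name
    country_code: str                   # ISO 3166-1 alpha-2 code (externally defined reference data)
    duns_number: Optional[str] = None   # optional external identifier

# The IT department could later transform this conceptual model into a
# corporate data model, e.g. a relational table with matching columns.
example = Supplier(supplier_id="S-1001", legal_name="Acme GmbH", country_code="DE")
print(example)
```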


Table 1-7: Results for the Architecture of Corporate Data

Ultimately, the landscape of application systems forms the sixth design area for the management of data quality. The results for this design area are presented in Table 1-8.


Table 1-8: Application System Results

This design area deals with the analysis, design, implementation and improvement of the application systems required to support the management of data quality. This includes dedicated master data management systems like SAP NetWeaver MDM on the one hand and software tools for managing the core business object model on the other. When selecting application systems for the management of master data, issues of data modeling, data quality management, security, user interfaces, data distribution architecture and, in particular, integration must be taken into consideration, both with regard to the systems themselves and to the information to be integrated into them. The Fraunhofer Institute for Industrial Engineering (Fraunhofer IAO) has provided an extensive comparison of the established systems (Kokemüller 2009).

 

1.5 Definition of Terms and Foundations

The management of master data and its quality is not a new issue. However, the penetration of digital services into all areas of business has drastically increased its importance. Both practice and research have been discussing this issue for as long as corporate information systems have been used to support business processes. Knowing which solutions are already available, which are foreseeable and which are not is of special importance for both research and practical applications. For that reason, the central concepts and terms of data quality management must be defined first. Figure 1-12 depicts the most important terms and their relationships with each other.


Figure 1-12: Conceptual Model of the Terms for Data Quality Management (authors’ illustration)

1.5.1 Data and Information

Data describes the properties of business objects, meaning the material and immaterial objects found in the real world (Boisot and Canals 2004). There are in fact many papers about the difference between data and information; however, a commonly accepted understanding of the terms has not yet emerged (Boisot and Canals 2004; Badenoch et al. 1994). One school of thought understands information to be knowledge that is exchanged during human communication; another takes the perspective of information distribution, according to which data is the building block of information (Oppenheim et al. 2003). Accordingly, data must be processed in order to become information (Van den Hoven 1999; Holtham 1995; Wang 1998). According to ISO/IEC 2382-1, data is the formalized representation of the properties of business objects, suitable for further processing, interpretation and communication (ISO/IEC 1993).

The logical organization of data distinguishes a number of levels of aggregation (Chen 1976; Levitin and Redman 1998; Yoon et al. 2000). Data elements form the lowest level of aggregation. Data elements are instantiations of the attributes of data objects (such as a customer’s family name). Records form the second level of aggregation. A record is the instantiation of a data object. For example, a customer master data record contains all characteristics of the customer business object that are needed for the business processes (marketing, service and accounts receivable) to run smoothly for that customer. At the third level of aggregation, tables (such as a customer master table) aggregate multiple records. Databases in turn aggregate several tables. A customer management database could contain all customer master data as well as the associated marketing data. The totality of all company databases ultimately forms the corporate data resource. Figure 1-13 visualizes these relationships.


Figure 1-13: Overview of the Logical Organization of Data (authors’ illustration)

This depiction of the organization of data is based on the relational data model. In the ideal, semantically unambiguous case, there is a 1:1 relationship between the world of data and the real world, meaning that one data object represents precisely one business object. In reality, however, several data objects often represent the same business object at the same time. In that case, data quality management is required: guidelines for processes and systems must be established that allow the “right” data object to be identified for a certain context and provided for use in the business processes.
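The following sketch illustrates the aggregation levels described in this subsection (data element, record, table, database, corporate data resource) with plain, hypothetical Python structures; all names and values are assumed for illustration only.

```python
# Hypothetical illustration of the aggregation levels: data element, record,
# table, database and corporate data resource. All names and values are assumed.

family_name = "Miller"                                   # data element (one attribute value)

customer_record = {                                      # record (instantiation of a data object)
    "customer_id": 4711,
    "family_name": family_name,
    "city": "Stuttgart",
}

customer_master_table = [                                # table (records of the same type)
    customer_record,
    {"customer_id": 4712, "family_name": "Schmidt", "city": "Berlin"},
]

customer_database = {                                    # database (several tables)
    "customer_master": customer_master_table,
    "marketing_contacts": [{"customer_id": 4711, "campaign": "Spring mailing"}],
}

corporate_data_resource = {                              # totality of all company databases
    "customer_management": customer_database,
}

print(len(corporate_data_resource["customer_management"]["customer_master"]))  # 2 records
```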

1.5.2 Master Data

The ISO 8000 standard from the International Organization for Standardization (ISO) (ISO 2009) defines master data as information objects “…which are independent and fundamental for an organization. [Master data] needs to be referenced in order to perform transactions.” Within a company, this data must be identified uniquely and interpreted consistently across multiple organizational units. In this context, corporate data is the so-called “global” master data that applies to the entire company. In contrast, “local” master data applies only, for example, to one business division, one location or one company function. The focus of the methods and tools presented here is on corporate data. However, because the term master data has entered general usage as well as commercial software solutions, this book predominantly uses this more popular term.

In contrast with transaction or inventory data, master data does not change very often. For that reason, this data is also sometimes called “static data”. In practice, it is not possible to define one generally valid list of all classes of master data. Data about contracts, for example, is considered master data in the energy and insurance industries, whereas the telecommunications industry treats it as transaction data due to the comparatively short contract terms and the frequency of changes.

Bosch, for example, has classified the following data items as master data (Hatz 2008):

· Customers

· Customer hierarchies

· Materials

· Suppliers

· Employees

· Charts of accounts

· Organizational units

In practice, differentiating master data from transaction or inventory data is less relevant than determining which individual attributes of a class of master data must be managed by a central master data management unit, because a single central organizational unit cannot assume this task on its own due to the complexity of the individual classes of master data. In principle, the question of which attributes of a class of master data fall within the scope of the central master data management unit is answered by analyzing the strategic requirements of each individual company.

The following characteristics for differentiation may help with this issue (White 2010; White and Radcliffe 2007):

· Organizational scope: differentiation between local data and “global” data (meaning corporate data used throughout the company)

· Type of data: differentiation between “structured” data (which is typically managed in relational databases) and “unstructured” data like product information (such as images, advertising text and application video clips)

· Location of the metadata definition: differentiation between the internal definition of meaning, formats and default values on the one hand and external definition on the other (such as the country and currency codes defined by the ISO or classification standards like eCl@ss and UN/SPSC)

Master data that has been defined externally is called reference data. Examples of such data include the country and currency codes noted above as well as geo-data. Metadata (literally, data about data) describes and defines the properties of other data (DAMA 2008 p. 84). Modification data is one example of a sub-class of metadata; it records when, how and by whom a specific data item has been modified.

1.5.3 Data Quality

Data quality is a multi-dimensional concept that is dependent on the context (Wang and Strong 1996). Thus, a single characteristic that completely describes data quality does not exist. Rather, there is a variety of data quality dimensions that together describe the quality of data. Typical data quality dimensions include[12] (a simple measurement sketch follows the list):

· Correctness: does the data factually agree with the properties of the real world object that it should represent?

· Consistency: do several versions of data about the same real-world object, stored for example in different information systems, agree with each other?

· Completeness: are all of the values or attributes of a record completely present?

· Actuality[13]: does the data agree with the current state of the real-world object at each point in time, and is the data updated as that state changes?

· Availability: can data users access the data easily at the desired point in time?
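As a minimal sketch of how such dimensions can be operationalized, the following hypothetical example implements simple rule-based checks for completeness and consistency of a customer record held in two systems; the records, attribute names and rules are assumptions for illustration only.

```python
# Hypothetical rule-based checks for two data quality dimensions: completeness
# and consistency. Records, attribute names and rules are assumed for illustration.

crm_record = {"customer_id": 4711, "name": "Acme GmbH", "country": "DE", "city": "Stuttgart"}
erp_record = {"customer_id": 4711, "name": "Acme GmbH", "country": "DE", "city": "Berlin"}

def is_complete(record, mandatory=("name", "country", "city")):
    """Completeness: are all mandatory attribute values present?"""
    return all(record.get(attr) not in (None, "") for attr in mandatory)

def is_consistent(record_a, record_b, compared=("name", "country", "city")):
    """Consistency: do two versions of the same business object agree?"""
    return all(record_a.get(attr) == record_b.get(attr) for attr in compared)

print(is_complete(crm_record))                # True
print(is_consistent(crm_record, erp_record))  # False: the city differs between the systems
```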

Contextual dependency means that the data quality may be sufficient for one business transaction but insufficient for another. For example, a customer’s correct delivery address, including the correct shipping ramp, is essential for an automobile supplier’s logistics department in order to deliver the order. For the marketing department of the same company, the correctness and consistency of the delivery address is insignificant, because the company name and country information suffice for evaluating the revenue earned with that customer in the previous year. For this reason, data quality is defined as a measure of the data’s fitness for the requirements of the business processes in which it is used (Otto et al. 2011).

Data quality changes over time, because data ultimately represents a snapshot of reality, which itself changes over time. For example, customers move to new addresses and suppliers change their legal form of incorporation. Figure 1-14 shows the typical progress of data quality over time, as found in many companies.


Figure 1-14: Typical Progress of Data Quality over Time in a Company (Otto 2014 p. 21)

Many companies only begin to address data quality problems once the quality of the data has fallen below the level required for business processes to run smoothly. Examples of data quality problems that suddenly become visible once data quality has fallen below such a level include migration problems, a high level of manual rework in the business processes, and management reports showing different values for the same key performance indicators.

1.5.4 Data Quality Management (DQM)

The mandate of Data Quality Management (DQM) is to analyze, improve and ensure the quality of the data. According to the Data Management Association (DAMA), DQM comprises all activities, procedures and systems that measure, improve and ensure the fitness of data for use. This perspective once again adopts quality management methods from the area of production management (DAMA 2008).

 

DQM generally differentiates between preventive and reactive measures. Preventive DQM measures aim to avoid data defects, and thus negative effects on data quality, before they occur. In contrast, reactive DQM measures aim to discover and correct data defects that already exist.

The reactive approach has several disadvantages.

· Resources for improving the quality of the data (such as software, expert knowledge and consulting activities) are not planned and budgeted, and may therefore not be available when they are needed.

· Purely reactive management of data quality is frequently associated with a lack of data quality measurement. In such cases, companies frequently have no target values, meaning that it cannot be determined whether the (reactive) measures for improving data quality are sufficient or extend well beyond their intended goals.

· The total quality management approach (an approach to quality management from operations management) has demonstrated that the total costs of all reactive quality management measures exceed the costs of preventive quality management (Reid and Sanders 2005). This applies to material as well as immaterial resources (such as data).

The use of automated validation rules (or “business rules”, see the following section) during manual data entry is one example of a preventive DQM measure. The Data Universal Numbering System (DUNS) number is a numeric code used to identify companies and is an important piece of master data for many companies. For this case, services are available that check the validity of the number entered into a system-supported form and thereby prevent errors. An example of a reactive measure for the same case is the subsequent cleansing of duplicate records within one database or across several databases.
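The following sketch contrasts the two types of measures for the DUNS example: a preventive format check applied before a record is saved, and a reactive search for duplicates among records that are already stored. It only checks the nine-digit format and does not query any external validation service; all names and values are illustrative assumptions.

```python
import re
from itertools import combinations

# Hypothetical sketch: a preventive format check for DUNS numbers (nine digits)
# and a reactive duplicate search over stored records. Real validation services
# query the number issuer's registry; this only illustrates the principle.

DUNS_PATTERN = re.compile(r"^\d{9}$")

def is_valid_duns(value: str) -> bool:
    """Preventive measure: reject entries that do not have the nine-digit format."""
    return bool(DUNS_PATTERN.match(value.replace("-", "").strip()))

def find_duplicates(records):
    """Reactive measure: report pairs of records that share the same DUNS number."""
    return [(a["id"], b["id"])
            for a, b in combinations(records, 2)
            if a["duns"] == b["duns"]]

print(is_valid_duns("15-048-3782"))   # True (format check only)
print(is_valid_duns("1504837"))       # False: too short
print(find_duplicates([
    {"id": 1, "duns": "150483782"},
    {"id": 2, "duns": "150483782"},
    {"id": 3, "duns": "804735132"},
]))  # [(1, 2)]
```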

The principal goal of companies is to detect data defects before they enter the system, as far as possible, in order to avoid the risks and costs that result from defective data quality. However, defective data is not the only source of costs; the DQM process itself causes costs. DQM costs arise from preventive as well as reactive measures. Analogous to quality management in general (Campanella 1999), a disproportionate relationship between data quality and DQM costs can be assumed: the marginal costs of DQM increase as the level of data quality rises.


Figure 1-15: Costs of Data Quality (according to Eppler and Helfert 2004 p. 318)

In contrast, the follow-up costs of data defects fall as the quality of the data increases. The ideal total cost of data quality lies at the minimum of the sum of DQM costs and the follow-up costs of data defects (Eppler and Helfert 2004) (Figure 1-15). The task of DQM therefore lies in finding the ideal combination of costs from preventive and reactive measures. In practice, difficulties often arise because accounting typically does not account for many DQM costs. This applies in particular to the follow-up costs, which originate in various corporate functions and can hardly be quantified in many areas.
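The cost trade-off can be illustrated numerically. The following sketch uses assumed, hypothetical cost functions (not the curves from Eppler and Helfert) and searches for the quality level at which the sum of DQM costs and defect follow-up costs is minimal.

```python
# Hypothetical cost functions (not the curves from Eppler and Helfert):
# DQM costs rise with the targeted data quality level q, the follow-up costs
# of data defects fall with q, and the optimum lies where their sum is minimal.

def dqm_cost(q):        # preventive and reactive DQM effort, increasing in q
    return 100 * q ** 2

def defect_cost(q):     # follow-up costs of data defects, decreasing in q
    return 80 * (1 - q)

quality_levels = [i / 100 for i in range(101)]
optimum = min(quality_levels, key=lambda q: dqm_cost(q) + defect_cost(q))
print(f"Cost-optimal data quality level: {optimum:.2f}")  # 0.40 for these assumed curves
```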
