Executive Summaries for PROFIT Working Paper Series

E40-313, MIT Sloan School of Management
30 Memorial Drive
Cambridge, MA 02139 USA

Executive Summaries of recent Working Papers are available online at: www.ssrn.com.
Relevant papers can be viewed by author:

Dr. Amar Gupta
Dr. Stuart Madnick
Dr. Michael Siegel
Dr. Rafael Palacios

Earlier Working Papers are as follows:

PROFIT-92-01 Toward Quality Data: An Attribute-Based Approach
(length: 30 pages) (double-spaced)
November 92 by Richard Y. Wang, M.P. Reddy, & Henry B. Kon

Abstract: The need for a quality perspective in the management of the data resource is becoming increasingly critical. Managing data quality, however, is a complex task. Although it would be ideal to achieve zero defect data, this may not always be attainable. Moreover, different users may have different criteria in determining the quality of data. This suggests that it would be useful to be able to tag data with quality indicators which are characteristics of the data and its manufacturing process. From these quality indicators, users can make their own judgment of the quality of the data for the specific application at hand.

This paper investigates how quality indicators may be specified, stored, retrieved, and processed. Specifically, we propose an attribute-based data model that facilitates cell-level tagging of data. Included in this attribute-based model are a mathematical model description that extends the relational model, a set of quality integrity rules, and a quality indicator algebra that can be used to process SQL queries augmented with quality indicators. A data quality requirements analysis methodology that extends the Entity Relationship model is also presented.
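As a rough illustration only (not the paper's formal model or its quality indicator algebra), cell-level tagging can be sketched in Python by pairing each value with a dictionary of quality indicators. The tag names `source` and `created`, the `TaggedCell` and `select` helpers, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TaggedCell:
    """A cell value plus its quality indicators (hypothetical tag names)."""
    value: object
    tags: dict = field(default_factory=dict)


def select(rows, column, predicate):
    """Return the rows whose cell in `column` carries tags satisfying `predicate`."""
    return [row for row in rows if predicate(row[column].tags)]


# Hypothetical relation: each revenue cell records its own source and creation date.
companies = [
    {"name": TaggedCell("Acme"),
     "revenue": TaggedCell(120, {"source": "annual report", "created": date(1992, 3, 1)})},
    {"name": TaggedCell("Globex"),
     "revenue": TaggedCell(95, {"source": "estimate", "created": date(1990, 6, 1)})},
]

# Roughly: SELECT name FROM companies WHERE revenue.source = 'annual report'
reliable = select(companies, "revenue", lambda t: t.get("source") == "annual report")
print([row["name"].value for row in reliable])  # ['Acme']
```

Because the indicators travel with each cell rather than with the whole table, different users can apply different quality filters to the same data.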

PROFIT-92-02 A Knowledge-based Approach to Assisting in Data Quality Judgement
(length: 10 pages) (single-spaced)
December 92 by Y. Jang, Henry B. Kon & Richard Y. Wang

Abstract: As the integration of information systems enables greater accessibility to data from multiple sources, the issue of data quality becomes increasingly important. This paper attempts to formally address the data quality judgment problem with a knowledge-based approach. Our analysis has identified several related theoretical and practical issues. For example, data quality is determined by several factors, referred to as quality parameters. Quality parameters are often not independent of each other, raising the issue of how to represent relationships among quality parameters and reason with such relationships to draw insightful knowledge about the overall quality of data.

In particular, this paper presents a data quality reasoner, a data quality judgment model based on the notion of a "census of needs." It provides a framework for deriving an overall data quality value from local relationships among quality parameters. The data quality reasoner will assist data consumers in judging data quality, which is particularly important when much of the data involved in decision-making comes from different, unfamiliar sources.

PROFIT-93-03 Algorithms for Thinning and Rethickening Binary Digital Patterns in Digital Signal Processing
(length: 6 pages)
January 93 by M.V. Nagendraprasad, Patrick S.P. Wang, & Amar Gupta

Abstract: Pattern recognition and image processing applications frequently deal with raw inputs that contain lines of different thickness. In some cases, this variation in thickness is an asset, enabling quicker recognition of the features in the input image. For example, in processing aerial photographs, detection of major landmarks can be aided by the variations in the thickness of the contours. In other cases, the variation can be a liability, degrading the accuracy and speed of recognition. For example, in the case of handwritten characters, the degree of uniformity of the thickness of individual strokes directly impacts the probability of successful recognition, especially if neural network based recognition techniques are employed. This paper describes the thinning stage, a theoretical proof related to a new and faster thinning algorithm, and the rethickening stage of the handwriting recognition process.
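The paper's own faster thinning algorithm is not reproduced here. As a point of reference only, the widely used Zhang-Suen parallel thinning algorithm (a different, swapped-in technique) can be sketched in Python as follows; representing the image as a list of lists of 0/1 values is an assumption.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of lists of 0/1) with the Zhang-Suen algorithm.

    Pixels on the outer border are never examined, so the image should be
    padded with a frame of zeros.
    """
    img = [row[:] for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting with the pixel directly above (r - 1, c).
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for sub_iteration in (0, 1):
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    b = sum(p)  # number of foreground neighbours
                    # a: number of 0 -> 1 transitions in the cyclic sequence P2, P3, ..., P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if sub_iteration == 0:
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0  # mainly south/east boundary
                    else:
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0  # mainly north/west boundary
                    if 2 <= b <= 6 and a == 1 and side_ok:
                        to_delete.append((r, c))
            # Deletions are applied only after the whole pass (parallel update).
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img
```

A stroke that is already one pixel wide is left unchanged, while a thick bar is eroded from its boundaries toward a one-pixel skeleton; a rethickening stage would then grow that skeleton back to a uniform stroke width.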


PROFIT-93-04 An Integrated Architecture for Recognition of Totally Unconstrained Handwritten Numerals, in International Journal of Pattern Recognition and Artificial Intelligence, Vol. 7, No. 4
(length: 16 pages) (single-spaced)
January 93 by Amar Gupta, M.V. Nagendraprasad, A. Liu, P.S.P. Wang, & S. Ayyadurai

Abstract: A multi-staged system for off-line handwritten numeral recognition is presented here. After scanning, the digitized binary bit map image of the source document is passed through a preprocessing stage that performs segmentation, thinning and rethickening, normalization, and slant correction. The recognizer is a three-layered neural net trained with the back-propagation algorithm. While a few systems that use three-layered nets for recognition have been presented in the literature, the contribution of our system lies in two aspects: elaborate preprocessing based on structural pattern recognition methods combined with a neural net based recognizer, and integration of neural net based and structural pattern recognition methods to produce high accuracies.


PROFIT-93-05 A Process View of Data Quality
(length: 17 pages) (double-spaced)
March 93 by Henry B. Kon, Jacob Lee & Richard Wang

Abstract: We posit that the term data quality, though used in a variety of research and practitioner contexts, has been inadequately conceptualized and defined. To improve data quality, we must bound and define the concept of data quality. In the past, researchers have tended to take a product-oriented view of data quality. Though necessary, this view is insufficient for three reasons. First, data quality defects in general are difficult to detect by simple inspection of the data product. Second, definitions of data quality dimensions and defects, while useful intuitively, tend to be ambiguous and interdependent. Third, in line with a cornerstone of TDQM philosophy, emphasis should be placed on process management to improve product quality.

The objective of this paper is to characterize the concept of data quality from a process perspective. A formal process model of an information system (IS) is developed, which offers precise process constructs for characterizing data quality. With these constructs, we rigorously define the key dimensions of data quality. The analysis also provides a framework for examining the causes of data quality problems. Finally, facilitated by the exactness of the model, an analysis is presented of the interdependencies among the various data quality dimensions.

PROFIT-93-06 A Research Retrospective of Innovation Inception and Success: The Technology-Push Demand-Pull Question
(length: 27 pages) (double-spaced)
March 93 by Shyam R. Chidamber & Henry B. Kon

Abstract: Innovation researchers have frequently debated whether organizational innovation is driven by market demand or by technological shifts. The market demand school of thought suggests that organizations innovate based on market needs, whereas the technology proponents claim that change in technology is the primary driver of innovation. Collectively, empirical research studies on technological innovation are inconclusive regarding this technology-push demand-pull (TPDP) debate. Eight key studies relevant to this issue are examined for their methods, implications, and caveats to establish a structured way of interpreting the various results. The philosophical underpinnings of market demand and technology factors as drivers of innovation are also examined.

This paper suggests that much of the contention between the technology-push and demand-pull findings is due to different research objectives, definitions, and models. The main conclusion is that there exists a clear relationship between the research models used in these studies and the outcomes observed, suggesting that differences in problem statement and research constructs may be causing the apparent incongruity in research findings. Organizational and national policy level issues are also examined in light of the finding that different levels of analysis lead to different results.

PROFIT-93-07 Towards an Active Schema Integration Architecture for Heterogeneous Database Systems, in Third International Workshop on Research Issues on Data Engineering: Interoperability in Multidatabase Systems, Vienna, Austria
(length: 9 pages) (single-spaced)
April 93 by M.P. Reddy, Michael Siegel and Amar Gupta

Abstract: In this paper we describe our research in the development of a four-layered architecture for Heterogeneous Distributed Database Management Systems (HDDBMS). The architecture includes the local schema, local object schema, global schema, and global view schema. This architecture was developed to support the propagation of local database semantics (e.g., integrity constraints, context) to the global schema and global view. Constraints propagated to the global level can be used to derive new constraints that could not have been recognized by any of the local components. These constraints significantly reduce query processing costs in the HDDBMS environment by permitting incorporation of techniques similar to semantic query optimization in the single database environment [CFM84,HZ80,Kin81,SSS91]. These techniques are applied to the global query to identify candidate databases and reduce the number of local databases that must be accessed.
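A minimal sketch of the pruning idea, assuming each local database advertises simple range constraints on an attribute: the global query's range predicate is compared with each local constraint, and databases whose constraint cannot overlap it are dropped before any local query is issued. The database names, the `salary` attribute, its bounds, and the `candidate_databases` helper are hypothetical, and the paper's constraint derivation is considerably richer.

```python
import math


def candidate_databases(local_ranges, attribute, q_low, q_high):
    """Keep databases whose known range for `attribute` can overlap [q_low, q_high]."""
    candidates = []
    for db, ranges in local_ranges.items():
        # A database with no known constraint on the attribute can never be pruned.
        low, high = ranges.get(attribute, (-math.inf, math.inf))
        if high >= q_low and low <= q_high:  # the two intervals overlap
            candidates.append(db)
    return candidates


# Hypothetical local integrity constraints, e.g. "salary <= 50000" at db_east.
local_ranges = {
    "db_east": {"salary": (0, 50_000)},
    "db_west": {"salary": (40_000, 200_000)},
    "db_hq": {},
}

# Global query: ... WHERE salary > 100000
print(candidate_databases(local_ranges, "salary", 100_000, math.inf))  # ['db_west', 'db_hq']
```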


PROFIT-93-08 Data Quality Requirements Analysis and Modeling
(length: 7 pages) (single-spaced)
April 93 by Richard Y. Wang, Henry B. Kon, & Stuart E. Madnick

Abstract: Data engineering is the modeling and structuring of data in its design, development, and use. An ultimate goal of data engineering is to put quality of data in the hands of users. Specifying and ensuring the quality of data, however, is an area in data engineering that has received little attention. In this paper we: (1) establish a set of premises, terms, and definitions for data quality management, and (2) develop a step-by-step methodology for defining and documenting data quality parameters important to users. These quality parameters are used to determine quality indicators, to be tagged to data items, about the data manufacturing process such as data source, creation time, and collection method. Given such tags, and the ability to query over them, users can filter out data having undesirable characteristics.

The methodology developed provides a concrete approach to data quality requirements collection and documentation. It demonstrates that data quality can be an integral part of the database design process. The paper also provides a perspective for the migration towards quality management of data in a database environment.

PROFIT-93-09 Detection of Courtesy Amount Block on Bank Checks
(length: 33 pages) (single-spaced)
May 93 by Arun Agarwal, Len M. Granowetter, Amar Gupta, & P.S.P. Wang

Abstract: This paper presents a multi-staged technique for locating the courtesy amount block on bank checks. In the case of a check processing system, many of the proposed methods are not acceptable because of the presence of many fonts and text sizes, as well as the short length of many text strings. This paper describes the particular method chosen to implement a Courtesy Amount Block Locator (CABL). First, the connected components in the image are identified. Next, strings are constructed on the basis of proximity and horizontal alignment of characters. Finally, a set of rules and heuristics is applied to these strings to choose the correct one. The chosen string is reported only if it passes a verification test, which includes an attempt to recognize the currency sign.
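The string-construction step might be sketched as follows, assuming the connected components have already been reduced to bounding boxes `(x0, y0, x1, y1)`. The thresholds `max_gap` and `min_overlap` and both helper functions are hypothetical, and the paper's rules and heuristics for picking the courtesy amount string are not reproduced.

```python
def vertical_overlap(a, b):
    """Fraction of the shorter box's height shared by boxes a and b, each (x0, y0, x1, y1)."""
    shared = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(0, shared) / shorter if shorter > 0 else 0.0


def build_strings(boxes, max_gap=15, min_overlap=0.5):
    """Greedily chain boxes, left to right, into horizontally aligned strings."""
    strings = []
    for box in sorted(boxes):  # tuples sort by x0 first
        for s in strings:
            last = s[-1]
            gap = box[0] - last[2]  # horizontal distance from the string's right edge
            if gap <= max_gap and vertical_overlap(last, box) >= min_overlap:
                s.append(box)
                break
        else:  # no existing string accepted the box: start a new one
            strings.append([box])
    return strings
```

For example, three digit boxes 5 pixels apart on the same baseline form one string, while a box 160 pixels to the right or one on a lower text line starts a string of its own.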


PROFIT-93-10 An Adaptive Modular Neural Network With Application To Unconstrained Character Recognition
(length: 27 pages)
August 93 by Lik Mui, Arun Agarwal, Amar Gupta, & Patrick Wang

Abstract: The topology and capacity of a traditional multilayer neural system, as measured by the number of connections in the network, have surprisingly little impact on its generalization ability. This paper presents a new adaptive modular network that offers superior generalization capability. The new network provides significant fault tolerance, quick adaptation to novel inputs, and high recognition accuracy. We utilize this paradigm for recognition of unconstrained handwritten characters.


PROFIT-93-11 Run-time Type Information and Incremental Loading in C++, in Journal of Object Oriented Programming
(length: 15 pages) (double-spaced)
September 93 by Murali K. Vemulapati, Sriram Duvvuru, & Amar Gupta

Abstract: We present the design and implementation strategy for an integrated programming environment that facilitates specification, implementation, and execution of persistent C++ programs. Our system is implemented in E, a persistent programming language based on C++. The environment provides type identity and type persistence, i.e., each user-defined class has a unique identity and persists across compilations. The system provides run-time type information for user-defined types, and it provides efficient run-time access to the members of an object by generating maps of the objects. It also supports incremental linking and loading of new classes or modification of classes existing in the database.

PROFIT-94-12 Context Interchange: Overcoming the Challenges of Large-Scale Interoperable Database Systems in a Dynamic Environment
(length: 25 pages) (double-spaced)
February 94 by Cheng Hian Goh, Stuart E. Madnick, & Michael D. Siegel

Abstract: Research in database interoperability has primarily focused on circumventing schematic and semantic incompatibility arising from autonomy of the underlying databases. We argue that, while existing integration strategies might provide satisfactory support for small or static systems, their inadequacies rapidly become evident in large-scale interoperable database systems operating in a dynamic environment. The frequent entry and exit of heterogeneous interoperating agents renders "frozen" interfaces (e.g., shared schemas) impractical and places an ever-increasing burden on the system to accord more flexibility to heterogeneous users. User heterogeneity mandates that disparate users' conceptual models and preferences must be accommodated, and the emergence of large-scale networks suggests that the integration strategy must be scalable and capable of dealing with evolving semantics.

As an alternative to the integration approaches presented in the literature, we propose a strategy based on the notion of context interchange. In the context interchange framework, assumptions underlying the interpretations attributed to data are explicitly represented in the form of data contexts with respect to a shared ontology. Data exchange in this framework is accompanied by context mediation, whereby data originating from multiple source contexts is automatically transformed to comply with the receiver context. The focus on data contexts giving rise to data heterogeneity (as opposed to focusing on data conflicts exclusively) has a number of advantages over classical integration approaches, providing interoperating agents with greater flexibility as well as a framework for graceful evolution and efficient implementation of large-scale interoperable database systems.
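To make context mediation concrete, here is a minimal Python sketch assuming a data context consists only of a currency and a scale factor. The `Context` fields, the rate table `USD_PER_UNIT`, and the `mediate` function are hypothetical illustrations; they do not reflect the shared ontology or the mediator described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A (hypothetical) data context: the currency and scale a value is reported in."""
    currency: str
    scale: int  # e.g. 1000 means the value is reported in thousands


# Hypothetical conversion rates into USD, for illustration only.
USD_PER_UNIT = {"USD": 1.0, "JPY": 0.01}


def mediate(value, source, receiver):
    """Transform a value from the source context so it complies with the receiver context."""
    in_base_units = value * source.scale                     # undo the source scale factor
    in_usd = in_base_units * USD_PER_UNIT[source.currency]   # convert to a common currency
    in_receiver_currency = in_usd / USD_PER_UNIT[receiver.currency]
    return in_receiver_currency / receiver.scale             # apply the receiver scale factor


# A source reporting "5" in thousands of yen, read by a receiver expecting plain dollars.
print(mediate(5, Context("JPY", 1000), Context("USD", 1)))
```

The point of the design is that neither party rewrites its data or queries: each only declares its context, and the conversion is derived from the two declarations at exchange time.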

PROFIT-94-13 Incremental Loading in the Persistent C++ Language E
(length: 26 pages) (double-spaced)
February 94 by Murali Vemulapati, D. Sriram, & Amar Gupta

Abstract: E is an extension of the C++ language providing database types and persistence. Persistence in E entails some form of dynamic linking of method code, because a program might encounter, on the persistent store, an object whose type was not known to the program when it was compiled. This necessitates dynamic linking of the method code of the corresponding type to the program so that the type definition is made available to the program. The current run-time support library provided by E is inadequate for this purpose. We modify and extend the run-time library of E by adding functionality to dynamically link and unlink object modules. We then present the design of a class type to facilitate persistent types. Each user-defined type will have a unique persistent type object associated with it. Class type provides methods for dynamic linking and unlinking of user-defined classes using the extended run-time support. In addition, class type ensures identity of user-defined types, i.e., each user-defined type will have a unique identity across compilations of a program.


PROFIT-94-14 A Knowledge Based Segmentation Algorithm for Enhanced Recognition of Handwritten Courtesy Amounts
(length: 17 pages) (single-spaced)
March 94 by Karim Hussein, Arun Agarwal, Amar Gupta, & Patrick Shen-Pei Wang

Abstract: A knowledge based segmentation critic algorithm to enhance recognition of courtesy amounts on bank checks is proposed in this paper. This algorithm extracts the context from the handwritten material and uses a syntax parser based on a deterministic finite automaton to provide adequate feedback to enhance recognition. The segmentation critic presented is capable of handling a number of commonly used styles for courtesy amount representation. Both handwritten and machine-written numeric strings were used to test the efficacy of the preprocessor for the check recognition system described in this paper. The substitution error fell by 1.0% in our early tests.
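The role of a deterministic finite automaton as a syntax checker can be illustrated with a small hypothetical grammar for machine-style amounts: an optional `$`, digits with optional comma grouping, and optional two-digit cents. The states, the grammar, and the `accepts` function are assumptions for illustration; they are not the paper's parser, which handles several handwritten styles and feeds its verdict back into segmentation.

```python
DIGIT = "digit"

# Transition table of a hypothetical DFA: (state, input class) -> next state.
TRANSITIONS = {
    ("start", "$"): "sign", ("start", DIGIT): "d1", ("sign", DIGIT): "d1",
    # Leading digits: after 3 digits without a comma, no comma is allowed any more.
    ("d1", DIGIT): "d2", ("d2", DIGIT): "d3", ("d3", DIGIT): "dn", ("dn", DIGIT): "dn",
    ("d1", ","): "c0", ("d2", ","): "c0", ("d3", ","): "c0",
    # Each comma must be followed by exactly three digits.
    ("c0", DIGIT): "c1", ("c1", DIGIT): "c2", ("c2", DIGIT): "g3", ("g3", ","): "c0",
    # An optional decimal point must be followed by exactly two digits.
    ("d1", "."): "dot", ("d2", "."): "dot", ("d3", "."): "dot",
    ("dn", "."): "dot", ("g3", "."): "dot",
    ("dot", DIGIT): "f1", ("f1", DIGIT): "f2",
}
ACCEPTING = {"d1", "d2", "d3", "dn", "g3", "f2"}


def accepts(text):
    """Return True if `text` is a syntactically valid amount under the toy grammar."""
    state = "start"
    for ch in text:
        key = (state, DIGIT if ch in "0123456789" else ch)
        if key not in TRANSITIONS:
            return False  # no transition defined: reject immediately
        state = TRANSITIONS[key]
    return state in ACCEPTING
```

In a recognition pipeline, a rejected string such as `12,34` would signal that segmentation or character recognition probably went wrong somewhere, prompting an alternative interpretation.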


PROFIT-94-15 Error Browsing and Mediation: Interoperability Regarding Data Error
(length: 10 pages) (single-spaced)
July 94 by Henry B. Kon & Michael D. Siegel

Abstract: Our research goals involve development of methodologies and systems to support the administration and sharing of errored data (e.g., data that is incomplete, inaccurate, or syntactically invalid). Data sources are assumed to have a non-trivial degree of error. Data receivers are assumed to have differing sensitivity to various forms of error. Browsing involves measurement of error. Mediation involves run-time management of the source-receiver "error fit".

In this extended abstract we provide a foundation for error definition and measurement, and discuss their role in browsing and mediation. Included are: (1) a classification scheme for error definition as syntactic error and semantic error types, (2) a theoretical basis for relating semantic error to data meaning, (3) an outline of three general approaches to error measurement, and (4) an overview of browsing and mediation.

PROFIT-94-16 An Ontological and Semantical Approach to Source-Receiver Interoperability
(length: 10 pages) (double-spaced)
October 94 by Jacob Lee & Michael D. Siegel

Abstract: In this paper, we propose a theoretical approach to address the issue of semantic interoperability between a data source and a data receiver in the framework of the context interchange architecture. In particular, this approach highlights the concept of a statement as a unit of exchange. Several statement conversion axioms are proposed for source-receiver interoperability. The conceptual foundation for this approach is derived from Mario Bunge's Ontology and Semantics. The implications of this approach for further research on the design of interoperable database systems based on the context interchange architecture are then discussed.


PROFIT-94-17 International Multi-Company Collaborative Engineering: A Study of Japanese Engineering and Construction Firms
(length: 39 pages) (double-spaced)
August 94 by Masatoshi Kano, Ram Duvvuru Sriram, & Amar Gupta

Abstract: Concurrent/collaborative engineering (CE) often requires the collaborative participation of several companies. Management of such multi-company collaboration in engineering becomes more difficult when companies from different countries are involved, as is the trend in the Japanese engineering and construction (E&C) industry. This paper focuses on the conditions required for successful management of such CE. The authors first propose a conceptual time- and cost-based model of multi-company CE work as a framework. Then, using this framework, current practices of Japanese E&C firms are analyzed. Major findings are that (1) international multi-company CE should be designed with careful consideration given not only to task split and allocation but also to inter-firm task dependencies; and (2) Japanese E&C firms need to alter their inter-firm coordination scheme significantly in order to derive full benefits in the global marketplace.


PROFIT-94-18 Image-Information Systems for Traffic Management
(length: 19 pages) (double-spaced)
August 94 by Ichiro Masaki & Amar Gupta

Abstract: This paper describes some examples of image-information systems that are relevant to traffic management. After reviewing related work in the fields of traffic management, intelligent vehicles, stereo vision, and ASIC-based approaches, the paper focuses on a stereo vision system for intelligent cruise control. The system measures the distance to the vehicle in front using trinocular triangulation. An application-specific processor architecture was developed to offer low mass-production cost, real-time operation, low power consumption, and small physical size. The system was installed in the trunk of a car and evaluated successfully on highways.
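For rectified, calibrated cameras, triangulation reduces to the pinhole relation Z = f * B / d, where f is the focal length in pixels, B the baseline, and d the disparity. The sketch below assumes rectified cameras sharing one reference view and uses hypothetical focal length, baselines, and disparities. Averaging the pairwise estimates is an illustrative simplification and not necessarily how the paper's system combines its three views; a third camera is often used mainly to resolve ambiguous matches.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo: Z = f * B / d (f and d in pixels, B in metres, Z in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def trinocular_depth(focal_px, pairs):
    """Average the depth estimates from several (baseline, disparity) camera pairs."""
    estimates = [depth_from_disparity(focal_px, b, d) for b, d in pairs]
    return sum(estimates) / len(estimates)


# Hypothetical rig: f = 800 px, baselines of 0.5 m and 1.0 m to the reference camera.
print(trinocular_depth(800, [(0.5, 20), (1.0, 40)]))  # 20.0
```

Doubling the baseline doubles the disparity for the same distance, which is why a wider camera pair gives finer depth resolution for distant vehicles.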

PROFIT-94-19 Context Interchange: A Lattice Based Approach, in Knowledge-Based Systems, Butterworth-Heinemann, Oxford, England
(length: 25 pages) (double-spaced)
August 94 by M.P. Reddy and A. Gupta

Abstract: The level of semantic data interoperability between a source and a receiver is a function of the context interchange mechanism that operates between the source and the receiver. The semantic interoperability mechanisms in existing systems are usually static in nature and cannot cope with changes in the semantics of data either at the source or at the receiver. In this paper, we propose a context interchange mechanism, based on lattice theory which can handle changes in the semantics of data at both the source and the receiver. A site-copy selection algorithm is also presented in this paper which selects the set of sources that can supply semantically meaningful data to the query or source.


PROFIT-94-20 A Methodology for Integration of Heterogeneous Databases, in IEEE Transactions on Knowledge and Data Engineering, Vol. 6, No. 6
(length: 13 pages) (single-spaced)
December 94 by M.P. Reddy, B.E. Prasad, P.G. Reddy, and A. Gupta

Abstract: The transformation of existing local databases to meet diverse application needs at the global level is performed through a four-layered procedure that stresses total schema integration and virtual integration of local databases. The proposed methodology covers both schema integration and database integration, and uses a four-layered schema architecture (local schemata, local object schemata, global schema, and global view schemata) with each layer presenting an integrated view of the concepts that characterize the layer below. Mechanisms for accomplishing this objective are presented in theoretical terms, along with a running example. Object equivalence classes, property equivalence classes, and other related concepts are discussed in the context of logical integration of heterogeneous schemata, while object instance equivalence classes, property instance equivalence classes, and other related concepts are discussed for data integration purposes. The proposed methodology resolves naming conflicts, scaling conflicts, type conflicts, level of abstraction, and other types of conflicts during schema integration, and data inconsistencies during data integration.


PROFIT-93-21 Information Technology, Incentives, and the Optimal Number of Suppliers
(length: 16 pages) (single-spaced)
Fall 93 by J. Yannis Bakos & Erik Brynjolfsson

Abstract: Buyers are transforming their relationships with suppliers. For example, instead of playing off dozens or even hundreds of competing suppliers against one another, many firms are finding it more profitable to work closely with only a small number of "partners." In this paper we explore some causes and consequences of this transformation. We apply the economic theory of incomplete contracts to determine the optimal strategy for a buyer. We find that the buyer firm will often maximize profits by limiting its options and reducing its own bargaining power. This may seem paradoxical in an age of cheap communications costs and aggressive competition. However, unlike earlier models that focused on coordination costs, we focus on the critical importance of providing incentives for suppliers.

Our results spring from the need to make it worthwhile for suppliers to invest in "noncontractibles" such as innovation, responsiveness, and information sharing. Such incentives will often be stronger when the number of competing suppliers is small. The findings of the theoretical models appear to be consistent with observations from empirical research which highlights the key role of information technology in enabling this transformation.

PROFIT-94-22 Network Externalities in Microcomputer Software: An Econometric Analysis of the Spreadsheet Market
(length: 24 pages) (double-spaced)
November 1994 by Erik Brynjolfsson & Chris F. Kemerer

Abstract: As an economic good, software has a number of interesting properties. In addition to the value of intrinsic features, the creation of or conformance to industry standards may be critical to the success of a product. This research builds and evaluates econometric models to determine which product features are important in the purchase and pricing decisions for microcomputer software. A special emphasis is to identify the effects of standards and network externalities. The results of this research and the general model proposed can be used to estimate the relative values of software package features, adherence to standards, and increased market share. It also quantifies the opportunities created by changes in technology architecture. Finally, the results offer guidance on current public policy issues such as the value of intellectual property embodied in software.


PROFIT-93-23 The Productivity Paradox of Information Technology
(length: 11 pages) (single-spaced)
December 93 by Erik Brynjolfsson

Abstract: The relationship between information technology (IT) and productivity is widely discussed but little understood. Delivered computing power in the U.S. economy has increased by more than two orders of magnitude since 1970 yet productivity, especially in the service sector, seems to have stagnated. This article summarizes what we know and do not know, distinguishes the central issues from diversions, and clarifies the questions that can profitably be explored in future research. After reviewing and assessing the research to date, it appears that the shortfall of IT productivity is as much due to deficiencies in our measurement and methodological tool kit as to mismanagement by developers and users of IT.


PROFIT-94-24 Technology's True Payoff
(length: 3 pages) (single-spaced)
October 94 by Erik Brynjolfsson

Abstract: A few years ago, the business press was filled with stories about the so-called "productivity paradox" of computers: The billions of dollars poured into computers didn't seem to boost worker output. The pendulum has now swung with full force in the opposite direction. After the author and Lorin Hitt published a study that found a correlation between computer investment and significantly higher output in a sample of 300 large companies, dozens of business publications ran stories about the "technology payoff." This article examines the tools for assessing the return on information technology, and focuses attention on intangible benefits such as inventory savings, reduced space requirements, and decrease in rework.


PROFIT-94-25 Paradox Lost? Firm-level Evidence on the Returns to Information Systems Spending
(length: 40 pages) (double-spaced)
November 94 by Erik Brynjolfsson & Lorin Hitt

Abstract: The "productivity paradox" of information systems (IS) is that, despite enormous improvements in the underlying technology, the benefits of IS spending have not been found in aggregate output statistics. Our study uses new firm-level data on several components of IS spending for 1987-91, with a dataset including 367 large firms that generated approximately $1.8 trillion in output in 1991. Our results indicate that IS spending has made a substantial and statistically significant contribution to firm output. We find that the gross marginal product (MP) for computer capital averaged 81% for the firms in our sample. We find that the MP for computer capital is at least as large as the marginal product of other types of capital investment and that, dollar for dollar, IS labor spending generates at least as much output as spending on non-IS labor and expenses. Because the models we applied were similar to those that have previously been used to assess the contribution of IS and other factors of production, we attribute the different results to the fact that our data set is more current and larger than others explored. We conclude that the "productivity paradox" disappeared by 1991, at least in our sample of firms.
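Production-function studies commonly estimate a Cobb-Douglas form, in which the marginal product of an input follows directly from its estimated output elasticity: for Q = a * K**beta * L**gamma, dQ/dK = beta * Q / K. Assuming that form (the abstract does not state the specification), the relationship can be sketched and checked numerically; the function names and all parameter values below are hypothetical and are not estimates from the paper.

```python
def cobb_douglas(a, capital, labor, beta, gamma):
    """Output Q = a * K**beta * L**gamma (a two-input Cobb-Douglas form, assumed)."""
    return a * capital ** beta * labor ** gamma


def marginal_product_of_capital(beta, output, capital):
    """dQ/dK = beta * Q / K: extra output per extra unit of capital."""
    return beta * output / capital


# Hypothetical firm: elasticity 0.02 on capital, 0.7 on labor.
q = cobb_douglas(10.0, 25.0, 100.0, 0.02, 0.7)
mp = marginal_product_of_capital(0.02, q, 25.0)
```

Read this way, a gross marginal product of 81% means that, at the margin, an extra dollar of computer capital is associated with roughly 81 cents of additional annual output before depreciation.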


PROFIT-93-26 A Heuristic Multi-stage Algorithm for Segmenting Simply Connected Handwritten Numerals, in Heuristics, The Journal of Knowledge Engineering & Technology, Vol. 6, No. 4
(length: 10 pages) (single-spaced)
Winter 1993 by M.V. Nagendraprasad, Peter L. Sparks, & Amar Gupta

Abstract: A multi-stage algorithm for segmenting strings of numerals is presented in this paper. This algorithm utilizes no a priori knowledge of the number of digits in the string. By employing multiple stages, each operating independently of others, higher accuracies are obtained, as compared to the conventional scenario of using one stage only. Further, since the stages are triggered on an as-needed basis, computational bandwidth requirements are kept within acceptable limits. Tests with experimental data from NIST hand-printed character database show that the algorithm provides high accuracies.


PROFIT-95-27 Correction of Handwritten Numerals for Automated Data Processing, in Engineering Applications of Artificial Intelligence, Vol. 8, No. 4
(length: 4 pages) (single-spaced)
April 1995 by V. Feliberti, M.V. Nagendraprasad, & Amar Gupta

Abstract: A new slant-correction algorithm that searches for the minimum in the width space of a numeral, based on a binary search on the angular slant values, is presented here. It is based on the idea that if a numeral is transformed through a series of slanted positions, it usually attains its minimum width when it is least slanted.
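A minimal sketch of the idea, assuming the numeral is given as a list of foreground pixel coordinates `(x, y)` and that its width is unimodal in the slant angle over the range of plus or minus 45 degrees. The shear convention, search bounds, tolerances, and function names are assumptions; this is not the paper's exact algorithm. The binary search tests the sign of the width's slope at the midpoint and discards the half that cannot contain the minimum.

```python
import math


def width_after_shear(points, theta):
    """Width of the point set after removing a slant of theta radians.

    Assumed shear convention: x' = x - y * tan(theta).
    """
    t = math.tan(theta)
    xs = [x - y * t for x, y in points]
    return max(xs) - min(xs)


def estimate_slant(points, lo=-math.pi / 4, hi=math.pi / 4, tol=1e-4, eps=1e-6):
    """Binary search on the sign of the width's slope (assumes a unimodal width)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if width_after_shear(points, mid) <= width_after_shear(points, mid + eps):
            hi = mid + eps  # width not decreasing: the minimum lies at or left of mid + eps
        else:
            lo = mid  # width still decreasing: the minimum lies right of mid
    return (lo + hi) / 2
```

Once the angle is found, applying `x - y * tan(theta)` to every pixel yields the deslanted numeral. Each iteration roughly halves the search interval, so only a few dozen width evaluations are needed.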


PROFIT-95-28 The Context Interchange Network Prototype
(length: 26 pages) (single-spaced)
February 95 by Adil Daruwala, Cheng Goh, Scott Hofmeister, Karim Hussein, Stuart Madnick, & Michael D.

Abstract: In this paper we describe a prototype implementation of the Context Interchange Network (CIN). The CIN is designed to provide for the intelligent integration of contextually (i.e., semantically) heterogeneous data. The system uses explicit context knowledge and a context mediator to automatically detect conflicts and resolve them through context conversion. The network also allows for context explication, making it possible for a receiver of data to understand the meaning of the information represented by the source data. A financial application is used to illustrate the functionality of the prototype.


PROFIT-95-29 Context Interchange: Research in Using Knowledge About Data to Integrate Disparate Sources
(length: 18 pages) (double-spaced)
March 95 by Amar Gupta & Stuart E. Madnick

Abstract: The Context Interchange (CI) project, a component of the "PROFIT" initiative at MIT, deals with transforming information across functional boundaries and organizational boundaries to suit the individual needs of an increasingly diverse set of users of such information. By providing effective "on-off ramps" to the emerging information highways, the goal is to enhance drastically the ability to make effective use of large volumes of information obtained from disparate sources (each with its own set of underlying meanings and assumptions), by transforming automatically the incoming streams of data to the desired meaning (or context) needed for a particular job or function. This paper provides an introduction to the approaches being pursued in the Context Interchange project, as well as a summary of the key accomplishments to date.


PROFIT-95-30 Formulating Global Integrity Constraints During Derivation of Global Schema
(length: 24 pages) (single-spaced)
March 95 by M.P. Reddy, B.E. Prasad, & Amar Gupta

Abstract: In a heterogeneous distributed database environment, each component database is characterized by its own logical schema and its own set of integrity constraints. The task of generating a global schema from the constituent local schemata has been addressed by many researchers. This paper examines the complementary problem of using multiple sets of integrity constraints to create a new set of global integrity constraints. These global integrity constraints facilitate both query optimization and update validation.
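As a much-simplified illustration, and not the paper's formulation, consider range constraints on one attribute in two component databases whose relations are unioned into a global relation. The sketch below assumes exactly that setting. The database names and salary limits are hypothetical. The global range is taken as the hull of the local ranges. This is always sound for the union, but it can be looser than the exact disjunction when the local ranges do not overlap.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # closed range [low, high] allowed by a local constraint


def global_range(local: Dict[str, Interval]) -> Interval:
    """Range implied for the global (union) relation by the local range constraints."""
    return (min(lo for lo, _ in local.values()),
            max(hi for _, hi in local.values()))


def sources_to_query(local: Dict[str, Interval], query: Interval) -> List[str]:
    """Query optimization: skip sources whose constraint rules out every answer."""
    q_lo, q_hi = query
    return sorted(name for name, (lo, hi) in local.items() if lo <= q_hi and q_lo <= hi)


def validate_insert(local: Dict[str, Interval], source: str, value: float) -> bool:
    """Update validation: a new tuple must satisfy the constraint of its target source."""
    lo, hi = local[source]
    return lo <= value <= hi


# Hypothetical salary constraints of two component databases.
salary = {"db_east": (1000, 5000), "db_west": (2000, 8000)}
print(global_range(salary))                     # (1000, 8000)
print(sources_to_query(salary, (6000, 7000)))   # ['db_west']
print(sources_to_query(salary, (9000, 9500)))   # [] (empty answer, no source accessed)
```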

PROFIT-97-31 Temporal Data Mining
(length: 43 pages) (single-spaced)
March 97 by J. Shanmugasundaram, M.V. Nagendra Prasad, & Amar Gupta
Abstract: Finding patterns in historical data is an important problem in many domains. In this paper, we concentrate on the problem of estimating the future sales of products from past sales data. We use recurrent neural networks to predict future sales because of (a) their power to generalize trends and (b) their ability to store relevant information about past sales. We first describe the implementation of a distributed recurrent neural network using the real-time recurrent learning algorithm. We then validate this implementation with tests on well-known examples from the literature. Finally, we describe and analyze the limitations of the approach, as revealed by predictions of noisy mathematical functions.
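The core of real-time recurrent learning (RTRL) is the sensitivity recurrence for dy_k/dW_ij, which is carried forward in time alongside the network state. A minimal, single-process sketch follows. It does not reproduce the paper's distributed implementation. The network size, the tanh activation, the use of unit 0 as the output, the learning rate and the sine-wave toy task are all assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Tensor3 = List[List[List[float]]]  # P[k][i][j] = d y_k / d W[i][j]


def init_weights(n_units: int, n_inputs: int, seed: int = 0) -> Matrix:
    """Small random weights; columns are [external inputs | recurrent units | bias]."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + n_units + 1)]
            for _ in range(n_units)]


def zeros3(n: int, cols: int) -> Tensor3:
    return [[[0.0] * cols for _ in range(n)] for _ in range(n)]


def rtrl_step(W: Matrix, x: Sequence[float], y: List[float],
              P: Tensor3) -> Tuple[List[float], Tensor3]:
    """One forward step of a fully recurrent tanh network plus the RTRL sensitivity update."""
    n, cols = len(W), len(W[0])
    m = cols - n - 1                       # number of external inputs
    z = list(x) + list(y) + [1.0]          # concatenated input vector with bias
    y_new = [math.tanh(sum(W[k][j] * z[j] for j in range(cols))) for k in range(n)]
    P_new = zeros3(n, cols)
    for k in range(n):
        fprime = 1.0 - y_new[k] ** 2       # tanh'(s_k) expressed through the output
        for i in range(n):
            for j in range(cols):
                acc = sum(W[k][m + l] * P[l][i][j] for l in range(n))
                if i == k:
                    acc += z[j]            # direct effect of W[k][j] on s_k
                P_new[k][i][j] = fprime * acc
    return y_new, P_new


def loss_and_gradient(W: Matrix, inputs, targets) -> Tuple[float, Matrix]:
    """Summed squared error of unit 0 over a sequence, with the RTRL gradient (W held fixed)."""
    n, cols = len(W), len(W[0])
    y, P = [0.0] * n, zeros3(n, cols)
    loss, grad = 0.0, [[0.0] * cols for _ in range(n)]
    for x, target in zip(inputs, targets):
        y, P = rtrl_step(W, x, y, P)
        e = target - y[0]
        loss += 0.5 * e * e
        for i in range(n):
            for j in range(cols):
                grad[i][j] -= e * P[0][i][j]
    return loss, grad


def train_online(W: Matrix, inputs, targets, lr: float = 0.05, epochs: int = 50) -> Matrix:
    """Real-time recurrent learning: adjust the weights after every time step."""
    n, cols = len(W), len(W[0])
    for _ in range(epochs):
        y, P = [0.0] * n, zeros3(n, cols)
        for x, target in zip(inputs, targets):
            y, P = rtrl_step(W, x, y, P)
            e = target - y[0]
            for i in range(n):
                for j in range(cols):
                    W[i][j] += lr * e * P[0][i][j]
    return W


# Toy task (hypothetical, not the paper's sales data): predict the next value of a sine wave.
series = [0.5 * math.sin(0.4 * t) for t in range(30)]
inputs, targets = [[v] for v in series[:-1]], series[1:]
W = train_online(init_weights(n_units=3, n_inputs=1), inputs, targets)
```

The sensitivity tensor has n × n × (m + n + 1) entries and is updated at every step. This cost grows roughly with the fourth power of the number of units, which is one reason to distribute the computation.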

PROFIT-97-32 Technologies for Connecting and Using Databases and Server Applications on the World Wide Web
(length: 56 pages) (double-spaced)
May 97 by Adolfo G. Castellon Jr.

Abstract: This paper presents a study of current technologies used to build applications that make use of the World Wide Web. In particular, it discusses three technologies of very different heritage (Java Beans, OLE/ActiveX, and CORBA) that are evolving toward a common goal. The emphasis is on technologies recently developed to connect databases to Web applications. Two applications created by the author are used to demonstrate specific types of emerging Web technologies.


PROFIT-97-33 An Architecture for Secure Transactions in the Processing of Bank Checks
(length: 97 pages) (double-spaced)
May 97 by Joseph Figueroa

Abstract: iCheck is a system architecture that enables the banking system to increase check processing speed and decrease check processing cost. The iCheck architecture provides a secure, reliable, and widely available infrastructure for accessing the existing bank payment system over the Internet. This is accomplished by designing, developing, and integrating the components needed to shorten bank check processing time and to reduce the need for human intervention while operating securely over open public networks. These components are implemented in two main modules. They demonstrate interoperability and an infrastructure that can benefit banks by giving them a consistent, secure, and trusted way to offer faster and cheaper services. This infrastructure should also enable innovative new systems that take advantage of developments in telecommunications technology and of enhancements to the existing banking payment system.
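The abstract does not say which security mechanisms iCheck uses. The following is only a generic sketch of one building block for trusted messages over a public network: a keyed message-authentication tag on a check record. It assumes HMAC-SHA256 with a key pre-shared between two banks. The record fields are hypothetical. HMAC provides integrity and authenticity but not confidentiality, so encryption of the channel would still be needed.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a check record deterministically so both parties hash the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_check(record: dict, key: bytes) -> str:
    """HMAC-SHA256 tag over the canonical form of the record."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify_check(record: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_check(record, key), tag)


# Hypothetical check-presentment record and pre-shared key.
key = b"demo-key-shared-by-two-banks"
check = {"check_number": 1042, "routing": "000000000", "account": "12345", "amount_cents": 125000}
tag = sign_check(check, key)
print(verify_check(check, tag, key))                              # True
print(verify_check(dict(check, amount_cents=925000), tag, key))   # False
```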

PROFIT-97-34 Telecommunications in Mexico
(length: 100 pages) (double-spaced)
May 97 by Adrian E. Gonzalez

Abstract: The telecommunications industry in Mexico has recently undergone a dramatic change in its regulatory and market structure. After many years of government control, telecommunications services had deteriorated to the point where they were unable to meet the population's needs. During the last eight years, however, the Mexican government has reformed this and many other industries by allowing private ownership and competition. In line with these reforms, the government has set a series of goals related to telephone penetration, modernization of services, and cost reductions.

This paper analyzes the telecommunications industry in Mexico during three time periods: the past (1900-1989), the deregulation period (1990-1997), and the future (1998-2005). For each period, the prevailing political and economic environment is also analyzed to provide an accurate context for the study of the industry. The analysis also covers major events that have occurred or are expected to occur within each period and their influence on the industry. Emphasis is placed on the second period, with an in-depth analysis of the roles of the government, companies, and customers in the deregulation process. Special attention is given to competitors' marketing strategies and structures. The last section assesses the probability of achieving the goals set by the government. Based on this analysis, at least one of the government's goals is unlikely to be realized: doubling the number of lines per 100 inhabitants. Extra support in the form of government subsidies or other assistance will be required to reach this goal.

Finally, the paper briefly describes the similarities and differences between the deregulation of the telecommunications industry in Chile and in Mexico. Based on this comparison of the two Latin American countries, the author recommends that Brazil adopt Mexico's model of deregulation in the near future.

PROFIT-97-35 Neural Networks Based Data Mining Applications for Medical Inventory Problems
(length: 15 pages) (single-spaced)
May 97 by Kanti Bansal, Sanjeev Vadhavkar, & Amar Gupta

Abstract: One of the main requirements for agile organizations is the development of information systems that provide effective linkages with their suppliers, customers, and other channel partners involved in transportation, distribution, warehousing, and maintenance. Agility increasingly depends on the quality of decision making, and companies are continuously trying to improve their decisions by learning from past transactions and decisions. An efficient inventory management system based on contemporary information systems is a first step in this direction. This paper discusses the use of neural networks to optimize the inventory of a large medical distribution organization. The paper defines the inventory patterns and elaborates on the method of constructing and choosing an appropriate neural network to solve the problem. As an extension to the neural network models, the statistical procedures and assumptions used to augment them are explained in detail. Given the large number of neural network classes, it is difficult to identify the particular class and model that yields the best inventory model. The paper describes an elaborate scheme based on traditional statistical techniques to evaluate the best neural network type, and concludes with a detailed evaluation of the "neural network solution". Using the method proposed in this paper, the total inventory level of the medical distribution organization could be decreased from over a billion dollars to about half a billion dollars (a reduction of about 50 percent).
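The abstract does not detail the statistical evaluation scheme. A minimal sketch of the generic idea behind it follows: rank candidate forecasters by their one-step-ahead mean squared error on held-out periods, then keep the best. The two candidates are simple stand-ins for neural network classes. The weekly demand series is made up for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List

Forecaster = Callable[[List[float]], float]  # past demand -> next-period forecast


def holdout_mse(model: Forecaster, series: List[float], n_train: int) -> float:
    """Mean squared one-step-ahead error over the periods after the first n_train."""
    errors = [(series[t] - model(series[:t])) ** 2 for t in range(n_train, len(series))]
    return mean(errors)


def select_model(models: Dict[str, Forecaster], series: List[float], n_train: int) -> str:
    """Name of the candidate with the lowest hold-out error."""
    scores = {name: holdout_mse(model, series, n_train) for name, model in models.items()}
    return min(scores, key=scores.get)


# Simple stand-ins for competing network classes (hypothetical).
candidates: Dict[str, Forecaster] = {
    "last_value": lambda history: history[-1],
    "moving_average_3": lambda history: mean(history[-3:]),
}

# Toy weekly demand with a steady upward trend (made-up numbers).
demand = [10 + 2 * week for week in range(11)]
print(select_model(candidates, demand, n_train=6))   # last_value
```

On this trending series, the moving average lags two periods behind and scores worse. A comparison of this kind can separate candidate model classes before any of them is tuned further.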