Executive Summaries for PROFIT Working Paper Series
The PROFIT INITIATIVE
E40-313, MIT Sloan School of Management
30 Memorial Drive
Cambridge, MA 02139 USA
Executive Summaries of recent Working Papers are available online at:
www.ssrn.com.
Recent papers can be viewed by author:
Dr. Amar Gupta
Dr. Stuart Madnick
Dr. Michael Siegel
Dr. Rafael Palacios
Earlier Working Papers are as follows:
PROFIT-92-01 Toward Quality Data: An Attribute-Based Approach
(length: 30 pages)
November 92 by Richard Y. Wang, M.P. Reddy, & Henry B. Kon (double-spaced)
| Abstract: | The need for a quality
perspective in the management of the data resource is
becoming increasingly critical. Managing data quality,
however, is a complex task. Although it would be ideal to
achieve zero defect data, this may not always be
attainable. Moreover, different users may have different
criteria in determining the quality of data. This
suggests that it would be useful to be able to tag data
with quality indicators which are characteristics of the
data and its manufacturing process. From these quality
indicators, users can make their own judgment of the
quality of the data for the specific application at hand. This paper investigates how quality indicators may be specified, stored, retrieved, and processed. Specifically, we propose an attribute-based data model that facilitates cell-level tagging of data. Included in this attribute-based model are a mathematical model description that extends the relational model, a set of quality integrity rules, and a quality indicator algebra that can be used to process SQL queries augmented with quality indicators. A data quality requirements analysis methodology that extends the Entity Relationship model is also presented. |
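The idea of cell-level tagging can be illustrated with a small Python sketch. It is not the paper's attribute-based model or its quality indicator algebra: the `TaggedCell` class, the indicator names (`source`, `created`), the sample rows, and the `select_with_quality` helper are all hypothetical, chosen only to show how a user could filter data on its quality indicators for the application at hand.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TaggedCell:
    """A cell value together with its (hypothetical) quality indicators."""
    value: object
    indicators: dict = field(default_factory=dict)


# A relation is modeled as a list of rows mapping attribute name -> TaggedCell.
# The rows are made-up sample data.
employees = [
    {"name": TaggedCell("Lee", {"source": "HR"}),
     "salary": TaggedCell(52000, {"source": "payroll", "created": date(1992, 10, 1)})},
    {"name": TaggedCell("Kim", {"source": "HR"}),
     "salary": TaggedCell(48000, {"source": "survey", "created": date(1990, 1, 15)})},
]


def select_with_quality(rows, attribute, predicate):
    """Keep the rows whose cell for `attribute` has indicators satisfying `predicate`."""
    return [row for row in rows if predicate(row[attribute].indicators)]


# Example: this user trusts only salaries that came from payroll records.
trusted = select_with_quality(employees, "salary",
                              lambda q: q.get("source") == "payroll")
print([row["name"].value for row in trusted])  # ['Lee']
```

Because the tags travel with each cell rather than with the whole table, two users can apply different predicates to the same relation and obtain different, application-specific views of its quality.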
PROFIT-92-02 A Knowledge-based Approach to
Assisting in Data Quality Judgement
(length: 10 pages) (single-spaced)
December 92 by Y. Jang, Henry B. Kon & Richard
Y. Wang
| Abstract: | As the integration of
information systems enables greater accessibility to data
from multiple sources, the issue of data quality becomes
increasingly important. This paper attempts to formally
address the data quality judgment problem with a
knowledge-based approach. Our analysis has identified
several related theoretical and practical issues. For
example, data quality is determined by several factors,
referred to as quality parameters. Quality parameters are
often not independent of each other, raising the issue of
how to represent relationships among quality parameters
and reason with such relationships to draw insightful
conclusions about the overall quality of data. In particular, this paper presents a data quality reasoner, a data quality judgment model based on the notion of a "census of needs." It provides a framework for deriving an overall data quality value from local relationships among quality parameters. The data quality reasoner will assist data consumers in judging data quality, which is particularly important when much of the data involved in decision-making comes from different, unfamiliar sources. |
PROFIT-93-03 Algorithms for Thinning and
Rethickening Binary Digital Patterns, in Digital Signal Processing
(length: 6 pages)
(single-spaced)
January 93 by M.V. Nagendraprasad, Patrick S.P.
Wang, & Amar Gupta
| Abstract: | Pattern recognition and image processing applications frequently deal with raw inputs that contain lines of different thickness. In some cases, this variation in thickness is an asset, enabling quicker recognition of the features in the input image. For example, in processing aerial photographs, detection of major landmarks can be aided by variations in the thickness of the contours. In other cases, the variation can be a liability, degrading the accuracy and speed of recognition. For example, in the case of handwritten characters, the uniformity of the thickness of individual strokes directly affects the probability of successful recognition, especially if neural network based recognition techniques are employed. This paper describes the thinning stage, a theoretical proof related to a new and faster thinning algorithm, and the rethickening stage of the handwriting recognition process. |
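The authors' own faster thinning algorithm is not given in this summary. As a point of reference only, the sketch below implements the classic Zhang-Suen parallel thinning algorithm, a well-known technique swapped in here and not the paper's method. It assumes a binary image stored as nested lists of 0/1 whose outermost rows and columns are background.

```python
def zhang_suen_thin(image):
    """Thin a binary image (lists of 0/1) to a one-pixel-wide skeleton.

    Classic Zhang-Suen algorithm; assumes the outermost rows and columns are background.
    """
    img = [row[:] for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting with the pixel directly above (r - 1, c).
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of foreground neighbours
                    # number of 0 -> 1 transitions in the circular sequence P2..P9, P2
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, _, p4, _, p6, _, p8, _ = p
                    if step == 0:
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and side_ok:
                        to_clear.append((r, c))
            # Deletions are applied only after the whole sub-iteration has been scanned.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# A 3-pixel-thick horizontal bar inside a background border.
bar = [[0] * 7 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 6):
        bar[r][c] = 1
for row in zhang_suen_thin(bar):
    print("".join("#" if v else "." for v in row))
```

For this bar the skeleton is a two-pixel line on the bar's center row, printed as `..##...`; a rethickening stage such as the one the paper describes would then work from a uniform-width skeleton of this kind.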
PROFIT-93-04 An Integrated Architecture for
Recognition of Totally Unconstrained Handwritten Numerals, in
International Journal of Pattern Recognition and Artificial
Intelligence, Vol. 7, No. 4
(length: 16 pages) (single-spaced)
January 93 by Amar Gupta, M.V. Nagendraprasad, A.
Liu, P.S.P. Wang, & S. Ayyadurai
| Abstract: | A multi-staged system for off-line handwritten numeral recognition is presented here. After scanning, the digitized binary bitmap image of the source document is passed through a preprocessing stage that performs segmentation, thinning and rethickening, normalization, and slant correction. The recognizer is a three-layered neural net trained with the back-propagation algorithm. While a few systems that use three-layered nets for recognition have been presented in the literature, the contribution of our system lies in two aspects: elaborate preprocessing based on structural pattern recognition methods combined with a neural net based recognizer, and integration of neural net based and structural pattern recognition methods to produce high accuracies. |
PROFIT-93-05 A Process View of Data Quality
(length: 17 pages) (double-spaced)
March 93 by Henry B. Kon, Jacob Lee & Richard
Wang
| Abstract: | We posit that the term
data quality, though used in a variety of research and
practitioner contexts, has been inadequately
conceptualized and defined. To improve data quality, we
must bound and define the concept of data quality. In the
past, researchers have tended to take a product-oriented
view of data quality. Though necessary, this view is
insufficient for three reasons. First, data quality
defects in general are difficult to detect by simple
inspection of the data product. Second, definitions of
data quality dimensions and defects, while useful
intuitively, tend to be ambiguous and interdependent.
Third, in line with a cornerstone of TDQM philosophy,
emphasis should be placed on process management to
improve product quality. The objective of this paper is to characterize the concept of data quality from a process perspective. A formal process model of an information system (IS) is developed that offers precise process constructs for characterizing data quality. With these constructs, we rigorously define the key dimensions of data quality. The analysis also provides a framework for examining the causes of data quality problems. Finally, facilitated by the exactness of the model, an analysis of the interdependencies among the various data quality dimensions is presented. |
PROFIT-93-06 A Research Retrospective of
Innovation Inception and Success: The Technology-Push Demand-Pull
Question
(length: 27 pages) (double-spaced)
March 93 by Shyam R. Chidamber & Henry B. Kon
| Abstract: | Innovation researchers
have frequently debated whether organizational innovation
is driven by market demand or by technological shifts.
The market demand school of thought suggests that
organizations innovate based on market needs, whereas the
technology proponents claim that change in technology is
the primary driver of innovation. Collectively, empirical
research studies on technological innovation are
inconclusive regarding this technology-push demand-pull
(TPDP) debate. Eight key studies relevant to this issue
are examined for their methods, implications, and caveats
to establish a structured way of interpreting the various
results. The philosophical underpinnings of market demand
and technology factors as drivers of innovation are also
examined. This paper suggests that much of the contention between the technology-push and demand-pull findings is due to different research objectives, definitions, and models. The main conclusion is that there is a clear relationship between the research models used in these studies and the outcomes observed, suggesting that differences in problem statements and research constructs may be causing the apparent incongruity in research findings. Organizational and national policy level issues are also examined in light of the finding that different levels of analysis lead to different results. |
PROFIT-93-07 Towards an Active Schema
Integration Architecture for Heterogeneous Database Systems, in
Third International Workshop on Research Issues on Data
Engineering: Interoperability in Multidatabase Systems, Vienna,
Austria
(length: 9 pages) (single-spaced)
April 93 by M.P. Reddy, Michael Siegel and Amar
Gupta
| Abstract: | In this paper we describe our research in the development of a four-layered architecture for Heterogeneous Distributed Database Management Systems (HDDBMS). The architecture includes the local schema, local object schema, global schema, and global view schema. This architecture was developed to support the propagation of local database semantics (e.g., integrity constraints, context) to the global schema and global view. Constraints propagated to the global level can be used to derive new constraints that could not have been recognized by any of the local components. These constraints significantly reduce query processing costs in the HDDBMS environment by permitting the incorporation of techniques similar to semantic query optimization in the single database environment [CFM84,HZ80,Kin81,SSS91]. These techniques are applied to the global query to identify candidate databases and reduce the number of local databases that must be accessed. |
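How propagated constraints can prune candidate databases is easiest to see with simple numeric range constraints. The Python sketch below is an illustration under stated assumptions, not the paper's architecture: the global relation is assumed to be the union of the local relations, each local integrity constraint is assumed to be a single range on one attribute, and the database names and ranges are hypothetical.

```python
from typing import NamedTuple


class RangeConstraint(NamedTuple):
    attribute: str
    low: float
    high: float


# Hypothetical integrity constraints propagated from three local schemas.
local_constraints = {
    "db_east": RangeConstraint("salary", 20_000, 60_000),
    "db_west": RangeConstraint("salary", 50_000, 120_000),
    "db_intl": RangeConstraint("salary", 100_000, 250_000),
}


def global_range(constraints, attribute):
    """Derive a global constraint (the hull of the local ranges).

    Valid only if every local database constrains `attribute`; otherwise None.
    """
    if not constraints or any(c.attribute != attribute for c in constraints.values()):
        return None
    return (min(c.low for c in constraints.values()),
            max(c.high for c in constraints.values()))


def candidate_databases(constraints, attribute, q_low, q_high):
    """Return the databases that may hold rows with q_low <= attribute <= q_high."""
    bounds = global_range(constraints, attribute)
    if bounds is not None and (q_high < bounds[0] or q_low > bounds[1]):
        return []  # the derived global constraint alone shows the answer is empty
    return sorted(name for name, c in constraints.items()
                  if c.attribute != attribute          # no information: must keep it
                  or (c.low <= q_high and q_low <= c.high))


print(global_range(local_constraints, "salary"))                           # (20000, 250000)
print(candidate_databases(local_constraints, "salary", 70_000, 90_000))    # ['db_west']
print(candidate_databases(local_constraints, "salary", 300_000, 400_000))  # []
```

The first query is sent to one database instead of three, and the second is answered without contacting any local database, because the derived global constraint is something none of the local components could have stated on its own.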
PROFIT-93-08 Data Quality Requirements Analysis
and Modeling
(length: 7 pages) (single-spaced)
April 93 by Richard Y. Wang, Henry B. Kon, &
Stuart E. Madnick
| Abstract: | Data engineering is the
modeling and structuring of data in its design,
development, and use. An ultimate goal of data
engineering is to put quality of data in the hands of
users. Specifying and ensuring the quality of data,
however, is an area in data engineering that has received
little attention. In this paper we: (1) establish a set
of premises, terms, and definitions for data quality
management, and (2) develop a step-by-step methodology
for defining and documenting data quality parameters
important to users. These quality parameters are used to
determine quality indicators about the data manufacturing
process, such as data source, creation time, and collection
method, which are tagged to data items. Given such tags,
and the ability to query over them, users can filter out
data having undesirable characteristics. The methodology developed provides a concrete approach to data quality requirements collection and documentation. It demonstrates that data quality can be an integral part of the database design process. The paper also provides a perspective for the migration towards quality management of data in a database environment. |
PROFIT-93-09 Detection of Courtesy Amount Block
on Bank Checks
(length: 33 pages) (single-spaced)
May 93 by Arun Agarwal, Len M. Granowetter, Amar
Gupta, & P.S.P. Wang
| Abstract: | This paper presents a multi-staged technique for locating the courtesy amount block on bank checks. For a check processing system, many of the previously proposed methods are not acceptable, due to the presence of many fonts and text sizes, as well as the short length of many text strings. This paper describes the particular method chosen to implement a Courtesy Amount Block Locator (CABL). First, the connected components in the image are identified. Next, strings are constructed on the basis of proximity and horizontal alignment of characters. Finally, a set of rules and heuristics is applied to these strings to choose the correct one. The chosen string is reported only if it passes a verification test, which includes an attempt to recognize the currency sign. |
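The string-construction step can be sketched in Python as grouping component bounding boxes by horizontal proximity and vertical alignment. This is a minimal illustration, assuming connected-component extraction has already produced bounding boxes; the `Box` type, the `max_gap` and `min_overlap` thresholds, and the sample boxes are hypothetical, and the CABL rules, heuristics, and currency-sign verification are not shown.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one connected component (x, y = top-left corner, in pixels)."""
    x: int
    y: int
    w: int
    h: int


def vertical_overlap(a, b):
    """Fraction of the shorter box's height that both boxes share."""
    top = max(a.y, b.y)
    bottom = min(a.y + a.h, b.y + b.h)
    return max(0, bottom - top) / min(a.h, b.h)


def build_strings(boxes, max_gap=15, min_overlap=0.5):
    """Group components left to right into strings by proximity and horizontal alignment."""
    strings = []
    for box in sorted(boxes, key=lambda b: b.x):
        for s in strings:
            last = s[-1]
            gap = box.x - (last.x + last.w)  # horizontal gap to the string's last component
            if gap <= max_gap and vertical_overlap(last, box) >= min_overlap:
                s.append(box)
                break
        else:
            strings.append([box])  # no compatible string: start a new one
    return strings


# Made-up components: a three-character word, a stray mark, and a four-character amount.
components = ([Box(x, 10, 10, 20) for x in (10, 22, 34)]
              + [Box(200, 100, 10, 20)]
              + [Box(x, 40, 10, 20) for x in (300, 312, 324, 336)])
print([len(s) for s in build_strings(components)])  # [3, 1, 4]
```

Requiring vertical overlap rather than an exact baseline match keeps the grouping tolerant of the mixed fonts and text sizes mentioned above, while the gap threshold prevents distant text on the same line from being merged.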
PROFIT-93-10 An Adaptive Modular Neural
Network With Application To Unconstrained Character Recognition
(length: 27 pages)
(double-spaced)
August 93 by Lik Mui, Arun Agarwal, Amar Gupta,
& Patrick Wang
| Abstract: | The topology and capacity of a traditional multilayer neural network, as measured by the number of connections in the network, have surprisingly little impact on its generalization ability. This paper presents a new adaptive modular network that offers superior generalization capability. The new network provides significant fault tolerance, quick adaptation to novel inputs, and high recognition accuracy. We apply this paradigm to the recognition of unconstrained handwritten characters. |
PROFIT-93-11 Run-time Type Information and
Incremental Loading in C++, in Journal of Object Oriented
Programming
(length: 15 pages) (double-spaced)
September 93 by Murali K. Vemulapati, Sriram
Duvvuru, & Amar Gupta
| Abstract: | We present the design and implementation strategy for an integrated programming environment that facilitates the specification, implementation, and execution of persistent C++ programs. Our system is implemented in E, a persistent programming language based on C++. The environment provides type identity and type persistence, i.e., each user-defined class has a unique identity and persists across compilations. The system provides run-time type information for user-defined types, and it provides efficient run-time access to the members of an object by generating maps of the objects. It also supports incremental linking and loading of new classes, or of modified versions of classes already existing in the database. |
PROFIT-94-12 Context Interchange: Overcoming the Challenges of
Large-Scale Interoperable Database Systems in a Dynamic
Environment
(length: 25 pages) (double-spaced)
February 94 by Cheng Hian Goh, Stuart E. Madnick, & Michael
D. Siegel
| Abstract: | Research in database
interoperability has primarily focused on circumventing
schematic and semantic incompatibility arising from
autonomy of the underlying databases. We argue that,
while existing integration strategies might provide
satisfactory support for small or static systems, their
inadequacies rapidly become evident in large-scale
interoperable database systems operating in a dynamic
environment. The frequent entry and exit of heterogeneous
interoperating agents renders "frozen"
interfaces (e.g., shared schemas) impractical and places
an ever-increasing burden on the system to accord more
flexibility to heterogeneous users. User heterogeneity
mandates that disparate users' conceptual models and
preferences must be accommodated, and the emergence of
large-scale networks suggests that the integration
strategy must be scalable and capable of dealing with
evolving semantics. As an alternative to the integration approaches presented in the literature, we propose a strategy based on the notion of context interchange. In the context interchange framework, assumptions underlying the interpretations attributed to data are explicitly represented in the form of data contexts with respect to a shared ontology. Data exchange in this framework is accompanied by context mediation, whereby data originating from multiple source contexts is automatically transformed to comply with the receiver context. The focus on data contexts giving rise to data heterogeneity (as opposed to focusing exclusively on data conflicts) has a number of advantages over classical integration approaches: it provides interoperating agents with greater flexibility, as well as a framework for graceful evolution and efficient implementation of large-scale interoperable database systems. |
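The conversion step of context mediation can be sketched in Python with contexts reduced to two properties, currency and scale factor. This is a minimal illustration only: the `Context` type, the `mediate` function, and the conversion table (units of each currency per US dollar) are hypothetical, the figures are not real exchange rates, and the framework's shared ontology and conflict detection are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical data context: currency and scale factor of reported figures."""
    currency: str
    scale: int  # e.g. 1000 means "values are reported in thousands"


# Made-up conversion table (units of each currency per US dollar), for illustration only.
UNITS_PER_USD = {"USD": 1.0, "JPY": 100.0, "GBP": 0.5}


def mediate(value, source: Context, receiver: Context):
    """Transform a value from the source context so it complies with the receiver context."""
    amount_in_usd = value * source.scale / UNITS_PER_USD[source.currency]
    return amount_in_usd * UNITS_PER_USD[receiver.currency] / receiver.scale


# A source reports revenue of 2,500 in thousands of USD;
# the receiver expects figures in millions of JPY.
print(mediate(2500, Context("USD", 1000), Context("JPY", 1_000_000)))  # 250.0
```

Neither side has to know the other's conventions: each declares its own context once, and the mediator derives the transformation whenever data moves between them, which is what allows sources and receivers to join or leave without renegotiating a shared schema.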
PROFIT-94-13 Incremental Loading in the
Persistent C++ Language E
(length: 26 pages) (double-spaced)
February 94 by Murali Vemulapati, D. Sriram, & Amar Gupta
| Abstract: | E is an extension of the C++ language providing database types and persistence. Persistence in E entails some form of dynamic linking of method code, because a program might encounter, on the persistent store, an object whose type was not known to the program when it was compiled. The method code of the corresponding type must then be dynamically linked to the program so that the type definition is made available to it. The current run-time support library provided by E is inadequate for this purpose. We modify and extend the run-time library of E by adding functionality to dynamically link and unlink object modules. We then present the design of a class type to facilitate persistent types. Each user-defined type will have a unique persistent type object associated with it. Class type provides methods for dynamic linking and unlinking of user-defined classes using the extended run-time support. In addition, class type ensures the identity of user-defined types, i.e., each user-defined type will have a unique identity across compilations of a program. |
PROFIT-94-14 A Knowledge Based Segmentation
Algorithm for Enhanced Recognition of Handwritten Courtesy
Amounts
(length: 17 pages) (single-spaced)
March 94 by Karim Hussein, Arun Agarwal, Amar Gupta, &
Patrick Shen-Pei Wang
| Abstract: | A knowledge-based segmentation critic algorithm to enhance recognition of courtesy amounts on bank checks is proposed in this paper. This algorithm extracts the context from the handwritten material and uses a syntax parser based on a deterministic finite automaton to provide adequate feedback to enhance recognition. The segmentation critic presented is capable of handling a number of commonly used styles for courtesy amount representation. Both handwritten and machine-written numeric strings were used to test the efficacy of the preprocessor for the check recognition system described in this paper. The substitution error fell by 1.0% in our early tests. |
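A syntax check of this kind can be pictured as a small table-driven deterministic finite automaton. The Python sketch below is not the authors' parser: its grammar is a hypothetical simplification (an optional "$", one or more digits, then optionally "." followed by exactly two digits), whereas the paper's critic handles several commonly used courtesy amount styles. A critic built along these lines could reject any segmentation hypothesis whose recognized string the automaton does not accept.

```python
DIGITS = set("0123456789")

# Transition table of a hypothetical courtesy-amount grammar:
#   optional "$", one or more digits, optionally "." followed by exactly two digits.
TRANSITIONS = {
    ("start", "$"): "dollar",
    ("start", "DIGIT"): "integer",
    ("dollar", "DIGIT"): "integer",
    ("integer", "DIGIT"): "integer",
    ("integer", "."): "point",
    ("point", "DIGIT"): "cents1",
    ("cents1", "DIGIT"): "cents2",
}
ACCEPTING = {"integer", "cents2"}


def symbol(ch):
    """Map a recognized character to the automaton's input alphabet."""
    return "DIGIT" if ch in DIGITS else ch


def is_valid_amount(text):
    """Run the DFA over the recognized string; reject on any missing transition."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, symbol(ch)))
        if state is None:
            return False
    return state in ACCEPTING


for candidate in ("$123.45", "123", "12.5", "1.2.3"):
    print(candidate, is_valid_amount(candidate))
```

For the four sample strings this prints True, True, False, and False: "12.5" stops in a non-accepting state after one cents digit, and "1.2.3" has no transition for a second decimal point.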
PROFIT-94-15 Error Browsing and Mediation:
Interoperability Regarding Data Error
(length: 10 pages) (single-spaced)
July 94 by Henry B. Kon & Michael D. Siegel
| Abstract: | Our research goals
involve development of methodologies and systems to
support the administration and sharing of errored data (e.g.,
data that are incomplete, inaccurate, or have invalid
syntax). Data sources are assumed to have a non-trivial
degree of error. Data receivers are assumed to have
differing sensitivity to various forms of error. Browsing
involves measurement of error. Mediation involves
run-time management of the source-receiver "error
fit". In this extended abstract we provide a foundation for error definition and measurement, and discuss their role in browsing and mediation. Included are: (1) a classification scheme that defines syntactic and semantic error types, (2) a theoretical basis for relating semantic error to data meaning, (3) an outline of three general approaches to error measurement, and (4) an overview of browsing and mediation. |
PROFIT-94-16 An Ontological and Semantical
Approach to Source-Receiver Interoperability
(length: 10 pages) (double-spaced)
October 94 by Jacob Lee & Michael D. Siegel
| Abstract: | In this paper, we propose a theoretical approach to address the issue of semantic interoperability between a data source and a data receiver in the framework of the context interchange architecture. In particular, this approach highlights the concept of a statement as a unit of exchange. Several statement conversion axioms are proposed for source-receiver interoperability. The conceptual foundation for this approach is derived from Mario Bunge's Ontology and Semantics. The implications of this approach for further research on the design of interoperable database systems based on the context interchange architecture are then discussed. |
PROFIT-94-17 International Multi-Company
Collaborative Engineering: A Study of Japanese Engineering and
Construction Firms
(length: 39 pages) (double-spaced)
August 94 by Masatoshi Kano, Ram Duvvuru Sriram, & Amar Gupta
| Abstract: | Concurrent/collaborative engineering (CE) often requires the collaborative participation of several companies. Management of such multi-company collaboration in engineering becomes more difficult when companies from different countries are involved, as is the trend in the Japanese engineering and construction (E&C) industry. This paper focuses on the conditions required for successful management of such CE. The authors first propose a conceptual time- and cost-based model of multi-company CE work as a framework. Using this framework, they then analyze current practices of Japanese E&C firms. The major findings are that (1) international multi-company CE should be designed with careful attention not only to task split and allocation but also to inter-firm task dependencies; and (2) Japanese E&C firms need to alter their inter-firm coordination scheme significantly in order to derive full benefits in the global marketplace. |
PROFIT-94-18 Image-Information Systems for
Traffic Management
(length: 19 pages) (double-spaced)
August 94 by Ichiro Masaki & Amar Gupta
| Abstract: | This paper describes some examples of image-information systems that are relevant to traffic management. After reviewing related work in the fields of traffic management, intelligent vehicles, stereo vision, and ASIC-based approaches, the paper focuses on a stereo vision system for intelligent cruise control. The system measures the distance to the vehicle in front using trinocular triangulation. An application-specific processor architecture was developed to offer low mass-production cost, real-time operation, low power consumption, and small physical size. The system was installed in the trunk of a car and evaluated successfully on highways. |
PROFIT-94-19 Context Interchange: A Lattice-Based
Approach, in Knowledge-Based Systems,
Butterworth-Heinemann, Oxford, England
(length: 25 pages) (double-spaced)
August 94 by M.P. Reddy & A. Gupta
| Abstract: | The level of semantic data interoperability between a source and a receiver is a function of the context interchange mechanism that operates between the source and the receiver. The semantic interoperability mechanisms in existing systems are usually static in nature and cannot cope with changes in the semantics of data either at the source or at the receiver. In this paper, we propose a context interchange mechanism, based on lattice theory, that can handle changes in the semantics of data at both the source and the receiver. A site-copy selection algorithm is also presented, which selects the set of sources that can supply semantically meaningful data to the query or source. |
PROFIT-94-20 A Methodology for Integration of
Heterogeneous Databases, in IEEE Transactions on Knowledge and
Data Engineering, Vol. 6, No. 6
(length: 13 pages) (single-spaced)
December 94 by M.P. Reddy, B.E. Prasad, P.G. Reddy, and A. Gupta
| Abstract: | The transformation of existing local databases to meet diverse application needs at the global level is performed through a four-layered procedure that stresses total schema integration and virtual integration of local databases. The proposed methodology covers both schema integration and database integration, and uses a four-layered schema architecture (local schemata, local object schemata, global schema, and global view schemata) with each layer presenting an integrated view of the concepts that characterize the layer below. Mechanisms for accomplishing this objective are presented in theoretical terms, along with a running example. Object equivalence classes, property equivalence classes, and other related concepts are discussed in the context of logical integration of heterogeneous schemata, while object instance equivalence classes, property instance equivalence classes, and other related concepts are discussed for data integration purposes. The proposed methodology resolves naming conflicts, scaling conflicts, type conflicts, level-of-abstraction conflicts, and other types of conflicts during schema integration, and data inconsistencies during data integration. |
PROFIT-93-21 Information Technology,
Incentives, and the Optimal Number of Suppliers
(length: 16 pages) (single-spaced)
Fall 93 by J. Yannis Bakos & Erik Brynjolfsson
| Abstract: | Buyers are transforming
their relationships with suppliers. For example, instead
of playing off dozens or even hundreds of competing
suppliers against one another, many firms are finding it
more profitable to work closely with only a small number
of "partners." In this paper we explore some
causes and consequences of this transformation. We apply
the economic theory of incomplete contracts to determine
the optimal strategy for a buyer. We find that the buyer
firm will often maximize profits by limiting its options
and reducing its own bargaining power. This may seem
paradoxical in an age of low communication costs and
aggressive competition. However, unlike earlier models
that focused on coordination costs, we focus on the
critical importance of providing incentives for
suppliers. Our results spring from the need to make it worthwhile for suppliers to invest in "noncontractibles" such as innovation, responsiveness, and information sharing. Such incentives will often be stronger when the number of competing suppliers is small. The findings of the theoretical models appear to be consistent with observations from empirical research which highlights the key role of information technology in enabling this transformation. |
PROFIT-94-22 Network Externalities in
Microcomputer Software: An Econometric Analysis of the
Spreadsheet Market
(length: 24 pages) (double-spaced)
November 1994 by Erik Brynjolfsson & Chris F. Kemerer
| Abstract: | As an economic good, software has a number of interesting properties. In addition to the value of intrinsic features, the creation of or conformance to industry standards may be critical to the success of a product. This research builds and evaluates econometric models to determine which product features are important in the purchase and pricing decisions for microcomputer software. A special emphasis is on identifying the effects of standards and network externalities. The results of this research and the general model proposed can be used to estimate the relative values of software package features, adherence to standards, and increased market share. The research also quantifies the opportunities created by changes in technology architecture. Finally, the results offer guidance on current public policy issues such as the value of intellectual property embodied in software. |
PROFIT-93-23 The Productivity Paradox of
Information Technology
(length: 11 pages) (single-spaced)
December 93 by Erik Brynjolfsson
| Abstract: | The relationship between information technology (IT) and productivity is widely discussed but little understood. Delivered computing power in the U.S. economy has increased by more than two orders of magnitude since 1970, yet productivity, especially in the service sector, seems to have stagnated. This article summarizes what we know and do not know, distinguishes the central issues from diversions, and clarifies the questions that can profitably be explored in future research. After reviewing and assessing the research to date, the article suggests that the shortfall of IT productivity is as much due to deficiencies in our measurement and methodological tool kit as to mismanagement by developers and users of IT. |
PROFIT-94-24 Technology's True Payoff
(length: 3 pages) (single-spaced)
October 94 by Erik Brynjolfsson
| Abstract: | A few years ago, the business press was filled with stories about the so-called "productivity paradox" of computers: the billions of dollars poured into computers didn't seem to boost worker output. The pendulum has now swung with full force in the opposite direction. After the author and Lorin Hitt published a study that found a correlation between computer investment and significantly higher output in a sample of 300 large companies, dozens of business publications ran stories about the "technology payoff." This article examines the tools for assessing the return on information technology, and focuses attention on intangible benefits such as inventory savings, reduced space requirements, and decreased rework. |
PROFIT-94-25 Paradox Lost? Firm-level Evidence
on the Returns to Information Systems Spending
(length: 40 pages) (double-spaced)
November 94 by Erik Brynjolfsson & Lorin Hitt
| Abstract: | The "productivity paradox" of information systems (IS) is that, despite enormous improvements in the underlying technology, the benefits of IS spending have not been found in aggregate output statistics. Our study uses new firm-level data on several components of IS spending for 1987-91, with a dataset including 367 large firms that generated approximately $1.8 trillion in output in 1991. Our results indicate that IS spending has made a substantial and statistically significant contribution to firm output. We find that the gross marginal product (MP) for computer capital averaged 81% for the firms in our sample. We find that the MP for computer capital is at least as large as the marginal product of other types of capital investment and that, dollar for dollar, IS labor spending generates at least as much output as spending on non-IS labor and expenses. Because the models we applied were similar to those that have previously been used to assess the contribution of IS and other factors of production, we attribute the different results to the fact that our data set is more current and larger than others explored. We conclude that the "productivity paradox" disappeared by 1991, at least in our sample of firms. |
PROFIT-93-26 A Heuristic Multi-stage Algorithm
for Segmenting Simply Connected Handwritten Numerals, in
Heuristics, The Journal of Knowledge Engineering &
Technology, Vol. 6, No. 4
(length: 10 pages) (single-spaced)
Winter 1993 by M.V. Nagendraprasad, Peter L. Sparks, & Amar
Gupta
| Abstract: | A multi-stage algorithm for segmenting strings of numerals is presented in this paper. This algorithm utilizes no a priori knowledge of the number of digits in the string. By employing multiple stages, each operating independently of others, higher accuracies are obtained, as compared to the conventional scenario of using one stage only. Further, since the stages are triggered on an as-needed basis, computational bandwidth requirements are kept within acceptable limits. Tests with experimental data from NIST hand-printed character database show that the algorithm provides high accuracies. |
PROFIT-95-27 Correction of Handwritten Numerals
for Automated Data Processing, in Engineering Applications of
Artificial Intelligence, Vol. 8, No. 4
(length: 4 pages) (single-spaced)
April 1995 by V. Feliberti, M.V. Nagendraprasad, & Amar Gupta
| Abstract: | A new slant-correction algorithm that searches for the minimum in the width space of a numeral, based on a binary search on the angular slant values, is presented here. It is based on the idea that if a numeral is transformed through a series of slanted positions, it usually attains its minimum width when it is least slanted. |
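The search can be sketched in Python as a binary search over slant angles that compares the sheared width at a point with the width just to its right. The sketch rests on assumptions not stated in the summary: foreground pixels are given as (x, y) pairs with y growing upward, the width is measured as the horizontal extent after the shear x' = x - y * tan(angle), and the search range is -45 to +45 degrees. For that width measure the width is convex in tan(angle), hence unimodal in the angle, which is what makes the binary search valid; the paper's exact width measure and bounds may differ.

```python
import math


def sheared_width(pixels, angle_deg):
    """Horizontal extent of the foreground pixels after undoing a slant of angle_deg.

    Each pixel (x, y) is mapped to x - y * tan(angle); y grows upward, so a
    positive angle corresponds to a right-leaning numeral.
    """
    shear = math.tan(math.radians(angle_deg))
    xs = [x - shear * y for x, y in pixels]
    return max(xs) - min(xs)


def estimate_slant(pixels, lo=-45.0, hi=45.0, tol=0.01, eps=1e-3):
    """Binary search over slant angles for the one giving minimum width.

    If the width still decreases just to the right of `mid`, the minimum lies to
    the right of `mid`; otherwise it lies no further right than mid + eps.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sheared_width(pixels, mid + eps) < sheared_width(pixels, mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Synthetic two-pixel-wide stroke leaning 20 degrees to the right.
lean = math.tan(math.radians(20))
stroke = [(5 + y * lean + dx, y) for y in range(30) for dx in (0, 1)]
print(round(estimate_slant(stroke), 1))
```

Each iteration halves the angular interval, so the 90-degree range shrinks below the 0.01-degree tolerance in about 14 iterations, each costing two width evaluations; the numeral can then be deslanted by applying the same shear with the estimated angle.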
PROFIT-95-28 The Context Interchange Network
Prototype
(length: 26 pages) (single-spaced)
February 95 by Adil Daruwala, Cheng Goh, Scott Hofmeister, Karim
Hussein, Stuart Madnick, & Michael D.
| Abstract: | In this paper we describe a prototype implementation of the Context Interchange Network (CIN). The CIN is designed to provide for the intelligent integration of contextually (i.e., semantically) heterogeneous data. The system uses explicit context knowledge and a context mediator to automatically detect conflicts and resolve them through context conversion. The network also allows for context explication, making it possible for a receiver of data to understand the meaning of the information represented by the source data. A financial application is used to illustrate the functionality of the prototype. |
PROFIT-95-29 Context Interchange: Research in
Using Knowledge About Data to Integrate Disparate Sources
(length: 18 pages) (double-spaced)
March 95 by Amar Gupta & Stuart E. Madnick
| Abstract: | The Context Interchange (CI) project, a component of the "PROFIT" initiative at MIT, deals with transforming information across functional and organizational boundaries to suit the individual needs of an increasingly diverse set of users. By providing effective "on-off ramps" to the emerging information highways, the project aims to drastically enhance the ability to make effective use of large volumes of information obtained from disparate sources (each with its own set of underlying meanings and assumptions). It does so by automatically transforming the incoming streams of data to the meaning (or context) needed for a particular job or function. This paper introduces the approaches being pursued in the Context Interchange project and summarizes its key accomplishments to date. |
PROFIT-95-30 Formulating Global Integrity
Constraints During Derivation of Global Schema
(length: 24 pages) (single-spaced)
March 95 by M.P. Reddy, B.E. Prasad, & Amar Gupta
| Abstract: | In a heterogeneous distributed database environment, each component database is characterized by its own logical schema and its own set of integrity constraints. The task of generating a global schema from the constituent local schemata has been addressed by many researchers. This paper examines the complementary problem of using the multiple sets of local integrity constraints to create a new set of global integrity constraints. These global integrity constraints facilitate both query optimization and update validation. |
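The following is a hypothetical illustration of how global constraints can serve both tasks. Suppose each component database enforces a range constraint on the same attribute, and the global relation is the union of the local relations. The class and function names, and the union assumption, are ours. The example does not reproduce the paper's derivation method.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class RangeConstraint:
    """Local integrity constraint: low <= attribute <= high in one component database."""
    source: str
    low: float
    high: float

def satisfies_global(value: float, constraints: List[RangeConstraint]) -> bool:
    # Update validation: if the global relation is the union of the local relations,
    # a global tuple must satisfy at least one local constraint (a disjunction).
    return any(c.low <= value <= c.high for c in constraints)

def relevant_sources(min_value: float, constraints: List[RangeConstraint]) -> List[str]:
    # Query optimization: for a predicate attribute >= min_value, a source whose
    # constraint rules out such values cannot contribute answers and can be skipped.
    return [c.source for c in constraints if c.high >= min_value]
```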
PROFIT-97-31 Temporal Data Mining
(length: 43 pages) (single-spaced)
March 97 by J. Shanmugasundaram, M.V. Nagendra Prasad, & Amar
Gupta
| Abstract: | Finding patterns in historical data is an important problem in many domains. In this paper, we concentrate on the problem of estimating the future sales of products from past sales data. We use recurrent neural networks to predict future sales because of (a) their power to generalize trends and (b) their ability to store relevant information about past sales. We first describe the implementation of a distributed recurrent neural network using the real-time recurrent learning algorithm. We then validate this implementation with tests on well-known examples from the literature. Finally, we describe and analyze the limitations of the approach, based on predictions of noisy mathematical functions. |
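Below is a minimal sketch of real-time recurrent learning (RTRL). It assumes a single tanh hidden unit with a linear output and a one-step-ahead prediction task. The paper's network is distributed and larger, and the initial weights, learning rate, and epoch count here are arbitrary. RTRL carries the derivatives of the hidden state with respect to each recurrent-layer weight forward in time. Weights can therefore be updated online at every step without unrolling the sequence.

```python
import math

def train_rtrl(series, epochs=300, lr=0.05):
    """Train a one-unit recurrent net online with RTRL to predict series[t+1] from series[t].

    Hidden: h_t = tanh(wx*x_t + wh*h_{t-1} + b); output: y_t = v*h_t + c.
    Returns the mean squared prediction error of each epoch."""
    wx, wh, b, v, c = 0.1, 0.1, 0.0, 0.1, 0.0  # arbitrary small initial weights
    history = []
    for _ in range(epochs):
        h_prev = 0.0
        s_wx = s_wh = s_b = 0.0  # sensitivities dh/d(wx, wh, b), reset with the state
        sq_err = 0.0
        for t in range(len(series) - 1):
            x, target = series[t], series[t + 1]
            h = math.tanh(wx * x + wh * h_prev + b)
            d = 1.0 - h * h  # derivative of tanh
            # Forward-propagate sensitivities: dh_t/dθ = tanh' * (∂pre/∂θ + wh * dh_{t-1}/dθ)
            s_wx = d * (x + wh * s_wx)
            s_wh = d * (h_prev + wh * s_wh)
            s_b = d * (1.0 + wh * s_b)
            e = (v * h + c) - target
            sq_err += e * e
            # Online gradient step on 0.5 * e^2 using the current sensitivities
            g = e * v
            wx -= lr * g * s_wx
            wh -= lr * g * s_wh
            b -= lr * g * s_b
            v -= lr * e * h
            c -= lr * e
            h_prev = h
        history.append(sq_err / (len(series) - 1))
    return history
```

On a sampled sine wave such as `[0.5 * math.sin(0.3 * t) for t in range(60)]`, the last epoch's error should end up well below the first epoch's.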
PROFIT-97-32 Technologies for Connecting and
Using Databases and Server Applications on the World Wide Web
(length: 56 pages) (double-spaced)
May 97 by Adolfo G. Castellon Jr.
| Abstract: | This paper presents a study of current technologies used to build applications that make use of the World Wide Web. In particular, it discusses three technologies of very different heritage (JavaBeans, OLE/ActiveX, and CORBA) that are evolving toward a common goal. The emphasis is on technologies recently developed to connect databases to Web applications. Two applications created by the author are used to demonstrate specific types of emerging Web technologies. |
PROFIT-97-33 An Architecture for Secure
Transactions in the Processing of Bank Checks
(length: 97 pages) (double-spaced)
May 97 by Joseph Figueroa
| Abstract: | iCheck is a system architecture that enables the banking system to increase check processing speed and decrease check processing cost. The iCheck architecture provides a secure, reliable, and widely available infrastructure for accessing the existing bank payment system over the Internet. It achieves this by designing, developing, and integrating the components needed to process checks securely over open public networks while decreasing processing time and reducing the need for human intervention. These components are implemented in two main modules. Together they demonstrate interoperability and an infrastructure that can benefit banks by giving them a consistent, secure, and trusted way of offering faster and cheaper services. This infrastructure should also enable innovative new systems that take advantage of new developments in telecommunications technology and enhancements to the existing banking payment system. |
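The abstract does not specify iCheck's security mechanisms. As one generic illustration of tamper detection for a check record sent over a public network, the sketch below signs the record with an HMAC from Python's standard library. The shared secret key and the field names are assumptions; this is not the iCheck protocol.

```python
import hashlib
import hmac
import json

def sign_check(check: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the check record."""
    payload = json.dumps(check, sort_keys=True).encode("utf-8")  # stable field order
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_check(check: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time; any altered field fails."""
    return hmac.compare_digest(sign_check(check, key), tag)
```

An HMAC gives integrity and authenticity only between parties that share the key. Confidentiality in transit and key distribution among banks would need separate mechanisms.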
PROFIT-97-34 Telecommunications in Mexico
(length: 100 pages) (double-spaced)
May 97 by Adrian E. Gonzalez
| Abstract: | The
telecommunications industry in Mexico has recently
experienced a dramatic change in its regulatory and
market structure. After many years of government control,
telecommunications services have deteriorated to the
point where they are unable to meet the population's
needs. However, during the last eight years, the Mexican
government has reformed this and many other industries by
allowing private ownership and competition. In line with
these reforms, the government has outlined a series of
goals related to telephone penetration, modernization of
services and cost reductions. This paper analyzes the telecommunications industry in Mexico during three time periods: the past (1900-1989), the deregulation period (1990-1997), and the future (1998-2005). For each period, the prevailing political and economic environment is also analyzed to provide an accurate context for the study of the telecommunications industry. The analysis also covers major events that have occurred or are expected to occur within each time period, and their influence on the industry. Emphasis is placed on the second period, with an in-depth analysis of the roles of the government, companies, and customers in the deregulation process. Special attention is given to competitors' marketing strategies and structures. The last section analyzes the probability of achieving the goals set by the government. The analysis indicates that at least one of these goals is unlikely to be realized: doubling the number of lines per 100 inhabitants. Reaching this goal will require extra support in the form of government subsidies or other assistance. The paper closes with a brief description of the similarities and differences between the deregulation of the telecommunications industry in Chile and in Mexico. Based on this comparison of the two Latin American countries, the author recommends that Brazil adopt Mexico's model of deregulation in the near future. |
PROFIT-97-35 Neural Networks Based Data Mining
Applications for Medical Inventory Problems
(length: 15 pages) (single-spaced)
May 97 by Kanti Bansal, Sanjeev Vadhavkar, & Amar Gupta
| Abstract: | One of the main requirements for agile organizations is the development of information systems that provide effective linkages with their suppliers, customers, and other channel partners involved in transportation, distribution, warehousing, and maintenance. Agility increasingly depends on the quality of decision making, and companies are continuously trying to improve the quality of their decisions by learning from past transactions and decisions. An efficient inventory management system based on contemporary information systems is a first step in this direction. This paper discusses the use of neural networks to optimize the inventory of a large medical distribution organization. The paper defines the inventory patterns and describes how to construct and choose an appropriate neural network to solve the problem. As an extension to the neural network models, the statistical procedures and assumptions used to augment them are explained in detail. Given the large number of neural network classes, it is difficult to identify the particular class and model that yields the best inventory model. The paper therefore describes an elaborate scheme, based on traditional statistical techniques, for determining which neural network type performs best. The paper concludes with a detailed evaluation of the "neural network solution". Using the method proposed in this paper, the total inventory level of the medical distribution organization could be reduced from over a billion dollars to about half a billion dollars, a reduction of about 50 percent. |
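The abstract does not detail the statistical evaluation scheme. A minimal stand-in, assuming each candidate network's forecasts on held-out data are available, ranks the candidates by mean absolute error. The metric and the model names are illustrative assumptions, not the paper's procedure.

```python
from typing import Dict, List, Tuple

def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def select_model(actual: List[float],
                 forecasts: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the same held-out data and return the lowest-error one."""
    scores = {name: mean_absolute_error(actual, preds) for name, preds in forecasts.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

A fuller scheme would also test whether the differences between candidates are statistically significant rather than relying on a single error figure.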