The DEVLAB -- Dartmouth Experimental Visualization Laboratory


EBITS: Electronic Business & Information Technology for Society Research Consortium

Development of
an Educational and Research Infrastructure
For Safe Electronic Commerce


Principal and Co-Principal Investigators:

Nabil Adam (PI)
Rutgers University, Director of the CIMIC Center
adam@adam.rutgers.edu

Vijay Atluri
Rutgers University, Asst. Professor, MSIS and CIMIC
atluri@andromeda.rutgers.edu

Johnathan Bick
Rutgers University, School of Law
JBICK@fslaw.com

Matthew Bishop
University of California, Davis, Assoc. Professor, Computer Science
bishop@cs.ucdavis.edu

Marcus Felson
Rutgers University, School of Criminal Justice
felson@andromeda.rutgers.edu

Peter Gloor
PriceWaterhouseCoopers, Senior Partner, Finance and IT, Zurich
gloor@acm.org

Marc Holzer
Rutgers University, Professor, Public Administration
mholzer@pipeline.com

Fillia Makedon
Dartmouth College, Professor, Director of Dartmouth Experimental Visualization Lab
makedon@cs.dartmouth.edu

Charles Owen
Michigan State University, Asst. Professor, Director of the Media and Entertainment Technologies Lab
cbowen@cse.msu.edu

Stephen Powell
Dartmouth College, Assoc. Professor, Tuck Business School
Stephen.Powell@Dartmouth.EDU

James Storer
Brandeis University, Professor and Chair
storer@cs.brandeis.edu

Peter Scheuermann
Northwestern University, Professor, ECE
peters@ece.nwu.edu


Table of Contents

1. Introduction
   1.1 General Social Science Theme
   1.2 Basic Model
2. Objectives And Expected Outcomes
3. Examples Of Discovering Patterns Of E-Commerce Misuse
   3.1 Fraud In Electronic Commerce (Holzer)
   3.2 Improper Transactions In Supply Chains (Powell)
   3.3 Criminal Patterns And Electronic Commerce (Felson)
   3.4 Web Site Tenant Rights Violations (Bick)
4. Proposed Work: An Infrastructure Of Data And Enabling Technologies
   4.1 Ebits Data Repository (Adam)
   4.2 Knowledge Discovery In Electronic Commerce Systems (Scheuermann)
   4.3 Security: Access Control (Atluri)
   4.4 Security: Intrusion Detection (Bishop)
   4.5 Data Compression Technologies To Predict Patterns Of Misuse (Storer)
   4.6 Data Access Techniques And Visualization (Owen and Makedon)
   4.7 Intuitive User Interfaces (Makedon and Owen)
5. Management Plan
   5.1 Execution Plan
   5.2 Implementation Details: Data Processing & Dissemination
   5.3 Evaluation
6. References
   6.1 References (E-Commerce)
   6.2 References (Criminal Patterns in Urban Development)
   6.3 References (Web Site Tenant Rights Violations)
   6.4 References (Improper Transactions in Supply Chains)
   6.5 References (Knowledge Discovery in Electronic Commerce Systems)
   6.6 References (Data Security Technology to Guard Against Attacks)
   6.7 References (Data Compression)
   6.8 References (Data Access and Visualization, User Interfaces)
   6.9 Other Related References
A. Appendix: Implementation Details & Dissemination
   A.1 Data Processing
   A.2 Dissemination Public Service By-Products
   A.3 EBITS Self-Start Program
   A.4 EBITS Minority Education Programs
B. Appendix: EBITS System Diagrams
C. Appendix: Intuitive User Interfaces


Abstract

Electronic commerce (EC) is a major social force expected to bring fundamental changes to human transactions at all levels of daily life. However, the very nature of e-commerce also raises new social concerns about safety, fraud and misuse. These concerns must be addressed, as they can have an adverse impact on our lives. E-commerce misuses are already evident in uncontrolled commercial dissemination, insider abuse within companies, loss of privacy, exclusionary practices and other problems. To date, there is no comprehensive infrastructure for collecting EC misuse data. The goal of this proposal is to construct a data resource documenting EC misuses of all types. This resource will enable social and behavioral scientists to extract patterns, arrive at conclusions and disseminate their data on the Web. The proposed infrastructure for EC safety has four components: 1) Collected Data and Data-Enabling Software (i.e., tools to better view the data), 2) a web-searchable Database of "unsafe" Cases, classified into a taxonomy of crime and other parameters and linked to the first component, 3) an Educational Infrastructure component (containing tutorials, examples, and guided tours through cases), and 4) a Public Service Component of programs (e.g., minority education workshops). The third and fourth components provide dissemination and built-in evaluation, which will assess the performance of the proposed infrastructure. The technical experts of the assembled team will work closely with the social scientists and will incorporate mature technologies in data mining, security, compression and multimedia to represent, process, visualize, store, browse and retrieve this data. The proposed infrastructure is unique, comprehensive, easy to use and highly relevant to social science research, and it addresses an issue of national importance.


1. Introduction

The economy of the 21st century is becoming an Information Economy based on billions of electronic transactions. "Electronic commerce" (or EC for short) describes this development. As Adam et al.* observe, "Ensuring security for EC is a fundamental prerequisite before any commercial activities involving sensitive information can take place." As EC becomes a new social force, affecting every aspect of daily life and even persons who do not use the Internet, it is vital and timely that we establish standards and criteria for what constitutes an ethical, democratic, fair and especially safe EC transaction, going beyond the current simplistic "secure document" modes of ordering over the Web. This study proposes to design and create an innovative, large-scale and comprehensive infrastructure composed of EC cases of fraud, conflict, misrepresentation, system attack, crime, and other potential dangers to safe EC transactions. This infrastructure will be designed specifically for, and in collaboration with, social and behavioral scientists to facilitate research in their areas over the next decade. It will become a foundation of facts, software, cases and educational programs for examining existing and emerging social problems arising from the rapid growth of EC. With such a foundation, social scientists will be able to address a wide range of issues from multiple perspectives, based on easily accessible real and simulated data, and thus prepare for expected social changes. The proposed infrastructure will be called EBITS, which stands for Electronic Business and Information Technology for Society. (See Appendix B for the EBITS system architecture diagram.)

The proposed infrastructure for EC safety has four components (see diagram in Appendix B): 1) Collected Data and Data-Enabling Software (i.e., tools to better view the data), 2) a web-searchable Database of "unsafe" Cases, classified into a taxonomy of crime and other parameters and linked to the first component, 3) an Educational Infrastructure component (containing tutorials, examples, and guided tours through cases), and 4) a Public Service Component of programs (e.g., minority education workshops). EBITS collected data will include primary, secondary, and tertiary data related to EC abuse. Primary data will consist of log data from various types of transactions, either in original format or "processed" (cleared, catalogued, compressed, filtered, etc.). Secondary data will consist of meta-information about transactions, such as dates (while protecting the privacy of individuals), history and background information, statistics, maps, visualizations, software programs, experiments and the like. The infrastructure will also include tertiary data contributed by social science participants, which will be processed for clearance. The first component will also include a database of enabling software and tools (such as Web-based tools for searching and browsing for purposes of scientific study), searchable by attribute, thus forming an essential symbiosis of data and software. The second component, the Database of Cases, will consist of "derived" data that grows over time, since its entries are the results of EBITS studies that have been cleared and classified into a legal taxonomy. This component will also be searchable through simple, user-oriented queries and will be linked to the data and software component wherever a case needs to be demonstrated with real data.
The third and fourth components are essential dissemination and built-in evaluation components that continually assess the performance of the proposed infrastructure as a valuable, robust and useful aid for social and behavioral science research and education. The EBITS philosophy is that energetic dissemination and evaluation are essential building blocks for making EBITS useful to diverse social scientists, the public, policymakers, and students.
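The second component can be pictured as a collection of cleared case records, each carrying a taxonomy label, a set of searchable attributes, and links back to supporting records in the data component. The following Python sketch illustrates attribute-based search over such records. The `Case` class, its fields, the slash-separated taxonomy labels and the sample cases are all hypothetical illustrations, not the EBITS schema; a working system would presumably use a database rather than an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """A cleared EBITS case (hypothetical fields, for illustration only)."""
    case_id: str
    taxonomy: str  # slash-separated label, e.g. "fraud/public-sector"
    attributes: dict = field(default_factory=dict)
    data_links: list = field(default_factory=list)  # IDs of supporting records


def search_cases(cases, taxonomy_prefix="", **criteria):
    """Return cases whose taxonomy label starts with taxonomy_prefix and
    whose attributes equal every keyword criterion given."""
    return [
        c for c in cases
        if c.taxonomy.startswith(taxonomy_prefix)
        and all(c.attributes.get(k) == v for k, v in criteria.items())
    ]


cases = [
    Case("C1", "fraud/public-sector", {"year": 1999, "channel": "web"}, ["log-17"]),
    Case("C2", "fraud/supply-chain", {"year": 1999, "channel": "edi"}),
    Case("C3", "tenant-rights", {"year": 1998, "channel": "web"}),
]

# All fraud cases that involved Web ordering, with their linked data records.
for c in search_cases(cases, "fraud/", channel="web"):
    print(c.case_id, c.data_links)
```

Prefix matching on the taxonomy label lets a user query a whole branch of the taxonomy (all fraud cases) or a single leaf, while the `data_links` field carries the link from a case to the real data that demonstrates it.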

The proposed infrastructure is unique, as we know of no other of this type in existence. It is comprehensive, easy to use, and integrated with the Web to make it as democratic as possible. It is timely because the issue is a national safety issue affecting every American citizen, even those who do not use the Internet. It is relevant to social science research because abuse of EC is a social phenomenon, not merely a technical one, and it is already rampant and growing. Information is a valuable commodity, and its commercial manipulation is very much a public concern. This project serves the public interest because it proposes mechanisms to monitor and assess the safety of commercial information exchange, from the trading of stocks, to the sharing of X-ray images, to the trading of music CDs, to ordering components to build bridges. Such information (commodity) exchange affects the public, and the new EC practices for these exchanges affect it intimately, every day and at all levels. National information policies on EC abuse are themselves an area of social science research. This project will be carried out by a highly capable and unique interdisciplinary team of experts assembled for this purpose. The team of twelve experts is drawn from seven different institutions and covers criminology and criminal law, international law, computer law, business, finance, EC and computer science. EBITS will collect large-scale real and simulated data from multiple sources in the business, government and public sectors. It will enable users to interact with real-world data that is otherwise hard to access or understand, and it will offer opportunities to visualize intricate processes, make prognoses, and navigate through large databases of related information, all in a seamless way.

It is important to comment here on the capability of the team we have assembled and to explain the reasons for our choices. The proposed problem of collecting heterogeneous, large-scale data into an organized archive is very challenging. It requires not only content acquisition, clearance of the data, and a taxonomy of cases, but also the application of state-of-the-art technologies in data mining, security, compression, multimedia, networks and storage. To make this a large-scale, robust resource, we sought out the best possible people in these areas who were interested in participating in this project. At the same time, we had to demonstrate that the time is not only ripe for this work but that it is urgent to undertake it now. For this purpose, in Section 3, we have assembled a group of legal, business, financial and other social science experts who describe their need for this infrastructure. Beyond describing how their research will benefit, these experts bring unique EC expertise, commitment to the problem and the capability to provide the basic domain expertise it requires. Experts from outside the current team will also be involved in this effort, in areas such as ethics, women's studies, psychology, communication, social science history, and others we have already identified. We also have strong company support: PriceWaterhouseCoopers has committed to providing us with data, cases, and contacts, and so has the federal General Services Administration (GSA), which already works with the CIMIC Center at Rutgers. In addition, the investigators have close ties with IBM, Oracle, Lucent, and other high-technology companies that have expressed great interest in this work. The team is well equipped to manage such a project because infrastructure is already in place, as discussed in the methodology section.
Furthermore, all of the institutions involved have at their disposal high-bandwidth links to share the data (using Internet II, for example), high-end equipment to prepare the data and software, and world-class expertise in each of the areas involved. This is not a project that simply divides tasks among various partners, but an ongoing, synergistic and highly interactive effort in which the different disciplines will work together to form the EBITS archive. This is the first time that such a team has been assembled to consider this issue of national importance.

The remainder of this section outlines the general theme of the project. Part 2 gives the objectives and the expected outcomes or deliverables. Part 3 gives examples of EC-misuse research that EBITS can support, while at the same time introducing the domain experts who will form the founding team. Part 4 outlines the proposed work. It is important to remember here that the technical experts will work with the domain experts to apply the technologies, and that these are mature technologies: the work involves implementation, not research into new tools. It consists of customizing existing tools, processing data, and programming the connections between the different parts of the EBITS system. The reader is encouraged to consult the diagrams in Appendix B to understand how the different parts relate. Part 5 describes the methodology by which this work will be carried out. Much of this description is already embedded in Parts 1-4, which explain in detail who the experts are and how the technology will be applied; the methodology section is therefore shorter. Part 6 contains the references, divided by domain, each section corresponding to a section in Parts 3 and 4, as indicated. Appendix A gives more details about the dissemination and evaluation programs of EBITS.

1.1 General Social Science Theme

Since the number of transactions far exceeds the number of abuses, establishing statistically significant findings about the latter may require collecting vast amounts of data from diverse sources. This is hard for a single scientist or a small group of scientists to do. Obtaining clearance from different companies is time-consuming, and the effort should not be duplicated. Once obtained, the data may be hard to view with the naked eye, which calls for filtering, abstraction or visualization software. Large-scale log data, for example, even if old and cleared (that is, made anonymous through cross-tabulation and other means), is also hard to maintain or share. The data may also come in formats that are hard to analyze or compare because they are incompatible with each other or unrelated to existing formats. Automatic and systematic methods are needed to process this valuable resource so that it is highly usable and shareable by a large community of social and behavioral scientists. These are some of the problems facing the study of EC growth and its impact on society. The proposed resource promises to open the way for many new research opportunities in the social and behavioral sciences and paves the way toward a common set of EC standards and testbeds, that is, a prototype for studying EC. A common testbed will help diverse social scientists keep their work comparable, while spinning off new ideas and studies.
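One common way to make cross-tabulated data safer to release is small-cell suppression: records are grouped on quasi-identifying attributes (such as a partial ZIP code or an age band), and any group with fewer than k members is withheld, so that no released record belongs to a tiny, easily re-identified group. The sketch below shows this idea in Python under stated assumptions: the attribute names, the threshold k and the records are hypothetical, and suppression of this kind is only one step of clearance, not a guarantee of anonymity.

```python
from collections import Counter


def cross_tab(records, quasi_identifiers):
    """Count records for each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def suppress_small_cells(records, quasi_identifiers, k=5):
    """Keep only records whose quasi-identifier combination occurs
    at least k times; records in smaller cells are withheld."""
    counts = cross_tab(records, quasi_identifiers)
    return [
        r for r in records
        if counts[tuple(r[q] for q in quasi_identifiers)] >= k
    ]


# Hypothetical transaction records: three share a cell, one is unique.
records = [
    {"zip3": "085", "age_band": "30-39", "amount": a} for a in (120, 75, 310)
] + [{"zip3": "037", "age_band": "60-69", "amount": 900}]

released = suppress_small_cells(records, ["zip3", "age_band"], k=3)
print(len(released), "of", len(records), "records released")
```

The threshold k is a policy choice rather than a technical one; raising it protects individuals more strongly at the cost of withholding more data.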

1.2 Basic Model

Our basic model of EC abuse is depicted in Figure A. In the first stage, various individuals discover the potential for abusing new technologies for their own purposes and proceed to do so. As these abuses are discovered, organizations and individuals begin to assign responsibility for these abuses and for their prevention. As this gets sorted out, government and private sector organizations begin to take action to thwart the abuses. For example, bank personnel learn how to embezzle from individual accounts; banks decide whether to take responsibility and how; governments and private sector organizations begin either to mandate or to take action.

Figure A. Depiction of a Basic Model of E-Commerce Abuse and its Dynamics

This basic model is reflected in our four substantive pieces of social science research, described in Section 3. Holzer's work will address fraud in EC, Powell's work improper transactions in business supply chains, Felson's work criminal patterns, and Bick's work violations of Web site tenant rights. Each line of research overlaps with the others, and all are examples of EC safety research. These social science examples demonstrate the need for an archival infrastructure such as the one proposed. These experts are recognized scholars who have enthusiastically agreed to work with technical experts in building such an infrastructure.


2. Objectives And Expected Outcomes

The main objective is to create an infrastructure of common format data that is integrated with data-enabling software and methods in a World Wide Web accessible infrastructure called EBITS. EBITS (E-Business and Information Technology for Society) is designed to support the work of social and behavioral scientists because it provides:

  1. Common Format Real-World Data: This will be a valuable national resource containing real-world data which, otherwise, would be either inaccessible, or too cumbersome to analyze without appropriate processing. Interoperability amongst heterogeneous and vast data repositories will be ensured. Cost-effective mechanisms of data clearance and dissemination for shared use will be adopted to avoid effort duplication.

  2. Data-Enabling Software: Along with the data, EBITS will contain built-in technologies (see system diagram in Appendix B). This software will be designed to evaluate transactions (e.g., for safety, violations, conflicts, negligence, etc.). Such embedded technologies will enable large-scale data analysis, efficient multimedia data storage, data classification, and other functions. The focus will not be to develop new tools or invent new methodologies, but to use existing and mature techniques to make complex or cumbersome data intelligible to a very large group of people.

    1. Simulation Tools: This will be a set of built-in interactive simulation tools, powerful visualizations and modeling of the data which offer the user hands-on exposure, decision making support, testing and play-acting capabilities.

    2. Publishing: Social scientists will be able to make their results known in a variety of formats using the EBITS publication facility, an electronic journal over the Web. It will include facilities for multimedia document authoring, searching, browsing and maintenance over the Web.

    3. Navigation: Given the size of the infrastructure it will be important to have visual mechanisms of navigation in the EBITS information space as well as facilities for traversing conceptual maps that represent data relations.

  3. Retraining And Education: EC involves totally new thinking and capabilities at all levels. Interactive tools for retraining on new models of doing business will be used and connections made to the archived data. Integrated with the process of data collection, these various software tools will provide seamless links to background information on issues, laws, standards, statistics, and other information. Robust and fast multimedia user interfaces will be used.

    1. Tutorials And Immersive Paradigms: These will include virtual reality paradigms of exploring 3D spaces of data so that the user may gain a deeper understanding of the issues related to use and abuse of EC applications.

  4. An Interdisciplinary Team Of Law, Business And Technology Experts: An interdisciplinary team of experts will supervise the basic processes of content acquisition and content processing, which include:

    1. collection of data;
    2. clearance of the data;
    3. processing the data for analysis;
    4. creation of new software or application of existing software for better representation, visualization, storage and classification;
    5. development of user interfaces;
    6. concept maps for different transaction cases that guide navigation and search.
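The navigation facilities and concept maps described above can be modeled as a graph whose nodes are concepts and whose edges are data relations. The following sketch, assuming a simple adjacency-list representation and hypothetical concept names, uses breadth-first search to list every concept within a given number of hops of a starting point, which is one way a user could be guided from one case to related ones.

```python
from collections import deque


def related_concepts(concept_map, start, max_depth=2):
    """Breadth-first traversal of a concept map given as an adjacency dict.
    Returns {concept: hop distance} for concepts within max_depth hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in concept_map.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical concept map linking supply-chain and fraud concepts.
concept_map = {
    "supply chain": ["ERP system", "supplier order"],
    "ERP system": ["financial data", "inventory"],
    "supplier order": ["collusive bidding"],
    "collusive bidding": ["public-sector fraud"],
}
print(related_concepts(concept_map, "supply chain"))
```

Breadth-first order returns the closest concepts first, so a navigation interface could present direct relations before more distant ones.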

One outcome of this project will be an established mechanism for social scientists and technologists to come together in solving a very real and large social problem, EC safety. EBITS will maintain a list of active virtual community users and content providers. A second outcome will be bridging the often distinct worlds of business and finance with the academic communities of legal and computer experts. Only in this way will it be possible to solve such a complex problem. In this sense, EBITS will constitute an EC safety forum. Moreover, EBITS will not be a static repository but a growing and dynamically evolving resource that is updated with current events. Links to CNN content and to Wall Street will be made. Retrieval tools developed at Dartmouth will mine additional information from such news sources as the need arises. A third outcome will be significant new educational opportunities on EC issues for social and behavioral scientists (see Appendix A for details of education program deliverables).


3. Examples Of Discovering Patterns Of E-Commerce Misuse

In this section we include examples of social science research on EC misuse that would be conducted by legal, business and public administration experts. Developing an infrastructure that discovers EC misuse is an essential preventive measure for the healthy growth of EC.

3.1 Fraud In Electronic Commerce (Holzer)

Fraud in the emerging EC of the public sector is becoming a serious concern. According to the International Encyclopedia of Public Policy and Administration (Lange, pp. 939-941), in the EC context fraud may be defined as actions or omissions that cheat, deceive, distort, or intentionally and willfully swindle or dupe citizens, clients and customers of government-provided or government-endorsed services and goods. Fraudulent EC is expensive and time-consuming to detect, investigate, and prosecute. It is sometimes characterized by sophisticated, hard-to-detect schemes that use accounting, auditing, and legal techniques to circumvent normal monitoring and regulatory checks (and the latter are virtually nonexistent in the EC context). Investigation of EC fraud is often particularly labor-intensive. The comprehensive system proposed here can assist that effort.

Public sector services and goods are delivered primarily through direct provision by an agency (i.e., the traditional, bureaucratic mode). Fraud committed by administrative personnel in EC programs may include misrepresentation of recipient eligibility; overpayments or underpayments to recipients, third parties, or auxiliary providers; and withholding of services and benefits from beneficiaries, including those in the health care, taxation, and licensing systems.

We lack adequate assessment of emerging EC public fraud and adequate means for preventing it. We need an infrastructure such as EBITS to organize a synergistic and interdisciplinary effort in attacking EC fraud. EC fraud may be especially serious when goods and services are provided indirectly, such as via privatization and contracting out. Examples of crimes perpetrated by auxiliary providers are collusive bidding, inferior delivery of services, and malicious destruction of records. In addition, organized criminal groups may treat EC as a "target of opportunity." Our tentative hypothesis is that EC in this context is more vulnerable to fraud than in traditional, bureaucratic contexts (as above). A corollary is that fraud is almost as extensive where service providers (not-for-profits and for-profits) are engaged as contractors and EC is used to facilitate services to others.

There are also data-related issues (which will also be addressed in constructing EBITS itself). In the case of public domain data, the following questions arise: Should the data be available? In what format? Is such data too "protected" from legitimate research use? In the case of private domain data, the issue is the conditions of availability imposed by government contracts. This suggests a need to develop model provisions for government contracts that use EC. Case studies, comparative analyses, fraud analyses, and customer service studies are all needed. We list only some examples of EC abuses to make our point; the abuses go well beyond these examples. Many abuses are built into hidden charges. Fear of abuse affects client and customer behavior. Failed delivery of goods and services and poor training of staff also result from governmental and private sector inability to monitor EC abuses. E-commerce abuses bring new social science issues to life and call for a new breed of social scientists to evaluate the consequences. The wide range of abuses creates rapidly multiplying research needs. Obtaining datasets from their sources and protecting the privacy of the people they describe is a vast and difficult task for any group of social scientists, but the work has to start.

We propose to conduct a survey of best and worst practices in EC fraud prevention in public sector and privatized services. We will incorporate these findings into the EBITS database of cases and provide primary data as well. The results of our intensive interviews with a stratified random sample of public agencies in the United States will also become part of the EBITS public service component (see Appendix B). We will explore the extent to which systems are in place to detect and deter fraudulent activities by public, not-for-profit and private delivery mechanisms. Questions include: whether EC fraud is perceived as a real problem, whether software and procedures are in place to detect such fraud, the presence of complaint mechanisms, hidden charges that conceal EC abuse, loss of EC use due to fear of fraud, remedies for abuses, staff training or its absence, and governmental capacity to prevent these problems. For this work, we plan to rely on the EBITS technical team for data storage and classification.
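A stratified random sample divides the population into non-overlapping groups (strata) and draws a random sample within each, so every group is represented. The Python sketch below assumes, purely for illustration, that agencies are stratified by government level and delivery mode and that an equal number is drawn per stratum; the agency list is invented, and the actual stratification variables and allocation (equal, or proportional to stratum size) are design decisions the survey team would make.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum_of(unit), then draw up to n_per_stratum
    units at random, without replacement, from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    return {
        s: rng.sample(members, min(n_per_stratum, len(members)))
        for s, members in strata.items()
    }


# Hypothetical sampling frame of 60 agencies.
levels = ["federal", "state", "local"]
modes = ["direct", "contracted"]
agencies = [
    {"name": f"agency-{i}", "level": levels[i % 3], "delivery": modes[i % 2]}
    for i in range(60)
]

sample = stratified_sample(
    agencies, lambda a: (a["level"], a["delivery"]), n_per_stratum=4, seed=1
)
for stratum, chosen in sample.items():
    print(stratum, [a["name"] for a in chosen])
```

Fixing the `seed` makes the draw reproducible, which helps document exactly which agencies were selected for interviews.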

3.2 Improper Transactions In Supply Chains (Powell)

A supply chain is a collection of businesses that together meet the needs of a consumer for a product or service, such as wheat to bakery to store to consumer. Supply chains are vital to business and have existed since the beginnings of commerce. EC changes their nature. They are no longer as short or close as they were. Supply chains have become highly complex with the growing ease of transporting data and physical goods. There are numerous examples of firms supplying locally-differentiated products to all points of the globe and manufacturing those products from components made on three or four continents, often in quick-shifting markets. The danger of misrepresentation and fraud is very real. Collecting data and software to analyze supply chains is the first step to prevention and establishment of standards.

One of the key enabling technologies for the modern supply chain is Enterprise Resource Planning (ERP) software. ERP software is integrated, firm-wide software that provides all aspects of basic business information management, including sales and materials planning, production planning, warehouse management, financial accounting, and human resources management. SAP is the leading vendor in this market, along with PeopleSoft, Baan, Oracle, and J.D. Edwards. Not only do ERP systems provide all the basic management functions a firm needs, they also form a repository for the most basic information on products and customers. Just as existing ERP systems help to coordinate all the diverse aspects of a single business, future ERP software is expected to help in managing the entire supply chain. This new family of software envisions the sharing of information across complex business networks as readily as firms can now share internal information among divisions that use compatible ERP systems.

Unfortunately, these excellent business tools also provide niches for offenders to defraud companies. On the other hand, it is increasingly possible to develop new software to trace goods and services and prevent such frauds. Whereas ERP systems have in the past been focused inward, they must soon be modified to accept the data provided by EC and to support a firm's efforts to exploit EC. Understanding supply chain management is an essential prerequisite to also understanding mismanagement, fraud, conflicts, customer or vendor violations, and other misuses. Tracking transaction data for various parameters sensitive to patterns of misuse is feasible with the infrastructure proposed here. Developing standards for supply chain and EC safety will be one of the goals.
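As a minimal illustration of tracking one parameter sensitive to misuse, the sketch below flags transaction amounts that lie far from the typical value, using the modified z-score based on the median and the median absolute deviation (MAD), which is less distorted by the very outliers it is trying to find than a mean-based z-score. The choice of parameter, the 3.5 cutoff and the sample amounts are assumptions for illustration; a flag marks a transaction for review and is not in itself evidence of fraud.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score,
    0.6745 * (x - median) / MAD, exceeds threshold in absolute value."""
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread to measure against; this sketch flags nothing
    return [
        i for i, x in enumerate(amounts)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]


# Hypothetical order amounts from one supplier; the last is unusually large.
amounts = [100, 105, 98, 102, 97, 101, 5000]
print(flag_outliers(amounts))
```

In practice such a check would be run per supplier, customer or product, since what counts as "unusual" depends on the normal pattern of each.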

Software and Data: Establishing a Supply Chain Data Repository

EBITS offers a unique opportunity to create a research repository for both the software and the data that firms use to engage in supply chain operations. Neither is currently available to researchers without extraordinary effort. Moreover, no efforts have been made to make the data from different ERP systems and different firms compatible. This repository will exist within EBITS and be interconnected with other data and tools. It will allow researchers in many fields to study the supply chain as it evolves over the next decade as EC takes hold.

Thousands of firms use ERP systems and, through them, have built large databases of routine transaction data. These data are readily available within each firm and cover, among many other things: (a) customer orders; (b) orders placed with suppliers; (c) delivery dates to customers; (d) delivery dates for suppliers; (e) planned production levels; (f) inventories of finished goods, raw materials, and work-in-progress; (g) financial data.

EBITS will help overcome two obstacles to accessing and analyzing supply chain data: the lack of software with which to access the data, and the difficulty of understanding what the data mean. Only when these two problems are overcome can we hope to undertake research across many firms that focuses on the specifics of operations. We propose to develop an archive of supply chain transaction data for a broad variety of firms. Researchers will be able to access such basic data as production cycle times or the percentage of orders delivered on time. These data will be available for a selection of firms in a given industry, or for a selection of firms that offer Web ordering services. In this way the repository will support a range of research into the effects of EC. Creating this archive will require computer experts who can build the tools needed to make ERP software and data accessible over the Web. In addition, we will draw on other experts in this consortium to create tools for viewing and analyzing the data.
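To illustrate what common-format data makes possible, the sketch below defines a hypothetical order record (not an actual ERP or EBITS schema) and computes two of the measures mentioned above: the percentage of delivered orders that arrived on or before the promised date, and the mean order-to-delivery time in days, used here as a simple stand-in for a cycle time. Field names and sample orders are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass
class OrderRecord:
    """Hypothetical common-format customer order, independent of ERP vendor."""
    firm: str
    order_id: str
    ordered: date
    promised: date
    delivered: Optional[date] = None  # None while the order is still open


def _delivered(orders):
    return [o for o in orders if o.delivered is not None]


def percent_on_time(orders):
    """Percentage of delivered orders that arrived on or before the promised date."""
    done = _delivered(orders)
    if not done:
        return None
    return 100 * sum(o.delivered <= o.promised for o in done) / len(done)


def mean_cycle_days(orders):
    """Mean number of days from order to delivery, over delivered orders."""
    done = _delivered(orders)
    if not done:
        return None
    return mean((o.delivered - o.ordered).days for o in done)


orders = [
    OrderRecord("firm-A", "1", date(1999, 3, 1), date(1999, 3, 8), date(1999, 3, 7)),
    OrderRecord("firm-A", "2", date(1999, 3, 2), date(1999, 3, 9), date(1999, 3, 12)),
    OrderRecord("firm-A", "3", date(1999, 3, 3), date(1999, 3, 10), date(1999, 3, 10)),
    OrderRecord("firm-A", "4", date(1999, 3, 4), date(1999, 3, 11)),
]
print(f"{percent_on_time(orders):.1f}% on time, "
      f"{mean_cycle_days(orders):.2f} days mean cycle time")
```

Once records from different ERP systems are mapped into one such format, the same functions apply unchanged across firms, which is what makes cross-firm comparison possible.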

3.3 Criminal Patterns And Electronic Commerce (Felson)

This section emphasizes the need for an infrastructure that can combine seemingly unrelated data to solve a crime. Social science statistical data have long been merged by census tract, block, or other area units. In the past, that required only the ability to assign diverse data to the same areas. Today's social research over space, however, is much more complex. It involves new types of data, many of them electronic, with far more diversity. New theory links such data and calls for new precision, and new technologies make possible better mapping and socio-spatial analysis. As people and activities weave more intricate paths, especially due to EC, the social data infrastructure must develop to fit the actual community of everyday life.

Social problems and criminology offer an excellent example of the need for the proposed infrastructure. Crime is associated with diverse problems and data that are often not amenable to traditional analysis, and this data may include electronic ordering patterns. The trip to crime follows paths among work, school, shopping (electronic or otherwise) and entertainment [Brantingham and Brantingham, 1995]. Moreover, crime risks vary greatly even within a single block, or from one address to another close by. These patterns are increasingly studied empirically and made part of crime theory [Felson, 1998]. Very diverse types of data are now used to study how crime opportunity is generated. These range from the location and movement of merchandise suitable for stealing, to the positioning of ATMs, to the hilliness or other topographical features of the metropolis that make it harder for burglars to find their way to and from a given place. With new mapping technology (tools that can easily be embedded in the EBITS infrastructure), data with very different locality features can be linked, mapped, and correlated far more effectively than by correlating across census units. For example, data on housing deterioration and housing quality from land use datasets can be combined with data on traffic circulation to predict and explain the location of drug markets. Police efforts to thwart drug selling can then be evaluated more scientifically, with solid hypotheses about how housing and traffic patterns are linked. Geological forms, pollutants, and deterioration of vegetation can be linked to falling land values, out-migration, and crime takeover of areas not formerly high in crime. Re-population of areas can be studied in terms of improved conditions in the physical and social environment. In addition, theorists increasingly recognize that one type of crime often begets another. The spread of open-air drug markets gives rise to nearby burglary, whose spread in turn breaks down surrounding places and opens them to drug-market growth [Rengert, 1996]. The time has come for a modernized data infrastructure.
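To illustrate linking data by precise location rather than by census unit, the minimal sketch below counts crime incidents within a given radius of each ATM. All coordinates, names, and the radius are hypothetical, and plain Euclidean distance on planar coordinates is an assumption; production mapping tools would use projected coordinates and spatial indexes.

```python
import math


def count_within(points, centers, radius):
    """For each named center, count the points within `radius` (planar Euclidean distance)."""
    counts = {}
    for name, (cx, cy) in centers.items():
        counts[name] = sum(
            1 for (px, py) in points if math.hypot(px - cx, py - cy) <= radius
        )
    return counts


# Hypothetical incident locations and ATM positions, in kilometers on a local grid.
incidents = [(0.1, 0.2), (0.3, 0.1), (2.0, 2.1), (5.0, 5.0)]
atms = {"atm_north": (0.0, 0.0), "atm_center": (2.0, 2.0)}
print(count_within(incidents, atms, 0.5))
```

The same proximity test could attach any point-level data (merchandise locations, land use records) to the places where incidents occur.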

Criminology and E-Commerce: Developing a Science Across Disciplines

Crime is increasingly part of EC and its community. EC allows a crime to be perpetrated at one site yet culminate at another. It completely transforms supervision and the social control structures that prevent crime. This is why an entirely new way of thinking, with concomitant data and analysis, is needed. The landmark book by Grabosky and Smith (1998) set the stage for building much stronger links between EC specialists and social scientists, especially criminologists. Their topics included illegal interception, electronic vandalism and terrorism, theft of telecommunications services, electronic piracy, offensive content, telemarketing fraud, funds transfer crime, money laundering, and the use of EC to enhance criminal conspiracies. Each of these is, of course, a topic for investigation in order to apprehend specific perpetrators. But each is also a topic for the scientific study of crime and for both conceptual and data enhancement.

At the same time, much EC crime depends on low-technology crimes, such as walking into someone's office uninvited and pulling a password out of the desk, or stealing keys for unauthorized access. Indeed, EC involves a remarkable merger of old-fashioned and up-to-date misbehavior. Basic criminology already provides expertise in thinking and learning about the modus operandi of each offense [Felson, 1998]. Part of this effort involves getting beyond legal categories alone and looking at exactly how offenders commit offenses, for what purposes, and with what calculations. Criminologists often know the right questions to ask to elicit this information and to organize it, but they are usually not informed enough about the technical aspects of EC to do this job alone. The marriage of engineering, business, criminology, and related fields thus offers major opportunities for studying the abuse of electronic systems as a scientific field. The principles likely to be developed can then interact strongly with practice, enhancing science at the same time that crime prevention improves.

Traditionally, criminologists gather crime data from police reports, surveys of victims, surveys of offenders, business data, medical reports, and systematic observation. In the case of EC, many of these sources will be applicable but not necessarily the best. Indeed, electronic files themselves provide many signatures of misuse. Such "signatures" are already used by forensic accountants and fraud examiners to detect specific offenders. The same data can be systematically studied to provide scientific indicators of more general problems and their correlates, offering a revolution in the study of a whole class of crimes. For example, repeated password attempts to break into a system leave an electronic signature that can itself be registered, counted, and examined. These counts provide indicators that would never reach police reports or self-report surveys. Although most fraud examiners are not electronically sophisticated, those who are have already devised techniques that can be converted to scientific use [Felson, 1998]. EC contributes additional data to this analysis, but it also raises new issues of how to protect the privacy of individuals and the proprietary concerns of companies. We plan to work with the EBITS team to build such safeguards and to contribute results to the new science of EC abuse, which must cross disciplines.
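The password example can be made concrete with a minimal sketch that registers and counts failed login attempts per source address and flags sources above a threshold. The log line format (`time EVENT source details`), the event names, and the threshold are hypothetical; real systems log authentication failures in many different formats.

```python
from collections import Counter


def failed_login_counts(log_lines):
    """Count LOGIN_FAIL events per source address in a hypothetical space-separated log."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "LOGIN_FAIL":
            counts[fields[2]] += 1
    return counts


def flag_sources(counts, threshold):
    """Sources whose failed-attempt count reaches the (hypothetical) threshold."""
    return sorted(src for src, n in counts.items() if n >= threshold)


log = [
    "09:00:01 LOGIN_FAIL 10.0.0.5 user=alice",
    "09:00:02 LOGIN_FAIL 10.0.0.5 user=admin",
    "09:00:03 LOGIN_FAIL 10.0.0.5 user=root",
    "09:00:10 LOGIN_OK 10.0.0.7 user=bob",
    "09:01:00 LOGIN_FAIL 10.0.0.7 user=bob",
]
counts = failed_login_counts(log)
print(counts)
print(flag_sources(counts, 3))
```

Aggregated over many systems and time periods, such counts become the kind of indicator that never reaches police reports.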

3.4 Web Site Tenant Rights Violations (Bick)

This section describes how EBITS can support research on finding and classifying violations of Internet users' rights. Many Internet users are Web site tenants, including the over 10 million Americans with personal Web pages [Sweet et al]. So are Internet domain name owners who have registered with InterNIC and implemented their Web sites through a Web hosting provider [Everitt]. Web site tenant rights are among the least understood rights: their value has not been quantified, and Web site hosting agreements, as currently written, appear to misrepresent them, which may result in significant economic loss. The existence and extent of such losses can only be determined by experimentation on large quantities of data, as afforded by the proposed EBITS infrastructure. By applying traditional sampling and statistical methods, EBITS can support large-scale discovery of Web site tenant rights violations. This will broaden and deepen our understanding of the costs and benefits of Web site hosting transactions. A scientific analysis and quantification of these costs and benefits will be of value to local, state, and federal government agencies and to lawmakers. It will also inform decision making and regulation of Web site hosting transactions in both the public and private sectors.
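As one example of the traditional sampling methods mentioned above, the sketch below estimates the proportion of sampled hosting agreements that exhibit a given violation, with a 95% normal-approximation (Wald) confidence interval. The counts (48 of 120) are purely hypothetical, and the Wald interval is an assumed choice; for small samples or proportions near 0 or 1, an interval such as Wilson's would be preferable.

```python
import math


def proportion_ci(violations, n, z=1.96):
    """Point estimate and Wald confidence interval for a sample proportion."""
    p = violations / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical sample: 48 of 120 randomly selected agreements omit a given right.
p, lo, hi = proportion_ci(violations=48, n=120)
print(f"estimate={p:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```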

When Americans and American businesses allow others to operate their Internet pages or sites, most of their rights are contract rights. The specific components of Web site hosting services agreements differ, but most agreements include common provisions and are offered on a "take it or leave it" basis, whereas the law of contracts assumes that the parties have reasonable expectations arising from a bargain-promise [Young]. The widespread use of "take it or leave it" agreements suggests that the parties' bargaining positions are substantially unequal [Weaver]. Such agreements can give rise to a claim that they are "adhesion" contracts, which may lead courts to deny enforcement either of the contract as a whole or of certain terms in it.

Certain Web tenant rights spring from the transaction itself. One method of identifying these rights is to review the user rights recognized by the courts during the last two decades of experience with similar computer transactions; for example, Web site hosting agreements, service bureau agreements, and outsourcing agreements share salient characteristics. We propose to test the hypothesis that Web site hosting service agreements have misled Web site tenants with respect to their rights, and then to provide the results we collect to EBITS for further study. A comparison of the private rights set forth in a typical Web site hosting service agreement with the common law rights arising from the Web hosting transaction clearly suggests that Web site tenant rights are not being properly presented. Testing the hypothesis will involve collecting real data from agreements, courts, private annotations, questionnaires, and Internet sampling, from which predictions can then be made. A random selection of such agreements will be collected from the Internet, and the private rights associated with them will be summarized in a common format, forming a testbed of data valuable to other EBITS researchers. This derived data will be compared with a list of common law and case law Web site tenant rights, to be created by reviewing existing statutory and case law. We propose to develop a taxonomy and database of such cases. We will also collect information for quantifying the uneconomic behavior that results from tenants' lack of knowledge of specific Web site tenant rights, and this information will be used to estimate the resulting economic loss. The outcome of this work will be of value to policymakers, Web providers, and the general public.
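The comparison step can be sketched as a set operation: each agreement's clauses are normalized to a common format, then compared with a reference list of rights drawn from statutory and case law. The right labels in `REFERENCE_RIGHTS` and the sample clauses are hypothetical placeholders, not findings; the real taxonomy would come out of the legal review described above.

```python
# Hypothetical reference list of tenant rights (placeholders, not legal conclusions).
REFERENCE_RIGHTS = {
    "data_ownership",
    "content_portability",
    "uptime_remedy",
    "notice_before_termination",
}


def summarize_agreement(clauses):
    """Normalize an agreement's clause labels to a common format (trimmed, lowercase set)."""
    return {c.strip().lower() for c in clauses}


def compare(agreement_rights, reference=REFERENCE_RIGHTS):
    """Rights the agreement omits, and provisions with no counterpart in the reference list."""
    return {
        "omitted": sorted(reference - agreement_rights),
        "extra": sorted(agreement_rights - reference),
    }


agreement = summarize_agreement(["Uptime_Remedy", " data_ownership ", "provider_may_change_terms"])
print(compare(agreement))
```

Tallying the "omitted" lists over a random sample of agreements would yield the kind of counts to which the sampling methods of the previous paragraph apply.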


4. Proposed Work: An Infrastructure Of Data And Enabling Technologies

This part outlines the work of the technical members of the team: how they will ensure integration of the information and easy access to it for maximum usability. We suggest that non-technical readers take a high-level view of the enabling technologies we propose to use. What we wish to convince the reader of is that the technology is available and mature, and that the team is highly capable, as shown by its record and technical accomplishments. For example, Storer is one of the leading experts in data compression, the author of two of the defining books in the field, and has a long list of tools and publications. Scheuermann is one of the leading database and data mining experts, and his work is widely cited. Adam is director of CIMIC, a large center that integrates technologies; he is the PI of large state grants as well as the author of a new book on EC. Atluri is a database expert at Rutgers who is involved in distributed computing research and the management of large-scale information systems. Felson is a leading criminologist at Rutgers and Holzer a recognized computer and EC law expert. Makedon (Director of the Dartmouth Experimental Visualization Laboratory) and Owen (Director of the Laboratory of Media and Entertainment Technologies) are recognized multimedia experts with numerous publications; their joint work on information retrieval is to be published shortly in a book by Kluwer Publishers. Bishop's laboratory at the University of California, Davis is one of the leading data security labs, and his expertise is sought by the federal government. Gloor, a published author and leading scientist, is a partner at PriceWaterhouseCoopers, the largest consulting firm; his enthusiasm and support for the project will be accompanied by supporting data, contacts, expertise, and consultation. Powell, an Associate Professor at the distinguished Tuck School of Business, is a recognized supply chain management expert who is committed to establishing a state-of-the-art repository of data documenting different approaches to supply chains. The success of this project will depend on close and systematic cooperation among these parties, which is already evidenced by their joint work in the EBITS consortium, after which this infrastructure is named. To our knowledge this is the first team to examine EC safety from so many different points of view. We plan to add further technical expertise as needed.

4.1 Ebits Data Repository (Adam)

Building the EBITS data repository requires making the data "talk to each other." This translates into two major tasks: assuring integration and interoperability, and data warehousing. Data to be stored in EBITS is extracted and collected from a variety of information sources that are heterogeneous, autonomous, and distributed. To provide users with seamless access, integration and interoperability technologies must be in place. Although several commercial systems and research prototypes provide interoperability and integration, they suffer from several drawbacks. For example, in CORBA, heterogeneous systems communicate through interfaces written in a common Interface Definition Language (IDL). For two systems (client and server) to interoperate, each must maintain an IDL interface, and a change in the server application program requires the IDL interfaces of both the client and the server to be updated. Approaches using hard-coded wrappers for a variety of information sources (e.g., Sybase, Oracle, flat files) also suffer from drawbacks; for example, whenever a new information source is added to the system, considerable effort must be spent writing the corresponding wrapper. The EC environment, however, is Web-based and dynamic: information sources change over time, and new ones are added often. To accommodate such an environment we need an approach that neither imposes rigid standards on the information sources, as CORBA does, nor requires hard-coded wrappers.

At Rutgers CIMIC, we have developed an XML (Extensible Markup Language) based system to provide such interoperability and integration. The use of XML gives this approach several advantages over existing systems. Specifically, (1) because XML can be exchanged between heterogeneous systems in a neutral, system-amenable manner, it lends itself to automatic generation of wrappers, and (2) XML is becoming increasingly popular in the EC community. A number of XML parsers and libraries are already available, which makes implementation easier, and their number will grow as XML becomes more popular. In our methodology, whenever a new information source is added, it will first advertise its services and its schema using XML. A locally independent DTD will be used to define the XML-formatted services, which include the information necessary to execute each service: a description of the service, the commands that execute it, its parameters and the data format of each parameter, a description of the service response, and the data type and range of the result. From the given schema and service specifications, a new translator will be generated automatically, which makes our system suitable for a dynamic environment such as EC.
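The idea of generating a translator from an advertisement, rather than hand-coding a wrapper, can be sketched with Python's standard `xml.etree.ElementTree`. This is not the CIMIC system: the element and attribute names below (`service`, `param`, `result`, `field`) are a hypothetical format, not the project's DTD. The point is only that the translator is derived entirely from the advertised schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical service advertisement; element and attribute names are illustrative.
ADVERTISEMENT = """
<service name="orders_by_date">
  <description>Return orders placed on a given date</description>
  <command>SELECT id, amount FROM orders WHERE day = :day</command>
  <param name="day" type="str"/>
  <result>
    <field name="id" type="str"/>
    <field name="amount" type="float"/>
  </result>
</service>
"""

CASTS = {"str": str, "int": int, "float": float}


def make_translator(xml_text):
    """Build a function mapping raw rows from a source into typed dicts,
    using only the advertised result schema (no hand-written wrapper)."""
    root = ET.fromstring(xml_text.strip())
    fields = [(f.get("name"), CASTS[f.get("type")])
              for f in root.find("result").findall("field")]

    def translate(raw_rows):
        return [{name: cast(value) for (name, cast), value in zip(fields, row)}
                for row in raw_rows]

    # Keep the advertised service name and parameters alongside the translator.
    translate.service = root.get("name")
    translate.params = [p.get("name") for p in root.findall("param")]
    return translate


translator = make_translator(ADVERTISEMENT)
print(translator.service, translator.params)
print(translator([("o1", "19.99"), ("o2", "5")]))
```

When a source changes its schema, it re-advertises and a new translator is built the same way, with no wrapper code to rewrite.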

Rutgers CIMIC has the expertise to carry out the proposed project, as evidenced by its funding for a related project. CIMIC has been funded by the State of New Jersey Hackensack Meadowlands Development Commission (HMDC) in the amount of $3 million over the next five years. This project, the Meadowlands Environmental Research Institute (MERI), focuses on five research themes, including vegetation patterns, mudflats, contaminant hotspots, and scientific data management. The Hackensack Meadowlands, an 82-square-kilometer region in northern New Jersey located 4 kilometers west of New York City, is an environmentally assaulted region that at one time contained over 1,012 hectares of active landfills for consumer and industrial waste in the middle of a degraded urban estuary. To identify the most effective regions to mitigate, data collected by researchers in wetland biology, soil science, hydrology, and geology, and by professionals such as civil and solid waste management engineers and planners, must be integrated and analyzed. This data comes in a variety of formats and from diverse sources. The data repository is intended to provide a suitable view of the data to each user community, including urban planners, researchers, school children and teachers, and the general public.

To provide fast and efficient data analysis, classification, clustering, trend and deviation analysis, and data mining, the integrated data are consolidated into a multidimensional, summarized database: a data warehouse. Building a data warehouse involves such activities as warehouse design, schema evolution, providing methods to store new derivative information and products in the warehouse, and designing data models capable of accepting such objects when users push them back into the warehouse. Rutgers CIMIC is also involved in other major projects related to data warehouse design and development, and thus has the resident infrastructure and expertise (three associate research professors). The data warehousing research and development effort is funded as part of the NASA Regional Applications project, the Lawrence Livermore National Laboratory data warehousing study, and the NASA Hubble Space Telescope study of commercial off-the-shelf data warehousing systems.
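The "multidimensional and summary" character of a warehouse can be illustrated with a roll-up: a measure is summed over whichever dimensions a query does not keep. The minimal sketch below uses hypothetical fact rows with three dimensions (region, month, category) and one measure (sales); a real warehouse would precompute and store such summaries rather than recompute them per query.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, month, category, sales_amount).
FACTS = [
    ("north", "1999-01", "books", 120.0),
    ("north", "1999-01", "music", 80.0),
    ("north", "1999-02", "books", 60.0),
    ("south", "1999-01", "books", 200.0),
    ("south", "1999-02", "music", 40.0),
]
DIMENSIONS = ("region", "month", "category")


def roll_up(facts, keep):
    """Sum the measure (last column) over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


print(roll_up(FACTS, ["region"]))
print(roll_up(FACTS, ["region", "month"]))
```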

4.2 Knowledge Discovery In Electronic Commerce Systems (Scheuermann)

Once a data repository is in place, there are many ways to extract additional information that can then also become part of the EBITS database. Knowledge discovery in databases (KDD) extracts higher-level knowledge that is hidden within a database [Fayyad et al]; this makes the data more usable to social scientists who wish to extract semantic meaning from it. The KDD process consists of the following steps: data selection, data cleansing, data warehousing, data mining, and visualization. We will apply knowledge discovery techniques to the EBITS EC transaction data (e.g., log files, which may include a history of user accesses and descriptions of user purchases) in multiple ways, working with various social scientists. We describe each of the steps that will be involved in an EC application:

Data selection creates a target data set on which discovery can be performed. In EC applications, data selection involves determining which logs are useful for discovering customer purchasing patterns or for identifying intruders, fraud, and other misuse. For example, different log types may exist for individual applications, or multiple log types may exist for a single application. Since not all of these logs are useful for intrusion detection or purchasing pattern identification, data selection is necessary. Next, some log files may be in different formats or contain inconsistencies, noise, or missing values. The data cleansing step of the KDD process removes or compensates for such data integrity problems. For example, if partial customer transactions exist in a log, we need to decide whether to remove or append each of these transactions [Kimball].
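The selection and cleansing steps can be sketched on log records represented as dictionaries. The log types, event names (`checkout_start`, `checkout_end`), and required fields are hypothetical. For partial transactions, this sketch assumes the simplest policy, removing them; the other option the paragraph mentions would keep and complete them instead.

```python
RELEVANT_TYPES = {"purchase", "auth"}      # hypothetical log types worth mining
REQUIRED = ("log_type", "session", "event")


def select(records, relevant=RELEVANT_TYPES):
    """Data selection: keep only records from log types relevant to the analysis."""
    return [r for r in records if r.get("log_type") in relevant]


def cleanse(records):
    """Data cleansing: drop records with missing fields, then drop partial transactions.
    Returns (clean_records, sorted list of sessions removed as partial)."""
    complete = [r for r in records if all(r.get(k) for k in REQUIRED)]
    started = {r["session"] for r in complete if r["event"] == "checkout_start"}
    finished = {r["session"] for r in complete if r["event"] == "checkout_end"}
    partial = started - finished
    # Assumed policy: remove partial transactions rather than complete them.
    return [r for r in complete if r["session"] not in partial], sorted(partial)


raw = [
    {"log_type": "purchase", "session": "s1", "event": "checkout_start"},
    {"log_type": "purchase", "session": "s1", "event": "checkout_end"},
    {"log_type": "purchase", "session": "s2", "event": "checkout_start"},
    {"log_type": "debug", "session": "s3", "event": "cache_miss"},
    {"log_type": "auth", "session": "s4", "event": None},
]
selected = select(raw)
clean, partial = cleanse(selected)
print(len(selected), len(clean), partial)
```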

A data warehouse is a stand-alone repository that contains subject-oriented, nonvolatile information, potentially integrated from multiple data sources. It is typically used to answer decision support queries. Since log data is massive and may exist on multiple sources in multiple formats, useful subsets of the log data can be placed into a data warehouse. Benefits of this approach include well-formatted data, fast query processing, and easy access to historical log data. Depending on the number of attributes associated with the logs, either user access data or user purchase data can be placed in a data warehouse.
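As a small stand-in for a warehouse (not a warehousing product), the sketch below loads a cleaned subset of log data into an in-memory SQLite table with Python's standard `sqlite3` module, indexes it, and answers two decision-support style queries. The table layout and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (day TEXT, user_id TEXT, action TEXT, amount REAL)")

# Hypothetical, already-cleansed log subset.
rows = [
    ("1999-05-01", "u1", "purchase", 30.0),
    ("1999-05-01", "u2", "login_fail", None),
    ("1999-05-02", "u1", "purchase", 12.5),
    ("1999-05-02", "u2", "login_fail", None),
    ("1999-05-02", "u2", "login_fail", None),
]
conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?)", rows)
# An index on the columns queried most often speeds up repeated analysis.
conn.execute("CREATE INDEX idx_action_day ON access_log(action, day)")

failures_per_day = """SELECT day, COUNT(*) FROM access_log
                      WHERE action = 'login_fail' GROUP BY day ORDER BY day"""
spend_per_user = """SELECT user_id, SUM(amount) FROM access_log
                    WHERE action = 'purchase' GROUP BY user_id ORDER BY user_id"""
print(conn.execute(failures_per_day).fetchall())
print(conn.execute(spend_per_user).fetchall())
```

Historical questions of this kind (failures per day, spending per user) are answered from the warehouse without rescanning the raw logs.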

Data mining finds hidden patterns and relationships in a data set. The main problem to overcome here is the large number of possible patterns and relationships, so it is imperative to find intelligent and efficient ways to search for useful relationships within a data warehouse or within the logs themselves. Two types of data mining solutions are particularly beneficial in the EC domain: classification and association rule discovery. Classification algorithms attempt to map data items to one of several predefined categories with some confidence. Here are some examples of classification rules, each with its confidence:

Skiers owning sports cars are good credit card customers: 95%

Animals with wings are birds: 99%

Formally, [Agrawal et al] define an association rule in transaction databases as an expression of the form X => Y, where X and Y are disjoint sets of items. The support of the rule is the probability that X and Y occur together in a transaction, P(XY). If P(XY) is above some minimum specified support, the item set XY is considered a rule candidate. The goal is to identify candidate rules whose confidence exceeds some minimum specified value, where confidence is defined as the conditional probability of Y given X, P(Y|X) = P(XY)/P(X). The following are examples of traditional association rules:

80% of customers that purchase pies and donuts also purchase candy.

item set: { pies, donuts, candy }

formal rule: { pies & donuts => candy: 80% }
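To make support and confidence concrete, the following minimal sketch mines rules from a toy basket list chosen so that pies and donuts appear together in 5 of 6 baskets, 4 of them with candy, reproducing the 80% rule above. It follows the two-step structure (frequent item sets first, then rules), but for brevity it enumerates every candidate item set instead of using Apriori's level-wise pruning [Agrawal et al]; the baskets are hypothetical.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine_rules(transactions, min_support, min_confidence, max_size=3):
    """Return (X, Y, support, confidence) for rules X => Y meeting both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for k in range(2, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            joint = count(s, transactions)
            # Step 1: keep only item sets whose support P(XY) reaches the minimum.
            if joint == 0 or joint / n < min_support:
                continue
            # Step 2: split the set into antecedent X and consequent Y.
            for r in range(1, k):
                for lhs in combinations(combo, r):
                    x = frozenset(lhs)
                    confidence = joint / count(x, transactions)  # P(XY) / P(X)
                    if confidence >= min_confidence:
                        rules.append((sorted(x), sorted(s - x),
                                      round(joint / n, 2), round(confidence, 2)))
    return rules


# Hypothetical baskets.
baskets = [{"pies", "donuts", "candy"} for _ in range(4)] + [{"pies", "donuts"}, {"milk"}]
rules = mine_rules(baskets, min_support=0.5, min_confidence=0.8)
for rule in rules:
    print(rule)
```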

These examples highlight the two-step process used to generate the rules: first, finding the item sets whose support is above a predefined minimum; second, splitting each such set into an antecedent and a consequent to form rules whose confidence is above a specified minimum. A slight modification of the traditional association rule attempts to identify deviations or exceptions within a database [Knorr et al]. Because intrusions are rare, intrusion detection falls into this category. As another example, suppose customer purchases usually result in low-value transactions; an exceptional high-value transaction may interest us for either fraud detection or marketing purposes. Finally, classification and association rules can be generalized to fit the EBITS goals. Generalized rules involve high-level concepts, not just primitive database concepts. Applying a generalization operation requires a concept hierarchy that keeps track of the relationships between concepts: parent-child, sibling, and so on. This class of rules combines information from a concept hierarchy with information about the distribution of data values to attain another level of interesting rules. As an example, suppose we have a database of supermarket transactions, where each transaction contains all the items purchased by a customer. The following is a traditional association rule:

When milk and butter are purchased together, bread is also purchased 70% of the time.

butter & milk => bread: 70%

A generalization based on this association rule is:

dairy => grains: 95%

Since grains and dairy products are not primitive database concepts (i.e., they do not appear as values in the customer transactions), the previous rule is considered a generalized association rule. Generalized classification rules are determined similarly. We have developed a methodology for mining generalized association rules that can also be applied to semi-structured data, such as text or HTML documents [Singh, Chen and Scheuermann]. In summary, we will apply mature knowledge discovery techniques to EBITS data to discover patterns of commerce misuse. In collaboration with Storer, we will combine data mining techniques with compression-based learning software in order to increase the efficiency of knowledge discovery on log data.
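A generalized rule can be evaluated by first replacing each primitive item with its parent in a concept hierarchy and then computing confidence on the generalized transactions. The hierarchy and baskets below are hypothetical, and this one-level mapping is only a sketch; it is not the methodology of [Singh, Chen and Scheuermann].

```python
# Hypothetical one-level concept hierarchy: primitive item -> higher-level concept.
HIERARCHY = {"milk": "dairy", "butter": "dairy", "cheese": "dairy",
             "bread": "grains", "rice": "grains", "soap": "household"}


def generalize(transaction, hierarchy=HIERARCHY):
    """Replace each item by its parent concept (items without a parent are kept as-is)."""
    return {hierarchy.get(item, item) for item in transaction}


def confidence(lhs, rhs, transactions):
    """P(rhs | lhs): share of transactions containing lhs that also contain rhs."""
    with_lhs = [t for t in transactions if lhs <= t]
    if not with_lhs:
        return 0.0
    return sum(1 for t in with_lhs if rhs <= t) / len(with_lhs)


baskets = [
    {"milk", "butter", "bread"},
    {"cheese", "rice"},
    {"milk", "bread"},
    {"butter", "soap"},
]
general = [generalize(t) for t in baskets]
print(confidence({"milk", "butter"}, {"bread"}, baskets))  # primitive rule
print(confidence({"dairy"}, {"grains"}, general))          # generalized rule
```

The generalized rule draws on baskets (such as cheese with rice) that the primitive rule never sees, which is why generalized rules can surface patterns invisible at the item level.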

4.3 Security: Access Control (Atluri)

Not all information in the EBITS data repository is public or appropriate for all users. For example, detailed criminal records should be accessible only to the police and authorized government officials, whereas information about areas of high criminal activity can be made available to the general public. To ensure that content is displayed only to authorized users, suitable access control technologies must be in place.

Developing such a system is challenging because of unusual requirements for the formulation, specification, and enforcement of data protection policies. Unlike conventional database environments, such a system is typically characterized by a dynamic user population, often accessing the data from remote locations, and by an extraordinarily large amount of information stored in a variety of formats. Moreover, access policies must be specified on the basis of user qualifications and characteristics rather than user identity (for example, a private citizen cannot access an individual's census data stored in the data repository). Another crucial requirement is support for content-dependent access control, in which access decisions depend on the content of the data. Because the data are collected from a variety of information sources, possibly maintained by different organizations, different access control policies may be in place. These disparate policies must be integrated, or at least harmonized, so that the overall access control policy is globally consistent.

Since traditional access control mechanisms cannot meet these requirements, Rutgers CIMIC has recently developed an access control system suited to such environments, called DLAS (Digital Library Authorization System). Our system provides (1) flexible specification of access control policies based on the qualifications and characteristics of users; (2) both content-dependent and content-independent access control to data objects; and (3) varying granularity of authorization objects, ranging from sets of objects to specific portions of objects. Since EBITS will operate in a similar environment, we believe that DLAS can be adopted for EBITS.
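The two ideas behind such a system, granting access by qualification rather than identity and conditioning access on content, can be illustrated with a minimal deny-by-default policy check. This is not DLAS: the record kinds, qualifications, and policy rules are hypothetical, and portion-level granularity is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # Access is decided by qualifications/characteristics, not by identity.
    qualifications: set = field(default_factory=set)


@dataclass
class Record:
    kind: str        # e.g. "criminal_record", "crime_area_summary", "census"
    content: dict


# Hypothetical policy: (record kind, required qualification or None, content predicate).
POLICY = [
    ("crime_area_summary", None, lambda c: True),             # public
    ("criminal_record", "law_enforcement", lambda c: True),   # qualification-based
    ("census", "census_researcher",
     lambda c: not c.get("identifiable", True)),               # content-dependent
]


def can_read(user, record, policy=POLICY):
    """Grant access if some rule matches the record kind, the user's qualifications,
    and the record's content; otherwise deny."""
    for kind, qual, predicate in policy:
        if (kind == record.kind
                and (qual is None or qual in user.qualifications)
                and predicate(record.content)):
            return True
    return False


citizen, officer = User(), User({"law_enforcement"})
researcher = User({"census_researcher"})
area = Record("crime_area_summary", {"tract": "T1", "rate": "high"})
rap_sheet = Record("criminal_record", {"name": "J. Doe"})
aggregate = Record("census", {"identifiable": False})
individual = Record("census", {"identifiable": True})
print(can_read(citizen, area), can_read(citizen, rap_sheet), can_read(officer, rap_sheet))
print(can_read(researcher, aggregate), can_read(researcher, individual))
```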

4.4 Security: Intrusion Detection (Bishop)

Some EC security problems can be addressed using cryptography and protocols, for example for untraceable electronic cash and for protecting network connections over which credit card information (or other sensitive information) is sent. In this proposal we focus on system attacks and system-related security issues. Attacks disrupting systems that support EC arise from two sources. Outside attackers are attackers who are not authorized to use the systems. For example, an attacker who deduces a connection's cryptographic key from the time of day, the originating system, and other ancillary information [Bishop et al.] is an outsider. Inside attackers are people who are authorized to use the systems but use them in unauthorized ways. For example, a pharmacy clerk is authorized to access records indicating the drugs a patient is taking. If the clerk knows that drug X is prescribed only to people who have AIDS, and sells the list of patients taking drug X to a firm marketing herbal cures for AIDS, that clerk is an insider who has violated the pharmacy's policy of patient confidentiality.

Much is known about outside attacks. The science of intrusion detection [Hochberg et al.] analyzes known attacks against computer systems. However, these schemes all assume either that a known sequence of actions causes the change of state, or that the system enters a state known to be bad. Monitoring for inside attacks is not well understood. In the scientific literature, only one paper [Hochberg et al.] has discussed the psychology of the insider threat as it relates to computer security. A draft paper [Templeton] has attempted to characterize insider attacks, but far more work is needed. Whether an attack comes from outside or inside, these techniques detect an intrusion by analyzing whether the system has moved into an "unauthorized state." We will use these techniques in analyzing EBITS data and will build simulations to test them.

When a system is compromised, its security policy is violated. Analyzing the events leading up to the compromise (and detailing the precise events that constitute it) requires detailed information from the compromised system. This information is derived from log files, in which system events of interest are recorded. Typically, systems provide two granularities of logging: logging at the application level (in which the application creates its own log files), or logging everything at the system level (in which every system call, with its arguments and results, is recorded). Analyzing the logs raises several problems. First, the size of the logs is staggering: an active system can generate megabytes or even gigabytes of logs per day. Compression can reduce the storage required. Because the contents of log files are typically very regular, compression mechanisms tailored to their formats may achieve higher compression than standard methods. Ideally, the logs can be decompressed as the analysis tools access them, so that they never need to exist in uncompressed form.
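The idea that analysis tools can read logs that never exist in uncompressed form can be sketched with Python's standard `gzip` module: the log is stored compressed, and an iterator decompresses it line by line as the analysis consumes it. gzip is a general-purpose compressor used here as a stand-in; the format-tailored compression the paragraph envisions is not shown. The log lines are synthetic.

```python
import gzip
import io


def compress_log(lines):
    """Store log lines gzip-compressed in memory (a stand-in for compressed log files)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for line in lines:
            gz.write((line + "\n").encode("utf-8"))
    return buf.getvalue()


def iter_log(compressed):
    """Yield log lines one at a time, decompressing on the fly."""
    with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gz:
        for raw in gz:
            yield raw.decode("utf-8").rstrip("\n")


# Synthetic, highly regular log lines.
lines = [f"1999-06-01T12:00:{i % 60:02d} sshd[{1000 + i}]: session opened for user u{i % 5}"
         for i in range(2000)]
blob = compress_log(lines)
raw_size = sum(len(line) + 1 for line in lines)
print(raw_size, len(blob))
# Analysis runs directly over the compressed log, one line at a time.
print(sum(1 for line in iter_log(blob) if "user u3" in line))
```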

Data mining, as described earlier, can be used to analyze log files. We will work with Scheuermann and his team to apply data mining to large sets of EBITS sample data, in an attempt to uncover previously unknown or unrecognized patterns and entries relevant to a known pattern. Uncovering all log entries related to specific entries will give us the events leading up to an attack. "Doorknob rattling" is the technique by which an attacker (internal or external) looks for ways to compromise the system. If an attacker begins with a database, the attacker probably has inside knowledge of the system (at least enough to gain access to it). If the attacker begins by guessing passwords or trying to bypass the authentication functions, the attacker is probably an outsider. Attacks on EC applications are frequent and will increase because of the amount of data involved. Visualization (graphical representation of the data) is essential, as images are often more expressive than words. We will work with Owen and Makedon (see later sections) to create visualizations that allow quick perusal of logs for critical events.
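The insider/outsider rule of thumb above can be written as a simple classifier over the first security-relevant event in a session. The event names are hypothetical, and mapping "begins with a database" to direct database queries or exports is our interpretive assumption; a real analysis would combine this heuristic with the mined patterns rather than rely on it alone.

```python
# Hypothetical event names.
AUTH_PROBES = {"login_fail", "auth_bypass_attempt"}   # guessing passwords, bypassing authentication
DIRECT_DATA_ACCESS = {"db_query", "db_export"}        # assumed reading of "begins with a database"


def classify_first_move(events):
    """Guess insider vs. outsider from the first security-relevant event in a session."""
    for event in events:
        if event in AUTH_PROBES:
            return "likely outsider"
        if event in DIRECT_DATA_ACCESS:
            return "likely insider"
    return "undetermined"


print(classify_first_move(["page_view", "login_fail", "db_query"]))
print(classify_first_move(["db_export"]))
print(classify_first_move(["page_view"]))
```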

Lastly, EC is a distributed enterprise: several systems, including at least two end systems (client and server) and numerous infrastructure systems, collect log information. Attacks involving both insiders and outsiders would be distributed over several hosts, so any analysis of attacks requires the study of a distributed set of logs and systems. As the technology evolves, EBITS will be updated accordingly. Bishop's team at U.C. Davis is recognized as one of the leading security labs in the world, advising an array of government agencies.
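A first step in studying a distributed set of logs is merging per-host logs into a single timeline. The sketch below uses `heapq.merge` from Python's standard library and assumes that each host's log is already sorted by timestamp and that host clocks are synchronized, an assumption real deployments must verify. Hosts, timestamps, and messages are hypothetical.

```python
import heapq


def merge_logs(*host_logs):
    """Merge per-host logs, each sorted by timestamp, into one timeline.
    Entries are (timestamp, host, message); assumes synchronized clocks."""
    return list(heapq.merge(*host_logs, key=lambda entry: entry[0]))


# Hypothetical per-host logs (timestamps in seconds).
client = [(10.0, "client", "connect"), (12.5, "client", "submit order")]
server = [(10.1, "server", "accept"), (12.6, "server", "order stored"), (13.0, "server", "db export")]
proxy = [(10.05, "proxy", "forward")]

timeline = merge_logs(client, server, proxy)
for entry in timeline:
    print(entry)
```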

4.5 Data Compression Technologies To Predict Patterns Of Misuse (Storer)

This section considers a vital technology: compression. A criminal act or person can remain hidden in the vast volume of transaction data involved. To detect fraud, therefore, there must be mechanisms for retrieving information from data that is transmitted and stored in compressed form. How can we detect those criminal features if the data is compressed? We have developed tools that learn by performing adaptive compression, and this section explains how they can be used in an EC application studied by an EBITS scientist.

In the future, most scientific data of this type will never be seen by a human (there are simply too few humans to go around). The amount of data processed by and stored on computers today is growing exponentially; NASA is proposing to archive terabits of data per day. So it is with EC: there is a vast volume of data, including security logs, and here again there is more data than will ever be looked at by a human. How can we use compression techniques to store and process this data? How can one detect fraud or enforce laws on compressed data? In what follows we describe two types of compression, on-line (real-time) and off-line (not real-time). In the EBITS project, off-line compression will be of primary interest because we want to compress data for storage and retrieval. However, we also describe on-line compression, because many transactions do occur using compressed data, and the ability to detect patterns of misuse in compressed data in real time is an important capability.

As compression technology becomes better understood, most data sets will be compressed during both transmission and storage, in some cases losslessly (no information is lost in the compression process) and in others lossily (some information may be lost). In a typical real-time application such as EC, data streams pass through a compressor or decompressor as they are transmitted or received (either over a communication line or to and from a storage device). When adaptive compression algorithms (lossless or lossy) are employed, a great deal of learning can take place: the system learns what data to anticipate based on past history. In some applications, the information provided by this learning process can be more important than the benefits of the compression itself. For large scientific data archives, such learning can provide the basis for fast automated browsing. For security logs arising from EC, such algorithms can allow fast and automatic identification of problem areas or high-risk locations. For Internet applications, a number of "data mining" methods have already been based on this principle.
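The "learning" side effect can be illustrated with a minimal sketch, assuming a simple order-k adaptive context model of the kind maintained by adaptive arithmetic coders (not necessarily the model used in the Brandeis tools). The number of bits the model needs to code a record measures how surprising that record is given past history; unusually expensive records are candidates for review. The class and method names are hypothetical.

```python
import math
from collections import defaultdict


class AdaptiveContextModel:
    """Order-k adaptive byte model of the kind an adaptive (for example,
    arithmetic-coding) compressor maintains. The ideal code length of a byte
    is -log2 P(byte | previous k bytes), so rarely seen patterns cost more bits.
    """

    def __init__(self, order: int = 2):
        self.order = order
        # context (last `order` bytes) -> byte value -> times seen
        self.counts = defaultdict(lambda: defaultdict(int))
        # context -> total bytes seen in that context
        self.totals = defaultdict(int)

    def cost_and_update(self, record: bytes) -> float:
        """Return the average bits per byte needed to code `record` under the
        current model, then learn from the record as an adaptive coder would."""
        bits = 0.0
        context = b""
        for byte in record:
            # Laplace (add-one) estimate over the 256 possible byte values.
            p = (self.counts[context][byte] + 1) / (self.totals[context] + 256)
            bits -= math.log2(p)
            self.counts[context][byte] += 1
            self.totals[context] += 1
            if self.order:
                context = (context + bytes([byte]))[-self.order:]
        return bits / max(len(record), 1)
```

After the model has seen many routine records, a record with unfamiliar merchants or amounts costs noticeably more bits per byte than a routine one, which is the signal a log-screening tool could threshold on. No actual arithmetic coder is needed to compute the score; the model's probabilities alone give the ideal code length.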

Not all computation needs to be done in real time or at high speeds--a system can be augmented with off-line helpers that perform higher level computations at a much slower rate. Carpentieri and Storer [1995] use the helper model to perform standard region classification algorithms on a sub-sampling of video frames and use this information to improve the quality of decompressed video by providing better criteria for the amount of error correction to be used at different points in decompressed frames (borders of classified regions are tracked as well as borders of predicted blocks). Because of the very high throughput rates required for real-time video processing (over a gigabit per second for high definition video), this helper approach has great potential to combine the best of practical hardware implementations of real-time methods and the power of more complex computations. The same principles can be applied to identifying critical regions in data and log files arising in EC. Automated browsing and pre-filtering algorithms can use the helper model in the reverse of the way described above; the helper makes use of compression-based learning software and hardware to aid in a sub-sampling and filtering process to select target portions and detect key events. This is a practical fit to current technology, where an inexpensive processor could make use of the computation performed by expensive real-time video processing and compression hardware.
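A minimal sketch of the helper idea applied to logs, under stated assumptions: the helper looks only at every `stride`-th block of a log (the sub-sampling), scores each sampled block by how well it compresses, using `zlib` as a stand-in for compression-based learning, and flags blocks that compress much worse than the sampled median, since routine, repetitive activity compresses well. The function names, the default stride, and the 1.5x tolerance are hypothetical choices.

```python
import statistics
import zlib


def compression_ratio(block: bytes) -> float:
    """Compressed size / original size: low for repetitive, routine data."""
    return len(zlib.compress(block)) / max(len(block), 1)


def helper_prefilter(blocks: list[bytes], stride: int = 4, tolerance: float = 1.5) -> list[int]:
    """Off-line helper: examine only every `stride`-th block and return the
    indices whose compression ratio exceeds the sampled median by more than
    `tolerance` times. Flagged blocks become targets for slower analysis."""
    sampled = list(range(0, len(blocks), stride))
    ratios = {i: compression_ratio(blocks[i]) for i in sampled}
    median = statistics.median(ratios.values())
    return [i for i in sampled if ratios[i] > tolerance * median]
```

Flagged blocks would then be passed to the slower, more detailed analysis; unsampled blocks are never examined by the helper, which is the trade-off sub-sampling makes for speed.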

4.6 Data Access Techniques And Visualization (Owen and Makedon)

This section describes techniques that will be applied to EBITS to make its large accumulated data repository available for study by many researchers. These techniques incorporate tools for data mining, analysis, and visualization, as mentioned in previous sections. One key question is how to make these tools customizable to the needs of different social scientists and to a wide variety of data types. Our answer is to develop them both as general components for use by many researchers and as customized "versions," classified by domain and application category, that are ready for the next user. Making data available to these tools is complicated by a) the lack of standard data formats, b) large volumes of data, and c) the necessary processing and storage infrastructure.

Common Data Formats: EC research data includes database transaction logs, email messages, electronic funds transfer logs, electronic data transfer messages, and many other data formats. If each format is treated as a special case, it will be difficult to compare data from distinct sources or to design generalized tools effectively. Instead, this proposal focuses on the development of common, standardized data formats. We therefore propose a standard data format in which the data is converted to XML documents, as discussed in the data repository section of Part 4. We will work with Adam's team on storing such data in a relational database, as described in (Shapiro and Owen 1999). XML will be used to support hierarchical markup of content. Data available to this project will come in many forms. Transaction data will typically be divided into standard fields, reflecting the relational database source of the data. Document data will typically be messages with annotation information, such as addresses accompanying a message body. Financial record data will often consist of common entities in conjunction with lists of entity sets (for example, the transaction records in a statement). These examples include single entity sets and one-to-many relationships. XML provides a general structure that can be used as a data storage medium. A common XML data format will also require conversion scripts that convert incoming data into XML. We will develop a conversion script toolkit to simplify this task, allowing simple field selection and naming. An XML DTD will be constructed incrementally for the common format, with the goal of providing a common basis for analysis tools. The advantage of choosing XML is that each data-set record can be expressed as a single XML document in a common form. The XML documents will then be stored in a relational database.
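The conversion-script toolkit described above could start from something as simple as the following sketch, which uses Python's standard `xml.etree.ElementTree`. It handles only flat records (one-to-many structures such as statements would need nested elements), and `record_to_xml` and the field names are hypothetical illustrations of "field selection and naming," not a defined EBITS interface.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict, field_map: dict, root_tag: str = "record") -> str:
    """Convert one incoming record into a single XML document.

    field_map selects which source fields to keep and how to name them in
    the common format: {source_field: xml_element_name}. Fields not listed
    in field_map are dropped; special characters are escaped by ElementTree.
    """
    root = ET.Element(root_tag)
    for source_field, element_name in field_map.items():
        if source_field in record:
            child = ET.SubElement(root, element_name)
            child.text = str(record[source_field])
    return ET.tostring(root, encoding="unicode")
```

For example, a transaction row `{"TXN_ID": 17, "AMT": "25.00", "CARD": "1234", "internal_flag": "x"}` with the map `{"TXN_ID": "id", "AMT": "amount", "CARD": "card"}` and root tag `transaction` becomes `<transaction><id>17</id><amount>25.00</amount><card>1234</card></transaction>`, with the unmapped field discarded.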
The purpose of the database is to provide fast and efficient multi-user access to the data and support for aggregate operations (such as averages and statistical analysis). Indexing fields will be extracted from the XML to form search keys. A critical implementation issue is the selection of a minimal relational decomposition for the XML-to-table and index mapping. In addition, an inverted word index will support more general content-based searches. Multimedia data such as images and audio will be supported with a vector-model index and initial simple annotation and general feature search tools. The goal of the data storage mechanisms is to provide a general means to access all collected data from a common source.
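As a rough sketch of this storage and indexing scheme, assuming SQLite (from Python's standard library) as a stand-in for the production SQL database: each XML document is stored whole, a few elements (here, hypothetically, `card` and `amount`) are extracted into indexed columns as search keys, and an in-memory inverted word index supports content-based search. The schema and class are illustrative only; choosing the actual minimal relational decomposition is the open design issue noted above.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical schema: the full XML text plus extracted, indexed search keys.
SCHEMA = """
CREATE TABLE docs (id INTEGER PRIMARY KEY, card TEXT, amount REAL, xml TEXT);
CREATE INDEX docs_card ON docs(card);
"""


class Repository:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(SCHEMA)
        self.words = defaultdict(set)  # inverted index: word -> document ids

    def add(self, xml_text: str) -> int:
        root = ET.fromstring(xml_text)
        cur = self.db.execute(
            "INSERT INTO docs (card, amount, xml) VALUES (?, ?, ?)",
            (root.findtext("card"), float(root.findtext("amount", "0")), xml_text),
        )
        doc_id = cur.lastrowid
        # Index every word of every element's text for content-based search.
        for text in root.itertext():
            for word in re.findall(r"\w+", text.lower()):
                self.words[word].add(doc_id)
        return doc_id

    def average_amount(self, card: str) -> float:
        """Aggregate operation answered by SQL over the extracted key columns."""
        (avg,) = self.db.execute(
            "SELECT AVG(amount) FROM docs WHERE card = ?", (card,)
        ).fetchone()
        return avg

    def search(self, *words: str) -> set:
        """Ids of documents containing all of the given words."""
        sets = [self.words.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()
```

Keeping the full XML alongside the extracted keys means new search keys can be added later by re-extracting them from stored documents, without re-importing the source data.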

How to Deal with Large Data Volumes: We anticipate that very large volumes of data will be accumulated in the course of this research. This large volume has two consequences. First, a high-performance database system is essential for managing this data. Fortunately, such systems are readily available, and industry-standard components are proposed in this application (currently Microsoft SQL Server, though any ODBC-compatible SQL database will suffice). The second consequence concerns data access. Not all data access requirements can be planned in the design of the system: the particular queries needed depend on the research needs of the system's users, and data analysis may require operations beyond the aggregate capabilities of SQL. Several related methods are proposed for data access in this system.

WWW-Based Access and Processing Infrastructure

The simplest access method is to provide World Wide Web (WWW) access to the data using standard queries. Queries will be based on examples (partially completed data records), Boolean search forms, or custom-designed database queries in SQL. The content can be delivered as HTML pages, as computed data sets in tab-delimited form, or in the stored XML format with style sheets for presentation. This approach is simple and effective for applications that produce or must examine a limited database result.
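Query by example can be sketched as follows, again using SQLite as a stand-in for the production database. The filled-in fields of a partial record become equality conditions, blank fields act as wildcards, and the result can be returned in tab-delimited form. The function names are hypothetical. Field names are checked against the table schema because column names cannot be passed as SQL parameters, while the table name is assumed to be fixed by the server rather than supplied by users.

```python
import sqlite3


def query_by_example(db: sqlite3.Connection, table: str, example: dict) -> list:
    """Return rows of `table` matching every filled-in field of `example`.
    Fields that are None or empty strings act as wildcards."""
    filled = {k: v for k, v in example.items() if v is not None and v != ""}
    # Accept only column names that actually appear in the table schema;
    # `table` itself is assumed to be chosen by the server, not the user.
    columns = {info[1] for info in db.execute(f"PRAGMA table_info({table})")}
    unknown = set(filled) - columns
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    sql = f"SELECT * FROM {table}"
    if filled:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filled)
    return db.execute(sql, tuple(filled.values())).fetchall()


def to_tab_delimited(rows: list) -> str:
    """Format result rows as tab-delimited text, one row per line."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)
```

A Web form would simply submit the partially completed record as `example`; the values themselves are always passed as bound parameters, never spliced into the SQL text.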

The next level of access support is to allow custom programs to execute on a local compute server. A compute server will have direct access to the database (typically over a high-speed network connection). The infrastructure will be designed to support Java server programs. Java has been selected for its generality and its availability on a large set of platforms: analysis programs written in Java need not be custom compiled for a particular server platform. A particular goal of this approach is complete platform independence, so that as computational needs increase, platforms with greater performance can be substituted without affecting user programs. A server application toolkit will simplify the development of these applications. These two approaches can be mixed and can include client-side content processing. For example, a server-side analysis application can communicate with a client-side visualization program: data is accumulated and analyzed at the server and transmitted to the client, which constructs visualization models. Java servlets and Java 3D applets are an ideal combination for this application, allowing complete platform independence on both ends of the connection.

Additional support will be provided for mobile agents using the Dartmouth D'Agents software system (Brewington, et al. 1999). Mobile agents are executing programs that can migrate from machine to machine on a heterogeneous network. An advantage of this approach is that the agent can move to the data for processing and return with the results. Mobile agent support is only important if the project has many unique data servers available to the agent, which is not planned at this time. However, the D'Agents system provides a standard architecture for sending application programs to a server for execution. This mechanism is an alternative to the server application toolkit.

Automated Web Site Development Tools

A major element of this project is development and maintenance of a WWW site that will present data, provide for data analysis, and gather results of the many researchers involved in this project. This site must function nearly automatically and be accurate and consistent. Time spent managing large Web sites is not effective research time. The site-level tools developed by Owen and Makedon will be utilized to build and maintain the site (Owen and Makedon 1998). Owen and Makedon have a long track record of expertise and excellence in this domain.

Challenges

A major challenge is access to the data. Simple binary and field-based database queries do not provide sufficient detail as the database grows, and more complex queries tend to reduce to scan operations, which are inefficient. Alternative indexing technologies are needed so that such searches can be performed efficiently. Many researchers will choose to maintain local copies of the database, so a data migration mechanism is desired that allows local copies to be slaved to the master database. For experimental consistency, it is essential that this mechanism allow the researcher to revert to a previous state for experiment validation. It is unclear how best to support reversion other than through simple backup mechanisms; a log-based database system is not practical due to the volume of data involved. A major element of this project will be data visualization tools that are general enough to be used for a wide variety of data while still providing useful data views. Specifically, how can multimedia technologies provide advanced data visualization beyond simple 3-D model-based systems? Mapping visualization to compositional multimedia presentations is an interesting new problem.
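The "simple backup mechanism" mentioned above can be sketched with the online backup API that Python's `sqlite3` module exposes as `Connection.backup` (Python 3.7 and later), assuming a local copy held in SQLite. A researcher snapshots the database before an experiment and restores it afterward. Because every snapshot is a full copy, its cost grows with the size of the data, which is why reversion remains an open problem at EBITS scale. The function names are hypothetical.

```python
import sqlite3


def snapshot(db: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the current (committed) state of a local database into an
    in-memory snapshot."""
    snap = sqlite3.connect(":memory:")
    db.backup(snap)
    return snap


def revert(db: sqlite3.Connection, snap: sqlite3.Connection) -> None:
    """Overwrite the local database with a previously taken snapshot."""
    snap.backup(db)
```

In practice a snapshot would be written to a file rather than kept in memory, so that an experiment could be validated against the exact state it started from even after the session ends.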

4.7 Intuitive User Interfaces (Makedon and Owen)

Interactive user interfaces are an important ingredient in making EBITS highly usable, as illustrated in the following example.

Example: A social scientist wishes to study large Visa card logs. To enhance her understanding of what she is seeing, she may choose to activate a visual map providing a graphical history of transactions, with peaks indicating high activity, and correlate the peaks with meta-information showing dates or types of goods involved, or link them to geographical/demographic information. She should be able to retrace her exploration history or record copies of her data, with notes, on a "sketchpad annotator." To compare the Visa logs with American Express card logs, she uses a "data comparator" that provides standardized correlations of the data by size, relevance, time interval, conditions, etc. She can alter the comparison conditions interactively as well as retrieve relevant information from EBITS-linked databases by using data mining tools as described in an earlier section. She may wish to store parts of the data in a compressed format by passing the data through an adaptive compression tool, as described earlier, to check how visible certain patterns remain after compression. She may also wish to pass the data through different data security checks, as described in the section on data security. In all of these choices, the user has at her disposal data-enabling tools for exploring "what-ifs." She may then use built-in multimedia tools to present and share the results of her study with other EBITS participants. (See the EBITS architecture diagram in Appendix B for reference.)
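One of the standardized correlations a "data comparator" might offer can be sketched as follows: count each log's transactions per day and compute the Pearson correlation of the two daily-activity series. This is only one possible measure, chosen here as an assumption; the (date, amount) log layout and the function names are hypothetical, and the correlation is undefined if either series is constant.

```python
import math
from collections import Counter


def daily_counts(log: list[tuple[str, float]], days: list[str]) -> list[int]:
    """Number of transactions per day; log entries are (date, amount)."""
    per_day = Counter(date for date, _ in log)
    return [per_day[d] for d in days]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare_logs(log_a: list[tuple[str, float]], log_b: list[tuple[str, float]]) -> float:
    """Correlation of the two logs' daily activity over the union of their days."""
    days = sorted({d for d, _ in log_a} | {d for d, _ in log_b})
    return pearson(daily_counts(log_a, days), daily_counts(log_b, days))
```

A value near 1 means the two card networks see activity rise and fall on the same days; a day on which one log peaks and the other does not would pull the value down and could be worth a closer look.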

The tools include: (1) Intuitive Interfaces: Support will be needed for navigating through the EBITS information space, formulating search queries, and designing new experiments that may use the archived data and software. (2) Retrieval and Data Mining Facilities: To collect additional or "meta" information, traditional text retrieval tools can be combined with tools developed at Dartmouth for Cross Modal Information Retrieval to provide seamless access to heterogeneous types of data. (3) Authoring: Authoring is an integral part of research, and all components for composing multimodal documents will be included. (4) Searching and Browsing: A user will be able to retrieve related information from internal or external databases (see the EBITS architecture diagram) using two types of techniques: searching, where she poses a specific query and gets a set of answers, and browsing, where she is not looking for a specific thing but wants to explore a given set of things. (5) Navigation and Conceptual Clustering: Navigation based on automatic recognition of the structure and content of the Web is not yet fully automated and requires human intervention to be useful.


5. Management Plan

In this part of the proposal, we outline the main tasks involved in building the EBITS infrastructure. Diagrams depicting the system components and architecture may be useful in understanding the processes involved (see Appendix B).

5.1 Execution Plan

The EBITS project team will consist of the following members (and their students and associates):

Institution                        Member               Area
Rutgers University                 Nabil Adam           E-Commerce
Rutgers University                 Vijay Atluri         Security
Rutgers University                 Jonathan Bick        Tenant Law
Rutgers University                 Marcus Felson        Criminal Law
Rutgers University                 Marc Holzer          Public Administration
Dartmouth College                  Fillia S. Makedon    Multimedia
Dartmouth College                  Stephen G. Powell    Supply Chain
Northwestern University            Peter Scheuermann    Data Mining
Michigan State University          Charles B. Owen      Data Visualization
Brandeis University                James A. Storer      Compression
University of California, Davis    Matt Bishop          Security Systems
PricewaterhouseCoopers             Peter A. Gloor       E-Commerce

Team Management: Several members of the EBITS team are already collaborating on projects. We have defined the following subteams: the listed members will lead each effort, while other members may simply follow a task. Where the line is drawn will depend on the application.

Figure B. A time line of the major activities to be undertaken.

Data Maintenance Team: Adam, Atluri, Owen.
Data Enabling Team: Scheuermann, Storer, Bishop.
Data Visualization Team: Owen, Makedon, Gloor.
Data Domain Team: Felson, Bick, Powell, Holzer.
Data Application, Dissemination and Evaluation Team: Makedon, Owen, Adam, Scheuermann.

Work is already in progress on defining standards, a glossary of terms, and the specification of conditions that define EC fraud at different levels. Our collaboration has led to frequent teleconferences and meetings, and we expect to continue working in that fashion. The component in the time line labeled "Data Understanding" is a very important first step: the data structures and tools we build must rest on a clear understanding of what the data represent and what they can tell us. Working with the domain experts will be essential. We will start by focusing on the four social science EC applications described in Part 3 of this proposal. Each social scientist will meet with the data processing experts to define the type, size, complexity and format of the data to be collected. Commonalities will be derived, and a standard mechanism for stepwise refinement of the data processing will be established, as defined in Sections 4.1, 4.3 and 4.6. For the "Data Collection" process, we will create a common set of standards that classify data as cleared, EBITS use only, public use, or unacceptable (high risk or unreliable). As indicated in Parts 1, 2 and 3, data will come from a variety of sources, and three companies have already committed to assisting us. Data Integration will depend on the class of data and the domain. A Prototype Implementation is expected by the second year of the project, making a large testbed of Supply Chain Data available for testing; this will be our "cleanest" data set. Jointly with Powell and other specialists, we will define integration criteria and demonstrate them. Evaluation and Dissemination are complex and are described in further detail below. The success measures will be robustness (the system does not break down with many users or large amounts of data), accuracy, ease of use, and size of the user population. The two processes will be intertwined, providing stepwise refinement of the EBITS components involved.

We intend to hold a team meeting at least twice a year. In addition, some of the team members will make extended visits to CIMIC. This team has all the necessary facilities, expertise, high bandwidth (Internet II) communication, and access to excellent students to carry out the set goals.

EBITS Database Management: The database that will be built as an element of this project will be valuable to many researchers. It is essential that this database be consistent and highly available. This will be accomplished through the use of a master server located at Rutgers, software development servers located at Michigan State University and Dartmouth College, and local replication servers that can be produced on demand at other institutions. Other than the master server, all servers will be replicas of the master database. Automatic update tools will synchronize these systems daily during low-traffic hours.
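The daily synchronization could be sketched as a one-way, incremental copy from the master to a replica, assuming (hypothetically) an append-only table whose integer primary key `id` only grows, so the largest `id` already on the replica marks where the last synchronization stopped. SQLite again stands in for the production database; updates or deletions on the master would need a fuller mechanism, and `sync_replica` is an illustrative name.

```python
import sqlite3


def sync_replica(master: sqlite3.Connection, replica: sqlite3.Connection, table: str) -> int:
    """Copy rows added to `master` since the last sync into `replica`.

    Assumes an append-only table whose INTEGER PRIMARY KEY `id` only grows,
    so the replica's largest id acts as a high-water mark. `table` is assumed
    to be a trusted, server-chosen name. Returns the number of rows copied.
    """
    (mark,) = replica.execute(f"SELECT COALESCE(MAX(id), 0) FROM {table}").fetchone()
    rows = master.execute(
        f"SELECT * FROM {table} WHERE id > ? ORDER BY id", (mark,)
    ).fetchall()
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        replica.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        replica.commit()
    return len(rows)
```

Run nightly, each call transfers only the day's new records, so the cost of keeping replicas current tracks the rate of data arrival rather than the total size of the archive.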

Project Management: CIMIC is the ideal center for managing this project. It has gained a great deal of experience in EC, already has an audience of EC users and students through its EC virtual education program, and has become a focal point for EC on the Newark campus. It is currently collaborating with and drawing expertise and support from (1) academic units within Rutgers, including the School of Law, Faculty of Management, School of Criminal Justice, Faculty of Arts and Science, and Center of Change and Governance; (2) academic institutions around the world that are part of GCOMNET; (3) industry, including IBM Watson Research Center, IBM Toronto, PricewaterhouseCoopers, Concurrent Technologies Corporation and MFG Systems Corporation, which have offered to share their relevant data with this project; and (4) government agencies such as the General Services Administration (GSA). CIMIC currently has three full-time research faculty, many faculty associates within and outside Rutgers, 8 graduate students and 4 undergraduate students. CIMIC currently has over $4 million in funding for the next five years and has expertise in many areas, including data management, security and environmental science. In particular, CIMIC has been involved in the environmental science project MERI, whose data management issues are similar in nature to those of EBITS: MERI integrates data collected by experts from many disciplines and provides efficient, user-friendly access to this data for several user communities, including policy makers, urban planners, scientists, school children and the general public. We therefore believe CIMIC is highly capable of making an interdisciplinary project such as EBITS a success.

5.2 Implementation Details: Data Processing & Dissemination

Data will be collected from multiple sources, such as credit card companies, transaction logs, experiments, surveys, trade associations and companies that we will be dealing with. Bank of America, PricewaterhouseCoopers, Johnson and Johnson, Motorola, Bremer Associates, the American Civil Liberties Union, trade unions and other institutions have expressed interest in cooperating with us. Some of the support letters are included in the Appendix. Furthermore, the Tuck School of Business at Dartmouth will provide us with its alumni network of companies and small business programs. (See the Appendix for an example of the data format.)

Controlled access: In building a comprehensive resource on EC misuse, we must balance two goals: giving the user an insider's look at powerful ways of manipulating information, and not compromising the anonymity of the data source. Solving this problem is the first step in building EBITS. We will construct a layered system of data access for (a) administrators and programmers of EBITS, (b) members of the EBITS research group, and (c) registered EBITS users who have signed a contract of data preservation and ethical conduct. The dissemination process will balance the need for democratic access to EC knowledge against these concerns. It should be noted, however, that EBITS will be accessible primarily to social and behavioral scientists who register by signing a specific agreement.
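The layered access scheme can be sketched as a check of a user's role against the class assigned to a data set during Data Collection (cleared, EBITS use only, public use, or unacceptable). The ordering of roles, the mapping from each data class to the minimum role allowed to read it, and the extra role for visitors to the public Web site are all hypothetical assumptions for illustration; the proposal does not define this policy.

```python
from enum import Enum


class DataClass(Enum):
    PUBLIC_USE = "public use"
    CLEARED = "cleared"
    EBITS_ONLY = "EBITS use only"
    UNACCEPTABLE = "unacceptable (high risk or unreliable)"


class Role(Enum):
    PUBLIC = 0           # non-registered visitors to the public Web site
    REGISTERED_USER = 1  # signed the data preservation and ethics agreement
    RESEARCH_GROUP = 2   # members of the EBITS research group
    ADMINISTRATOR = 3    # EBITS administrators and programmers


# Hypothetical policy: the lowest role allowed to read each data class.
MINIMUM_ROLE = {
    DataClass.PUBLIC_USE: Role.PUBLIC,
    DataClass.CLEARED: Role.REGISTERED_USER,
    DataClass.EBITS_ONLY: Role.RESEARCH_GROUP,
}


def can_read(role: Role, data_class: DataClass) -> bool:
    """Data classed as unacceptable has no entry and is never released."""
    minimum = MINIMUM_ROLE.get(data_class)
    return minimum is not None and role.value >= minimum.value
```

Keeping the policy in a single table makes it easy for the team to revise which layer sees which class of data without touching the access checks themselves.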

Examples of Dissemination: We will provide a "public" version of the Web site to non-registered EBITS users, which will include information such as summaries of ongoing discussions or events on EC, educational software, demo data, glossaries, canned experiments, retrievable information on businesses, tables of statistics, legal information, announcements, etc., thus providing a forum for public concerns. Since EC makes it possible to distribute the workspace in time and location (working at different hours and in physically distributed environments), its impact on hiring practices and on new business opportunities for women and minorities are examples of topics to be followed and supported by the archive data. Other research topics may include changes in how products are sold, exhibited, maintained, and altered, giving consumers new ways to do "interactive" shopping via the Internet. Such new electronic communication modes are creating new job requirements, new expectations, and new work patterns, all of which are issues for social and behavioral science research. EBITS will maintain an active list of companies, individuals, and research and development opportunities, which will be published in a newsletter. In addition, drawing on our experience in electronic publishing, EBITS can readily support a publication service for scientific results in electronic journal format (Dartmouth experts have a long track record in electronic and multimedia publishing).

Self-start programs: Small and minority businesses will participate in the following projects (in cooperation with Dartmouth's Business School and the Rutgers CIMIC): (1) The small business market place: a program that provides partners, test data and a business design. (2) The free-workplace program. (3) The social sciences affiliates program. (4) The EC information technology program. (5) EBITS Minority EC education programs (see Appendix A for details).

5.3 Evaluation

The EBITS prototype will be developed iteratively. To ensure the prototype meets the needs of our constituent user communities, we propose the following evaluation strategy. First, we will characterize our user communities according to their information requirements and abilities, producing a user taxonomy that will guide further development of EBITS. Such needs assessments are traditionally conducted through in-depth interviews; however, we have been working closely with our user communities for an extended period and have already gathered significant user requirements as a result. We will formalize this information to determine which information needs can be met by our project. Next, we will work closely with the technical experts and developers of the data repository, data warehouse, data mining, and security components to ensure that these components will ultimately be useful to the target user communities. We will follow up with hands-on exposure of members of these groups to the interface and system tools for evaluation, passing users' critical comments to the design team for later versions. We will evaluate EBITS from a user-centered perspective, through heuristic evaluation, small-scale user feedback, and larger-scale user testing. Each group of users will perform a range of activities using the system (e.g., query, data retrieval and local visualization). These tests will be performed in concert with the design team through frequent, systematic reporting mechanisms. The test results will be used to tune the system as well as to collect new user requirements for later incorporation into the design. EBITS will be designed to accommodate and adapt to the information needs and skills of many different user communities simultaneously.

Metrics

The project team is committed to demonstrating and measuring the benefits obtained from the research and development of EBITS. We propose to use the following set of metrics and measures of performance using Output, Outcomes and Impact.

Output: EBITS development will produce a number of different types of outputs, including novel research findings and the components of the EBITS test-bed. Intellectually, what is learned from these projects will lead to future proposals and scientific publications. Our output metrics include new educational materials created for educators and students. This is an important part of the project: we will generate and count education modules that tie directly to core curriculum standards in the Newark public school system and in other school districts. New decision support products will be made available via EBITS to a wide range of user communities, allowing decision makers to interact with EBITS and provide user feedback. Publications, conference presentations and technical reports produced as a result of EBITS activities will be counted.

Outcomes: Outcomes are a measure of the lasting effectiveness of the proposed work. Outcomes may result directly from outputs, or they may be unintended consequences of the work. The adoption of the EBITS test-bed as a production system will be an important milestone in the project. Our user communities will be intimately involved in the development of EBITS from the beginning, and the test-bed will be used as a decision-making and educational tool. We will track the usage of EBITS to gauge its long-term adoption. We will also track the long-term adoption of educational modules based on EBITS by working with regional schools, and through our outreach efforts we will solicit and report on feedback from the user community.

Impact: Impact is the societal consequence of the project's outcomes. The magnitude of the impact is closely tied to our ability to communicate the results and ultimate utility of the project outputs, to educate other organizations, and to share our knowledge. The metrics we will use measure the impact on user communities in the application domains, where our work will manifest itself in a number of ways. For example, policy makers will make more informed decisions and educators will have new and innovative tools for teaching. Our close relationships with these user communities will allow us to gauge the long-term impact of EBITS in these application domains. CIMIC regularly participates in outreach efforts to educate the general public on how research work affects society, so the project will also have a direct impact on the general public.


6. References

6.1 References (E-Commerce)

  • E-commerce Website: http://midir.ucd.ie/~lkelly/tech.html

  • Tewari, R., Vin, H. M., Dan, A., and Sitaram, D.: Resource-based caching for Web servers. Proc. of SPIE Multimedia Computing and Networking 1998, pp. 191-204, San Jose, California, 1/98.

  • Choi, S.-Y., Stahl, D. O., and Whinston, A. B.: Economics of Electronic Commerce. The Essential Economics of Doing Business in the Electronic Marketplace, Macmillan Technical Pub, 1997.

  • Kalakota, R. and Whinston, A. B.: Electronic Commerce: A Manager's Guide. Addison-Wesley Publishing, 1997.

  • Charles, C. A. (Editor), et al.: Globalizing Electronic Commerce: Report on the International Forum on Electronic Commerce, Beijing, China, 20-21 March 1996.

  • McKeown, P. G., et al.: Metamorphosis: A Guide to the World Wide Web & Electronic Commerce: Version 2.0. 1997.

  • NSF Workshop on E-Commerce, sponsored by the Computation and Social Systems (CSS) Program, NSF, Sept. 10-12, 1998, IC2 Institute, The University of Texas at Austin, Austin, Texas. http://cism.bus.utexas.edu/

  • Keen, P. G. W., and Ballance, C.: On-Line Profits: A Manager's Guide to Electronic Commerce, 1997.

  • Mougayar, W.: Opening Digital Markets: Advanced Strategies for Internet-Driven Commerce. 1996.

  • Mougayar, W.: Opening Digital Markets: Battle Plans and Business Strategies for Internet Commerce. McGraw-Hill; ISBN: 0070435421

  • Minoli, D., Minoli, E.: Web Commerce Handbook. McGraw-Hill Series on Computer Communication, 1997.

  • Kosiur, David R.: Understanding Electronic Commerce. Strategic Technology Series, Microsoft Press; ISBN: 1572315601

6.2 References (Criminal Patterns In Urban Development)

  • Felson, Marcus. Crime and Everyday Life. Second Edition. Thousand Oaks, CA: Pine Forge Press. 1998.

  • Brantingham, P.L. and Brantingham, P.J. Criminality of place: Crime generators and crime attractors. European Journal of Criminal Policy and Research. v. 3, pp 5-26, 1995.

  • Rengert, George. The geography of illegal drugs. Boulder, CO: Westview. 1996.

6.3 References (Web Site Tenant Rights Violations)

  • Sweet, C.: Look who's on the net now. Sacramento Bee (McClatchy Newspapers, Inc.) 12/30/98.

  • Everitt, L.: Verio picks 1999 as 'the year'. The Rocky Mountain News (Denver, Co.) Copyright 1998 Denver Publishing Company 12/28/98, SECTION: BUSINESS; Ed. F; Pg. 1B which stated "(Verio) arranged to become the world's largest domain-based Web hosting company with a $ 257 deal to acquire Hiway Technologies and a $ 45.5 million deal to acquire TABNet, which together will give it 230,000 Web hosting tenants--some 10 percent of all domain names registered with InterNIC."

  • Young: Equivocation in Agreements, 64 Colum. L. Review 619 (1964).

  • Weaver v. American Oil Co. 276 N.E.2d 144.

6.4 References (Improper Transactions In Supply Chains)

  • Powell, S., Bourland, K., and Pyke, D.: Exploiting Timely Demand Information to Reduce Inventories, European Journal of Operational Research, Vol. 92, Issue 2, 1996.

  • Powell, S., and Ernst, R.: Manufacturer Incentives to Improve Retail Service Levels, European Journal of Operational Research, Vol. 104, No. 3, 1998.

  • Powell, S., and Fleisch, E.: On the Value of Information in a Business Network, Draft 3.0, June 15, 1998, submitted to the Journal of Organizational Computing and Electronic Commerce.

  • PwC: http://www.pwcglobal.com/

  • http://www.pwcglobal.com/gx/eng/ins-sol/spec-int/supply/index.html

  • http://www.pwcglobal.com/extweb/newcojou.nsf/docidmanagement/507F96D64A6B3D538525662F0061E59B?OpenDocument

  • 6.5 References (Knowledge Discovery In Electronic Commerce Systems)

  • U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (eds.). Advances in Knowledge Discovery and Data Mining. AAAI Press: Menlo Park, CA, 1996.

  • R. Kimball. The Data Warehouse Toolkit. John Wiley & Sons, Inc.: New York, 1996.

  • R. Agrawal, T. Imielinski, and A. Swami, Mining association rules between sets of items in large databases. In Proceedings of ACM SIGMOD, pp. 207-216, 1993.

  • E. Knorr and R. Ng, Algorithms for mining distance-based outliers in large datasets. In Proceedings of Very Large Data Bases, pp. 392-403, 1998.

  • L. Singh, B. Chen and P. Scheuermann: Generating Association Rules from Semi-Structured Documents using an Extended Concept Hierarchy. Proc. Intern. Conf. On Information and Knowledge Management (CIKM), pp. 193-200, 1997.

  • 6.6 References (Data Security Technology To Guard Against Attacks)

  • S. Templeton, Session Characterization and Data Reduction for Misuse Detection, Department of Computer Science, UC Davis (Apr. 23, 1998); draft technical report

  • J. Hochberg, K. Jackson, et al., Addressing the Insider Threat, Proceedings of the Department of Energy Computer Security Group Conference (1993)

  • J. Anderson, Computer Security Threat Monitoring and Surveillance, J. P. Anderson Co., Fort Washington, PA (1980)


  • J. Frank, Artificial Intelligence and Intrusion Detection: Current and Future Directions, Proceedings of the Seventeenth National Computer Security Conference (1994).

  • P. Helman, G. Liepins, Statistical Foundations of Audit Trail Analysis for the Detection of Computer Misuse, IEEE Transactions on Software Engineering 19(5) (Sep. 1993)

  • M. Bishop, S. Cheung, C. Wee, J. Frank, J. Hoagland, and S. Samorodin, The Threat from the Net, IEEE Spectrum, 34(8) (Aug. 1997)

  • M. Bishop, A Standard Audit Log Format. Proc. of the 1995 National Information Systems Security Conference. Baltimore, Maryland, October 10-13, 1995, pp. 136-145. (URL is http://seclab.cs.ucdavis.edu/~bishop/scriv/1995-nissc18.pdf)

  • 6.7 References (Data Compression)

  • Storer, J., and Reif, J.: Error Resilient Optimal Data Compression, SIAM Journal on Computing 26:4, 934-939, 1997.

  • Storer, J., and Helfgott, H.: Lossless Image Compression by Block Matching, The Computer Journal 40:2/3, 137-145, 1997.

  • Storer, J., and Carpentieri, B.: A Video Coder Based on Split-Merge Displacement Estimation, Journal of Visual Communication and Image Representation 7:2, 137-143, 1996.

  • Storer, J., and Constantinescu, C.: Improved Techniques for Single-Pass Vector Quantization, Proceedings of the IEEE 82:6, 933-939, 1994; an extended abstract of this paper appeared in the Proceedings DCC 1994, 410-419.

  • 6.8 References (Data Access And Visualization, User Interfaces)

  • Brewington, B., R. Gray, K. Moizumi, D. Kotz, G. Cybenko, and D. Rus: Mobile agents in distributed information retrieval, In Matthias Klusch, editor, Intelligent Information Agents, chapter 12. Springer-Verlag, 1999.

  • Owen, C. and F. Makedon, ASML: Automatic site markup language, Multimedia Tools and Applications, Volume 17, 113-139, 1998.

  • Shapiro, N. and C. Owen, Breaking the shackles of the physical page: Site level authoring for XML using ASML, in Proceedings of the WebNet 99 World Conference of the WWW, Internet, and Intranet, October 25-30, 1999, Honolulu, HI, in submission.

  • Makedon, F. and C. Owen. Cross-Modal Retrieval of Scripted Speech Audio. Proc. of Multimedia Computing and Networking 1998, SPIE'98 San Jose, CA, January 26-28, 1998.

  • Makedon, F., J. Ford, C. Owen, and S. Rebelsky. Interactive Multimedia Publishing Systems. Chapter in Multimedia Tools and Applications, Borko Furht, ed., Kluwer Academic Press, 1996.

  • 6.9 Other Related References

  • David Neal. Internet explorer. DATAServ Inc. Presentation, http://www.redcreek.net/presentations/rclibrary/browser/index.htm, World Wide Web.

  • Lucy Terry Nowell, Robert K. France, Deborah Hix, Lenwood S. Heath, Edward A. Fox. Visualizing search results: Some alternatives to query-document similarity. In Proc. of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 67-75, Zurich, Switzerland, August 1996.

  • Renu Tewari, Harrick M. Vin, Asit Dan, and Dinkar Sitaram. Resource-based caching for Web servers. In Proc. of SPIE Multimedia Computing and Networking 1998, pages 191-204, San Jose, California, January 1998.

  • Diane Vizine-Goetz. Using library classification schemes for internet resources. OCLC Internet Cataloging Project Colloquium Position Paper, http://www.oclc.org/oclc/man/colloq/v-g.htm, World Wide Web.

  • Computed Synchronization for Multimedia Applications, (F. Makedon, Charles B. Owen), Kluwer Academic Publishers, book to appear 1999.

  • Conference on a Disk: An Experiment in Hypermedia Publishing, (F. Makedon, P. Metaxas, J. Matthews, P. Gloor, M. Cheyney, D. Johnson), Communications of the ACM, January 1996, pp. 51-60.

  • Cross-modal Information Retrieval, (F. Makedon, C. Owen), Chapter in Handbook of Multimedia Computing, CRC Press, 1998.

  • Classification and characterization of digital watermarks for multimedia data, (F. Makedon, J. Ford and C. Owen), Chapter in Handbook of Multimedia Computing, CRC Press, 1998.

  • Cross-Modal Retrieval of Scripted Speech Audio, (F. Makedon, C. Owen), in Proc. of Multimedia Computing and Networking 1998, SPIE'98 San Jose, CA, January 26-28, 1998. To appear.

  • Detecting Lip Motion in Digital Video, (F. Makedon, Jim Shain and C. Owen), SPIE: International Symposium on Voice, Video and Data Communications Conf.: Multimedia Systems and Applications. Boston, Nov. 1998.

  • Digital Money: The New Era of Internet Commerce. Daniel C. Lynch, Leslie Lundquist, 1995. John Wiley & Sons.

  • The Digital Economy: Promise and Peril in the Age of Networked Intelligence. Don Tapscott, 1995.

  • The Distributed Mind: Achieving High Performance Through the Collective Intelligence of Knowledge Work Teams. Kimball Fisher, Mareen Duncan Fisher. AMACOM; ISBN: 0814403670

  • Documentation Multitargeting Using ASML and JavaScript, (F. Makedon, C. Owen, M. Sasles, T. Prezio), invited paper to J. Network Computer Applications (to appear 1999), also Best Paper Award in WebNet World Conf. (November 1998, Orlando)

  • The Economics of Electronic Commerce. Andrew B. Whinston, Dale O. Stahl, Soon-Yong Choi, Macmillan Technical Publishing; ISBN: 1578700140

  • Economics of Electronic Commerce. The Essential Economics of Doing Business in the Electronic Marketplace. Soon-Yong Choi, Dale O. Stahl and Andrew B. Whinston, Macmillan Technical Publishing, 1997.

  • Electronic Commerce: A Manager's Guide. Ravi Kalakota and Andrew B. Whinston, Addison-Wesley Publishing, 1997.

  • Electronic Multimedia Publishing: Enabling Technologies and Authoring Issues, F. Makedon, and S. Rebelsky editors, Book. Kluwer Academic Press (1998) ISBN 0-7923-8108-4.

  • Electronic Publishing and the Information Superhighway. Birkhauser: Boston, MA, 1995. J. Ford, F. Makedon, and S. Rebelsky (editors).

  • Enabling Technologies for Museums of the Future, (F. Makedon, J. Ford, C. Langmead, C. Owen, S. Rebelsky), Journal of Universal Computer Science (JUCS), invited paper, in preparation.

  • Exploring IBM's Bold Internet Strategy. Jim Hoskins, Vincent Lupiano, 1997.

  • Futurework: Putting Knowledge to Work in the Knowledge Economy. Charles D. Winslow, William L. Bramer (Contributor). Free Pr; ISBN: 0029354153

  • Frontiers of Electronic Commerce. Ravi Kalakota and Andrew B. Whinston, Addison-Wesley Publishing, 1996.

  • From EDI to Electronic Commerce: A Business Initiative. Phyllis K. Sokol, Hardcover, 305 pages, McGraw Hill Text, 1995.

  • Globalizing Electronic Commerce: Report on the International Forum on Electronic Commerce, Beijing, China, 20-21 March 1996. Carol Ann Charles (Editor), et al., 1996.

  • Hear Homer: A Multimedia-Data Access Resource Prototype for Ancient Texts, (F. Makedon, C. Owen, M. Owen, J. Ford, C. Metaxaki-Kossionides, and T. Steinberg), in Proc. of ED-MEDIA'98 World Conf. on Educational Multimedia and Hypermedia, June 20-25, 1998, Freiburg, Germany. In AACE Conf. Proc.

  • Information Ecology: Mastering the Information and Knowledge Environment. Thomas H. Davenport, Laurence Prusak (Contributor) 1997 Oxford Univ. Pr (Trade); ISBN: 0195111680

  • Innovation Explosion: Using Intellect and Software to Revolutionize Growth Strategies. James Brian Quinn, et al. ISBN: 0684833948

  • Innovation Strategy for the Knowledge Economy: The Ken Awakening (Business Briefcase Series). Debra M. Amidon Butterworth-Heinemann (Trd); ISBN: 0750698411

  • Intellectual Capital. Annie Brooking. Intl Thomson Pub Education Group; ISBN: 1861520239

  • Intellectual Capital: Realizing Your Company's True Value. Finding Its Hidden Roots. Leif Edvinsson, Michael S. Malone (Contributor) Harperbusiness; ISBN: 0887308414

  • Intelligent Enterprise: A Knowledge and Service Based Paradigm for Industry. James Brian Quinn Free Pr; ISBN: 0029256151

  • Interactive Multimedia Publishing Systems, (F. Makedon, J. Ford, C. Owen, and S. Rebelsky), Chapter in Multimedia Tools and Applications, Borko Furht, ed., Kluwer Academic Press, 1996.

  • Inter-Corporate Business Engineering: Streamlining the Business Cycle from End to End. Gary G. Benesko, 1996. http://www.amazon.com

  • The Knowledge Evolution: Expanding Organizational Intelligence. Verna Allee Butterworth-Heinemann (Trd); ISBN: 075069842X

  • Knowledge Management and Organizational Design (Resources for the Knowledge-Based Economy). Paul S. Myers (Editor) Butterworth-Heinemann (Trd); ISBN: 0750697490

  • Knowledge Management Tools (Resources for the Knowledge-Based Economy) Rudy L. Ruggles (Editor) Butterworth-Heinemann (Trd); ISBN: 0750698497

  • Knowledge in Organizations (Resources for the Knowledge-Based Economy) Laurence Prusak (Editor), Butterworth-Heinemann (Trd); ISBN: 0750697180

  • Metamorphosis: A Guide to the World Wide Web & Electronic Commerce: Version 2.0. Patrick G. McKeown, et al., 1997.

  • METU-Emar: An Agent-Based Electronic Marketplace on the Web. A. Dogac, I. Durusoy, S. Arpinar, E. Gokkoca, N. Tatbul and P. Kosal. Second European Conference, ECDL'98, Crete, Greece, 9/98, Lecture Notes in Computer Science, No. 1513, Springer Verlag.

  • Multimedia Data Analysis in Automating the Analysis of Human Communication, (F. Makedon, C. Owen), Invited Paper, in Proc. of the 3rd Panhellenic Conf. with International Participation: Didactics of Mathematics and Informatics in Education, Patras, Greece, 5/11/97.

  • Multimedia Publishing Systems, (F. Makedon, C. Owen and S. Rebelsky), Chapter in Handbook of Multimedia Computing, CRC Press, 1998, to appear.

  • Multiple Media Stream Data Analysis, (F. Makedon, C. Owen) in Data Highways and Information Flooding, a Challenge for Classification and Data Analysis, Springer-Verlag, 1997.

  • Multiple Media Stream Data Analysis: Theory and Applications, (F. Makedon, C. Owen), in Proc. of Gesellschaft für Klassifikation e.V., Univ. of Potsdam, Potsdam, Germany, 1997.

  • Multimedia-based Learning and Museums: Issues and Enabling Tools, (F. Makedon, J. Ford, C. Langmead, C. Owen, and S. Rebelsky), The Consortium for Computing in Small Colleges Second Annual Northeastern Conf., Boston, MA, April 25-26, 1997.

  • Multimedia Stimulus Tracking for Functional MRI, (F. Makedon, J. Ford, C. Owen, A. Saykin and T. Steinberg), in ACM Multimedia'98, Sept., Bristol, England.

  • NSF Workshop on E-Commerce, sponsored by the Computation and Social Systems (CSS) Program, National Science Foundation, September 10-12, 1998, at the IC2 Institute, The Univ. of Texas at Austin, Austin, Texas. http://cism.bus.utexas.edu/

  • The New Organizational Wealth: Managing & Measuring Knowledge-Based Assets. Karl Erik Sveiby Berrett-Koehler Pub; ISBN: 1576750140

  • Obstacles in Web Multimedia Publishing: Bringing Conf. Proc. On-line, (F. Makedon, P. Gloor and O. Van Ligten), Chapter in special issue on Multimedia Authoring, Issues on Electronic Multimedia Publishing, Kluwer Press. Also in Journal of Multimedia Tools and Applications.

  • On-Line Profits: A Manager's Guide to Electronic Commerce, Peter G. W. Keen, Craigg Ballance, 1997.

  • On Multimedia Signatures, an Enabling Technology for Web-Supported Instruction, (F. Makedon, C. Owen and J. Ford), in Proc. of ED-MEDIA'98 World Conf. on Educational Multimedia and Hypermedia, June 20-25, 1998, Freiburg, Germany. To appear.

  • Open EDI and Law in Europe: A Regulatory Framework. Andreas Mitrakas, EURIDIS, Erasmus Univ., Rotterdam. Kluwer Law International, The Hague, August 1997, pp 343. http://www.amazon.com

  • Opening Digital Markets: Advanced Strategies for Internet-Driven Commerce. Walid Mougayar, 1996.

  • Opening Digital Markets: Battle Plans and Business Strategies for Internet Commerce. Walid Mougayar McGraw-Hill; ISBN: 0070435421

  • Opening Digital Markets: MBA Strategies for Internet-Driven Commerce; Walid Mougayar, 1997. http://www.amazon.com

  • Organizing and Operating Digital Products Companies. Anitesh Barua, Ramnath Chellappa and Andrew B. Whinston, Addison-Wesley, forthcoming, 1997.

  • Parallel Text Alignment, (F. Makedon, C. Owen, J. Ford, and T. Steinberg), Digital Libraries Conf., Crete, 9/1998

  • Process Innovation: Reengineering Work Through Information Technology. Thomas H. Davenport, 1992.

  • Readings in Electronic Commerce. Ravi Kalakota and Andrew B. Whinston (eds.), Addison-Wesley Publishing, 1997.

  • Resource-Limited Hyper-Reproductions (F. Makedon, S. Rebelsky), J. of Multimedia Tools and Applications, to appear, 1998.

  • The Roles of Video in the Design, Use, and Construction of Interactive Electronic Conf. Proc. (F. Makedon, S. A. Rebelsky, P. Gloor, P. T. Metaxas, J. Ford, C. Owen). In submission to JUCS, the Journal of Universal Computer Science.

  • Secure Commerce on the Internet. Vijay Ahuja.

  • Understanding Electronic Commerce (Strategic Technology Series). David R. Kosiur, 1997.

  • The Squandered Computer: Evaluating the Business Alignment of Information Technologies. Paul A. Strassmann Information Economics Press; ISBN: 0962041319

  • Working Knowledge: How Organizations Manage What They Know. Thomas H. Davenport, Laurence Prusak. Harvard Business School Pr; ISBN: 0875846556

  • Internet Commerce. Andrew Dahl, et al., 1996.

  • Web Commerce Handbook (McGraw-Hill Series on Computer Communication). Daniel Minoli, Emma Minoli, 1997.

  • Security Protocols: International Workshop Cambridge, United Kingdom April 10-12, 1996: Proc. (Lecture Notes in Computer Science, 1189). Mark Lomas (Editor), 1997.

  • Understanding Electronic Commerce (Strategic Technology Series). David R. Kosiur Microsoft Press; ISBN: 1572315601


  • A. Appendix: Implementation Details & Dissemination

    A.1 Data Processing

    Data will be collected from multiple sources, such as credit card companies, transaction logs, experiments, surveys, trade associations, and the companies we will be working with. Bank of America, PricewaterhouseCoopers, Johnson and Johnson, Motorola, Bremer Associates, and other institutions have expressed interest in cooperating with us; some of their support letters are included in the Appendix. In addition, the Tuck School of Business at Dartmouth will give us access to its alumni network of companies and its small business programs.

    The EBITS archived data will be available in multiple formats for diverse users (e.g., novices or experts). A user will be able to visualize a history of gigabytes of credit card logs with critical points highlighted, or to explore "would have been" scenarios in which the scale of transactions changes. Our security experts (Bishop et al.) will apply different algorithms to various types of log data to automatically extract additional information that will become part of the resource (depending on what a user is looking for, or on the design of the log extraction). We will also examine the logs that vendors give us and apply programs that convert them into a useful format. In other words, we will be translating hard-to-read, machine-generated data into meaningful patterns. For example, the following entries come from a log file on Bishop's system:

    Feb 23 10:40:57 nob sendmail[15520]: KAA15520: from=bishop, size=110, class=0,
    	pri=30110, nrcpts=1, msgid=<199902231840.KAA15520@nob.cs.ucdavis.edu>,
    	relay=bishop@localhost
    
    Feb 23 10:40:57 nob sendmail[15522]: KAA15520: to=jdkrovoza@ucdavis.edu,
    	ctladdr=bishop (917/20), delay=00:00:00, xdelay=00:00:00, mailer=relay,
    	relay=baton.cs.ucdavis.edu. [169.237.6.6], stat=Sent (KAA13518 Message
    	accepted for delivery)
    

    These entries say that at 10:40:57 on Feb 23 he sent a 110-byte message to one recipient (first entry), and that it was passed to the relay host baton.cs.ucdavis.edu for delivery to jdkrovoza@ucdavis.edu, baton being a relay rather than the final host (second entry). This is an application-level log, written by the mail program itself. At the system level there would be a large number of entries for writing, connecting to the remote host, reading and writing traffic to it, checking various system statuses, and so on; on a backbone host this easily amounts to hundreds of megabytes per day. If the host is a router, the log entries are much smaller (typically source, destination, time, and perhaps size). For a transaction, one would need to log enough to reconstruct the transaction, at both the application and the system level. The precise size of the log therefore depends on the transaction's complexity, but would be roughly one entry per high-level action plus on the order of 5-10 low-level entries.
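    As a minimal sketch of the kind of log-processing program described above, the following Python code parses the two sendmail entries quoted above and joins them, through their shared queue ID (KAA15520), into one record per message. The regular expressions, the function names `parse` and `summarize`, and the record layout are illustrative assumptions, not part of sendmail or of Bishop's tools; real sendmail log lines vary with version and configuration.

```python
import re
from collections import defaultdict

# The two sendmail entries quoted above, with their wrapped continuation
# lines rejoined into single syslog lines.
LOG_LINES = [
    "Feb 23 10:40:57 nob sendmail[15520]: KAA15520: from=bishop, size=110, "
    "class=0, pri=30110, nrcpts=1, "
    "msgid=<199902231840.KAA15520@nob.cs.ucdavis.edu>, relay=bishop@localhost",
    "Feb 23 10:40:57 nob sendmail[15522]: KAA15520: to=jdkrovoza@ucdavis.edu, "
    "ctladdr=bishop (917/20), delay=00:00:00, xdelay=00:00:00, mailer=relay, "
    "relay=baton.cs.ucdavis.edu. [169.237.6.6], "
    "stat=Sent (KAA13518 Message accepted for delivery)",
]

# Syslog prefix: timestamp, host, program[pid], then the sendmail queue ID.
HEADER = re.compile(
    r"^(?P<time>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) (?P<prog>[\w.-]+)"
    r"\[(?P<pid>\d+)\]: (?P<qid>\w+): (?P<rest>.*)$"
)
# key=value pairs: a value runs until the next ", key=" or the end of the
# line, so values containing spaces or parentheses stay intact.
FIELD = re.compile(r"(\w+)=(.*?)(?=, \w+=|$)")


def parse(line):
    """Split one sendmail syslog line into header fields plus key=value fields."""
    m = HEADER.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record.update(FIELD.findall(record.pop("rest")))
    return record


def summarize(lines):
    """Group entries by queue ID: sender data from 'from=' lines,
    one delivery record per 'to=' line."""
    messages = defaultdict(dict)
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # skip lines that are not sendmail entries
        msg = messages[rec["qid"]]
        if "from" in rec:
            msg.update(time=rec["time"], sender=rec["from"],
                       size=int(rec["size"]), nrcpts=int(rec["nrcpts"]))
        elif "to" in rec:
            msg.setdefault("deliveries", []).append(
                {"to": rec["to"], "relay": rec["relay"], "stat": rec["stat"]})
    return dict(messages)


print(summarize(LOG_LINES))
```

    Keying on the queue ID rather than the process ID matters here: the two entries were written by different sendmail processes (15520 and 15522) but describe the same message.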

    A.2 Dissemination Public Service By-Products

    Building a comprehensive resource for research on preparing citizens for the dangers of EC misuse means balancing two considerations: giving the user an insider's view of powerful ways of manipulating information, without compromising the anonymity or privacy of the data sources. Solving this problem is the first step to building EBITS. We will therefore construct a layered system of data access for (a) EBITS administrators, (b) members of the EBITS research group, and (c) registered EBITS users, who will be required to sign a contract on data preservation and ethical conduct.

    The dissemination process will balance the need for democratic access to EC knowledge against the concerns outlined in the previous paragraph. Note, however, that EBITS will be accessible primarily to social and behavioral scientists who register by signing a specific agreement.
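    One way to picture the layered access scheme is as ordered tiers, where each tier sees everything the tiers below it see, with non-registered public visitors at the bottom. The following minimal Python sketch assumes strictly nested tiers and a single signed-agreement flag for registered users; the tier names, dataset names, and functions are hypothetical, and the proposal does not specify how the layers would actually be enforced.

```python
from enum import IntEnum


class AccessTier(IntEnum):
    """Hypothetical access tiers; a higher value grants strictly more access."""
    PUBLIC = 0      # non-registered visitors: summaries, demo data, glossaries
    REGISTERED = 1  # registered users bound by the data/ethics agreement
    RESEARCHER = 2  # members of the EBITS research group
    ADMIN = 3       # EBITS administrators


# Hypothetical catalogue: each dataset is tagged with the minimum tier
# allowed to see it. The names are illustrative only.
DATASETS = {
    "ec_glossary": AccessTier.PUBLIC,
    "demo_transactions": AccessTier.PUBLIC,
    "anonymized_card_logs": AccessTier.REGISTERED,
    "raw_vendor_logs": AccessTier.RESEARCHER,
    "access_audit_trail": AccessTier.ADMIN,
}


def effective_tier(tier, signed_agreement):
    """A registered user who has not signed the agreement is treated as public."""
    if tier == AccessTier.REGISTERED and not signed_agreement:
        return AccessTier.PUBLIC
    return tier


def visible_datasets(tier, signed_agreement=False):
    """Names of all datasets whose minimum tier is at or below the user's tier."""
    t = effective_tier(tier, signed_agreement)
    return sorted(name for name, needed in DATASETS.items() if t >= needed)


print(visible_datasets(AccessTier.REGISTERED, signed_agreement=True))
```

    Using an ordered enumeration keeps the check to a single comparison; a real system with overlapping, non-nested permissions would need explicit per-dataset access lists instead.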

    EBITS will also serve a broader "public service" role by providing non-registered users with a site that includes summaries of ongoing, up-to-date discussions on EC, educational software, demo data, glossaries, canned experiments, retrievable information on businesses, tables of statistics, legal information, and announcements. EBITS will also support a forum on public issues. Example topics include:

    • How to prevent socio-economic inequities resulting from the effects of Electronic Commerce.

    • Changing work modes and distributed workplaces.

    • Widening information-literacy gaps affecting minorities, women, the disabled, and small businesses.

    EC makes it possible to distribute the workplace across time and location, allowing people to work at different hours and in physically distributed environments. This in turn changes hiring practices and may increase the opportunities for women and minorities in the world of business. It also changes how products are sold, exhibited, maintained, and altered: the consumer can compare products and make inquiries while shopping via the Internet. New modes of electronic communication are bringing new job requirements, new expectations, and new work patterns. This affects many relationships between small businesses, which have little technical infrastructure, and big businesses at the forefront of technology. Social and economic gaps currently affecting minorities, women, and the disabled may narrow in certain cases with technology. Countries with technological infrastructure may have an easier time adapting to and participating in the global economy. EBITS can act as an ongoing catalyst for interdisciplinary research and discussion that would otherwise not be feasible. Research projects on these issues will link academia to business and the public sector, with an emphasis on solving problems across domains rather than within a single domain.

    A.3 EBITS Self-Start Programs

    EBITS will offer opportunities for new partnerships and broader participation by small businesses, minorities and women, health-care personnel, disabled individuals, and technically deprived institutions. Examples of supported self-start programs include:

    1. Small Business Market Place: A program that provides partners, test data, and a business design that re-engineers a small business's current transaction modes to fit an EC model. It provides a user community, evaluation criteria, and technical tools that promote data and user security.

    2. The Free Workplace Program: This program takes participating individuals, on a one-to-one basis, through several business test cases and then helps them build their own business, while offering them support and consultation for up to one year.

    3. The Social Sciences Affiliates Program: This program is specifically designed to study real cases that are motivated and set up by social and behavioral scientists and that have the potential to lead to changes in public policy. Hands-on cases of privacy and copyright violations are studied.

    4. The Information Technology Program: This program follows a thread of information technology development on a case-by-case basis. Each participant is linked to a group of resources and experts who guide the participant through a program that includes training, experimentation, and development.

    A.4 EBITS Minority Education Programs

    Only an educated electronic commerce populace will significantly ameliorate the dangers of electronic commerce. How can we educate users (and corporations) about what constitutes an attack, and how to respond to one? How can we educate system administrators to use the tools currently available to analyze logs, and how can we create new, simple, user-friendly interfaces to tools that analyze logs? EBITS will enable social scientists to answer some of these questions by providing data and mechanisms for manipulating the data.

    At the same time, these technological developments generate problems that need to be studied, such as loss of privacy, copyright infringement, and other violations, for which a certain level of education is needed. The best way to achieve this education is with simulation tools that social scientists can use to ask "what if" questions. In cooperation with the CIMIC center at Rutgers, education programs will be designed (an MS in EC is already administered by two investigators, Adam and Yesha). Education priorities regarding EC are:

    1. Retraining of employees in health management.

    2. Career management and counseling for minority-owned businesses.

    3. Informational mechanisms to ensure new means of earning.

    4. Testbeds for experimenting with new products and new business ideas.

    5. Training on evaluation and feedback for small venture experiments in electronic transactions.

    6. Tutorials on data protection, security, encryption, authoring new multimedia documents, and other topics.


    B. Appendix: EBITS System Diagrams


    C. Appendix: Intuitive User Interfaces

    The example in section 4.7 illustrates the need for intuitive interfaces that can guide the user through the EBITS information space. Support will also be needed for formulating search queries and for designing new experiments that use the archived data and software, while the underlying complex software remains hidden from the user. Depending on the desired parameters of the search (e.g., user expertise or complexity of the domain) and the features of the data (size and format), a taxonomy of stored data will be provided, as well as a glossary of EC terms.

    Retrieval and Data Mining Facilities

    To collect additional or "meta" information, traditional tools of text retrieval can be combined with tools developed at Dartmouth for Cross Modal Information Retrieval to provide seamless access to heterogeneous types of data (such as searching for a segment of speech through its related textual transcript, or searching for a video segment through a video frame). These tools allow a user to query for information "across modalities".
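    To make querying "across modalities" concrete, here is a minimal sketch of searching speech through its transcript: if each transcript segment has been aligned with a time span in the recording, a text query can return the audio spans to play back. This is not the Dartmouth tools' implementation; the `Segment` structure, the sample transcript, and the all-words matching rule are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment aligned to a time span in an audio recording."""
    start: float  # seconds from the start of the recording
    end: float
    text: str


# Hypothetical aligned transcript; in practice the alignment would come
# from a separate speech/text synchronization step.
TRANSCRIPT = [
    Segment(0.0, 4.2, "Welcome to the electronic commerce seminar"),
    Segment(4.2, 9.8, "today we discuss credit card fraud on the web"),
    Segment(9.8, 15.0, "and how transaction logs reveal misuse"),
]


def find_audio(query, transcript):
    """Return (start, end) audio spans whose transcript text contains
    every word of the query (case-insensitive)."""
    words = query.lower().split()
    hits = []
    for seg in transcript:
        tokens = set(seg.text.lower().split())
        if all(w in tokens for w in words):
            hits.append((seg.start, seg.end))
    return hits


print(find_audio("card fraud", TRANSCRIPT))  # audio spans to play back
```

    The text modality does the searching and the time alignment carries the result over to the audio modality; the same pattern would apply to locating video segments through associated text.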

    Authoring

    Authoring is an integral part of research, and all components for composing multimodal documents (data, statistics, tables, audio, graphics, and other media) will be included. Authoring and retrieval will be closely integrated.

    Searching and Browsing

    A user will be able to retrieve related information from internal or external databases (see EBITS architecture diagram) using two types of techniques: searching, in which she poses a specific query and gets a set of answers, and browsing, in which she is not looking for a specific thing but wants to explore a given set of things. Different search engines will index text and return URLs whose associated text matches the search text. However, since search engines do not make it convenient to assess the contents of Web materials quickly and interactively, we will also use "Browse Engines," a term coined at the Dartmouth Experimental Visualization Laboratory (DEVLAB) for a class of tools that index Web material quickly but present it with its formatting, links, and reduced-quality graphics. This makes it possible to evaluate Web materials as they appear and to browse search results by following links between documents, all without incurring the overhead of online connections to retrieve materials. Search and browsing results will be visualized for the user in ranked order, and rankings can be based on various parameters. Clustering results and displaying descriptions and representatives of each cluster is another potentially useful technique for organizing and presenting results.
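    Ranking "based on various parameters" can take many forms. As one illustrative choice, not necessarily what EBITS would use, this sketch ranks documents by the cosine similarity between the term-frequency vectors of the query and of each document's text. The example.org URLs and document texts are placeholders.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (url, score) pairs sorted by decreasing similarity to the query."""
    q = Counter(tokens(query))
    scored = [(url, cosine(q, Counter(tokens(text)))) for url, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder documents standing in for indexed Web pages.
DOCS = {
    "http://example.org/fraud": "credit card fraud in online stores",
    "http://example.org/supply": "supply chain transactions and inventories",
    "http://example.org/logs": "audit logs help detect card misuse",
}

for url, score in rank("card fraud", DOCS):
    print(f"{score:.2f}  {url}")
```

    Other parameters (recency, source reliability, user expertise) could be folded in by combining several such scores into a weighted sum before sorting.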

    Navigation and Conceptual Clustering

    Navigation based on automatic recognition of the structure and content of the Web is still not fully automated and requires human intervention to be useful. Automatic organization of retrieved or authored multimedia information requires tools for clustering (grouping) related information in a multimodal sense: not just images that look alike, but images and text segments that describe similar things. Multimodal clustering, however, is still an unsolved problem. Cybermap, a tool partially developed at Dartmouth (Gloor et al.), automatically generates overview maps for textual documents. It builds a graph by clustering documents with related content into nodes and automatically generating links between semantically related nodes. The resulting graph can be viewed in multiple representations, providing quick access to information and data filtering on the Web. Cybermap is useful for organizing information and will include:

    • a filter that extracts keywords on which clustering can be based;

    • a map-drawing facility that records the history of searching or browsing;

    • a security mechanism for checking the authenticity of multimedia data added to a cluster;

    • a categorization and alarm facility, in which requests with certain traits fall into different categories and alarms go off when a particular situation is reached;

    • a visual interface that allows private and public annotations next to each cluster, or automatic linking to a commercial (such as medical/pharmaceutical) atlas;

    • a help facility for altering the parameters on which clustering is based, changing the average cluster size, or changing the types of acceptable data.
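    The text does not say how Cybermap measures relatedness, so the following is a stand-in technique rather than Cybermap's algorithm. It illustrates the same idea (a keyword filter, links between related documents, clusters as map nodes): it extracts keywords, links document pairs whose Jaccard keyword overlap meets a threshold, and treats connected groups of linked documents as clusters. The stopword list, the 0.2 threshold, and the sample documents are arbitrary assumptions.

```python
from itertools import combinations

# Arbitrary illustrative stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is"}


def keywords(text):
    """Crude keyword filter: lower-cased words minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard(a, b):
    """Overlap of two keyword sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0


def build_map(docs, threshold=0.2):
    """Link document pairs whose keyword overlap meets the threshold and
    return (links, clusters), where clusters are connected components."""
    kw = {name: keywords(text) for name, text in docs.items()}
    links = [(x, y) for x, y in combinations(sorted(docs), 2)
             if jaccard(kw[x], kw[y]) >= threshold]

    # Union-find over the links groups transitively linked documents.
    parent = {name: name for name in docs}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for x, y in links:
        parent[find(x)] = find(y)

    clusters = {}
    for name in sorted(docs):
        clusters.setdefault(find(name), []).append(name)
    return links, sorted(clusters.values())


DOCS = {
    "fraud_report": "credit card fraud detection",
    "log_study": "detection of card fraud in logs",
    "supply_note": "supply chain inventory",
    "inventory_note": "inventory in the supply chain",
}

links, clusters = build_map(DOCS)
print(links)
print(clusters)
```

    Each cluster would become a node on the overview map and each link an edge; raising or lowering `threshold` is one way to expose the "average size of the cluster" parameter mentioned above.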



    This experimental web server is part of the DEVLAB,
    which in turn is part of the Department of Computer Science at Dartmouth College.

    This page is maintained by devlab@cs.dartmouth.edu.