NetSANI (network trace sanitization) project (2009-2012)

This project is no longer active; this page is no longer updated.

Related projects: [CRAWDAD], [DIST], [MAP], [Wi-Fi-measurement]

Related keywords: [data], [privacy], [wifi]


Summary

The NetSANI (Network Trace Sanitization and ANonymization Infrastructure) project aimed to increase network-trace sharing by making it safer and easier to sanitize network traces (remove sensitive information). Sanitization always involves a challenging trade-off between sanitization effectiveness (providing anonymity for network users and secrecy for network operational information) and research usefulness (since only the information retained can be used by the researcher).

To this end, NetSANI was intended to be a flexible and extensible suite of software tools for sanitizing network traces, driven by user-specified sanitization goals and user-specified research goals. We never fully achieved that goal, but we did conduct research on anonymization (and de-anonymization).
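As a concrete illustration of the trade-off described above, the sketch below sanitizes a toy user-association log, where each record is a MAC address, an access point (AP), and a timestamp. It is not NetSANI code. It assumes two common techniques mentioned in the abstracts on this page: replacing MAC addresses with consistent pseudonyms, and generalizing exact association locations. The secret key, the `BldgA-AP3` AP naming scheme, and the function names are all hypothetical. Consistent pseudonyms keep per-user analysis possible (utility). As the correlation-attack papers listed on this page show, however, that same consistency can let an adversary build per-user signatures (privacy risk).

```python
import hashlib
import hmac

# Hypothetical secret held by the trace owner; it must never be released with the trace.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize_mac(mac: str, key: bytes = SECRET_KEY) -> str:
    """Map a MAC address to a stable pseudonym using HMAC-SHA256.

    The same MAC (in any common notation) always yields the same pseudonym,
    so per-user analysis still works, but the mapping cannot be reversed
    without the key.
    """
    normalized = mac.strip().lower().replace("-", ":")
    digest = hmac.new(key, normalized.encode("ascii"), hashlib.sha256).hexdigest()
    return "user-" + digest[:12]


def generalize_ap(ap_name: str) -> str:
    """Coarsen an AP name such as 'BldgA-AP3' to its building 'BldgA'.

    Assumes a hypothetical '<building>-<ap>' naming scheme.
    """
    return ap_name.split("-", 1)[0]


def sanitize_record(record: tuple[str, str, int]) -> tuple[str, str, int]:
    """Sanitize one (mac, ap, timestamp) association record; timestamps are kept as-is."""
    mac, ap, timestamp = record
    return (pseudonymize_mac(mac), generalize_ap(ap), timestamp)


if __name__ == "__main__":
    log = [
        ("00:1A:2B:3C:4D:5E", "BldgA-AP3", 1000),
        ("00-1a-2b-3c-4d-5e", "BldgB-AP1", 1600),  # same device, different notation
        ("66:77:88:99:AA:BB", "BldgA-AP2", 1700),
    ]
    for rec in log:
        print(sanitize_record(rec))
```

The first two records map to the same pseudonym. This preserves a per-user trajectory for researchers, and that trajectory is exactly the kind of signature a correlation attack can exploit.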

People

Keren Tan, Chris McDonald, Jihwang Yeo, Phil Fazio, Guanhua Yan, and David Kotz.

Funding and acknowledgements

Funded by the US National Science Foundation (Cyber Trust) under award CNS-0831409.

The views and conclusions contained on this site and in its documents are those of the authors and should not be interpreted as necessarily representing the official position or policies, either expressed or implied, of the sponsor(s). Any mention of specific companies or products does not imply any endorsement by the authors or by the sponsor(s).


Papers (tagged 'netsani')

Papers are listed in reverse-chronological order.

2012:
Phillip A. Fazio, Keren Tan, and David Kotz. Effects of network trace sampling methods on privacy and utility metrics. Proceedings of the Annual Workshop on Wireless Systems: Advanced Research and Development (WISARD). January 2012. [Details]

Researchers choosing to share wireless-network traces with colleagues must first anonymize sensitive information, trading off the removal of information in the interest of identity protection and the preservation of useful data within the trace. While several metrics exist to quantify this privacy-utility tradeoff, they are often computationally expensive. Computing these metrics using a sample of the trace could potentially save precious time. In this paper, we examine several sampling methods to discover their effects on measurement of the privacy-utility tradeoff when anonymizing network traces. We tested the relative accuracy of several packet and flow-sampling methods on existing privacy and utility metrics. We concluded that, for our test trace, no single sampling method we examined allowed us to accurately measure the tradeoff, and that some sampling methods can produce grossly inaccurate estimates of those values. We call for further research to develop sampling methods that maintain relevant privacy and utility properties.

2011:
Keren Tan. Large-scale Wireless Local-area Network Measurement and Privacy Analysis. PhD thesis, August 2011. Available as Dartmouth Computer Science Technical Report TR2011-703. [Details]

The edge of the Internet is increasingly becoming wireless. Understanding the wireless edge is therefore important for understanding the performance and security aspects of the Internet experience. This need is especially acute for enterprise-wide wireless local-area networks (WLANs), as organizations increasingly depend on WLANs for mission-critical tasks. Studying a live production WLAN, especially a large-scale network, is a difficult undertaking. Two fundamental difficulties are (1) building a scalable network measurement infrastructure to collect traces from a large-scale production WLAN, and (2) preserving user privacy while sharing these collected traces with the network research community. In this dissertation, we present our experience in designing and implementing one of the largest distributed WLAN measurement systems in the United States, the Dartmouth Internet Security Testbed (DIST), with a particular focus on our solutions to the challenges of efficiency, scalability, and security. We also present an extensive evaluation of the DIST system. To understand the severity of some potential trace-sharing risks for an enterprise-wide large-scale wireless network, we conduct a privacy analysis of one kind of wireless network trace, a user-association log, collected from a large-scale WLAN. We introduce a machine-learning based approach that can extract and quantify sensitive information from a user-association log, even after it has been sanitized. Finally, we present a case study that evaluates the tradeoff between utility and privacy in WLAN trace sanitization.

Phil Fazio, Keren Tan, Jihwang Yeo, and David Kotz. Short Paper: The NetSANI Framework for Analysis and Fine-tuning of Network Trace Sanitization. Proceedings of the ACM Conference on Wireless Network Security (WiSec). June 2011. [Details]

Anonymization is critical prior to sharing wireless-network traces within the research community, to protect both personal and organizational sensitive information from disclosure. One difficulty in anonymization, or more generally, sanitization, is that users lack information about the quality of a sanitization result, such as how much privacy risk a sanitized trace may expose, and how much research utility the sanitized trace may retain. We propose a framework, NetSANI, that allows users to analyze and control the privacy/utility tradeoff in network sanitization. NetSANI can accommodate most of the currently available privacy and utility metrics for network trace sanitization. This framework provides a set of APIs for analyzing the privacy/utility tradeoff by comparing the changes in privacy and utility levels of a trace for a sanitization operation. We demonstrate the framework with a quantitative evaluation on wireless-network traces.

Phillip A. Fazio. Effects of network trace sampling methods on privacy and utility metrics. Technical Report, June 2011. [Details]

Researchers studying computer networks rely on the availability of traffic trace data collected from live production networks. Those choosing to share trace data with colleagues must first remove or otherwise anonymize sensitive information. This process, called sanitization, represents a tradeoff between the removal of information in the interest of identity protection and the preservation of data within the trace that is most relevant to researchers. While several metrics exist to quantify this privacy-utility tradeoff, they are often computationally expensive. Computing these metrics using a sample of the trace, rather than the entire input trace, could potentially save precious time and space resources, provided the accuracy of these values does not suffer. In this paper, we examine several simple sampling methods to discover their effects on measurement of the privacy-utility tradeoff when anonymizing network traces prior to their sharing or publication. After sanitizing a small sample trace collected from the Dartmouth College wireless network, we tested the relative accuracy of a variety of previously implemented packet and flow-sampling methods on a few existing privacy and utility metrics. This analysis led us to conclude that, for our test trace, no single sampling method we examined allowed us to accurately measure the trade-off, and that some sampling methods can produce grossly inaccurate estimates of those values. We were unable to draw conclusions on the use of packet versus flow sampling in these instances.

Keren Tan, Guanhua Yan, Jihwang Yeo, and David Kotz. Privacy analysis of user association logs in a large-scale wireless LAN. Proceedings of the Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM) mini-conference. April 2011. [Details]

User association logs collected from a large-scale wireless LAN record where and when a user has used the network. Such information plays an important role in wireless network research. One concern with sharing these data with other researchers, however, is that the logs pose potential privacy risks for the network users. Today, the common practice in sanitizing these data before releasing them to the public is to anonymize users’ sensitive information, such as their devices’ MAC addresses and their exact association locations. In this work, we aim to study whether such sanitization measures are sufficient to protect user privacy. By simulating an adversary’s role, we propose a novel type of correlation attack in which the adversary uses the anonymized association log to build a signature for each user; when combined with auxiliary information, such signatures can help to identify users within the anonymized log. Using a user association log that contains more than four thousand users and millions of association records, we demonstrate that this attack technique, under certain circumstances, is able to pinpoint the victim’s identity exactly with a probability as high as 70%, or narrow it down to a set of 20 candidates with a probability close to 100%. We further evaluate the effectiveness of standard anonymization techniques, including generalization and perturbation, in mitigating correlation attacks; our experimental results reveal only limited success of these methods, suggesting that more thorough treatment is needed when anonymizing wireless user association logs before public release.

Keren Tan, Guanhua Yan, Jihwang Yeo, and David Kotz. Privacy Analysis of User Association Logs in a Large-scale Wireless LAN. Technical Report, January 2011. [Details]

User association logs collected from a large-scale wireless LAN record where and when a user has used the network. Such information plays an important role in wireless network research. One concern with sharing these data with other researchers, however, is that the logs pose potential privacy risks for the network users. Today, the common practice in sanitizing these data before releasing them to the public is to anonymize users’ sensitive information, such as their devices’ MAC addresses and their exact association locations. In this work, we demonstrate that such sanitization measures are insufficient to protect user privacy because the differences between user association behaviors can be modeled and many are distinguishable. By simulating an adversary’s role, we propose a novel type of correlation attack in which the adversary uses the anonymized association log to build a signature for each user; when combined with auxiliary information, such signatures can help to identify users within the anonymized log. On a user association log that contains more than four thousand users and millions of association records, we demonstrate that this attack technique is able to pinpoint the victim’s identity exactly with a probability as high as 70%, and narrow it down to a set of 20 candidates with a probability close to 100%. We further evaluate the effectiveness of standard anonymization techniques, including generalization and perturbation, in mitigating this correlation attack; our experimental results reveal only limited success of these methods, suggesting that more thorough treatment is needed when anonymizing wireless user association logs before public release.

Keren Tan, Jihwang Yeo, Michael E. Locasto, and David Kotz. Catch, Clean, and Release: A Survey of Obstacles and Opportunities for Network Trace Sanitization. Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques. January 2011. [Details]

Network researchers benefit tremendously from access to traces of production networks, and several repositories of such network traces exist. By their very nature, these traces capture sensitive business and personal activity. Furthermore, network traces contain significant operational information about the target network, such as its structure, identity of the network provider, or addresses of important servers. To protect private or proprietary information, researchers must “sanitize” a trace before sharing it.

In this chapter, we survey the growing body of research that addresses the risks, methods, and evaluation of network trace sanitization. Research on the risks of network trace sanitization attempts to extract information from published network traces, while research on sanitization methods investigates approaches that may protect against such attacks. Although researchers have recently proposed both quantitative and qualitative methods to evaluate the effectiveness of sanitization methods, such work has several shortcomings, some of which we highlight in a discussion of open problems. Sanitizing a network trace, however challenging, remains an important method for advancing network-based research.


2010:
Keren Tan, Guanhua Yan, Jihwang Yeo, and David Kotz. A Correlation Attack Against User Mobility Privacy in a Large-scale WLAN network. Proceedings of the ACM MobiCom S3 workshop. September 2010. [Details]

User association logs collected from real-world wireless LANs have greatly facilitated wireless network research. To protect user privacy, the common practice in sanitizing these data before releasing them to the public is to anonymize users’ sensitive information such as the MAC addresses of their devices and their exact association locations. In this work, we demonstrate that these sanitization measures are insufficient to protect user privacy from a novel type of correlation attack that is based on CRF (Conditional Random Field). In such a correlation attack, the adversary observes the victim’s AP (Access Point) association activities for a short period of time and then infers her corresponding identity in a released user association dataset. Using a user association log that contains more than three thousand users and millions of AP association records, we demonstrate that the CRF-based technique is able to pinpoint the victim’s identity exactly with a probability as high as 70%.

2009:
Jihwang Yeo, Keren Tan, and David Kotz. User survey regarding the needs of network researchers in trace-anonymization tools. Technical Report, November 2009. [Details]

To understand what network researchers need from an anonymization tool, we conducted a survey of network researchers. We invited network researchers worldwide to participate by sending invitation emails to well-known mailing lists whose subscribers are likely to be interested in collecting, sharing, and sanitizing network traces.

