DIAGNOSIS AND ASSESSMENT OF FAULTS, MISBEHAVIOR AND THREATS IN DISTRIBUTED SYSTEMS AND NETWORKS

Funded by NSF, under Information Technology Research

Universities involved: University of Illinois (lead), Boston University, MIT, University of Oklahoma, Yale




Overall Objectives:

This research project develops theory and techniques for monitoring and diagnosing faults, hazards or, more generally, functional changes in dynamic systems and networks, under limited and possibly corrupted information. We aim to develop a unifying and multifaceted approach to this problem by decomposing the large body of fault diagnosis research into six topics:

  • deterministic fault diagnosis,
  • model-based probabilistic diagnosis,
  • adaptive and sequential diagnosis,
  • distributed system-level diagnosis with communication constraints in wired/wireless networks,
  • fault diagnosis via distributed belief propagation algorithms, and
  • model-independent diagnosis.

Our research team will apply its expertise in fault diagnosis, sequential detection, system-level diagnosis, distributed control, modeling, analysis and performance evaluation, applied probability, graph theory, belief propagation and model reduction to the problem of detecting, identifying and localizing faults and abnormalities in dynamically evolving environments.

Beyond its intellectual value, the proposed research program will have broader impacts in a variety of ways. As networks and networked systems solidify their roles as building blocks of the nation's economic and social foundation (with numerous emerging commercial, governmental, medical, military and security applications), there is a growing need to ensure that these critical infrastructures are reliable and trustworthy in spite of malicious or non-malicious disruptions. Building trustworthy networked systems from off-the-shelf components and software is a significant hurdle that must be overcome in order to exploit the full potential of networked systems. This project aims to outline a synergistic and comprehensive approach to scalable methodologies for diagnosing faults, adversarial behavior and threats in complex systems and networks, under uncertain information and possibly in the presence of communication errors and constraints. The project will have ramifications for the monitoring, testing, and reliable and secure operation of networked systems, communication networks, and complex digital systems; it will also contribute to the development of distributed algorithms for fault diagnosis and make distributed systems more reliable overall.

Intellectual Merit:

The intellectual merit of this proposal lies in the synergistic and comprehensive exploration of different dimensions within the broad area of detection and identification of faults or, more generally, abnormal behavior in complex dynamic systems and networks. The ultimate goal is to develop appropriate models and innovative distributed algorithms that integrate and unify techniques from a number of diverse disciplines, including fault diagnosis in discrete event systems, detection and estimation, graph theory and optimization, distributed system-level diagnosis, belief propagation, model reduction and information theory. Apart from advancing the forefront of the various individual approaches to diagnosis, the overarching theme is the integration of these ideas into a well-defined approach that achieves the advantages of both deterministic and probabilistic methodologies via scalable models and algorithms. While extending the frontiers in the broad area of fault diagnosis in complex dynamic systems and networks, this research will at the same time apply these techniques to the design of test platforms for experimenting with distributed fault diagnosis in ad-hoc mobile networks and fault localization in indoor sensor networks.

Educational Impact and Objectives:

The main educational goals of this program are two-fold:

(i) To develop courses and educational materials that present systematic approaches to algorithms and architectures for fault diagnosis and fault tolerance in complex systems and networks.

(ii) To continue to actively recruit and mentor participants from underrepresented groups in our respective research programs.