Office 365 eDiscovery

New to Office 365: Advanced eDiscovery


In January 2015 Microsoft announced that it had acquired Equivio, a provider of eDiscovery, compliance and governance tools. Equivio uses machine learning techniques in its technology to improve the process of eDiscovery.

Today we are able to see how Microsoft has integrated Equivio product features such as text analytics and predictive coding into the Office 365 Advanced eDiscovery (O365 AED) tool.

In this short post, I present a high-level definition of eDiscovery; a brief overview of the kinds of activities where Office 365 Advanced eDiscovery can be applied; an explanation of how the integration of Equivio improves the product; and a list of business scenarios where this technology can be effective.

What is eDiscovery?

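eDiscovery (electronic discovery) is the process of identifying, collecting and producing electronically stored information as evidence. An eDiscovery tool is software which improves the process of collecting that information from digital sources such as file shares, documents, emails, SMS, IM, web pages and social networks.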

Most eDiscovery tools are designed to reduce time and costs during digital evidence gathering by a legal team.

eDiscovery is especially useful when attempting to pull together chunks of hard-to-find unstructured information like text, images or videos found in digital sources that lack custom metadata.

An eDiscovery tool also focuses on improving the process of reviewing and exporting information. For digital evidence gathering, this currently makes it a better choice than an advanced search, which only finds information.

What can I do with Office 365 Advanced eDiscovery?

It seems Microsoft has built many useful features into Office 365 Advanced eDiscovery to meet the demands of users during most types of information gathering.

Below I have highlighted the O365 AED capabilities which I think are most appropriate during evidence gathering:

  • Automates the grouping of similar information for review by user groups.
  • Automatically calculates the cost of reviewing documents once information has been filtered and surfaced.
  • Makes easy work of mining information from large information sources or datasets.
  • Decides which information is ‘relevant’.
  • Exports filtered information into a package and/or generates reports for submission to a government, legal or regulatory body.
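
To make the idea of "grouping similar information" more concrete, here is a minimal Python sketch of one common way to do it: break each document into overlapping three-word "shingles", compare documents with Jaccard similarity, and group those which score above a threshold. This is purely illustrative and is not how Office 365 Advanced eDiscovery is implemented; the function names, the sample documents and the 0.5 threshold are all my own assumptions.

```python
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ('shingles') in text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_near_duplicates(documents, threshold=0.5):
    """Group document ids whose pairwise similarity meets the threshold."""
    # Simple union-find: every document starts in its own group.
    parent = {doc_id: doc_id for doc_id in documents}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = {doc_id: shingles(text) for doc_id, text in documents.items()}
    for a, b in combinations(documents, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for doc_id in documents:
        groups.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample documents, invented for this illustration.
documents = {
    "memo-1": "please review the attached contract before the end of the week",
    "memo-2": "please review the attached contract before the end of this week",
    "note-3": "lunch menu for the staff canteen on friday",
}
groups = group_near_duplicates(documents)
print(groups)  # [['memo-1', 'memo-2'], ['note-3']]
```

The two memos differ by a single word, so they land in the same group and could be handed to one reviewer, while the unrelated note stays on its own. Comparing every pair of documents is fine for a handful of files but would be far too slow for a real dataset.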

How does the integration of Equivio Zoom improve the eDiscovery process?

Microsoft has integrated the following features of Equivio Zoom to make Office 365 Advanced eDiscovery more efficient:

  • Multidimensional Analysis: Analyse data across more than two dimensions. An example would be a data view showing the number of customers who transacted more than £500 in a day over 5 years of transactional history, compared with the average customer transaction per day.
  • Machine Learning: Equivio’s state-of-the-art machine-learning algorithms have been incorporated into the near-duplication detection, email threading, new generation clustering, and predictive coding features.
  • Near Duplication Detection: To avoid duplicated effort, similar documents can be grouped automatically and reviewed together by one user, instead of being spread across multiple reviewers.
  • Email Threading: Computerised separation of unique messages from extensive email threads.
  • Themes (aka new generation clustering): Intuitive grouping of similar documents.
  • Predictive Coding: Users are now able to teach eDiscovery to predict which documents are relevant (R) and not relevant (NR) for review by manually assigning R or NR to a small percentage of the processed information. Utilising machine learning, keyword search, and filtering, eDiscovery predicts the relevant information to be surfaced for review.
  • Text Analytics: With some configuration effort, users are able to identify sentiment, key phrases, language, and topics from unstructured information.
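
The predictive coding workflow described above (tag a small sample as R or NR, then let the tool predict the rest) can be sketched with a tiny naive Bayes text classifier. This is a toy illustration under my own assumptions, not Equivio's or Microsoft's actual algorithm; the function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case a document and split it into words."""
    return text.lower().split()


def train(labelled_docs):
    """Count words per label from (text, label) pairs tagged 'R' or 'NR'.

    Assumes at least one document has been tagged with each label.
    """
    word_counts = {"R": Counter(), "NR": Counter()}
    doc_counts = Counter()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["R"]) | set(word_counts["NR"])
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Return the more probable label using Laplace-smoothed naive Bayes."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("R", "NR"):
        total_words = sum(word_counts[label].values())
        # Start from the share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


# A small, manually tagged sample (hypothetical documents).
seed = [
    ("invoice payment overdue for supplier contract", "R"),
    ("contract dispute over supplier payment terms", "R"),
    ("team lunch on friday in the canteen", "NR"),
    ("office party photos from friday", "NR"),
]
model = train(seed)

unreviewed = [
    "supplier payment dispute escalated",
    "friday canteen lunch menu",
]
predictions = {text: predict(model, text) for text in unreviewed}
print(predictions)
```

With this seed set, the first unreviewed document is predicted as R and the second as NR. A real tool would work from a much larger tagged sample and would typically report a relevance score for each document rather than a bare label.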

Office 365 Advanced eDiscovery Use Cases

Below I have listed a range of corporate legal scenarios in which Office 365 Advanced eDiscovery could be used to improve the evidence gathering process in your organisation:

  • Breach of Security: Quickly pin-point how, where, and when a breach of security or data leak occurred.
  • Compliance with Regulations: A regulatory body requests a sizable chunk of information about your organisation such as financial records or business processes for prompt submission.
  • Transaction Dispute: Disputes regarding multiple, complex, and historical transactions. For example, an organisation is required to submit transactional data for legal analysis to prove its innocence in a trading dispute.
  • Fraudulent Activity: Information requested to be surfaced rapidly for submission during a fraud investigation.
  • Employee Tribunals: Speedy collection of employee information for review before a tribunal.
  • Criminal Investigations: Collection of specific information requested by the police for assessment.


Unstructured information – Information which lacks a data model or is not organised in a defined manner. Unstructured information typically is text-heavy, but could contain dates, numbers and facts.

Structured information – The opposite of unstructured information. Information which incorporates a data model and is organised in a defined manner. Structured information is typically found in a database or table. It contains tags, markers, hierarchies, and can be relational to other information.

Machine learning – A branch of artificial intelligence which allows a computer to ‘learn’ from data within the defined parameters of its core programming. Machine learning enables computers to perform analytical activities efficiently so that they can recognise patterns and make predictions based on data.

Custom metadata – Information created by a user to ‘describe’ information. A basic example would be tagging a document with specific information such as a time, date, or author.

Digital evidence – Any electronic-based evidence, as opposed to physical evidence.