PROVEE - PROgressiVe Explainable Embeddings

Deep Neural Networks (DNNs), and their resulting **latent or embedding data spaces, are key to analyzing big data** in various domains such as vision, speech recognition, and natural language processing (NLP). However, embedding spaces are high-dimensional and abstract, and thus not directly understandable. We aim to develop a software framework to visually explore and explain how embeddings relate to the actual data fed to the DNN. This enables both DNN developers and end-users to understand the currently black-box behavior of DNNs, leading to better-engineered networks and to explainable, transparent DNN systems whose behavior can be trusted by their end-users. Our central aim is to open DNN black boxes and make complex data understandable for data-science novices; raising trust and transparency in this way is a core topic in visual analytics (VA) and NLP research. PROVEE will advertise and apply VA in a wider scope, with impact across sciences (medicine, engineering, biology, physics) where researchers use big data and deep learning.

🧐 About

In this repository you will find PROVEE, short for PROgressiVe Explainable Embeddings, a visual-interactive system for representing embedding data spaces in a user-friendly 2D projection. The idea behind Progressive Analytics, as described for example by Fekete and Primet, is to provide a rapid data-exploration pipeline with a feedback loop from the system to the analyst at a latency below about 10 seconds; research has shown that humans performing exploratory analysis need such a latency to remain focused and to use their short-term memory efficiently. PROVEE’s goals are therefore (1) to provide increasingly meaningful partial results while the algorithms execute and (2) to provide visualizations that minimize distraction by not changing views excessively, all while scaling to large input data and remaining memory-efficient. Note that these goals are adapted from the aforementioned publication.
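
The loop below is a minimal, hypothetical sketch (in Python, not PROVEE code) of the progressive pattern just described: data is processed in chunks, and an updated partial result is emitted whenever a fixed latency budget elapses, so the analyst is never left waiting. The running mean merely stands in for whatever partial analysis result the real system would compute.

```python
# Hypothetical sketch of a progressive feedback loop (not PROVEE's implementation).
# A running mean stands in for the partial analysis result; in the real system each
# chunk would involve far heavier work (e.g. refining a projection of embeddings).
import time
import numpy as np

def progressive_mean(data, chunk_size=10_000, latency_budget=10.0):
    """Yield an increasingly accurate mean of `data`, emitting a partial result
    at the latest every `latency_budget` seconds and once more at the end."""
    count, total = 0, 0.0
    last_emit = time.monotonic()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        count += len(chunk)
        total += chunk.sum()
        if time.monotonic() - last_emit >= latency_budget:
            yield total / count          # partial result the view can show immediately
            last_emit = time.monotonic()
    yield total / count                  # final result

if __name__ == "__main__":
    data = np.random.rand(1_000_000)
    for partial in progressive_mean(data, latency_budget=1.0):
        print(f"current estimate of the mean: {partial:.4f}")
```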

PROVEE’s architecture includes (1) analysis algorithms (in particular, incremental projection algorithms such as IPCA), (2) intuitive, local user interfaces and visualizations, and (3) intermediate data storage and transfer. Core to our system is an innovative, progressive analysis workflow targeting a human-algorithm feedback loop with a latency under ~10 seconds, maintaining the user’s efficiency during exploration tasks. PROVEE will be scalable to big data, generic (able to handle data from many application domains), and easy to use (requiring no specialist programming from the user). Please also refer to our Performance and feature comparison for the visualization and analysis tools against which we compared PROVEE.
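
As a concrete illustration of point (1), the snippet below sketches an incremental projection step using scikit-learn’s IncrementalPCA as a stand-in for the IPCA component; PROVEE’s actual pipeline and dependencies may differ, and the embedding data here is random and purely illustrative. Each partial_fit call refines the projection model on a new batch, and the intermediate 2D coordinates are exactly the kind of partial result a progressive view can render right away.

```python
# Hedged sketch of incremental projection with scikit-learn's IncrementalPCA
# (a stand-in for PROVEE's IPCA step; data and sizes are illustrative only).
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50_000, 128))      # e.g. DNN embedding vectors

ipca = IncrementalPCA(n_components=2)
batch_size = 5_000
for start in range(0, len(embeddings), batch_size):
    batch = embeddings[start:start + batch_size]
    ipca.partial_fit(batch)                      # refine the projection on this batch
    partial_2d = ipca.transform(embeddings[:start + batch_size])
    # `partial_2d` holds the intermediate 2D coordinates a progressive view could draw
```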

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

To follow the steps below you need Git (to clone the repository) and a Conda installation such as Miniconda or Anaconda (to create and manage the Python environment).

Installing

Clone the latest PROVEE repository from GitLab.

git clone https://git.science.uu.nl/vig/provee/provee-local-projector.git

Create a conda environment. Your new environment will be named ‘provee’.

conda env create -f environment.yml

Activate the environment.

conda activate provee

To deactivate the environment, use

conda deactivate

Running

To run the project, first activate the conda environment, then run main.py from inside the LocalProjector directory.

cd LocalProjector
python main.py

🔧 Running the tests

The tests can be found under the folder ‘LocalProjector/test’.

Basic Unit Tests

Tests are run with pytest. First activate the conda environment; then, from the repository root, run:

pytest LocalProjector/test/

To enable coverage, use:

pytest --cov=LocalProjector/src/ LocalProjector/test/

🎈 Usage

Notes about how to use the system are TBD; a video is coming soon.

🚀 Deployment

If you want to deploy a live system, refer to the Deployment Guide.

⛏️ Built Using

  • Vispy - 2D visualization

  • Faiss - K-Nearest Neighbours (see the sketch after this list)

  • gRPC - Microservices

  • PyQt5 - Signaling & Service
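
The snippet below is a small, hedged illustration of the kind of k-nearest-neighbour lookup Faiss provides over embedding vectors; the index type, dimensions, and data are assumptions made for the example and do not necessarily reflect how PROVEE configures Faiss internally.

```python
# Illustrative Faiss k-NN query over random embedding vectors
# (exact L2 index; not necessarily how PROVEE uses Faiss internally).
import numpy as np
import faiss

d = 128                                                   # embedding dimensionality
database = np.random.rand(10_000, d).astype("float32")    # stored embeddings
queries = np.random.rand(5, d).astype("float32")          # embeddings to look up

index = faiss.IndexFlatL2(d)                              # exact nearest-neighbour index
index.add(database)                                       # add the stored vectors
distances, neighbours = index.search(queries, 10)         # 10 nearest neighbours per query
```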

✍️ Authors

See also the list of contributors who participated in this project.

🎉 Acknowledgements

  • Hat tip to anyone whose code was used