Modified: 2024-04-30

Executive summary

Strise’s technology discovers relevance and patterns in large volumes of external, unstructured data by using a rich and continuously evolving Knowledge Graph.

Strise has engineered a complete machine learning ecosystem that handles every stage, from supervision and training of models to evaluation and operational deployment. The system is designed to scale and is built on a flexible cloud infrastructure and state-of-the-art big data frameworks for stream, batch, and graph processing. Reinforced with strict quality assurance, multiple levels of testing, monitoring, and analytics, Strise delivers a world-class technology platform performing beyond established industry standards in areas of machine learning, code quality, and product usability.

The heart of Strise's technology is the Knowledge Graph: a comprehensive ontology of more than 1.2 billion interconnected entities, continuously updated and refined by a daily stream of ~3 million events from over 200,000 sources covering more than 70 languages. This lets Strise represent a unique and unbiased snapshot of how the business world is connected at any given moment.

By utilizing proprietary techniques that combine Entity Linking, clustering, and classification, Strise can detect relevant events related to entities such as companies, industries, products, or locations. Combined with sophisticated graph traversal, the system can see how the implications and relevance of events propagate over both explicit event mentions and implicit associative relationships (e.g. identifying that an ultimate beneficial owner (UBO) sits on the board of a company that has recently been sanctioned, and proactively notifying you of adverse signals within your customer base).
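To make the idea of propagation over implicit relationships concrete, here is a minimal sketch in Python. It is not Strise's proprietary method: it assumes a tiny in-memory graph, hypothetical relation names (`UBO_OF`, `BOARD_MEMBER_OF`) and a plain breadth-first search with a hop limit, standing in for the graph traversal described above.

```python
from collections import deque

# Hypothetical toy graph: (source, relation, target). Names are illustrative only.
EDGES = [
    ("Alice", "UBO_OF", "CustomerCo"),
    ("Alice", "BOARD_MEMBER_OF", "SanctionedCo"),
    ("Bob", "UBO_OF", "OtherCo"),
]


def build_adjacency(edges):
    """Index edges in both directions so signals can propagate either way."""
    adj = {}
    for src, rel, dst in edges:
        adj.setdefault(src, []).append((rel, dst))
        adj.setdefault(dst, []).append((rel, src))
    return adj


def find_exposed(adj, flagged, customers, max_hops=2):
    """Breadth-first search from a flagged entity.

    Returns {customer: path}, where path alternates entity and relation names.
    """
    exposed = {}
    visited = {flagged}
    queue = deque([(flagged, 0, [flagged])])
    while queue:
        node, hops, path = queue.popleft()
        if node in customers and node != flagged:
            exposed[node] = path
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, hops + 1, path + [rel, nxt]))
    return exposed


adj = build_adjacency(EDGES)
result = find_exposed(adj, "SanctionedCo", {"CustomerCo", "OtherCo"})
```

In this toy data, `CustomerCo` is reached through Alice's board seat, while `OtherCo` has no connection to the sanctioned company and is not reported. The returned path is what would let a notification explain why a customer was flagged.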

Architectural overview

Strise has designed and built a sophisticated SaaS architecture that processes real-time data in parallel with batch operations for training machine learning and entity linking models and for engineering knowledge graphs.

Data, models, and results are exchanged using technologies such as GraphQL, gRPC, Apache Spark, Neo4j, and Elasticsearch, all deployed to Kubernetes in Google Cloud, orchestrated by Helm, and fully automated through a robust CI/CD pipeline using Jenkins and Terraform. This ensures that every update to our core system and related services is thoroughly tested and benchmarked before it is made available to our users.

Both the data processing layer and the API layer are implemented as an autoscaled ensemble of services written mainly in Scala. Client-facing applications are built with TypeScript and React. Scala was chosen for the backend because it is a robust, flexible, and concise statically typed language, created with the functional programming paradigm in mind, which gives Strise agility in developing new features and a short time-to-market.

An overview of the core components of Strise’s technological architecture is shown below.

Figure: Strise core technology architecture

(1) Content enrichers

Strise ingests content from a variety of sources, ranging from official updates about a company (e.g. BRREG) to media content and social media updates. This means the system needs to handle a large number of integrations, as well as exotic data formats and delivery mechanisms.

To achieve this in a scalable way, all the incoming data goes through a content normalization step, powered by a swarm of services responsible for fetching, cleaning, and normalizing content into a common format for further processing.
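The normalization step can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `NormalizedContent` fields, the two source kinds, and the input field names (`heading`, `text`, `headline`, `html`) are hypothetical and do not describe Strise's actual common format or services.

```python
import html
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedContent:
    """Hypothetical common format shared by all downstream steps."""
    source: str
    title: str
    body: str


_TAG = re.compile(r"<[^>]+>")
_WS = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Strip HTML tags, unescape entities, and collapse whitespace."""
    no_tags = _TAG.sub(" ", raw)
    return _WS.sub(" ", html.unescape(no_tags)).strip()


def from_registry(record: dict) -> NormalizedContent:
    # Assumed shape of a registry update: {"heading": ..., "text": ...}
    return NormalizedContent(
        "registry", clean_text(record["heading"]), clean_text(record["text"])
    )


def from_news(record: dict) -> NormalizedContent:
    # Assumed shape of a news item: {"headline": ..., "html": ...}
    return NormalizedContent(
        "news", clean_text(record["headline"]), clean_text(record["html"])
    )


# One normalizer per integration; adding a source means adding one entry.
NORMALIZERS = {"registry": from_registry, "news": from_news}


def normalize(kind: str, record: dict) -> NormalizedContent:
    return NORMALIZERS[kind](record)


doc = normalize(
    "news",
    {"headline": "Acme &amp; Co  expands",
     "html": "<p>Acme opens a <b>new</b> office.</p>"},
)
```

Keeping one small adapter per source and a single shared output type is one way to keep the number of integrations from leaking into later processing stages.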

While some preliminary content processing is done in this step, the content remains rather unstructured in terms of semantic understanding.

(2) Real-time NLP pipeline

Once the content is translated and prepared in a common format, the work of extracting semantic information from it begins. In layman's terms, this means “understanding what the content is really about”. This includes identifying Entities such as Companies, Locations, and Topics, as well as assessing their relevance to the content.
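As a rough illustration of identifying entities and scoring their relevance, the Python sketch below uses a hand-written alias table and a mention-share score. Both are stand-ins chosen for brevity: the alias table, the entity identifiers, and the relevance formula are assumptions, not the Entity Linking models Strise uses.

```python
import re
from collections import Counter

# Hypothetical alias table: surface form -> (entity id, entity type).
ALIASES = {
    "acme": ("company:acme", "Company"),
    "acme corp": ("company:acme", "Company"),
    "oslo": ("location:oslo", "Location"),
}


def link_entities(text):
    """Return {entity_id: (type, relevance)}.

    Relevance here is simply the entity's share of all matched mentions.
    """
    lowered = text.lower()
    counts = Counter()
    types = {}
    # Match longest aliases first and blank them out, so that
    # "acme corp" is not counted a second time as "acme".
    for alias in sorted(ALIASES, key=len, reverse=True):
        pattern = r"\b" + re.escape(alias) + r"\b"
        lowered, n = re.subn(pattern, " ", lowered)
        if n:
            entity_id, entity_type = ALIASES[alias]
            counts[entity_id] += n
            types[entity_id] = entity_type
    total = sum(counts.values())
    return {eid: (types[eid], counts[eid] / total) for eid in counts}


entities = link_entities(
    "Acme Corp opens an office in Oslo. Acme plans to hire 50 people."
)
```

In this example the company is mentioned twice and the location once, so the company receives the higher relevance score. A production system would also need to disambiguate between entities that share a name, which a simple lookup cannot do.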