Assurance

The Detect technology stack is composed of people, processes and, lastly, technology.


Internally, we consider it less a "technology stack" than a belief that building software can be different.


Similar to a secure-by-design operating system, the Detect stack was built from the get-go around safety and security. It is broken down into components, each an encapsulated service containing a series of safety nets and contingencies to ensure it works as expected. The overall system weaves these components together into a larger ecosystem.


Assurance is approached through the following:

  • Safety mechanisms built into each component and made transparent to the end user.
  • An alert management process to ensure Detect runs quietly and safely.
  • A robust compiling process for building security tests.
  • An exhaustive validation process for new security tests.

The next few sections will outline these approaches.

Alerts

Two things should happen when an offensive action, such as a Verified Security Test, runs on an endpoint: it should be prevented and an alert should fire to a remote system where analysts can triage it.


Triaging is the process of separating real events from false positives and resolving any issues.

About alerting

Endpoints often run two agents, an EDR for preventing malicious behavior and a logging utility for sending all events to a SIEM. An alert management strategy can revolve around either.


If EDR, alerts are sent to a dashboard and often proxied into a SIEM (example: Splunk), ticketing system (example: Jira), instant messenger (example: Slack) or infrastructure alerting system (example: PagerDuty). Analysts may triage alerts from one or more of these sources.


If SIEM, alerts are stored in a database holding all events (alerts and other) collected from all endpoints through the separate logging agent. Analysts triage alerts by correlating events with detection rules through a process known as detection engineering.

Alert strategy

When deploying Detect at scale it is important to adjust your alert management strategy to avoid a large number of false positives. Partner integrations offer several built-in options which you can incorporate in your larger plan.

EDR

Most vendors support exceptions or alert suppression when an event matches a particular signature, such as originating from a particular process or file. Detect runs all tests from a signed executable which has several characteristics to build an exception around: process name, file location, file signature, file hash, test directory, test hashes, etc. Depending on the vendor, you can select one or many of these characteristics and mark matching alerts as either informational or ignored.
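As a rough illustration of how such an exception behaves, the sketch below models a rule that matches an alert only when every characteristic set on the rule agrees with the alert. The class, the field names and the "alert" fallback are hypothetical; each EDR vendor exposes its own exception format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExceptionRule:
    """Hypothetical exception: every characteristic that is set must match."""
    action: str  # "informational" or "ignored"
    process_name: Optional[str] = None
    file_location: Optional[str] = None
    file_hash: Optional[str] = None

    def matches(self, alert: dict) -> bool:
        criteria = {
            "process_name": self.process_name,
            "file_location": self.file_location,
            "file_hash": self.file_hash,
        }
        active = {key: value for key, value in criteria.items() if value is not None}
        # A rule with no characteristics never matches, so it cannot
        # silently suppress every alert.
        return bool(active) and all(alert.get(k) == v for k, v in active.items())


def disposition(alert: dict, rules: list) -> str:
    """Return the action of the first matching rule, or "alert" if none match."""
    for rule in rules:
        if rule.matches(alert):
            return rule.action
    return "alert"
```

Selecting more characteristics narrows the exception, which reduces the chance of suppressing an alert that did not come from a test.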

SIEM

Every Verified Security Test has several identifiable characteristics, most prominently the file hash. These characteristics can be automatically forwarded - as an event - to a SIEM immediately after execution. From here, these events can be matched to any alerts uncovered through the detection engineering process - and ignored as false positives. The correlation method used will depend on the rule framework in place (example: Sigma) and the process used to move matches into a ticketing system.
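A minimal sketch of that correlation, assuming both the forwarded test events and the SIEM alerts carry a file hash. The dictionary layout and the `file_hash` key are assumptions made for illustration; a real deployment would express this in its own rule framework.

```python
def split_alerts(alerts: list, test_events: list) -> tuple:
    """Separate alerts caused by known tests from alerts that need triage.

    Both inputs are lists of dicts with a hypothetical "file_hash" key.
    """
    known_hashes = {event["file_hash"] for event in test_events}
    expected, needs_triage = [], []
    for alert in alerts:
        if alert.get("file_hash") in known_hashes:
            expected.append(alert)       # produced by a test, safe to ignore
        else:
            needs_triage.append(alert)   # no matching test event
    return expected, needs_triage
```

Alerts without a matching test event stay in the triage queue, so only activity that Detect itself reported is treated as a false positive.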

Compute

Prelude hosts an internal service, called Compute, responsible for providing assurance that a Verified Security Test (VST) is production-ready.


Compute is a web service that accepts a test identifier and the account identifier which owns it. Compute then performs the following actions:

  • Downloads the source code file from the cloud directory containing the test.
  • Compiles the code for all applicable operating systems.
  • Scans the compiled binaries against all malware YARA rules provided by VirusTotal's open-source project.
  • Uploads the compiled binaries back to the cloud directory.

During the compile step, a technique is used to ensure a file gets a unique hash every time it is sent to Compute - even if the file hasn't changed.

The final result is that a test is converted from source code into an artifact that can be scheduled for probe execution.
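The technique Compute uses to produce a unique hash is not described here. As one possible illustration only, the sketch below appends a random nonce to an artifact's bytes so that identical input yields a different SHA-256 digest on every pass. The function names are hypothetical, and a real build would more likely embed the value at compile time than append it afterward.

```python
import hashlib
import os


def stamp(artifact: bytes, nonce_size: int = 16) -> bytes:
    """Return the artifact with a fresh random nonce appended (illustrative only)."""
    return artifact + os.urandom(nonce_size)


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    source = b"unchanged test artifact"
    # Two passes over the same input produce two different hashes.
    print(sha256_hex(stamp(source)))
    print(sha256_hex(stamp(source)))
```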

External use

Security engineers automatically leverage Compute when writing and uploading their own VSTs. Each upload fires off a request to Compute, which performs the chain of actions described above.


Follow the guide here to learn how to write your own VSTs.

Internal use

Prelude uses the Compute server in two ways:

  1. Each test the internal security team writes is compiled using Compute, to ensure consistency. Each test is then resent to Compute on a regular cadence to ensure it is rotating its file hash. This process ensures a defensive control cannot rely solely on signature matching to protect an endpoint.
  2. The Prelude team then sends all tests - which the internal team writes - into a continuous test range that runs them in various scenarios on all supported major/minor operating systems. This process aims to flush out any edge cases tests may encounter in the wild.


Range

Range is a multi-segment testing environment to validate probes and security tests.

Range is the only server-based aspect of the Prelude tech stack. However, it is managed through a Serverless Application Model (SAM) application to retain the ephemeral properties of spinning up/down on demand.

Each machine in the range runs a probe and is hooked up to a Prelude account managed by the security team, for validating new tests. The range rebuilds itself weekly, to re-test the probe installers from scratch.

Range is broken into the following segments:

  • Base: one of every supported operating system (major/minor versions) with default configurations and defenses
  • CrowdStrike: one of every supported operating system, running middle-of-the-line prevention policies
  • Windows: an Active Directory environment running enterprise Defender

Each machine in the range includes a chaos monkey installation which constantly - and randomly - updates system configurations. This process is intended to construct dynamic environments that are deliberately difficult to predict.

Safety

Detect is designed to run in production environments at scale, whether that means 100 or 100,000 endpoints. To be confident running at that scale you should become familiar with the safety mechanisms designed into the system.


Detect probes are designed to satisfy both functional and safety requirements. To achieve both without compromising either, probe structure and code are simple and open, allowing users to examine and verify their functionality. Because the code is open, users can make suggestions (in the form of a pull request) or fork the code and modify it to suit their specific requirements.

Probes

Below are the safety mechanisms built into probes.

Open-source

Detect probes are fully open-source. You should never run a binary on your production systems without first validating the code - regardless of how much you trust the provider. Prelude believes transparency is the key that unlocks good security, so we made all our probes open-source. We encourage you to read the source code and ask us questions.

Lightweight

Detect probes range from 1-2KB in size, or 30-100 lines of code, depending on which one you select. A common rule of thumb is that 15 bugs, including security vulnerabilities, exist for every 1000 lines of code. A smaller footprint leaves less room for bugs, which in turn increases safety.
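To make the rule of thumb concrete, the short calculation below applies the 15-per-1000-lines figure to the probe sizes quoted above. The 10,000-line comparison is a hypothetical number chosen only for contrast, and the result is an expectation under that rule of thumb, not a measured defect count.

```python
BUGS_PER_KLOC = 15  # rule-of-thumb figure quoted in the text


def expected_bugs(lines_of_code: int) -> float:
    """Expected bug count under the rule of thumb."""
    return lines_of_code * BUGS_PER_KLOC / 1000


if __name__ == "__main__":
    # 30 and 100 are the probe sizes from the text; 10_000 is a hypothetical
    # larger program used for comparison.
    for loc in (30, 100, 10_000):
        print(f"{loc} lines -> {expected_bugs(loc)} expected bugs")
```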

Minimal dependencies

Detect probes are written in a high level language, using standard OS tools. This gives end users more control over the code run on their endpoints, how they are patched, and how they are audited.

No installation

Detect probes do not require installation. Instead, they are ephemeral processes that run for as long (or as short) as you want. To start a probe, either include it in your own code or download and start a standalone probe binary. Probes are small, weighing in at between 1 and 2KB, depending on which you choose.

No special privileges

Probes are designed to be run as a "normal" user. Unlike most endpoint agents, we advise against running the process as root or administrator. This immediately limits the attack surface and restricts the process to user land.

Limited network requests

Probes do not require a constant network connection to the Prelude Service. Instead, probes initiate a handful of requests per day to the Service API over HTTPS. Each request passes the endpoint's platform, architecture and token into the API and asks if you've queued any tests for it.
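The sketch below shows what building one such request might look like. The URL and the header names are placeholders invented for this example, not the actual Service API contract, and the request is only constructed, never sent.

```python
import platform
import urllib.request

API_URL = "https://api.example.invalid/"  # placeholder, not the real Service URL


def build_poll_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a request asking whether any tests are queued."""
    headers = {
        "token": token,                          # hypothetical header name
        "platform": platform.system().lower(),   # e.g. "linux", "darwin", "windows"
        "arch": platform.machine().lower(),      # e.g. "x86_64", "arm64"
    }
    return urllib.request.Request(API_URL, headers=headers, method="GET")
```

Because the probe initiates every request itself, no inbound port needs to be opened on the endpoint.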

Resource controls

When a probe isn't making one of its few requests per day, it's dormant. Probes do nothing except make their requests and execute any returned instructions. When running instructions, probes limit their runtime to a few seconds per test. If a test exceeds this limit, it is stopped and a timeout code is sent back to your dashboard for you to review.
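A runtime limit of this kind can be sketched with a subprocess timeout. The `TIMEOUT_CODE` value and the function name are hypothetical, and the limit is passed in as a parameter because the exact number of seconds is not specified here.

```python
import subprocess
import sys

TIMEOUT_CODE = 999  # hypothetical value standing in for the real timeout code


def run_with_limit(command: list, limit_seconds: float) -> int:
    """Run a command, stopping it if it exceeds the time limit."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        completed = subprocess.run(command, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        return TIMEOUT_CODE
    return completed.returncode


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_limit(slow, limit_seconds=0.5))
```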

Authority validation

When a probe receives a security test to run, it's actually receiving a pre-signed redirect URL to the location where the test is stored. Redirect URLs point to an S3 bucket controlled by Prelude.
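One way a client could check the authority of a redirect URL before downloading from it is sketched below. The hostname in `TRUSTED_HOSTS` is a placeholder rather than the real bucket address, and the function is an illustration, not the probe's actual code.

```python
from urllib.parse import urlsplit

# Placeholder host; the real bucket address is not given in this document.
TRUSTED_HOSTS = {"tests.example-bucket.invalid"}


def is_trusted(url: str) -> bool:
    """Accept only HTTPS URLs whose host exactly matches a trusted host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in TRUSTED_HOSTS
```

Matching the full hostname, rather than a prefix or substring, rejects look-alike hosts such as `tests.example-bucket.invalid.attacker.example`.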

Prelude takes rigorous measures to secure and audit our S3 resources. If you'd like information on our process, just ask.

Telemetry

Probes are designed to ship as little data off each device as possible. They do this by capturing a single integer code per executed test. Only that code and the identifier of the test that ran are sent to the Prelude Service. These codes are checked against our lookup table so we can build a picture of what occurred on the endpoint.
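The shape of that exchange can be sketched as a two-field result and a server-side lookup. The codes, their meanings and the test identifier below are invented for illustration; the real lookup table is maintained by Prelude.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """The only data leaving the endpoint in this sketch."""
    test_id: str
    code: int


# Hypothetical codes and meanings, for illustration only.
LOOKUP = {
    0: "test ran to completion",
    1: "test was blocked",
    2: "test timed out",
}


def describe(result: Result) -> str:
    """Translate a result code into a human-readable outcome."""
    return LOOKUP.get(result.code, "unknown code")
```

For example, `describe(Result("example-test-id", 1))` returns `"test was blocked"` under this made-up table.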

Tests

Below are the safety mechanisms built into tests.

Test cleanup

Security tests have two parts: the test and the cleanup. The cleanup runs in a separate process from the test and reverses any effects your test created during runtime.
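The pattern can be sketched as two child processes, with the cleanup running even if the test step raises an error. The function names and the temporary-file demo are hypothetical and stand in for a real test and its cleanup.

```python
import os
import subprocess
import sys
import tempfile


def run_test_then_cleanup(test_cmd: list, cleanup_cmd: list) -> int:
    """Run the test in one process, then always run cleanup in another."""
    try:
        test_code = subprocess.run(test_cmd).returncode
    finally:
        subprocess.run(cleanup_cmd)  # runs even if the test step raised
    return test_code


def demo() -> tuple:
    """Test step creates a file; cleanup step removes it."""
    path = os.path.join(tempfile.mkdtemp(), "artifact.txt")
    test_cmd = [sys.executable, "-c", f"open({path!r}, 'w').close()"]
    cleanup_cmd = [sys.executable, "-c", f"import os; os.remove({path!r})"]
    code = run_test_then_cleanup(test_cmd, cleanup_cmd)
    return code, os.path.exists(path)
```

Keeping cleanup in its own process means a crash inside the test does not prevent its effects from being reversed.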

Test compilation

Each test that Prelude authors and makes available inside Detect goes through a rigorous manual and automated test cycle. This cycle starts with manual code reviews and testing across commodity devices. This testing is done by the internal Prelude security team.


Once complete, a test is scheduled inside the autonomous Prelude Test Range, a collection of all major/minor operating systems supported by Detect. When a test is first uploaded, it is sent to the Prelude Compute server for compilation. This server ensures all tests are consistently compiled and then scanned by all (open-source) YARA signatures hosted by VirusTotal. This step is done to separate tests from code that is harmful.


The Prelude Test Range runs all Prelude-authored tests on repeat to verify they are not causing stability or false detection issues.

Auditing

All user interactions are logged and made available to Administrators for auditing purposes. You can retrieve the audit logs for your account through the Prelude CLI:

prelude iam logs

You will receive a response containing the timestamp, user, and event:

[
  {
    "@timestamp": "2023-04-12 12:07:56.435",
    "account_id": "0b5319f5cf8ba14cdef96afd0fdada99",
    "user": "user@example.com",
    "event": "register account"
  }
]
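Once saved as text, a response in this shape can be parsed and filtered with a few lines of Python. This is a sketch that assumes the fields and timestamp format shown in the sample above; the helper names are hypothetical.

```python
import json
from datetime import datetime

SAMPLE = """
[
  {
    "@timestamp": "2023-04-12 12:07:56.435",
    "account_id": "0b5319f5cf8ba14cdef96afd0fdada99",
    "user": "user@example.com",
    "event": "register account"
  }
]
"""


def parse_logs(raw: str) -> list:
    """Load audit entries and convert each timestamp to a datetime."""
    entries = json.loads(raw)
    for entry in entries:
        entry["@timestamp"] = datetime.strptime(
            entry["@timestamp"], "%Y-%m-%d %H:%M:%S.%f"
        )
    return entries


def events_by_user(entries: list, user: str) -> list:
    """List the events recorded for one user."""
    return [entry["event"] for entry in entries if entry["user"] == user]
```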