Connector Model
Connectors are the integration layer between DSX-Connect and external repositories.
They are responsible for:
- Enumerating items in a repository
- Retrieving file content
- Executing remediation actions
- Exposing a consistent API contract
Connectors are intentionally stateless. All orchestration, scanning, retry logic, scaling, and result persistence are handled by DSX-Connect Core.
This separation allows DSX-Connect to remain repository-agnostic while enabling new integrations to be added without modifying the core system.
Standard Connector API
All connectors implement the same core API surface:
- full_scan: enumerate items and enqueue scan requests
- read_file: retrieve file content as a binary stream
- item_action: perform remediation (delete, move, tag, etc.)
- webhook_event: optional event ingestion
- repo_check: health validation
This uniform contract ensures consistent scanning behavior across:
- Filesystem
- AWS S3
- Azure Blob Storage
- Google Cloud Storage
- SharePoint / OneDrive
- Salesforce
- Other supported repositories
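To make the contract concrete, the five operations can be sketched as an abstract base class. This is only an illustration: the class names, method signatures, and return types below are assumptions chosen for the example and are not taken from the DSX-Connect codebase. An in-memory implementation is included so the sketch runs on its own.

```python
import io
from abc import ABC, abstractmethod
from typing import BinaryIO, Dict, Iterator, Optional


class Connector(ABC):
    """Hypothetical base class mirroring the five operations listed above."""

    @abstractmethod
    def full_scan(self) -> Iterator[str]:
        """Yield an identifier for every item that should become a scan request."""

    @abstractmethod
    def read_file(self, item_id: str) -> BinaryIO:
        """Return the item's content as a binary stream."""

    @abstractmethod
    def item_action(self, item_id: str) -> str:
        """Apply remediation and return a short status string."""

    @abstractmethod
    def repo_check(self) -> bool:
        """Return True if the repository is reachable."""

    def webhook_event(self, payload: dict) -> Optional[str]:
        """Optional: map a provider event to an item identifier (ignored by default)."""
        return None


class InMemoryConnector(Connector):
    """Toy repository backed by a dict, used only to exercise the contract."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = dict(files)

    def full_scan(self) -> Iterator[str]:
        yield from sorted(self.files)

    def read_file(self, item_id: str) -> BinaryIO:
        return io.BytesIO(self.files[item_id])

    def item_action(self, item_id: str) -> str:
        # This toy connector always deletes; real connectors follow their configuration.
        del self.files[item_id]
        return "deleted"

    def repo_check(self) -> bool:
        return True


connector = InMemoryConnector({"a.txt": b"hello", "b.bin": b"\x00"})
print(list(connector.full_scan()))
```

Because every repository is reduced to the same small set of methods, the calling side never needs repository-specific branches.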
Scan Lifecycle (Connector Interaction)
During a full scan:
- A job is created via API or UI.
- DSX-Connect calls the connector’s full_scan.
- The connector enumerates items under its configured asset.
- For each matching item, a Scan Request is created.
- The Scan Request Worker dequeues the request.
- The worker calls read_file on the connector.
- File content is streamed to DSXA.
- The DSXA verdict is processed.
- If malicious, the worker calls item_action.
- Results are persisted and broadcast.
Connectors never perform scanning themselves. They provide access and remediation capabilities only.
This design enables:
- Queue-based resilience
- Horizontal worker scaling
- Retry and DLQ handling
- Repository isolation
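The lifecycle above can be condensed into a small, single-process sketch. It assumes an in-process deque in place of the real queues, a single loop in place of separate workers, and a `scan` callable standing in for DSXA; the function name `run_job` and all parameter names are hypothetical.

```python
import io
from collections import deque
from typing import BinaryIO, Callable, Deque, Dict, Iterable, List


def run_job(
    full_scan: Callable[[], Iterable[str]],
    read_file: Callable[[str], BinaryIO],
    item_action: Callable[[str], None],
    scan: Callable[[BinaryIO], str],
) -> Dict[str, str]:
    """Drive one full scan: enqueue, dequeue, read, scan, remediate, record."""
    queue: Deque[str] = deque()
    results: Dict[str, str] = {}

    # Connector side: enumeration only produces scan requests.
    for item_id in full_scan():
        queue.append(item_id)

    # Core side: a worker drains the queue and calls back into the connector.
    while queue:
        item_id = queue.popleft()
        verdict = scan(read_file(item_id))  # the connector never scans
        if verdict == "malicious":
            item_action(item_id)
        results[item_id] = verdict
    return results


# Toy repository and a stand-in scanner (assumption: real verdicts come from DSXA).
store = {"ok.txt": b"hello", "bad.bin": b"EVIL payload"}
removed: List[str] = []
results = run_job(
    full_scan=lambda: sorted(store),
    read_file=lambda item_id: io.BytesIO(store[item_id]),
    item_action=removed.append,
    scan=lambda stream: "malicious" if b"EVIL" in stream.read() else "benign",
)
print(results, removed)
```

The connector contributes only the three callables; queueing, scanning, and result handling all sit on the calling side, which is what lets the workers be scaled and retried independently.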
Filesystem Connector (Scan Lifecycle Walkthrough)
If you completed Getting Started, you deployed a Filesystem Connector.
Because filesystems are familiar and transparent, they provide a clear illustration of how the connector model works.
Figure 1: Filesystem Connector workflow
Assume the connector is configured as:
DSXCONNECTOR_ASSET=~/Documents
DSXCONNECTOR_ITEM_ACTION_MOVE_METAINFO=~/Documents/quarantine
Full Scan Example
When Full Scan is invoked:
- DSX-Connect calls the connector’s full_scan.
- The connector enumerates all files under ~/Documents.
- Filters (if defined) are applied.
- For each matching file, a Scan Request is sent to DSX-Connect.
- The Scan Request Worker calls read_file.
- The file is streamed to DSXA.
- The verdict is queued.
- The Verdict Worker calls item_action if malicious.
- The file is moved to ~/Documents/quarantine.
- Results are persisted and broadcast.
This flow is identical for all other connectors.
Note
The DSXA Scanner sends malicious verdicts to the Deep Instinct Management Console. The Deep Instinct Management Console should always be considered the definitive source of malicious events.
The only difference is how enumeration and remediation are implemented in the connector itself.
For example, when the Filesystem Connector is asked to quarantine a file, it moves the file to ~/Documents/quarantine.
For a GCP connector, the file may instead be modified in place and prefixed. Each connector specifies its own implementation.
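As a rough illustration of the filesystem case, the sketch below enumerates files under an asset directory, opens one as a binary stream, and moves a file into a quarantine folder. The class name is hypothetical, and skipping the quarantine folder during enumeration is an assumption of this example rather than documented connector behavior. It runs against a temporary directory instead of ~/Documents.

```python
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO, Iterator


class FilesystemConnectorSketch:
    """Hypothetical stand-in for the three filesystem operations."""

    def __init__(self, asset: Path, quarantine: Path) -> None:
        self.asset = asset
        self.quarantine = quarantine

    def full_scan(self) -> Iterator[Path]:
        # Walk the asset root; skipping already-quarantined files is an assumption.
        for path in sorted(self.asset.rglob("*")):
            if path.is_file() and self.quarantine not in path.parents:
                yield path

    def read_file(self, path: Path) -> BinaryIO:
        return path.open("rb")

    def item_action(self, path: Path) -> Path:
        # "move" remediation: relocate the file into the quarantine folder.
        self.quarantine.mkdir(parents=True, exist_ok=True)
        target = self.quarantine / path.name
        shutil.move(str(path), str(target))
        return target


with tempfile.TemporaryDirectory() as tmp:
    asset = Path(tmp) / "Documents"
    (asset / "reports").mkdir(parents=True)
    (asset / "reports" / "q1.txt").write_bytes(b"fine")
    (asset / "invoice.exe").write_bytes(b"suspicious")

    connector = FilesystemConnectorSketch(asset, asset / "quarantine")
    listed = [p.relative_to(asset).as_posix() for p in connector.full_scan()]

    with connector.read_file(asset / "invoice.exe") as stream:
        content = stream.read()

    moved = connector.item_action(asset / "invoice.exe")
    moved_exists = moved.exists()
    after = [p.relative_to(asset).as_posix() for p in connector.full_scan()]

print(listed, after)
```

A connector for an object store would keep the same three methods and swap the directory walk for a list call and the move for whatever remediation that provider supports.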
Why This Architecture Works
You could write a script that:
- Walks a directory
- Reads files
- Sends them to DSXA
- Moves malicious files
But that script would need:
- Retry logic
- Backoff handling
- Progress tracking
- Parallelism
- Logging
- Failure isolation
DSX-Connect provides these capabilities centrally.
Connectors remain simple:
- List files
- Read files
- Act on files
As new connectors are added, they inherit the same resilience and scaling characteristics automatically.
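To give a sense of what "centrally" means here, the sketch below wraps any connector call in retry with exponential backoff, so the connector itself carries no retry code. The function name, the choice of `OSError` as the retryable error, and the delay values are all assumptions for illustration; the actual retry and DLQ policy in DSX-Connect is not described in this section.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retry(
    op: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on OSError with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts; a real system might dead-letter the request
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# A connector call that fails twice before succeeding.
calls = {"n": 0}


def flaky_read() -> bytes:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient repository error")
    return b"content"


delays: List[float] = []
data = call_with_retry(flaky_read, attempts=3, base_delay=0.5, sleep=delays.append)
print(data, delays)
```

The `sleep` parameter is injected only so the example can record delays instead of waiting.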
Asset and Filter Model
Connectors define their scan scope using two mechanisms:
- DSXCONNECTOR_ASSET
- DSXCONNECTOR_FILTER
Asset
DSXCONNECTOR_ASSET defines the exact root the connector owns. Full scans start here, and “on-access” feeds (webhooks, monitors) scope themselves to the same root. The exact meaning depends on the backend:
| Repository | Example Asset |
|---|---|
| AWS S3 | bucket-name or bucket-name/prefix |
| Azure Blob | container-name |
| GCS | bucket-name |
| Filesystem | /data/scan_root |
| SharePoint | site or document library root |
Asset defines the coarse boundary of enumeration.
Providers can often narrow list operations to that root or prefix (for example, a name_starts_with prefix match), which keeps enumeration fast. Listing is usually the slowest, most serial part of a full scan. Filters (below) are applied inside the connector after the provider lists everything under the asset, because most backends do not support server-side include/exclude.
Always prefer the narrowest practical asset root, and set it to a stable, exact root with no wildcards. If you need multiple roots, deploy multiple connector instances.
Filter
DSXCONNECTOR_FILTER applies rsync-like include/exclude rules under the asset.
Important:
Filters are evaluated inside the connector after enumeration.
Most providers do not support server-side include/exclude filtering beyond prefix scoping. This means:
- The connector still lists everything under the asset.
- Filters only reduce what becomes a scan request.
- Filters do not necessarily reduce enumeration cost.
Therefore:
- Use asset for coarse partitioning.
- Use filters for fine-grained tuning.
See Reference → Filters for detailed syntax.
Asset vs Filter Best Practices
Prefer pushing boundaries into the asset:
| OK | Better |
|---|---|
| asset=my-bucket, filter=prefix1/** | asset=my-bucket/prefix1, filter="" |
| asset=my-bucket, filter=sub1/* | asset=my-bucket/sub1, filter="*" |
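The difference between the two columns can be shown with a toy listing. In this sketch, prefix matching stands in for the provider's server-side scoping and Python's `fnmatchcase` stands in for the connector-side filter; `fnmatchcase` is not rsync syntax (its `*` also matches `/`), so treat the patterns here as illustrative only. The function name and the sample keys are made up for the example.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Union


def enumerate_and_filter(
    keys: List[str], asset_prefix: str, include: str
) -> Dict[str, Union[int, List[str]]]:
    """Return how many objects were listed and which ones became scan requests."""
    # Step 1: prefix scoping, the part a provider can usually do server-side.
    listed = [key for key in keys if key.startswith(asset_prefix)]
    # Step 2: the filter runs afterwards, on names relative to the asset.
    requests = [
        key for key in listed if fnmatchcase(key[len(asset_prefix):], include)
    ]
    return {"listed": len(listed), "scan_requests": requests}


bucket = [
    "prefix1/a.txt",
    "prefix1/sub/b.txt",
    "prefix2/c.txt",
    "prefix2/d.txt",
]

# "OK": broad asset, filter does the narrowing after everything is listed.
broad = enumerate_and_filter(bucket, asset_prefix="", include="prefix1/*")
# "Better": the boundary lives in the asset, so less is listed in the first place.
narrow = enumerate_and_filter(bucket, asset_prefix="prefix1/", include="*")

print(broad["listed"], narrow["listed"])
```

Both configurations produce the same scan requests, but the broad one lists every object in the bucket to get there.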
Filters vs Assets: Pros & Cons
- Assets (partitioning at source):
  - Pros: enables parallel enumeration; reduces per-connector list volume; isolates failures per shard
  - Cons: requires coordination of partitioning (naming/scope decisions)
- Filters (evaluation at connector):
  - Pros: simple per-connector scoping without changing infrastructure; expressive (rsync-like)
  - Cons: filters are applied after listing within the asset; for very large repos, exhaustive filters can still incur heavy list operations
Guidance:
- Prefer asset for coarse boundaries (folders/prefixes/libraries).
- Use filter for light include/exclude tuning.
- Keep filters simple.
- Complex filters can force broad listings.
Sharding and Scaling Strategy
For very large repositories (millions to billions of objects), a single connector instance may become enumeration-bound.
The correct scaling strategy is asset-based sharding.
Deploy multiple connector instances, each with a distinct asset partition.
Examples:
- S3:
  - bucket/A
  - bucket/B
  - bucket/C
- Time partitions:
  - bucket/2025-01
  - bucket/2025-02
- Filesystem:
  - /data/shard1
  - /data/shard2
- SharePoint:
  - Separate document libraries or folder scopes
This approach:
- Parallelizes enumeration
- Reduces list volume per connector
- Isolates failures per shard
- Aligns with horizontal worker scaling
Filter-based sharding is possible but less efficient because enumeration still occurs at the broader asset level.
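One way to plan asset-based shards is to generate one environment per connector instance and reject partitions that overlap. The sketch below does that under two assumptions: assets are treated as path-like strings where one asset nested under another counts as an overlap, and the `name` field and `shard_configs` function are hypothetical helpers, not DSX-Connect settings. Only `DSXCONNECTOR_ASSET` comes from this page.

```python
from typing import Dict, List


def _nested(first: str, second: str) -> bool:
    """True if the two assets are equal or one lies under the other."""
    a = first.rstrip("/") + "/"
    b = second.rstrip("/") + "/"
    return a.startswith(b) or b.startswith(a)


def shard_configs(assets: List[str]) -> List[Dict[str, str]]:
    """Build one environment mapping per connector instance."""
    for index, first in enumerate(assets):
        for second in assets[index + 1:]:
            if _nested(first, second):
                raise ValueError(f"overlapping assets: {first!r} and {second!r}")
    return [
        {"name": f"connector-{index}", "DSXCONNECTOR_ASSET": asset}
        for index, asset in enumerate(assets)
    ]


configs = shard_configs(["bucket/A", "bucket/B", "bucket/C"])
for config in configs:
    print(config)
```

Rejecting nested assets keeps two instances from enumerating the same objects, which would otherwise duplicate scan requests.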
Item Actions
DSXCONNECTOR_ITEM_ACTION defines what happens when a file is marked malicious.
| Value | Behavior |
|---|---|
| nothing | Report only |
| delete | Remove the object |
| tag | Apply provider-specific metadata |
| move | Relocate to quarantine |
| move_tag | Move and tag |
When using move or move_tag, configure:
DSXCONNECTOR_ITEM_ACTION_MOVE_METAINFO
Its meaning depends on the repository and connector (directory, prefix, folder, etc.). For the Filesystem Connector, for example, this setting refers to a quarantine folder within the asset root.
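The table can be read as a mapping from the configured value to a sequence of remediation steps. The sketch below expresses that mapping and checks that a move target is present when one is needed. The function name, the step strings, the move-before-tag ordering, and the decision to raise an error on a missing target are assumptions of this example; a real connector may apply defaults instead.

```python
from typing import Dict, List, Optional

VALID_ACTIONS = ("nothing", "delete", "tag", "move", "move_tag")


def plan_item_action(action: str, move_metainfo: Optional[str] = None) -> List[str]:
    """Translate a DSXCONNECTOR_ITEM_ACTION value into ordered steps."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown item action: {action!r}")
    if action in ("move", "move_tag") and not move_metainfo:
        raise ValueError(
            f"{action} requires DSXCONNECTOR_ITEM_ACTION_MOVE_METAINFO"
        )
    steps: Dict[str, List[str]] = {
        "nothing": [],  # report only
        "delete": ["delete"],
        "tag": ["tag"],
        "move": [f"move:{move_metainfo}"],
        "move_tag": [f"move:{move_metainfo}", "tag"],  # ordering is an assumption
    }
    return steps[action]


print(plan_item_action("move_tag", "quarantine"))
```

Validating the combination up front surfaces a misconfigured quarantine target before the first malicious verdict arrives.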
Deployment Considerations
Connector scaling differs by deployment model.
In Docker Compose:
- Run multiple connector containers manually for sharding.
In Kubernetes:
- Deploy multiple releases or replicas with distinct assets.
- Combine connector sharding with worker concurrency and replica scaling.
- Use resource requests/limits and autoscaling where appropriate.
Connector strategy should align with your overall deployment model and scaling goals.