DSXA SDK: Concurrency, Batching, and Throughput🔗
This page clarifies where concurrency lives when using the DSXA SDKs, and how to get high throughput safely.
Language references🔗
Primary use cases🔗
- Scan a file that already exists on disk.
- Scan file bytes/streams received by your backend during upload handling.
- Scan many files concurrently from a queue or folder walk with bounded workers.
Quick answer🔗
The Python and Swift SDKs do not currently expose a single "batch scan" API endpoint.
- The SDK clients provide single-request scan operations (
scan_file,scan_binary,scan_by_path, etc.). - Concurrent scanning is implemented by the caller (CLI, GUI, or your application) by running many scan calls in parallel.
In other words:
- SDK: request construction, transport/session management, auth headers, response parsing, error mapping.
- Caller: work queue, concurrency limit, retries/backoff policy, cancellation strategy.
Where concurrency is handled today🔗
Python🔗
- SDK client:
DSXAClient/AsyncDSXAClient - Single-item APIs
- Shared
httpx.Client/httpx.AsyncClientper client instance - CLI/GUI:
scan-folder/scan-filesuse async semaphore-based fan-out- GUI folder scans use bounded thread pools
Swift🔗
- SDK client:
DSXAClient - Single-item async APIs
- Shared
URLSessionpool (SessionPool.secure/SessionPool.insecure) - GUI:
- Folder scans run with a bounded
TaskGroupfan-out
Why use the SDK (instead of ad-hoc HTTP)🔗
Using the SDK still gives important performance and reliability benefits:
- Connection pool reuse
- Python: one
httpxclient reused across scans. - Swift: shared
URLSessioninstances with configured max host connections. -
Benefit: lower handshake/setup overhead, better sustained throughput.
-
Consistent request headers
- Handles
AUTH/AUTH_TOKEN, metadata, protected entity, password encoding. -
Benefit: fewer subtle integration bugs.
-
Typed responses and normalized errors
- Structured response models and centralized HTTP error mapping.
-
Benefit: simpler caller code and better diagnostics.
-
Streaming support (Python)
scan_binary_streamavoids loading whole file bytes into memory first.-
Benefit: lower peak memory and better large-file behavior.
-
Shared behavior across tools
- CLI, GUI, and custom apps can all rely on the same transport/auth semantics.
- Benefit: easier debugging and parity across environments.
Usage patterns🔗
Pattern A: File on disk🔗
Use SDK convenience helpers (scan_file / scanFile) for straightforward local-file scanning.
Pattern B: Upload data in a web backend🔗
For web/API services receiving file uploads:
- Prefer scanning bytes/streams in backend code before storing.
- Use streaming APIs where available (Python:
scan_binary_stream) to reduce peak memory. - Keep frontend clients from calling DSXA directly in most production architectures; centralize auth and policy in your backend service.
Pattern C: Parallel scans🔗
Use SDK client calls inside a bounded worker model:
- Python:
asyncio+ semaphore, or thread pool for sync workflows. - Swift:
TaskGroupwith explicit max in-flight tasks.
Best practices for high throughput🔗
- Reuse one client instance
- Do not create a new SDK client per file.
-
Create once per scan run, close at the end.
-
Use bounded concurrency
- Start with
4-8workers (or lower on constrained hosts). -
Increase gradually while monitoring DSXA latency/error rate.
-
Avoid unbounded fan-out
- Use semaphores/task groups/thread pools with explicit limits.
-
This prevents local resource exhaustion and DSXA overload.
-
Prefer streaming for large files (Python)
-
Use
scan_binary_streamwith chunked reads. -
Add retry/backoff only for transient failures
- Retry connection resets/timeouts/5xx.
-
Do not blindly retry 4xx auth/validation errors.
-
Keep file IO and network balanced
- If reads are slow, extra HTTP concurrency will not help.
- Profile disk/network bottlenecks before raising concurrency further.
Example architecture🔗
Typical high-throughput pattern in both languages:
- Enumerate file paths.
- Push paths into a bounded worker fan-out.
- Each worker calls SDK scan APIs.
- Aggregate results and failures.
- Emit summary metrics (elapsed time, per-file scan time, p50/p95, errors).
Future enhancement direction🔗
If desired, a helper can be added in each SDK (for example scan_folder(...)) that wraps:
- enumeration
- bounded concurrency
- optional retry policy
- summary stats
This would provide a standard batching utility while still using the current DSXA per-file scan endpoints.
JavaScript SDK direction🔗
A JavaScript/TypeScript SDK is a valid next step for web-backend and Node.js workflows.
Suggested initial scope:
scanBinary(Buffer/Uint8Array)scanStream(Node.js Readable stream)scanByPathand polling helpers- Shared auth/header helpers and typed response models
- Error mapping aligned with Python/Swift SDK semantics