vllm.config.kv_transfer ¶
KVTransferConfig ¶
Configuration for distributed KV cache transfer.
Source code in vllm/config/kv_transfer.py
engine_id class-attribute instance-attribute ¶
engine_id: str = Field(default=None, validate_default=True)
The engine id for KV transfers.
kv_buffer_device class-attribute instance-attribute ¶
kv_buffer_device: Literal['cuda', 'cpu'] = 'cuda'
The device used by the KV connector to buffer the KV cache.
kv_buffer_size class-attribute instance-attribute ¶
kv_buffer_size: float = Field(default=1000000000.0, gt=0)
The buffer size for TorchDistributedConnector, in bytes. Recommended value: 1e9 (about 1 GB).
kv_connector class-attribute instance-attribute ¶
kv_connector: str | None = None
The KV connector for vLLM to transmit KV caches between vLLM instances.
kv_connector_extra_config class-attribute instance-attribute ¶
Any extra config that the connector may need.
kv_connector_module_path class-attribute instance-attribute ¶
kv_connector_module_path: str | None = None
The Python module path to dynamically load the KV connector from. Only supported in V1.
kv_ip class-attribute instance-attribute ¶
kv_ip: str = '127.0.0.1'
The KV connector IP address, used to build the distributed connection.
kv_parallel_size class-attribute instance-attribute ¶
kv_parallel_size: int = Field(default=1, ge=1)
The number of parallel instances for KV cache transfer. For P2pNcclConnector, this should be 2.
kv_port class-attribute instance-attribute ¶
kv_port: int = 14579
The KV connector port, used to build the distributed connection.
kv_rank class-attribute instance-attribute ¶
kv_rank: int | None = None
The rank of this vLLM instance in the KV cache transfer. Typical value: 0 for the prefill instance, 1 for the decode instance. Currently only 1P1D (one prefill instance, one decode instance) is supported.
kv_role class-attribute instance-attribute ¶
kv_role: KVRole | None = None
Whether this vLLM instance produces KV cache, consumes it, or both. Choices are 'kv_producer', 'kv_consumer', and 'kv_both'.
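As an illustration of how these fields fit together, the following minimal sketch builds one configuration per instance for a 1P1D setup and serializes each to JSON. It assumes the settings are passed to vLLM as a JSON object. The helper `make_kv_transfer_config` and the use of plain dictionaries are hypothetical; only the field names, defaults, and recommended values come from the attribute descriptions above.

```python
import json


# Hypothetical helper, not part of vLLM. Field names follow the attributes
# documented above; values follow the stated defaults and recommendations.
def make_kv_transfer_config(role: str, rank: int) -> dict:
    return {
        "kv_connector": "P2pNcclConnector",
        "kv_role": role,        # 'kv_producer', 'kv_consumer', or 'kv_both'
        "kv_rank": rank,        # typically 0 for prefill, 1 for decode
        "kv_parallel_size": 2,  # documented value for P2pNcclConnector
        "kv_ip": "127.0.0.1",   # documented default
        "kv_port": 14579,       # documented default
    }


# One JSON document per instance: the prefill side produces KV cache,
# the decode side consumes it.
prefill_json = json.dumps(make_kv_transfer_config("kv_producer", 0))
decode_json = json.dumps(make_kv_transfer_config("kv_consumer", 1))

print(prefill_json)
print(decode_json)
```

Both instances share the same `kv_ip`, `kv_port`, and `kv_parallel_size`; only `kv_role` and `kv_rank` differ between the prefill and decode sides.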
_validate_engine_id classmethod ¶
The engine_id must be set here rather than through default_factory to ensure that each instance of KVTransferConfig gets a unique engine_id.
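The general pattern of filling in a missing identifier at validation time can be sketched with the standard library alone. This is a stand-in, not vLLM's implementation: the class `TransferConfigSketch`, the use of a dataclass `__post_init__` hook in place of a field validator, and the choice of `uuid.uuid4()` as the id source are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferConfigSketch:
    # None means "not provided by the caller".
    engine_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: a random UUID is an acceptable unique id.
        # Runs once per instance, so each instance gets its own value.
        if self.engine_id is None:
            self.engine_id = str(uuid.uuid4())


a = TransferConfigSketch()
b = TransferConfigSketch()
c = TransferConfigSketch(engine_id="prefill-0")

print(a.engine_id != b.engine_id)
print(c.engine_id)
```

An explicitly supplied `engine_id` is left untouched; only the missing case is filled in.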
_validate_kv_transfer_config ¶
_validate_kv_transfer_config() -> Self
compute_hash ¶
compute_hash() -> str
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
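The idea of hashing a factors list can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in vllm/config/kv_transfer.py: the function `compute_hash_sketch`, the example factor values, and the choice of SHA-256 over the string form of the list are all hypothetical.

```python
import hashlib


def compute_hash_sketch(factors: list) -> str:
    # Hash the string form of the factors list, so that configs with equal
    # factors map to the same hash and differing factors map to different ones.
    return hashlib.sha256(str(factors).encode()).hexdigest()


# Hypothetical case: a config with no graph-affecting fields contributes
# an empty factors list.
no_factors = compute_hash_sketch([])

# Hypothetical case: a new field that affects the computation graph must be
# appended to the factors list, which changes the resulting hash.
with_factor = compute_hash_sketch(["some_graph_affecting_field", 4])

print(no_factors != with_factor)
```

A field left out of the factors list never changes the hash, which is why the warning above asks that every graph-affecting field be included.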