jp6/cu126/: minference versions
Because this project isn't in the mirror_whitelist, no releases from root/pypi are included.
The latest version on this stage is 0.1.6.0.
To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparse methods, reducing inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
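For context, here is a minimal usage sketch based on the patching API described in the upstream MInference project's README; the model name is illustrative only and is not part of this index page:

```python
# Sketch: patching a Hugging Face causal LM with MInference's
# approximate dynamic sparse attention, per the upstream README.
# The model name below is illustrative, not prescribed by this page.
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

model_name = "gradientai/Llama-3-8B-Instruct-262k"  # illustrative long-context model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Replace the model's attention with MInference's sparse kernels;
# pre-filling of long prompts then runs through the patched attention.
minference_patch = MInference("minference", model_name)
model = minference_patch(model)
```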
| Index | Version | Documentation |
|---|---|---|
| jp6/cu126 | 0.1.6.0 | |
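To install this build, pip can be pointed at the jp6/cu126 index, e.g. `pip install minference==0.1.6.0 --index-url https://<your-mirror>/jp6/cu126/+simple/`; the mirror host here is a placeholder, substitute your stage's actual URL.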