Role Overview
Fireworks AI is building a blazing-fast inference platform for generative AI that lets developers and enterprises deploy and scale large language models quickly and efficiently. As a Software Engineer at Fireworks AI, you will develop and optimize high-performance systems for AI model serving, distributed computing, and low-latency inference. This role requires expertise in systems programming and distributed systems, as well as a deep understanding of machine learning infrastructure.
Key Responsibilities
- Design, implement, and optimize high-performance inference engines and serving infrastructure for large generative AI models.
- Develop and maintain distributed systems for model loading, routing, and scaling across hardware accelerators (GPUs).
- Work on low-level performance optimizations, including kernel development, memory management, and network communication for AI workloads.
- Collaborate with AI/ML researchers to integrate new model architectures and ensure optimal performance on the platform.
- Contribute to the development of APIs and SDKs that allow
