
The AI landscape is shifting from GPUs to AI accelerators

My thoughts on the AI landscape shifting from GPUs to AI accelerators

Sung Kim
5 min read · Aug 25, 2023

This article is not very well-researched; it consists merely of my thoughts, expressed on this so-called blog. I hope to evolve this blog as I research the topics more.


Many people assume that Nvidia’s dominance in the AI hardware market will continue for a few years. This assumption is understandable, given that the number of people pre-training Large Language Models (LLMs) or fine-tuning LLMs is increasing at an exponential rate. Evidence of this growth can be seen by simply looking at the number of models being uploaded to the Hugging Face Hub on a daily basis. Most likely, the majority of these individuals are using Nvidia GPUs or Google’s TPUs to train these models.

I would like to argue that the AI landscape is shifting away from GPUs toward AI accelerators as people begin deploying these LLMs in production. The very success Nvidia is enjoying today will cost it its virtual monopoly on AI hardware: rival AI hardware companies will capture an inference market that will be exponentially larger than the training market.

Problem Statement

Let me illustrate this with a typical business scenario. Your team decides to fine-tune Llama2–70B to accelerate a critical business function. Since fine-tuning is an iterative process and not a one-and-done affair, it takes your team about 3 months to fine-tune Llama2–70B to meet this critical business need. (Note that it may take longer, but reserving a GPU system such as Nvidia’s HGX H100 8-GPU for more than 3 months is really difficult nowadays.)

Back-of-the-napkin calculation: Let’s say one H100 per hour costs $5.00. Since you are renting 8 GPUs for 3 months (2,160 hours, give or take), your total training cost would be $86,400.
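That arithmetic can be written out in a few lines. The figures are the ones above ($5.00/hour per H100, 8 GPUs, roughly 3 months taken as 90 days of wall-clock time); they are rough assumptions for illustration, not quoted cloud prices.

```python
# Back-of-the-napkin training cost, using the article's assumed figures.
hourly_rate_per_gpu = 5.00   # assumed $/hour for one H100
num_gpus = 8                 # one HGX H100 8-GPU system
hours = 90 * 24              # ~3 months ≈ 2,160 hours

training_cost = hourly_rate_per_gpu * num_gpus * hours
print(f"${training_cost:,.0f}")  # → $86,400
```

The point is less the exact dollar figure than its shape: training is a one-time, bounded expense, which is why the inference side of the ledger ends up dominating.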

Your team has tested the model with a select group of users and would like to roll it out to your business users. Based on the number of users and expected usage of the application, your team expects that the application needs to scale to support 50 concurrent users. Since Llama2–70B requires a minimum of 40GB VRAM to run, you are looking at needing 100 Nvidia L4 24GB…
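The 100-GPU figure follows from packing a 40GB model onto 24GB cards. A minimal sketch of that arithmetic, assuming (as the scenario implies) one model instance per concurrent user and no batching across users:

```python
import math

# Rough serving-capacity estimate under the article's assumptions:
# each model instance needs at least 40 GB of VRAM, each Nvidia L4
# provides 24 GB, and one instance serves one concurrent user.
vram_per_instance_gb = 40
vram_per_l4_gb = 24
concurrent_users = 50

l4s_per_instance = math.ceil(vram_per_instance_gb / vram_per_l4_gb)  # 2
total_l4s = l4s_per_instance * concurrent_users
print(total_l4s)  # → 100
```

In practice, batching and quantization can cut this down substantially, but the sketch captures why inference fleets scale with users while training cost does not.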


Written by Sung Kim

A business analyst at heart who dabbles in AI engineering, machine learning, data science, and data engineering. Threads: @sung.kim.mw
