Adaface Sample LLM Inference Questions

Here are some sample LLM Inference questions from our premium questions library (10273 non-googleable questions).


🧐 Question
Misleading GPU Utilization (AI Infrastructure, GPU Utilization, Bottleneck Diagnosis)
Skill: LLM Inference	Difficulty: Medium	Time: 2 mins

A training job is running slower than expected. Your dashboard shows GPU utilization steady at 95%, so a teammate concludes the GPUs are the bottleneck and asks to buy faster ones. You dig deeper and pull these counters:

    gpu_utilization (any-kernel-active): 95%
    sm_occupancy (compute throughput): 12%
    dram_read_throughput: 8% of peak
    host_to_device_copy_active: 89% of step time
    dataloader_workers: 2

What is the most likely real bottleneck, and why is "buy faster GPUs" wrong?

A. The GPU interconnect is saturated, so the resolution is adding more NVLink bandwidth between the devices on the node
B. The GPUs are truly saturated at 95%, so adding faster accelerators is the correct and direct remedy in this case
C. The DRAM is overheating and throttling, so the proper fix is better cooling rather than touching the input pipeline
D. The model is too small to fill the GPUs, so the right move is increasing model depth until utilization climbs higher
E. The data input pipeline is starving the GPU, which sits busy copying and idle on compute rather than doing real work
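
The question turns on two counters that measure different things: a time-based "any kernel active" figure and a throughput-style figure for how much of the chip is doing work. The sketch below is only an illustration of how such counters can diverge. The `Sample` type, the helper functions, and the `timeline` values are hypothetical and chosen to echo the numbers in the question; they are not taken from any real profiler.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical profiler sample covering an equal slice of step time."""
    kernel_active: bool   # was at least one kernel running during the slice?
    sm_busy_frac: float   # fraction of SMs doing work in the slice (0.0 to 1.0)


def any_kernel_active_pct(samples):
    """Percent of slices in which at least one kernel was running."""
    return 100.0 * sum(s.kernel_active for s in samples) / len(samples)


def mean_sm_busy_pct(samples):
    """Average percent of SMs doing work across all slices."""
    return 100.0 * sum(s.sm_busy_frac for s in samples) / len(samples)


# Made-up timeline of 20 equal slices (an assumption for illustration only).
timeline = (
    [Sample(True, 0.10)] * 17    # a kernel is running but keeps few SMs busy
    + [Sample(True, 0.35)] * 2   # a somewhat heavier kernel
    + [Sample(False, 0.0)] * 1   # nothing running
)

print(f"any-kernel-active: {any_kernel_active_pct(timeline):.0f}%")
print(f"mean SM busy:      {mean_sm_busy_pct(timeline):.0f}%")
```

In this made-up timeline the time-based figure is high while the throughput-style figure is low, which is the kind of gap the counters in the question ask you to interpret.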

Parallelism Strategy Selection (Tensor Parallelism)
Skill: LLM Inference	Difficulty: Medium	Time: 2 mins

A 70B model is too large to fit on a single GPU. You have one node with 8 GPUs connected to each other by high-bandwidth NVLink, and your production workload is dominated by latency-sensitive single-request and very small-batch inference. You must split the model across the 8 GPUs and are weighing tensor parallelism (TP), which shards each layer's matrices, against pipeline parallelism (PP), which assigns consecutive layer groups to different GPUs.

Which choice and justification fit this scenario best?

A. Use pipeline parallelism, since splitting by layers minimizes total cross-GPU traffic regardless of the batch size involved
B. Use pipeline parallelism, since it is the only one of the two methods able to host a model larger than a single GPU
C. Use tensor parallelism, since its per-layer all-reduce over NVLink is cheap and it avoids PP's idle stages at small batch
D. Use tensor parallelism, since it removes the need for any inter-GPU communication whatsoever during the decode phase
E. Split the node so four GPUs run TP and four run PP, since that evenly balances latency against throughput by design
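
The two definitions in the question (TP shards each layer's matrices, PP assigns consecutive layer groups to different GPUs) can be made concrete with a toy model. This is a minimal sketch in plain Python: "devices" are just loop indices, the sum over partial outputs stands in for an all-reduce, and every name here (`matvec`, `tensor_parallel_layer`, `pipeline_stages`, `run_pipeline`) is hypothetical rather than any library's API.

```python
def matvec(x, w):
    """Return x @ w for a vector x (length n) and a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def tensor_parallel_layer(x, w, num_devices):
    """Row-shard w across devices, then sum the partial outputs.

    The final sum is a stand-in for an all-reduce; this sketch assumes
    len(w) divides evenly by num_devices.
    """
    rows = len(w) // num_devices
    partials = []
    for d in range(num_devices):
        lo, hi = d * rows, (d + 1) * rows
        partials.append(matvec(x[lo:hi], w[lo:hi]))  # work done on device d
    return [sum(vals) for vals in zip(*partials)]


def pipeline_stages(layers, num_devices):
    """Assign consecutive groups of layers to devices (even split assumed)."""
    per_stage = len(layers) // num_devices
    return [layers[d * per_stage:(d + 1) * per_stage] for d in range(num_devices)]


def run_pipeline(x, stages):
    """Pass one request through the stages in order."""
    for stage in stages:
        for w in stage:
            x = matvec(x, w)
    return x


# Toy model: four 4x4 integer "layers" and one input vector (made-up values).
layers = [[[(i + j + k) % 3 for j in range(4)] for i in range(4)] for k in range(4)]
x0 = [1, 2, 3, 4]

reference = x0
for w in layers:
    reference = matvec(reference, w)

tp_out = x0
for w in layers:
    tp_out = tensor_parallel_layer(tp_out, w, num_devices=2)

stages = pipeline_stages(layers, num_devices=2)
pp_out = run_pipeline(x0, stages)

print(reference == tp_out == pp_out)
```

Both splits reproduce the unsharded result; they differ in where the work and the communication happen, which is what the question asks you to weigh.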


Trusted by recruitment teams in enterprises globally

We evaluated several of their competitors and found Adaface to be the most compelling. Great library of questions that are designed to test for fit rather than memorization of algorithms.

Swayam Narain, CTO, Affable

Join 1200+ companies in 80+ countries.

Try the most candidate-friendly skills assessment tool today.


Ready to streamline your recruitment efforts with Adaface?


40 min tests.
No trick questions.
Accurate shortlisting.


Singapore (HQ)
32 Carpenter Street, Singapore 059911

Contact: +65 9447 0488

India
WeWork Prestige Atlanta, 80 Feet Main Road, Koramangala 1A Block, Bengaluru, Karnataka, 560034
Contact: +91 6305713227
