
Adaface Sample LLM Inference Questions

Here are some sample LLM Inference questions from our premium questions library (10273 non-googleable questions).



Medium

Misleading GPU Utilization
AI Infrastructure
GPU Utilization
Bottleneck Diagnosis
A training job is running slower than expected. Your dashboard shows GPU utilization steady at 95%, so a teammate concludes the GPUs are the bottleneck and asks to buy faster ones. You dig deeper and pull these counters:
    gpu_utilization (any-kernel-active):    95%
    sm_occupancy (compute throughput):      12%
    dram_read_throughput:                    8% of peak
    host_to_device_copy_active:             89% of step time
    dataloader_workers:                      2

What is the most likely real bottleneck, and why is "buy faster GPUs" wrong?

A. The GPU interconnect is saturated, so the resolution is adding more NVLink bandwidth between the devices on the node
B. The GPUs are truly saturated at 95%, so adding faster accelerators is the correct and direct remedy in this case
C. The DRAM is overheating and throttling, so the proper fix is better cooling rather than touching the input pipeline
D. The model is too small to fill the GPUs, so the right move is increasing model depth until utilization climbs higher
E. The data input pipeline is starving the GPU, which sits busy copying and idle on compute rather than doing real work
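The distinction these counters draw, between a GPU that is merely "busy" by the coarse utilization metric and one that is doing useful compute, can be sketched in Python. This is a toy illustration, not a profiling tool: the `StepCounters` type, the `likely_bottleneck` heuristic and all of its thresholds, and the timings passed to `step_time_ms` are hypothetical. The step-time model further assumes that data loading overlaps perfectly with compute and scales linearly with the number of loader workers.

```python
from dataclasses import dataclass


@dataclass
class StepCounters:
    """Counters from the question, expressed as fractions in [0, 1]."""
    gpu_utilization: float       # any kernel active
    sm_occupancy: float          # compute throughput
    dram_read_throughput: float  # fraction of peak DRAM bandwidth
    h2d_copy_active: float       # fraction of step time in host-to-device copies


def likely_bottleneck(c: StepCounters) -> str:
    """Toy heuristic; every threshold here is an illustrative assumption."""
    # "Busy" by the coarse metric, yet doing little actual compute.
    busy_but_idle = c.gpu_utilization > 0.9 and c.sm_occupancy < 0.3
    if busy_but_idle and c.h2d_copy_active > 0.5:
        return "input pipeline (data loading / host-to-device copies)"
    if c.dram_read_throughput > 0.8:
        return "GPU memory bandwidth"
    if c.sm_occupancy > 0.8:
        return "GPU compute"
    return "inconclusive"


def step_time_ms(load_ms: float, workers: int, compute_ms: float) -> float:
    """Idealized step time when input preparation overlaps with compute.

    Assumes loading one batch takes `load_ms` of CPU time and parallelizes
    perfectly across `workers`; the slower stage sets the pace.
    """
    input_ms = load_ms / workers
    return max(input_ms, compute_ms)


if __name__ == "__main__":
    observed = StepCounters(gpu_utilization=0.95, sm_occupancy=0.12,
                            dram_read_throughput=0.08, h2d_copy_active=0.89)
    print(likely_bottleneck(observed))

    # Hypothetical timings: 400 ms of CPU work per batch, 25 ms of GPU compute.
    print(step_time_ms(400, workers=2, compute_ms=25))    # 200.0 (baseline)
    print(step_time_ms(400, workers=2, compute_ms=12.5))  # 200.0 (2x faster GPU)
    print(step_time_ms(400, workers=16, compute_ms=25))   # 25.0 (more workers)
```

Under these assumptions, halving GPU compute time leaves the step at 200 ms because the input stage sets the pace, while adding loader workers brings the step down to the 25 ms compute floor.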

Medium

Parallelism Strategy Selection
Tensor Parallelism
A 70B model is too large to fit on a single GPU. You have one node with 8 GPUs connected to each other by high-bandwidth NVLink, and your production workload is dominated by latency-sensitive single-request and very small-batch inference. You must split the model across the 8 GPUs and are weighing tensor parallelism (TP), which shards each layer's matrices, against pipeline parallelism (PP), which assigns consecutive layer groups to different GPUs. Which choice and justification fit this scenario best?

A. Use pipeline parallelism, since splitting by layers minimizes total cross-GPU traffic regardless of the batch size involved
B. Use pipeline parallelism, since it is the only one of the two methods able to host a model larger than a single GPU
C. Use tensor parallelism, since its per-layer all-reduce over NVLink is cheap and it avoids PP's idle stages at small batch
D. Use tensor parallelism, since it removes the need for any inter-GPU communication whatsoever during the decode phase
E. Split the node so four GPUs run TP and four run PP, since that evenly balances latency against throughput by design
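The two costs weighed in this question can be put into rough numbers with a short Python sketch. `pipeline_bubble_fraction` uses the GPipe-style idle fraction (p - 1) / (m + p - 1) for p stages and m micro-batches; a single decode request corresponds to m = 1. `tp_bytes_per_token` assumes Megatron-style tensor parallelism (two all-reduces of a one-token activation per transformer layer), a ring all-reduce, 16-bit activations, and hypothetical 70B-class dimensions (hidden size 8192, 80 layers). None of these figures come from the question itself.

```python
def pipeline_bubble_fraction(stages: int, microbatches: int) -> float:
    """Idle fraction per GPU in a simple GPipe-style schedule: (p - 1) / (m + p - 1)."""
    return (stages - 1) / (microbatches + stages - 1)


def ring_allreduce_bytes_per_gpu(message_bytes: int, gpus: int) -> float:
    """Bytes each GPU sends in a ring all-reduce (reduce-scatter + all-gather)."""
    return 2 * (gpus - 1) / gpus * message_bytes


def tp_bytes_per_token(hidden: int, layers: int, gpus: int,
                       bytes_per_elem: int = 2,
                       allreduces_per_layer: int = 2) -> float:
    """Per-GPU all-reduce traffic to decode one token with tensor parallelism.

    Assumes Megatron-style sharding: one all-reduce after attention and one
    after the MLP in each layer, each over a single token's [hidden] activation.
    """
    message = hidden * bytes_per_elem
    return layers * allreduces_per_layer * ring_allreduce_bytes_per_gpu(message, gpus)


if __name__ == "__main__":
    # Pipeline parallelism over 8 GPUs with a single request (m = 1).
    print(pipeline_bubble_fraction(stages=8, microbatches=1))  # 0.875
    # Many micro-batches shrink the bubble (7/39 here).
    print(round(pipeline_bubble_fraction(stages=8, microbatches=32), 3))  # 0.179
    # Hypothetical 70B-class shape: hidden size 8192, 80 layers, 2-byte activations.
    print(tp_bytes_per_token(hidden=8192, layers=80, gpus=8))  # 4587520.0
```

With one request in flight, each of the 8 pipeline stages sits idle 7/8 of the time, whereas tensor parallelism keeps every GPU working on every layer at the cost of roughly 4.6 MB of all-reduce traffic per GPU per token under these assumptions.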

🧐 Question | 🔧 Skill | 💪 Difficulty | ⌛ Time
Misleading GPU Utilization (AI Infrastructure, GPU Utilization, Bottleneck Diagnosis) | LLM Inference | Medium | 2 mins
Parallelism Strategy Selection (Tensor Parallelism) | LLM Inference | Medium | 2 mins
