
About the test:

The Site Reliability Engineer (SRE) Test uses scenario-based questions to evaluate knowledge of cloud technologies, system design, automation, and troubleshooting skills. It assesses understanding of infrastructure as code, continuous integration and deployment, and monitoring systems. The test also measures proficiency in scripting languages and hands-on coding for infrastructure problem-solving. It further includes real-world situations to examine critical thinking and incident management abilities.

Covered skills:

  • Cloud Technologies and Platforms
  • Automation Tools and Practices
  • System Design and Architecture
  • Infrastructure as Code (IaC)

9 reasons why

Adaface Site Reliability Test is the most accurate way to shortlist Site Reliability Engineers



Reason #1

Tests for on-the-job skills

The Site Reliability Test helps recruiters and hiring managers identify qualified candidates from a pool of resumes and make objective hiring decisions. It reduces the administrative overhead of interviewing too many candidates and saves time by filtering out unqualified candidates at the first step of the hiring process.

Non-googleable questions and proctoring features let you conduct assessments online with confidence. The Site Reliability Test helps recruiters identify which candidates have the skills to do well on the job.

Reason #2

No trick questions


Traditional assessment tools use trick questions and puzzles for screening, which frustrates candidates who have to go through irrelevant assessments.

The main reason we started Adaface is that traditional pre-employment assessment platforms are not a fair way for companies to evaluate candidates. At Adaface, our mission is to help companies find great candidates by assessing on-the-job skills required for a role.

Reason #3

Non-googleable questions

We focus heavily on the quality of questions that test for on-the-job skills. Every question is non-googleable, and we set a very high bar for the subject matter experts we onboard to create these questions. We have crawlers that check whether any of the questions have leaked online. If a question gets leaked, we get an alert, change the question for you, and let you know.

These are just a small sample from our library of 10,000+ questions. The actual questions on this Site Reliability Test will be non-googleable.

🧐 Question

Medium

Circuit Breakers in Microservices
Load Balancing
Circuit Breaker Pattern
Microservices Architecture
You are a site reliability engineer at a company that has recently migrated to a microservices architecture. The company has three microservices - Service A, Service B, and Service C. Service A receives the most traffic and consequently, experiences the most load. To manage the load, you've implemented a Round-Robin load balancer. Simultaneously, you've introduced a circuit breaker in Service B due to its dependency on an external service that occasionally experiences downtime. 

Pseudo-code of the services:
 image
The implemented load balancer and circuit breaker are working as expected. However, you've noticed that when the external service that Service B depends on experiences downtime, there's an increase in error rate and user complaints, as requests are still being routed to Service B.

To minimize the impact of the external service's downtime on your system, which of the following steps should you consider?
A: Modify the load balancer to send all traffic to Service B only when the circuit breaker is open.
B: Modify the load balancer to stop sending traffic to Service B when the circuit breaker is open.
C: Modify Service B to handle all requests, regardless of the state of the circuit breaker.
D: Modify the circuit breaker to open only when the external service is up.
E: Modify the circuit breaker to close only when the external service is down.
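For readers unfamiliar with the circuit breaker pattern this scenario refers to, here is a minimal sketch in Python. It is not the pseudo-code from the question (which is not reproduced on this page); the class name `CircuitBreaker`, the parameters `failure_threshold` and `reset_timeout`, and the injectable `clock` are all assumptions made for illustration.

```python
import time


class CircuitBreaker:
    """Illustrative circuit breaker (hypothetical; not the test's pseudo-code)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock                          # injectable so the timing can be tested
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    @property
    def is_open(self):
        if self.opened_at is None:
            return False
        # Once the timeout has elapsed, one trial call is allowed (half-open behaviour).
        return self.clock() - self.opened_at < self.reset_timeout

    def call(self, func, *args, **kwargs):
        if self.is_open:
            # Fail fast instead of waiting on a dependency that is known to be down.
            raise RuntimeError("circuit open: call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # A successful call closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The `is_open` property is exposed so that other components could inspect the breaker state; how that state is used by the rest of the system is exactly what the question asks you to reason about.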

Medium

Error Budget Management
Latency Monitoring
Error Budgets
Distributed Tracing
You are a site reliability engineer responsible for maintaining a microservices-based e-commerce platform. Your system consists of several independent services, each deployed in its own container within a Kubernetes cluster.

Your organization follows a strict Service Level Objective (SLO) to maintain user satisfaction, which mandates that the 95th percentile latency for all requests over a 30-day period should not exceed 200 ms.

The following pseudo-code represents a simplified version of the request processing in your system:
 image
You realize that over the first two weeks of the current 30-day window, the 95th percentile latency has risen to 250 ms. Analyzing further, you discover that out of 10 million requests, 600,000 requests took more than 200 ms to complete.

Given these facts, which of the following is the most effective course of action that you can take to troubleshoot and reduce the system's latency issues?
A: Change the latency log level to debug to gather more information.
B: Increase the SLO for latency to 250 ms to accommodate the current system performance.
C: Introduce more instances of each microservice to handle the increased load.
D: Implement a distributed tracing mechanism to identify the microservices contributing most to the latency.
E: Implement request throttling to reduce the overall number of requests.
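The numbers in this scenario can be checked with a few lines of arithmetic. The sketch below assumes the 10 million requests are the population the SLO is measured over; the function name `latency_budget` and its return keys are hypothetical. A 95th percentile target means at most 5% of requests may exceed 200 ms.

```python
def latency_budget(total_requests, slow_requests, percentile=95):
    """Compare slow requests against what a percentile latency SLO allows."""
    # Integer arithmetic avoids floating-point noise in the allowance.
    allowed_slow = total_requests * (100 - percentile) // 100
    return {
        "allowed_slow": allowed_slow,
        "slow_fraction": slow_requests / total_requests,
        "budget_consumed": slow_requests / allowed_slow,
        "within_slo": slow_requests <= allowed_slow,
    }


report = latency_budget(total_requests=10_000_000, slow_requests=600_000)
print(report)
```

With the figures from the question, 500,000 slow requests are allowed but 600,000 occurred, so 6% of requests were slow and the allowance is overspent by 20%.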

Medium

Incident Response Procedure
Incident Management
Disaster Recovery
System Optimization
You are an SRE for a large-scale distributed system. The system architecture includes five primary servers (P1 to P5) and three backup servers (B1 to B3). The system uses an advanced load balancer that distributes the workload across the primary servers evenly. 

One day, the monitoring system triggers an alert that server P5 is not responding. The pseudo-code for the current incident response procedure is as follows:
 image
The function 'replaceServer(server)' replaces the failed server with a new one from a pool of spare servers, which takes around 30 minutes. 

The current discussion revolves around modifying this procedure to improve system resilience and minimize potential downtime. The backup servers are underutilized and could be leveraged more effectively. Also, the load balancer can dynamically shift workloads based on server availability and response time.

Based on the situation above, what is the best approach to optimize the incident response procedure?
A: Implement an early warning system to predict server failures and prevent them.
B: Upon failure detection, immediately divert traffic to backup servers, then attempt to reboot the primary server, and replace if necessary.
C: Replace the failed server without attempting a reboot and keep the traffic on primary servers.
D: Enable auto-scaling to add more servers when a primary server fails.
E: Switch to a more advanced load balancer that can detect and handle server failures independently.

Medium

Service Balancer Decision-making
Load Balancing
Distributed Systems
Concurrent Processing
You are a Site Reliability Engineer (SRE) working on a distributed system with a load balancer that distributes requests across a number of servers based on the current load. The decision algorithm for load balancing is written in pseudo-code as follows:
 image
The system receives a large burst of requests. In response to this, some engineers propose increasing the `threshold` value to allow for more requests to be handled concurrently by each server. Others argue that instead, we should increase the number of servers to distribute the load more evenly. 

Consider that the system has auto-scaling capabilities based on the average load of all servers, but the scaling operation takes about 15 minutes to add new servers to the pool. Also, the servers' performance degrades sharply if the load is much above the threshold.

One of the engineers also proposes modifying the getServer function logic to distribute the incoming load one by one across all servers to trigger the average load to rise faster.

Based on this scenario, what is the best approach?
A: Increase the `threshold` value to allow more requests on each server.
B: Add more servers to distribute the load, regardless of the auto-scaling delay.
C: Modify the getServer function to distribute the incoming load one by one across all servers to trigger the average load to rise faster.
D: Increase the `threshold` and add more servers simultaneously.
E: Manually trigger the auto-scaling process before the load increases.
Question | Skill | Difficulty | Time
Circuit Breakers in Microservices (Load Balancing, Circuit Breaker Pattern, Microservices Architecture) | Site Reliability Engineering | Medium | 3 mins
Error Budget Management (Latency Monitoring, Error Budgets, Distributed Tracing) | Site Reliability Engineering | Medium | 3 mins
Incident Response Procedure (Incident Management, Disaster Recovery, System Optimization) | Site Reliability Engineering | Medium | 3 mins
Service Balancer Decision-making (Load Balancing, Distributed Systems, Concurrent Processing) | Site Reliability Engineering | Medium | 2 mins

Reason #4

1200+ customers in 75 countries


With Adaface, we were able to optimise our initial screening process by upwards of 75%, freeing up precious time for both hiring managers and our talent acquisition team alike!


Brandon Lee, Head of People, Love, Bonito

Reason #5

Designed for elimination, not selection

The most important thing to remember when adding the pre-employment Site Reliability Test to your hiring process is that it is an elimination tool, not a selection tool. In other words, use the test to eliminate the candidates who do poorly on it, not to select the candidates who come out on top. While they are very valuable, pre-employment tests do not paint the entire picture of a candidate's abilities, knowledge, and motivations. Multiple easy questions are more predictive of a candidate's ability than fewer hard questions. Harder questions are often trick-based, and they do not provide any meaningful signal about the candidate's skill set.

Reason #6

1 click candidate invites

Email invites: You can send candidates an email invite to the Site Reliability Test from your dashboard by entering their email address.

Public link: You can create a public link for each test that you can share with candidates.

API or integrations: You can invite candidates directly from your ATS by using our pre-built integrations with popular ATS systems or building a custom integration with your in-house ATS.

Reason #7

Detailed scorecards & benchmarks

Reason #8

High completion rate

Adaface tests are conversational, low-stress, and take just 25-40 mins to complete.

This is why Adaface has the highest test-completion rate (86%), which is more than 2x better than traditional assessments.

Reason #9

Advanced Proctoring


What topics are covered in the Site Reliability Test?

Cloud Technologies
System Design
Automation Practices
Infrastructure as Code
Continuous Integration
Continuous Deployment
Network Troubleshooting
Monitoring Systems
Scripting for Infrastructure
Incident Management
Performance Tuning
Load Balancing
Database Reliability
Security Principles
Disaster Recovery Planning
Containerization
Service Level Objectives
Error Budget Management
Traffic Management
Distributed Systems
High Availability Strategies
Capacity Planning
Resource Optimization
Software Lifecycle Management
Scalability
Redundancy Planning
Log Analysis
Post-Mortem Analysis
Virtual Machines
Storage Management
Latency and Performance Metrics
Infrastructure Monitoring
Application Performance Monitoring
Network Architecture
Configuration Management
Serverless Computing
Distributed Storage Systems
Network Protocols
Application Debugging
Distributed Database Systems
Singapore government logo

The hiring managers felt that, through the technical questions they asked during the panel interviews, they were able to tell which candidates had better scores and differentiate them from those who did not score as well. They are highly satisfied with the quality of candidates shortlisted with the Adaface screening.


85%
reduction in screening time

FAQs

Can I combine multiple skills into one custom assessment?

Yes, absolutely. Custom assessments are set up based on your job description, and will include questions on all must-have skills you specify.

Do you have any anti-cheating or proctoring features in place?

We have the following anti-cheating features in place:

  • Non-googleable questions
  • IP proctoring
  • Web proctoring
  • Webcam proctoring
  • Plagiarism detection
  • Secure browser

Read more about the proctoring features.

How do I interpret test scores?

The primary thing to keep in mind is that an assessment is an elimination tool, not a selection tool. A skills assessment is optimized to help you eliminate candidates who are not technically qualified for the role; it is not optimized to help you find the best candidate for the role. So the ideal way to use an assessment is to decide on a threshold score (typically 55%; we help you benchmark) and invite all candidates who score above the threshold to the next round of interviews.
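As a rough illustration of threshold-based elimination, the sketch below filters a set of candidate scores against a cutoff. The function name `shortlist`, the candidate names, and the scores are made up for this example; treating a score equal to the threshold as passing is also an assumption.

```python
def shortlist(scores, threshold=55.0):
    """Return the candidates whose score meets the threshold, in input order."""
    # Elimination, not ranking: everyone at or above the cutoff moves forward.
    return [name for name, score in scores.items() if score >= threshold]


# Hypothetical scores, in percent.
scores = {"candidate_a": 82.0, "candidate_b": 54.5, "candidate_c": 55.0}
print(shortlist(scores))
```

Note that the result is not sorted by score, which reflects the point above: the threshold decides who advances, and later interview rounds decide who is the best fit.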

What experience level can I use this test for?

Each Adaface assessment is customized to your job description or ideal candidate persona (our subject matter experts will pick the right questions for your assessment from our library of 10,000+ questions). This assessment can be customized for any experience level.

Does every candidate get the same questions?

Yes, it makes it much easier for you to compare candidates. Options for MCQ questions and the order of questions are randomized. We have anti-cheating/ proctoring features in place. In our enterprise plan, we also have the option to create multiple versions of the same assessment with questions of similar difficulty levels.

I'm a candidate. Can I try a practice test?

No. Unfortunately, we do not support practice tests at the moment. However, you can use our sample questions for practice.

What is the cost of using this test?

You can check out our pricing plans.

Can I get a free trial?

Yes, you can sign up for free and preview this test.

I just moved to a paid plan. How can I request a custom assessment?

Here is a quick guide on how to request a custom assessment on Adaface.

Join 1200+ companies in 75+ countries.
Try the most candidate-friendly skills assessment tool today.
Ready to use the Adaface Site Reliability Test?
40 min tests.
No trick questions.
Accurate shortlisting.