| Medium Error Budget Management | 3 mins Site Reliability Engineering |
You are a site reliability engineer responsible for maintaining a microservices-based e-commerce platform. Your system consists of several independent services, each deployed in its own container within a Kubernetes cluster.
Your organization follows a strict Service Level Objective (SLO) to maintain user satisfaction, which mandates that the 95th percentile latency for all requests over a 30-day period should not exceed 200 ms.
The following pseudo-code represents a simplified version of the request processing in your system:
You realize that over the first two weeks of the current 30-day window, the 95th percentile latency has risen to 250 ms. Analyzing further, you discover that out of 10 million requests, 600,000 requests took more than 200 ms to complete.
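The figures above can be checked against the SLO with simple arithmetic. The following is a minimal sketch: the helper name `slow_fraction` is hypothetical, the numbers are the ones stated in the scenario, and reading the p95 target as "at most 5% of requests may exceed 200 ms" is an interpretation of the SLO rather than something the scenario spells out.

```python
def slow_fraction(slow_requests: int, total_requests: int) -> float:
    """Return the fraction of requests slower than the latency threshold."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return slow_requests / total_requests


# Figures taken from the scenario (first two weeks of the window).
TOTAL_REQUESTS = 10_000_000
SLOW_REQUESTS = 600_000  # requests that took more than 200 ms

# Assumption: a p95 target of 200 ms tolerates at most 5% slow requests.
ALLOWED_SLOW_PERCENT = 5

observed = slow_fraction(SLOW_REQUESTS, TOTAL_REQUESTS)
# Integer arithmetic avoids floating-point rounding in the budget itself.
allowed_slow_requests = TOTAL_REQUESTS * ALLOWED_SLOW_PERCENT // 100
over_budget = SLOW_REQUESTS - allowed_slow_requests

print(f"observed slow fraction: {observed:.1%}")
print(f"slow requests allowed at this volume: {allowed_slow_requests}")
print(f"slow requests over budget: {over_budget}")
```

Under that reading, the observed share of slow requests is above the 5% the target allows at this request volume, which is consistent with the reported 95th percentile exceeding 200 ms.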
Given these facts, which of the following is the most effective course of action that you can take to troubleshoot and reduce the system's latency issues?
A: Change the latency log level to debug to gather more information.
B: Increase the SLO for latency to 250 ms to accommodate the current system performance.
C: Introduce more instances of each microservice to handle the increased load.
D: Implement a distributed tracing mechanism to identify the microservices contributing most to the latency.
E: Implement request throttling to reduce the overall number of requests.
|
| Medium Incident Response Procedure | 3 mins Site Reliability Engineering |
You are an SRE for a large-scale distributed system. The system architecture includes five primary servers (P1 to P5) and three backup servers (B1 to B3). The system uses an advanced load balancer that distributes the workload across the primary servers evenly.
One day, the monitoring system triggers an alert that server P5 is not responding. The pseudo-code for the current incident response procedure is as follows:
The function 'replaceServer(server)' replaces the failed server with a new one from a pool of spare servers, which takes around 30 minutes.
The current discussion revolves around modifying this procedure to improve system resilience and minimize potential downtime. The backup servers are underutilized and could be leveraged more effectively. Also, the load balancer can dynamically shift workloads based on server availability and response time.
Based on the situation above, what is the best approach to optimize the incident response procedure?
A: Implement an early warning system to predict server failures and prevent them.
B: Upon failure detection, immediately divert traffic to backup servers, then attempt to reboot the primary server, and replace if necessary.
C: Replace the failed server without attempting a reboot and keep the traffic on primary servers.
D: Enable auto-scaling to add more servers when a primary server fails.
E: Switch to a more advanced load balancer that can detect and handle server failures independently.
|
| Medium Service Balancer Decision-making | 2 mins Site Reliability Engineering |
You are a Site Reliability Engineer (SRE) working on a distributed system with a load balancer that distributes requests across a number of servers based on the current load. The decision algorithm for load balancing is written in pseudo-code as follows:
The system receives a large burst of requests. In response to this, some engineers propose increasing the `threshold` value to allow for more requests to be handled concurrently by each server. Others argue that instead, we should increase the number of servers to distribute the load more evenly.
Consider that the system has auto-scaling capabilities based on the average load of all servers, but the scaling operation takes about 15 minutes to add new servers to the pool. Also, the servers' performance degrades sharply if the load is much above the threshold.
One of the engineers also proposes modifying the getServer function logic to distribute the incoming load one by one across all servers so that the average load rises faster.
Based on this scenario, what is the best approach?
A: Increase the `threshold` value to allow more requests on each server.
B: Add more servers to distribute the load, regardless of the auto-scaling delay.
C: Modify the getServer function to distribute the incoming load one by one across all servers so that the average load rises faster.
D: Increase the `threshold` and add more servers simultaneously.
E: Manually trigger the auto-scaling process before the load increases.
|
| Medium Resource Analysis | 3 mins DevOps |
As a senior DevOps engineer, you are tasked with diagnosing performance issues on a Linux server running Ubuntu 20.04. The server hosts several critical applications, but lately users have been experiencing significant slowness. Initial monitoring shows that CPU and memory utilization are consistently high. To identify the root cause, you check the output of the `top` and `ps` commands, which indicates that a particular process is consuming an unusually high amount of resources. However, the process name is generic and does not clearly indicate which application or service it belongs to. You also examine `/var/log/syslog` for any unusual entries but find nothing out of the ordinary.
Based on this situation, which of the following steps would most effectively help you identify and resolve the performance issue?
A: Increase the server's physical memory and CPU capacity.
B: Use the `lsof` command to identify the files opened by the suspect process.
C: Reboot the server to reset all processes.
D: Examine the `/etc/hosts` file for any incorrect configurations.
E: Run the `netstat` command to check for abnormal network activity.
F: Check the crontab for any recently added scheduled tasks.
|
| Medium Streamlined DevOps | 2 mins DevOps |
You are in charge of developing a Bash script for setting up a continuous integration pipeline for a web application. The source code is hosted in a Git repository. The script's goals include:
1. Ensuring the local copy of the repository in /var/www/html is updated to the latest version.
2. Creating a .env file with APP_ENV=production in the project root if it doesn't already exist.
3. Running a test suite with ./run_tests.sh and handling any test failures appropriately.
4. Logging the current timestamp and commit hash in deployment_log.txt in the project root if tests pass.
Which of the following script options would most effectively and safely accomplish these tasks?
|
| Medium Docker Multistage Build Analysis | 3 mins Docker |
Consider the following Dockerfile, which utilizes multistage builds. The aim is to build a lightweight, optimized image that just runs the application.
The Dockerfile first defines a base image that includes Node.js and npm, then creates an intermediate image to install the npm dependencies. It then runs the tests in another stage and finally creates the release image.
Which of the following statements are true?
A: The final image will include the test scripts.
B: If a test fails, the final image will not be created.
C: The node_modules directory in the final image comes from the base image.
D: The final image will only contain the necessary application files and dependencies.
E: If the application's source code changes, only the release stage needs to be rebuilt.
|
| Easy Docker Networking and Volume Mounting Interplay | 3 mins Docker |
You have two Docker containers, X and Y. Container X is running a web service listening on port 8080, and container Y is supposed to consume this service. Both containers are created from images that don't have any special network configurations.
Container X has a Dockerfile as follows:
You build and run it with the following commands:
Container Y is also running alpine with Python installed, and it is supposed to read data from the `/app/data` directory and send a GET request to `http://localhost:8080` every 5 minutes. The Dockerfile for container Y is:
Assuming all the Python scripts work perfectly and no firewall is blocking any connections, you find that container Y can't access the web service of container X via `http://localhost:8080`, and it also can't read the data in the `/app/data` directory. What could be the potential reason(s)?
A: Y can't access X's web service because they're in different Docker networks.
B: Y can't read the data because the volume is not shared correctly.
C: Both A and B are correct.
D: Both A and B are incorrect.
|
| Medium Dockerfile Optimization | 2 mins Docker |
You have been asked to optimize a Dockerfile for a Python application that involves a heavy dependency installation. Here is the Dockerfile you are starting with:
Given that the application's source code changes frequently but the dependencies listed in requirements.txt rarely change, how can you optimize this Dockerfile to take advantage of Docker's layer caching, reducing the build time?
A: Move the `RUN pip install` command to before the `COPY` command.
B: Change `COPY . /app` to `COPY ./app.py /app` and move the `RUN pip install` command to before the `COPY` command.
C: Add `RUN pip cache purge` before `RUN pip install`.
D: Replace the base image with `python:3.8-slim`.
E: Implement multi-stage builds.
|
| Medium Dockerfile Updates | 2 mins Docker |
Check the following Dockerfile used for a project (STAGE 1):
We created an image from this Dockerfile on Dec 14, 2021. A couple of weeks after Dec 14, 2021, Ubuntu released new security updates to their repository. After 2 months, we modified the file (STAGE 2):
A couple of weeks later, we further modified the file to add a local file ada.txt to /ada.txt (STAGE 3): (Note that ada.txt exists in the /home/adaface folder and the Dockerfile exists in the /home/code folder.)
Pick the correct statements:
A: If we run “docker build .” at STAGE 2, new Ubuntu updates will be fetched because apt-get update will be run again, since the cache is invalidated for all lines/layers of the Dockerfile when a new line is added.
B: If we run “docker build .” at STAGE 2, new Ubuntu updates will not be fetched, since the cache is invalidated only for the last two lines of the updated Dockerfile. Because the first two commands remain the same, cached layers are reused, skipping apt-get update.
C: To skip the cache, “docker build --no-cache .” can be used at STAGE 2. This will ensure new Ubuntu updates are picked up.
D: Docker command “docker build .” at STAGE 3 works as expected and adds local file ada.txt to the image.
E: Docker command “docker build .” at STAGE 3 gives an error “no such file or directory” since /home/adaface/ada.txt is not part of the Dockerfile context.
|
| Medium Efficient Dockerfile | 2 mins Docker |
Review the following Dockerfiles that work on two projects (project and project2):
All Dockerfiles have the same end result:
- ‘project’ is cloned from git. After running a few commands, the ‘project’ code is removed.
- ‘project2’ is copied from the file system and the permissions on the folder are changed.
Pick the correct statements:
A: File 1 is the most efficient of all.
B: File 2 is the most efficient of all.
C: File 3 is the most efficient of all.
D: File 4 is the most efficient of all.
E: Merging multiple RUN commands into a single RUN command is efficient for ‘project’ since each RUN command creates a new layer with changed files and folders. Deleting files with RUN only marks these files as deleted but does not reclaim disk space.
F: Copying ‘project2’ files and changing ownership in two separate commands will result in two layers since Docker duplicates all the files twice.
|
| Medium ConfigMap and Secrets Interaction | 2 mins Kubernetes |
In a Kubernetes cluster, you are working on configuring a new deployment that should be able to access specific environment variables through both ConfigMap and Secrets resources. The deployment YAML is structured as follows:
You have applied the above YAML successfully without any errors. Now, you are about to configure a service to expose the deployment. Before doing that, you want to confirm the security and setup implications.
Based on the above configuration, which of the following statements are true?
1. The DATABASE_PASSWORD will be mounted as an environment variable in plain text.
2. The ConfigMap data can be updated and the changes will be reflected automatically in the running pods without any need for a redeployment.
3. If a potential attacker gains access to the cluster, they would be able to retrieve the DATABASE_PASSWORD in plain text from the secrets resource as it is defined in stringData.
4. The APP_ENV and DATABASE_URL values are securely stored and cannot be accessed by non-admin users.
5. If a new container in the same pod is created, it would automatically have the DATABASE_PASSWORD environment variable configured.
|
| Medium Ingress from namespace | 3 mins Kubernetes |
You are tasked with deploying a Kubernetes network policy. Here are the specifications:
- Name of the policy: adaface-namespace
- Policy to be deployed in ‘chatbot’ namespace
- The policy should allow ALL traffic only from ‘tester’ namespace
- Policy should not allow communication between pods in the same namespace
- Traffic only from ‘tester’ namespace is allowed on all ports
Which of the following configuration files is BEST suited to create required dependencies and deploy the network policy?
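For orientation, the specification above can be written out as a NetworkPolicy manifest. The sketch below builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The helper name `namespace_only_ingress_policy` is hypothetical, and selecting the source namespace through the `kubernetes.io/metadata.name` label is an assumption: recent Kubernetes versions set that label on every namespace automatically, while older clusters would need an explicit label on the ‘tester’ namespace.

```python
import json


def namespace_only_ingress_policy(name: str, namespace: str, allowed_namespace: str) -> dict:
    """Build a NetworkPolicy that admits ingress only from one namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the policy's namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # No "ports" key, so traffic is allowed on all ports.
                    "from": [
                        {
                            "namespaceSelector": {
                                # Assumed label; see the note above.
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": allowed_namespace
                                }
                            }
                        }
                    ]
                }
            ],
        },
    }


policy = namespace_only_ingress_policy("adaface-namespace", "chatbot", "tester")
print(json.dumps(policy, indent=2))
```

Because the only `from` entry is a namespace selector for ‘tester’, pods inside ‘chatbot’ are not admitted as ingress sources for each other.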
|
| Medium Pod Affinity and Resource Quota Compliance | 2 mins Kubernetes |
You are working on a Kubernetes project where you need to ensure that certain pods get scheduled on nodes based on the presence of other pods and to limit the amount of resources that can be consumed in a namespace. You have been given the following YAML file which contains a combination of a pod definition and a resource quota:
With the application of the above YAML configuration, assess the validity of the statements and choose the correct option that lists all the true statements.
1. The critical-pod will only be scheduled on nodes where at least one pod with a label security=high is already running.
2. The critical-pod is adhering to the resource quotas defined in the compute-quota.
3. The compute-quota restricts the namespace to only allow a total of 1 CPU and 1Gi memory in requests and 2 CPUs and 2Gi memory in limits across all pods.
4. If a node has multiple pods labeled with security=high, the critical-pod can potentially be scheduled on that node, given other scheduling constraints are met.
5. The critical-pod exceeds the defined memory request quota as per the compute-quota.
|
| Easy Resource limits | 3 mins Kubernetes |
How would you deploy a Kubernetes pod with the following specifications:
- Name of pod: adaface
- Resource limits: 1 CPU and 512Mi memory
- Image: haproxy
A: kubectl run adaface --image=haproxy --limits='cpu=1,memory=512Mi'
B: kubectl run adaface --image=haproxy --requests='cpu=1,memory=512Mi'
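For reference, the same specification can be expressed declaratively as a Pod manifest, where limits live under each container's `resources.limits` field. The sketch below builds that manifest as a Python dictionary and prints it as JSON; the helper name `pod_with_limits` is hypothetical, and reusing the pod name as the container name is an assumption made for brevity.

```python
import json


def pod_with_limits(name: str, image: str, cpu: str, memory: str) -> dict:
    """Build a single-container Pod manifest with CPU and memory limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    # Assumption: the container is named after the pod.
                    "name": name,
                    "image": image,
                    "resources": {"limits": {"cpu": cpu, "memory": memory}},
                }
            ]
        },
    }


pod = pod_with_limits("adaface", "haproxy", cpu="1", memory="512Mi")
print(json.dumps(pod, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.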
|