System design is the use of computer engineering principles to build large-scale distributed systems. It involves converting business problems and requirements into technical solutions. Engineers use system design patterns to build reliable, scalable, and maintainable systems.
Let’s take a pizza shop as an example.
Your pizza shop starts becoming famous, and more and more customers begin placing orders. It’s your first shop, and you have only one chef. As customer demand increases, the load on the chef also increases.
How can we handle more orders?
- We can ask the chef to work harder and motivate them by paying extra.
One option is to make the chef (server) do all the work by improving efficiency—for example, better prep work in the early morning and a smoother workflow. Increasing the resources of a single server is called vertical scaling.
Vertical scaling means replacing the current machine with a more powerful one to improve throughput and, in turn, response time. Real-world systems usually combine vertical scaling with horizontal scaling, which is introduced next.
Now let’s say we go with this method. One day, the chef needs a day off. Suddenly, your shop becomes inoperable. This is called a single point of failure.
To reduce this risk, we can hire a backup chef (backup server). If the main chef (server) is unavailable for many days, we can hire the backup as full-time and add more chefs. This method—adding more servers—is called horizontal scaling.
Horizontal scaling means adding more similar machines and distributing the work among them to keep up with growing demand. These machines handle requests in parallel, which improves the user experience.
Let’s say Chef 1 and Chef 3 specialize in pizza, while Chef 2 specializes in garlic bread. How should we route incoming orders?
If we assign orders randomly, garlic bread might go to Chef 1 or 3, and pizza might go to Chef 2. That is inefficient.
A better approach is:
- Pizza orders → Chef 1 or Chef 3
- Garlic bread orders → Chef 2
This is similar to microservices architecture, where each service handles a specific responsibility.
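The routing rule above can be sketched in a few lines. This is only an illustration of the analogy: the chef names and the `route_order` helper are made up for this example, and rotating between the two pizza chefs is an added assumption.

```python
from itertools import cycle

# Hypothetical mapping from order type to the chefs (services) that specialize in it.
SPECIALISTS = {
    "pizza": cycle(["chef-1", "chef-3"]),
    "garlic_bread": cycle(["chef-2"]),
}

def route_order(order_type: str) -> str:
    """Return the next specialist for this order type, rotating among them."""
    if order_type not in SPECIALISTS:
        raise ValueError(f"no chef handles {order_type!r}")
    return next(SPECIALISTS[order_type])

print([route_order("pizza") for _ in range(3)])  # alternates between chef-1 and chef-3
print(route_order("garlic_bread"))               # always chef-2
```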
Now imagine your shop is doing very well, but one day there is a power outage, or you lose your license temporarily. You cannot operate that day. What can we do?
We can distribute the business by opening another branch in a different location. We may keep lower capacity there, but if Shop 1 fails, Shop 2 can take over. Delivery may be slower, but business continues. This is a distributed system idea.
When a new order is placed, we can send it to Shop 1 or Shop 2 based on the customer’s location. This is called partitioning (more specifically, geographic partitioning/routing).
Customers should not decide which shop handles their request. The system should decide automatically.
So we place a central routing layer in front. This router should not route randomly; it should follow clear policies.
For example:
- Shop 1 delivery time: 1 hour 15 minutes
- Shop 2 delivery time: 1 hour 5 minutes
The router should prefer Shop 2 (if capacity and health are okay).
This router is called a load balancer.
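A minimal sketch of such a routing policy, assuming each shop reports an estimated delivery time, a health flag, and remaining capacity. The `Shop` class and `pick_shop` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Shop:
    name: str
    delivery_minutes: int   # estimated delivery time to this customer
    healthy: bool = True    # False during an outage
    open_slots: int = 1     # remaining capacity for new orders

def pick_shop(shops: list[Shop]) -> Shop:
    """Choose the fastest shop among those that are healthy and have capacity."""
    candidates = [s for s in shops if s.healthy and s.open_slots > 0]
    if not candidates:
        raise RuntimeError("no shop can take the order")
    return min(candidates, key=lambda s: s.delivery_minutes)

shops = [Shop("shop-1", 75), Shop("shop-2", 65)]
print(pick_shop(shops).name)   # shop-2, the faster option

shops[1].healthy = False       # simulate an outage at shop-2
print(pick_shop(shops).name)   # falls back to shop-1
```

Real load balancers offer other policies as well (round robin, least connections, and so on); picking the lowest delivery time is just the policy used in this story.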
Now our system is more:
- Scalable
- Fault-tolerant
Let’s continue the story.
After an order is prepared, a delivery agent delivers it. The delivery agent does not care whether it is pizza or burger; they only care about delivering it quickly.
This means:
- Kitchen/order processing is one system
- Delivery is another system
By separating responsibilities, we decouple the system. Decoupled systems are easier to change, scale, and extend independently.
To monitor and improve reliability, we should also build in:
- Monitoring & Analytics
- Auditing / Logging
- Reporting
- (Optional) ML-based forecasting or optimization
Final idea: Build decoupled systems to improve extensibility, while using scaling, partitioning, and load balancing for reliability and performance.
- How many active users (daily/monthly)?
- Read vs write ratio?
- Peak factor (traffic spikes)?
- Retention period for stored data?
There are 4 types of estimation you need to do in an interview:
- Traffic
- Storage
- Bandwidth
- Memory
The point is: during system design, we need these calculations to understand how much our system will require to function efficiently and what to expect.
Most of these numbers are assumptions, but we need to ask clarification questions based on what capacity we are calculating.
Keep in mind: don’t spend more than 5 minutes on this.
We should ask the interviewer: how many active users are there?
Let’s take Pastebin as an example (in simple terms, it is for pasting/uploading text and reading shared text).
Let’s assume we have 10 million users on this website.
Let’s be honest: we don’t care about just user count; we care about how many requests we will receive.
- Requests to upload
- Requests to read
We need to determine whether the system is read-heavy or write-heavy.
Since this website is for sharing content that will be read by other users, we can assume it is read-heavy.
Let’s say 10% of the 10 million users upload text each day.
That is:
10M × 0.1 = 1M uploads per day
For reads, we can assume ratios like:
- 10:1
- 50:1
- 100:1
These are assumptions, and we can use similar ones for many websites.
Let’s take 50:1 in this scenario.
For every 1 upload, there are 50 reads.
So:
1M × 50 = 50 million read requests per day
Now convert this to seconds:
50M / (60 × 60 × 24) = 50M / 86,400
Always round numbers for easy calculation, since these are assumptions anyway.
For example:
50M / 100,000 ≈ 500 requests per second
We only need a ballpark estimate.
Let’s do this for writes as well:
1M / 86,400 ≈ 10 writes per second
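The traffic estimate can be checked with a short script. The inputs are the assumptions from the text (10M users, 10% uploading per day, a 50:1 read-to-write ratio); the variable names are just for this sketch. The exact results are about 12 writes and about 579 reads per second, which the text rounds to the ballpark figures of 10 and 500.

```python
SECONDS_PER_DAY = 60 * 60 * 24   # 86,400

users = 10_000_000
upload_fraction = 0.10           # assumed share of users uploading per day
reads_per_write = 50             # assumed read:write ratio

writes_per_day = round(users * upload_fraction)
reads_per_day = writes_per_day * reads_per_write

write_qps = writes_per_day / SECONDS_PER_DAY
read_qps = reads_per_day / SECONDS_PER_DAY

print(f"writes/day = {writes_per_day:,}, reads/day = {reads_per_day:,}")
print(f"write QPS ~ {write_qps:.0f}, read QPS ~ {read_qps:.0f}")
```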
We need to estimate storage in ballpark values.
Since this website is mostly used for sharing code snippets, we can assume:
- 200 lines per request
- 10 words per line
- 5 characters per word
Each character is 1 byte, so:
200 × 10 × 5 × 1 byte = 10,000 bytes = 10 KB
We have 1M users writing/uploading.
Useful unit table:
| Unit | Number | Value |
|---|---|---|
| Kilo | Thousand | 10^3 |
| Mega | Million | 10^6 |
| Giga | Billion | 10^9 |
| Tera | Trillion | 10^12 |
| Peta | Quadrillion | 10^15 |
1M × 10KB = 10GB per day
We should ask the interviewer whether storage has any retention period.
Let’s say retention is 5 years.
5 × 365 = 1,825 ≈ 2000 days (rounded up)
2000 × 10GB = 20TB
So for 5 years, it is about 20TB.
This is only for original data. We also need replication.
20TB × replication factor (assume 3) = 60TB
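Here is the same storage estimate as a small script, using decimal units as in the unit table. The numbers are the assumptions from the text. The script uses the exact 1,825 days, so it gives about 18TB raw and about 55TB replicated, which is in the same ballpark as the rounded 20TB and 60TB.

```python
KB, GB, TB = 10**3, 10**9, 10**12   # decimal units

lines, words_per_line, chars_per_word, bytes_per_char = 200, 10, 5, 1
paste_bytes = lines * words_per_line * chars_per_word * bytes_per_char  # 10,000 bytes

uploads_per_day = 1_000_000
retention_days = 5 * 365            # 1,825; the text rounds this up to 2,000
replication_factor = 3              # assumed, as in the text

daily_bytes = uploads_per_day * paste_bytes
raw_bytes = daily_bytes * retention_days
total_bytes = raw_bytes * replication_factor

print(f"per paste: {paste_bytes / KB:.0f} KB")
print(f"per day:   {daily_bytes / GB:.0f} GB")
print(f"5 years:   {raw_bytes / TB:.2f} TB raw, {total_bytes / TB:.2f} TB replicated")
```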
Bandwidth is the amount of data transferred per second.
Incoming data per second (write requests):
We already calculated write requests as 10 per second.
So:
10 × 10KB = 100KB/s incoming
Outgoing data per second (read requests):
We calculated around 500 read requests per second.
So:
500 × 10KB = 5000KB/s = 5MB/s outgoing
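The bandwidth numbers follow directly from requests per second times payload size. A quick sketch with the rounded figures from the text (variable names are illustrative):

```python
paste_kb = 10        # average paste size from the storage estimate
write_qps = 10       # rounded write requests per second
read_qps = 500       # rounded read requests per second

incoming_kb_s = write_qps * paste_kb   # data uploaded per second
outgoing_kb_s = read_qps * paste_kb    # data served per second

print(f"incoming ~ {incoming_kb_s} KB/s")
print(f"outgoing ~ {outgoing_kb_s / 1000:.0f} MB/s")
```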
Caching is used to serve read requests faster.
A general rule is the 80/20 rule, which means 20% of uploaded content causes 80% of traffic.
So we keep that 20% in cache.
Daily read volume = 50 million read requests × 10KB per request = 500GB
20% of 500GB = 100GB
So, we need about 100GB memory for cache.
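The cache estimate as a short calculation, using integer arithmetic to avoid rounding noise. This is a sketch of the rule of thumb above, not a sizing guarantee; the variable names are illustrative.

```python
reads_per_day = 50_000_000
paste_bytes = 10_000          # 10 KB per paste
hot_percent = 20              # 80/20 rule: cache the hottest 20%

daily_read_bytes = reads_per_day * paste_bytes        # total bytes read per day
cache_bytes = daily_read_bytes * hot_percent // 100   # share we want in memory

print(f"daily reads ~ {daily_read_bytes // 10**9} GB")
print(f"cache size  ~ {cache_bytes // 10**9} GB")
```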
Formula:
Number of servers = requests per second to handle / requests per second a single server can handle
= 500 / x
The requests per second a single server can handle (x) depends on whether the workload is:
- CPU-bound
- Memory-bound
- I/O-bound
For example, in a CPU-bound case:
- 8 cores
- Time to process one request = 0.5 sec
x = number of physical cores / time per request
x = 8 / 0.5 = 16 requests per second
500 / 16 ≈ 31 servers at minimum. With headroom for traffic spikes and failures, plan for roughly 30 to 50 servers to handle this load.
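A small helper makes the CPU-bound estimate repeatable. The `servers_needed` function and its `headroom` multiplier are assumptions added for this sketch; the text only says to allow a range of 30 to 50 servers.

```python
import math

def servers_needed(target_qps: float, cores: int, seconds_per_request: float,
                   headroom: float = 1.0) -> int:
    """Estimate server count for a CPU-bound service, rounded up."""
    per_server_qps = cores / seconds_per_request   # x in the formula above
    return math.ceil(target_qps * headroom / per_server_qps)

print(servers_needed(500, cores=8, seconds_per_request=0.5))                # 32
print(servers_needed(500, cores=8, seconds_per_request=0.5, headroom=1.5))  # 47
```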
- 500K words in English
- A line of text contains 10 words
- A word contains 5 characters (5 bytes)
- HD image ≈ 3MB (size depends on height, width, and pixel depth). Example: 1280 × 720 × 3 bytes ≈ 3MB
- Profile image (300 × 300) ≈ 300KB
- Video size depends on frame size, frame rate, compression ratio, and duration
- Rough estimate: 1 minute of HD video ≈ 50MB
- Frame rate for HD video is often around 30 fps
- Formula: Frame size × Frame rate × Compression ratio × Duration. Example: 3MB × 30 × (1/100) × 60 ≈ 54MB
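The image and video rules of thumb above can be written out as a short calculation. The function name is illustrative, and the 1/100 compression ratio is the rough assumption from the list, not a property of any particular codec.

```python
def video_size_mb(frame_mb: float, fps: int, compression_ratio: float, seconds: int) -> float:
    """Frame size x frame rate x compression ratio x duration, in MB."""
    return frame_mb * fps * compression_ratio * seconds

# Uncompressed 1280 x 720 frame at 3 bytes per pixel
frame_bytes = 1280 * 720 * 3
print(f"HD frame ~ {frame_bytes / 10**6:.2f} MB")   # close to the 3MB rule of thumb

# One minute of HD video at 30 fps with roughly 100:1 compression
one_minute = video_size_mb(frame_mb=3, fps=30, compression_ratio=1 / 100, seconds=60)
print(f"1 minute of HD video ~ {one_minute:.0f} MB")
```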
In YouTube-like systems, we may also store multiple resolutions such as 480p, 360p, 240p, and 144p.
Let’s say HD size is X = 50MB.
Then roughly:
- 480p = X/2
- 360p = X/4
- 240p = X/8
- 144p = X/16
If we add these with HD, total storage becomes roughly around 2X.
So:
50MB × 2 = 100MB total storage per video
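Summing the halving series shows why 2X is a reasonable upper bound. The size factors below are the rough assumptions from the list above.

```python
hd_mb = 50
# Assumed size of each rendition relative to HD
factors = {"HD": 1, "480p": 1 / 2, "360p": 1 / 4, "240p": 1 / 8, "144p": 1 / 16}

total_mb = sum(hd_mb * factor for factor in factors.values())
print(total_mb)            # 96.875, just under 2 x 50MB
print(total_mb / hd_mb)    # 1.9375
```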
HTTP stands for Hypertext Transfer Protocol. It is the standard protocol that web browsers and servers use to communicate and exchange data. Think of HTTP as sending a postcard: anyone who handles the postcard during delivery (routers, ISPs, or attackers on the network) can read what is written on it because it is all in plain text.
- HTTP operates on port 80 by default.
- It transfers data in plain text, meaning the content is not protected.
- Because it is unencrypted, attackers can intercept or modify the data easily.
- It is still used for non-sensitive websites where security is not a concern.
- It is a Layer 7 protocol, meaning it sits at the top of the OSI model, in the Application Layer.
- Simplicity: Easy to implement and use since it does not require complex encryption mechanisms.
- Speed (without encryption): Faster in small setups because no encryption or decryption is performed.
- Compatibility: Supported by all browsers, servers, and applications without extra configuration.
- No Security: Data is sent in plain text and can be intercepted by attackers.
- No User Trust: Browsers often label HTTP websites as “Not Secure.”
- Not Suitable for Sensitive Data: Cannot be used for banking, login systems, or e-commerce where private information is exchanged.
HTTPS stands for HyperText Transfer Protocol Secure. It is an extension of HTTP with added security through encryption. If HTTP is a postcard, HTTPS is like a locked envelope. Anyone can send it, but only the person with the right key (the website’s server) can open and read it. Even if attackers intercept it, they only see scrambled content.
- HTTPS operates on port 443 by default.
- It uses TLS (Transport Layer Security), the successor to the now-deprecated SSL (Secure Sockets Layer), to encrypt the data.
- Even if attackers capture the traffic, they cannot read the actual message.
- Browsers show a padlock symbol next to the URL to indicate that the site is secure.
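- It is still a Layer 7 (Application Layer) protocol: HTTP runs on top of TLS, which sits between the application and the Transport Layer (TCP).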
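When the client encrypts data with the server's public key, only the matching private key, which stays on the server, can decrypt it. In practice, asymmetric cryptography is used only during connection setup to authenticate the server and establish a shared session key; the rest of the traffic is encrypted with that faster symmetric key.
Using a public and private key pair for encryption is called asymmetric encryption.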
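To make the public/private key idea concrete, here is a toy RSA example with tiny textbook primes. It is for illustration only: real TLS uses vetted libraries, far larger keys, and padding, and the variable and function names here are made up for this sketch.

```python
# Toy RSA with tiny primes: for illustration only, never for real security.
p, q = 61, 53
n = p * q                    # 3233, shared by both keys
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent: the public key is (n, e)
d = pow(e, -1, phi)          # private exponent: the private key is (n, d); needs Python 3.8+

def encrypt(message: int) -> int:
    """Anyone who knows the public key (n, e) can encrypt."""
    return pow(message, e, n)

def decrypt(ciphertext: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(ciphertext, d, n)

secret = 65
ciphertext = encrypt(secret)
print(ciphertext != secret)             # True: the ciphertext hides the message
print(decrypt(ciphertext) == secret)    # True: the private key recovers it
```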
Working of the Connection establishment over HTTPS:
The TCP/IP model is the functional standard for how data travels across a network. Think of it as a relay race where each runner (layer) has a specific job to ensure the "baton" (your data) reaches the finish line safely.
This layer is where the actual physical movement of data happens. It converts digital bits into signals.
- Media Types:
  - Copper: Electrical signals.
  - Fiber Optics: Light pulses.
  - Wireless: Radio waves.
- Key Tasks: Handles Topologies (how devices are arranged) and Transmission Modes (Simplex, Half-Duplex, or Full-Duplex).
This layer ensures data gets from one device to the next on the same local network.
- MAC Sublayer: Performs Framing. It wraps your data in a "header" (source/destination MAC addresses) and a "trailer" (error check).
- LLC Sublayer: Manages Flow Control (preventing the receiver from being overwhelmed) and Error Control.
This layer handles routing across different networks to find the best path to a destination.
- IP (Internet Protocol): The primary protocol. It fragments large data into smaller packets and uses logical IP addresses for routing.
- ARP: Translates an IP address into a physical MAC address.
- ICMP: The "messenger" that reports errors if a packet can't reach its destination.
This layer manages end-to-end communication between the sender and the receiver.
- TCP (Transmission Control Protocol):
  - Reliable: Uses sequence numbers and retransmits lost data.
  - Connection-Oriented: Establishes a "handshake" before sending data.
- UDP (User Datagram Protocol):
  - Fast: No handshake, acknowledgements, or retransmission of lost data.
  - Connectionless: Best for streaming or gaming where speed matters more than perfect accuracy.
This is the layer you actually see. It provides the protocols that software (like Chrome or Outlook) uses to communicate.
- Web: HTTP/HTTPS.
- Email: SMTP.
- Files: FTP.
- Translation: DNS (turns `google.com` into an IP address).
When you send an email or click a link, your data travels down the stack:
Application Layer (Data): You create the raw data (e.g., an HTTP request).
Transport Layer (Segments): TCP breaks the data into chunks called Segments and adds a port number (so the computer knows which app to give it to).
Network Layer (Packets): The segment is wrapped in a "header" containing the Source and Destination IP Addresses. It is now a Packet.
Data Link Layer (Frames): The packet is wrapped in a header containing MAC Addresses and a trailer for error checking. It is now a Frame.
Physical Layer (Bits): The frame is converted into 1s and 0s (electrical or light pulses) and sent across the wire/air.
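The wrapping steps above can be modeled as nested data structures. This is a simplified sketch: the class names, ports, and addresses are made up (the IPs come from ranges reserved for documentation), and the checksum is a CRC-32 over the payload only, standing in for the real frame trailer.

```python
import zlib
from dataclasses import dataclass

@dataclass
class Segment:            # Transport layer: adds port numbers
    src_port: int
    dst_port: int
    payload: bytes

@dataclass
class Packet:             # Network layer: adds IP addresses
    src_ip: str
    dst_ip: str
    segment: Segment

@dataclass
class Frame:              # Data link layer: adds MAC addresses and a trailer
    src_mac: str
    dst_mac: str
    packet: Packet
    checksum: int         # simplified stand-in for the trailer's error check

def encapsulate(data: bytes) -> Frame:
    """Wrap application data layer by layer, as described above."""
    segment = Segment(src_port=50000, dst_port=80, payload=data)
    packet = Packet("192.0.2.10", "198.51.100.20", segment)
    return Frame("aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb", packet, zlib.crc32(data))

frame = encapsulate(b"GET / HTTP/1.1\r\n\r\n")
print(frame.packet.segment.dst_port)   # 80
print(frame.packet.dst_ip)             # 198.51.100.20
```

On the receiving side the process runs in reverse: each layer strips its own header and passes the rest upward.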
- First, your browser checks its cache to see whether the website was visited before and its IP address is already known.
- If the browser cannot find the IP address for the requested URL, it asks your operating system to locate the website. The OS first checks the hosts file. If the hostname is not found there, the OS makes a DNS request to find the IP address of the website.
- The request first goes to the resolver (usually run by your Internet Service Provider), which checks its own cache. If the resolver does not know the IP address, it asks a root server, which points it to the TLD (Top-Level Domain) server for the URL's ending: the .com TLD server for a .com URL, the .net TLD server for a .net URL, and so on.
- The TLD server does not hold the IP address itself, but it knows at least one of the authoritative name servers for the domain. The resolver queries that name server, which returns the IP address associated with your URL. All of this usually happens in a matter of milliseconds.
- Once the OS has the IP address and gives it to the browser, the browser makes a GET request (a type of HTTP method) to that IP address. The browser hands the request to the OS, which in turn packs it into TCP segments, as discussed earlier, and sends it to the IP address.
- On its way, the request is checked by both the OS's and the server's firewall to make sure there are no security violations. The request usually arrives at a load balancer, which directs traffic across the available servers for that website and forwards it to a chosen server. For HTTPS, the server presents its TLS (formerly SSL) certificate during the handshake that sets up the secure session.
- Finally, the chosen server sends the HTML, CSS, and JavaScript files (if any) back to the OS, which in turn gives them to the browser to interpret. And then you get your website as you know it.
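The lookup order in the steps above can be simulated with plain dictionaries. Everything here is made up for illustration: the tables, server names, and the `resolve` function are hypothetical, the IP comes from a range reserved for documentation, and no real DNS queries are made.

```python
# All names and addresses below are invented for illustration.
BROWSER_CACHE: dict[str, str] = {}
HOSTS_FILE = {"localhost": "127.0.0.1"}
RESOLVER_CACHE: dict[str, str] = {}
ROOT = {"com": "tld-com", "net": "tld-net"}               # root -> TLD server
TLD = {"tld-com": {"example.com": "ns1.example.com"}}     # TLD -> authoritative server
AUTHORITATIVE = {"ns1.example.com": {"example.com": "192.0.2.1"}}

def resolve(host: str) -> tuple[str, str]:
    """Return (ip, source), where source names the step that answered."""
    for source, table in (("browser cache", BROWSER_CACHE),
                          ("hosts file", HOSTS_FILE),
                          ("resolver cache", RESOLVER_CACHE)):
        if host in table:
            return table[host], source
    tld_server = ROOT[host.rsplit(".", 1)[-1]]   # root points to the TLD server
    name_server = TLD[tld_server][host]          # TLD points to the authoritative server
    ip = AUTHORITATIVE[name_server][host]        # authoritative server knows the IP
    RESOLVER_CACHE[host] = ip                    # cache the answer for later lookups
    BROWSER_CACHE[host] = ip
    return ip, "authoritative name server"

print(resolve("example.com"))   # first lookup walks the hierarchy
print(resolve("example.com"))   # second lookup is answered from the browser cache
```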
A database index is like a shortcut list.
It stores:
- the value of one or more columns (called the index key), and
- a reference to where the full row is stored in the table.
Because of that, the database doesn’t have to check every row one by one. It can quickly find the key in the index, then jump to the correct row.
Imagine this table:
users(id, email, name, created_at)
with 10 million rows.
Query:
SELECT * FROM users WHERE email = 'a@x.com';
Without an index on `email`:
- The database scans the whole table.
- It checks row 1, row 2, row 3, … up to row 10,000,000.
- This is called a full table scan.
- As the table grows, query time usually grows too.
With an index on `email`:
- The database searches the `email` index first.
- It finds `'a@x.com'` quickly.
- It uses the stored reference to jump to the matching row(s).
- Much less data is read, so the query is faster.
That’s the speed benefit of indexing.
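One way to see the difference is to ask a database for its query plan before and after creating the index. The sketch below uses Python's built-in `sqlite3` module with a much smaller table; the index name `idx_users_email`, the sample rows, and the `plan` helper are assumptions for this example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (email, name, created_at) VALUES (?, ?, ?)",
    [(f"user{i}@x.com", f"User {i}", "2024-01-01") for i in range(10_000)],
)

QUERY = "SELECT * FROM users WHERE email = 'user42@x.com'"

def plan(sql: str) -> str:
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)   # last column holds the detail text

before = plan(QUERY)   # expected to describe a full scan of users
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(QUERY)    # expected to mention idx_users_email

print("without index:", before)
print("with index:   ", after)
```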
The database does not physically reorder the whole table for every query. That would be too expensive and impractical.
Instead, it builds a separate structure (commonly a B-Tree) for the indexed column(s). A B-Tree keeps keys in sorted order, which makes search efficient.
If you index one column, the index usually contains only:
- that indexed column’s value, and
- a row pointer/reference.
So if you index company_id, the index stores company_id values + references.
It does not automatically store unrelated columns like units or unit_cost (unless using a special “covering index” setup).
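The idea of "sorted keys plus row references" can be sketched with a sorted list and binary search. This is a stand-in for a B-Tree, not an implementation of one, and the sample rows and the `lookup` helper are made up for the example.

```python
import bisect

# Table rows in insertion order: (company_id, units, unit_cost)
rows = [(30, 5, 9.5), (10, 2, 4.0), (20, 7, 1.25), (10, 1, 3.0)]

# The index holds only (key, row position) pairs, kept sorted by key.
index = sorted((company_id, pos) for pos, (company_id, _, _) in enumerate(rows))

def lookup(company_id: int) -> list[tuple]:
    """Binary-search the index, then follow row positions back to the table."""
    start = bisect.bisect_left(index, (company_id, -1))
    matches = []
    for key, pos in index[start:]:
        if key != company_id:
            break                      # keys are sorted, so we can stop early
        matches.append(rows[pos])      # fetch units and unit_cost from the table
    return matches

print(index)        # [(10, 1), (10, 3), (20, 2), (30, 0)]
print(lookup(10))   # [(10, 2, 4.0), (10, 1, 3.0)]
```

Note that `units` and `unit_cost` are read from `rows`, not from `index`; a covering index would store those values in the index itself to skip that extra step.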

