Storage concepts and considerations in System Design


Table of Contents

  1. Disk — RAID and Volume
  2. File Storage, Block Storage, and Object Storage
  3. Hadoop Distributed File System (HDFS)
  4. Storage comparisons
  5. Choose the right datastore
  6. Storage options in the Cloud

1. Disk — RAID and Volume

1.1 RAID

RAID (Redundant Array of Inexpensive Disks or Redundant Array of Independent Disks) is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both.

The standard RAID levels comprise a basic set of RAID configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard…
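As a rough illustration of the striping and parity techniques mentioned above, the sketch below spreads a byte string across disks round-robin and shows how an XOR parity block lets one lost block be rebuilt. The names `stripe` and `xor_blocks` and the sample data are hypothetical, chosen only for this example. Real RAID implementations work at the block-device level and differ by RAID level.

```python
from functools import reduce


def stripe(data: bytes, n_disks: int, block_size: int) -> list[list[bytes]]:
    """Distribute fixed-size blocks of data round-robin across n_disks (striping)."""
    disks: list[list[bytes]] = [[] for _ in range(n_disks)]
    for offset in range(0, len(data), block_size):
        disk_index = (offset // block_size) % n_disks
        disks[disk_index].append(data[offset:offset + block_size])
    return disks


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte; used to build and to apply parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Illustrative data only: three data blocks plus one parity block.
data_blocks = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data_blocks)

# Simulate losing the second block and rebuilding it from the survivors and parity.
recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
```

Mirroring needs no arithmetic at all: the same block is simply written to every disk in the mirror set, trading capacity for redundancy.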

Everything changes and nothing stands still



Old and new versions of the code, and old and new data formats, may all coexist in the system at the same time. For the system to continue running smoothly, we need to maintain compatibility in both directions:

  • Backward compatibility: Newer code can read data that was written by older code.
  • Forward compatibility: Older code can read data that was written by newer code.
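The two directions can be sketched with JSON, one of the formats listed in this chapter. This is a minimal sketch under assumptions: the record shape, the `email` field, and the reader names `read_v1` and `read_v2` are all hypothetical. The older reader tolerates fields it does not know, and the newer reader supplies a default for a field that older writers never produced.

```python
import json


def read_v1(raw: str) -> dict:
    """Older code: knows only 'name' and ignores any fields added later."""
    record = json.loads(raw)
    return {"name": record["name"]}


def read_v2(raw: str) -> dict:
    """Newer code: also knows 'email', defaulting to None when it is absent."""
    record = json.loads(raw)
    return {"name": record["name"], "email": record.get("email")}


old_data = json.dumps({"name": "Ada"})                              # written by older code
new_data = json.dumps({"name": "Ada", "email": "ada@example.com"})  # written by newer code

backward_ok = read_v2(old_data)  # newer code reading older data
forward_ok = read_v1(new_data)   # older code reading newer data
```

Forward compatibility is usually the harder direction, because the older code has to be written so that it tolerates additions it could not anticipate.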

In this chapter, we will look at several formats for encoding data, including JSON, XML, Protocol Buffers, Thrift, and Avro. In particular, we will look at how they handle schema changes and how…

All about reading and writing data



This chapter talks about how we can store the data that we’re given, and how we can find it again when we’re asked for it.

There is a big difference between storage engines that are optimized for transactional workloads and those that are optimized for analytics.

We will talk about two families of storage engines: log-structured storage engines, and page-oriented storage engines such as B-trees.

Data Structures That Power Your Database

We can store data in a log, which is an append-only data file, and many databases use a log internally. …
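A minimal sketch of the append-only idea, assuming a plain text file with one `key,value` line per write. The class name `AppendOnlyLog` and the line format are hypothetical, not taken from any particular database. Writes only ever append, and a read scans the whole file and keeps the last value seen for the key.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy key-value store backed by an append-only text file.

    Assumes keys contain no commas and values contain no newlines.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def set(self, key: str, value: str) -> None:
        # Never overwrite: every write is appended to the end of the file.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key},{value}\n")

    def get(self, key: str):
        # Scan the entire log; the most recent entry for the key wins.
        result = None
        if not os.path.exists(self.path):
            return result
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition(",")
                if k == key:
                    result = v
        return result


log_path = os.path.join(tempfile.mkdtemp(), "db.log")
db = AppendOnlyLog(log_path)
db.set("42", "Tokyo")
db.set("7", "Osaka")
db.set("42", "Kyoto")  # an update is just another appended line
```

Writes are cheap because they are sequential appends, but each read costs a full scan, which is the gap an index is meant to close.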

Data models are perhaps the most important part of developing software



Most applications are built by layering one data model on top of another. Each layer hides the complexity of the layers below it by providing a clean data model. These abstractions allow different groups of people — for example, the engineers at the database vendor and the application developers using their database — to work together effectively.

Relational Model Versus Document Model

The best-known data model today is probably that of SQL, based on the relational model: data is organized into relations (called tables in SQL), where each relation is an unordered collection of tuples (rows in SQL).

The birth of NoSQL

The dominance of relational databases has lasted…

Welcome to the gate of becoming a system design expert!



The main functionality of a standard system:

  • Store data so that the application, or another application, can find it again later (databases)
  • Remember the result of an expensive operation, to speed up reads (caches)
  • Allow users to search data by keyword or filter it in various ways (search indexes)
  • Send a message to another process, to be handled asynchronously (stream processing)
  • Periodically crunch a large amount of accumulated data (batch processing)
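To make the caching item in the list above concrete, here is a small sketch using Python's standard `functools.lru_cache` as an in-process cache. The function `expensive_square` and the counter `call_count` are hypothetical stand-ins for a genuinely expensive operation. A production cache is often a separate service, so treat this only as an illustration of the idea.

```python
from functools import lru_cache

call_count = 0  # counts how often the underlying computation really runs


@lru_cache(maxsize=128)
def expensive_square(n: int) -> int:
    """Pretend this is slow; the decorator remembers results per argument."""
    global call_count
    call_count += 1
    return n * n


first = expensive_square(12)   # computed, then stored in the cache
second = expensive_square(12)  # served from the cache without recomputing
```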

There are datastores that are also used as message queues (Redis), and there are message queues with database-like durability guarantees (Apache Kafka).

If you have an application-managed caching layer (using…

Using Chef to provision Linux and Windows servers in VMware vSphere



Server provisioning is the process of setting up a server to be used in a network based on the required resources. Provisioning can encompass all of the operations needed to create a new machine and bring it to a working state, and it includes defining the desired state of the system. Provisioning servers quickly can dramatically improve the efficiency of an organization, especially one that still relies heavily on manual processes.

This article highlights the detailed processes and tricks that can be used to provision servers (virtual machines in VMware vSphere). …

Best practices for developing, testing, and releasing profiles, cookbooks, and policies



  1. Write InSpec Profiles once and use them everywhere
  2. Test-Driven Development using InSpec Profiles
  3. Test locally and build/release using CI/CD pipelines
  4. Use chef_cookbook_generator to generate the cookbooks and policies
  5. Use data bags even with kitchen test
  6. Use Azure DevOps as repositories and CI/CD pipelines
  7. Use kitchen-azurerm as the test driver
  8. Use Artifactory to store cookbooks and binaries
  9. Use Audit cookbook in Policies


Unlike other types of development, developing, testing, and releasing Chef cookbooks and policies can be difficult and challenging. I will walk you through how I minimize the work and maximize the efficiency of development and deployment. …

A comprehensive explanation of solving “find the longest xyz” coding problems, with examples


When we practice solving coding problems, we often face ones that ask us to find the longest *** for a given input and condition. There are well-known techniques for solving them, and after solving quite a few of them, I’d like to share what I have learned.

General Tips and Techniques

Generally, we can use the sliding window, two-pointer, and dynamic programming techniques to solve the problems.

The result can be calculated based on two cases.

  1. If it is the length of consecutive elements (substring)
  2. If it is the length of non-consecutive elements, which may also be consecutive (subsequence)
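The two cases can be sketched with one example each. The specific problems here, the longest substring without repeating characters and the longest strictly increasing subsequence, are illustrative picks rather than ones named in the text. The first uses a sliding window over consecutive elements, and the second uses dynamic programming over non-consecutive elements.

```python
def longest_unique_substring(s: str) -> int:
    """Sliding window: length of the longest substring with no repeated character."""
    last_seen: dict[str, int] = {}
    window_start = 0
    best = 0
    for i, ch in enumerate(s):
        # If ch already occurs inside the window, move the start just past it.
        if ch in last_seen and last_seen[ch] >= window_start:
            window_start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - window_start + 1)
    return best


def longest_increasing_subsequence(nums: list[int]) -> int:
    """Dynamic programming, O(n^2): dp[i] is the best length ending at index i."""
    if not nums:
        return 0
    dp = [1] * len(nums)
    for i in range(len(nums)):
        for j in range(i):
            if nums[j] < nums[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)
```

For a substring, the window can be maintained in a single pass because the elements must stay adjacent. For a subsequence, each position has to consider every earlier position it could extend.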

1. If it is the length of…

A Comprehensive Explanation With Bit Manipulation Examples


1. Bitwise Operators

1.1 Basic Operations

  1. AND ( & ): If both bits in the compared position of the bit patterns are 1, the bit in the resulting bit pattern is 1, otherwise 0. e.g. 5 (101) & 3 (011) = 1 (001).
  2. OR ( | ): If both bits in the compared position of the bit patterns are 0, the bit in the resulting bit pattern is 0, otherwise 1. e.g. 5 (101) | 3 (011) = 7 (111).
  3. XOR ( ^ ): If both bits in the compared position of the bit patterns are 0 or 1, the bit in the resulting bit pattern…
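The operators can be checked directly in Python, which uses the same `&`, `|`, and `^` symbols. This short sketch reuses the operands 5 and 3 from the examples above. The helper `bits` is a hypothetical name added only to print fixed-width binary strings.

```python
def bits(value: int, width: int = 3) -> str:
    """Format a non-negative integer as a zero-padded binary string."""
    return format(value, f"0{width}b")


a, b = 5, 3
and_result = a & b  # a bit is 1 only where both inputs have 1
or_result = a | b   # a bit is 1 where at least one input has 1
xor_result = a ^ b  # a bit is 1 where the two inputs differ

for symbol, result in (("&", and_result), ("|", or_result), ("^", xor_result)):
    print(f"{a} ({bits(a)}) {symbol} {b} ({bits(b)}) = {result} ({bits(result)})")
```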

Concepts and considerations for some topics in System Design

Table of Contents

  1. Concurrency
  2. Networking
  3. Abstraction
  4. Real-world performance and estimation
  5. Map-reduce
  6. Hadoop and Spark

1. Concurrency

1.1 Threads and processes

  • A thread is a basic unit of CPU utilization. It is also referred to as a “lightweight process”.
  • A process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system, a process may be made up of multiple threads of execution that execute instructions concurrently.
  • A process does not need the CPU while it waits for an I/O request to complete, so the CPU can run other threads or processes in the meantime.
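The point about I/O can be seen with a small sketch in which `time.sleep` stands in for a blocking I/O request. This is an assumption made for illustration: `fake_io_request`, `run_in_threads`, and the 0.2 second wait are hypothetical. Because a waiting thread does not occupy the CPU, three waits running in separate threads overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_request(task_id: int, wait_seconds: float = 0.2) -> int:
    """Stand-in for a blocking I/O call; a sleeping thread does not use the CPU."""
    time.sleep(wait_seconds)
    return task_id


def run_in_threads(task_ids: list[int]) -> tuple[list[int], float]:
    """Run one fake I/O request per thread and report the total wall-clock time."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(task_ids)) as pool:
        results = list(pool.map(fake_io_request, task_ids))
    return results, time.perf_counter() - started


results, elapsed = run_in_threads([1, 2, 3])
print(f"results={results}, elapsed={elapsed:.2f}s")
```

Run sequentially, the three requests would take about 0.6 seconds. With one thread each, the total stays close to a single wait.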

1.2 Concurrency and parallelism

  • Concurrency means making progress on multiple tasks during the same period of time, but not necessarily at the same instant. In a single-core environment…

Larry | Peng Yang

Software Engineer in Tokyo. Aim to understand computer science very well. LinkedIn:
