Data Storage Fundamentals

🎯 Difficulty Level: Easy
⏱️ Reading Time: 15 minutes
👤 Author: Rob Vet
📅 Last updated on: December 21, 2025


Relational vs. NoSQL vs. Raw Storage

1. Relational Databases (SQL)

A relational database stores data in tables (rows and columns). Each table represents a real-world entity—like Customers, Products, Employees, or Orders—and tables connect to each other using keys and relationships. SQL is the language used to query and join tables.

Understanding Tables and Schemas

Table Schemas

Figure 1 — Relational Tables and Keys

In Figure 1 above, each box represents a table schema: a blueprint describing what type of data the table holds. The columns within each table define the fields, or attributes (firstName, price, orderDate, etc.), and their associated data types (String, Money, Date, etc.). Because every table follows this structure, relational databases stay organized and predictable.

The highlighted ID fields (customerID, productID, employeeID, and orderID) are Primary Keys: each one holds a value that uniquely identifies a row in its table. These keys act like a serial number for each record.

Notice how the Orders table repeats the ID values from the other tables: customerID, employeeID, and productID. These repeated fields are called Foreign Keys, meaning they “point back” to the row in the parent table that holds the related record. This is how the database knows which customer placed the order, which employee handled it, and which product was purchased.

Foreign Keys represent relationships between tables.

Tables remain independent, but foreign keys connect them.

Foreign Key relationships prevent duplication: customer names, product details, and employee information are stored once in their parent table and referenced wherever needed, rather than being copied into child tables.
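The schema idea in Figure 1 can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema of any particular product: the table and key names follow the figure, while the column types, the price, and the dates are assumptions made for the example.

```python
import sqlite3

# In-memory database; table and key names follow Figure 1, column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.executescript("""
CREATE TABLE Customers (customerID INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE Employees (employeeID INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE Products  (productID  INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE Orders (
    orderID    INTEGER PRIMARY KEY,
    customerID INTEGER NOT NULL REFERENCES Customers(customerID),
    employeeID INTEGER NOT NULL REFERENCES Employees(employeeID),
    productID  INTEGER NOT NULL REFERENCES Products(productID),
    orderDate  TEXT
);
""")

# Parent rows are stored once; the price and dates are illustrative values.
conn.execute("INSERT INTO Customers VALUES (1, 'Alice', 'Johnson')")
conn.execute("INSERT INTO Employees VALUES (1, 'David', 'Brown')")
conn.execute("INSERT INTO Products VALUES (1, 'Laptop', 1200.0)")

# The order row carries only the IDs of its parent rows.
conn.execute("INSERT INTO Orders VALUES (1, 1, 1, 1, '2025-01-15')")


def try_insert_orphan_order(connection):
    """Attempt to insert an order for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO Orders VALUES (2, 99, 1, 1, '2025-01-16')")
        return True
    except sqlite3.IntegrityError:
        # The foreign key constraint rejects the row because customer 99 is missing.
        return False


orphan_accepted = try_insert_orphan_order(conn)
order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
```

With foreign keys enabled, the database itself refuses an order that points at a nonexistent customer, which is the integrity guarantee the keys provide.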

Now that we understand database tables and how keys link the entities together, consider what happens once the tables are populated with real data, which makes the relationships visible at the row level.

Figure 2 below depicts tables with related data.

Populated Tables

Figure 2 — Related Data

Relational Data in Action

In Figure 2, above, each table contains sample rows with data, making the relationships much easier to understand:

Customers: Alice Johnson (customerID = 1) ties to two orders in the Orders table.
Employees: David Brown (employeeID = 1) fulfilled one of the orders.
Products: The Laptop (productID = 1) appears in an order for customer 1.

The Orders table acts as the hub linking customers, employees, and products together. Each order row pulls information from the other tables using its foreign key values.

This is the foundation of how tables join related data through shared keys to answer complex queries, such as:

What did each customer buy?
Which employees handled the most orders?
What products generated the most revenue?
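The three questions above can be sketched as SQL joins, again using Python's sqlite3 module. The rows for Alice Johnson, David Brown, and the Laptop come from Figure 2; the second employee, the second product, and all prices are hypothetical values added so the queries have something to aggregate. Each order is assumed to cover one unit of one product.

```python
import sqlite3


def build_demo_db():
    """Create the four tables with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Customers (customerID INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE Employees (employeeID INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE Products  (productID  INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE Orders (
        orderID    INTEGER PRIMARY KEY,
        customerID INTEGER REFERENCES Customers(customerID),
        employeeID INTEGER REFERENCES Employees(employeeID),
        productID  INTEGER REFERENCES Products(productID)
    );
    INSERT INTO Customers VALUES (1, 'Alice', 'Johnson');
    INSERT INTO Employees VALUES (1, 'David', 'Brown'), (2, 'Emma', 'Davis');
    INSERT INTO Products  VALUES (1, 'Laptop', 1200.0), (2, 'Mouse', 25.0);
    INSERT INTO Orders    VALUES (1, 1, 1, 1), (2, 1, 2, 2);
    """)
    return conn


conn = build_demo_db()

# What did each customer buy?
purchases = conn.execute("""
    SELECT c.firstName, p.name
    FROM Orders AS o
    JOIN Customers AS c ON c.customerID = o.customerID
    JOIN Products  AS p ON p.productID  = o.productID
    ORDER BY o.orderID
""").fetchall()

# Which employees handled the most orders?
orders_per_employee = conn.execute("""
    SELECT e.firstName, COUNT(*) AS orderCount
    FROM Orders AS o
    JOIN Employees AS e ON e.employeeID = o.employeeID
    GROUP BY e.employeeID
    ORDER BY orderCount DESC, e.firstName
""").fetchall()

# What products generated the most revenue? (one unit per order is assumed)
revenue_per_product = conn.execute("""
    SELECT p.name, SUM(p.price) AS revenue
    FROM Orders AS o
    JOIN Products AS p ON p.productID = o.productID
    GROUP BY p.productID
    ORDER BY revenue DESC
""").fetchall()
```

Each query starts from the Orders hub and follows a foreign key to the parent table that holds the descriptive data.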

Benefits

Strong consistency (ACID transactions)
Structured, predictable schema
Powerful SQL querying
Great for transactional systems (orders, payments, inventory)
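To make the ACID point concrete, here is a small sketch of an atomic order placement using sqlite3. The Inventory table, its stock column, and the place_order helper are hypothetical names introduced for this example; the idea is only that recording an order and decrementing stock either both succeed or both fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (
    productID INTEGER PRIMARY KEY,
    stock     INTEGER NOT NULL CHECK (stock >= 0)  -- stock may never go negative
);
CREATE TABLE Orders (
    orderID   INTEGER PRIMARY KEY,
    productID INTEGER NOT NULL,
    quantity  INTEGER NOT NULL
);
INSERT INTO Inventory VALUES (1, 3);
""")


def place_order(connection, order_id, product_id, quantity):
    """Record the order and decrement stock as one atomic unit."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "INSERT INTO Orders VALUES (?, ?, ?)",
                (order_id, product_id, quantity),
            )
            connection.execute(
                "UPDATE Inventory SET stock = stock - ? WHERE productID = ?",
                (quantity, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the order insert was rolled back too.
        return False


first_ok = place_order(conn, 1, 1, 2)   # enough stock: both statements commit
second_ok = place_order(conn, 2, 1, 5)  # not enough stock: nothing is kept

stock_left = conn.execute("SELECT stock FROM Inventory WHERE productID = 1").fetchone()[0]
orders_kept = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
```

The failed attempt leaves no half-written order behind, which is why this model suits orders, payments, and inventory.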

Drawbacks

Schema is rigid; changes require migrations
Not ideal for rapidly changing data structures
Scaling horizontally (across many database instances) can be challenging

Use When

Your data is structured and stable
Relationships matter (Orders ↔ Customers ↔ Products)
You need correctness and integrity

Avoid When

Data is unstructured or constantly changing
You need extreme horizontal scale or flexible schemas

Product Examples Across Clouds

Cloud	Relational Service
Azure	Azure SQL Database, Azure PostgreSQL, Azure MySQL
AWS	RDS (SQL Server, Postgres, MySQL), Aurora
GCP	Cloud SQL, AlloyDB

2. NoSQL Databases

Understanding How NoSQL Databases Work

“NoSQL” = Not Only SQL. These databases relax relational rules to gain flexibility, speed, or scale.

NoSQL databases store data in more flexible ways than traditional relational tables. Instead of rigid rows and columns, they structure data based on how it will be used: as documents, key–value pairs, wide-column tables, or graph relationships.

The goal is flexibility and scale: These databases enable you to store data without a fixed schema and grow horizontally across many servers.

Relational databases spread information across multiple tables, but NoSQL systems often keep related data together. For example, a document store might hold a customer, their orders, and product details in one JSON document. This removes the need for joins, speeds up reads, and aligns well with modern application data.

NoSQL Database Types

NoSQL databases fall into a few major types, each optimized for different data shapes and use cases.

NoSQL Overview

Main NoSQL Types

1. Document Stores (e.g., MongoDB, Cosmos DB)

Document databases store data as JSON-like documents. Each document represents a real-world object and can contain nested objects and arrays.
This makes them ideal when data naturally forms a hierarchy—like a customer with their orders embedded inside.

Why they work well

The structure mirrors application objects (great for developers)
Documents can vary in shape—no required schema
Fetching one document often gives you all related data at once
Excellent for quickly evolving data models

Common uses

User profiles
Product catalogs
Content management
Event and telemetry data
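The "documents can vary in shape" idea can be illustrated with plain Python dictionaries standing in for a document collection. This is an in-memory sketch, not the API of MongoDB or Cosmos DB; the profiles collection, the find and get_path helpers, and the sample fields are all hypothetical.

```python
# Two documents in the same hypothetical "profiles" collection with different shapes.
profiles = [
    {"_id": 1, "name": "Alice Johnson", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob Smith", "address": {"city": "Denver"}, "tags": ["vip"]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def get_path(doc, path, default=None):
    """Read a nested field such as 'address.city', tolerating missing keys."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# A new field appears on one document; no table alteration step is involved.
profiles[0]["loyaltyTier"] = "gold"

cities = [get_path(doc, "address.city", "unknown") for doc in profiles]
gold_members = find(profiles, loyaltyTier="gold")
```

The flexibility has a cost: because nothing guarantees a field exists, the reading code has to handle missing keys itself, as get_path does here.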

2. Key–Value Stores (e.g., Redis, DynamoDB in KV mode)

A key–value store is the simplest NoSQL model:
a unique key → a blob of data.

The database doesn’t care what the value contains—JSON, a string, a number, or even a binary object. The system is optimized for extremely fast lookups by key.

Why they work well

Blazing-fast reads and writes
Perfect for caching
Ideal for “grab by ID” scenarios

Common uses

User session storage
Caching API responses
Shopping cart states
Leaderboards and real-time metrics
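The key-to-value model with expiry, as used for sessions and caches, can be sketched in a few lines of pure Python. The TTLCache class below is a hypothetical in-memory stand-in, not the Redis or DynamoDB API; it only shows the access pattern of setting and getting opaque values by key.

```python
import time


class TTLCache:
    """A tiny in-memory key-value store whose entries expire after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry timestamp)
        self._clock = clock  # injectable so expiry can be exercised without sleeping

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return default
        return value


cache = TTLCache()
# The store does not inspect the value; any object is accepted.
cache.set("session:abc123", {"user": "alice"}, ttl_seconds=60)
session = cache.get("session:abc123")
missing = cache.get("session:unknown", default="no session")
```

Every operation is a single lookup by key, which is what makes this model fast and also why it offers no way to query by the contents of the value.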

3. Column-Family Databases (e.g., Cassandra, HBase)

Column-family databases organize data into wide tables where each row can contain different columns. Unlike relational tables, these structures allow massive horizontal scalability and extremely fast writes.

Think of them as “big analytics tables” designed for distributed workloads.

Why they work well

Handles billions of rows across many servers
Flexible columns—each row can contain only the columns it needs
Tuned for high-throughput analytics and time-series workloads

Common uses

Event logging
IoT time-series data
Recommendation engines
Large-scale analytics where relational modeling struggles
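A rough mental model of a column-family table is a map from a partition key to rows ordered by a clustering key, where each row holds only the columns it needs. The sketch below follows that model in pure Python. The WideColumnTable class and the sensor data are hypothetical, and real engines such as Cassandra or HBase add distribution, persistence, and compaction that are not shown.

```python
from collections import defaultdict


class WideColumnTable:
    """Partition key -> {clustering key -> sparse columns}, held in memory."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def write(self, partition_key, clustering_key, **columns):
        # Writes are upserts: new columns are merged into the existing row.
        row = self._partitions[partition_key].setdefault(clustering_key, {})
        row.update(columns)

    def read_range(self, partition_key, start, end):
        """Return rows of one partition whose clustering key lies in [start, end]."""
        rows = self._partitions.get(partition_key, {})
        return [(key, rows[key]) for key in sorted(rows) if start <= key <= end]


readings = WideColumnTable()
# Each row stores only the columns that were actually measured.
readings.write("sensor-1", "2025-01-01T00:00", temperature=21.5)
readings.write("sensor-1", "2025-01-01T00:05", temperature=21.7, humidity=40)
readings.write("sensor-1", "2025-01-01T00:10", humidity=41)
readings.write("sensor-2", "2025-01-01T00:00", temperature=18.0)

first_five_minutes = readings.read_range(
    "sensor-1", "2025-01-01T00:00", "2025-01-01T00:05"
)
```

Queries are efficient when they name a partition and a key range, which is why time-series and event data fit this layout well.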

4. Graph Databases (e.g., Neo4j, Cosmos DB Gremlin)

Graph databases represent data as nodes (entities) and edges (relationships).
This mirrors how relationships work in the real world:

A person knows another person
A user follows an artist
A product is related to another product

Instead of using joins, graph databases store these relationships directly.

Why they work well

Designed for deeply connected data
Querying relationships is extremely fast
Perfect for pattern discovery across networks
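Storing relationships directly can be sketched as an adjacency structure with a breadth-first traversal. The Graph class, the KNOWS relationship label, and the people below are hypothetical; this is not the Neo4j or Gremlin API, only an illustration of following edges instead of joining tables.

```python
from collections import defaultdict, deque


class Graph:
    """Nodes connected by labeled, directed edges."""

    def __init__(self):
        self._edges = defaultdict(set)  # node -> {(relationship, neighbor)}

    def add_edge(self, source, relationship, target):
        self._edges[source].add((relationship, target))

    def neighbors(self, node, relationship):
        return {t for r, t in self._edges.get(node, ()) if r == relationship}

    def within_hops(self, start, relationship, max_hops):
        """Breadth-first search: every node reachable in at most max_hops edges."""
        seen = {start}
        found = set()
        frontier = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for neighbor in self.neighbors(node, relationship):
                if neighbor not in seen:
                    seen.add(neighbor)
                    found.add(neighbor)
                    frontier.append((neighbor, depth + 1))
        return found


social = Graph()
social.add_edge("alice", "KNOWS", "bob")
social.add_edge("bob", "KNOWS", "carol")
social.add_edge("carol", "KNOWS", "dave")

friends_of_friends = social.within_hops("alice", "KNOWS", max_hops=2)
```

Each hop is a direct lookup of a node's edges, so the cost grows with the neighborhood explored rather than with the total size of the data set.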

NoSQL Data in Action

In the diagram below, the same customer, orders, employees, and products from the relational example are now stored inside a single NoSQL document. Instead of spreading the information across four separate tables and connecting them with primary and foreign keys, a document database keeps everything related to one customer together in a nested JSON structure.

NoSQL Overview

Note how the NoSQL document store is fundamentally different from what we just saw in the relational database. In the relational model, customer details, orders, employees, and products each live in their own tables. To answer even a simple question, like “What did Alice buy?”, the database has to JOIN multiple tables together. That’s powerful, but it can become slow or complex as data grows or schemas evolve.

In a NoSQL document store, all of Alice’s information is already in one place. Her orders, the employee who handled each one, the product information, and the totals are embedded directly inside her customer record. There’s no need for joins because the related data travels together.
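One way to see the difference is to fold the relational rows into a single nested document. The sketch below does this in plain Python. Alice, David Brown, and the Laptop come from the earlier example; the second employee, the second product, the prices, and the build_customer_document helper are hypothetical, and each order is assumed to cover one unit.

```python
import json

# Relational-style data: one lookup table per entity, linked by IDs.
customers = {1: {"firstName": "Alice", "lastName": "Johnson"}}
employees = {
    1: {"firstName": "David", "lastName": "Brown"},
    2: {"firstName": "Emma", "lastName": "Davis"},
}
products = {
    1: {"name": "Laptop", "price": 1200.0},
    2: {"name": "Mouse", "price": 25.0},
}
orders = [
    {"orderID": 1, "customerID": 1, "employeeID": 1, "productID": 1},
    {"orderID": 2, "customerID": 1, "employeeID": 2, "productID": 2},
]


def build_customer_document(customer_id):
    """Embed a customer's orders, employees, and products in one document."""
    document = {"customerID": customer_id, **customers[customer_id], "orders": []}
    for order in orders:
        if order["customerID"] != customer_id:
            continue
        product = products[order["productID"]]
        document["orders"].append({
            "orderID": order["orderID"],
            "employee": dict(employees[order["employeeID"]]),  # copied, not referenced
            "product": dict(product),                          # copied, not referenced
            "total": product["price"],                         # one unit per order assumed
        })
    return document


alice = build_customer_document(1)

# "What did Alice buy?" is now a walk through one document, with no join.
bought = [order["product"]["name"] for order in alice["orders"]]
as_json = json.dumps(alice, indent=2)
```

The copies are the trade-off: the read needs no join, but a change to a product's details has to be applied in every document that embeds it.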

Why this can be advantageous

Fewer joins, faster reads: Applications can retrieve everything about a customer with a single query. This is ideal for user profiles, shopping carts, product catalogs, and many app-driven scenarios.
The data shape matches the real world: Customer → Orders → Product → Employee is naturally hierarchical, and document databases store it the same way.
Flexible structure: Documents don’t require rigid schemas. If new fields appear in future orders, they can be added without altering a predefined table structure.
Optimized for modern applications: Document stores shine when data is frequently read as whole objects rather than across normalized tables.

When this style works best

Document-based NoSQL is especially useful when applications routinely need to fetch entire objects—such as a customer profile or an order history—as a single unit. It trades the strict consistency and normalization of relational databases for speed, flexibility, and developer-friendly design, making it a great fit for high-scale, rapidly evolving systems.

Benefits

Flexible schemas
Very high scalability
Optimized for specific workloads

Drawbacks

Weaker consistency depending on engine
No unified query model
Joins can be complex or unsupported

Use When

Data shape changes frequently
Scale matters more than strict consistency
Specialized modeling needed (graph, key-value, time-series)

Avoid When

You need strong correctness
You require complex joins across entities

Product Examples Across Clouds

Cloud	NoSQL Service	Type
Azure	Cosmos DB	Document, Key-Value, Graph
AWS	DynamoDB	Key-Value / Document
GCP	Firestore / Bigtable	Document / Column-Family
Multi-cloud	MongoDB Atlas	Document

3. Raw Storage (Object Storage / Data Lakes)

Raw Storage

What It Is

Blob storage is essentially a massive, effectively unlimited file system in the cloud. Instead of storing records in tables or documents in a database, blob storage stores files as binary objects, each identified by a URL. These “blobs” can be anything: images, PDFs, videos, logs, backups, ZIP files, machine learning inputs, JSON, CSV, Parquet, or application data. Cloud providers optimize blob storage for durability and cost, which typically makes it the cheapest and most universal storage layer. It behaves like a giant content repository where applications can upload, download, or stream files on demand.

Think of Blob storage as the foundation for a Data Lake: a collection of files in many formats that analytics engines or AI systems often access. Figure xx depicts the relationship between Blob storage and a Data Lake.

Blob and Data Lakes

Data Lake Storage adds organization, hierarchy, and analytics-friendly features. A data lake isn’t just a place to store files; it’s designed to handle extremely large volumes of raw, semi-structured, and structured data in open formats like Parquet, CSV, or JSON. Unlike blob storage, which is more like a dumping ground for files, data lake storage adds directory structures, security boundaries, and compatibility with Spark, Fabric, Databricks, Hive, Synapse, BigQuery, and other analytical engines. In a data lake, the raw files become the “source of truth” for analytics and AI, feeding pipelines that clean, transform, and refine the data into usable datasets.
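The directory structure a data lake adds can be sketched on a local file system. The example below writes JSON Lines files into date-partitioned folders and reads back a single partition. The raw/orders/date=... layout, the file names, and both helper functions are assumptions for illustration; a real lake would live in a service such as ADLS Gen2 or S3 and typically use a format like Parquet.

```python
import json
import tempfile
from pathlib import Path


def write_records(lake_root, dataset, event_date, filename, records):
    """Write records as JSON Lines under raw/<dataset>/date=<event_date>/."""
    partition = Path(lake_root) / "raw" / dataset / f"date={event_date}"
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / filename
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path


def read_partition(lake_root, dataset, event_date):
    """Read every JSON Lines file in one date partition."""
    partition = Path(lake_root) / "raw" / dataset / f"date={event_date}"
    records = []
    for path in sorted(partition.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            records.extend(json.loads(line) for line in handle if line.strip())
    return records


# A temporary directory stands in for the lake's root container.
with tempfile.TemporaryDirectory() as lake_root:
    first_file = write_records(
        lake_root, "orders", "2025-01-01", "part-0000.jsonl",
        [{"orderID": 1, "product": "Laptop"}],
    )
    write_records(
        lake_root, "orders", "2025-01-02", "part-0000.jsonl",
        [{"orderID": 2, "product": "Mouse"}],
    )
    relative_path = first_file.relative_to(lake_root).as_posix()
    day_one = read_partition(lake_root, "orders", "2025-01-01")
```

Because the date is part of the path, a reader that needs one day of data opens only that folder instead of scanning every file.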

Benefits

Lowest-cost general-purpose storage
Effectively unlimited horizontal scale
Can store unstructured, semi-structured, and structured data
Ideal for AI/ML pipelines that use images, PDFs, or logs

Drawbacks

Not directly queryable without a compute layer
No transactional guarantees
Requires governance & organization strategy

Use When

You need to store files of any kind
You’re building a data lake for analytics or AI
You want long-term, cost-efficient data retention

Avoid When

You need real-time record updates
You require row-level consistency
You need relational constraints or transactional operations

Product Examples Across Clouds

Cloud	Object Store / Data Lake
Azure	ADLS Gen2, Blob Storage
AWS	S3
GCP	Google Cloud Storage

4. Visual Summary Comparison

Storage Comparison Chart

Table Summary

Feature	Relational	NoSQL	Raw Storage
Schema	Fixed	Flexible	None
Scale	Moderate	Massive	Massive
Query	SQL	APIs / Custom	Requires compute
Best For	Transactions	Web scale apps	AI/Analytics
Examples	SQL Server, Postgres	Cosmos, Dynamo	ADLS, S3

5. When to Use Each (Simple Rule of Thumb)

If the data has structure → SQL

Example: Orders, payments, users, inventory.

If scale or flexibility matters → NoSQL

Example: Profile stores, IoT, app telemetry.

If data is unstructured or for ML/AI → Raw Storage

Example: Images, logs, PDFs, clickstreams, training sets.

6. End-to-End Architecture: How They Fit Together

Modern Data Architecture Pipeline

A complete modern data system typically uses all three storage patterns, but in different layers:

Operational Systems → Relational Databases (SQL)
Order systems, transactions, customer data
Strong consistency, queries, constraints
Application Telemetry, Events, IoT → NoSQL
Flexible, scalable ingest
JSON, key-value, graph, or wide-column models
Long-term Storage, Analytics, AI → Data Lake (Object Storage)
Raw files stored cheaply (images, logs, PDFs, JSON, Parquet)

Medallion Architecture (Where Processing Happens)

Object storage (your data lake) becomes the foundation for a Medallion pipeline:

Bronze — Raw files exactly as ingested

Silver — Cleaned, structured, validated datasets

Gold — Business-ready analytics tables / features for ML

This is where Spark engines, Fabric, Databricks, Synapse, or BigQuery process and refine the data stored in the lake.
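The three Medallion layers can be sketched as two small transformations in plain Python. The sample records, the field names, and the to_silver and to_gold functions are hypothetical; in practice this work would run on an engine such as Spark over files in the lake, but the shape of each step is the same.

```python
import json
from collections import defaultdict

# Bronze: raw lines exactly as ingested, including duplicates and bad records.
bronze = [
    '{"orderID": 1, "product": "Laptop", "amount": "1200.00"}',
    '{"orderID": 2, "product": "mouse ", "amount": "25.00"}',
    '{"orderID": 2, "product": "mouse ", "amount": "25.00"}',  # duplicate delivery
    '{"orderID": 3, "product": "Laptop", "amount": null}',     # missing amount
    'not json',                                                # corrupt line
]


def to_silver(raw_lines):
    """Parse, validate, deduplicate, and normalize the raw records."""
    seen_order_ids = set()
    cleaned = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop lines that cannot be parsed
        if not isinstance(record, dict) or record.get("amount") is None:
            continue  # drop records without a usable amount
        if record.get("orderID") in seen_order_ids:
            continue  # drop repeated deliveries of the same order
        seen_order_ids.add(record.get("orderID"))
        cleaned.append({
            "orderID": record["orderID"],
            "product": record["product"].strip().title(),
            "amount": float(record["amount"]),
        })
    return cleaned


def to_gold(silver_records):
    """Aggregate cleaned orders into revenue per product."""
    revenue = defaultdict(float)
    for record in silver_records:
        revenue[record["product"]] += record["amount"]
    return dict(revenue)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the Bronze data untouched means the Silver and Gold layers can be rebuilt at any time if the cleaning rules change.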