Data Storage Fundamentals


🎯 Difficulty Level: Easy
⏱️ Reading Time: 15 minutes
👤 Author: Rob Vet
📅 Last updated on: November 28, 2025

Relational vs. NoSQL vs. Raw Storage


1. Relational Databases (SQL)

A relational database stores data in tables (rows and columns). Each table represents a real-world entity—like Customers, Products, Employees, or Orders—and tables connect to each other using keys and relationships. SQL is the language used to query and join tables.


Understanding Tables and Schemas

Table Schemas

Figure 1 — Relational Tables and Keys

In Figure 1 above, each box represents a table schema: a blueprint describing what type of data the table holds. The columns within each table define the fields, or attributes (firstName, price, orderDate, etc.), and their associated data types (String, Money, Date, etc.). Because every table follows this structure, relational databases stay organized and predictable.

The highlighted ID fields (customerID, productID, employeeID, and orderID) are Primary Keys: each one holds a value that uniquely identifies a row in its table. These keys act like a serial number for each record.

Notice how the Orders table repeats the ID values from the other tables: customerID, employeeID, and productID. These repeated fields are called Foreign Keys, meaning they “point back” to the row in the parent table that holds the related record. This is how the database knows which customer placed the order, which employee handled it, and which product was purchased.

Foreign Keys represent relationships between tables.

Tables remain independent, but foreign keys connect them.

Foreign Key relationships prevent duplication: customer names, product details, and employee information are stored once in their parent table and referenced wherever needed, rather than being copied into child tables.
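As a minimal sketch of the keys described above, the following uses Python's built-in sqlite3 module. The table and key names follow Figure 1; the SQLite column types, the sample values, and the order dates are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Customers (customerID INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE Employees (employeeID INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE Products  (productID  INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE Orders (
    orderID    INTEGER PRIMARY KEY,
    customerID INTEGER NOT NULL REFERENCES Customers(customerID),
    employeeID INTEGER NOT NULL REFERENCES Employees(employeeID),
    productID  INTEGER NOT NULL REFERENCES Products(productID),
    orderDate  TEXT NOT NULL
);
INSERT INTO Customers VALUES (1, 'Alice', 'Johnson');
INSERT INTO Employees VALUES (1, 'David', 'Brown');
INSERT INTO Products  VALUES (1, 'Laptop', 1200.0);  -- price is an assumed value
""")

# Valid: every foreign key points at an existing parent row.
conn.execute("INSERT INTO Orders VALUES (1, 1, 1, 1, '2025-01-15')")

# Invalid: customerID 999 has no parent row, so the database rejects the order.
try:
    conn.execute("INSERT INTO Orders VALUES (2, 999, 1, 1, '2025-01-16')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(orphan_rejected, order_count)
```

The Orders row stores only the three ID values; the customer's name and the product's price stay in their parent tables.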

Now that we understand database tables and how keys link the entities together, let’s look at what happens once the tables are populated with real data, which makes the relationships visible at the row level.

Figure 2 below depicts tables with related data.

Populated Tables

Figure 2 — Related Data

Relational Data in Action

In Figure 2, above, each table contains sample rows with data, making the relationships much easier to understand:

  • Customers: Alice Johnson (customerID = 1) ties to two orders in the Orders table.
  • Employees: David Brown (employeeID = 1) fulfilled one of the orders.
  • Products: The Laptop (productID = 1) appears in an order for customer 1.

The Orders table acts as the hub linking customers, employees, and products together. Each order row pulls information from the other tables using its foreign key values.

This is the foundation of how tables join related data through shared keys to answer complex queries, such as:

  • What did each customer buy?
  • Which employees handled the most orders?
  • What products generated the most revenue?
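The three questions above can be sketched as SQL joins, again using Python's sqlite3 module. Alice Johnson, David Brown, and the Laptop come from Figure 2; the second employee, the second product, and all prices are hypothetical values added so the queries have something to aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customerID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Employees (employeeID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Products  (productID  INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE Orders (
    orderID    INTEGER PRIMARY KEY,
    customerID INTEGER REFERENCES Customers(customerID),
    employeeID INTEGER REFERENCES Employees(employeeID),
    productID  INTEGER REFERENCES Products(productID)
);
INSERT INTO Customers VALUES (1, 'Alice Johnson');
INSERT INTO Employees VALUES (1, 'David Brown'), (2, 'Erin Smith');      -- Erin is hypothetical
INSERT INTO Products  VALUES (1, 'Laptop', 1200.0), (2, 'Mouse', 25.0);  -- Mouse and prices are hypothetical
INSERT INTO Orders    VALUES (1, 1, 1, 1), (2, 1, 2, 2);
""")

# What did each customer buy?
purchases = conn.execute("""
    SELECT c.name, p.name
    FROM Orders AS o
    JOIN Customers AS c ON c.customerID = o.customerID
    JOIN Products  AS p ON p.productID  = o.productID
    ORDER BY o.orderID
""").fetchall()

# Which employees handled the most orders?
orders_per_employee = conn.execute("""
    SELECT e.name, COUNT(*) AS handled
    FROM Orders AS o
    JOIN Employees AS e ON e.employeeID = o.employeeID
    GROUP BY e.employeeID
    ORDER BY handled DESC, e.name
""").fetchall()

# What products generated the most revenue?
revenue = conn.execute("""
    SELECT p.name, SUM(p.price) AS total
    FROM Orders AS o
    JOIN Products AS p ON p.productID = o.productID
    GROUP BY p.productID
    ORDER BY total DESC
""").fetchall()

print(purchases, orders_per_employee, revenue)
```

Each query starts from Orders, the hub table, and follows one or two foreign keys out to the parent tables.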

Benefits

  • Strong consistency (ACID transactions)
  • Structured, predictable schema
  • Powerful SQL querying
  • Great for transactional systems (orders, payments, inventory)
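To make the "ACID transactions" benefit concrete, here is a small sketch using sqlite3. The Inventory table, the CHECK constraint, and the place_order helper are hypothetical and not part of the earlier figures; the point is only that a failed step undoes the whole unit of work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (
    productID INTEGER PRIMARY KEY,
    stock     INTEGER NOT NULL CHECK (stock >= 0)  -- stock may never go negative
);
CREATE TABLE Orders (orderID INTEGER PRIMARY KEY, productID INTEGER, quantity INTEGER);
INSERT INTO Inventory VALUES (1, 1);  -- one Laptop in stock
""")

def place_order(order_id: int, product_id: int, quantity: int) -> bool:
    """Record the order and decrement stock as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO Orders VALUES (?, ?, ?)",
                         (order_id, product_id, quantity))
            conn.execute("UPDATE Inventory SET stock = stock - ? WHERE productID = ?",
                         (quantity, product_id))
        return True
    except sqlite3.IntegrityError:
        return False

first_ok = place_order(1, 1, 1)   # stock goes from 1 to 0
second_ok = place_order(2, 1, 1)  # would make stock -1, so the CHECK fails

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
stock = conn.execute("SELECT stock FROM Inventory WHERE productID = 1").fetchone()[0]
print(first_ok, second_ok, order_count, stock)
```

The second order's INSERT succeeds on its own, but because the stock update fails in the same transaction, the order row is rolled back as well.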

Drawbacks

  • Schema is rigid; changes require migrations
  • Not ideal for rapidly changing data structures
  • Scaling horizontally (across many database instances) can be challenging
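The first drawback can be illustrated with a short sketch, assuming a hypothetical loyaltyTier attribute that the application wants to start storing. In SQLite (used here through Python's sqlite3 module) the write is rejected until a migration adds the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (customerID INTEGER PRIMARY KEY, firstName TEXT)")

insert_sql = ("INSERT INTO Customers (customerID, firstName, loyaltyTier) "
              "VALUES (1, 'Alice', 'gold')")

# Before the migration: the schema has no loyaltyTier column, so the write fails.
try:
    conn.execute(insert_sql)
    accepted_before = True
except sqlite3.OperationalError:
    accepted_before = False

# The migration: an explicit schema change that has to be rolled out first.
conn.execute("ALTER TABLE Customers ADD COLUMN loyaltyTier TEXT")

# After the migration the same write is accepted.
conn.execute(insert_sql)
row = conn.execute("SELECT firstName, loyaltyTier FROM Customers").fetchone()
print(accepted_before, row)
```

Adding a nullable column is about the simplest migration there is; renaming or splitting columns on a large production table usually takes considerably more planning.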

Use When

  • Your data is structured and stable
  • Relationships matter (Orders ↔ Customers ↔ Products)
  • You need correctness and integrity

Avoid When

  • Data is unstructured or constantly changing
  • You need extreme horizontal scale or flexible schemas

Product Examples Across Clouds

Cloud | Relational Service
Azure | Azure SQL Database, Azure PostgreSQL, Azure MySQL
AWS | RDS (SQL Server, Postgres, MySQL), Aurora
GCP | Cloud SQL, AlloyDB

2. NoSQL Databases

Understanding How NoSQL Databases Work

“NoSQL” = Not Only SQL. These databases relax relational rules to gain flexibility, speed, or scale.

NoSQL databases store data in more flexible ways than traditional relational tables. Instead of rigid rows and columns, they structure data based on how it will be used: as documents, key–value pairs, wide-column tables, or graph relationships.

The goal is flexibility and scale: these databases enable you to store data without a fixed schema and grow horizontally across many servers.

Relational databases spread information across multiple tables, but NoSQL systems often keep related data together. For example, a document store might hold a customer, their orders, and product details in one JSON document. This removes the need for joins, speeds up reads, and aligns well with modern application data.

NoSQL Database Types

NoSQL databases fall into a few major types, each optimized for different data shapes and use cases.

NoSQL Overview

Main NoSQL Types

1. Document Stores (e.g., MongoDB, Cosmos DB)

Document databases store data as JSON-like documents. Each document represents a real-world object and can contain nested objects and arrays.
This makes them ideal when data naturally forms a hierarchy—like a customer with their orders embedded inside.

Why they work well

  • The structure mirrors application objects (great for developers)
  • Documents can vary in shape—no required schema
  • Fetching one document often gives you all related data at once
  • Excellent for quickly evolving data models

Common uses

  • User profiles
  • Product catalogs
  • Content management
  • Event and telemetry data

2. Key–Value Stores (e.g., Redis, DynamoDB in KV mode)

A key–value store is the simplest NoSQL model:
a unique key → a blob of data.

The database doesn’t care what the value contains—JSON, a string, a number, or even a binary object. The system is optimized for extremely fast lookups by key.
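The key → value idea can be sketched in a few lines of Python. The KeyValueStore class below is a toy in-memory stand-in, not the API of Redis or DynamoDB; the key names and stored values are made up for illustration.

```python
import json
from typing import Dict, Optional

class KeyValueStore:
    """Toy in-memory key-value store: values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # A single lookup by key; there is no way to query by value.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KeyValueStore()

# The store never inspects the value: JSON, a number, or raw binary all look the same.
store.put("session:42", json.dumps({"user": "alice", "cart": [1]}).encode("utf-8"))
store.put("counter:visits", b"7")
store.put("thumb:1", bytes([0x89, 0x50, 0x4E, 0x47]))  # first four bytes of a PNG file

session = json.loads(store.get("session:42"))  # the application decodes the value
visits = int(store.get("counter:visits"))
missing = store.get("session:999")
print(session, visits, missing)
```

Interpreting the bytes is entirely the application's job, which is why this model suits "grab by ID" access and is a poor fit for searching inside values.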

Why they work well

  • Blazing-fast reads and writes
  • Perfect for caching
  • Ideal for “grab by ID” scenarios

Common uses

  • User session storage
  • Caching API responses
  • Shopping cart states
  • Leaderboards and real-time metrics

3. Column-Family Databases (e.g., Cassandra, HBase)

Column-family databases organize data into wide tables where each row can contain different columns. Unlike relational tables, these structures allow massive horizontal scalability and extremely fast writes.

Think of them as “big analytics tables” designed for distributed workloads.
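A rough way to picture the wide-table model is a map from row key to a sparse set of columns. The sketch below is plain Python, not the Cassandra or HBase API, and the sensor names and readings are hypothetical; real engines add partitioning, replication, and on-disk storage on top of this idea.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# row key -> {column name -> value}; each row holds only the columns it needs
table: Dict[str, Dict[str, Any]] = defaultdict(dict)

def write(row_key: str, column: str, value: Any) -> None:
    """A write touches one cell; no other rows or columns are affected."""
    table[row_key][column] = value

# Row keys combine a sensor ID and a timestamp, a common time-series layout.
write("sensor-1#2025-01-01T00:00", "temperature", 21.5)
write("sensor-1#2025-01-01T00:00", "humidity", 40)
write("sensor-1#2025-01-01T00:01", "temperature", 21.7)
write("sensor-2#2025-01-01T00:00", "battery", 0.93)

def scan(prefix: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return rows whose key starts with the prefix, in key order."""
    return [(key, table[key]) for key in sorted(table) if key.startswith(prefix)]

sensor1_rows = scan("sensor-1#")
print(sensor1_rows)
```

Note that the second sensor-1 row has no humidity column and the sensor-2 row has a column the others lack; nothing in the model requires the rows to match.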

Why they work well

  • Handles billions of rows across many servers
  • Flexible columns—each row can contain only the columns it needs
  • Tuned for high-throughput analytics and time-series workloads

Common uses

  • Event logging
  • IoT time-series data
  • Recommendation engines
  • Large-scale analytics where relational modeling struggles

4. Graph Databases (e.g., Neo4j, Cosmos DB Gremlin)

Graph databases represent data as nodes (entities) and edges (relationships).
This mirrors how relationships work in the real world:

  • A person knows another person
  • A user follows an artist
  • A product is related to another product

Instead of using joins, graph databases store these relationships directly.
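Storing relationships directly can be sketched as an adjacency list with a breadth-first traversal. This is plain Python rather than the query language of Neo4j or Gremlin, and the people and relationship names are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, List, Tuple

# node -> list of (relationship, neighbor); the edges are stored with the node itself
edges: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

def add_edge(source: str, relationship: str, target: str) -> None:
    edges[source].append((relationship, target))

add_edge("alice", "KNOWS", "bob")
add_edge("bob", "KNOWS", "carol")
add_edge("carol", "KNOWS", "dave")
add_edge("alice", "FOLLOWS", "artist:miles")

def reachable(start: str, relationship: str, max_hops: int) -> List[Tuple[str, int]]:
    """Breadth-first walk along one relationship type, up to max_hops away."""
    seen = {start}
    queue = deque([(start, 0)])
    found: List[Tuple[str, int]] = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, neighbor in edges.get(node, []):
            if rel == relationship and neighbor not in seen:
                seen.add(neighbor)
                found.append((neighbor, hops + 1))
                queue.append((neighbor, hops + 1))
    return found

friends_of_friends = reachable("alice", "KNOWS", max_hops=2)
print(friends_of_friends)
```

Each hop is a direct lookup of a node's stored edges, so the cost depends on how much of the network is visited rather than on the total size of a joined table.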

Why they work well

  • Designed for deeply connected data
  • Querying relationships is extremely fast
  • Perfect for pattern discovery across networks

NoSQL Data in Action

In the diagram below, the same customer, orders, employees, and products from the relational example are now stored inside a single NoSQL document. Instead of spreading the information across four separate tables and connecting them with primary and foreign keys, a document database keeps everything related to one customer together in a nested JSON structure.

NoSQL Overview

Note how the NoSQL document store is fundamentally different from the relational database we just saw. In the relational model, customer details, orders, employees, and products each live in their own tables. To answer even a simple question, such as “What did Alice buy?”, the database has to JOIN multiple tables together. That’s powerful, but it can become slow or complex as data grows or schemas evolve.

In a NoSQL document store, all of Alice’s information is already in one place. Her orders, the employee who handled each one, the product information, and the totals are embedded directly inside her customer record. There’s no need for joins because the related data travels together.
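As a sketch of what such a customer document might look like, the following uses a plain Python dict in place of a real document database. Alice Johnson, David Brown, and the Laptop come from the relational example; the second order, the prices, and the giftWrap field are hypothetical.

```python
import json

customer_doc = {
    "customerID": 1,
    "firstName": "Alice",
    "lastName": "Johnson",
    "orders": [
        {
            "orderID": 1,
            "employee": {"employeeID": 1, "name": "David Brown"},
            "product": {"productID": 1, "name": "Laptop", "price": 1200.0},
            "total": 1200.0,
        },
        {
            "orderID": 2,
            "employee": {"employeeID": 2, "name": "Erin Smith"},  # hypothetical employee
            "product": {"productID": 2, "name": "Mouse", "price": 25.0},
            "total": 25.0,
            "giftWrap": True,  # a field only this order has; no schema change needed
        },
    ],
}

# A dict keyed by customerID stands in for a document collection.
collection = {customer_doc["customerID"]: customer_doc}

# "What did Alice buy?" is answered by reading one document, with no joins.
alice = collection[1]
bought = [order["product"]["name"] for order in alice["orders"]]
spent = sum(order["total"] for order in alice["orders"])

as_json = json.dumps(alice)  # the same structure serializes directly to JSON
print(bought, spent)
```

The trade-off is visible in the data: David Brown's name and the Laptop's price are copied into every order that mentions them, so changing either one means updating many documents.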

Why this can be advantageous

  • Fewer joins, faster reads
    Applications can retrieve everything about a customer with a single query. This is ideal for user profiles, shopping carts, product catalogs, and many app-driven scenarios.

  • The data shape matches the real world
    Customer → Orders → Product → Employee is naturally hierarchical, and document databases store it the same way.

  • Flexible structure
    Documents don’t require rigid schemas. If new fields appear in future orders, they can be added without altering a predefined table structure.

  • Optimized for modern applications
    Document stores shine when data is frequently read as whole objects rather than across normalized tables.

When this style works best

Document-based NoSQL is especially useful when applications routinely need to fetch entire objects—such as a customer profile or an order history—as a single unit. It trades the strict consistency and normalization of relational databases for speed, flexibility, and developer-friendly design, making it a great fit for high-scale, rapidly evolving systems.


Benefits

  • Flexible schemas
  • Very high scalability
  • Optimized for specific workloads

Drawbacks

  • Weaker consistency depending on engine
  • No unified query model
  • Joins can be complex or unsupported

Use When

  • Data shape changes frequently
  • Scale matters more than strict consistency
  • Specialized modeling needed (graph, key-value, time-series)

Avoid When

  • You need strong correctness
  • You require complex joins across entities

Product Examples Across Clouds

Cloud | NoSQL Service | Type
Azure | Cosmos DB | Document, Key-Value, Graph
AWS | DynamoDB | Key-Value / Document
GCP | Firestore / Bigtable | Document / Column-Family
Multi-cloud | MongoDB Atlas | Document

3. Raw Storage (Object Storage / Data Lakes)

Raw Storage

What It Is

Blob storage is essentially a massive, practically unlimited file store in the cloud. Instead of storing records in tables or documents in a database, blob storage stores files as binary objects, each identified by a URL. These “blobs” can be anything: images, PDFs, videos, logs, backups, ZIP files, machine learning inputs, JSON, CSV, Parquet, or application data. Cloud providers optimize blob storage for durability and cost, which typically makes it the cheapest and most general-purpose storage layer. It behaves like a giant content repository where applications can upload, download, or stream files on demand.

Think of Blob storage as the foundation for a Data Lake, a container of file formats that is often accessed by analytics engines or AI systems. Figure xx depicts the relationship between Blob storage and a Data Lake.

Blob and Data Lakes

Data Lake Storage adds organization, hierarchy, and analytics-friendly features. A data lake isn’t just a place to store files; it’s designed to handle extremely large volumes of raw, semi-structured, and structured data in open formats like Parquet, CSV, or JSON. Unlike plain blob storage, which is more like a dumping ground for files, data lake storage adds directory structures, security boundaries, and compatibility with Spark, Fabric, Databricks, Hive, Synapse, BigQuery, and other analytical engines. In a data lake, the raw files become the “source of truth” for analytics and AI, feeding pipelines that clean, transform, and refine the data into usable datasets.
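The directory structure mentioned above can be sketched with the local file system standing in for cloud storage. The raw/<source>/date=<day>/ layout, the write_raw helper, and the sample records are assumptions chosen for illustration, not a convention prescribed by any provider.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

# A temporary local directory stands in for a storage account or bucket.
lake = Path(tempfile.mkdtemp())

def write_raw(source: str, day: str, file_name: str, records: List[Dict]) -> Path:
    """Write records as JSON Lines under raw/<source>/date=<day>/."""
    path = lake / "raw" / source / f"date={day}" / file_name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(json.dumps(r) for r in records), encoding="utf-8")
    return path

write_raw("orders", "2025-01-01", "part-0.json", [{"orderID": 1, "total": 1200.0}])
write_raw("orders", "2025-01-02", "part-0.json", [{"orderID": 2, "total": 25.0}])
write_raw("clicks", "2025-01-01", "part-0.json", [{"page": "/home"}])

# An engine that needs one day of orders reads one directory, not the whole lake.
day_dir = lake / "raw" / "orders" / "date=2025-01-01"
jan1_files = sorted(p.relative_to(lake).as_posix() for p in day_dir.glob("*.json"))
jan1_orders = [json.loads(line)
               for p in day_dir.glob("*.json")
               for line in p.read_text(encoding="utf-8").splitlines()]
print(jan1_files, jan1_orders)
```

Organizing files by source and date is what lets security boundaries and analytical engines work on a slice of the data instead of scanning every file.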


Benefits

  • Lowest-cost general-purpose storage
  • Infinite horizontal scale
  • Can store unstructured, semi-structured, and structured data
  • Ideal for AI/ML pipelines that use images, PDFs, or logs

Drawbacks

  • Not directly queryable without a compute layer
  • No transactional guarantees
  • Requires governance & organization strategy

Use When

  • You need to store files of any kind
  • You’re building a data lake for analytics or AI
  • You want long-term, cost-efficient data retention

Avoid When

  • You need real-time record updates
  • You require row-level consistency
  • You need relational constraints or transactional operations

Product Examples Across Clouds

Cloud | Object Store / Data Lake
Azure | ADLS Gen2, Blob Storage
AWS | S3
GCP | Google Cloud Storage

4. Visual Summary Comparison

Storage Comparison Chart

Table Summary

Feature | Relational | NoSQL | Raw Storage
Schema | Fixed | Flexible | None
Scale | Moderate | Massive | Massive
Query | SQL | APIs / Custom | Requires compute
Best For | Transactions | Web scale apps | AI/Analytics
Examples | SQL Server, Postgres | Cosmos, Dynamo | ADLS, S3

5. When to Use Each (Simple Rule of Thumb)

If the data has structure → SQL

Example: Orders, payments, users, inventory.

If scale or flexibility matters → NoSQL

Example: Profile stores, IoT, app telemetry.

If data is unstructured or for ML/AI → Raw Storage

Example: Images, logs, PDFs, clickstreams, training sets.


6. End-to-End Architecture: How They Fit Together

Modern Data Architecture Pipeline


A complete modern data system typically uses all three storage patterns, but in different layers:

  1. Operational Systems → Relational Databases (SQL)
     • Order systems, transactions, customer data
     • Strong consistency, queries, constraints

  2. Application Telemetry, Events, IoT → NoSQL
     • Flexible, scalable ingest
     • JSON, key-value, graph, or wide-column models

  3. Long-term Storage, Analytics, AI → Data Lake (Object Storage)
     • Raw files stored cheaply (images, logs, PDFs, JSON, Parquet)

Medallion Architecture (Where Processing Happens)

Object storage (your data lake) becomes the foundation for a Medallion pipeline:

Bronze — Raw files exactly as ingested

Silver — Cleaned, structured, validated datasets

Gold — Business-ready analytics tables / features for ML

This is where Spark engines, Fabric, Databricks, Synapse, or BigQuery process and refine the data stored in the lake.
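The Bronze → Silver → Gold flow can be sketched in plain Python on a handful of records. In practice this work runs on an engine such as Spark over files in the lake; the record fields, the cleaning rules, and the to_silver and to_gold function names here are hypothetical.

```python
import json
from typing import Dict, List

# Bronze: raw lines exactly as ingested, including malformed ones.
bronze = [
    '{"orderID": 1, "customer": "alice", "total": "1200.00"}',
    '{"orderID": 2, "customer": "Alice ", "total": "25"}',
    '{"orderID": 3, "customer": "bob", "total": "oops"}',
    'not json',
]

def to_silver(lines: List[str]) -> List[Dict]:
    """Parse, validate, and normalize; rows that cannot be cleaned are dropped."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({
                "orderID": int(record["orderID"]),
                "customer": record["customer"].strip().lower(),
                "total": float(record["total"]),
            })
        except (ValueError, KeyError, TypeError):
            continue  # a real pipeline would usually quarantine these rows instead
    return rows

def to_gold(rows: List[Dict]) -> Dict[str, float]:
    """Aggregate cleaned rows into a business-ready table: revenue per customer."""
    totals: Dict[str, float] = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["total"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)
```

Keeping the Bronze data untouched means the Silver and Gold layers can always be rebuilt if the cleaning rules change.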