Data Storage Fundamentals
🎯 Difficulty Level: Easy
⏱️ Reading Time: 15 minutes
👤 Author: Rob Vet
📅 Last updated on: November 28, 2025

Relational vs. NoSQL vs. Raw Storage
1. Relational Databases (SQL)
A relational database stores data in tables (rows and columns). Each table represents a real-world entity—like Customers, Products, Employees, or Orders—and tables connect to each other using keys and relationships. SQL is the language used to query and join tables.
Understanding Tables and Schemas

In Figure 1 above, each box represents a table schema: a blueprint describing what type of data the table holds. The columns within each table define the fields, or attributes (firstName, price, orderDate, etc.), and their associated data types (String, Money, Date, etc.). Because every table follows this structure, relational databases stay organized and predictable.
The highlighted ID fields (customerID, productID, employeeID, and orderID) are Primary Keys: each one holds a value that uniquely identifies a row in its table. These keys act like a serial number for each record.
Notice how the Orders table repeats the ID values from the other tables: customerID, employeeID, and productID. These repeated fields are called Foreign Keys, meaning they “point back” to the row in the parent table that holds the full record. This is how the database knows which customer placed the order, which employee handled it, and which product was purchased.
Foreign Keys represent relationships between tables.
Tables remain independent, but foreign keys connect them.
Foreign Key relationships prevent duplication: customer names, product details, and employee information are stored once in their parent table and referenced wherever needed. Child tables store only the parent's key, not a redundant copy of its data.
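The key relationships described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the exact schema from Figure 1: the table names and ID columns follow the text, while lastName, productName, the SQLite column types, and the sample values (dates, price) are assumptions made for the example.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Customers (
    customerID INTEGER PRIMARY KEY,
    firstName  TEXT NOT NULL,
    lastName   TEXT NOT NULL
);
CREATE TABLE Employees (
    employeeID INTEGER PRIMARY KEY,
    firstName  TEXT NOT NULL,
    lastName   TEXT NOT NULL
);
CREATE TABLE Products (
    productID   INTEGER PRIMARY KEY,
    productName TEXT NOT NULL,
    price       REAL NOT NULL
);
CREATE TABLE Orders (
    orderID    INTEGER PRIMARY KEY,
    orderDate  TEXT NOT NULL,
    customerID INTEGER NOT NULL REFERENCES Customers(customerID),
    employeeID INTEGER NOT NULL REFERENCES Employees(employeeID),
    productID  INTEGER NOT NULL REFERENCES Products(productID)
);
""")

# Hypothetical parent rows, each stored exactly once.
conn.execute("INSERT INTO Customers VALUES (1, 'Alice', 'Johnson')")
conn.execute("INSERT INTO Employees VALUES (1, 'David', 'Brown')")
conn.execute("INSERT INTO Products VALUES (1, 'Laptop', 1000.0)")

# Valid: every foreign key points at an existing parent row.
conn.execute("INSERT INTO Orders VALUES (1, '2025-01-15', 1, 1, 1)")

# Invalid: customerID 42 does not exist, so the database rejects the row.
try:
    conn.execute("INSERT INTO Orders VALUES (2, '2025-01-16', 42, 1, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The second insert fails because the foreign key constraint guarantees that an order can never reference a customer that is not in the Customers table.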
Now that we understand table schemas and how keys link entities together, let's look at what happens once the tables are populated with real data, which makes the relationships visible at the row level.
Figure 2 below depicts tables with related data.

Relational Data in Action
In Figure 2 above, each table contains sample rows, making the relationships much easier to understand:
- Customers: Alice Johnson (customerID = 1) ties to two orders in the Orders table.
- Employees: David Brown (employeeID = 1) fulfilled one of the orders.
- Products: The Laptop (productID = 1) appears in an order for customer 1.
The Orders table acts as the hub linking customers, employees, and products together. Each order row pulls information from the other tables using its foreign key values.
Hopefully you can now see how tables join related data through shared keys to answer complex queries, such as:
- What did each customer buy?
- Which employees handled the most orders?
- What products generated the most revenue?
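The three questions above can each be expressed as a SQL join. The sketch below uses Python's built-in sqlite3 module with a simplified schema; only Alice Johnson, David Brown, and the Laptop come from the text, and every other row, name, and price is a hypothetical value added so the queries have something to aggregate. Revenue here assumes one unit per order, since the example has no quantity column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customerID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Employees (employeeID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Products  (productID  INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE Orders (
    orderID    INTEGER PRIMARY KEY,
    customerID INTEGER REFERENCES Customers(customerID),
    employeeID INTEGER REFERENCES Employees(employeeID),
    productID  INTEGER REFERENCES Products(productID)
);
-- Hypothetical sample rows for illustration.
INSERT INTO Customers VALUES (1, 'Alice Johnson'), (2, 'Sample Customer');
INSERT INTO Employees VALUES (1, 'David Brown'), (2, 'Sample Employee');
INSERT INTO Products  VALUES (1, 'Laptop', 1000.0), (2, 'Mouse', 25.0);
INSERT INTO Orders VALUES (1, 1, 1, 1), (2, 1, 2, 2), (3, 2, 2, 2);
""")

# What did each customer buy?
purchases = conn.execute("""
    SELECT c.name, p.name
    FROM Orders o
    JOIN Customers c ON c.customerID = o.customerID
    JOIN Products  p ON p.productID  = o.productID
    ORDER BY o.orderID
""").fetchall()

# Which employees handled the most orders?
orders_per_employee = conn.execute("""
    SELECT e.name, COUNT(*) AS orderCount
    FROM Orders o
    JOIN Employees e ON e.employeeID = o.employeeID
    GROUP BY e.employeeID, e.name
    ORDER BY orderCount DESC
""").fetchall()

# What products generated the most revenue (one unit per order)?
revenue_per_product = conn.execute("""
    SELECT p.name, SUM(p.price) AS revenue
    FROM Orders o
    JOIN Products p ON p.productID = o.productID
    GROUP BY p.productID, p.name
    ORDER BY revenue DESC
""").fetchall()
```

In each query the Orders table sits in the middle, and the foreign key columns in the ON clauses are what connect it to the parent tables.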
Benefits
- Strong consistency (ACID transactions)
- Structured, predictable schema
- Powerful SQL querying
- Great for transactional systems (orders, payments, inventory)
Drawbacks
- Schema is rigid; changes require migrations
- Not ideal for rapidly changing data structures
- Scaling horizontally (across many database instances) can be challenging
Use When
- Your data is structured and stable
- Relationships matter (Orders ↔ Customers ↔ Products)
- You need correctness and integrity
Avoid When
- Data is unstructured or constantly changing
- You need extreme horizontal scale or flexible schemas
Product Examples Across Clouds
| Cloud | Relational Service |
|---|---|
| Azure | Azure SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL |
| AWS | RDS (SQL Server, Postgres, MySQL), Aurora |
| GCP | Cloud SQL, AlloyDB |
2. NoSQL Databases
Understanding How NoSQL Databases Work
“NoSQL” stands for “Not Only SQL.” These databases relax relational rules to gain flexibility, speed, or scale.
NoSQL databases store data in more flexible ways than traditional relational tables. Instead of rigid rows and columns, they structure data based on how it will be used: as documents, key–value pairs, wide-column tables, or graph relationships.
The goal is flexibility and scale: These databases enable you to store data without a fixed schema and grow horizontally across many servers.
Relational databases spread information across multiple tables, but NoSQL systems often keep related data together. For example, a document store might hold a customer, their orders, and product details in one JSON document. This removes the need for joins, speeds up reads, and aligns well with modern application data.
NoSQL Database Types
NoSQL databases fall into a few major types, each optimized for different data shapes and use cases.

Main NoSQL Types
1. Document Stores (e.g., MongoDB, Cosmos DB)
Document databases store data as JSON-like documents. Each document represents a real-world object and can contain nested objects and arrays.
This makes them ideal when data naturally forms a hierarchy—like a customer with their orders embedded inside.
Why they work well
- The structure mirrors application objects (great for developers)
- Documents can vary in shape—no required schema
- Fetching one document often gives you all related data at once
- Excellent for quickly evolving data models
Common uses
- User profiles
- Product catalogs
- Content management
- Event and telemetry data
2. Key–Value Stores (e.g., Redis, DynamoDB used as a key–value store)
A key–value store is the simplest NoSQL model: a unique key → a blob of data.
The database doesn’t care what the value contains—JSON, a string, a number, or even a binary object. The system is optimized for extremely fast lookups by key.
Why they work well
- Blazing-fast reads and writes
- Perfect for caching
- Ideal for “grab by ID” scenarios
Common uses
- User session storage
- Caching API responses
- Shopping cart states
- Leaderboards and real-time metrics
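To make the key → value idea concrete, here is a toy in-memory store with optional expiry, similar in spirit to how a cache holds session data. It is a sketch only: TinyKeyValueStore, its method names, and the "session:abc123" key are all hypothetical, and this is not the API of Redis, DynamoDB, or any real client library.

```python
import time


class TinyKeyValueStore:
    """Toy in-memory key-value store with optional expiry (illustration only)."""

    def __init__(self):
        # key -> (value, expires_at or None)
        self._data = {}

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value


store = TinyKeyValueStore()

# The store never inspects the value: a dict, bytes, or a number all work.
store.set("session:abc123", {"userID": 1, "cart": ["Laptop"]}, ttl_seconds=1800)
store.set("logo", b"raw bytes are fine too")

session = store.get("session:abc123")   # lookup by exact key
missing = store.get("session:unknown")  # unknown keys return the default
```

Note what is absent: there is no way to ask "which sessions contain a Laptop?" without scanning every value. Lookups by key are the one operation this model optimizes for.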
3. Column-Family Databases (e.g., Cassandra, HBase)
Column-family databases organize data into wide tables where each row can contain different columns. Unlike relational tables, these structures allow massive horizontal scalability and extremely fast writes.
Think of them as “big analytics tables” designed for distributed workloads.
Why they work well
- Handles billions of rows across many servers
- Flexible columns—each row can contain only the columns it needs
- Tuned for high-throughput analytics and time-series workloads
Common uses
- Event logging
- IoT time-series data
- Recommendation engines
- Large-scale analytics where relational modeling struggles
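The "each row can contain only the columns it needs" idea can be sketched with nested dictionaries. This is a simplified mental model, not how Cassandra or HBase store data internally; the row-key format ("sensor-1#timestamp"), the family names, and the helper functions are assumptions made for the example.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    """Write a single cell; rows and columns are created on demand."""
    table[row_key][family][column] = value


def get_row(row_key, family):
    """Read all columns of one family for one row (empty dict if absent)."""
    # .get avoids creating empty rows as a side effect of reading.
    return dict(table.get(row_key, {}).get(family, {}))


def scan_prefix(prefix):
    """Return row keys starting with a prefix, e.g. all readings of one sensor."""
    return sorted(key for key in table if key.startswith(prefix))


# Hypothetical IoT readings: the two rows do not share the same columns.
put("sensor-1#2025-01-15T10:00", "readings", "temperature", 21.5)
put("sensor-1#2025-01-15T10:00", "readings", "humidity", 40)
put("sensor-2#2025-01-15T10:00", "readings", "temperature", 19.0)
put("sensor-2#2025-01-15T10:00", "status", "battery", 87)

sensor_1 = get_row("sensor-1#2025-01-15T10:00", "readings")
sensor_2 = get_row("sensor-2#2025-01-15T10:00", "readings")
sensor_1_keys = scan_prefix("sensor-1#")
```

Designing the row key around the main access pattern (here, sensor first, then time) is a common choice in this model, because reads are typically driven by key or key range rather than by arbitrary column values.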
4. Graph Databases (e.g., Neo4j, Cosmos DB Gremlin)
Graph databases represent data as nodes (entities) and edges (relationships).
This mirrors how relationships work in the real world:
- A person knows another person
- A user follows an artist
- A product is related to another product
Instead of using joins, graph databases store these relationships directly.
Why they work well
- Designed for deeply connected data
- Querying relationships is extremely fast
- Perfect for pattern discovery across networks
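A small sketch shows what "storing relationships directly" means: each node keeps a list of its outgoing edges, and a query walks those edges instead of joining tables. The node names, the KNOWS and FOLLOWS labels, and the helper functions are hypothetical; real graph databases such as Neo4j expose their own query languages rather than this Python API.

```python
from collections import deque

# node -> list of (relationship, target node)
edges = {
    "alice": [("KNOWS", "bob"), ("FOLLOWS", "artist_x")],
    "bob": [("KNOWS", "carol")],
    "carol": [("FOLLOWS", "artist_x")],
    "artist_x": [],
}


def neighbors(node, relationship):
    """Nodes directly connected to `node` by the given relationship type."""
    return [target for rel, target in edges.get(node, []) if rel == relationship]


def reachable_within(start, relationship, max_hops):
    """Breadth-first traversal that follows one relationship type."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node, relationship):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, hops + 1))
    return found


direct_friends = reachable_within("alice", "KNOWS", max_hops=1)
friends_of_friends = reachable_within("alice", "KNOWS", max_hops=2)
```

A "friends of friends" question like this would need a self-join per hop in a relational model, whereas here each extra hop is just another step along stored edges.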
NoSQL Data in Action
In the diagram below, the same customer, orders, employees, and products from the relational example are now stored inside a single NoSQL document. Instead of spreading the information across four separate tables and connecting them with primary and foreign keys, a document database keeps everything related to one customer together in a nested JSON structure.

Note how the NoSQL document store is fundamentally different from the relational database we just saw. In the relational model, customer details, orders, employees, and products each live in their own tables. To answer even a simple question, like “What did Alice buy?”, the database has to JOIN multiple tables together. That’s powerful, but it can become slow or complex as data grows or schemas evolve.
In a NoSQL document store, all of Alice’s information is already in one place. Her orders, the employee who handled each one, the product information, and the totals are embedded directly inside her customer record. There’s no need for joins because the related data travels together.
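Here is roughly what such a document could look like, using a plain Python dictionary and the json module as a stand-in for a document database. The field names, the second order, and all prices are hypothetical; only Alice Johnson, David Brown, and the Laptop come from the earlier example.

```python
import json

# Hypothetical customer document; field names and values are illustrative.
alice = {
    "customerID": 1,
    "name": "Alice Johnson",
    "orders": [
        {
            "orderID": 1,
            "employee": {"employeeID": 1, "name": "David Brown"},
            "product": {"productID": 1, "name": "Laptop", "price": 1000.0},
            "total": 1000.0,
        },
        {
            "orderID": 2,
            "employee": {"employeeID": 2, "name": "Sample Employee"},
            "product": {"productID": 2, "name": "Mouse", "price": 25.0},
            "total": 25.0,
            "giftWrap": True,  # a new field, added without any schema change
        },
    ],
}

# A dict of JSON strings stands in for a document collection keyed by ID.
customers = {alice["customerID"]: json.dumps(alice)}

# "What did Alice buy?" is one lookup plus a walk over nested data, no joins.
doc = json.loads(customers[1])
bought = [order["product"]["name"] for order in doc["orders"]]
lifetime_total = sum(order["total"] for order in doc["orders"])
```

The trade-off is visible in the data itself: product and employee details are copied into every order that embeds them, so renaming a product means updating each document that contains it.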
Why this can be advantageous
- Fewer joins, faster reads: Applications can retrieve everything about a customer with a single query. This is ideal for user profiles, shopping carts, product catalogs, and many app-driven scenarios.
- The data shape matches the real world: Customer → Orders → Product → Employee is naturally hierarchical, and document databases store it the same way.
- Flexible structure: Documents don’t require rigid schemas. If new fields appear in future orders, they can be added without altering a predefined table structure.
- Optimized for modern applications: Document stores shine when data is frequently read as whole objects rather than across normalized tables.
When this style works best
Document-based NoSQL is especially useful when applications routinely need to fetch entire objects—such as a customer profile or an order history—as a single unit. It trades the strict consistency and normalization of relational databases for speed, flexibility, and developer-friendly design, making it a great fit for high-scale, rapidly evolving systems.
Benefits
- Flexible schemas
- Very high scalability
- Optimized for specific workloads
Drawbacks
- Weaker consistency depending on engine
- No unified query model
- Joins can be complex or unsupported
Use When
- Data shape changes frequently
- Scale matters more than strict consistency
- Specialized modeling needed (graph, key-value, time-series)
Avoid When
- You need strong consistency and transactional integrity
- You require complex joins across entities
Product Examples Across Clouds
| Cloud | NoSQL Service | Type |
|---|---|---|
| Azure | Cosmos DB | Document, Key-Value, Graph |
| AWS | DynamoDB | Key-Value / Document |
| GCP | Firestore / Bigtable | Document / Column-Family |
| Multi-cloud | MongoDB Atlas | Document |
3. Raw Storage (Object Storage / Data Lakes)

What It Is
Blob storage is essentially a massive file system in the cloud with virtually unlimited capacity. Instead of storing records in tables or documents in a database, blob storage stores files as binary objects, each identified by a URL. These “blobs” can be anything: images, PDFs, videos, logs, backups, ZIP files, machine learning inputs, JSON, CSV, Parquet, or application data. Cloud providers optimize blob storage for durability and cost, which typically makes it the cheapest and most universal storage layer. It behaves like a giant content repository where applications can upload, download, or stream files on demand.
Think of blob storage as the foundation for a Data Lake: a collection of files in many formats that analytics engines and AI systems read directly. Figure xx depicts the relationship between Blob storage and a Data Lake.

Data Lake Storage adds organization, hierarchy, and analytics-friendly features. A data lake isn’t just a place to store files; it’s designed to handle extremely large volumes of raw, semi-structured, and structured data in open formats like Parquet, CSV, or JSON. Unlike blob storage, which is more like a dumping ground for files, data lake storage adds directory structures, security boundaries, and compatibility with Spark, Fabric, Databricks, Hive, Synapse, BigQuery, and other analytical engines. In a data lake, the raw files become the “source of truth” for analytics and AI, feeding pipelines that clean, transform, and refine the data into usable datasets.
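The difference between "a pile of files" and "an organized lake" can be sketched locally. In the example below a temporary directory stands in for a storage account or bucket, and put_object and list_objects are hypothetical helpers, not calls from any cloud SDK. The folder layout (raw/orders/2025/01/...) is just one possible convention.

```python
import csv
import json
import tempfile
from pathlib import Path

# A temporary local directory stands in for a storage account or bucket.
lake_root = Path(tempfile.mkdtemp(prefix="lake_"))


def put_object(key: str, data: bytes) -> Path:
    """Store raw bytes under a slash-separated key, like an object path."""
    path = lake_root / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def list_objects(prefix: str) -> list:
    """List object keys under a prefix, relative to the lake root."""
    base = lake_root / prefix
    return sorted(
        p.relative_to(lake_root).as_posix() for p in base.rglob("*") if p.is_file()
    )


# The store accepts any bytes: CSV, JSON, or binary content.
put_object(
    "raw/orders/2025/01/orders.csv",
    b"orderID,customerID,total\n1,1,1000.0\n2,1,25.0\n",
)
put_object("raw/events/2025/01/clicks.json", json.dumps([{"page": "/home"}]).encode())
put_object("raw/images/logo.png", b"\x89PNG placeholder bytes, not a real image")

all_raw = list_objects("raw")
order_files = list_objects("raw/orders")

# The storage layer cannot answer questions itself; a compute step must
# read the file and interpret it. Here the "engine" is a few lines of Python.
with open(lake_root / "raw/orders/2025/01/orders.csv", newline="") as f:
    order_total = sum(float(row["total"]) for row in csv.DictReader(f))
```

The last step is the important one: the files are only bytes until something reads and parses them, which is the role Spark and similar engines play at scale.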
Benefits
- Lowest-cost general-purpose storage
- Virtually unlimited horizontal scale
- Can store unstructured, semi-structured, and structured data
- Ideal for AI/ML pipelines that use images, PDFs, or logs
Drawbacks
- Not directly queryable without a compute layer
- No transactional guarantees
- Requires governance & organization strategy
Use When
- You need to store files of any kind
- You’re building a data lake for analytics or AI
- You want long-term, cost-efficient data retention
Avoid When
- You need real-time record updates
- You require row-level consistency
- You need relational constraints or transactional operations
Product Examples Across Clouds
| Cloud | Object Store / Data Lake |
|---|---|
| Azure | ADLS Gen2, Blob Storage |
| AWS | S3 |
| GCP | Google Cloud Storage |
4. Visual Summary Comparison

Table Summary
| Feature | Relational | NoSQL | Raw Storage |
|---|---|---|---|
| Schema | Fixed | Flexible | None |
| Scale | Moderate | Massive | Massive |
| Query | SQL | APIs / Custom | Requires compute |
| Best For | Transactions | Web-scale apps | AI/Analytics |
| Examples | SQL Server, Postgres | Cosmos DB, DynamoDB | ADLS, S3 |
5. When to Use Each (Simple Rule of Thumb)
If the data has structure → SQL
Example: Orders, payments, users, inventory.
If scale or flexibility matters → NoSQL
Example: Profile stores, IoT, app telemetry.
If data is unstructured or for ML/AI → Raw Storage
Example: Images, logs, PDFs, clickstreams, training sets.
6. End-to-End Architecture: How They Fit Together


A complete modern data system typically uses all three storage patterns, but in different layers:
- Operational Systems → Relational Databases (SQL)
  - Order systems, transactions, customer data
  - Strong consistency, queries, constraints
- Application Telemetry, Events, IoT → NoSQL
  - Flexible, scalable ingest
  - JSON, key-value, graph, or wide-column models
- Long-term Storage, Analytics, AI → Data Lake (Object Storage)
  - Raw files stored cheaply (images, logs, PDFs, JSON, Parquet)
Medallion Architecture (Where Processing Happens)
Object storage (your data lake) becomes the foundation for a Medallion pipeline:
Bronze — Raw files exactly as ingested
Silver — Cleaned, structured, validated datasets
Gold — Business-ready analytics tables / features for ML
This is where Spark engines, Fabric, Databricks, Synapse, or BigQuery process and refine the data stored in the lake.
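The three medallion layers can be illustrated with plain Python on a handful of records. This is a toy sketch of the idea, not how Spark or Databricks implement it: the record format, the validation rules, and the to_silver and to_gold function names are all assumptions made for the example.

```python
import json
from collections import defaultdict

# Bronze: raw records exactly as ingested (hypothetical newline-delimited JSON).
bronze = [
    '{"orderID": 1, "customer": " Alice Johnson ", "total": "1000.00"}',
    '{"orderID": 2, "customer": "alice johnson", "total": "25.00"}',
    '{"orderID": 3, "customer": "Sample Customer", "total": "not-a-number"}',
    "this line is not valid JSON",
]


def to_silver(raw_lines):
    """Parse, validate, and normalize; skip records that fail validation."""
    silver_rows = []
    for line in raw_lines:
        try:
            record = json.loads(line)
            silver_rows.append({
                "orderID": int(record["orderID"]),
                "customer": record["customer"].strip().title(),
                "total": float(record["total"]),
            })
        except (ValueError, KeyError, TypeError, AttributeError):
            # A real pipeline would usually quarantine bad records for review.
            continue
    return silver_rows


def to_gold(silver_rows):
    """Aggregate into a business-ready table: revenue per customer."""
    revenue = defaultdict(float)
    for row in silver_rows:
        revenue[row["customer"]] += row["total"]
    return dict(revenue)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze is never modified, so if a cleaning rule turns out to be wrong, Silver and Gold can be rebuilt from the original files.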