Table of Contents
- Introduction
- Why JSON Support Matters in Modern SQL
- PostgreSQL JSON vs JSONB
- Creating Tables with JSON Fields
- Inserting JSON Data (from APIs or files)
- Accessing JSON Properties in PostgreSQL
- Filtering Based on JSON Content
- Updating and Modifying JSON Fields
- Using JSON Functions: jsonb_set, jsonb_extract_path, jsonb_array_elements
- Flattening JSON Arrays
- Indexing JSONB for Performance
- Parsing API Responses in PostgreSQL
- Building a Table to Store API Logs
- Practical Example: GitHub API Data in JSON
- Joining JSON with Relational Tables
- Aggregating Inside JSON Arrays
- Validating and Cleaning JSON Payloads
- Common Pitfalls and Optimization Tips
- JSON Utility Functions and Extensions
- Summary and What’s Next
1. Introduction
Modern applications often rely on API responses and semi-structured data, especially in JSON format. PostgreSQL stands out among SQL databases for its robust native JSON support, making it ideal for ingesting and analyzing data directly from APIs.
2. Why JSON Support Matters in Modern SQL
- APIs return nested JSON by default
- Avoid premature normalization
- Store entire documents or responses in a single field
- Perform flexible querying without rigid schema changes
3. PostgreSQL JSON vs JSONB
| Feature | JSON | JSONB |
|---|---|---|
| Storage | Textual format | Binary format |
| Indexable | No | Yes (GIN, expression indexes) |
| Order Preserved | Yes | No |
| Performance | Slower for lookups | Faster for search/filtering |
Recommendation: Use JSONB for analytics and indexing.
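A quick way to see the difference is to cast the same literal to each type: json keeps the text verbatim, while jsonb normalizes it (duplicate keys collapse, key order changes):
SELECT '{"b": 1, "a": 2, "a": 3}'::json  AS as_json,
       '{"b": 1, "a": 2, "a": 3}'::jsonb AS as_jsonb;
-- as_json  keeps the original text, including the duplicate "a" key
-- as_jsonb returns {"a": 3, "b": 1}: keys deduplicated (last value wins) and reordered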
4. Creating Tables with JSON Fields
CREATE TABLE api_logs (
id SERIAL PRIMARY KEY,
source TEXT,
payload JSONB,
fetched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
This allows you to store entire API payloads for later parsing and analysis.
5. Inserting JSON Data (from APIs or files)
INSERT INTO api_logs (source, payload)
VALUES (
'github',
'{
"login": "prafull",
"id": 12345,
"repos": [
{"name": "syskool", "stars": 150},
{"name": "xeblabs", "stars": 80}
]
}'
);
Use your backend or ETL pipeline to automate this step.
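As a rough sketch of what such a pipeline would execute (the usernames and ids below are placeholders, not real API data), a multi-row insert looks like this; in practice the JSON text would be passed as a bound parameter:
INSERT INTO api_logs (source, payload)
VALUES
    ('github', '{"login": "user_one", "id": 111, "repos": []}'),
    ('github', '{"login": "user_two", "id": 222, "repos": []}');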
6. Accessing JSON Properties in PostgreSQL
SELECT
payload->>'login' AS username,
(payload->>'id')::INT AS user_id
FROM api_logs
WHERE source = 'github';
-> returns the value as JSONB (object, array, or scalar)
->> returns the value as text (castable to scalar types)
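Chaining the two operators on the sample payload above makes the distinction concrete (a small sketch):
SELECT
  payload->'repos'->0          AS first_repo,       -- still JSONB
  payload->'repos'->0->>'name' AS first_repo_name   -- plain text
FROM api_logs
WHERE source = 'github';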
7. Filtering Based on JSON Content
SELECT *
FROM api_logs
WHERE payload->>'login' = 'prafull';
With nested data:
WHERE payload->'repos' @> '[{"name": "syskool"}]'
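For reference, the full form of that containment filter against the api_logs table (a sketch using the sample payload):
SELECT id, payload->>'login' AS username
FROM api_logs
WHERE payload->'repos' @> '[{"name": "syskool"}]'::jsonb;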
8. Updating and Modifying JSON Fields
UPDATE api_logs
SET payload = jsonb_set(payload, '{login}', '"kumar_prafull"')
WHERE id = 1;
- First param: JSON field
- Second: path as array
- Third: new value (must be JSON-encoded)
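jsonb_set can also add a key that does not exist yet; the optional fourth argument, create_if_missing, defaults to true. A sketch (the company key here is purely illustrative):
UPDATE api_logs
SET payload = jsonb_set(payload, '{company}', '"Xeblabs"', true)
WHERE id = 1;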
9. Using JSON Functions
- jsonb_set: Update a specific key
- jsonb_extract_path: Navigate nested levels
- jsonb_array_elements: Unpack arrays into rows
- jsonb_build_object: Create JSON in SQL
- jsonb_object_keys: List keys in an object
Example:
SELECT key, value
FROM api_logs,
jsonb_each_text(payload)
WHERE id = 1;
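Another small sketch exercising two more functions from the list above (the output column names are arbitrary):
SELECT
  jsonb_extract_path(payload, 'repos', '0', 'name')               AS first_repo_name,
  jsonb_build_object('user', payload->>'login', 'source', source) AS summary
FROM api_logs
WHERE id = 1;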
10. Flattening JSON Arrays
SELECT
payload->>'login' AS username,
repo_elem->>'name' AS repo_name,
(repo_elem->>'stars')::INT AS stars
FROM api_logs,
jsonb_array_elements(payload->'repos') AS repo_elem
WHERE source = 'github';
Each array element becomes its own row — useful for reporting.
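If you also need each element's position, jsonb_array_elements supports WITH ORDINALITY (an optional variant of the query above):
SELECT
  payload->>'login'  AS username,
  repo.ord           AS ordinal,
  repo.elem->>'name' AS repo_name
FROM api_logs,
     jsonb_array_elements(payload->'repos') WITH ORDINALITY AS repo(elem, ord)
WHERE source = 'github';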
11. Indexing JSONB for Performance
GIN Index:
CREATE INDEX idx_jsonb_payload ON api_logs USING GIN (payload jsonb_path_ops);
Expression Index:
CREATE INDEX idx_username ON api_logs ((payload->>'login'));
Improves filter and search performance.
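With jsonb_path_ops, the GIN index serves top-level containment queries; for example (a sketch):
SELECT id
FROM api_logs
WHERE payload @> '{"login": "prafull"}';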
12. Parsing API Responses in PostgreSQL
If the API response is fetched by an ETL job, store the raw JSON and parse it downstream:
-- Extract nested country code
SELECT payload->'address'->>'country_code' AS country
FROM api_logs;
Use jsonb_populate_recordset() for bulk parsing when the schema is known.
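A hedged sketch of that approach, assuming a row type that mirrors the elements of repos (the type name is arbitrary):
CREATE TYPE repo_record AS (name TEXT, stars INT);

SELECT r.name, r.stars
FROM api_logs,
     jsonb_populate_recordset(NULL::repo_record, payload->'repos') AS r
WHERE source = 'github';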
13. Building a Table to Store API Logs
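-- Note: gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto extension.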
CREATE TABLE external_api_responses (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
endpoint TEXT,
response JSONB,
status_code INT,
requested_at TIMESTAMP DEFAULT NOW()
);
Track performance, failures, and response sizes.
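The kind of tracking query this table enables (a sketch; pg_column_size reports the stored, possibly compressed, size):
SELECT
  endpoint,
  COUNT(*)                                   AS total_calls,
  COUNT(*) FILTER (WHERE status_code >= 400) AS failures,
  ROUND(AVG(pg_column_size(response)))       AS avg_response_bytes
FROM external_api_responses
GROUP BY endpoint;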
14. Practical Example: GitHub API Data in JSON
-- Assume payload from GitHub API
SELECT
payload->>'login' AS username,
jsonb_array_length(payload->'repos') AS repo_count
FROM api_logs
WHERE source = 'github';
15. Joining JSON with Relational Tables
-- users table has user_id
-- api_logs.payload contains matching GitHub user_id
SELECT u.name, a.payload->>'login' AS github_login
FROM users u
JOIN api_logs a ON u.external_id = (a.payload->>'id')::INT;
16. Aggregating Inside JSON Arrays
SELECT
payload->>'login' AS username,
SUM((repo_elem->>'stars')::INT) AS total_stars
FROM api_logs,
jsonb_array_elements(payload->'repos') AS repo_elem
GROUP BY username;
Aggregate metrics from nested structures.
17. Validating and Cleaning JSON Payloads
Check for structure:
SELECT * FROM api_logs
WHERE jsonb_typeof(payload->'repos') IS DISTINCT FROM 'array';  -- also flags rows where the repos key is missing
Avoid invalid assumptions — always validate before parsing.
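To enforce a baseline shape at write time, a CHECK constraint is one option (a sketch; the constraint name is arbitrary):
ALTER TABLE api_logs
  ADD CONSTRAINT payload_is_object
  CHECK (jsonb_typeof(payload) = 'object');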
18. Common Pitfalls and Optimization Tips
- Use jsonb, not json, for faster querying
- Always wrap JSON string literals in double quotes (e.g., '"kumar_prafull"')
- Avoid storing massive payloads without a plan for how they will be queried
- Index only the fields used in filtering
- Be careful with null vs missing key comparisons (illustrated below)
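A quick illustration of the null-vs-missing pitfall (hedged example values):
SELECT
  '{"a": null}'::jsonb ->> 'a' AS explicit_null,  -- SQL NULL (JSON null value)
  '{"a": null}'::jsonb ->> 'b' AS missing_key,    -- SQL NULL (key absent), indistinguishable from the above
  '{"a": null}'::jsonb ?  'a'  AS has_key_a,      -- true
  '{"a": null}'::jsonb ?  'b'  AS has_key_b;      -- false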
19. JSON Utility Functions and Extensions
- jsonb_pretty(payload) – formatted view
- jsonb_strip_nulls(payload) – clean nulls
- jsonb_build_object() – construct JSON in SQL
- Extensions: pg_partman, pg_cron, plv8 (JS for PostgreSQL)
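Combining two of these utilities (a short sketch):
SELECT jsonb_pretty(jsonb_strip_nulls(payload)) AS cleaned
FROM api_logs
WHERE id = 1;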
20. Summary and What’s Next
PostgreSQL provides powerful JSON capabilities that make it an excellent tool for storing and analyzing API data and other semi-structured documents. From storing entire payloads to filtering and joining on nested attributes, you can use pure SQL to explore flexible data structures.