SQL with APIs and JSON Fields in PostgreSQL: Working with Semi-Structured Data


Table of Contents

  1. Introduction
  2. Why JSON Support Matters in Modern SQL
  3. PostgreSQL JSON vs JSONB
  4. Creating Tables with JSON Fields
  5. Inserting JSON Data (from APIs or files)
  6. Accessing JSON Properties in PostgreSQL
  7. Filtering Based on JSON Content
  8. Updating and Modifying JSON Fields
  9. Using JSON Functions: jsonb_set, jsonb_extract_path, jsonb_array_elements
  10. Flattening JSON Arrays
  11. Indexing JSONB for Performance
  12. Parsing API Responses in PostgreSQL
  13. Building a Table to Store API Logs
  14. Practical Example: GitHub API Data in JSON
  15. Joining JSON with Relational Tables
  16. Aggregating Inside JSON Arrays
  17. Validating and Cleaning JSON Payloads
  18. Common Pitfalls and Optimization Tips
  19. JSON Utility Functions and Extensions
  20. Summary and What’s Next

1. Introduction

Modern applications often rely on API responses and semi-structured data, especially in JSON format. PostgreSQL stands out among SQL databases for its robust native JSON support, making it ideal for ingesting and analyzing data directly from APIs.


2. Why JSON Support Matters in Modern SQL

  • APIs return nested JSON by default
  • Avoid premature normalization
  • Store entire documents or responses in a single field
  • Perform flexible querying without rigid schema changes

3. PostgreSQL JSON vs JSONB

FeatureJSONJSONB
StorageTextual formatBinary format
IndexableNoYes (GIN, expression indexes)
Order PreservedYesNo
PerformanceSlower for lookupFaster for search/filtering

Recommendation: Use JSONB for analytics and indexing.


4. Creating Tables with JSON Fields

CREATE TABLE api_logs (
id SERIAL PRIMARY KEY,
source TEXT,
payload JSONB,
fetched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

This allows you to store entire API payloads for later parsing and analysis.


5. Inserting JSON Data (from APIs or files)

INSERT INTO api_logs (source, payload)
VALUES (
'github',
'{
"login": "prafull",
"id": 12345,
"repos": [
{"name": "syskool", "stars": 150},
{"name": "xeblabs", "stars": 80}
]
}'
);

Use your backend or ETL pipeline to automate this step.


6. Accessing JSON Properties in PostgreSQL

SELECT
payload->>'login' AS username,
(payload->>'id')::INT AS user_id
FROM api_logs
WHERE source = 'github';
  • -> returns JSON object
  • ->> returns text (castable to scalar types)

7. Filtering Based on JSON Content

SELECT *
FROM api_logs
WHERE payload->>'login' = 'prafull';

With nested data:

WHERE payload->'repos' @> '[{"name": "syskool"}]'

8. Updating and Modifying JSON Fields

UPDATE api_logs
SET payload = jsonb_set(payload, '{login}', '"kumar_prafull"')
WHERE id = 1;
  • First param: JSON field
  • Second: path as array
  • Third: new value (must be JSON-encoded)

9. Using JSON Functions

  • jsonb_set: Update a specific key
  • jsonb_extract_path: Navigate nested levels
  • jsonb_array_elements: Unpack arrays into rows
  • jsonb_build_object: Create JSON in SQL
  • jsonb_object_keys: List keys in an object

Example:

SELECT key, value
FROM api_logs,
jsonb_each_text(payload)
WHERE id = 1;

10. Flattening JSON Arrays

SELECT
payload->>'login' AS user,
repo_elem->>'name' AS repo_name,
(repo_elem->>'stars')::INT AS stars
FROM api_logs,
jsonb_array_elements(payload->'repos') AS repo_elem
WHERE source = 'github';

Each array element becomes its own row — useful for reporting.


11. Indexing JSONB for Performance

GIN Index:

CREATE INDEX idx_jsonb_payload ON api_logs USING GIN (payload jsonb_path_ops);

Expression Index:

CREATE INDEX idx_username ON api_logs ((payload->>'login'));

Improves filter and search performance.


12. Parsing API Responses in PostgreSQL

If API is fetched in ETL, store raw JSON, then parse downstream:

-- Extract nested country code
SELECT payload->'address'->>'country_code' AS country
FROM api_logs;

Use jsonb_populate_recordset() for bulk parsing if schema is known.


13. Building a Table to Store API Logs

CREATE TABLE external_api_responses (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
endpoint TEXT,
response JSONB,
status_code INT,
requested_at TIMESTAMP DEFAULT NOW()
);

Track performance, failures, and response sizes.


14. Practical Example: GitHub API Data in JSON

-- Assume payload from GitHub API
SELECT
payload->>'login' AS username,
jsonb_array_length(payload->'repos') AS repo_count
FROM api_logs
WHERE source = 'github';

15. Joining JSON with Relational Tables

-- users table has user_id
-- api_logs.payload contains matching GitHub user_id

SELECT u.name, a.payload->>'login' AS github_login
FROM users u
JOIN api_logs a ON u.external_id = (a.payload->>'id')::INT;

16. Aggregating Inside JSON Arrays

SELECT
payload->>'login' AS user,
SUM((repo_elem->>'stars')::INT) AS total_stars
FROM api_logs,
jsonb_array_elements(payload->'repos') AS repo_elem
GROUP BY user;

Aggregate metrics from nested structures.


17. Validating and Cleaning JSON Payloads

Check for structure:

SELECT * FROM api_logs
WHERE jsonb_typeof(payload->'repos') != 'array';

Avoid invalid assumptions — always validate before parsing.


18. Common Pitfalls and Optimization Tips

  • Use jsonb, not json, for faster querying
  • Always wrap JSON string literals in double quotes
  • Avoid storing massive payloads with no access plans
  • Index only fields used in filtering
  • Be careful with null vs missing key comparisons

19. JSON Utility Functions and Extensions

  • jsonb_pretty(payload) – formatted view
  • jsonb_strip_nulls(payload) – clean nulls
  • jsonb_build_object() – construct JSON in SQL
  • Extensions: pg_partman, pg_cron, plv8 (JS for PostgreSQL)

20. Summary and What’s Next

PostgreSQL provides powerful JSON capabilities that make it an excellent tool for storing and analyzing API data and other semi-structured documents. From storing entire payloads to filtering and joining on nested attributes, you can use pure SQL to explore flexible data structures.