๐งฉ Modern Database Concepts: ACID, BASE, CAP & Sharding
Databases are the backbone of modern applications. This post dives into ACID, BASE, CAP theorem, Hibernate, JDBC, and NoSQL concepts, with practical examples and tips for Java developers. ๐
1. ๐งฉ Consistency Models: ACID vs BASE
- ACID (Relational Databases)
- ⚡ Atomicity: Each transaction is all-or-nothing.
- ⚡ Consistency: DB remains consistent after transactions.
- ⚡ Isolation: Transactions do not interfere concurrently.
- ⚡ Durability: Committed data persists.
- BASE (NoSQL / Distributed Systems)
- Basically Available
- Soft State
- Eventually Consistent
2. ๐ CAP Theorem
In a distributed system, when a partition occurs:
- ⚖️ Can only choose Consistency (C) or Availability (A) for the system.
- ๐ก Example: Two nodes disconnected; you must either accept stale/inconsistent reads or deny availability.
3. ๐ท Indexes
Indexes speed up queries by storing key info separately.
- CREATE INDEX
index_name ON table(column) - ⚡ Good for large read-heavy tables
- ⚠️ Can slow down writes due to updates on indexed columns
4. ๐ Hibernate & ORM
Hibernate maps Java objects to database tables:
- Core Concepts: Config, SessionFactory, Session, Transaction, Query
- Session Methods:
- createSQLQuery = HQL queries
- saveOrUpdate → updates if PK exists
- save → returns ID; persist → returns nothing
- get → returns null if not found; load → throws exception if not found
- Object States:
- Transient → new, not saved
- Persistent → saved and session open
- Detached → session closed, can reattach with merge
- Loading Types:
- Eager → fetch at initialization
- Lazy → fetch on demand
- Dirty Checking → reduces DB writes by updating only changed entities
- Automatic session closing (with Spring or try-with-resources)
- Exception handling automated via HibernateException
5. ๐ Locking Strategies
- Optimistic → assumes rare conflicts, checks version before commit
- Pessimistic → locks row/table exclusively during transaction
6. ๐พ JDBC vs JPA vs Hibernate
- JDBC → low-level API, manual mapping between Java objects & DB
- JPA → specification, ORM abstraction, manages entities & persistence
- Hibernate → implementation of JPA + caching, session management, advanced features
- Spring Data JPA → requires JPA provider (Hibernate)
- JTA → manages distributed transactions (@Transactional, multiple resources)
7. ⚡ SQL vs NoSQL
- SQL → vertically scalable, predefined schemas, multi-row transactions
- NoSQL → horizontally scalable, dynamic schemas (JSON, documents), sharding for high traffic
8. ๐ Key Concepts: Sharding vs Partitioning vs Replication
- Partitioning → organizing data within a single database into smaller, manageable pieces.
- Can be horizontal (rows split across partitions) or vertical (columns split across partitions).
- Purpose: improve **query performance**, maintenance, and manage large tables.
- Example: splitting orders by month within the same database.
- Sharding → horizontally partitioning data **across multiple database nodes**.
- Each shard holds a subset of the data based on a **shard key** (e.g., user ID, region).
- Purpose: improve **write scalability** and **horizontal scaling**.
- Example: Users 1–1000 → Shard 1, Users 1001–2000 → Shard 2.
- ⚡ Key difference from partitioning: **sharding is distributed across multiple servers**, partitioning can exist inside a single DB.
- Replication → duplicating data across nodes for **read scaling** and **fault tolerance**.
- Read replicas reduce load on primary nodes.
- Writes go to master; replicas are eventually consistent.
- Example: Reporting DB or caching layer served by read replicas.
- Decision Guide
- ✅ Read-heavy → Replication
- ✅ Write-heavy → Sharding / Partitioning
- ⚡ Mixed workloads → combine sharding with replication (replicated shards)
๐ก Practical Tips for Java Developers
- Use Spring Data JPA with **entity mappings** and **transaction boundaries** that respect sharding/partitioning keys.
- For read replicas, annotate service methods with
@Transactional(readOnly = true)to avoid accidental writes. - Pick **stable shard keys** to avoid hotspots and uneven data distribution.
- Partition tables in the same DB for performance optimization; shard across multiple nodes for scaling write-heavy workloads.
๐ก Practical Tip for Java Developers
- Use Spring Data JPA with proper **entity mappings** and **transaction boundaries** for sharded databases.
- For read replicas, consider **@Transactional(readOnly = true)** on service methods to avoid accidental writes.
- Partitioning keys (e.g., user ID, account ID) should be stable and evenly distributed to avoid hot spots.
9. ๐ Example: Hibernate Save vs Persist
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
User user = new User();
user.setName("Alice");
// save → returns generated ID
Serializable id = session.save(user);
// persist → returns void
session.persist(user);
tx.commit();
session.close();
10. ๐งพ Transaction Isolation & Dirty Reads
- Dirty Read: Transaction reads uncommitted changes; risk if rollback occurs.
- Non-Repeatable Read: Data changes between reads in the same transaction.
- Phantom Read: New rows appear between repeated queries in a transaction.
11. ๐ SQL Clauses & Queries
- WHERE: Filters rows before grouping
- HAVING: Filters groups after aggregation
- GROUP BY: Aggregates rows by specified columns
- JOIN: Combines tables using
ON table1.col = table2.col
SELECT department, COUNT(*)
FROM employees
WHERE active = 1
GROUP BY department
HAVING COUNT(*) > 5
ORDER BY department;
SELECT e.name, d.name
FROM employees e
JOIN departments d
ON e.department_id = d.id;
12. ๐ Summary Table
| Concept | Definition / Purpose | Notes |
|---|---|---|
| ACID | Transaction guarantees in RDBMS | Atomicity, Consistency, Isolation, Durability |
| BASE | Eventual consistency in NoSQL | Basically Available, Soft State, Eventually Consistent |
| Sharding | Partitioning data across nodes | Improves scalability |
| Replication | Duplicate data on multiple nodes | Improves availability and fault tolerance |
| Hibernate States | Transient, Persistent, Detached | Determines object lifecycle in session |
| Dirty Read | Reading uncommitted data | Can lead to inconsistencies if rollback occurs |
13. ๐ผ Database Architecture Diagram
Subset of data (Partition)
Subset of data (Partition)
Exact copy for availability
Transaction guarantees in relational DB
Distributed, NoSQL systems
14. ๐ Tips for Java Developers
- Use JPA/Hibernate to simplify DB interactions and avoid boilerplate JDBC code.
- Carefully choose between eager vs lazy loading to optimize performance.
- Monitor transaction locks and isolation levels for concurrency-sensitive applications.
- Consider indexes for large read-heavy tables but avoid over-indexing write-heavy tables.
- Leverage NoSQL for flexible, high-throughput scenarios, and SQL for relational, transactional workloads.
15. ๐ Database Normalization & Denormalization
Normalization and denormalization are techniques to organize data in databases for consistency, integrity, and performance.
๐ Normalization
Normalization reduces redundancy and ensures data integrity by splitting data into multiple related tables.
- Example: Instead of storing customer name in every order row, create a
Customerstable and reference it viacustomer_idin theOrderstable. - Benefits:
- Minimizes duplicate data
- Improves consistency and maintainability
- Trade-offs:
- Requires joins for queries → can impact read performance
๐ฆ Denormalization
Denormalization combines data into fewer tables to optimize read performance, often at the cost of some redundancy.
- Example: Store customer name directly in the
Orderstable to speed up reporting queries without joining theCustomerstable. - Benefits:
- Faster reads
- Simpler queries for reporting and analytics
- Trade-offs:
- More storage used
- Potential inconsistency if duplicated data is not kept in sync
๐ก Practical Tip for Java Developers
- Use JPA/Hibernate mappings carefully: decide which relationships to normalize vs denormalize for performance vs integrity.
- Consider denormalization for read-heavy reporting tables or cache tables in distributed systems.
- Normalize transactional tables to prevent anomalies and maintain data integrity.
Labels: Java, Databases, Hibernate, JPA, JDBC, SQL, NoSQL, ACID, BASE, CAP Theorem, Sharding, Replication, Java Performance, Dirty Read, Transaction Isolation, GROUP BY, HAVING
Comments
Post a Comment