Cassandra
Basics
- replication factor (RF)
- consistency level (CL) = QUORUM (Quorum referring to majority, 2 replicas in this case or RF/2 +1)
- coordinator
- SSTable
- memtable
- timestamps
- compaction: take small SSTables and merging them into bigger one
Intro
- keyspaces:
- top-level namespace/container
- similar to a relational database schema
CREATE KEYSPACE killrvideo WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 1 }; - USE switches between keyspaces
USE killrvideo; - Tables:
- keyspaces contain tables
- tables contain data
CREATE TABLE table1 ( column1 TEXT, column2 TEXT, column3 INT, PRIMARY KEY (column1) ); CREATE TABLE users ( user_id UUID, first_name TEXT, last_name TEXT, PRIMARY KEY (user_id) ); - Basic Data Types:
- text: UTF8 encoded string, varchar is same as text, unbounded
- int: Signed, 32 bits
- timestamp: date and time, 64 bit integer, store number of seconds since Jan 1st 1970 GMT
- UUID && TIMEUUID
- generate global unique id without communication between nodes
- TIMEUUID embeds a TIMESTAMP value
- INSERT:
INSERT INTO users (user_id, first_name, last_name) VALUES (uuid(), 'Joseph', 'Chu'); - SELECT:
SELECT * FROM users; SELECT first_name, last_name FROM users; SELECT * FROM users WHERE user_id = 4b516b3-ddf0-4c43-bab6-b91d674b64a5; - COPY:
- imports/exports CSV
COPY table1 (column1, column2, column3) FROM 'table1data.csv';- header parameter skips the first line in the file
COPY table1 (column1, column2, column3) FROM 'table1data.csv' WITH HEADER=true; - get data into Cassandra:
- COPY
- Spark
- Drivers
- Etc.