Tuesday, June 19, 2012

Cassandra quick notes (Part I)

I had a number of quick notes on Cassandra, which I thought others might find useful as well. Since my original set of quick notes is pretty long, I am breaking it up into two parts (this is the first part). If you are interested in the "operations" aspects of Cassandra, I would recommend looking at this book; a lot of the pointers are from this book.
  • Calculate ideal initial tokens
$  \text{init_token} = \text{node_num_zero_indexed} \times \frac{2^{127}}{\text{num_nodes}} $
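As a rough illustration of the token formula above, here is a minimal Python sketch. It assumes the $2^{127}$ token space used in the formula; the function name `initial_tokens` is made up for this example and is not part of Cassandra.

```python
def initial_tokens(num_nodes):
    """Return one evenly spaced initial token per node (zero-indexed)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    ring_size = 2 ** 127  # token space assumed by the formula above
    step = ring_size // num_nodes  # integer division keeps tokens whole
    return [node * step for node in range(num_nodes)]


# Print a node-number / token pair for a hypothetical four-node ring.
for node, token in enumerate(initial_tokens(4)):
    print(node, token)
```

Python integers are arbitrary precision, so the 128-bit arithmetic needs no special handling.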
  • Adjust replication factor to work with quorum
$ \text{nodes_for_quorum} = \left\lfloor \frac{\text{rep_factor}}{2} \right\rfloor + 1 $
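To see why the replication factor matters for quorum, the calculation can be sketched in Python, reading the division as integer division. The function name `nodes_for_quorum` simply mirrors the formula and is an illustrative choice.

```python
def nodes_for_quorum(rep_factor):
    """Number of replicas that must respond for a quorum read or write."""
    if rep_factor < 1:
        raise ValueError("rep_factor must be at least 1")
    return rep_factor // 2 + 1  # integer division, then add one


# Show how many replicas can be down while quorum is still reachable.
for rf in (1, 2, 3, 5):
    quorum = nodes_for_quorum(rf)
    print(f"RF={rf}: quorum={quorum}, tolerates {rf - quorum} replica(s) down")
```

With a replication factor of 2, quorum needs both replicas, so no replica can be down; a replication factor of 3 is the smallest that tolerates one unavailable replica.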
  • Anti Entropy Repair
Anti Entropy Repair (AES) is a very intensive data repair mechanism and should preferably be run at times of low traffic. It can result in duplicate data on nodes, which can be removed using nodetool compact, or avoided during the repair process by using the $\text{-pr}$ option. The interval between AES runs should be less than or equal to $\text{gc_grace_seconds}$. AES should be run in the following situations:
    • Change in replication factor
    • Joined nodes without auto bootstrap
    • Lost or corrupted files (such as SSTables, indexes, or commit logs)
$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT repair -pr
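The scheduling rule above (repair at least once per $\text{gc_grace_seconds}$) can be checked with a small Python sketch. It assumes the default $\text{gc_grace_seconds}$ of 864000 (10 days); if you have changed that setting on a column family, pass your own value. The function name `repair_interval_ok` is hypothetical.

```python
from datetime import timedelta

# Cassandra's default gc_grace_seconds (10 days); an assumption if overridden.
DEFAULT_GC_GRACE_SECONDS = 864000


def repair_interval_ok(interval, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True if repairs run at least once per gc_grace_seconds."""
    return interval.total_seconds() <= gc_grace_seconds


# A weekly repair fits inside the default window; a fortnightly one does not.
print(repair_interval_ok(timedelta(days=7)))
print(repair_interval_ok(timedelta(days=14)))
```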
  • Nodetool cleanup
Use nodetool cleanup to remove copies of data from nodes that are not
responsible for them. Cleanup is intensive; run it after topology changes, or
when using hinted handoff and write consistency ANY.
$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT cleanup
  • Use nodetool snapshot for backup
Snapshot makes hard links to the files in the data directory in a subfolder,
"snapshots/<timestamp>".
$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT snapshot
  • Clear snapshots with nodetool
$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT clearsnapshot
  • Nodetool to move nodes in ring
$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT move <new_token>
  • Nodetool to remove a "downed" node
$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT removetoken <token_value>
When a node is removed, Cassandra actively begins replicating the missing data until it is stored on the number of nodes specified by the replication factor.
  • Removing a "live" node
$ <cassandra_home>/bin/nodetool -h $LIVE_NODE_TO_REMOVE -p $PORT decommission
  • Get quick stats using nodetool
The following are pretty handy for quickly glancing at the state of a Cassandra cluster.
$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT tpstats

$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT cfstats

$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT compactionstats

$ <cassandra_home>/bin/nodetool -h $HOST -p $PORT cfhistograms <keyspace> <column_family>
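If you check these stats often, the command lines can be assembled in one place. The Python sketch below only builds the argument lists and does not run anything. The install path `/opt/cassandra`, the host `localhost`, and the helper names are assumptions for illustration; 7199 is used as the JMX port, so substitute whatever your cluster uses.

```python
import os


def nodetool_command(cassandra_home, host, port, subcommand, *args):
    """Build the argv list for one nodetool invocation (does not run it)."""
    nodetool = os.path.join(cassandra_home, "bin", "nodetool")
    return [nodetool, "-h", host, "-p", str(port), subcommand, *args]


# Stats subcommands from the notes above that take no extra arguments.
STATS_SUBCOMMANDS = ["tpstats", "cfstats", "compactionstats"]


def stats_commands(cassandra_home, host, port):
    """Return one argv list per stats subcommand."""
    return [nodetool_command(cassandra_home, host, port, sub)
            for sub in STATS_SUBCOMMANDS]


for argv in stats_commands("/opt/cassandra", "localhost", 7199):
    # To actually execute a command: subprocess.run(argv, check=True)
    print(" ".join(argv))
```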
  • Monitor GC events in Cassandra log
Cassandra has options in $\text{conf/cassandra-env.sh}$ that cause Java to print
garbage collection messages to a log file.
$ grep "GC inspection" /var/log/cassandra/system.log

Thursday, June 14, 2012

My reading list 2012, so far

I am happy to say that I have managed to stick to my goal of reading regularly. Below is a list of the books I have read so far, halfway into the year:
  • Refactoring: Improving the Design of Existing Code by Martin Fowler
  • Outliers: The Story of Success by Malcolm Gladwell
  • The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses by Eric Ries
  • The Armchair Economist: Economics and Everyday Life by Steven Landsburg
    > This has been on my reading list for a while; I finally got around to it this year.
  • Maximum City: Bombay Lost and Found by Suketu Mehta.
(updated 08/19/2012)
  • Cassandra: High Performance Cookbook by Edward Capriolo.
  • Quiet: The Power of Introverts in a World That Can't Stop Talking by Susan Cain.
  • Getting Real: The Smarter, Faster, Easier Way to Build a Successful Web Application by 37signals.
(updated 11/15/2012)
  • Moonwalking with Einstein: The Art and Science of Remembering Everything by Joshua Foer.