Because Cassandra distributes data across many nodes, and even across different DCs / availability zones, it looks very attractive to use Cassandra tables as a queue: store data in one location and process it in another. Cassandra would then handle the transport layer by itself, and no other components would be needed.
Implement a queue analogue based on Cassandra tables.
The problem, in two words: tombstones and compaction.
But, as has been said: "Note that it’s possible to improve on this hypothetical queue scenario. Specifically, when knowing what the last entry was, a consumer can specify the start column and thus somewhat mitigate the effect of tombstones by not having to either 1) start scanning at the beginning of the row and 2) collect and keep all the irrelevant tombstones in memory."
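The effect described in the quote can be illustrated with a small in-memory model. This is only a sketch: `Cell` and `read_next` are hypothetical names, a Python list stands in for a wide row, and no real Cassandra read path is involved.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cell:
    """One column in a wide row; a deleted cell remains as a tombstone."""
    name: int              # column name, sorted ascending within the row
    value: Optional[str]   # None marks a tombstone


def read_next(row: List[Cell], start: int = 0) -> Tuple[Optional[Cell], int]:
    """Return the first live cell with name >= start, plus the number of
    tombstones that had to be skipped to reach it."""
    skipped = 0
    for cell in row:
        if cell.name < start:
            continue  # a real slice query would seek here instead of iterating
        if cell.value is None:
            skipped += 1
            continue
        return cell, skipped
    return None, skipped


# A queue row whose first 1000 entries were consumed and deleted.
row = [Cell(i, None) for i in range(1000)] + [Cell(1000, "job-1000")]

cell_a, skipped_from_head = read_next(row)              # scan from the row start
cell_b, skipped_from_last = read_next(row, start=1000)  # resume at the last known entry
print(skipped_from_head, skipped_from_last)
```

Both reads return the same live entry, but only the read that starts at the row head has to walk over the tombstones.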
- Separate data and statuses to prevent any updates in the Data table
- Make status records as small as possible
- Use a rounded timestamp as the partition key of the Status table, for example rounded to 5 minutes.
- Process data records one by one
- Change the status record before processing to prevent collisions with other processor instances ("Lock" the record)
- Change the status again after the data record was processed ("Unlock" the record)
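The steps in the list above can be sketched roughly as follows. Everything here is an assumption made for illustration: `bucket_for`, `StatusStore`, `process_one`, and the one-character status values are hypothetical, timestamps are rounded down, and plain dictionaries stand in for the Data and Status tables. In particular, the in-memory check in `try_lock` does not show how to make the "lock" step atomic in Cassandra itself.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple

BUCKET_SECONDS = 5 * 60            # 5-minute partitions for the Status table
NEW, LOCKED, DONE = "N", "L", "D"  # short values keep status records small


def bucket_for(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute bucket."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % BUCKET_SECONDS, tz=timezone.utc)


class StatusStore:
    """In-memory stand-in for the Status table, keyed by (bucket, record id)."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[datetime, str], str] = {}

    def add(self, bucket: datetime, record_id: str) -> None:
        self.rows[(bucket, record_id)] = NEW

    def try_lock(self, bucket: datetime, record_id: str) -> bool:
        """'Lock' the record; fails if another processor already took it."""
        key = (bucket, record_id)
        if self.rows.get(key) != NEW:
            return False
        self.rows[key] = LOCKED
        return True

    def mark_done(self, bucket: datetime, record_id: str) -> None:
        """'Unlock' the record by recording that processing has finished."""
        self.rows[(bucket, record_id)] = DONE


def process_one(store: StatusStore, data: Dict[str, str], bucket: datetime,
                record_id: str, handler: Callable[[str], None]) -> bool:
    """Lock, process, and unlock a single record; the data row is never updated."""
    if not store.try_lock(bucket, record_id):
        return False
    handler(data[record_id])
    store.mark_done(bucket, record_id)
    return True


store = StatusStore()
data = {"r1": "payload"}  # stand-in for the Data table
bucket = bucket_for(datetime(2024, 1, 1, 12, 7, 42, tzinfo=timezone.utc))
store.add(bucket, "r1")

processed = []
first = process_one(store, data, bucket, "r1", processed.append)
second = process_one(store, data, bucket, "r1", processed.append)
```

The second call returns `False` because the status is no longer `NEW`, so a record is handled at most once in this model.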
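Use DateTieredCompactionStrategy for compaction. (In Cassandra 3.0.8 / 3.8 and later it is deprecated in favor of TimeWindowCompactionStrategy.)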
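As a rough illustration, the compaction strategy is set per table in the table definition. The keyspace, table, and column names below are hypothetical, and applying the strategy only to the Status table is an assumption; `apply_schema` expects any object with an `execute(statement)` method, such as a driver session.

```python
# Hypothetical schema: keyspace, table, and column names are illustrative only.
DATA_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS queue.data (
    id uuid PRIMARY KEY,
    payload blob
)
"""

# The bucket column is the rounded timestamp used as the partition key.
STATUS_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS queue.status (
    bucket timestamp,
    id uuid,
    status tinyint,
    PRIMARY KEY (bucket, id)
) WITH compaction = {'class': 'DateTieredCompactionStrategy'}
"""


def apply_schema(session) -> None:
    """Run both DDL statements through an object exposing execute()."""
    for ddl in (DATA_TABLE_DDL, STATUS_TABLE_DDL):
        session.execute(ddl)
```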
For data consistency, always use LOCAL_QUORUM for all statements.
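The reason this helps can be checked with simple arithmetic: a quorum is a majority of the replicas in the local data center, so any quorum read overlaps any quorum write in at least one replica. The helper names below are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond: a majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_writes(replication_factor: int) -> bool:
    """True when a quorum read and a quorum write must share a replica."""
    return 2 * quorum(replication_factor) > replication_factor


# With a replication factor of 3 in the local DC, 2 replicas must acknowledge.
local_rf = 3
print(quorum(local_rf), reads_see_writes(local_rf))
```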