systems notes, field experiments, writeups from the late shift.

kartik@site:~/posts$ ls
2023
Apache Helix: The Distributed System’s Orchestra Conductor
achieve harmony in complex clusters using finite-state machines
Navigating the Minefield of RocksDB Configuration Options
unleashing the full potential of your rocksdb with the right configuration
2021
How to Package Java Projects in Python Tar Files
before diving into this article, i should state that, as a developer, situations that require placing a project written in language a into a language b package should be rare. in most cases it's better to redesign how the components written in different languages interact. but what if the situation is unavoidable? open source projects such as apache flink and apache spark are examples. both are built on the jvm (java, and in spark's case largely scala), yet they also ship python modules for those who don't want to use the jvm api.
2020
Utilize UDFs to Supercharge Queries in Apache Pinot
groovy functions
Leverage Plugins to Ingest Parquet Files from S3 in Pinot
one of the primary advantages of using pinot is its pluggable architecture. plugins make it easy to add support for a third-party system, whether that's an execution framework, a filesystem, or an input format.
Learning Multi-dimensional Indices: The Next Big Thing in OLAP DBs
flood
A Glimpse into my “WFH in Quarantine” Life
the lab
How Do ZooKeeper Servers Remain in Sync?
leader and followers
Why Apache Airflow Is a Great Choice for Managing Data Pipelines
a glimpse at the capabilities that make airflow better than its predecessors
Deploying ML Models in Distributed Real-time Data Streaming Applications
explore the various strategies for deploying ml models in apache flink, apache spark, or other real-time data streaming applications.