2025-07-28
Most data projects don’t fail because of tools—they fail because of poor communication, overengineering, and ignoring real-world constraints. This post breaks down what data success *actually* looks like: delivering useful answers, on time, with minimal overhead. Learn how small, business-aware teams can ship faster, automate smarter, and scale simpler by focusing on impact, not architecture.
2025-06-09
Pandas is a fantastic tool for small datasets and quick analysis, but it hits limits when scaling or persisting state. DuckDB fills that gap by combining SQL-native querying, persistent local storage, and high performance, allowing data engineers to build scalable, reliable pipelines on their laptops without spinning up clusters. This post explores the practical differences between Pandas and DuckDB, real-world use cases, and why DuckDB is the smarter tool for modern data workflows.
2025-05-27
Traditional BI tools prioritize speed and ease but often sacrifice flexibility and customization. Today, the rise of AI, modular libraries, and instant cloud platforms like Replit empowers data engineers to build highly customizable, interactive, and user-focused data experiences without needing full-stack development expertise. This shift transforms BI from rigid, one-size-fits-all dashboards into composable, code-assisted data product kits that deliver tailored insights and enable narrative-driven storytelling. Discover how the future of BI is no longer a monolithic platform but a flexible toolkit that bridges data engineering and user experience seamlessly.
2025-04-16
Constraints shape creativity in data engineering more than limitless resources ever could. From limited budgets and tight deadlines to technical and organizational boundaries, data teams constantly navigate tradeoffs that spark smarter, more pragmatic solutions. This post explores how real-world constraints drive innovations like Zero ETL, local-first engines like DuckDB, and integrated platforms like Microsoft Fabric. Instead of fighting limitations, successful engineers learn to embrace and design with constraints—delivering impactful, efficient data solutions that work within the messy realities of business.
2025-03-31
Microsoft Fabric revolutionizes data workflows within the Microsoft ecosystem by unifying ingestion, transformation, modeling, and reporting into a seamless, serverless platform. It eliminates traditional Azure complexity and enables faster, more autonomous analytics with native Power BI integration, Lakehouses, notebooks, and real-time replication. While still evolving, Fabric dramatically improves productivity and collaboration by reducing tool fragmentation and infrastructure overhead—making it the most practical solution I’ve seen for getting data projects done in Microsoft environments.
2025-02-05
Data modeling has been the backbone of structured analytics for decades, ensuring consistency, performance, and reliability. But with affordable storage, faster processing, and flexible BI tools, rigid data models are no longer a given. This post explores when traditional modeling adds value, and when startups and agile teams can thrive with more flexible, denormalized, or hybrid approaches. Learn how to balance structure and speed to deliver impactful insights without over-engineering your data pipeline.
2025-01-17
Data engineering isn’t about building pipelines or managing infrastructure for its own sake. It’s about delivering clear, timely, and actionable insights that empower decision-makers. This post explores why stakeholders want insights—not raw data—and how data teams can focus on outcomes over technology. By understanding the organizational and technical challenges in turning data into useful knowledge, and fostering better collaboration and feedback, data engineers can truly move the needle.
2024-12-23
Not every data request needs a fire alarm. This post cuts through the hype around “real-time” reporting—clarifying what it is, what it isn’t, and how to deliver fresh, actionable insights without burning down your team or your budget.
2024-11-27
Traditional ETL pipelines are slow, brittle, and expensive—leaving data teams stuck serving yesterday’s leftovers. The Zero ETL movement flips the script by bringing data directly from source to analytics in real time, cutting out unnecessary prep and manual overhead. This post explains what Zero ETL is (and isn’t), why now is the moment for change, and how data teams can deliver fresher, faster, and more reliable insights using modern tools and automation. Discover how a farm-to-table approach is revolutionizing data engineering.