Matthew Powers – Medium

Speed up a pandas query 10x with these 6 Dask DataFrame tricks

This post demonstrates how to speed up a pandas query to run 10 times faster with Dask using six performance optimizations. You’ll often…

Feb 22, 2022

Creating a Java Spark project with Maven and junit

This blog post shows how to organize a Spark Java project, write some application code and run a simple test.

Jul 4, 2019

Dependency Injection with Spark

Dependency injection is a great design pattern to write code that’s more flexible and easier to test.

Jun 25, 2019

Speaking Slack Notifications from Spark

The spark-slack library can be used to speak notifications to Slack from your Spark programs and handle Slack Slash command responses.

Apr 3, 2018

Documenting Spark Code with Scaladoc

You can use Scaladoc to generate nicely formatted documentation for your Spark projects, just like the official Spark documentation.

Feb 18, 2018

The different types of Spark functions (custom transformations, column functions, UDFs)

Spark code can be organized in custom transformations, column functions, or user defined functions (UDFs).

Jan 21, 2018

Adding StructType columns to Spark DataFrames

StructType objects define the schema of Spark DataFrames. StructType objects contain a list of StructField objects that define the name…

Jan 16, 2018

Adding ArrayType columns to Spark DataFrames with concat_ws and split

The concat_ws and split Spark SQL functions can be used to add ArrayType columns to DataFrames.

Jan 15, 2018

How to write Spark ETL Processes

Spark is a powerful tool for extracting data, running transformations, and loading the results in a data store.

Jan 5, 2018

Spark User Defined Functions (UDFs)

Spark lets you define custom SQL functions called user defined functions (UDFs). UDFs are great when built-in SQL functions aren’t…

Dec 27, 2017

Matthew Powers

Spark coder, live in Colombia / Brazil / US, love Scala / Python / Ruby, working on empowering Latinos and Latinas in tech
