Matthew Powers · Speed up a pandas query 10x with these 6 Dask DataFrame tricks · This post demonstrates how to speed up a pandas query to run 10 times faster with Dask using six performance optimizations. You’ll often… · 6 min read · Feb 22, 2022
Matthew Powers · Creating a Java Spark project with Maven and junit · This blog post shows how to organize a Spark Java project, write some application code and run a simple test. · 3 min read · Jul 4, 2019
Matthew Powers · Dependency Injection with Spark · Dependency injection is a great design pattern for writing code that’s more flexible and easier to test. · 3 min read · Jun 25, 2019
Matthew Powers · Speaking Slack Notifications from Spark · The spark-slack library can be used to speak notifications to Slack from your Spark programs and handle Slack Slash command responses. · 2 min read · Apr 3, 2018
Matthew Powers · Documenting Spark Code with Scaladoc · You can use Scaladoc to generate nicely formatted documentation for your Spark projects, just like the official Spark documentation. · 5 min read · Feb 18, 2018
Matthew Powers · The different type of Spark functions (custom transformations, column functions, UDFs) · Spark code can be organized in custom transformations, column functions, or user defined functions (UDFs). · 3 min read · Jan 21, 2018
Matthew Powers · Adding StructType columns to Spark DataFrames · StructType objects define the schema of Spark DataFrames. StructType objects contain a list of StructField objects that define the name… · 3 min read · Jan 16, 2018
Matthew Powers · Adding ArrayType columns to Spark DataFrames with concat_ws and split · The concat_ws and split Spark SQL functions can be used to add ArrayType columns to DataFrames. · 2 min read · Jan 15, 2018
Matthew Powers · How to write Spark ETL Processes · Spark is a powerful tool for extracting data, running transformations, and loading the results into a data store. · 3 min read · Jan 5, 2018
Matthew Powers · Spark User Defined Functions (UDFs) · Spark lets you define custom SQL functions called user defined functions (UDFs). UDFs are great when built-in SQL functions aren’t… · 3 min read · Dec 27, 2017