Matthew Powers
1 min read · Jun 18, 2018


Thanks for the comment.

The spark-stringmetric library has Spark implementations of common similarity functions (it’s a Scala library though…).

Check out this test file for these functions in action.

You could probably create a Spark UDF that wraps the fuzzywuzzy functions you need. You should also be able to solve your problem with native Spark, but that route gets a little complicated because of the broadcast variables involved. Good luck!
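Here is a rough sketch of what the UDF approach might look like in PySpark. It assumes fuzzywuzzy is installed on every executor, and the helper names (`fuzzy_ratio`, `with_similarity`) and the `similarity` column are made up for illustration, so adapt them to your own DataFrame.

```python
def fuzzy_ratio(left, right):
    """Return fuzzywuzzy's 0-100 similarity score, or None for null input."""
    if left is None or right is None:
        return None
    # Imported inside the function so the import runs on the executors.
    # Assumption: fuzzywuzzy is installed on every worker node.
    from fuzzywuzzy import fuzz
    return fuzz.ratio(left, right)


def with_similarity(df, left_col, right_col, out_col="similarity"):
    """Append a column holding the fuzzy score of two string columns.

    Hypothetical helper, not part of Spark or fuzzywuzzy.
    """
    # Imported here so this snippet loads even without a Spark installation.
    from pyspark.sql import functions as F
    from pyspark.sql.types import IntegerType

    # Wrap the plain Python function as a Spark UDF returning an integer.
    ratio_udf = F.udf(fuzzy_ratio, IntegerType())
    return df.withColumn(out_col, ratio_udf(F.col(left_col), F.col(right_col)))
```

With a DataFrame `df` that has two string columns, say `name1` and `name2` (hypothetical names), `with_similarity(df, "name1", "name2")` would add the score column. Python UDFs run row by row outside the JVM, so expect this to be slower than Spark's built-in functions on large datasets.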


Written by Matthew Powers

Spark coder, live in Colombia / Brazil / US, love Scala / Python / Ruby, working on empowering Latinos and Latinas in tech
