Jun 18, 2018
Thanks for the comment.
The spark-stringmetric library has Spark implementations of common similarity functions (it’s a Scala library though…).
Check out this test file for these functions in action.
You could probably create a Spark UDF that wraps the fuzzywuzzy functions you need. You should also be able to solve your problem with native Spark, but it’ll get a little complicated because you’ll need to broadcast variables. Good luck!
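For what it's worth, here is a minimal sketch of the UDF idea in PySpark. It assumes fuzzywuzzy's `fuzz.ratio` is the function you want; the helper names `similarity` and `make_similarity_udf` are made up for illustration, and the difflib fallback is only a stand-in so the snippet still runs where fuzzywuzzy isn't installed.

```python
from difflib import SequenceMatcher

try:
    from fuzzywuzzy import fuzz  # third-party: pip install fuzzywuzzy
except ImportError:
    fuzz = None  # fall back to the standard library below


def similarity(a, b):
    """Return a 0-100 similarity score, or None if either input is null."""
    if a is None or b is None:
        return None
    if fuzz is not None:
        return int(fuzz.ratio(a, b))
    # Stand-in when fuzzywuzzy is missing: difflib's ratio scaled to 0-100.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def make_similarity_udf():
    """Wrap similarity() as a Spark UDF (pyspark is imported lazily)."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    return udf(similarity, IntegerType())
```

You would then call something like `df.withColumn("score", make_similarity_udf()(df["name1"], df["name2"]))`, where `name1` and `name2` are placeholder column names. Keep in mind that Python UDFs run on the executors, so fuzzywuzzy needs to be installed on every worker node, and they are usually slower than native Spark functions.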