
python.funs

A handy list of functions that I've developed over time to make my work easier. Details on what it contains are below.

Clustering

This folder holds any clustering functions or solutions I come up with. It currently only has what's mentioned below.

KPOD clustering

KPOD is a method for running K-Means on data with many missing values. The official Python package is here, with a CRAN package by a different author here. Both of these are based on the original paper here.
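
To give a feel for the idea, here is a minimal, dependency-free sketch of a k-POD style loop. It is not the code in this folder (which relies on scikit-learn), and the function names `kpod` and `sq_dist` are made up for illustration. It assumes missing values are encoded as NaN, every column has at least one observed value, and it uses a simple farthest-point initialization. It also simplifies the procedure by doing one K-Means assignment/update step per refill instead of a full K-Means run.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kpod(rows, k, max_iter=50):
    """Cluster rows (lists of floats, NaN = missing).

    Returns (labels, centroids, filled_rows).
    """
    n, d = len(rows), len(rows[0])
    missing = [[math.isnan(v) for v in row] for row in rows]

    # Initial fill: replace each missing entry with its column mean
    # (assumes each column has at least one observed value).
    col_means = []
    for j in range(d):
        observed = [rows[i][j] for i in range(n) if not missing[i][j]]
        col_means.append(sum(observed) / len(observed))
    filled = [
        [col_means[j] if missing[i][j] else rows[i][j] for j in range(d)]
        for i in range(n)
    ]

    # Farthest-point initialization (an assumption made for determinism).
    centroids = [list(filled[0])]
    while len(centroids) < k:
        far = max(filled, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every (filled) row.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(filled[i], centroids[c]))
            for i in range(n)
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [filled[i] for i in range(n) if new_labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # k-POD step: overwrite missing entries with the centroid's value.
        for i in range(n):
            for j in range(d):
                if missing[i][j]:
                    filled[i][j] = centroids[new_labels[i]][j]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids, filled


nan = float("nan")
data = [[0.0, 0.1], [0.2, nan], [0.1, 0.0], [10.0, 10.1], [nan, 9.9], [10.2, 10.0]]
labels, centroids, filled = kpod(data, k=2)
```

Only the entries that were missing are ever overwritten. Observed values stay untouched, which is what lets the clustering work without a separate imputation pass.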

What have I changed from the original implementation?

  1. Created a single class instead of multiple functions, since most of the code relies on scikit-learn.
  2. Added a method to obtain the "best" k value based on the silhouette score.
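
As a rough illustration of picking k by silhouette score, the sketch below computes the mean silhouette coefficient from scratch and keeps the k whose labelling scores highest. The names `silhouette` and `best_k` are hypothetical and are not the method names in the class. In practice scikit-learn's `sklearn.metrics.silhouette_score` does the same job on complete data.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient for one labelling (needs >= 2 clusters)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in idx) / len(idx)
            for lab, idx in clusters.items()
            if lab != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, labelings):
    """labelings maps k -> labels; return the k with the highest silhouette."""
    return max(labelings, key=lambda k: silhouette(points, labelings[k]))


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labelings = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
chosen = best_k(points, labelings)
```

Here the labellings are passed in ready-made to keep the example short. In a real run each one would come from fitting the clustering with that value of k.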

Fuzzy joins in Python

Python lacks broad string-based fuzzy matching support, unlike R with its stringdist package. If you have flexibility in the tool you want to use, please, for the love of God, use R in this instance. If you absolutely have to use Python, please head over to my repository for a fast and efficient solution that uses tf-idf values, with documentation on where it came from.
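
For a sense of how tf-idf matching works, here is a small standard-library sketch. It is not the code from the linked repository. The function names, the character trigram choice, the smoothed idf formula and the 0.5 threshold are all assumptions made for illustration. Each string is split into character trigrams, weighted by tf-idf over both lists combined, and matched to its most similar counterpart by cosine similarity.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Character n-grams of the lower-cased, space-padded string."""
    text = f" {text.lower().strip()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(strings, n=3):
    """One L2-normalised sparse tf-idf vector (dict) per string."""
    counts = [Counter(ngrams(s, n)) for s in strings]
    df = Counter(g for c in counts for g in c)  # document frequency per n-gram
    total = len(strings)
    vectors = []
    for c in counts:
        # Smoothed idf: log((1 + N) / (1 + df)) + 1
        vec = {g: tf * (math.log((1 + total) / (1 + df[g])) + 1) for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def fuzzy_join(left, right, threshold=0.5, n=3):
    """For each left string, return (left, best right match, score) if score >= threshold."""
    vectors = tfidf_vectors(list(left) + list(right), n)
    left_vecs, right_vecs = vectors[:len(left)], vectors[len(left):]
    matches = []
    for name, lv in zip(left, left_vecs):
        # Cosine similarity is a plain dot product because vectors are normalised.
        scores = [sum(w * rv.get(g, 0.0) for g, w in lv.items()) for rv in right_vecs]
        best = max(range(len(right)), key=scores.__getitem__)
        if scores[best] >= threshold:
            matches.append((name, right[best], round(scores[best], 3)))
    return matches


left = ["Acme Corp", "Globex Inc"]
right = ["ACME Corporation", "Globex Incorporated", "Initech"]
matches = fuzzy_join(left, right)
```

This loop compares every left string against every right string, so it is only meant for small lists. For larger data, scikit-learn's `TfidfVectorizer` with `analyzer="char"` plus a sparse matrix product is the usual route.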
