A review of the things I have learned from building in the open over the past year. Thoughts and reflections on what it takes to grow a project and the difficulty of translating open-source success into commercial success.
Translating responsible AI principles to create VerifyML. User feedback, design decisions and architecture choices in creating our responsible AI solution.
Fairness is messy and complicated. Attempts to distil it down to a single metric are unhelpful and counter-productive. As business owners and model developers, we should embrace the struggle of applying fairness in artificial intelligence and data analytics models.
An exploration of markdown and HTML syntax trees. Documenting my experience creating rehype-prism-plus, a syntax highlighting plugin that creates pretty code blocks.
A good project is only one part of the puzzle. Getting stars is really all about marketing and promoting it. A guide on growth hacking a GitHub project.
An explanation of the challenges of graph anonymisation and the difficulty of striking a balance between usefulness and anonymity. Written as a response to Singapore's TraceTogether privacy saga
Looking for a performant, out-of-the-box template with all the best in web technology to support your blogging needs? Check out the Tailwind Nextjs Starter Blog template
Learn Julia by implementing Schelling's famous segregation model. You will see many similarities to Python (Julia is a dynamic language, so no types need to be specified) and pick up some of Julia's nice syntactic properties along the way.
A revised benchmark of graph / network computation packages featuring an updated methodology and more comprehensive testing. Find out how NetworkX, igraph, graph-tool, NetworKit, SNAP and LightGraphs perform
The serverless way: using Google Cloud Platform to deploy simple machine learning models via Cloud Run. A fun weekend project that analyses the Twitter-verse
In this post, I explore the problem of simplifying route intersections and document some Python code that can be used to clean and visualize OpenStreetMap data as a network representation
Part II in the network exploration of the Game of Thrones series. In this post, we combine the plots and use gganimate to visualise relationships across all 5 books
Extending PySpark's native MLlib feature selection function by using a feature importance score generated from a machine learning model to extract the variables that are plausibly the most important
This post is the first in a series of my study notes on regression techniques. It covers regression as a solution to the least squares minimisation problem
I find a positive correlation between the foreign-born and consumption shares within U.S. counties but this result does not hold across Asian countries. In fact, an increase in foreign-born share led to a decline in consumption of Asian-related consumer packaged goods
In the previous post, I provided an exploratory analysis of the allrecipe dataset. This post is a continuation and details the construction of product weights from the recipe corpus
Over the past week I made a few detours and explored other options that yielded little. On the positive side, I managed to merge and clean most of the datasets and started generating some descriptive statistics to get a better understanding of the data
I decided to document my progress on my master's thesis as a weekly Thursday special. Hopefully I will have enough material or progress to continue the weekly posts, and this should also give me some motivation to work on it