GMV-Optimizing My Wife’s Portfolio My wife caught the investing bug. She recently put together a portfolio of healthcare, consumer (cyclical and defensive), technology services,…
Product Focus & Recs Using Pareto, Apriori, & Association Process Data Let’s clean and process the UCI Machine Learning Online Retail data set so it’s ready to use later…
Detecting & Treating Price Outliers A few outliers can significantly skew our understanding of variables or influence a predictive model. Depending on the source and…
Cohort Metrics for a Fictitious Online Wine Shop Today, let’s construct customer time cohorts, specifically first purchase dates, and look at metrics like retention and revenue by cohort.…
Customer RFMT Metrics & Segmentation Today, we’ll construct recency, frequency, monetary value, and tenure (RFMT) metrics and segments using Brazilian ecommerce marketplace Olist’s sales transactions…
Comparing Filtering Methods on MoMA Artworks We regularly filter dataframes using boolean masking, loc, query, and numpy.where. I was curious how they actually compared. I came…
Peloton’s Implied Volatility & Delta Hedge Today, let’s dig into Peloton (ticker: PTON). The purpose of this post is to: Forecast PTON’s volatility at a point…
Ebay Kleinanzeigen Peppa Pig Scrape My daughter Danae’s birthday is on the horizon. Since she adores Peppa Pig, I scraped Ebay Kleinanzeigen (Germany’s eBay classifieds)…
Comparing groupby, pivot_table, and crosstab on Covid Smell Loss Data We routinely use groupby, pivot_table, and crosstab along with stack and unstack to pivot data by variables in order to…
Moderna & AbbVie: Dynamic Covariance Today, let’s look at dynamic covariance for biotech diversification and weighting, building on our previous post on GARCH models for…
Spotify: Dynamic Conditional Beta Spotify (ticker: SPOT), which went public cash flow-positive via a direct listing on April 3, 2018, is one of…
Cleaner, Clearer Code with Pipe I first came across pandas pipe last month at a Berlin Time Series Analysis meetup and in a TDS post.…
Gun Sales: Identifying Time Series Structural Change In a previous post, we explored and visualized population-adjusted gun sales, looking at many dimensions of time series data. We…
Cartoonize Your LinkedIn Photo In our post Fun With Numpy from a few months ago, we loaded a color photo, converted it to a…
NBA Team Dashboard It’s hard to believe the NBA 2021 season has started! We’ve created an NBA Team Dashboard that lets fans explore…
Imputing Time Series Missing Values Today, let’s see how different missing-value imputation methods stack up for various types of time series. It was inspired…
Berlin Covid Dashboard We’ve created an interactive dashboard that explores Berlin’s covid profile and evolution by district. It was born out of the…
Guns: Time Series Analysis and Forecast Today, we’ll do a deep-dive time series exploration and SARIMAX forecast using data on gun background checks in the U.S.…
Ode to Tegel On November 8, 2020, Tegel (TXL) saw its last flight take off, closing its doors after 47 years in operation…
FiveThirtyEight’s Pollster Ratings & Goodreads: Reshaping Wide to Long Data is commonly in wide format, with rows being unique individual observations (no repeated records) and columns being features for…
Astronaut Flight Records: Reshaping List-Like Data We often encounter messy list-like or string data that needs to be reshaped into separate rows for further analysis, including…
Poll Closing & Key Races by State It’s hard to believe that it’s finally Election Day. To curb results-watching anxiety, it would be great to have poll…
SpaceX & Wine: Reshaping Nested Data We often encounter nested data, including in json files or dataframes with nested data columns. Today, we’ll walk through a…
Creating an IPO Spider 350+ companies and counting have IPO’d on U.S. exchanges in 2020, the most since 2000, including big names like Palantir,…
’96 Sonics: 3 Regex Methods to Split Names Today, we’ll walk through 3 simple regex methods to split names into first and last names in a dataframe. We’ll…
Berlin Temperatures 2009-2019 Extreme temperature is an increasingly important topic to understand. As Berlin is my home, it would be great to visualize…
Going Bananas Today, we’ll predict (in-sample) and forecast (out-of-sample) banana prices using time series data from January 1, 1990 through July 1,…
Building a Value-Weighted Telehealth Index Telehealth – remote care delivery, patient monitoring, and education / engagement via a broad range of digital technologies such as…
Simulating Netflix Equity Price and Returns In the coming weeks, we’ll focus on some super-fun time-series data, including accessing and working with time-indexed data and building…
Berlin’s Airbnb Market & Venue Clusters by Neighborhood As one of Europe’s fastest-growing economies, Berlin has quickly grown into a tourism and residential real estate magnet. Imagine a…
Mapping, Segmenting, and Clustering Brooklyn’s Neighborhoods Imagine you’re a retailer, restaurant, bar, or salon looking for the best spot for a new location in Brooklyn. Or,…
Cancer Cell Samples: 4 Classification Models What is the likelihood that a patient’s cancer is malignant? Today, we’ll look into this question. We’ll optimize, train, make…
KNN, Decision Tree, SVM, and Logistic Regression Classifiers to Predict Loan Status Hi Crawbears, Today, we’ll look into the question: Will a new bank customer default on his or her loan? We’ll…
San Francisco Crime Interactive Choropleth, Marker, and Cluster Maps Last time, we created FiveThirtyEight-style visuals and choropleth maps after cleaning and wrangling data from an official UN data set…
U.S. Immigration FiveThirtyEight-Style Visuals and Choropleth Maps Hi Crawbears, Today, we’ll look into the question: How have U.S. immigration trends and distribution changed by country over the…
Multivariate Linear & Polynomial Models to Predict Home Price Hi Crawbears, Are you getting a fair value for your home? Let’s think about this question by working with a…
Predicting Whether You’ve Smoked 100 Cigarettes: a Marginal Logistic Model Hi Crawbears, Last time, we constructed linear models, including OLS, marginal, and multilevel, with the NHANES national health and nutrition…
Marginal & Multilevel Linear Models Hi Crawbears, In a previous post, we dug into the NHANES national health and nutrition data set to answer some…
Bayesian Simulation Hi Crawbears, In a previous post, we applied chained Bayesian logic to interpret a positive COVID-19 test result. We’ve also…
76 Cereal Brands, a Head-to-Head Comparison Hi Crawbears, A reader sent in a fun ask: Can you do an analysis comparing cereal brands? It’s hard not…
Uninsured Racial Disparity, Blood Pressure Gender & Smoker Diffs Hi Crawbears, In a previous post, I examined baby circadian clocks, specifically the difference in night bedtime and sleep duration…
Chained Bayesian: Interpreting a Positive COVID-19 Test Result Hi Crawbears, Bayesian logic can be used to compute probabilities in your daily life. It makes your subjective belief (prior…
Baby Circadian Clocks Parents of toddlers carefully scrutinize, guard, and manage when and for how long their babies nap during the day and…
Exploring & Visualizing Gapminder Hi Crawbears, In my previous analysis of the NBA 3 Pointer’s dramatic evolution, I walked through data subsetting, time series…
Simulating the Effects of Non-Representative Sampling and Sample Size What I find exhilarating about constructing Python simulations is that they can bring to life concepts and frameworks of how…
Detlef Schrempf in the 3 Point Era Following my analysis The NBA 3 Pointer’s Staggering Rise, a reader asked how Detlef Schrempf stacks up. As Schrempf is one…
The NBA 3-Pointer’s Staggering Rise Today’s NBA game is, in many ways, unrecognizable from its mid-90s self. One of the most prominent shifts has been…
Danae & Shaelo on Strawberry Summit Hi Crawbears, For Crawstat’s inaugural post, I built a fun, basic simulation with Python that you could try yourself. A…
Welcome to Crawstat! Hi Crawbears, Does data make your heart beat faster? Do you get a thrill out of building statistical models and…