july 10

today's checklist

  • maru hiragana

  • finish datacamp course 30, chapter 4

  • finish datacamp course 32, chapter 1

  • finish datacamp course 32, chapter 2

more datacamp

here are my notes on the three chapters of datacamp courses i did today (the final chapter of experimental design in python, and the first two chapters of supervised learning with scikit-learn):

chapter 4: advanced insights from experimental complexity

  • addressing complexities in experimental data
    • heteroscedasticity - when the variability of one variable changes across the range of another variable (see the sketch at the end of this chapter's notes)
  • covariate adjustment in experimental design
    • covariates - variables that are related to the outcome variable and can influence its analysis
    • adjusting for covariates can help in reducing confounding
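
a minimal sketch (my own, not from the course) of how heteroscedasticity can be detected with statsmodels' breusch-pagan test, on simulated data where the noise widens as x grows:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(42)

# simulate data whose noise grows with x (heteroscedasticity)
x = rng.uniform(1, 10, size=500)
y = 2.0 * x + rng.normal(scale=0.5 * x)

# fit ordinary least squares, then test the residuals
X = sm.add_constant(x)
residuals = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(residuals, X)

# a small p-value suggests the residual variance is not constant
print(f"breusch-pagan p-value: {lm_pvalue:.4g}")
```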

chapter 1: classification

  • machine learning with scikit-learn
    • machine learning - the process in which computers learn to make decisions from data without being explicitly programmed
    • unsupervised learning - the uncovering of hidden patterns from unlabelled data
    • supervised learning - a type of machine learning in which the true values are known during training, and the model is built to accurately predict the values of previously unseen data
  • the classification challenge
    • the 4 steps of classifying labels of unseen data:
      • build a model
      • have the model learn from labelled data we pass to it
      • pass unlabelled data to the model as input
      • the model predicts the labels of the unseen data
    • labelled data = training data
    • k-nearest neighbours algorithm - predicts the label of a data point by looking at the k closest labelled data points and taking a majority vote (see the sketch below)
    • measuring model performance
      • accuracy - correct predictions / total observations
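
a small sketch of the four classification steps with scikit-learn's KNeighborsClassifier, scored with accuracy at the end (the built-in iris data and k=6 are my choices, not the course's):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# labelled data: features X and known labels y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)

# build the model, then have it learn from the labelled training data
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)

# pass unseen data as input; the model predicts its labels
y_pred = knn.predict(X_test)

# accuracy = correct predictions / total observations
print(accuracy_score(y_test, y_pred))  # same as knn.score(X_test, y_test)
```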

chapter 2: regression

  • introduction to regression
    • regression is another type of supervised learning, in which the target variable typically has continuous values
  • the basics of linear regression
    • y=ax+b
      • where y is the target
      • x is the single feature in simple linear regression
      • a and b are parameters, or the coefficients of the model (slope and intercept)
    • when adding more features: y = a1x1 + a2x2 + … + anxn + b (see the sketch below)
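
a minimal sketch of fitting y = a1x1 + a2x2 + b with scikit-learn's LinearRegression (the toy numbers below are made up so that a1=2, a2=3, b=0):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# toy data generated from y = 2*x1 + 3*x2 (so a1=2, a2=3, b=0)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([8.0, 7.0, 18.0, 17.0, 25.0])

reg = LinearRegression()
reg.fit(X, y)

print(reg.coef_)                  # the learned slopes a1, a2
print(reg.intercept_)             # the learned intercept b
print(reg.predict([[6.0, 6.0]]))  # prediction for unseen features
```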
  • cross-validation
    • steps of k-fold cross-validation (see the sketch below):
      • split dataset into k groups / folds
      • set aside first fold as test set
      • fit model on remaining folds
      • predict on test set
      • compute metric of interest
      • repeat with the next folds
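
a sketch of k-fold cross-validation in scikit-learn; cross_val_score runs the whole split / fit / predict / score loop for you (5 folds and the built-in diabetes data are my choices for illustration):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# 5 folds: each fold takes a turn as the test set
kf = KFold(n_splits=5, shuffle=True, random_state=5)
reg = LinearRegression()

# one score per fold (default regression metric: r-squared)
scores = cross_val_score(reg, X, y, cv=kf)
print(scores)
print(scores.mean())  # average performance across folds
```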
  • regularised regression
    • regularisation is used to avoid overfitting in regression
    • penalises large coefficients (see the sketch below)
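
a sketch of two regularised regressions in scikit-learn: ridge (penalises the squared size of coefficients) and lasso (penalises absolute size, and can shrink some coefficients to exactly zero); alpha controls the penalty strength, and alpha=1.0 here is just a placeholder:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# larger alpha = stronger penalty on large coefficients
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=1.0).fit(X_train, y_train)

print(ridge.score(X_test, y_test))  # r-squared on held-out data
print(lasso.score(X_test, y_test))
print(lasso.coef_)  # lasso can zero out some coefficients entirely
```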