june 5

today, i predicted variables, wrote 4 blogs, and still didn't design those blind boxes

today's checklist

  • datacamp course: introduction to regression with statsmodels in python

  • catch up with blogs: 29th, 30th, 31st may and yesterday (jun 4)

  • design blind box for christmas

  • digitalise fire on marz main 9 character profiles

  • language dailies

everything but art

ohayou! bonjour! guten morgen!

started the day earlier than usual - 09.09!

managed to finish my language dailies in less than half an hour.

french: elementary a2 - 1.7 developing fluency

  • J'ai fait le ménage et après, j'ai regardé la télé. - I did some housework and then I watched TV.
  • Je suis restée à la maison pendant le week-end. - I stayed home over the weekend.
  • Je suis sortie avec des amis toute la soirée. - I went out with friends all evening.

german: elementary a2 - 1.5 developing fluency

  • Das Wetter war schön. Ich war im Park. - The weather was nice. I was in the park.
  • Ich hatte Besuch. Meine Schwester war hier. - I had a visitor. My sister was here.
  • Ich war am Wochenende im Park. - I was in the park at the weekend.
  • Ich war kaputt. Ich war zu Hause. - I was knackered. I was at home.

regression? like, in orv?

i finished the datacamp course 'introduction to regression with statsmodels in python' just in time for lunch (12.27). here are my notes for the course:

chapter 1: simple linear regression modeling

  • regression
    • statistical models that explore the relationship between a response variable and explanatory variables
    • given values of the explanatory variables, you can predict values of the response variable
  • response variable
    • y variable
    • dependent variable
    • the variable you want to predict
  • explanatory variable
    • x variable
    • independent variable
    • variables that explain how the response variable will change
  • linear regression
    • when the response variable is numeric
  • logistic regression
    • when the response variable is logical (either True or False)
  • simple linear / logistic regression
    • only one explanatory variable
  • a scatterplot can be used to visualise pairs of variables
  • python packages for regression
    • statsmodels - for insight (see the fitting sketch after this list)
    • scikit-learn - for prediction
  • a histogram can be used to visualise the relationship between numerical and categorical variables
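
to make the statsmodels bit concrete, here's a minimal fitting sketch. the dataframe and column names (df, x, y) are made up for illustration, not from the course's dataset:

import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# made-up example data: one explanatory variable (x) and one numeric response (y)
rng = np.random.default_rng(42)
df = pd.DataFrame({"x": np.arange(10.0)})
df["y"] = 2.0 * df["x"] + 1.0 + rng.normal(0, 1, len(df))

# fit a simple linear regression: response ~ explanatory
mdl = ols("y ~ x", data=df).fit()

# intercept and slope
print(mdl.params)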

chapter 2: predictions and model objects

  • the predicting question: if i set the explanatory variables to these values, what value would the response variable have? (see the sketch after this list)
  • extrapolating
    • making predictions outside the range of observed data
  • fitted values (.fittedvalues attribute) - predictions on original dataset
  • residuals (.resid attribute) - actual response values minus predicted response values (how much the model missed by)
  • response value = fitted value + residual
  • regression to the mean
    • residuals exist due to both problems with the model and fundamental randomness
    • extreme cases are often due to randomness
    • extreme cases don’t persist over time - will eventually look like average cases
  • to fit a linear regression model, may need to transform the explanatory or response variable if they do not give a straight line
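
and a quick sketch of the prediction workflow, reusing the made-up df and mdl from the chapter 1 snippet:

# explanatory values to predict for (values past x = 9 would be extrapolating)
explanatory_data = pd.DataFrame({"x": np.arange(5.0, 15.0)})
predictions = mdl.predict(explanatory_data)
print(predictions)

# fitted values and residuals on the original dataset
fitted = mdl.fittedvalues   # predictions on the original data
residuals = mdl.resid       # actual minus predicted
# sanity check: response value = fitted value + residual
assert np.allclose(df["y"], fitted + residuals)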

chapter 3: assessing model fit

  • coefficient of determination (r-squared or R-squared)
    • the proportion of the variance in the response variable that is predictable from the explanatory variable
    • 1 is a perfect fit
    • 0 means worst possible fit
    • correlation squared - for simple linear regression
  • residual standard error
    • roughly a measure of the typical size of the residuals - how wrong the predictions typically are (computed in the sketch after this list)
  • leverage - a measure of how extreme the explanatory variable values are
  • influence - measures how much the model would change if you left that observation out of the dataset when modeling
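
here are the chapter 3 metrics on the same made-up model, as a sketch (the summary_frame column names are from memory, so worth double-checking):

# coefficient of determination
print(mdl.rsquared)

# residual standard error: square root of the mean squared residual
rse = np.sqrt(mdl.mse_resid)
print(rse)

# leverage and influence diagnostics
summary_info = mdl.get_influence().summary_frame()
print(summary_info["hat_diag"])   # leverage
print(summary_info["cooks_d"])    # influence (cook's distance)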

chapter 4: simple logistic regression modeling

  • logistic regression
    • type of generalised linear model
    • used when the response variable is logical (see the sketch after this list)
    • follows s-shaped curve
  • odds ratio - the probability of something happening divided by the probability that it doesn't
    • odds_ratio = probability / (1 - probability)
  • four outcomes to a logical response variable
    • predicted false, actual false - correct
    • predicted false, actual true - false negative
    • predicted true, actual false - false positive
    • predicted true, actual true - correct
    • a confusion matrix - the counts of each outcome
  • accuracy of model
    • the proportion of correct predictions
    • accuracy = (tn + tp) / (tn + fn + fp + tp)
  • sensitivity of model
    • the proportion of actual positives that were correctly predicted (true positive rate)
    • sensitivity = tp / (tp + fn)
  • specificity of model
    • the proportion of actual negatives that were correctly predicted (true negative rate)
    • specificity = tn / (tn + fp)
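
and a logistic regression sketch to wrap up. the data here (visits, bought) is made up, so the numbers mean nothing - it just shows the logit() call and the confusion-matrix maths:

from statsmodels.formula.api import logit

# made-up example data: a logical response (bought) and one explanatory variable (visits)
purchases = pd.DataFrame({
    "visits": [1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10],
    "bought": [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1],
})

# fit a simple logistic regression (s-shaped curve for a logical response)
mdl_logit = logit("bought ~ visits", data=purchases).fit()

# confusion matrix: rows are actual, columns are predicted
tn, fp = mdl_logit.pred_table()[0]
fn, tp = mdl_logit.pred_table()[1]

accuracy = (tn + tp) / (tn + fn + fp + tp)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(accuracy, sensitivity, specificity)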

blog checkpoints

today's catchup for blog posts was for the 29th, 30th, and 31st may, as well as yesterday's blog, the 4th june. here are the times i was able to finish them, starting from about 16.30:

  • 17.19 finished 29th may blog page
  • 18.11 finished 30th may blog page
  • 18.32 finished 31st may blog page
  • 19.09 finished yesterday's blog page!!