To do survival analysis in R, you need two variables: a yes/no (1/0) flag that says whether or not the person had the event, and a time variable that says how long it took for them to have the event if they had it, or how long they were followed up if they did not.
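As a minimal sketch of those two variables, assume a small made-up follow-up table. The data frame `followup` and its columns `days_to_mi` and `days_followed` are hypothetical names chosen for illustration, not the post's dataset.

```r
# Hypothetical follow-up data; every name and value here is made up.
followup <- data.frame(
  id            = 1:4,
  days_to_mi    = c(120, NA, 45, NA),   # NA means no event was observed
  days_followed = c(365, 365, 200, 90)  # total days each person was followed
)

# Event flag: 1 if the person had the event, 0 if not.
followup$event <- ifelse(is.na(followup$days_to_mi), 0, 1)

# Time variable: days to the event if it happened, otherwise days followed.
followup$time <- ifelse(followup$event == 1,
                        followup$days_to_mi,
                        followup$days_followed)

print(followup)
```

A flag and a time column like these are the usual inputs to `Surv()` in the survival package.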
Here is a sampling of ways to make variables in R conditional on the values of other variables. We will continue using our dataset of fake myocardial infarction (MI) patients who survived but are at high risk for another MI. Here is what I mean to illustrate with each example: I cheat by doing this
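For a rough idea of what conditional variable creation can look like, here are three common base R approaches. This is only a sketch: the data frame `patients`, its columns, and the cutoffs are assumptions made up for illustration, and these may not be the same approaches the post covers.

```r
# Hypothetical patients; names, values, and cutoffs are made up.
patients <- data.frame(
  age = c(48, 63, 71, 55),
  sbp = c(118, 142, 160, 135)
)

# 1. ifelse(): a 1/0 flag from a single condition.
patients$older <- ifelse(patients$age >= 65, 1, 0)

# 2. Logical indexing: set a default, then overwrite the rows that qualify.
patients$bp_group <- "normal"
patients$bp_group[patients$sbp >= 140] <- "high"

# 3. cut(): several ordered categories from a numeric variable.
#    right = FALSE makes each interval include its lower bound.
patients$age_band <- cut(patients$age,
                         breaks = c(0, 50, 65, Inf),
                         right  = FALSE,
                         labels = c("<50", "50-64", "65+"))

print(patients)
```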
Here, we will use a fake dataset (see this post for how to read in the fake dataset). These are fake people who had a myocardial infarction (MI), and we are worried they are at high risk of having another one. Therefore, we want to calculate their risk score. Notice the term (Age/10)^2 is suspiciously ugly.
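One way to tame a term like (Age/10)^2 is to compute it once as its own column, so the rest of the score formula can refer to it by name. The sketch below assumes a hypothetical data frame `mi` with an `age` column. It shows only that one term, because the full risk score formula is not given here.

```r
# Hypothetical ages; the data frame and column names are made up.
mi <- data.frame(age = c(50, 60, 70))

# Compute the (Age/10)^2 term once and store it in its own column.
mi$age10_sq <- (mi$age / 10)^2

print(mi)
```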
Before you read any data in, you need to set up the directories that you will use for this R project. I make three directories: In R, there are so many formats that it doesn’t just assume you’ve read in a table the way SAS and SPSS do. R thinks what you read in could be
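A minimal sketch of that setup follows. The three directory names (`data`, `code`, `output`), the project folder `mi_project`, and the file `fake_mi.csv` are all hypothetical, since the post's actual names are not shown here. The sketch works under `tempdir()` so it does not touch your real files.

```r
# Hypothetical project root; a temporary folder keeps this sketch harmless.
project_root <- file.path(tempdir(), "mi_project")

# Three made-up directory names; substitute your own.
subdirs <- c("data", "code", "output")
for (d in subdirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Write a tiny fake CSV so there is something to read back in.
csv_path <- file.path(project_root, "data", "fake_mi.csv")
write.csv(data.frame(id = 1:3, age = c(50, 60, 70)), csv_path, row.names = FALSE)

# read.csv() returns a data frame; class() and str() show what R thinks it has.
mi <- read.csv(csv_path)
print(class(mi))
str(mi)
```

Checking `class()` straight after reading a file is a quick way to confirm R is treating the data as a table.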