Machine Learning Takes on Heart Disease Risk

Machine learning is a process in which computers analyse past data in order to predict outcomes from future data. In the modern world of machine learning these outcomes can be almost anything, from what type of book or CD you are likely to buy to the result of a sporting event such as a horse race. Many different machine learning methods have been devised, each with strengths that make it better suited to particular types of problem. So a neural network may perform better than, say, a random forest on one kind of problem, but worse on another. The risk calculators on the web that you plug your data into are based on simpler models, which assume a linear relationship between the factors (e.g. LDL, blood pressure). Machine learning algorithms can dig deeper, so to speak: among other things, they can work out which factors matter most and weight them accordingly.
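To make the "linear relationship" point concrete, here is a minimal sketch of an additive risk score of the kind a simple calculator might use. The weights, intercept and units are hypothetical, chosen only for illustration; real calculators use published coefficients and often more elaborate formulas. The point is that each factor adds a fixed amount to the log-odds, whatever the other factors are.

```python
import math

# HYPOTHETICAL weights for illustration only (not from any real calculator).
# Units assumed: age in years, LDL in mmol/L, systolic BP in mmHg, smoker 0/1.
WEIGHTS = {
    "age": 0.07,
    "ldl": 0.20,
    "systolic_bp": 0.02,
    "smoker": 0.60,
}
INTERCEPT = -9.0  # hypothetical baseline log-odds


def log_odds(patient: dict) -> float:
    """Add up each factor times its fixed weight: a purely linear score."""
    return INTERCEPT + sum(w * patient[name] for name, w in WEIGHTS.items())


def linear_risk(patient: dict) -> float:
    """Map the linear score onto a 0-1 risk with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(patient)))


if __name__ == "__main__":
    patient = {"age": 60, "ldl": 3.5, "systolic_bp": 140, "smoker": 1}
    print(round(linear_risk(patient), 3))
```

Because the score is additive, raising LDL by one unit shifts the log-odds by the same 0.2 for a 40-year-old as for a 70-year-old. A model trained on data, such as a random forest or a neural network, is free to learn that the same change matters more for some patients than for others.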

Such an approach was taken in a 10-year project that tracked people using 48 recorded factors. Four different algorithms were applied to the data. The data was split in two: 75% was used to find the relationships within it, which in machine learning parlance is called ‘training’ the model, and the remaining 25% was used to test how well the model could predict cardiovascular events such as heart attacks. The results were pretty good, outperforming conventional risk calculators. Here is a link to the report.

Averaging how the four methods ranked the different factors, we can see that LDL is well behind HDL, triglycerides and HbA1c as a risk factor. Here is an ordered table of those averages, which shows that age (not surprisingly) was the most impactful feature. Note that some factors lower the risk rather than raise it (e.g. women are significantly less likely to have an event). ‘Missing’ means that the value was absent from a patient’s record, and SES is socioeconomic status, measured by Townsend deprivation quintile.

Ethnicity: South Asian
SES: 2nd Townsend quintile
Ethnicity: Black/Afro-Caribbean
SES: 3rd Townsend quintile
SES: 4th Townsend quintile
HDL cholesterol*
Oral corticosteroid prescribed
HbA1c missing
Total cholesterol*
Systolic blood pressure*
Ethnicity: Other/Mixed
SES: 5th Townsend quintile (most deprived)
Atrial fibrillation
Family history of CHD < 60 years
SES: Unknown
AST/ALT ratio missing
Ethnicity: Chinese/East Asian
BMI missing
Ethnicity: Unknown
Serum creatinine
Immunosuppressant prescribed
gamma GT
Chronic kidney disease
Anti-psychotic drug prescribed
Severe mental illness
Rheumatoid arthritis
Blood pressure treatment*
LDL cholesterol
gamma GT missing
AST/ALT ratio
Serum creatinine missing
FEV1 missing
Serum fibrinogen
CRP missing
Serum fibrinogen missing
LDL cholesterol missing
Triglycerides missing
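The two steps described in the study, holding out 25% of patients for testing and averaging how the four methods ranked each factor, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code. The post doesn't say how the split was drawn, so a simple random split is assumed. It also doesn't say which accuracy measure was used; AUC (area under the ROC curve), a common choice for risk models, is used here: it is the probability that a patient who had an event was given a higher predicted risk than one who did not. Each model is assumed to report a signed importance per factor, and factors are ranked by magnitude so that protective factors such as being female still rank highly. The model names and numbers are invented.

```python
import random


def train_test_split(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and cut it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Share of (event, non-event) pairs where the event case scored higher; ties count half."""
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


def rank_features(importances):
    """Rank factors from 1 (largest absolute importance) downwards."""
    ordered = sorted(importances, key=lambda f: abs(importances[f]), reverse=True)
    return {factor: rank for rank, factor in enumerate(ordered, start=1)}


def average_ranking(per_model_importances):
    """Average each factor's rank across models; most important (lowest average) first.

    Assumes every model scores the same set of factors.
    """
    rankings = [rank_features(imp) for imp in per_model_importances.values()]
    factors = rankings[0].keys()
    averages = {f: sum(r[f] for r in rankings) / len(rankings) for f in factors}
    return sorted(averages.items(), key=lambda item: item[1])


# HYPOTHETICAL signed importances from four unnamed models (invented numbers).
per_model = {
    "model_a": {"age": 0.90, "female": -0.50, "hdl": -0.30, "ldl": 0.05},
    "model_b": {"age": 0.80, "female": -0.20, "hdl": -0.40, "ldl": 0.10},
    "model_c": {"age": 0.70, "female": -0.60, "hdl": -0.65, "ldl": 0.02},
    "model_d": {"age": 0.95, "female": -0.30, "hdl": -0.35, "ldl": 0.08},
}

if __name__ == "__main__":
    train, test = train_test_split(range(1000))
    print(len(train), len(test))  # 750 250
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    for factor, avg_rank in average_ranking(per_model):
        print(f"{factor}: {avg_rank:.2f}")  # age 1.00, hdl 2.25, female 2.75, ldl 4.00
```

Scoring only on the held-out quarter is what keeps the comparison honest: a model evaluated on its own training data tends to look better than it really is.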

