Maximum recommended explanatory variables

Modified on Sun, 15 Nov 2020 at 04:53 AM

Why is the number of variables limited in multivariate analysis? 


The number of explanatory variables you can add to a model is limited: for the multivariate analysis to be performed properly, you need at least 10 subjects for each numeric variable and for each of the n-1 modalities of a categorical variable with n modalities. Otherwise, the model may not reflect reality properly, so its coefficients will be unreliable or, worse, the model may be unable to estimate the coefficients at all because of a mathematical convergence problem.


How is the number of variables I can use in my regression model counted?


To know how many modalities/variables you can add to your model, take the count displayed under the variable you chose to explain. This count is the number of observations if the variable to explain is numeric, or the number of observations in the least frequent modality if the variable to explain is binary. Divide this number by 10: the result is the maximum recommended number of explanatory variables.
To count the number of modalities/variables you are adding to your model: each numeric variable counts for 1, each binary variable counts for 1, and each variable with k modalities counts for k-1.
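
The two rules above can be sketched in a few lines of Python. This is only an illustration, not EasyMedStat's implementation: the function names are hypothetical, and rounding down when the count is not a multiple of 10 is an assumption, since the rule itself only says to divide by 10.

```python
def max_recommended_variables(outcome_counts, subjects_per_variable=10):
    """Return the maximum recommended number of explanatory variables.

    outcome_counts is either an int (numeric variable to explain: total
    number of observations) or a list of ints (binary variable to explain:
    number of observations in each modality).
    """
    if isinstance(outcome_counts, int):
        limiting_count = outcome_counts
    else:
        # Binary outcome: the least frequent modality sets the limit.
        limiting_count = min(outcome_counts)
    # Assumption: round down when the count is not a multiple of 10.
    return limiting_count // subjects_per_variable


def variable_cost(n_modalities=None):
    """Return how much one explanatory variable counts for.

    A numeric variable (n_modalities=None) counts for 1; a categorical
    variable with k modalities counts for k-1, so a binary one counts for 1.
    """
    if n_modalities is None:
        return 1
    return n_modalities - 1


print(max_recommended_variables(60))         # numeric outcome, 60 observations
print(max_recommended_variables([70, 100]))  # binary outcome, 70 and 100 observations
print(variable_cost(4))                      # categorical variable with 4 modalities
```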


Example: 

1) The variable you want to explain is the Constant Score, which is a numeric variable, and you have 60 observations: you can use 60/10=6 explanatory variables with fewer than 3 modalities (that is, numeric or binary variables) in your model.

2) The variable you want to explain is the Smoking Status, which is a binary variable, and you have 70 observations in the modality "Smoker" and 100 in the modality "Non-smoker". You can use 70/10=7 explanatory variables with 2 or fewer modalities (numeric or binary variables) in your model.

3) Taking the second example: you can use up to 7 explanatory variables in your model. You can safely add the variables "age" (numeric), "sex" (man, woman), "cancer status" (no, yes, metastatic) and "professional status" ("retired", "manager", "employee", "unemployed"), which count for 1 + (2-1) + (3-1) + (4-1) = 1+1+2+3 = 7.

It is not recommended to replace the 4-modality "professional status" variable with a 5-modality version: it would count for 5-1=4, bringing the total to 1+1+2+4=8, which exceeds the 7 explanatory variables recommended.
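
The arithmetic of the third example can be checked with a short Python sketch. The dictionary layout and the `total_cost` helper are hypothetical names chosen for this illustration; only the counts and modalities come from the example.

```python
# Observations per modality of the variable to explain (Smoking Status).
outcome_counts = {"Smoker": 70, "Non-smoker": 100}

# Maximum recommended number of explanatory variables: least frequent modality / 10.
budget = min(outcome_counts.values()) // 10

# Number of modalities of each explanatory variable; None marks a numeric variable.
candidates = {
    "age": None,
    "sex": 2,
    "cancer status": 3,
    "professional status": 4,
}


def total_cost(variables):
    """Sum the cost of each variable: 1 if numeric, k-1 for k modalities."""
    return sum(1 if k is None else k - 1 for k in variables.values())


# Same set of variables, but with a 5-modality professional status.
with_five_modalities = {**candidates, "professional status": 5}

for label, variables in [("4 modalities", candidates),
                         ("5 modalities", with_five_modalities)]:
    cost = total_cost(variables)
    verdict = "within" if cost <= budget else "exceeds"
    print(f"professional status with {label}: {cost} of {budget} ({verdict} the limit)")
```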


Can I use a variable with a large number of modalities as an explanatory variable?

If you really want to use a categorical variable that has too many modalities for the model to be safe, it is recommended to group modalities together to reduce their number until the variable fits within the limit.
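
As an illustration of grouping, the Python sketch below merges several modalities of a made-up "professional status" variable into one. The data, the grouping chosen, and the `group_modalities` helper are all hypothetical; which modalities it makes sense to merge depends on your study.

```python
def group_modalities(values, grouping):
    """Replace each value by its group label; values not listed are kept as-is."""
    return [grouping.get(value, value) for value in values]


# Hypothetical raw data with 5 modalities.
status = ["retired", "manager", "employee", "unemployed", "student", "employee"]

# Hypothetical grouping: three modalities are merged into "not working".
grouping = {
    "retired": "not working",
    "unemployed": "not working",
    "student": "not working",
}

grouped = group_modalities(status, grouping)

# A variable with k modalities counts for k-1 explanatory variables.
cost_before = len(set(status)) - 1
cost_after = len(set(grouped)) - 1
print(f"counts for {cost_before} before grouping, {cost_after} after")
```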


How do I use it in EasyMedStat?


  1. Go to Multivariate analysis.
  2. Choose a variable to explain.
  3. Choose your explanatory variables, using the progress bar as a guide.

