# Maximum recommended explanatory variables

Modified on Sun, 15 Nov 2020 at 04:53 AM

# Why is the number of variables limited in multivariate analysis?

The number of explanatory variables you can add to a model is limited. For a multivariate analysis to be performed properly, you need at least 10 subjects for each numeric variable, and 10 × (k-1) subjects for each categorical variable with k modalities. Otherwise, the model may not reflect reality and its coefficients will be unreliable. Worse, the model may be unable to estimate the coefficients at all because of a mathematical convergence problem.

# How is the number of variables I can use in my regression model counted?

To find out how many modalities/variables you can add to your model, take the count displayed under the variable you chose to explain. This count is the number of observations if the variable to explain is numeric, or the number of observations in its smallest modality if it is binary. Divide this number by 10: the result is the maximum recommended number of explanatory variables.
To count the number of modalities/variables you are adding to your model: each numeric variable counts as 1, each binary variable counts as 1, and each variable with k modalities counts as k-1.
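
The two counting rules above can be sketched in a few lines of Python. This is only an illustration, not EasyMedStat's implementation: the function names are hypothetical, and rounding down when the count is not a multiple of 10 is an assumption, since the rule itself does not say how to round.

```python
def max_recommended_slots(limiting_count, subjects_per_slot=10):
    """Maximum recommended number of explanatory variables.

    limiting_count is the number of observations for a numeric outcome,
    or the size of the smallest modality for a binary outcome.
    Assumption: the result is rounded down (integer division).
    """
    return limiting_count // subjects_per_slot


def variable_cost(n_modalities=None):
    """How much one explanatory variable counts for.

    Pass None for a numeric variable (counts as 1). A categorical
    variable with k modalities counts as k - 1, so a binary one counts as 1.
    """
    if n_modalities is None:
        return 1
    if n_modalities < 2:
        raise ValueError("a categorical variable needs at least 2 modalities")
    return n_modalities - 1


def model_cost(variables):
    """Total count for a list of variables (None = numeric, int = modalities)."""
    return sum(variable_cost(k) for k in variables)


# One numeric variable, one binary variable, one variable with 3 modalities.
print(model_cost([None, 2, 3]))      # 1 + 1 + 2 = 4
print(max_recommended_slots(60))     # 60 / 10 = 6
```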

Examples:

1) The variable you want to explain is the Constant Score, which is a numeric variable, and you have 60 observations: you can use up to 60/10 = 6 explanatory variables in your model, provided each one is numeric or binary (fewer than 3 modalities).

2) The variable you want to explain is the Smoking Status, which is a binary variable, and you have 70 observations in the modality "Smoker" and 100 in the modality "Non-smoker". You can use up to 70/10 = 7 explanatory variables in your model, provided each one is numeric or binary (2 modalities or fewer).

3) Continuing the second example: you can choose 7 explanatory variables for your model. You can safely add the variables "age" (numeric), "sex" ("man", "woman"), "cancer status" ("no", "yes", "metastatic") and "professional status" ("retired", "manager", "employee", "unemployed"), which together count for 1 + (2-1) + (3-1) + (4-1) = 1+1+2+3 = 7.

Using a version of "professional status" with 5 modalities instead of the one with 4 is not recommended: it would count for 5-1 = 4, bringing the total to 8 and exceeding the recommended maximum of 7 explanatory variables.
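
The arithmetic of examples 2 and 3 can be checked with a short Python sketch. The variable names and the dictionary layout are illustrative choices, and rounding the division down is an assumption.

```python
# Example 2: binary variable to explain, so the limiting count is the
# size of its smallest modality.
outcome_counts = {"Smoker": 70, "Non-smoker": 100}
budget = min(outcome_counts.values()) // 10  # 70 // 10 = 7

# Example 3: None marks a numeric variable, an integer is the number
# of modalities of a categorical variable.
candidates = {
    "age": None,
    "sex": 2,
    "cancer status": 3,
    "professional status": 4,
}


def total_cost(variables):
    """Numeric variables count as 1, categorical ones as k - 1."""
    return sum(1 if k is None else k - 1 for k in variables.values())


used = total_cost(candidates)
print(f"{used} of {budget} used, within limit: {used <= budget}")

# Same model, but "professional status" has 5 modalities instead of 4.
with_five = {**candidates, "professional status": 5}
used_five = total_cost(with_five)
print(f"{used_five} of {budget} used, within limit: {used_five <= budget}")
```

The first model uses exactly 7 of the 7 recommended slots, while the second reaches 8 and goes over the limit.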

# Can I use a variable with a large number of modalities as an explanatory variable?

If you really want to use a categorical variable that has too many modalities for the model to be safe, it is recommended to group some modalities together to bring the count back within the limit.
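
As a rough illustration of grouping, the Python sketch below merges two modalities of a 5-modality variable so that it counts for 3 instead of 4. The data, the extra "student" modality and the merged label "not employed" are all hypothetical. Which modalities make sense to merge depends on your study.

```python
from collections import Counter

# Hypothetical "professional status" column with 5 modalities.
status = [
    "retired", "manager", "employee", "unemployed",
    "student", "employee", "retired",
]

# Hypothetical grouping: merge two modalities into a single one.
grouping = {"unemployed": "not employed", "student": "not employed"}
grouped = [grouping.get(value, value) for value in status]

# A categorical variable with k modalities counts as k - 1.
cost_before = len(set(status)) - 1   # 5 modalities -> 4
cost_after = len(set(grouped)) - 1   # 4 modalities -> 3

print(Counter(grouped))
print(cost_before, cost_after)
```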

# How do I use this in EasyMedStat?

1. Go to Multivariate analysis.
2. Choose a variable to explain.
3. Choose explanatory variables, following the progress bar that guides you.