Data Cleaner

Modified on Mon, 7 Nov, 2022 at 12:09 PM

What is the data cleaner?


The data cleaner is a set of tools for manually checking and correcting your data. Cleaning your data is a mandatory step before statistical analysis, and the data cleaner helps you streamline it.


The data cleaner analyzes your existing data, detects common mistakes such as outliers or misspellings, and lets you clean up your data without having to return to each patient record individually.


Several types of data issues can be detected and cleaned:

  • Outliers in numeric variables: values that significantly differ from other values
  • Outliers in event variables: dates that may be wrong as they significantly differ from others
  • Rare list values: modalities of discrete variables that are infrequent and may prevent statistical analyses
  • Misspellings: text values that may be wrongly typed
  • Empty variables: variables that have been created but for which no value has ever been entered
  • Missing values: patients for whom one or more variables are missing
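
This article does not describe how the data cleaner detects these issues internally. Purely to illustrate what "outlier" and "rare list value" can mean in practice, here is a minimal Python sketch. The function names, the Tukey fence rule (1.5 times the interquartile range) and the 5% frequency threshold are assumptions made for this example, not the data cleaner's actual rules.

```python
from collections import Counter
from statistics import quantiles


def numeric_outliers(values, k=1.5):
    """Return values outside Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR.

    Assumption: this rule is only an illustration, not the data cleaner's method.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the observed values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def rare_values(values, min_share=0.05):
    """Return list values whose share of all entries is below min_share.

    Assumption: the 5% default threshold is hypothetical.
    """
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, c in counts.items() if c / total < min_share)


# Example: a weight typed as 700 instead of 70, and a blood group seen only once
weights = [70, 72, 68, 75, 71, 69, 73, 700]
blood_groups = ["A"] * 30 + ["B"] * 19 + ["AB"]
print(numeric_outliers(weights))
print(rare_values(blood_groups))
```

In this example the value 700 falls outside the fences and "AB" accounts for 2% of the entries, so both would be flagged for manual review.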


Frequent questions about the data cleaner


Are changes reversible?

No. Changes cannot be reversed from within the data cleaner: once you change the value of a data point, there is no cancel button. You can, however, go back to the patient form and edit the value there.

Moreover, every action performed through the data cleaner appears in the audit trail, just like any other edit.

Finally, using a "Keep value" button does not change any value in the database.


Can I define manual bounds for numeric variables?

Yes. You can define minimum and maximum values for numeric variables. These bounds will be used as rules for the data cleaner.

In the cleaning report, two lines will be displayed: one for the automated rule and one for the manual rule defined by the minimum and maximum values.

After you set or change the bounds for a numeric variable, all patients will be reassessed in the data cleaner.
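
As a rough illustration of what a manual rule amounts to, the sketch below flags every patient whose value falls outside a minimum and maximum. The function name `check_bounds`, the patient identifiers and the example limits are hypothetical; the data cleaner applies manual bounds for you, and this code is not part of the application.

```python
def check_bounds(values_by_patient, minimum=None, maximum=None):
    """Return {patient_id: value} for values outside [minimum, maximum].

    Missing values (None) are skipped here; in this sketch they belong to a
    separate "missing values" check. A bound left as None is not enforced.
    """
    flagged = {}
    for patient_id, value in values_by_patient.items():
        if value is None:
            continue
        too_low = minimum is not None and value < minimum
        too_high = maximum is not None and value > maximum
        if too_low or too_high:
            flagged[patient_id] = value
    return flagged


# Example: heart rate with hypothetical manual bounds of 30 to 220
heart_rates = {"P1": 72, "P2": 250, "P3": None, "P4": 20}
print(check_bounds(heart_rates, minimum=30, maximum=220))
```

Re-running such a check over all patients whenever a bound changes mirrors the reassessment described above.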


Can I get a cleaning report?

Yes. A cleaning report is generated automatically while you use the application.

It is available on the cleaner dashboard at the bottom of the page.

The report lists:

  • The rules that have been checked by the data cleaner
  • The number of data points that have been assessed
  • The number of data points that you have manually verified
  • The number of data points for which you have changed a value


What happens when I include or exclude patients in the study?

  • Newly included patients will not be analyzed or cleaned until you click the "Restart analysis" button.
  • Excluded patients will be removed from the data cleaner. Rules for outliers and rare values will not be recalculated.


What happens when I add or delete variables?


If you add or delete variables, the data cleaner will be updated accordingly without the need to take any action.

