Create one analysis-ready variable from a multilingual ePRO question

Modified on Fri, 20 Feb at 10:27 AM

When an ePRO questionnaire is translated, a common setup is to store answers in one variable per language (e.g., English, French, German). That works well for data collection, but it makes analysis harder because your export contains multiple sparse columns.

With first_numeric_value() and map_numeric_to_list(), you can create a single variable that is filled whenever the patient answered and is always expressed in one reference language for statistics.


Real-world example: “Current smoking status”

In clinical research ePROs, questions such as smoking status are typically collected as List variables.

Question shown to patients (translated):

  • EN: “Do you currently smoke?”

  • FR: “Fumez-vous actuellement ?”

  • DE: “Rauchen Sie aktuell?”

You collect the answer as one variable per language:

  • Smoking_EN (List)

  • Smoking_FR (List)

  • Smoking_DE (List)

Each patient answers in their language, so only one of these variables is filled.

Your goal is to produce a single variable for analysis, regardless of the language in which the patient answered:

  • Smoking (List in English)


Step 1: Extract the numeric code from the first answered language

  1. Create a new List variable named "Smoking"
  2. Create values for this List variable (e.g., "Never smoked", "Stopped smoking", "Currently smoking")
  3. Assign a numeric value to each of these values (e.g., 0, 1, 2)
  4. Create this formula for the variable:
first_numeric_value(Smoking_EN, Smoking_FR, Smoking_DE)

What it does:

  • Reads the variables in order

  • Takes the first non-null answer

  • Returns its numeric value as a Numeric (0, 1 or 2 in our example)
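
To make the selection rule concrete, here is a minimal Python sketch that emulates the behavior described above. It is not the platform's implementation: the Python function and the sample values are assumptions used only for illustration, and null is represented by None.

```python
from typing import Optional

def first_numeric_value(*codes: Optional[int]) -> Optional[int]:
    """Emulation only: return the first non-null code, scanning in argument order."""
    for code in codes:
        if code is not None:  # 0 is a valid answer, so compare against None explicitly
            return code
    return None  # every input was null

# Hypothetical patient who answered in French ("Stopped smoking" = 1)
smoking_en, smoking_fr, smoking_de = None, 1, None
print(first_numeric_value(smoking_en, smoking_fr, smoking_de))  # 1
```

Note that the check is for null rather than for a "falsy" value, so a code of 0 ("Never smoked" in our example) is still picked up as a valid answer.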


Step 2: Map that code to the reference language list (English)

Edit the formula: wrap the first_numeric_value() call in map_numeric_to_list() and pass the reference-language variable (here Smoking_EN) as the second argument:

Smoking = map_numeric_to_list( first_numeric_value(Smoking_EN, Smoking_FR, Smoking_DE), Smoking_EN )

What this gives you:

  • Smoking is a List variable

  • Its value is always expressed using the English labels (because Smoking_EN is the reference)

  • You can now analyze/export one clean column instead of three sparse columns
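
The combined formula can be pictured with the following Python sketch. It only emulates the behavior described in this article, under the assumption that a List variable can be modeled as a dictionary from numeric value to label; the dictionary name and the sample inputs are hypothetical.

```python
from typing import Dict, Optional

# Assumed model of the English reference list: numeric value -> label
SMOKING_EN_LIST: Dict[int, str] = {
    0: "Never smoked",
    1: "Stopped smoking",
    2: "Currently smoking",
}

def first_numeric_value(*codes: Optional[int]) -> Optional[int]:
    """Emulation only: first non-null code in argument order."""
    return next((c for c in codes if c is not None), None)

def map_numeric_to_list(code: Optional[int], reference: Dict[int, str]) -> Optional[str]:
    """Emulation only: look the code up in the reference list, else null."""
    if code is None:
        return None
    return reference.get(code)  # None when the code is not in the reference list

# Hypothetical patient who answered in German with code 2
smoking = map_numeric_to_list(first_numeric_value(None, None, 2), SMOKING_EN_LIST)
print(smoking)  # Currently smoking
```

Whatever language the patient used, the result carries the English label because the lookup always goes through the English reference list.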


Important requirements and expected behavior

  • Your language-specific List variables must represent the same modalities, with the same numeric values.
    For example, if “Stopped smoking” = 1 in Smoking_EN, then the French and German equivalents must also have numeric value 1.
    Labels can differ by language, but the numeric values must stay identical.

  • first_numeric_value() behavior

    • Uses the first non-null variable based on argument order

    • Returns a Numeric code (0, 1, 2, …) extracted from the answered variable

    • If all inputs are null → returns null (shown as “NC” in the eCRF)

  • map_numeric_to_list() behavior

    • If the numeric code exists in the reference list → returns the corresponding List value (with the reference language label)

    • If the numeric code does not exist in the reference list → returns null (shown as “NC” in the eCRF)

    • If the numeric input is null → returns null
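
Because the mapping relies on identical numeric values across languages, it can be worth checking the list definitions before going live. The Python sketch below is one way to do that outside the platform, assuming you have copied each list's numeric values and labels into dictionaries by hand. The French and German labels are illustrative translations, and the helper name is hypothetical.

```python
from typing import Dict, List

SMOKING_EN = {0: "Never smoked", 1: "Stopped smoking", 2: "Currently smoking"}
# Illustrative translations only; use the labels from your own study
SMOKING_FR = {0: "Jamais fumé", 1: "A arrêté de fumer", 2: "Fume actuellement"}
SMOKING_DE = {0: "Nie geraucht", 1: "Mit dem Rauchen aufgehört", 2: "Raucht derzeit"}

def mismatched_lists(reference: Dict[int, str],
                     others: Dict[str, Dict[int, str]]) -> List[str]:
    """Return the names of lists whose set of numeric values differs from the reference."""
    reference_codes = set(reference)
    return [name for name, values in others.items() if set(values) != reference_codes]

problems = mismatched_lists(SMOKING_EN, {"Smoking_FR": SMOKING_FR, "Smoking_DE": SMOKING_DE})
print(problems)  # []
```

This check only compares the sets of numeric values. It cannot tell whether the label behind a given value means the same thing in every language, so the translations still need a manual review.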


Why this helps

Instead of exporting multiple columns:

  • Smoking_EN, Smoking_FR, Smoking_DE

…you export and analyze one analysis-ready variable:

  • Smoking

This keeps the ePRO behavior unchanged, while making your dataset immediately ready for statistical analysis.
