Hello,
I have multiple SAS data sets that are measuring the same thing across multiple study visits. Unfortunately, the variable names are not the same across the data sets even though they are measuring the same thing (with the exception of ID). The data I have looks something like this:
Name of data sets: Visit_1 Visit_2 Visit_3
Variable names: ID 200 101 ID 300 102 ID 400 103
Variables 200, 300 and 400 are all measuring whether the visit was attended.
Variables 101,102 and 103 are all measuring the results of a test.
I am trying to create one data set that looks something like this:
Variable names: ID Visit_number Visit_attended Results
001 1 1 0
001 2 1 0
001 3 1 1
002 1 1 0
002 2 0 .
002 3 1 0
How might I go about solving this issue?
-Thank you
Do you really have a variable named 200? Why? If so then you need use name literal to refer to it in code
'200'n
But I will just assume that it has normal variable name, like VAR200, instead.
Sounds like you want to interleave the observations. You could add RENAME= dataset options to the input datasets, but why not just leave the names as they are and populate new variables. Since you are using SET only one of the three variables will have a value for any given observation you could use almost any aggregation function to find the non-missing value.
data want ;
set visit_1 - visit_3 ;
by id;
Visit_attended = coalesce(var200,var300,var400);
Results = coalesce(var101,var102,var103);
run;
If you do not already have a Visit_number variable then add the INDSNAME= option to the SET statement and add code to calculate Visit_number from the name of the dataset that contributed the observation.
Like this?
data WANT;
set Visit_1(rename=('200'n=VISIT_NUMBER '101'n=VISIT_ATTENDED))
Visit_2(rename=('300'n=VISIT_NUMBER '102'n=VISIT_ATTENDED))
Visit_3(rename=('400'n=VISIT_NUMBER '103'n=VISIT_ATTENDED))
;
run;
Or like this:
data WANT;
set Visit_1 - visit_3;
visit_number = coalesce('200'n, '300'n, '400'n);
visit_attended = coalesce('101'n, '102'n, '103'n);
keep ID visit_number visit_attended;
run;
Do you really have a variable named 200? Why? If so then you need use name literal to refer to it in code
'200'n
But I will just assume that it has normal variable name, like VAR200, instead.
Sounds like you want to interleave the observations. You could add RENAME= dataset options to the input datasets, but why not just leave the names as they are and populate new variables. Since you are using SET only one of the three variables will have a value for any given observation you could use almost any aggregation function to find the non-missing value.
data want ;
set visit_1 - visit_3 ;
by id;
Visit_attended = coalesce(var200,var300,var400);
Results = coalesce(var101,var102,var103);
run;
If you do not already have a Visit_number variable then add the INDSNAME= option to the SET statement and add code to calculate Visit_number from the name of the dataset that contributed the observation.
1) Who ever is designing the data entry/collection process needs a headspace and alignment check. If the data points are the same then the names should not change.
2) If the plan ahead of time was to analyze the data in SAS then the use of columns/variable names like "200" should have been avoided.
3) I would ask whoever set this up "What was wrong with using descriptive names like Results in the first place?"
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.