BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lousam
Obsidian | Level 7

Hello,

 

I have multiple SAS data sets that are measuring the same thing across multiple study visits. Unfortunately, the variable names are not the same across the data sets even though they are measuring the same thing (with the exception of ID). The data I have looks something like this:

 

Name of data sets:        Visit_1                                          Visit_2                                       Visit_3

 Variable names:             ID       200          101                   ID        300      102                    ID         400          103

 

Variables 200, 300 and 400 are all measuring whether the visit was attended.

Variables 101,102 and 103 are all measuring the results of a test.

 

I am trying to create one data set that looks something like this:

 

Variable names:      ID       Visit_number      Visit_attended        Results

                                 001           1                          1                          0

                                 001           2                          1                          0

                                 001           3                          1                          1

                                 002           1                          1                          0

                                 002           2                          0                           .

                                 002           3                          1                           0

 

How might I go about solving this issue? 

 

-Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Do you really have a variable named 200? Why? If so then you need use name literal to refer to it in code

'200'n

But I will just assume that it has normal variable name, like VAR200, instead.

Sounds like you want to interleave the observations.  You could add RENAME= dataset options to the input datasets, but why not just leave the names as they are and populate new variables.  Since you are using SET only one of the three variables will have a value for any given observation you could use almost any aggregation function to find the non-missing value.

data want ;
  set visit_1 - visit_3 ;
  by id;
  Visit_attended = coalesce(var200,var300,var400);
  Results = coalesce(var101,var102,var103);
run;

If you do not already have a Visit_number variable then add the INDSNAME= option to the SET statement and add code to calculate Visit_number from the name of the dataset that contributed the observation.

 

View solution in original post

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Like this?

 

data WANT;
  set Visit_1(rename=('200'n=VISIT_NUMBER '101'n=VISIT_ATTENDED))
      Visit_2(rename=('300'n=VISIT_NUMBER '102'n=VISIT_ATTENDED))
      Visit_3(rename=('400'n=VISIT_NUMBER '103'n=VISIT_ATTENDED))
  ;
run;
PGStats
Opal | Level 21

Or like this:

data WANT;
  set Visit_1 - visit_3;
  visit_number = coalesce('200'n, '300'n, '400'n);
  visit_attended = coalesce('101'n, '102'n, '103'n);
keep ID visit_number visit_attended;
run;
PG
Tom
Super User Tom
Super User

Do you really have a variable named 200? Why? If so then you need use name literal to refer to it in code

'200'n

But I will just assume that it has normal variable name, like VAR200, instead.

Sounds like you want to interleave the observations.  You could add RENAME= dataset options to the input datasets, but why not just leave the names as they are and populate new variables.  Since you are using SET only one of the three variables will have a value for any given observation you could use almost any aggregation function to find the non-missing value.

data want ;
  set visit_1 - visit_3 ;
  by id;
  Visit_attended = coalesce(var200,var300,var400);
  Results = coalesce(var101,var102,var103);
run;

If you do not already have a Visit_number variable then add the INDSNAME= option to the SET statement and add code to calculate Visit_number from the name of the dataset that contributed the observation.

 

ballardw
Super User

1) Who ever is designing the data entry/collection process needs a headspace and alignment check. If the data points are the same then the names should not change.

2) If the plan ahead of time was to analyze the data in SAS then the use of columns/variable names like "200" should have been avoided.

3) I would ask whoever set this up "What was wrong with using descriptive names like Results  in the first place?"

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2077 views
  • 5 likes
  • 5 in conversation