BookmarkSubscribeRSS Feed
yzyczpy98
Calcite | Level 5

I am currently working with a longitudinal dataset that consists of multiple waves.

The variables in my dataset are named using a pattern like "A1STO," "A2STO," "B1STO," "B2STO," and so on.

I want to keep only the variables from wave 4 onwards (i.e., "A4STO",...,"A10STO," "A11STO," etc.) and remove the variables associated with waves before wave 4.

5 REPLIES 5
Tom
Super User Tom
Super User

Poor choice of a naming convention.  It would have been better to use A01 through A10 as the prefix instead of only switching to two digits when you get past nine.

 

Do you have a data dictionary that lists all of the variables and which wave they are part of?  That would simplify the process.  Perhaps you should just make one.  Even if you had to do it by hand it should not be very hard to make.

 

Say you made a dataset, let's call it DICTIONARY, with variables NAME and WAVE that had values like:

data dictionary ;
  length name $32 wave 8;
  input name wave;
cards;
A1STO 1
A2STO 2
B1STO  1
B2STO 2
A4STO 4
A10STO 10
A11STO 11
;

You could even add other variables to store things like the A or B prefix and the STO suffix.

 

Once you have it you could pull the list of names out for any combination you wanted.

proc sql noprint;
select nliteral(name) into :drop_list separated by ' '
from dictionary
where wave in (1 2 3)
;
quit;

Which you could then use to subset the data:

data want;
  set have(drop=&drop_list);
run;

 

 

ballardw
Super User

You may benefit from reading some of the discussion in this thread:

https://communities.sas.com/t5/New-SAS-User/List-of-variables-based-on-suffix-not-prefix/m-p/886704#...

 

This sort of wide set with information about the time period as part of a variable name is awkward to work with. It may be easier in the long run to reshape the data so you have an one variable that contains the time point information and then the other variables have the meanings of the collected data point.

Then when it comes time to use the data you filter on the timepoint variable instead of attempting to delete poorly named variables.

 

Another learning point. If you use _ to separate parts of variable information, such as A1_topic then you could include the _ as part of a list: Drop A1_:  ; for instance would drop all variables whose names start with A1_ and not touch A10_ variables.

 

I'm not sure but I think there may even be another thread on this forum with the same or similar variable names but I can't find it right now.

yzyczpy98
Calcite | Level 5

The dataset contains thousands of variables, so I cannot create a data dictionary by hand.😨

ballardw
Super User

Proc sql;

   create datadictionary as

  select *

  from dictionary.columns

  where LIBNAME='YOURLIB' and MEMNAME='YOURDATASET'

  ;

quit;

 

Given the correct library name and data set name, in capital letters as that is how the libname and memname are stored in the metadata SAS maintains, the result will be a data set with way more than you expect about YOUR data.

 

Astounding
PROC Star

Here's one way to extract just the variable names that you want.  I prefer using PROC CONTENTS instead of dictionary.columns, but it's no big deal either way.

proc contents data=have noprint out=varnames (keep=name);
run;
data keepvars;
   set varnames;
   wave = input(substr(name, 2), ??2.);
if wave = . then wave = input(substr(name, 2), 1.); if wave < 4 then delete; run;

This gives you a data dictionary with all variable names that you want to keep.  From that point, other posters have already given good suggestions about how you can proceed.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1476 views
  • 1 like
  • 4 in conversation