Variables Listed in Observations

Reply
Occasional Contributor
Posts: 10

Variables Listed in Observations

Hi All, I was hoping you might be able to help me out with something I'm trying to do.

First, I'll explain what I'm trying to do. Every year, I receive a dataset with thousands of variables as well as updated datasets of previous years. I am only interested in about 50-60 of those variables, so I keep the ones I need and drop the rest. However, every year , many of the variables change, and of the variables in my list, some are broken up into smaller components that I then need to include in my list and aggregate to the original variable. So every year, my list of variables changes.

I would like to make it so I have to write as little code each year as possible. My current solution is to put together a list of all the variables, where each observation is a variable name, including ones that are discontinued in later years. I check each year's dataset and try to pull each variable. If the variable doesn't exist in the dataset, a conditional is triggered and the variable is calculated using its component variables.

So, basically 2 questions:

1. Does this approach make sense? Does anyone else have a better idea of how to go about this?

2. Does anyone have an idea how to go about coding this? Right now, I'm looping through the variables in each observation of my variable list, determining if they exist in the dataset, and adding them to a macro variable list of variables to keep. It was the first idea I came up with and seems a little convoluted.

Thanks in advance for your help!

Super User
Posts: 11,343

Re: Variables Listed in Observations

Posted in reply to united_mdm

Something like this should give you the variables that don't match. The library and datasets names must be in all capital letters as that is the way the data is stored in the dictionary table. This checks the data type in case the variable of the same name is numeric in one set and character in the other.

proc sql;

     create table unmatchedvars as

     select name,type from (select * from dictionary.columns where libname='YOURLIBRARY'

                                and memname='YOUROLDDATASETNAME')

     except

     select name,type from (select * from dictionary.columns where libname='YOURLIBRARY'

                                and memname='YOURNEWDATASETNAME');

quit;

Ask a Question
Discussion stats
  • 1 reply
  • 142 views
  • 3 likes
  • 2 in conversation