Hello,
I have a file that contains multiple years' worth of survey data. I do not have access to the individual years' files, just the merged file. I'm trying to work out a way to determine which questions (each represented by a variable in the file) were asked during which years. I have tried a few approaches, but haven't had any luck. All I really need at this point is a list of variables and their corresponding years. My most recent attempt involved several nested do loops. For example, just looking at the numeric variables in the file:
data modlastyear ;
set modlast;
by syear; *This is the survey year variable;
array allnum(*) _numeric_;
do syear=2000 to syear=2016;
do i=1 to dim(allnum);
if not missing (allnum(i)) then do;
year=syear;
Variable =vname(allnum(i));
output;
end;
end;
end;
run;
I am currently using 9.4.
Edit: Here are a few lines of data:
SAS Output
320 | 2 | . | 4 | . | 306 | 2000 |
101 | 2 | . | 5 | . | 203 | 2000 |
201 | 2 | . | 3 | . | 102 | 2000 |
101 | 2 | . | 5 | . | 202 | 2000 |
202 | 2 | . | 3 | . | 203 | 2000 |
You could start with:
data modlastyear ;
set modlast;
array allnum(*) _numeric_;
do i=1 to dim(allnum);
if not missing (allnum(i)) then do;
Variable =vname(allnum(i));
output;
end;
end;
keep syear variable;
run;
proc freq data=modLastYear;
table syear*variable;
run;
(untested)
Welcome to the SAS Forum.
Your question is relatively clear, but since we're unfamiliar with your data and specifically the data structure we can't make any comments about whether an array/loops are appropriate.
Can you please provide some sample data? The sample data does not need to be real, but it needs to reflect your data structure and complexity as close as possible.For example if you have a series of 50 variables, just include 5, but say your actual situation has 50 variables.
Your code looks like it's transposing the data from a wide to long format which you don't mention at all in your question.
@miladysam wrote:
Hello,
I have a file that contains multiple years' worth of survey data. I do not have access to the individual years' files, just the merged file. I'm trying to work out a way to determine which questions (each represented by a variable in the file) were asked during which years. I have tried a few approaches, but haven't had any luck. All I really need at this point is a list of variables and their corresponding years. My most recent attempt involved several nested do loops. For example, just looking at the numeric variables in the file:
data modlastyear ; set modlast; by syear; *This is the survey year variable; array allnum(*) _numeric_; do syear=2000 to syear=2016; do i=1 to dim(allnum); if not missing (allnum(i)) then do; year=syear; Variable =vname(allnum(i)); output; end; end; end; run;
I am currently using 9.4.
Can you post a few lines of data - it's the data which determines the code.
You are most of the way there. First, however, remove the outermost DO loop. Each observation already contains SYEAR for each observation. Also to be removed: year = syear; (The existing variable SYEAR will be sufficient.)
After that, your DATA step can run, then this step can report on which questions were found in each year:
proc freq data=modlastyear;
tables syear * variable / list missing;
run;
Alternatively, you might want to see the data in this form:
proc freq data=modlastyear;
tables variable * syear / list missing;
run;
Ignore the counts. They represent number of survey respondents. The first columns showing VARIABLE and SYEAR tell you what you want to find out.
You could start with:
data modlastyear ;
set modlast;
array allnum(*) _numeric_;
do i=1 to dim(allnum);
if not missing (allnum(i)) then do;
Variable =vname(allnum(i));
output;
end;
end;
keep syear variable;
run;
proc freq data=modLastYear;
table syear*variable;
run;
(untested)
Thanks!
Example data should be 1) What you have now and 2) what you want. The example output you show isn't quite clear as to what it represents.
The coding of those variables makes me suspect that the data might be from the BRFSS or something borrowing the questions. IF that is actually BRFSS then you will have to be aware that some of the questions, such as the fruits and vegetables are only asked every other year.
Personally I would go to the source of the merged file and convince them to provide the documentation of how the data was combined from the individual year data sets. You don't mention if you have any data labels or not but without some information you may well be guessing about the meanings of ANY of the variables.
It is actually best to provide any sample data in the form of a data step so we can replicate exactly what you have. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.