BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
miladysam
Calcite | Level 5

Hello,

  I have a file that contains multiple years' worth of survey data. I do not have access to the individual years' files, just the merged file. I'm trying to work out a way to determine which questions (each represented by a variable in the file) were asked during which years. I have tried a few approaches, but haven't had any luck. All I really need at this point is a list of variables and their corresponding years. My most recent attempt involved several nested do loops. For example, just looking at the numeric variables in the file:

 

 

data modlastyear ;
	set modlast;
	by syear; *This is the survey year variable;
	
	array allnum(*) _numeric_;

do syear=2000 to syear=2016;
	do i=1 to dim(allnum);
	if not missing (allnum(i)) then do;
	year=syear;
	Variable =vname(allnum(i));
	output;
	end;
	end;
	end;
run;

 

I am currently using 9.4.

 

Edit: Here are a few lines of data:

 

 SAS Output

fruit fruit2 FRUIT1 fruitfmt FRUITJU1 fruitjui SYEAR
3202.4.3062000
1012.5.2032000
2012.3.1022000
1012.5.2022000
2022.3.2032000
 
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You could start with:

 

data modlastyear ;
set modlast;
array allnum(*) _numeric_;
do i=1 to dim(allnum);
    if not missing (allnum(i)) then do;
        Variable =vname(allnum(i));
        output;
        end;
    end;
keep syear variable;
run;

proc freq data=modLastYear;
table syear*variable;
run;

(untested)

 

PG

View solution in original post

7 REPLIES 7
Reeza
Super User

Welcome to the SAS Forum. 

 

Your question is relatively clear, but since we're unfamiliar with your data and specifically the data structure we can't make any comments about whether an array/loops are appropriate. 

 

Can you please provide some sample data? The sample data does not need to be real, but it needs to reflect your data structure and complexity as close as possible.For example if you have a series of 50 variables, just include 5, but say your actual situation has 50 variables. 

 

Your code looks like it's transposing the data from a wide to long format which  you don't mention at all in your question. 

 


@miladysam wrote:

Hello,

  I have a file that contains multiple years' worth of survey data. I do not have access to the individual years' files, just the merged file. I'm trying to work out a way to determine which questions (each represented by a variable in the file) were asked during which years. I have tried a few approaches, but haven't had any luck. All I really need at this point is a list of variables and their corresponding years. My most recent attempt involved several nested do loops. For example, just looking at the numeric variables in the file:

 

 

data modlastyear ;
	set modlast;
	by syear; *This is the survey year variable;
	
	array allnum(*) _numeric_;

do syear=2000 to syear=2016;
	do i=1 to dim(allnum);
	if not missing (allnum(i)) then do;
	year=syear;
	Variable =vname(allnum(i));
	output;
	end;
	end;
	end;
run;

 

I am currently using 9.4.


 

AndrewHowell
Moderator

Can you post a few lines of data - it's the data which determines the code.

miladysam
Calcite | Level 5
I've edited my post to include some example data. It's actually a nightmare data set with just over 3200 variables. The variable names have shifted over time with no documentation, so sometimes one question will have multiple variable names...that's something I'll have to sort out later.

For this code I chose, more or less at random, about five variables. I'm running them as a test case rather than testing the code against the whole ginormous data set.
Astounding
PROC Star

You are most of the way there.  First, however, remove the outermost DO loop.  Each observation already contains SYEAR for each observation.  Also to be removed:  year = syear;  (The existing variable SYEAR will be sufficient.)

 

After that, your DATA step can run, then this step can report on which questions were found in each year:

 

proc freq data=modlastyear;

tables syear * variable / list missing;

run;

 

Alternatively, you might want to see the data in this form:

 

proc freq data=modlastyear;

tables variable * syear / list missing;

run;

 

Ignore the counts.  They represent number of survey respondents.  The first columns showing VARIABLE and SYEAR tell you what you want to find out.

PGStats
Opal | Level 21

You could start with:

 

data modlastyear ;
set modlast;
array allnum(*) _numeric_;
do i=1 to dim(allnum);
    if not missing (allnum(i)) then do;
        Variable =vname(allnum(i));
        output;
        end;
    end;
keep syear variable;
run;

proc freq data=modLastYear;
table syear*variable;
run;

(untested)

 

PG
miladysam
Calcite | Level 5

Thanks!

ballardw
Super User

Example data should be 1) What you have now and 2) what you want. The example output you show isn't quite clear as to what it represents.

 

The coding of those variables makes me suspect that the data might be from the BRFSS or something borrowing the questions. IF that is actually BRFSS then you will have to be aware that some of the questions, such as the fruits and vegetables are only asked every other year.

 

Personally I would go to the source of the merged file and convince them to provide the documentation of how the data was combined from the individual year data sets. You don't mention if you have any data labels or not but without some information you may well be guessing about the meanings of ANY of the variables.

 

It is actually best to provide any sample data in the form of a data step so we can replicate exactly what you have. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 752 views
  • 0 likes
  • 6 in conversation