Selecting Alternating Sets of Variables

Mgarret · Posted 01-06-2012 08:45 PM

Hi all--

A have a series of datasets with variables that start with a prefix such as Q30_ , Q41_ , Q52_. The variables all have a value tagged onto the end. Example: Q30_ 1, Q30_ 2, Q30_ 3 and so on. The amount of varibles will be different in each dataset -- so each dataset can be different depending on how many variables with a certain prefix are generated.

The dataset could look like this:

Q30_1	Q30_2	Q30_3	Q30_4	Q30_5	Q41_1	Q41_2	Q41_3

Or the dataset could look like:

Q30_1	Q30_2	Q30_3	Q41_1	Q41_2	Q41_3	Q41_4	Q41_5	Q41_6	Q41_7

I need to merge these variables with prefixes into other datasets.

Example: For Dataset A, I need to bring in all the variables which start with Q52_, Q3_ and Q17_ from Dataset B which still need to be in numerical order by the value they have tagged on to the prefix. And then I need to bring in the variables LastName, Agency and Status from Dataset C.

So the Final Dataset could look like this:

FirstName	LastName	Q52_1	Q52_2	Q52_3	Agency	Q3_1	Q3_2	Q3_3	Q3_4	Status	Q17_1	Q17_2	Q17_3	Q17_4

The amount of variables that can be brought in could be different every time. So in another situation Q52_ goes up to Q52_14 and Q17_ only goes up to Q17_2

Can a create an array using each variable Prefix (im not sure how to do that)? Is there a way to select only the variables that start with "Q30_" and then select only the variables which start with "Q55_" and so on ?

Any assistance is greatly appreciated. Thanks.

Linlin · Posted 01-06-2012 08:55 PM

Yes, you can.

example:

data have;

input q30_1 q30_2 q30_3 aa bb;

cards;

1 2 3 4 5

;

data want;

set have (keep=q30_:);

run;

proc print;run;

Mgarret · Posted 01-06-2012 09:25 PM

Thanks. I have tried that and it does not seem to work with multiple questions.

Linlin · Posted 01-06-2012 10:21 PM

Using the sample data created by Patrick:

data have_A;

id=1;

array Q3_ {5} 8. (5*2);

array Q17_ {9} 8. (9*20);

array Q41_ {10} 8. (10*1);

array Q52_ {10} 8. (10*1);

someOtherVar=0;

output;

run;

data have_B;

id=1;

FirstName='fn';

LastName='ln';

Agency='A';

Status='1';

run;

data want;

merge have_a have_b;

by id;

run;

proc sql noprint;

select name into :q3 separated by ' '

from dictionary.columns

where libname='WORK' and memname='WANT' and upcase(substr(name,1,3))='Q3_'

order by name;

select name into :q17 separated by ' '

from dictionary.columns

where libname='WORK' and memname='WANT' and upcase(substr(name,1,3))='Q17'

order by name;

select name into :q41 separated by ' '

from dictionary.columns

where libname='WORK' and memname='WANT' and upcase(substr(name,1,3))='Q41'

order by name;

select name into :q52 separated by ' '

from dictionary.columns

where libname='WORK' and memname='WANT' and upcase(substr(name,1,3))='Q52'

order by name;

quit;

data want;

retain firstname lastname &q52 agency &q3 status &q52 &q41;

set want(drop=id);

run;

Patrick · Posted 01-06-2012 09:24 PM

Not the most efficient way from a processing point of view - but easy to code.

For the code to work there must always be at least one variabe Q3_..., Q17_... and Q52_... If this is not the case then an approach querying the dictionary tables and creating a attrib statement (or a SQL Select) would be an option. There are already heaps of such examples in the forums as you are not the first who wants variables in a specific order in a dataset.

data have_A;
id=1;
array Q3_ {5} 8. (5*2);
array Q17_ {9} 8. (9*20);
array Q41_ {10} 8. (10*1);
array Q52_ {10} 8. (10*1);
someOtherVar=0;
output; output;
run;

data have_B;
id=1;
FirstName='fn';
LastName='ln';
Agency='A';
Status='1';
run;

data want;
merge
    have_B (keep=ID FirstName LastName)
    have_A (keep=ID Q52_:)
    have_B (keep=ID Agency)
    have_A (keep=ID Q3_:)
    have_B (keep=ID Status)
    have_A (keep=ID Q17_:)
    ;
by id;
run;

proc print data=want;
run;

Mgarret · Posted 01-06-2012 09:58 PM

Thanks you.

Question:

I'm not sure what the parts in the code ive bolded mean.....

array Q3_ {5} 8. (5*2);

array Q17_ {9} 8. (9*20);

array Q41_ {10} 8. (10*1);

array Q52_ {10} 8. (10*1);

Patrick · Posted 01-06-2012 10:02 PM

That's just the part creating some sample data.

If you look up in the docu the array statement you'll find there that this is a way to create variables and assign them a value (5*2: 5 array elements getting value of 2 assigned).