Solved: Do Loop/Arrays Help

ThomasGeorge · Posted 02-29-2012 10:53 AM

Hopefully, this should be the last change to the requirement i have asked on this post. I would like to transpose the data. I tried proc transpose by ID and SP_MD. But the proc transpose did not work for me. I believe DO Loops and arrays are the best bet, but it has not worked well for me. I worked on the last Do loop and tried modifying what art had posted, but i keep getting error messages. I would like do have the data set set up just like i did on the model sheet the first sheet in the Excel file.

Thanks,

Thomas

Astounding · Posted 02-29-2012 06:47 PM

Thomas,

One step you could take is to eliminate the DATA steps. The WHERE statements could be added to PROC TRANSPOSE.

But I'm a little curious about how you intend to put the data sets together at the end. You can't do what SAS calls a "merge" because you'll have the same variable names in both data sets. One set of variables will overwrite the other. So do you want to combine them side by side (but changing the variable names for MD vs. SP), or do you want to keep the same variable names and create twice as many observations (while keeping SP_vs_MD to differentiate)?

View solution in original post

Reeza · Posted 02-29-2012 10:59 AM

Use proc tranpsose instead of an array.

See a good example w/ code here

http://www.ats.ucla.edu/stat/sas/modules/ltow_transpose.htm

Astounding · Posted 02-29-2012 11:02 AM

Ahh, much better now that you fixed the output structure. This utilizes a neat feature that Art used last week:

data before;

set before;

_name_ = compress( 'st' || put(response,1.) || question, ' .');

run;

proc transpose data=before out=after;

by idnum;

var response;

run;

If you have responses that could be 2 digits long, replace the 1. format with the z2. format.

Good luck.

art297 · Posted 02-29-2012 11:06 AM

or, yet another way of doing it:

data have;

input IDNUM $ Question $ Response;

cards;

001 Q.1. 1

001 Q.2. 3

001 Q.5. 4

002 Q.1. 0

002 Q.3. 3

002 Q.4. 2

003 Q.5. 3

003 Q.1. 0

003 Q.2. 5

;

data temp;

set have;

question=compress(question,'.');

run;

proc transpose data=temp out=want (drop=_:) prefix=st1;

by idnum;

var response;

id question;

idlabel question;

run;

Astounding · Posted 02-29-2012 11:18 AM

Art,

Yes, depending on what is really needed as output I probably should have used:

_name_ = 'st1' || compress(question, ' .');

It's not 100% clear to me whether the new variable names should start with "st1" or whether the 3rd character should vary depending on the response. Given the follow-up information, it should probably remain "st1".

ThomasGeorge · Posted 02-29-2012 11:07 AM

astounding. I appreciate your help. I missed out a critical step. Some of the IDNUM have 3-4 Q.1. How would i average the response in the code?

Tom · Posted 02-29-2012 11:16 AM

It all really depends on what you want to do with the data.

But you can use PROC SUMMARY to create an average of each question.

proc summary data=have nway missing ;

class idnum question ;

var response;

output out=temp mean= ;

run;

Astounding · Posted 02-29-2012 11:20 AM

If you combine our solutions, it should work.

First, apply Tom's summary.

Then use Art's code (or my revised code with the updated assignment for _name_), using the output data set from PROC SUMMARY as the input to PROC TRANSPOSE.

Here's a more complete version:

proc summary data=before nway missing;

class idnum question;

var response;

where question ne 'CanM';

output out=temp (keep=idnum question response) mean=;

run;

data temp;

set temp;

_name_ = 'st1' || compress(question, ' .');

run;

proc transpose data=before out=after;

by idnum;

var response;

run;

art297 · Posted 02-29-2012 11:32 AM

I don't understand why you got duplicate ids, but Astounding's approach should avoid that. How about if you try:

data have;

input IDNUM $ Question $ Response;

cards;

001 Q.1. 1

001 Q.1. 2

001 Q.1. 3

001 Q.2. 3

001 Q.5. 4

002 Q.1. 0

002 Q.3. 3

002 Q.4. 2

002 CanM 4

003 Q.5. 3

003 Q.1. 0

003 Q.2. 5

;

proc summary data=have nway;

class idnum question;

var response;

output out=averages (drop=_:) mean=;

run;

data averages;

set averages;

_name_ = 'st1' || compress(question, ' .');

run;

proc transpose data=averages out=want (drop=_:);

by idnum;

var response;

run;

data want;

retain idnum st1Q1-st1Q5 st1CanM;

set want;

run;

Astounding · Posted 02-29-2012 11:36 AM

Art,

Looks like our solutions are converging.

Those duplicate ids must have come from a test that didn't use PROC SUMMARY first.

ThomasGeorge · Posted 02-29-2012 11:59 AM

astounding i have a question. This program that you and art created only calculates the average, but i need standard deviation too?cos i cannot use the proc means step after this, as it will do an averge over an average.

Astounding · Posted 02-29-2012 12:07 PM

PROC MEANS can generate standard deviations in the output data set, as well as means.

Part of the answer depends on what you want your report to look like. Should the standard deviations form a second row for each IDNUM, or should they form additional columns on the existing row?

Also note, you will have a lot of missing standard deviations when there is only one response.

Astounding · Posted 02-29-2012 12:53 PM

Thomas,

I'm afraid that work beckons, and I won't be able to give you a complete answer here, just a few notes.

First, it should be possible to run all 7 STs at once. You might skip this at first, getting it to work for a single ST as a starting point. But it would certainly be possible to code statements like these at the proper points:

_name_ = 'st' || put(st,1.) || compress(question, ' .');

class st idnum question;

But without attempting that hurdle, first, in the PROC SUMMARY, modify the OUTPUT statement to get both statistics:

output out=temp (keep=idnum question mean std) mean=mean std=std;

Then in the DATA step that is needed anyway, construct the combination variable:

data temp;

set temp;

_name_ = 'st1' || compress(question, ' .');

new_response = put(mean, 3.1) || '+-' || put(std, 3.1);

run;

Then use new_response instead of response within PROC TRANSPOSE.

Good luck.

art297 · Posted 02-29-2012 01:52 PM

Thomas,

There is probably an easier way to accomplish this, but I think that the result at least is what you are looking for:

data have;

input IDNUM $ Question $ Response;

cards;

001 Q.1. 1

001 Q.1. 2

001 Q.1. 3

001 Q.2. 3

001 Q.2. 4

002 Q.1. 0

002 Q.1. 1

002 Q.3. 3

002 Q.3. 2

002 Q.4. 2

002 Q.4. 4

002 CanM 4

002 CanM 5

003 Q.5. 3

003 Q.5. 1

003 Q.1. 0

003 Q.1. 2

003 Q.2. 5

003 Q.2. 7

;

proc summary data=have nway;

class idnum question;

var response;

output out=averages (drop=_:) mean= std= /autoname;

run;

data averages;

set averages;

_name_ = 'sm1' || compress(question, ' .');

run;

proc transpose data=averages (rename=(response_mean=response))

out=part1 (drop=_:);

by idnum;

var response;

run;

data averages;

set averages;

_name_ = 'sd1' || compress(question, ' .');

run;

proc transpose data=averages

out=part2 (drop=_:);

by idnum;

var response_stddev;

run;

proc sql noprint;

select 'st'||trim(substr(name,3))||'='||

'catt(put('||trim(name)||',4.2),"+",put(sd'||

trim(substr(name,3))||',4.2));',

'if strip(st'||trim(substr(name,3))||

')=".+ ." then call missing('||

'st'||trim(substr(name,3))||');',

'length st'||trim(substr(name,3))||' $10;'

into :calcs separated by ' ',

:miss separated by ' ',

:lengths separated by ' '

from dictionary.columns

where libname="WORK" and

memname="PART1" and

name ne "IDNUM"

;

quit;

data want (drop=sm: sd:);

&lengths.;

set part1;

set part2 (drop=idnum);

&calcs.;

&miss.;

run;

art297 · Posted 02-29-2012 12:29 PM

Thomas,

If you look at the code I sent, last, the creation of _name_ was in a different place:

proc summary data=have nway;

class idnum question;

var response;

output out=averages (drop=_:) mean=;

run;

data averages;

set averages;

_name_ = 'st1' || compress(question, ' .');

run;

proc transpose data=averages out=want (drop=_:);

by idnum;

var response;

run;

data want;

retain idnum st1Q1-st1Q5 st1CanM;

set want;

run;

I'll take a look at your latest need now.

Registration is open

SAS Training: Just a Click Away