Re: proc transpose vs proc sql

hermanchen · Posted 06-18-2019 04:34 AM

Hi - I have a wide flat dataset with column response rates of binomial questions:

sample#, col1a, col1b, col2a, col2b...colna, colnb

1, 50%,50%,20%,80%,...,0%,100%

2, 0%,100%,80%,20%,...,50%,50%

I would like to run proc freq /chisq to compare samples 1 & 2. Prefer to use proc sql to process two columns at a time (q# A and B) but difficult to get proc freq to accept wide flat dataset because proc freq prefers a transpose format like below?

1,col1a,50%

1,col1b,50%

2,col2a,20%

2,col2b,80%

...

n,colna,0%

n,colnb,100%

my pseudo code with syntax errors below - please help? thanks!

/*compute counts from columns A/B ; save headers for loop*/

proc contents data=work.widefile18 Out=work.headers order=VARNUM;

run;

/*process first question: COLA and COLB*/

proc sql;

CREATE TABLE HEADERNAME AS SELECT VARNUM, NAME FROM work.headers

WHERE (VARNUM in (4,5 ));

%let resultcount=&sqlobs;

QUIT;

/*compute counts*/

%MACRO LOOP2();

proc sql;

select NAME into :COLA from HEADERNAME where VARNUM=1;

select NAME into :COLB from HEADERNAME where VARNUM=2;

quit;

data collab;

set work.widefile18;

count=1*vvaluex(COLA); output;

count=1*vvaluex(COLB); output;

run;%MEND;

/*loop COLA to COLB datastep */

%global i;

do i = 1 to 2;

DATA HEADERNAME;

SET HEADERNAME;

CALL EXECUTE("%LOOP2("||TRIM("sample &i",&COLA,&COLB)||")");

proc freq data=work.collab;

by count;

tables sample*COLA /chisq;

run;

%let i=&i+1;

end;*/

RUN;

PaigeMiller · Posted 06-18-2019 09:27 AM

Prefer to use proc sql to process two columns at a time (q# A and B) but difficult to get proc freq to accept wide flat dataset because proc freq prefers a transpose format like below?

Use PROC TRANSPOSE. It is the right tool for the job. Then you don't have to use complications like macros and CALL EXECUTE.

--
Paige Miller

hermanchen · Posted 06-19-2019 05:57 PM

Hi,

the flatwide dataset (2n)+1 columns. after proc transpose, the narrow dataset is now (2n) x 3 columns like below. COL2 also transposed to have multi-values from just binomial value. How do I specify only 2x2 crosstabulation (COL1 and COL2) in my code:

proc freq data=narrowdata;
weight COL3;
tables COL1*COL2 / noPercent noRow nocol ;

run;

transposed dataset:

1,col1a,50%

1,col1b,50%

2,col1a,20%

2,col1b,80%

...

n,colna,0%

n,colnb,100%

PaigeMiller · Posted 06-19-2019 05:59 PM

What's wrong with the code you show here?

--
Paige Miller

hermanchen · Posted 06-20-2019 09:02 AM

the problem is transposed COL2 has now multi-values containing the flatwide file's headers: question 1A and 2B, question 2A and 2B,..., question 255A and 255B.

Then the chisq output is a 255x255 table rather then 255 2x2 tables that I am looking for.

I am looking for proc freq code that would do something like:

for i = 1 to n questions

create itable select row i, i+1 from narrowdata where sampleno in (1,2)

proc freq data=itable ; tables itables[COL1]*itable[COL2] /weight=COL3 chisq;

end;

I understand the tables statements can process 2x2 tables however narrowdata is in nxn dataset and proc freq has no option to tell it to process every 2 rows by COL1 id, therefore, loop is required?

PaigeMiller · Posted 06-20-2019 10:00 AM

It would be very helpful if you could provide your data using these instructions:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

--
Paige Miller

hermanchen · Posted 06-20-2019 11:32 AM

Ok but how do I create a dummy variable as a class to create 2x2 tables of my narrowdata - syntax error in data steps , incorrectly, creates empty qtype column?

/*create dummy variable by n question type yes/no proportions*/
data narrowfile2;
set narrowfile1;
by sampleno;
retain qtype 1 ;
retain rowcount 1;
if (mod(qtype,2)=1) then do qtype=rowcount; output; end; else do qtype=qtype; rowcount+1; output; end;
run;

hermanchen · Posted 06-21-2019 03:31 PM

Hi Mr. Miller - I ran the macro to generate my example data but the macro didn't quite run/convert my data. I edited but the example should still apply - how do I transpose and create a dummy variable (for the multi-valued tranposed columns) as a class variable?

Note: will need to DROP RATE

Thanks!

data WORK.mydata;
infile datalines dsd truncover dlm=',';
input DBN:$6. Name:$6. Rate:BEST. _1a_:BEST. E:BEST. _1b_:BEST. G:BEST. _1c_:BEST. I:BEST. _1d_:BEST. K:BEST. _1e_:BEST. M:BEST.;
label DBN="DBN" Name="Name" Rate="Total Response Rate" _1a_="1a. How many" E="E" _1b_="1b. How many" G="G" _1c_="1c. How many" I="I" _1d_="1d. How many" K="K" _1e_="1e. How many" M="M";
datalines;
01M539,TECHNO,95,41,59,6,94,29,71,22,78,6,94
30Q300,THE_30,100,19,81,12,88,15,85,23,77,4,96
;

art297 · Posted 06-20-2019 12:34 PM

@hermanchen Based on what you've posted, it's difficult to understand exactly what you're trying to accomplish.

However, that said, I think that the Untranspose macro a group of us published last year will get you on the right track. You can download the paper and macro from: http://www.sascommunity.org/wiki/An_Easier_and_Faster_Way_to_Untranspose_a_Wide_File

Using that macro, given how I interpreted what you might be looking for, I ran the following code:

data have;
  infile cards dlm=',';
  informat col1a col1b col2a col2b col3a col3b percent4.;
  input sample_no col1a col1b col2a col2b col3a col3b;
  cards;
1, 50%,50%,20%,80%,0%,100%
2, 0%,100%,80%,20%,50%,50%
;

%untranspose(data=have, out=want, by=sample_no, id=type,
 id_informat=$1.,var=col1-col3, makelong=yes)

proc sort data=want;
  by sample_no _name_;
run;

proc freq data=want ;
  tables type*_value_/chisq;
  by sample_no _name_;
run;

Hope that helps,

Art, CEO, AnalystFinder.com

hermanchen · Posted 06-21-2019 03:33 PM

Hi Art, this looks promising - will look into untranspose

Thanks!