BookmarkSubscribeRSS Feed
hermanchen
Fluorite | Level 6
Hi -  I have a wide flat dataset with column response rates of binomial questions:
 
sample#, col1a, col1b, col2a, col2b...colna, colnb
 
1, 50%,50%,20%,80%,...,0%,100%
 
2, 0%,100%,80%,20%,...,50%,50%
 
 
 
I would like to run proc freq /chisq to compare samples 1 & 2. Prefer to use proc sql to process two columns at a time (q# A and B) but difficult to get proc freq to accept wide flat dataset because proc freq prefers a transpose format like below?
 
1,col1a,50%
 
1,col1b,50%
 
2,col2a,20%
 
2,col2b,80%
 
...
 
n,colna,0%
 
n,colnb,100%
 
 
 
my pseudo code with syntax errors below - please help? thanks!
 
 
 
/*compute counts from columns A/B ; save headers for loop*/
 
proc contents data=work.widefile18 Out=work.headers order=VARNUM;
run;
 
/*process first question: COLA and COLB*/
proc sql; 
CREATE TABLE HEADERNAME AS SELECT VARNUM, NAME FROM work.headers 
WHERE (VARNUM in (4,5 ));
%let resultcount=&sqlobs;
QUIT;
 
 
/*compute counts*/
%MACRO LOOP2();
proc sql;
select NAME into :COLA from HEADERNAME where VARNUM=1;
select NAME into :COLB from HEADERNAME where VARNUM=2;
quit;
data collab;
set work.widefile18;
count=1*vvaluex(COLA); output;
count=1*vvaluex(COLB); output;
run;%MEND;
 
 
/*loop COLA to COLB datastep */
%global i;
do i = 1 to 2;
DATA HEADERNAME;
SET HEADERNAME;
  
CALL EXECUTE("%LOOP2("||TRIM("sample &i",&COLA,&COLB)||")");
proc freq data=work.collab;
by count;
tables sample*COLA /chisq;
run; 
 
%let i=&i+1;
 
end;*/
RUN;

 

9 REPLIES 9
PaigeMiller
Diamond | Level 26

Prefer to use proc sql to process two columns at a time (q# A and B) but difficult to get proc freq to accept wide flat dataset because proc freq prefers a transpose format like below?

 

Use PROC TRANSPOSE. It is the right tool for the job. Then you don't have to use  complications like macros and CALL EXECUTE.

--
Paige Miller
hermanchen
Fluorite | Level 6

Hi,

the flatwide dataset (2n)+1 columns.  after proc transpose, the narrow dataset is now (2n) x 3 columns like below.  COL2 also transposed to have multi-values from just binomial value.  How do I specify only 2x2 crosstabulation (COL1 and COL2) in my code:


proc freq data=narrowdata;
weight COL3;
tables COL1*COL2 / noPercent noRow nocol ;

run;

 

 

transposed dataset:

1,col1a,50%
 
1,col1b,50%
 
2,col1a,20%
 
2,col1b,80%
 
...
 
n,colna,0%
 
n,colnb,100%
PaigeMiller
Diamond | Level 26

What's wrong with the code you show here?

--
Paige Miller
hermanchen
Fluorite | Level 6

the problem is transposed COL2 has now multi-values containing the flatwide file's headers: question 1A and 2B, question 2A and 2B,..., question 255A and 255B.

Then the chisq output is a 255x255 table rather then 255 2x2 tables that I am looking for.

I am looking for proc freq code that would do something like:

 

for i = 1 to n questions

  create itable select row i, i+1 from narrowdata where sampleno in (1,2)

  proc freq data=itable ; tables itables[COL1]*itable[COL2] /weight=COL3 chisq;

end;

 

I understand the tables statements can process 2x2 tables however narrowdata is in nxn dataset and proc freq has no option to tell it to process every 2 rows by COL1 id, therefore, loop is required?

PaigeMiller
Diamond | Level 26

It would be very helpful if you could provide your data using these instructions:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

 

--
Paige Miller
hermanchen
Fluorite | Level 6

Ok but how do I create a dummy variable as a class to create 2x2 tables of my narrowdata - syntax error in data steps , incorrectly, creates empty qtype column?

 

/*create dummy variable by n question type yes/no proportions*/
data narrowfile2;
set narrowfile1;
by sampleno;
retain qtype 1 ;
retain rowcount 1;
if (mod(qtype,2)=1) then do qtype=rowcount; output; end; else do qtype=qtype; rowcount+1; output; end;
run;

hermanchen
Fluorite | Level 6

Hi Mr. Miller - I ran the macro to generate my example data but the macro didn't quite run/convert my data.  I edited but the example should still apply - how do I transpose and create a dummy variable (for the multi-valued tranposed columns) as a class variable?

Note: will need to DROP RATE 

 

Thanks!

 

 

data WORK.mydata;
infile datalines dsd truncover dlm=',';
input DBN:$6. Name:$6. Rate:BEST. _1a_:BEST. E:BEST. _1b_:BEST. G:BEST. _1c_:BEST. I:BEST. _1d_:BEST. K:BEST. _1e_:BEST. M:BEST.;
label DBN="DBN" Name="Name" Rate="Total Response Rate" _1a_="1a. How many" E="E" _1b_="1b. How many" G="G" _1c_="1c. How many" I="I" _1d_="1d. How many" K="K" _1e_="1e. How many" M="M";
datalines;
01M539,TECHNO,95,41,59,6,94,29,71,22,78,6,94
30Q300,THE_30,100,19,81,12,88,15,85,23,77,4,96
;

art297
Opal | Level 21

@hermanchen  Based on what you've posted, it's difficult to understand exactly what you're trying to accomplish.

 

However, that said, I think that the Untranspose macro a group of us published last year will get you on the right track. You can download the paper and macro from: http://www.sascommunity.org/wiki/An_Easier_and_Faster_Way_to_Untranspose_a_Wide_File

 

Using that macro, given how I interpreted what you might be looking for, I ran the following code:

 

data have;
  infile cards dlm=',';
  informat col1a col1b col2a col2b col3a col3b percent4.;
  input sample_no col1a col1b col2a col2b col3a col3b;
  cards;
1, 50%,50%,20%,80%,0%,100%
2, 0%,100%,80%,20%,50%,50%
;

%untranspose(data=have, out=want, by=sample_no, id=type,
 id_informat=$1.,var=col1-col3, makelong=yes)

proc sort data=want;
  by sample_no _name_;
run;

proc freq data=want ;
  tables type*_value_/chisq;
  by sample_no _name_;
run;

Hope that helps,

Art, CEO, AnalystFinder.com

 

hermanchen
Fluorite | Level 6

Hi Art, this looks promising - will look into untranspose

Thanks!