Solved: counting values using a loop and code instead of repeating a block of ... - Page 3

HB · Posted 03-27-2012 06:07 PM

" If you could print out about 50 rows from have4, there should be a way to get you what you need here"

Like I say, If I knew how to do that, I would. Here's trying.

Okay. Reset.

This code runs without errors::

title ' ';
options nodate nonumber;

data have;
input entity $ con1 $ con2 con3 $;
datalines;
a y 1 d
a y 2 i
a y 3 d
a y 1 d
a n 2 d
a n 3 d
b y 1 di
b y 1 di
b n 1 ii
b n 3 i
b n 3 .
b n 3 .
c y 2 .
c n 2 .
c y 3 di
c n 3 .
d y 1 di
d y 2 ii
d y 3 i
d n 2 i
d y 1 d
d n 2 d
d y 3 .
d n 1 i
;

ODS LISTING CLOSE;
ods output table=have1;

proc tabulate data=have;
class entity con1 con2 con3;
table entity ,
con1 con2 con3;
run;

ods output close;

ODS LISTING;

proc sort data=have1 out=have2 (keep=entity con1-con3 n);
by entity;
run;

data have3 (drop=con2 rename=con4=con2);
set have2;
*con4=put (con2,1.);
length con4 $5;
con4 = left(put(con2,5.));
run;

data have4;
set have3;
array v con1-con3;
array vid $8. id1-id3;
do i=1 to dim(v);
if not missing(v(i)) then
vid(i)=catx('_',vname(v(i)),v(i));
end;
id=coalescec(id1,id2,id3);
run;

ods html body="have4.htm";
proc print;
run;

options missing="";

proc transpose data=have4 out=want (drop=_:) ;
by entity;
var n;
id id;
run;

proc print;
run;

However, the output it makes is wrong and clearly affected by the missing values.

The output is here:

http://momarda.brinkster.net/have4.htm

con1_y should be 4,2,2,5 not 4,2,1,4 and con1_n should be 2,4,2,3 not 2,2,,3.

Cleary for con3_d 5, , ,2 is close to 5,0,0,2 and for con3_d 1,1, ,3 is close to 1,1,0,3.

Okay.

Haikuo · Posted 03-27-2012 07:19 PM

Someone from SI please come in.

Sorry for dropping off the grid, tranffic was really bad today.

This is SOOO ODD!

If you take out con3,

proc tabulate data=have;

class entity con1 con2 ;

table entity ,

con1 con2;

run;

Then number is right.

And if you add con3 on:

proc tabulate data=have;

class entity con1 con2 con3;

table entity ,

con1 con2 con3;

run;

Then number is off. What is going on? Anyone from SI, please ?!

Haikuo

Haikuo · Posted 03-27-2012 08:07 PM

Hooray! I should have RTFM! All we have been missing is the 'missing' option, without which, SAS treats it differently, if it is one of the class variables.

So here is the working version, at least on my computer, SAS 9.2, winxp.

options missing='';

data have;

input entity $ con1 $ con2 con3 $;

datalines;

a y 1 d

a y 2 i

a y 3 d

a y 1 d

a n 2 d

a n 3 d

b y 1 di

b n 1 ii

b n 3 i

b n 3 .

c y 2 .

c n 2 .

c y 3 di

c n 3 .

d y 1 di

d y 2 ii

d y 3 i

d n 2 i

d y 1 d

d n 2 d

d y 3 .

d n 1 i

;

ODS LISTING CLOSE;

ods output table=have1;

proc tabulate data=have missing;

class entity con1 con2 con3;

table entity ,

con1 con2 con3 ;

run;

ods output close;

ODS LISTING;

proc sort data=have1 out=have2 (keep=entity con1-con3 n);

by entity;

run;

data have3 (drop=con2 rename=con4=con2);

set have2;

*con4=put (con2,1.);

length con4 $5;

con4 = left(put(con2,5.));

run;

data have4;

set have3;

array v con1-con3;

array vid $8. id1-id3;

do i=1 to dim(v);

if not missing(v(i)) then

vid(i)=catx('_',vname(v(i)),v(i));

end;

id=coalescec(id1,id2,id3);

if missing(id) then delete;

run;

options missing=0;

proc transpose data=have4 out=want (drop=_:) ;

by entity;

var n;

id id;

run;

proc print;

run;

Reeza · Posted 03-27-2012 04:15 PM

You could make labels that would clean it up in a separate datastep then print that file to excel directly.

If you wanted additional formatting (ie highlighting rows or bold fonts) you could do so using Tagsets.ExcelXP

Tom · Posted 03-27-2012 09:01 PM

Here is one way to do it.

Use SUMMARY to count the different levels.

Use a data step to generate the variable name/ label.

Then transpose to get the desired result.

You might get confusion if the distinct values of a variable yield duplicate variable labels because of use of CATX function.

Note: Edited to provide cleaner version of code.

*----------------------------------------------------------------------;
* Example data ;
*----------------------------------------------------------------------;
data have;
  input entity $ con1 $ con2 con3 $ @@;
cards;
a y 1 d a y 2 i a y 3 d a y 1 d a n 2 d a n 3 d b y 1 di b y 1 di b n 1 ii
b n 3 i b n 3 .  b n 3 .  c y 2 .  c n 2 .  c y 3 di c n 3 .  d y 1 di d y
2 ii d y 3 i d n 2 i d y 1 d d n 2 d d y 3 .  d n 1 i
;
*----------------------------------------------------------------------;
* Set user options in macro variables (first step to creating a macro) ;
*----------------------------------------------------------------------;
%let byvars=entity;
%let varlist=con1-con3;
%let inds=have;
%let outds=want;
%let ncols=1;
*----------------------------------------------------------------------;
* Get the variable names ;
*----------------------------------------------------------------------;
proc transpose data=&inds(obs=0) out=names;
  var &varlist ;
run;
*----------------------------------------------------------------------;
* Get the counts ;
*----------------------------------------------------------------------;
proc summary data=&inds missing chartype descendtypes;
  class &varlist &byvars;
  types (&varlist)*%sysfunc(tranwrd(%sysfunc(compbl(&byvars)),%str( ),*));
  output out=summary ;
run;
*----------------------------------------------------------------------;
* Find variable name based on _TYPE_ variable from PROC SUMMARY ;
* Define _LABEL_ as combinaiton of variable name and value ;
* Define _NAME_ as COL1,.... ;
*----------------------------------------------------------------------;
data summary ;
  set summary end=eof;
  by descending _type_ &varlist;
  index=find(_type_,'1');
  set names point=index ;
  length _label_ $256 ;
  _label_=catx('-',_name_,vvaluex(_name_));
  if find(vvaluex('first.'||_name_),'1') then col+1;
  _name_=cats('col',col);
  if eof then call symputx('ncols',col);
run;
*----------------------------------------------------------------------;
* Re sort by &byvars ;
*----------------------------------------------------------------------;
proc sort data=summary ;
  by &byvars ;
run;
*----------------------------------------------------------------------;
* Transpose;
*----------------------------------------------------------------------;
proc transpose data=summary out=&outds(drop=_name_);
  by &byvars;
  var _freq_;
run;
*----------------------------------------------------------------------;
* Convert missings to zeros. Fix order of variables. ;
*----------------------------------------------------------------------;
data &outds ;
  retain &byvars col1-col&ncols;
  set &outds ;
  array _col_ col1-col&ncols;
  do over _col_ ; _col_=max(0,_col_); end;
run;
*----------------------------------------------------------------------;
* Print using labels ;
*----------------------------------------------------------------------;
proc print data=&outds width=min label;
  id &byvars;
  var col1-col&ncols;
run;

entity con1-n con1-y con2-1 con2-2 con2-3 con3 con3-d con3-di con3-i con3-ii

a 2 4 2 2 2 0 5 0 1 0

b 4 2 3 0 3 2 0 2 1 1

c 2 2 0 2 2 3 0 1 0 0

d 3 5 3 3 2 1 2 1 3 1

Ksharp · Posted 03-28-2012 01:58 AM

How about:

data have;
input entity $ con1 $ con2 con3 $;
datalines;
a y 1 d
a y 2 i
a y 3 d
a y 1 d
a n 2 d
a n 3 d
b y 1 di
b y 1 di
b n 1 ii
b n 3 i
b n 3 .
b n 3 .
c y 2 .
c n 2 .
c y 3 di
c n 3 .
d y 1 di
d y 2 ii
d y 3 i
d n 2 i
d y 1 d
d n 2 d
d y 3 .
d n 1 i
;
run;
data temp(keep=entity name );
 set have;
 length name  $ 10;
  name=catx('_','con1',con1); output;
  name=catx('_','con2',con2); output;
  name=catx('_','con3',con3); output;
run;
proc freq data=temp noprint;
 tables entity*name/list out=x(drop=percent) nocum nopercent;
run;
options missing=0;
proc transpose data=x out=want(drop=_:);
 by entity;
 id name;
 var count;
run;

Ksharp

Haikuo · Posted 03-28-2012 07:41 AM

Ksharp, Slick!!! Thank you sharing! I knew you are gonna bring something for us.

HB, if you have awful a lot of variables, just use some Array, you will need two arrays if you have both character and numeric variables.

Haikuo

Tom · Posted 03-28-2012 08:06 AM

Very neat.

Two issues with this method.

First is limit on length of the generated variable NAME. It can be any length for the PROC FREQ, but needs to be limited to $32 to use as the ID field in the PROC TRANSPOSE.

Second is the need to replicated the lines with the OUTPUT statement for each analysis variable.

You might be able to generate them using macro logic.

Or perhaps with a data step loop using the CALL VNEXT function.

Ksharp · Posted 03-28-2012 10:07 PM

Tom.

For your first question, I have nothing to say, only OP can judge it.

For your second question, As HaiKuo said, That will be very easy, if I used ARRAY + vname() to avoid hard coding.

By the way, TOM. Actually I like the code you wrote very much.

The performace about you is just like another Hall of fame (i.e. data _null_;) ,you both are likely almost.

Please receive my respect for you.

Ksharp

HB · Posted 03-29-2012 10:37 AM

"For your first question, I have nothing to say, only OP can judge it."

The OP only needs some sort of header name to serve as a name for a mail merge field. 32 characters is plenty to work with in this case. No magic meaning in the names. I can see where 32 might be a limit in some cases, but here it is okay.

Regarding the hard coding,. the OP is also an extremely inexperienced SAS programmer and although he understands that variable and loops might be more elegant (and indeed asked for on in the thread title) he in many ways almost prefers hard coding as it helps him see what is going on.

KSharp: At the moment I am trying to make your programming work. I can't follow it all, but I seem to be following it well enough t translate my actual data in. It's going well. I have hopes that with a combo of your stuff, Astounding's output stuff, and Hai.kou's excellent work I may make this work.

Off to mush some more.

Astounding · Posted 03-28-2012 10:11 AM

HB,

You've gotten a lot of good suggestions here. In my view, here's where things stand.

Ksharp's approach looks most workable, because it is PROC FREQ-based. You had mentioned along the way that you might want some percentages instead of counts, where the percentage calculations omit missing values. That's exactly how PROC FREQ works. That would require significant changes to Ksharp's current version, but he (as well as several other posters) would be quite capable of doing that: (1) sorting by ENTITY, (2) running PROC FREQ BY ENTITY, (3) selecting which variables require percentages and which require counts (you would provide the rules on that one), (4) if needed, adjusting percentages from a scale of 0 to 100 to a scale of 0 to 1, (5) reconstructing NAME, (6) reshaping the results so they are ready for PROC PRINT, and (7) getting the PROC PRINT output into Excel-ready format. I know the steps are beyond your own SAS knowledge, but that would be taken care of for you. You would need to supply the rules about which variables require percentages and which require counts. The final PROC PRINT might need to add a VAR statement to group variables appropriately, such as:

var con1_: con2_: con3_:;

Down the road, you would have to watch out for new CON variables that take on unusual values. There could be a problem with the automation of variable names (the column headings), since SAS names are limited to letters, numbers, and underscores. If CON20 takes on values of "+" and "-", SAS would attempt to convert both values to CON20__ since it can't use CON20_+ and CON20_-. You would need to preprocess the data using statements that assign letters/numbers/underscores, such as:

if CON20='+' then CON20='plus';

else if CON20='-' then CON20='minus';

At this point, it appears the ball is in your court. Given the results so far, what are the final specifications?

Good luck.

HB · Posted 03-28-2012 10:54 AM

Jiminey crickets! Go home and go to your kid's band concert and your whole thread explodes. Goodness.

Okay. I am going to mush on this.

Astounding you have been astounding. Thanks.

I'm going to hand out some corrects and helpfuls after I mush and translate.

HB · Posted 03-29-2012 03:55 PM

Okay.

Ksharp swoops in and steals the correct answer designation becasue I could make that code work and modify it to do what I wanted.

Hai.kou gets a helpful for the code and remembering that I needed it as an output file and putting me on the trail of ODS.

Astounding gets a helpful for extreme responsiveness, stick-to-it-ness, and nice-ness.

I crafted a solution that worked and in the process found another solution that worked.

Still haven't solved another problem but I think I will open another thread for that.

Thanks much to all.

Re: counting values using a loop and code instead of repeating a block of sql

Re: counting values using a loop and code instead of repeating a block of sql

Re: counting values using a loop and code instead of repeating a block of sql

counting values using a loop and code instead of repeating a block of sql

Re: counting values using a loop and code instead of repeating a block of sql

Re: counting values using a loop and code instead of repeating a block of sql

Re: counting values using a loop and code instead of repeating a block of sql

Re: counting values using a loop and code instead of repeating a block of sql

counting values using a loop and code instead of repeating a block of sql

counting values using a loop and code instead of repeating a block of sql

counting values using a loop and code instead of repeating a block of sql

counting values using a loop and code instead of repeating a block of sql

counting values using a loop and code instead of repeating a block of sql

Classroom Training Available!