DATA Step, Macro, Functions and more

DATASTEP summary

Contributor
Posts: 47

DATASTEP summary

Hello, I need your help.

I have a SAS dataset that looks like this:

documentation_invalid,documentation_vv,occupancy_invalid,occupancy_vv,property_invalid,property_vv,purpose_invalid,purpose_vv
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",1,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",1,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
1,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",1,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"

I want to summarize the dataset like this. What do I need to do?

documentation_invalid,documentation_vv,occupancy_invalid,occupancy_vv,property_invalid,property_vv,purpose_invalid,purpose_vv
1,"FULL, SUB, VOI/VOA, NODOC, QUICK",1,"OWNER, 2NDHOME, INVESTOR",1,"SF, MF, TOWNHOUSE, CONDO",1,"PURCHASE, CASHOUT, RATE/TERM"


All Replies
Super User
Posts: 19,772

Re: DATASTEP summary

What do you want your summary to look like?

Super Contributor
Posts: 339

Re: DATASTEP summary

proc sql;

     select max(documentation_invalid), documentation_vv, max(occupancy_invalid), occupancy_vv, max(property_invalid), property_vv, max(purpose_invalid), purpose_vv

     from have

     group by documentation_vv, occupancy_vv, property_vv, purpose_vv;

quit;

You could use SUM instead of MAX if you want to count the number of times a row was "invalid" within your group, instead of just checking whether it ever was. Which one to use depends on what the data represents and what you need to report.

Vince
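A minimal sketch of the SUM variant mentioned above, assuming the dataset HAVE with the columns shown in the question (for example, built with the DATA step in a later reply). COUNT(*) is an addition not in the original query, included so each group also reports its row count; the table name SUMMARY_SUM and the *_count aliases are illustrative only.

```sas
/* SUM variant: counts how many rows in each group were flagged invalid. */
/* Assumes HAVE exists with the columns shown in the question.          */
/* Table name summary_sum and the column aliases are illustrative.      */
proc sql;
  create table summary_sum as
  select count(*)                   as total_count,
         sum(documentation_invalid) as documentation_invalid_count,
         documentation_vv,
         sum(occupancy_invalid)     as occupancy_invalid_count,
         occupancy_vv,
         sum(property_invalid)      as property_invalid_count,
         property_vv,
         sum(purpose_invalid)       as purpose_invalid_count,
         purpose_vv
    from have
    group by documentation_vv, occupancy_vv, property_vv, purpose_vv;
quit;
```

With the six sample rows, this gives one group with total_count 6 and each *_invalid_count equal to 1.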

Contributor
Posts: 47

Re: DATASTEP summary


Reeza,

Here is what I want the summary to look like:

Variable               total_count  valid_count  invalid_count  valid_values
documentation_invalid  6            5            1              "FULL, SUB, VOI/VOA, NODOC, QUICK"
occupancy_invalid      6            5            1              "OWNER, 2NDHOME, INVESTOR"
property_invalid       6            5            1              "SF, MF, TOWNHOUSE, CONDO"
purpose_invalid        6            5            1              "PURCHASE, CASHOUT, RATE/TERM"

Super Contributor
Posts: 307

Re: DATASTEP summary

In the sample of desired output above, you have a column for valid_values. Can your data have more than one distinct value of "valid_values" for a given crossing?

For example, are there values other than "FULL, SUB, VOI/VOA, NODOC, QUICK" for the row "documentation_invalid"? (I realize that your sample data only has one unique value for each proposed crossing, but actual data may not).
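One way to answer this against the real data is to count the distinct strings in each *_vv column. This is a sketch, assuming HAVE has the columns from the question; the table name VV_CHECK and the n_* aliases are illustrative.

```sas
/* Count distinct valid-value strings per flag across the whole table.  */
/* A count above 1 means one valid_values per Variable is not enough.   */
/* Assumes HAVE exists; vv_check and the n_* names are illustrative.    */
proc sql;
  create table vv_check as
  select count(distinct documentation_vv) as n_documentation_vv,
         count(distinct occupancy_vv)     as n_occupancy_vv,
         count(distinct property_vv)      as n_property_vv,
         count(distinct purpose_vv)       as n_purpose_vv
    from have;
quit;
```

For the sample data every count is 1; note that COUNT(DISTINCT ...) ignores missing values.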

Super Contributor
Posts: 307

Re: DATASTEP summary

This works, assuming there is only one possible valid_values string for each Variable (as in your desired output).

data have ;
infile datalines dlm=',' dsd;
informat documentation_vv occupancy_vv property_vv purpose_vv $50.;
input documentation_invalid documentation_vv $ occupancy_invalid occupancy_vv $ property_invalid property_vv $ purpose_invalid purpose_vv $ ;
datalines ;
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",1,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",1,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
1,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",1,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
;
;;;;

/* transpose numeric variables */
proc transpose data=have out=counts name=Variable;
run ;

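/* cleanup: COL1, COL2, ... hold each original row's 0/1 flag */
data counts ;
set counts ;
tot_count = n ( of col: ) ;            /* rows with a non-missing flag */
invalid_count = sum ( of col: ) ;      /* a flag of 1 means invalid */
valid_count = tot_count - invalid_count ;
drop col: ;
run ;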

/* transpose character vars */
proc transpose data=have out=valid_values name=Variable;
var documentation_vv occupancy_vv property_vv purpose_vv ;
run ;

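/* cleanup: keep the first row's string as valid_values */
data valid_values ;
set valid_values ;
keep variable col1 ;   /* KEEP uses the names before RENAME is applied */
rename col1=valid_values;
run ;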

/* merge */
proc sql ;
create table want as
select t1.*, t2.valid_values
from counts t1
  , valid_values t2
where substr ( t1.variable , 1, 8 ) = substr ( t2.variable , 1, 8 )
;
quit;

Message was edited by: Michael McCormick
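For comparison, the same layout can be sketched in a single PROC SQL step with one UNION ALL branch per flag. This is not from the thread; it assumes HAVE as created above with flags coded 0/1, and WANT_SQL is an illustrative table name. It relies on PROC SQL evaluating a comparison such as (x = 0) to 1 or 0.

```sas
/* One UNION ALL branch per flag; each branch groups by its _vv column. */
/* Assumes 0/1 flags in HAVE; want_sql is an illustrative table name.   */
/* In PROC SQL, a comparison such as (x = 0) evaluates to 1 or 0.        */
proc sql;
  create table want_sql as
  select 'documentation_invalid' as Variable length=21,
         count(*)                       as total_count,
         sum(documentation_invalid = 0) as valid_count,
         sum(documentation_invalid = 1) as invalid_count,
         documentation_vv               as valid_values
    from have
    group by documentation_vv
  union all
  select 'occupancy_invalid', count(*),
         sum(occupancy_invalid = 0), sum(occupancy_invalid = 1),
         occupancy_vv
    from have
    group by occupancy_vv
  union all
  select 'property_invalid', count(*),
         sum(property_invalid = 0), sum(property_invalid = 1),
         property_vv
    from have
    group by property_vv
  union all
  select 'purpose_invalid', count(*),
         sum(purpose_invalid = 0), sum(purpose_invalid = 1),
         purpose_vv
    from have
    group by purpose_vv;
quit;
```

Because each branch groups by its own *_vv column, a flag with several distinct valid-value strings would produce one row per string instead of silently keeping only the first.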

Solution
08-15-2013 05:58 AM
Super Contributor
Posts: 297

Re: DATASTEP summary

Another solution:

/* Capture the one-way frequency tables as a dataset.                  */
/* BY-group processing requires HAVE to be sorted by the BY variables. */
ODS OUTPUT ONEWAYFREQS=WORK.OWF (DROP = F_: RENAME = (FREQUENCY = COUNT));
PROC FREQ DATA=HAVE;
  BY DOCUMENTATION_VV OCCUPANCY_VV PROPERTY_VV PURPOSE_VV;
  TABLES DOCUMENTATION_INVALID OCCUPANCY_INVALID PROPERTY_INVALID PURPOSE_INVALID /NOCUM NOPERCENT ;
RUN;
ODS OUTPUT CLOSE;

/* One row per flag value (VALID/INVALID) plus a TOTAL row per table */
DATA TRANSFORM (KEEP = VAL_INVAL VARIABLE VALID_VALUES DOCUMENTATION_VV COUNT);
  LENGTH  VAL_INVAL $7. VARIABLE $21.;
  VAL_INVAL = "VALID";
  SET OWF;
  BY TABLE;
  IF DOCUMENTATION_INVALID = 1 THEN VAL_INVAL = "INVALID";
  ELSE IF OCCUPANCY_INVALID = 1 THEN VAL_INVAL = "INVALID";
  ELSE IF PROPERTY_INVALID = 1 THEN VAL_INVAL = "INVALID";
  ELSE IF PURPOSE_INVALID = 1 THEN VAL_INVAL = "INVALID";
  /* TABLE holds text such as "Table <variable name>"; keep the last word */
  VARIABLE = SCAN(TABLE,-1);
  IF VARIABLE = "DOCUMENTATION_INVALID" THEN VALID_VALUES = DOCUMENTATION_VV;
  ELSE IF VARIABLE = "OCCUPANCY_INVALID" THEN VALID_VALUES = OCCUPANCY_VV;
  ELSE IF VARIABLE = "PROPERTY_INVALID" THEN VALID_VALUES = PROPERTY_VV;
  ELSE IF VARIABLE = "PURPOSE_INVALID" THEN VALID_VALUES = PURPOSE_VV;
  OUTPUT;
  /* Running total of COUNT within each table (the sum statement retains TOTAL) */
  IF FIRST.TABLE THEN TOTAL = COUNT;
    ELSE TOTAL+COUNT;
  IF LAST.TABLE  THEN DO;
    COUNT = TOTAL;
    VAL_INVAL = "TOTAL";
    OUTPUT;
  END;
RUN;

/* ID values become VALID_COUNT, INVALID_COUNT and TOTAL_COUNT; _: drops _NAME_ */
PROC TRANSPOSE DATA = TRANSFORM OUT=TRANS (DROP = DOCUMENTATION_VV _:) SUFFIX=_COUNT;
  ID VAL_INVAL ;
  BY VARIABLE VALID_VALUES DOCUMENTATION_VV NOTSORTED;
  VAR COUNT;
RUN;

Contributor
Posts: 47

Re: DATASTEP summary

Thanks everyone.

