BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zqkal
Obsidian | Level 7

Hello, I need your help.

I have a SAS dataset that looks like this

documentation_invalid, documentation_vv,occupancy_invalid,occupancy_vv,property_invalid,property_vv,purpose_invalid,purpose_vv

0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",1,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"

0,"FULL, SUB, VOI/VOA, NODOC, QUICK",1,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"

0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"

0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"

1,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",1,"PURCHASE, CASHOUT, RATE/TERM"

0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"

I want to summarize the dataset like this what do I need to do?

documentation_invalid, documentation_vv,occupancy_invalid,occupancy_vv,property_invalid,property_vv,purpose_invalid,purpose_vv

1,"FULL, SUB, VOI/VOA, NODOC, QUICK",1,"OWNER, 2NDHOME, INVESTOR",1,"SF, MF, TOWNHOUSE, CONDO",1,"PURCHASE, CASHOUT, RATE/TERM"

1 ACCEPTED SOLUTION

Accepted Solutions
Scott_Mitchell
Quartz | Level 8

Another solution:

ODS OUTPUT ONEWAYFREQS=WORK.OWF (DROP = F_: RENAME = (FREQUENCY = COUNT));

PROC FREQ DATA=HAVE;

  BY DOCUMENTATION_VV OCCUPANCY_VV PROPERTY_VV PURPOSE_VV;

  TABLES DOCUMENTATION_INVALID OCCUPANCY_INVALID PROPERTY_INVALID PURPOSE_INVALID /NOCUM NOPERCENT ;

RUN;

ODS OUTPUT CLOSE;

DATA TRANSFORM (KEEP = VAL_INVAL VARIABLE VALID_VALUES DOCUMENTATION_VV COUNT);

  LENGTH  VAL_INVAL $7. VARIABLE $21.;

  VAL_INVAL = "VALID";

  SET OWF;

  BY TABLE;

  IF DOCUMENTATION_INVALID = 1 THEN VAL_INVAL = "INVALID";

  ELSE IF OCCUPANCY_INVALID = 1 THEN VAL_INVAL = "INVALID";

  ELSE IF PROPERTY_INVALID = 1 THEN VAL_INVAL = "INVALID";

  ELSE IF PURPOSE_INVALID = 1 THEN VAL_INVAL = "INVALID";

  VARIABLE = SCAN(TABLE,-1);

  IF VARIABLE = "DOCUMENTATION_INVALID" THEN VALID_VALUES = DOCUMENTATION_VV;

  ELSE IF VARIABLE = "OCCUPANCY_INVALID" THEN VALID_VALUES = OCCUPANCY_VV;

  ELSE IF VARIABLE = "PROPERTY_INVALID" THEN VALID_VALUES = PROPERTY_VV;

  ELSE IF VARIABLE = "PURPOSE_INVALID" THEN VALID_VALUES = PURPOSE_VV;

  OUTPUT;

  IF FIRST.TABLE THEN TOTAL = COUNT;

    ELSE TOTAL+COUNT;

  IF LAST.TABLE  THEN DO;

    COUNT = TOTAL;

    VAL_INVAL = "TOTAL";

    OUTPUT;

  END;

RUN;

PROC TRANSPOSE DATA = TRANSFORM OUT=TRANS (DROP = DOCUMENTATION_VV _:) SUFFIX=_COUNT;

  ID VAL_INVAL ;

  BY VARIABLE VALID_VALUES DOCUMENTATION_VV NOTSORTED;

  VAR COUNT;

RUN;

View solution in original post

7 REPLIES 7
Reeza
Super User

What do you want your summary to look like?

Vince28_Statcan
Quartz | Level 8

proc sql;

     select max(documentation_invalid), documentation_vv, max(occupancy_invalid), occupancy_vv, max(property_invalid), property_vv, max(purpose_invalid), purpose_vv

     from have

     group by documentation_vv, occupancy_vv, property_vv, purpose_vv;

quit;

you could use sum instead of max if you would want to count the number of time a row was "invalid" for your group instead of just figuring out if it ever was. It depends on what the data represents and what you ought to do.

Vince

zqkal
Obsidian | Level 7


Reeza,

Here is how I want the summary to look like

Varialble                                   total_count    valid_count        invalid_count    valid_values

documentation_invalid                 6                          5                              1               "FULL, SUB, VOI/VOA, NODOC, QUICK"

occupancy_invalid                      6                          5                              1                   "OWNER, 2NDHOME, INVESTOR"

property_invalid                          6                           5                              1                    SF, MF, TOWNHOUSE, CONDO"

purpose_invalid                          6                          5                              1                    "PURCHASE, CASHOUT, RATE/TERM"

Fugue
Quartz | Level 8

In the sample of desired output above, you have a column for valid_values. Does your data have more than one non-unique value for "valid_values" for a given crossing?

For example, are there values other than "FULL, SUB, VOI/VOA, NODOC, QUICK" for the row "documentation_invalid"? (I realize that your sample data only has one unique value for each proposed crossing, but actual data may not).

Fugue
Quartz | Level 8

This works, assuming only one possible valid_value for each crossing of Variable (as per your desired output).

data have ;
infile datalines dlm=',' dsd;
informat documentation_vv occupancy_vv property_vv purpose_vv $50.;
input documentation_invalid documentation_vv $ occupancy_invalid occupancy_vv $ property_invalid property_vv $ purpose_invalid purpose_vv $ ;
datalines ;
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",1,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",1,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
1,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",1,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
;
;;;;

/* transpose numeric variables */
proc transpose data=have out=counts name=Variable;
run ;

/* cleanup */

data counts  ;
set counts ;
drop col1-col6;
tot_count = n ( of col1-col6 );
valid_count = sum ( of col1-col6 ) ;
invalid_count = tot_count - valid_count ;
run ;

/* transpose character vars */
proc transpose data=have out=valid_values name=Variable;
var documentation_vv occupancy_vv property_vv purpose_vv ;
run ;

/* cleanup */

data valid_values ;
set valid_values ;
drop col2-col6;
rename col1=valid_values;
run ;

/* merge */
proc sql ;
create table want as
select t1.*, t2.valid_values
from counts t1
  , valid_values t2
where substr ( t1.variable , 1, 8 ) = substr ( t2.variable , 1, 8 )
;
quit;

Message was edited by: Michael McCormick

Scott_Mitchell
Quartz | Level 8

Another solution:

ODS OUTPUT ONEWAYFREQS=WORK.OWF (DROP = F_: RENAME = (FREQUENCY = COUNT));

PROC FREQ DATA=HAVE;

  BY DOCUMENTATION_VV OCCUPANCY_VV PROPERTY_VV PURPOSE_VV;

  TABLES DOCUMENTATION_INVALID OCCUPANCY_INVALID PROPERTY_INVALID PURPOSE_INVALID /NOCUM NOPERCENT ;

RUN;

ODS OUTPUT CLOSE;

DATA TRANSFORM (KEEP = VAL_INVAL VARIABLE VALID_VALUES DOCUMENTATION_VV COUNT);

  LENGTH  VAL_INVAL $7. VARIABLE $21.;

  VAL_INVAL = "VALID";

  SET OWF;

  BY TABLE;

  IF DOCUMENTATION_INVALID = 1 THEN VAL_INVAL = "INVALID";

  ELSE IF OCCUPANCY_INVALID = 1 THEN VAL_INVAL = "INVALID";

  ELSE IF PROPERTY_INVALID = 1 THEN VAL_INVAL = "INVALID";

  ELSE IF PURPOSE_INVALID = 1 THEN VAL_INVAL = "INVALID";

  VARIABLE = SCAN(TABLE,-1);

  IF VARIABLE = "DOCUMENTATION_INVALID" THEN VALID_VALUES = DOCUMENTATION_VV;

  ELSE IF VARIABLE = "OCCUPANCY_INVALID" THEN VALID_VALUES = OCCUPANCY_VV;

  ELSE IF VARIABLE = "PROPERTY_INVALID" THEN VALID_VALUES = PROPERTY_VV;

  ELSE IF VARIABLE = "PURPOSE_INVALID" THEN VALID_VALUES = PURPOSE_VV;

  OUTPUT;

  IF FIRST.TABLE THEN TOTAL = COUNT;

    ELSE TOTAL+COUNT;

  IF LAST.TABLE  THEN DO;

    COUNT = TOTAL;

    VAL_INVAL = "TOTAL";

    OUTPUT;

  END;

RUN;

PROC TRANSPOSE DATA = TRANSFORM OUT=TRANS (DROP = DOCUMENTATION_VV _:) SUFFIX=_COUNT;

  ID VAL_INVAL ;

  BY VARIABLE VALID_VALUES DOCUMENTATION_VV NOTSORTED;

  VAR COUNT;

RUN;

zqkal
Obsidian | Level 7

Thanks everyone.


SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1248 views
  • 0 likes
  • 5 in conversation