Hello, I need your help.
I have a SAS dataset that looks like this
documentation_invalid, documentation_vv,occupancy_invalid,occupancy_vv,property_invalid,property_vv,purpose_invalid,purpose_vv
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",1,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",1,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
1,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",1,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
I want to summarize the dataset like this what do I need to do?
documentation_invalid, documentation_vv,occupancy_invalid,occupancy_vv,property_invalid,property_vv,purpose_invalid,purpose_vv
1,"FULL, SUB, VOI/VOA, NODOC, QUICK",1,"OWNER, 2NDHOME, INVESTOR",1,"SF, MF, TOWNHOUSE, CONDO",1,"PURCHASE, CASHOUT, RATE/TERM"
Another solution:
ODS OUTPUT ONEWAYFREQS=WORK.OWF (DROP = F_: RENAME = (FREQUENCY = COUNT));
PROC FREQ DATA=HAVE;
BY DOCUMENTATION_VV OCCUPANCY_VV PROPERTY_VV PURPOSE_VV;
TABLES DOCUMENTATION_INVALID OCCUPANCY_INVALID PROPERTY_INVALID PURPOSE_INVALID /NOCUM NOPERCENT ;
RUN;
ODS OUTPUT CLOSE;
DATA TRANSFORM (KEEP = VAL_INVAL VARIABLE VALID_VALUES DOCUMENTATION_VV COUNT);
LENGTH VAL_INVAL $7. VARIABLE $21.;
VAL_INVAL = "VALID";
SET OWF;
BY TABLE;
IF DOCUMENTATION_INVALID = 1 THEN VAL_INVAL = "INVALID";
ELSE IF OCCUPANCY_INVALID = 1 THEN VAL_INVAL = "INVALID";
ELSE IF PROPERTY_INVALID = 1 THEN VAL_INVAL = "INVALID";
ELSE IF PURPOSE_INVALID = 1 THEN VAL_INVAL = "INVALID";
VARIABLE = SCAN(TABLE,-1);
IF VARIABLE = "DOCUMENTATION_INVALID" THEN VALID_VALUES = DOCUMENTATION_VV;
ELSE IF VARIABLE = "OCCUPANCY_INVALID" THEN VALID_VALUES = OCCUPANCY_VV;
ELSE IF VARIABLE = "PROPERTY_INVALID" THEN VALID_VALUES = PROPERTY_VV;
ELSE IF VARIABLE = "PURPOSE_INVALID" THEN VALID_VALUES = PURPOSE_VV;
OUTPUT;
IF FIRST.TABLE THEN TOTAL = COUNT;
ELSE TOTAL+COUNT;
IF LAST.TABLE THEN DO;
COUNT = TOTAL;
VAL_INVAL = "TOTAL";
OUTPUT;
END;
RUN;
PROC TRANSPOSE DATA = TRANSFORM OUT=TRANS (DROP = DOCUMENTATION_VV _:) SUFFIX=_COUNT;
ID VAL_INVAL ;
BY VARIABLE VALID_VALUES DOCUMENTATION_VV NOTSORTED;
VAR COUNT;
RUN;
What do you want your summary to look like?
proc sql;
select max(documentation_invalid), documentation_vv, max(occupancy_invalid), occupancy_vv, max(property_invalid), property_vv, max(purpose_invalid), purpose_vv
from have
group by documentation_vv, occupancy_vv, property_vv, purpose_vv;
quit;
you could use sum instead of max if you would want to count the number of time a row was "invalid" for your group instead of just figuring out if it ever was. It depends on what the data represents and what you ought to do.
Vince
Reeza,
Here is how I want the summary to look like
Varialble total_count valid_count invalid_count valid_values
documentation_invalid 6 5 1 "FULL, SUB, VOI/VOA, NODOC, QUICK"
occupancy_invalid 6 5 1 "OWNER, 2NDHOME, INVESTOR"
property_invalid 6 5 1 SF, MF, TOWNHOUSE, CONDO"
purpose_invalid 6 5 1 "PURCHASE, CASHOUT, RATE/TERM"
In the sample of desired output above, you have a column for valid_values. Does your data have more than one non-unique value for "valid_values" for a given crossing?
For example, are there values other than "FULL, SUB, VOI/VOA, NODOC, QUICK" for the row "documentation_invalid"? (I realize that your sample data only has one unique value for each proposed crossing, but actual data may not).
This works, assuming only one possible valid_value for each crossing of Variable (as per your desired output).
data have ;
infile datalines dlm=',' dsd;
informat documentation_vv occupancy_vv property_vv purpose_vv $50.;
input documentation_invalid documentation_vv $ occupancy_invalid occupancy_vv $ property_invalid property_vv $ purpose_invalid purpose_vv $ ;
datalines ;
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",1,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",1,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
1,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",1,"PURCHASE, CASHOUT, RATE/TERM"
0,"FULL, SUB, VOI/VOA, NODOC, QUICK",0,"OWNER, 2NDHOME, INVESTOR",0,"SF, MF, TOWNHOUSE, CONDO",0,"PURCHASE, CASHOUT, RATE/TERM"
;
;;;;
/* transpose numeric variables */
proc transpose data=have out=counts name=Variable;
run ;
/* cleanup */
data counts ;
set counts ;
drop col1-col6;
tot_count = n ( of col1-col6 );
valid_count = sum ( of col1-col6 ) ;
invalid_count = tot_count - valid_count ;
run ;
/* transpose character vars */
proc transpose data=have out=valid_values name=Variable;
var documentation_vv occupancy_vv property_vv purpose_vv ;
run ;
/* cleanup */
data valid_values ;
set valid_values ;
drop col2-col6;
rename col1=valid_values;
run ;
/* merge */
proc sql ;
create table want as
select t1.*, t2.valid_values
from counts t1
, valid_values t2
where substr ( t1.variable , 1, 8 ) = substr ( t2.variable , 1, 8 )
;
quit;
Message was edited by: Michael McCormick
Another solution:
ODS OUTPUT ONEWAYFREQS=WORK.OWF (DROP = F_: RENAME = (FREQUENCY = COUNT));
PROC FREQ DATA=HAVE;
BY DOCUMENTATION_VV OCCUPANCY_VV PROPERTY_VV PURPOSE_VV;
TABLES DOCUMENTATION_INVALID OCCUPANCY_INVALID PROPERTY_INVALID PURPOSE_INVALID /NOCUM NOPERCENT ;
RUN;
ODS OUTPUT CLOSE;
DATA TRANSFORM (KEEP = VAL_INVAL VARIABLE VALID_VALUES DOCUMENTATION_VV COUNT);
LENGTH VAL_INVAL $7. VARIABLE $21.;
VAL_INVAL = "VALID";
SET OWF;
BY TABLE;
IF DOCUMENTATION_INVALID = 1 THEN VAL_INVAL = "INVALID";
ELSE IF OCCUPANCY_INVALID = 1 THEN VAL_INVAL = "INVALID";
ELSE IF PROPERTY_INVALID = 1 THEN VAL_INVAL = "INVALID";
ELSE IF PURPOSE_INVALID = 1 THEN VAL_INVAL = "INVALID";
VARIABLE = SCAN(TABLE,-1);
IF VARIABLE = "DOCUMENTATION_INVALID" THEN VALID_VALUES = DOCUMENTATION_VV;
ELSE IF VARIABLE = "OCCUPANCY_INVALID" THEN VALID_VALUES = OCCUPANCY_VV;
ELSE IF VARIABLE = "PROPERTY_INVALID" THEN VALID_VALUES = PROPERTY_VV;
ELSE IF VARIABLE = "PURPOSE_INVALID" THEN VALID_VALUES = PURPOSE_VV;
OUTPUT;
IF FIRST.TABLE THEN TOTAL = COUNT;
ELSE TOTAL+COUNT;
IF LAST.TABLE THEN DO;
COUNT = TOTAL;
VAL_INVAL = "TOTAL";
OUTPUT;
END;
RUN;
PROC TRANSPOSE DATA = TRANSFORM OUT=TRANS (DROP = DOCUMENTATION_VV _:) SUFFIX=_COUNT;
ID VAL_INVAL ;
BY VARIABLE VALID_VALUES DOCUMENTATION_VV NOTSORTED;
VAR COUNT;
RUN;
Thanks everyone.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.