I have a dataset as below. I have the condition that if [(given<4) and (taken>=1)] then I have to sum up(collapse cells) for given and taken variables by grouping into company, area, division .
company | area | division | state | given | taken |
xxx | MW | MW_1 | MP | 3 | 7 |
xxx | MW | MW_1 | UP | 8 | 3 |
xxx | E | S_1 | MN | 5 | 4 |
xxx | NE | NE_1 | AP | 2 | 1 |
xxx | NE | NE_1 | TN | 1 | 1 |
xxx | NE | NE_2 | TL | 6 | 2 |
yyy | MW | MW_1 | MP | 3 | 4 |
yyy | MW | MW_1 | OH | 2 | 7 |
The result table should be as below for collapsed 1. I have no clue how to code this in SAS.
company | area | division | geography | given | taken |
xxx | MW | MW_1 | MP_UP | 11 | 10 |
xxx | E | S_1 | MN | 5 | 4 |
xxx | NE | NE_1 | AP_TN | 3 | 2 |
xxx | NE | NE_2 | TL | 6 | 2 |
yyy | MW | MW_1 | MP_OH | 5 | 11 |
Hi @Rock3t4u
Note: Based on your collapsing condition, you should end up with 6 records in the output data set based on your supplied sample!
xxx | NE | NE_1 | AP | 2 | 1 |
xxx | NE | NE_1 | TN | 1 | 1 |
These two records do not meet the [(given<4) and (taken>1)] condition, therefore it shouldn't collapse!
I've attached a modified version of my collapse_data.sas macro, you can download it and save it, then run the following code
/* Create sample data */
data have;
infile datalines dlm="09"x dsd truncover;
input company $ area $ division $ state $ given taken;
datalines;
xxx MW MW_1 MP 3 7
xxx MW MW_1 UP 8 3
xxx E S_1 MN 5 4
xxx NE NE_1 AP 2 1
xxx NE NE_1 TN 1 1
xxx NE NE_2 TL 6 2
yyy MW MW_1 MP 3 4
yyy MW MW_1 OH 2 7
;
run;
/* Ensure the Input data set is sorted by the desired classification variables */
PROC SORT DATA=work.HAVE;
BY company area division state;
RUN;
%include "<SomeDirectory>/collapse_data.sas";
/* Sample macro execution call */
%collapse_data (p_inDsName=work.have
, p_outDsName=work.want
, p_classVars=%STR(company area division state)
, p_prntVar=division
, p_sumedVar=state
, p_groupedVar=Geography
, p_minLimitCondition=%str(given LT 4 AND taken GT 1));
The code should produce the following want data set
company | area | division | given | taken | Geography |
xxx | E | S_1 | 5 | 4 | MN |
xxx | MW | MW_1 | 11 | 10 | UP_MP |
xxx | NE | NE_1 | 2 | 1 | AP |
xxx | NE | NE_1 | 1 | 1 | TN |
xxx | NE | NE_2 | 6 | 2 | TL |
yyy | MW | MW_1 | 5 | 11 | OH_MP |
Hope this helps,
Ahmed
Hi @Rock3t4u
Few years ago I developed a SAS macro to do exactly that 😉
Check it out here Using a SAS® Hash Object to Speed and Simplify the Survey Cell Collapsing Process
Hope this helps,
Ahmed
data have;
infile datalines dlm="09"x dsd truncover;
input company $ area $ division $ state $ given taken;
datalines;
xxx MW MW_1 MP 3 7
xxx MW MW_1 UP 8 3
xxx E S_1 MN 5 4
xxx NE NE_1 AP 2 1
xxx NE NE_1 TN 1 1
xxx NE NE_2 TL 6 2
yyy MW MW_1 MP 3 4
yyy MW MW_1 OH 2 7
;
data want;
set have (rename=(state=_state given=_given taken=_taken));
by company area notsorted division;
length state $20;
retain state;
if first.division
then do;
state = _state;
given = _given;
taken = _taken;
end;
else do;
state = catx("_",state,_state);
given + _given;
taken + _taken;
end;
if last.division;
drop _:;
run;
proc print data=want noobs;
run;
Result:
company area division state given taken xxx MW MW_1 MP_UP 11 10 xxx E S_1 MN 5 4 xxx NE NE_1 AP_TN 3 2 xxx NE NE_2 TL 6 2 yyy MW MW_1 MP_OH 5 11
Thanks for the reply. But the collapsing should be based on the condition[(given<4) and (taken>1)]. I will be dealing with huge dataset and collapsing should be only based on condition.
My code creates your expected result from the data you posted.
If your intended result was not correct, correct it. If additional data is needed to illustrate your issue, post it.
Hi @Rock3t4u
Note: Based on your collapsing condition, you should end up with 6 records in the output data set based on your supplied sample!
xxx | NE | NE_1 | AP | 2 | 1 |
xxx | NE | NE_1 | TN | 1 | 1 |
These two records do not meet the [(given<4) and (taken>1)] condition, therefore it shouldn't collapse!
I've attached a modified version of my collapse_data.sas macro, you can download it and save it, then run the following code
/* Create sample data */
data have;
infile datalines dlm="09"x dsd truncover;
input company $ area $ division $ state $ given taken;
datalines;
xxx MW MW_1 MP 3 7
xxx MW MW_1 UP 8 3
xxx E S_1 MN 5 4
xxx NE NE_1 AP 2 1
xxx NE NE_1 TN 1 1
xxx NE NE_2 TL 6 2
yyy MW MW_1 MP 3 4
yyy MW MW_1 OH 2 7
;
run;
/* Ensure the Input data set is sorted by the desired classification variables */
PROC SORT DATA=work.HAVE;
BY company area division state;
RUN;
%include "<SomeDirectory>/collapse_data.sas";
/* Sample macro execution call */
%collapse_data (p_inDsName=work.have
, p_outDsName=work.want
, p_classVars=%STR(company area division state)
, p_prntVar=division
, p_sumedVar=state
, p_groupedVar=Geography
, p_minLimitCondition=%str(given LT 4 AND taken GT 1));
The code should produce the following want data set
company | area | division | given | taken | Geography |
xxx | E | S_1 | 5 | 4 | MN |
xxx | MW | MW_1 | 11 | 10 | UP_MP |
xxx | NE | NE_1 | 2 | 1 | AP |
xxx | NE | NE_1 | 1 | 1 | TN |
xxx | NE | NE_2 | 6 | 2 | TL |
yyy | MW | MW_1 | 5 | 11 | OH_MP |
Hope this helps,
Ahmed
In the macro mentioned https://support.sas.com/resources/papers/proceedings16/7820-2016.pdf, I do not see the parameters p_sumedvar and p_groupedvar included. Can you post the macro collapse_data you used to get the below result. Thanks.
collapse_data
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.