Hi
Actually i'm working on one project in which i need to collaspe each category in to other (you can see the collapsing rules below) if the sample size is less then 10 then only i need to collaspe
you can see the datasets in attached excel files below
You can see in attached dataset that the COUNT variable has values less than 10 in COLLAPSE DATASET and see their groups carefully
Now i need to collapse these values in another dataset for example CODED(attached) according to bwlow:
for marriage variable, interconvertibly colapse category 2 to 5 or 5 to 2 and rename as 2
for marriage variable, interconvertibly colapse category 3 to 4 or 4 to 3 and rename as 3
for income variable, interconvertibly colapse category 1 to 2 or 2 to 1 and rename as 2
You can also see the syntax for the same
proc print data= collapse;
run;
data test;
set CODED;
/* Region = 1 */
/*If region26 = 1 then do; if income in (1, 2) then income = 2 ; end ;*/
If region26 = 1 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 1 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 2 */
/*If region26 = 2 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 2 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 3 */
/*If region26 = 3 then do; if marriage in (3, 4) then marriage = 3 ; end ;*/
/* Region = 4 */
If region26 = 4 then do; if income in (1, 2) then income = 2 ; end ;
If region26 = 4 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 4 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 5 */
/*If region26 = 5 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 5 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 6 */
If region26 = 6 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 6 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 7 */
If region26 = 7 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 8 */
/*If region26 = 8 then do; if marriage in (3, 4) then marriage = 3 ; end ;*/
/* Region = 9 */
/*If region26 = 9 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 9 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 10 */
/*If region26 = 10 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 10 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 11 */
If region26 = 11 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 12 */
/*If region26 = 12 then do; if income in (1, 2) then income = 1 ; end ;*/
/*If region26 = 12 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 12 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 13 */
If region26 = 13 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 14 */
/*if region26 = 14 then do; if marriage in (3,4) then marriage = 3 ; end;*/
/* Region = 15 */
If region26 = 15 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 16 */
If region26 = 16 then do; if income in (1, 2) then income = 2 ; end ;
If region26 = 16 then do; if income in (3, 4) then income = 4 ; end ;
/*If region26 = 16 then do; if income in (5, 6) then income = 6 ; end ;*/
/*If region26 = 16 then do; if hh_size in (3, 4) then hh_size = 4 ; end ;*/
If region26 = 16 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 16 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 17 */
/*If region26 = 17 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 17 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 18 */
If region26 = 18 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 19 */
/*If region26 = 19 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 19 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 20 */
/*If region26 = 20 then do; if income in (1, 2) then income = 2; end ;*/
If region26 = 20 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 20 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 21 */
If region26 = 21 then do; if income in (1, 2) then income = 1 ; end ;
If region26 = 21 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 21 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 22 */
/*If region26 = 22 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
/*If region26 = 22 then do; if marriage in (3, 4) then marriage = 3 ; end ;*/
/* Region = 23 */
If region26 = 23 then do; if income in (1, 2) then income = 1 ; end ;
If region26 = 23 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 24 */
If region26 = 24 then do; if income in (1, 2) then income = 1 ; end ;
If region26 = 24 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 24 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 25 */
If region26 = 25 then do; if income in (1, 2) then income = 1 ; end ;
If region26 = 25 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 25 then do; if marriage in (3, 4) then marriage = 3 ; end ;
/* Region = 26*/
/*If region26 = 26 then do; if hh_size in (3, 4) then hh_size = 4 ; end ;*/
If region26 = 26 then do; if income in (1, 2) then income = 2 ; end ;
If region26 = 26 then do; if marriage in (2, 5) then marriage = 2 ; end ;
/*If region26 = 26 then do; if marriage in (3, 4) then marriage = 3 ; end ;*/
run;
CODED is very big dataset and contains all variables
You can see first 20 obs of CODED dataset in attached excel file
Here, all categories are not fixed all the time, it will vary depands on data that you can see in commented syntax
Here, everytime i need to look manually to the dataset COLLASPE and then collapse it in CODED dataset, it is taking to much time to do this task everytime
I tried almost all the SAS Codes to automate this but due to different datasets (COLLAPSE and CODED), i'm confused
Please help me out
Please tell me if any question with regard to same
Thanks in advance
Not sure if I correctly understand what you are trying to accomplish, but it sounds like you could use the "collapse" dataset to build a program that you could then include in a datastep with a %include statement. You could, of course, just build it as a temporary file, but below I created it as a permanent file so that you could review it before including it. E.g.,
data _null_;
file "c:\changes.sas";
set collapse;
if _n_ eq 1 then counter=0;
if count lt 10 then do;
counter+1;
if counter eq 1 then put "if region26 eq " @;
else put "else if region26 eq " @;
put region26 " and " var "in (" @;
if var eq 'marriage' then do;
if group in (2, 5) then put "2, 5) then marriage=2;";
if group in (3, 4) then put "3, 4) then marriage=3;";
end;
else if var eq 'income' then do;
if group in (1, 2) then put "1, 2) then income=2;";
if group in (3, 4) then put "3, 4) then income=4;";
if group in (5, 6) then put "5, 6) then income=6;";
end;
end;
run;
Thanks Sir
But can you please tell me in FILE statement which file i should keep
I got your point but i've little bit confusion in FILE statement
Thanks
What I was suiggesting was building an include file, from the collapsed dataset, and then running something like:
data want;
set coded;
%include "c:\changes.sas";
run;
Of course, that assumes I correctly understand what you are trying to do.
Can you do two passes through your data and use formats rather than recoding?
In the first pass, count how many and determine what format you need to apply to each variable
In the second variable apply that format using a putc or putn.
That does require creating new variables but it might be more maintainable in the long term.
Hi
I built one file containing collapsed dataset as follow
data collapse;
set data.collapse;
run;
and then i ran the following syntax
data want;
set coded;
%include "c:\changes.sas";
run;
But here the error is coming
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant,
a datetime constant, a missing value, iterator, (.
Please tell me if i'm going wrong
Thanks
From the way you described your first datastep, you are indicating that your data are in a library labeled "data".
If both of your files are in the data library then I would have expected that you ran something like the code I'll show below. Is that what you ran?
data _null_;
file "c:\changes.sas";
set data.collapse;
if _n_ eq 1 then counter=0;
if count lt 10 then do;
counter+1;
if counter eq 1 then put "if region26 eq " @;
else put "else if region26 eq " @;
put region26 " and " var "in (" @;
if var eq 'marriage' then do;
if group in (2, 5) then put "2, 5) then marriage=2;";
if group in (3, 4) then put "3, 4) then marriage=3;";
end;
else if var eq 'income' then do;
if group in (1, 2) then put "1, 2) then income=2;";
if group in (3, 4) then put "3, 4) then income=4;";
if group in (5, 6) then put "5, 6) then income=6;";
end;
end;
run;
data data.want;
set data.coded;
%include "c:\changes.sas";
run;
Yup i ran the same code
And if i'm comparing the data with my original data, it is absolutaly non-matching.
And the CHANGES file contains the following
data data.collapse;
set data.collapse;
run;
Please tell me if i'm doing wrong
Your code doesn't make any sense to me. You are overwriting a copy of your original file with your original file. The code I was suggesting was expecting two files, like shown in your example, named:
data.coded and data.collapse. The code wouldn't make any sense unless those two files exist.
Changes is NOT a dataset but, rather, SAS program code (i.e., a text file with a .SAS extension), which would have been created by my first suggested datastep and inluded in the 2nd datastep I proposed.
Hi
i ran the syntax that you send
But still i'm facing problem in some categories
Please see the output of region26 = 4
Without Collapsing
Cumulative Cumulative
marriage Frequency Percent Frequency Percent
1 101 57.06 101 57.06
2 5 2.82 106 59.89
3 14 7.91 120 67.80
4 4 2.26 124 70.06
5 53 29.94 177 100.00
Collapsing with my syntax
The FREQ Procedure
Cumulative Cumulative
marriage Frequency Percent Frequency Percent
1 101 57.06 101 57.06
2 58 32.77 159 89.83
3 18 10.17 177 100.00
Collapsing by using your code
Cumulative Cumulative
marriage Frequency Percent Frequency Percent
1 101 57.06 101 57.06
2 50 28.25 151 85.31
3 18 10.17 169 95.48
5 8 4.52 177 100.00
I'm facing the same problem in four more regions (21,24,25,26)
Please help me out
Thanks
Hi Sir
I got the output
I made some changes in your code as follows
data _null_;
file "C:\changes_new.sas";
set collapse;
if _n_ eq 1 then counter=0;
if count lt 10 then do;
counter+1;
put "if region26 eq " @;
put region26 " then do;" @;
put "if " var "in (" @;
if var eq 'marriage' then do;
if group in (2, 5) then put "2, 5) then marriage=2;";
if group in (3, 4) then put "3, 4) then marriage=3;";
end;
else if var eq 'income' then do;
if group in (1, 2) then put "1, 2) then income=2;";
if group in (3, 4) then put "3, 4) then income=4;";
if group in (5, 6) then put "5, 6) then income=6;";
end;
end;
put "end;";
run;
data want;
set copy;
%include "C:\changes_new.sas";
run;
After trying this code, all the regions are matching with orignal data
Without your help, it could never be possible
I'll post some more questions with referance to the same, i hope you will resolve my problem
Thanks
I learned a lot
There must be something you omitted from the changes you made, as the code you posted would end up causing an error when run.
The code I proposed was designed so that the first condition was an if statement and all of the others were else if statements. I did it that way so that irrelevant checks did not have to be made. I.e., as soon as one condition was met, the others could be disregarded.
It would help if you posted the code that you actually incliuded in your run (i.e., a copy of c:\changes_new.sas).
Regardless, glad to hear that you are either done or at least extremely close.
The code is as follow
if region26 eq 1 then do;if marriage in (2, 5) then marriage=2;
end;
if region26 eq 1 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 2 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 4 then do;if income in (1, 2) then income=2;
end;
if region26 eq 4 then do;if marriage in (2, 5) then marriage=2;
end;
if region26 eq 4 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 5 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 6 then do;if marriage in (2, 5) then marriage=2;
end;
if region26 eq 6 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 7 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 9 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 10 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 11 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 12 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 13 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 15 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 16 then do;if income in (1, 2) then income=2;
end;
if region26 eq 16 then do;if income in (3, 4) then income=4;
end;
if region26 eq 16 then do;if marriage in (2, 5) then marriage=2;
end;
if region26 eq 16 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 17 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 18 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 19 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 20 then do;if marriage in (2, 5) then marriage=2;
end;
if region26 eq 20 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 21 then do;if income in (1, 2) then income=2;
end;
if region26 eq 21 then do;if income in (1, 2) then income=2;
end;
if region26 eq 21 then do;if marriage in (2, 5) then marriage=2;
end;
if region26 eq 21 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 23 then do;if income in (1, 2) then income=2;
end;
if region26 eq 23 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 24 then do;if income in (1, 2) then income=2;
end;
if region26 eq 24 then do;if marriage in (2, 5) then marriage=2;
end;
if region26 eq 24 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 25 then do;if income in (1, 2) then income=2;
end;
if region26 eq 25 then do;if marriage in (2, 5) then marriage=2;
end;
if region26 eq 25 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 25 then do;if marriage in (3, 4) then marriage=3;
end;
if region26 eq 26 then do;if income in (1, 2) then income=2;
end;
if region26 eq 26 then do;if income in (1, 2) then income=2;
end;
if region26 eq 26 then do;if marriage in (2, 5) then marriage=2;
end;
The code I originally suggested would have produced the following which, from what I can tell, would have accomplished the same thing, but would run significantly faster:
if region26 eq 1 and marriage in (2, 5) then marriage=2;
else if region26 eq 1 and marriage in (3, 4) then marriage=3;
else if region26 eq 2 and marriage in (3, 4) then marriage=3;
else if region26 eq 4 and income in (1, 2) then income=2;
else if region26 eq 4 and marriage in (2, 5) then marriage=2;
else if region26 eq 4 and marriage in (3, 4) then marriage=3;
else if region26 eq 5 and marriage in (3, 4) then marriage=3;
else if region26 eq 6 and marriage in (2, 5) then marriage=2;
else if region26 eq 6 and marriage in (3, 4) then marriage=3;
else if region26 eq 7 and marriage in (3, 4) then marriage=3;
else if region26 eq 9 and marriage in (3, 4) then marriage=3;
else if region26 eq 10 and marriage in (3, 4) then marriage=3;
else if region26 eq 11 and marriage in (3, 4) then marriage=3;
else if region26 eq 12 and marriage in (3, 4) then marriage=3;
else if region26 eq 13 and marriage in (3, 4) then marriage=3;
else if region26 eq 15 and marriage in (3, 4) then marriage=3;
else if region26 eq 16 and income in (1, 2) then income=2;
else if region26 eq 16 and income in (3, 4) then income=4;
else if region26 eq 16 and marriage in (2, 5) then marriage=2;
else if region26 eq 16 and marriage in (3, 4) then marriage=3;
else if region26 eq 17 and marriage in (3, 4) then marriage=3;
else if region26 eq 18 and marriage in (3, 4) then marriage=3;
else if region26 eq 19 and marriage in (3, 4) then marriage=3;
else if region26 eq 20 and marriage in (2, 5) then marriage=2;
else if region26 eq 20 and marriage in (3, 4) then marriage=3;
else if region26 eq 21 and income in (1, 2) then income=2;
else if region26 eq 21 and income in (1, 2) then income=2;
else if region26 eq 21 and marriage in (2, 5) then marriage=2;
else if region26 eq 21 and marriage in (3, 4) then marriage=3;
else if region26 eq 23 and income in (1, 2) then income=2;
else if region26 eq 23 and marriage in (3, 4) then marriage=3;
else if region26 eq 24 and income in (1, 2) then income=2;
else if region26 eq 24 and marriage in (2, 5) then marriage=2;
else if region26 eq 24 and marriage in (3, 4) then marriage=3;
else if region26 eq 25 and income in (1, 2) then income=2;
else if region26 eq 25 and marriage in (2, 5) then marriage=2;
else if region26 eq 25 and marriage in (3, 4) then marriage=3;
else if region26 eq 25 and marriage in (3, 4) then marriage=3;
else if region26 eq 26 and income in (1, 2) then income=2;
else if region26 eq 26 and income in (1, 2) then income=2;
else if region26 eq 26 and marriage in (2, 5) then marriage=2;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.