BookmarkSubscribeRSS Feed
UrvishShah
Fluorite | Level 6

Hi

Actually i'm working on one project in which i need to collaspe each category in to other (you can see the collapsing rules below) if the sample size is less then 10 then only i need to collaspe

you can see the datasets in attached excel files below

You can see in attached dataset that the COUNT variable has values less than 10 in COLLAPSE DATASET and see their groups carefully

Now i need to collapse these values in another dataset for example CODED(attached) according to bwlow:

for marriage variable, interconvertibly colapse category 2 to 5 or 5 to 2 and rename as 2


for marriage variable, interconvertibly colapse category 3 to 4 or 4 to 3 and rename as 3


for income  variable, interconvertibly colapse category 1 to 2 or 2 to 1 and rename as 2

You can also see the syntax for the same


proc print data= collapse;
run;

data test;

set CODED;

/* Region = 1 */


/*If region26 = 1 then do; if income in  (1, 2) then income  = 2 ; end ;*/
If region26 = 1 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 1 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 2 */


/*If region26 = 2 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 2 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 3 */

/*If region26 = 3 then do; if marriage in (3, 4) then marriage = 3 ; end ;*/

/* Region = 4 */

If region26 = 4 then do; if income in  (1, 2) then income  = 2 ; end ;
If region26 = 4 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 4 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 5 */

/*If region26 = 5 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 5 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 6 */


If region26 = 6 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 6 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 7 */


If region26 = 7 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 8 */


/*If region26 = 8 then do; if marriage in (3, 4) then marriage = 3 ; end ;*/

/* Region = 9 */


/*If region26 = 9 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 9 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 10 */


/*If region26 = 10 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 10 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 11 */


If region26 = 11 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 12 */


/*If region26 = 12 then do; if income in  (1, 2) then income  = 1 ; end ;*/
/*If region26 = 12 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 12 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 13 */


If region26 = 13 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 14 */


/*if region26 = 14  then do;  if marriage in (3,4) then marriage = 3 ; end;*/

/* Region = 15 */


If region26 = 15 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 16 */


If region26 = 16 then do; if income in  (1, 2) then income  = 2 ; end ;
If region26 = 16 then do; if income in  (3, 4) then income  = 4 ; end ;
/*If region26 = 16 then do; if income in  (5, 6) then income  = 6 ; end ;*/
/*If region26 = 16 then do; if hh_size in  (3, 4) then hh_size  = 4 ; end ;*/
If region26 = 16 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 16 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 17 */


/*If region26 = 17 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 17 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 18 */


If region26 = 18 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 19 */


/*If region26 = 19 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
If region26 = 19 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 20 */


/*If region26 = 20 then do; if income in  (1, 2) then income  = 2; end ;*/
If region26 = 20 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 20 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 21 */


If region26 = 21 then do; if income in (1, 2) then income = 1 ; end ;
If region26 = 21 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 21 then do; if marriage in (3, 4) then marriage = 3 ; end ;


/* Region = 22 */


/*If region26 = 22 then do; if marriage in (2, 5) then marriage = 2 ; end ;*/
/*If region26 = 22 then do; if marriage in (3, 4) then marriage = 3 ; end ;*/

/* Region = 23 */


If region26 = 23 then do; if income in (1, 2) then income = 1 ; end ;
If region26 = 23 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 24 */


If region26 = 24 then do; if income in (1, 2) then income = 1 ; end ;
If region26 = 24 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 24 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 25 */


If region26 = 25 then do; if income in (1, 2) then income = 1 ; end ;
If region26 = 25 then do; if marriage in (2, 5) then marriage = 2 ; end ;
If region26 = 25 then do; if marriage in (3, 4) then marriage = 3 ; end ;

/* Region = 26*/


/*If region26 = 26 then do; if hh_size in  (3, 4) then hh_size  = 4 ; end ;*/
If region26 = 26 then do; if income in (1, 2) then income = 2 ; end ;
If region26 = 26 then do; if marriage in (2, 5) then marriage = 2 ; end ;
/*If region26 = 26 then do; if marriage in (3, 4) then marriage = 3 ; end ;*/

run;

CODED is very big dataset and contains all variables

You can see first 20 obs of CODED dataset in attached excel file

Here, all categories are not fixed all the time, it will vary depands on data that you can see in commented syntax

Here, everytime i need to look manually to the dataset COLLASPE and then collapse it in CODED dataset, it is taking to much time to do this task everytime

I tried almost all the SAS Codes to automate this but due to different datasets (COLLAPSE and CODED), i'm confused

Please help me out

Please tell me if any question with regard to same

Thanks in advance

21 REPLIES 21
art297
Opal | Level 21

Not sure if I correctly understand what you are trying to accomplish, but it sounds like you could use the "collapse" dataset to build a program that you could then include in a datastep with a %include statement.  You could, of course, just build it as a temporary file, but below I created it as a permanent file so that you could review it before including it.  E.g.,

data _null_;

  file "c:\changes.sas";

  set collapse;

  if _n_ eq 1 then counter=0;

  if count lt 10 then do;

    counter+1;

    if counter eq 1 then put "if region26 eq " @;

    else put "else if region26 eq " @;

    put region26 " and " var "in (" @;

    if var eq 'marriage' then do;

      if group in (2, 5) then put "2, 5) then marriage=2;";

      if group in (3, 4) then put "3, 4) then marriage=3;";

    end;

    else if var eq 'income' then do;

      if group in (1, 2) then put "1, 2) then income=2;";

      if group in (3, 4) then put "3, 4) then income=4;";

      if group in (5, 6) then put "5, 6) then income=6;";

    end;

  end;

run;

UrvishShah
Fluorite | Level 6

Thanks Sir

But can you please tell me in FILE statement which file i should keep

I got your point but i've little bit confusion in FILE statement

Thanks

art297
Opal | Level 21

What I was suiggesting was building an include file, from the collapsed dataset, and then running something like:

data want;

  set coded;

  %include "c:\changes.sas";

run;

Of course, that assumes I correctly understand what you are trying to do.

Reeza
Super User

Can you do two passes through your data and use formats rather than recoding?

In the first pass, count how many and determine what format you need to apply to each variable

In the second variable apply that format using a putc or putn.

That does require creating new variables but it might be more maintainable in the long term.

UrvishShah
Fluorite | Level 6


Hi

I built one file containing collapsed dataset as follow

data collapse;

set data.collapse;

run;

and then i ran the following syntax

data want;

set coded;

%include "c:\changes.sas";

run;

But here the error is coming

ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant,

a datetime constant, a missing value, iterator, (.

Please tell me if i'm going wrong

Thanks

art297
Opal | Level 21

From the way you described your first datastep, you are indicating that your data are in a library labeled "data".

If both of your files are in the data library then I would have expected that you ran something like the code I'll show below.  Is that what you ran?

data _null_;

  file "c:\changes.sas";

  set data.collapse;

  if _n_ eq 1 then counter=0;

  if count lt 10 then do;

    counter+1;

    if counter eq 1 then put "if region26 eq " @;

    else put "else if region26 eq " @;

    put region26 " and " var "in (" @;

    if var eq 'marriage' then do;

      if group in (2, 5) then put "2, 5) then marriage=2;";

      if group in (3, 4) then put "3, 4) then marriage=3;";

    end;

    else if var eq 'income' then do;

      if group in (1, 2) then put "1, 2) then income=2;";

      if group in (3, 4) then put "3, 4) then income=4;";

      if group in (5, 6) then put "5, 6) then income=6;";

    end;

  end;

run;

data data.want;

  set data.coded;

  %include "c:\changes.sas";

run;

UrvishShah
Fluorite | Level 6

Yup i ran the same code

And if i'm comparing the data with my original data, it is absolutaly non-matching.

UrvishShah
Fluorite | Level 6

And the CHANGES file contains the following

data data.collapse;

set data.collapse;

run;

Please tell me if i'm doing wrong

art297
Opal | Level 21

Your code doesn't make any sense to me.  You are overwriting a copy of your original file with your original file. The code I was suggesting was expecting two files, like shown in your example, named:

data.coded and data.collapse.  The code wouldn't make any sense unless those two files exist.

Changes is NOT a dataset but, rather, SAS program code (i.e., a text file with a .SAS extension), which would have been created by my first suggested datastep and inluded in the 2nd datastep I proposed.

UrvishShah
Fluorite | Level 6

Hi

i ran the syntax that you send

But still i'm facing problem in some categories

Please see the output of region26 = 4

Without Collapsing

                                      Cumulative    Cumulative
                   marriage    Frequency     Percent     Frequency      Percent

                          1         101       57.06           101        57.06
                          2           5        2.82           106        59.89
                          3          14        7.91           120        67.80
                          4           4        2.26           124        70.06
                          5          53       29.94           177       100.00

Collapsing with my syntax


                                         The FREQ Procedure

                                                        Cumulative    Cumulative
                   marriage    Frequency     Percent     Frequency      Percent

                          1         101       57.06           101        57.06
                          2          58       32.77           159        89.83
                          3          18       10.17           177       100.00

Collapsing by using your code

                                      Cumulative    Cumulative
                   marriage    Frequency     Percent     Frequency      Percent

                          1         101       57.06           101        57.06
                          2          50       28.25           151        85.31
                          3          18       10.17           169        95.48
                          5           8        4.52           177       100.00

I'm facing the same problem in four more regions (21,24,25,26)

Please help me out

Thanks

UrvishShah
Fluorite | Level 6

Hi Sir

I got the output

I made some changes in your code as follows

data _null_;

  file "C:\changes_new.sas";

  set collapse;

  if _n_ eq 1 then counter=0;

  if count lt 10 then do;

    counter+1;

     put "if region26 eq " @;

    put region26 " then do;" @;

    put "if " var "in (" @;

    if var eq 'marriage' then do;

      if group in (2, 5) then put "2, 5) then marriage=2;";

      if group in (3, 4) then put "3, 4) then marriage=3;";

    end;

    else if var eq 'income' then do;

      if group in (1, 2) then put "1, 2) then income=2;";

      if group in (3, 4) then put "3, 4) then income=4;";

      if group in (5, 6) then put "5, 6) then income=6;";

    end;

  end;

put "end;";

run;


data want;

  set copy;

  %include "C:\changes_new.sas";

run;

After trying this code, all the regions are matching with orignal data

Without your help, it could never be possible

I'll post some more questions with referance to the same, i hope you will resolve my problem

Thanks

I learned a lot

art297
Opal | Level 21

There must be something you omitted from the changes you made, as the code you posted would end up causing an error when run.

The code I proposed was designed so that the first condition was an if statement and all of the others were else if statements.  I did it that way so that irrelevant checks did not have to be made.  I.e., as soon as one condition was met, the others could be disregarded.

It would help if you posted the code that you actually incliuded in your run (i.e., a copy of c:\changes_new.sas).

Regardless, glad to hear that you are either done or at least extremely close.

UrvishShah
Fluorite | Level 6

The code is as follow

if region26 eq 1  then do;if marriage in (2, 5) then marriage=2;

end;

if region26 eq 1  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 2  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 4  then do;if income in (1, 2) then income=2;

end;

if region26 eq 4  then do;if marriage in (2, 5) then marriage=2;

end;

if region26 eq 4  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 5  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 6  then do;if marriage in (2, 5) then marriage=2;

end;

if region26 eq 6  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 7  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 9  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 10  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 11  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 12  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 13  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 15  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 16  then do;if income in (1, 2) then income=2;

end;

if region26 eq 16  then do;if income in (3, 4) then income=4;

end;

if region26 eq 16  then do;if marriage in (2, 5) then marriage=2;

end;

if region26 eq 16  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 17  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 18  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 19  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 20  then do;if marriage in (2, 5) then marriage=2;

end;

if region26 eq 20  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 21  then do;if income in (1, 2) then income=2;

end;

if region26 eq 21  then do;if income in (1, 2) then income=2;

end;

if region26 eq 21  then do;if marriage in (2, 5) then marriage=2;

end;

if region26 eq 21  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 23  then do;if income in (1, 2) then income=2;

end;

if region26 eq 23  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 24  then do;if income in (1, 2) then income=2;

end;

if region26 eq 24  then do;if marriage in (2, 5) then marriage=2;

end;

if region26 eq 24  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 25  then do;if income in (1, 2) then income=2;

end;

if region26 eq 25  then do;if marriage in (2, 5) then marriage=2;

end;

if region26 eq 25  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 25  then do;if marriage in (3, 4) then marriage=3;

end;

if region26 eq 26  then do;if income in (1, 2) then income=2;

end;

if region26 eq 26  then do;if income in (1, 2) then income=2;

end;

if region26 eq 26  then do;if marriage in (2, 5) then marriage=2;

end;

art297
Opal | Level 21

The code I originally suggested would have produced the following which, from what I can tell, would have accomplished the same thing, but would run significantly faster:

if region26 eq 1  and marriage in (2, 5) then marriage=2;

else if region26 eq 1  and marriage in (3, 4) then marriage=3;

else if region26 eq 2  and marriage in (3, 4) then marriage=3;

else if region26 eq 4  and income in (1, 2) then income=2;

else if region26 eq 4  and marriage in (2, 5) then marriage=2;

else if region26 eq 4  and marriage in (3, 4) then marriage=3;

else if region26 eq 5  and marriage in (3, 4) then marriage=3;

else if region26 eq 6  and marriage in (2, 5) then marriage=2;

else if region26 eq 6  and marriage in (3, 4) then marriage=3;

else if region26 eq 7  and marriage in (3, 4) then marriage=3;

else if region26 eq 9  and marriage in (3, 4) then marriage=3;

else if region26 eq 10  and marriage in (3, 4) then marriage=3;

else if region26 eq 11  and marriage in (3, 4) then marriage=3;

else if region26 eq 12  and marriage in (3, 4) then marriage=3;

else if region26 eq 13  and marriage in (3, 4) then marriage=3;

else if region26 eq 15  and marriage in (3, 4) then marriage=3;

else if region26 eq 16  and income in (1, 2) then income=2;

else if region26 eq 16  and income in (3, 4) then income=4;

else if region26 eq 16  and marriage in (2, 5) then marriage=2;

else if region26 eq 16  and marriage in (3, 4) then marriage=3;

else if region26 eq 17  and marriage in (3, 4) then marriage=3;

else if region26 eq 18  and marriage in (3, 4) then marriage=3;

else if region26 eq 19  and marriage in (3, 4) then marriage=3;

else if region26 eq 20  and marriage in (2, 5) then marriage=2;

else if region26 eq 20  and marriage in (3, 4) then marriage=3;

else if region26 eq 21  and income in (1, 2) then income=2;

else if region26 eq 21  and income in (1, 2) then income=2;

else if region26 eq 21  and marriage in (2, 5) then marriage=2;

else if region26 eq 21  and marriage in (3, 4) then marriage=3;

else if region26 eq 23  and income in (1, 2) then income=2;

else if region26 eq 23  and marriage in (3, 4) then marriage=3;

else if region26 eq 24  and income in (1, 2) then income=2;

else if region26 eq 24  and marriage in (2, 5) then marriage=2;

else if region26 eq 24  and marriage in (3, 4) then marriage=3;

else if region26 eq 25  and income in (1, 2) then income=2;

else if region26 eq 25  and marriage in (2, 5) then marriage=2;

else if region26 eq 25  and marriage in (3, 4) then marriage=3;

else if region26 eq 25  and marriage in (3, 4) then marriage=3;

else if region26 eq 26  and income in (1, 2) then income=2;

else if region26 eq 26  and income in (1, 2) then income=2;

else if region26 eq 26  and marriage in (2, 5) then marriage=2;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 21 replies
  • 2175 views
  • 0 likes
  • 4 in conversation