Recoding observations

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 81
Accepted Solution

Recoding observations

Hello all,

Just trying to recoding the following two variables. Looking forward (a simple example of my data) for your advise please.

Regards,

Zana

Data have

obs_1  obs_2

1000      0

1007      0

1050     1000

1051     1000

10502   1007

10260   1007

Data want

obs_1  obs_2

  1        0

  2        0

  3        1

  4        1

  6        2

  5        2


Accepted Solutions
Solution
‎06-24-2014 09:38 AM
Respected Advisor
Posts: 3,124

Re: Recoding observations

Bummer for the thickening plot. Hash table may become another option if the performance bothers you. but for now you can have:

DATA HAVE;

INPUT OBS_1 OBS_2;

N=_N_;

CARDS;

1000       0

1001       0

1007       0

1050     1000

1051     1000

10502   1007

10260   1000

;

ODS _ALL_ CLOSE;

ODS OUTPUT SQL_RESULTS=H2;

PROC SQL NUMBER;

SELECT DISTINCT OBS_1 AS OBS FROM HAVE

       UNION CORR

       SELECT DISTINCT OBS_2 AS OBS FROM HAVE

      ORDER BY OBS ;

QUIT;

ODS LISTING;

DATA H3;

SET H2;

      ROW=ROW-1;

      RUN;

PROC SQL;

CREATE TABLE WANT(DROP=N) AS

           SELECT A.*, B.ROW AS OBS_1_NEW, C.ROW AS OBS_2_NEW

                FROM HAVE A, H3 B, H3 C

                     WHERE A.OBS_1=B.OBS

                     AND A.OBS_2=C.OBS

                     ORDER BY N

;

QUIT;

Haikuo

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: Recoding observations

Could it be as simple as the following?

data have;

     input obs_1 obs_2;

     cards;

1000      0

1007      0

1050     1000

1051     1000

10502   1007

10260   1007

;

data want;

     set have;

     by obs_2 obs_1 notsorted;

     new_obs_1=_n_;

     _obs_2+first.obs_2;

     new_obs_2=_obs_2-1;

     keep new:;

run;

Good Luck,

Haikuo

Frequent Contributor
Posts: 81

Re: Recoding observations

Thanks for all you

Dear all, i need a program that recoding all observation step by step. for example, if the smallest observation (except zero) find in the data set (here was 1000) then replacing with 1(in all observations; obs_1, obs_2 and ...), for the next (here was 1007) replacing with 2, and similarly for all others. Note that 1502 is larger than 1260, then it should have a larger number (new recoding number), too. When i run Hai.kuo's program, 1502 and 1260 replacing with 5 and 6, respectively, that it is incorrect. When i changed one of the observations (each of them),  Astoun's program was not correct, too. For instance:

obs_1  obs_2

1000       0

1007       0

1050     1000

1051     1000

10502   1007

10260   1000

Astoun's program output:

obs_1  obs_2   obs_1new   obs_2new

1000        0              1            0

1007        0              2            0

1050     1000            3            1

1051     1000            4            1

10502   1007            6            3

10260   1000            5            2



For this sample i need output like this

obs_1  obs_2   obs_1new   obs_2new

1000        0              1            0

1007        0              2            0

1050     1000            3            1

1051     1000            4            1

10502   1007            6            2

10260   1000            5            1

Best regards

Super User
Posts: 10,530

Re: Recoding observations

Is there some underlying logic to the assignment such as for Obs_1 the sequence order in the data set?

A series of If-then-else or Select clause could do a direct assignment such as

If obs_1=1000 then obs_1=1; But these approaches are limited to the exact data shown in your example and not flexible unless you only have 6 rows of data. And if you have 1000s of observations you likely don't want to write that much code.

If obs_1 is sequence that can be fairly easy:

data want;

     set have;

     obs_1=_n_;

run;

Proc format is sometimes an option for recodes like this but you need a table of base values and assigned values. If obs_2 is supposed to have the obs_1 value associated with the same base value then we could build a format from a pass through the data after obs_1 old and new values are assigned.


Super User
Posts: 5,091

Re: Recoding observations

Part of the answer depends on whether obs_1 is numeric or character.  Since it's easier, I'll assume that it is numeric but there is extra work needed if it is character.

You need different orders to the data to assign values to your variables, so a few steps are needed.

data step1;

   set have;

   by obs_2 notsorted;

   retain obs_2new  -1;

   if first.obs_2 then obs_2new + 1;
run;

proc sort data=step1;

   by obs_1;

run;

data want;

   set step1;

   obs_1new = _n_;

   drop obs_1 obs_2;

    rename obs_1new = obs_1

        obs_2new = obs_2;

run;

It's also not clear what action to take if two observations have the same original value for obs_1.  Should the new value be the same, or should it be different?  This sample code assumes that obs_1 is originally unique.

You don't have to drop and rename variables.  You might want to keep all four variables.

Regular Contributor
Posts: 217

Re: Recoding observations


Zana,

Please provide more information.  I don't understand why 6 in the obs_1 result appears before 5.  Also the one and two in the obs_2 output could represent the 1000 and 1007 in the first and second obs_1 rows in the source data,

obs_1  obs_2

1000      0

1007      0

1050     1000

1051     1000

10502   1007

10260   1007

Data want

obs_1  obs_2

  1        0

  2        0

  3        1

  4        1

  6        2

  5        2

SAS Employee
Posts: 15

Re: Recoding observations

Hello zana,

Maybe try:

data work.test;

  input

    obs_1

    obs_2;

datalines;

1000 0

1007 0

1050 1000

1051 1000

10502 1007

10260 1007

;

run;

proc sort

  data=work.test;

  by

    obs_1

    obs_2;

run;

data work.obs_1;

  set

    work.test;

  if obs_1 ne . then output;

  drop

    obs_2;

run;

data work.obs_2;

  set

    work.test;

  if obs_2 ne . then output;

  drop

    obs_1;

run;

proc sort

  data=work.obs_1;

  by

    obs_1;

run;

proc sort

  data=work.obs_2;

  by

    obs_2;

run;

data work.obs_1;

  set work.obs_1;

  by

    obs_1;

  if first.obs_1;

run;

data work.obs_2;

  set work.obs_2;

  by

    obs_2;

  if first.obs_2;

run;

data work.obs_1;

  set work.obs_1;

  retain

    match_number (1);

  if obs_1 ne 0 then do;

    match_obs_1 = match_number;

    match_number = match_number + 1;

  end;

  else match_obs_1=0;

  output;

  drop

    match_number;

run;

data work.obs_2;

  set work.obs_2;

  retain

    match_number (1);

  if obs_2 ne 0 then do;

    match_obs_2 = match_number;

    match_number = match_number + 1;

  end;

  else match_obs_2=0;

  output;

  drop

    match_number;

run;

data work.test;

  merge

    work.test

    work.obs_1;

  by

    obs_1;

run;

data work.test (

  drop=obs_1 obs_2

  rename=(

    match_obs_1=obs_1

    match_obs_2=obs_2

    )

  );

  merge

    work.test

    work.obs_2;

  by

    obs_2;

run;

Trusted Advisor
Posts: 1,204

Re: Recoding observations

Data have;
input obs_1  obs_2;
datalines;
1000    0
1007    0
1050    1000
1051    1000
10502   1007
10260   1007
;

data want(drop=obs_2 rename=obs2=obs_2);
set have;
id=_n_;
obs2=0;
if obs_2>0 and mod(obs_2,1000)=0 then obs2=1;
else if obs_2>0 and mod(obs_2,1000)>0 then obs2=2;
run;

data final(drop=id obs_1 rename=sort_key=obs_1);
set want;
if obs_2=2 then sort_key=11-id;
else sort_key=id;
run;

Super Contributor
Posts: 275

Re: Recoding observations


data idfmt;
  set have end=last;
  retain fmtname 'idfmt' type 'n';
  start=obs_1;
  label=_n_;
  if last then do;
  start=0;
  label=0;
  fmtname='idfmt';
  end;
  drop obs:;
run;

proc format cntlin=idfmt;
run;

data want;
set have;
obs_2=put(obs_2,idfmt.);
run;

Respected Advisor
Posts: 3,124

Re: Recoding observations

If this is what you want, then it would be combination of several simple steps:

DATA HAVE;

     INPUT OBS_1 OBS_2;

     N=_N_;

     CARDS;

1000       0

1007       0

1050     1000

1051     1000

10502   1007

10260   1000

;

PROC SORT DATA=HAVE(KEEP=OBS_1) OUT=OBS_1;

     BY OBS_1;

RUN;

DATA OBS_1;

     SET OBS_1;

     BY OBS_1;

     IF FIRST.OBS_1;

     OBS_1_NEW+FIRST.OBS_1;

RUN;

PROC SORT DATA=HAVE(KEEP=OBS_2) OUT=OBS_2;

     BY OBS_2;

RUN;

DATA OBS_2;

     SET OBS_2;

     BY OBS_2;

     IF FIRST.OBS_2;

     _NEW+FIRST.OBS_2;

     OBS_2_NEW=_NEW-1;

RUN;

PROC SQL;

     CREATE TABLE WANT(DROP=N) AS 

           SELECT A.*, B.OBS_1_NEW, C.OBS_2_NEW

                FROM HAVE A

                     LEFT JOIN OBS_1 B

                           ON A.OBS_1 = B.OBS_1

                     LEFT JOIN OBS_2 C

                           ON A.OBS_2=C.OBS_2

                     ORDER BY N

     ;

QUIT;


Regards,

Haikuo

Frequent Contributor
Posts: 81

Re: Recoding observations

Dear Hai.kuo

Thank you so much for your reply.

Your cod is right, but only for this simple of data set. I should inform you that, my original data is too large (observations on three column with thousands numerator for each of them). When i delete or add only one observation, out put of your code have more error.

For instance:

obs_1  obs_2

1000       0

1001       0

1007       0

1050     1000

1051     1000

10502   1007

10260   1000

(just by adding the 2nd row)

Out put for your code;

obs_1  obs_2   obs_1new   obs_2new

1000        0              1            0

1001        0              2            0

1007        0              3            0

1050     1000            4            1

1051     1000            5            1

10502   1007            7            2 is incorrect, it should be 3 (the new code for 1007)

10260   1000            6            1

2 is a new code for 1001 not for 1007!! So i need a program that search in all columns and replacing any old number with the new code for that.

Solution
‎06-24-2014 09:38 AM
Respected Advisor
Posts: 3,124

Re: Recoding observations

Bummer for the thickening plot. Hash table may become another option if the performance bothers you. but for now you can have:

DATA HAVE;

INPUT OBS_1 OBS_2;

N=_N_;

CARDS;

1000       0

1001       0

1007       0

1050     1000

1051     1000

10502   1007

10260   1000

;

ODS _ALL_ CLOSE;

ODS OUTPUT SQL_RESULTS=H2;

PROC SQL NUMBER;

SELECT DISTINCT OBS_1 AS OBS FROM HAVE

       UNION CORR

       SELECT DISTINCT OBS_2 AS OBS FROM HAVE

      ORDER BY OBS ;

QUIT;

ODS LISTING;

DATA H3;

SET H2;

      ROW=ROW-1;

      RUN;

PROC SQL;

CREATE TABLE WANT(DROP=N) AS

           SELECT A.*, B.ROW AS OBS_1_NEW, C.ROW AS OBS_2_NEW

                FROM HAVE A, H3 B, H3 C

                     WHERE A.OBS_1=B.OBS

                     AND A.OBS_2=C.OBS

                     ORDER BY N

;

QUIT;

Haikuo

Frequent Contributor
Posts: 81

Re: Recoding observations

Dear Hai.kuo

Thank you very much. Your code is so right. Hardly thanks

Super User
Posts: 9,687

Re: Recoding observations

You need a Hash Table or proc format.

DATA HAVE;
     INPUT OBS_1 OBS_2;
     N=_N_;
     CARDS;
1000       0
1001       0
1007       0
1050     1000
1051     1000
10502   1007
10260   1000
;
run;
data want;
 if _n_ eq 1 then do;
  if 0 then set have;
   declare hash h(dataset:'have');
    h.definekey('OBS_1');
     h.definedata('n');
     h.definedone();
 end;
 set have;
  if h.find(key:OBS_2)=0 then new=n;
   else new=0;
run;

Xia Keshan

Super User
Posts: 5,091

Re: Recoding observations

This is an easily fixable problem.  You can take my original solution, and add this as a first step:

proc sort data=have;

   by obs_2;

run;

The rest of the solution would follow, with no changes, except that you could then remove the word "notsorted".

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 462 views
  • 0 likes
  • 9 in conversation