BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

I have a dataset with multiple rows per person, which I am reducing to one row per person based on values of other variables. There's an ID variable, two categorical variables, and a date variable. Categ_1 can differ across the same ID, but categ_2 will always have the same value for the same ID (or more accurately, it can be missing, but there won't be one person who has two separate non-missing values for this variable).

ID     categ_1     categ_2     date

1             a                         1/2/2015

1             b            x            2/14/2015

2             c            y            10/9/2013

2             c            y             5/4/2012

Basically, my logic for which record to keep for each person is this:

     1. If any of their values of categ_1=a, that record should be kept

     2. If none of their values of categ_1=a, then the record with the most recent date should be kept

So all I did was sort by categ_1 and date (descending), then output the top record.

The problem is that none of the records where categ_1=a has a value for categ_2. Since categ_2 will always  be the same within each ID number, I want to add a step where I fill categ_2 using other row(s) of that ID number before outputting. The way I have it now, the top record for ID1 will output, which is right, but I want it to have categ_2=x before it outputs to the one person/one row dataset.

Any help is much appreciated.

4 REPLIES 4
Sonywell
Fluorite | Level 6

How about something like this:

/* order your input set */

proc sql;

create table part_output as

select *

from input

order by id, categ_1, categ_2, date desc;

/* select records using by processing and a retained flag (twrot) */

data myoutput;

     set part_output;

     retain twrot;

     by id;

     if first.id and categ_1 = 'a' then do;

           twrot = 0;

           output;

     end;

     else if twrot = 1 then do;

           twrot = 0;

           output;

     end;

     if last.id then do;

           twrot = 1;

     end;

run;

Tom
Super User Tom
Super User

UPDATE statement is useful for this type of problem.

data have ;

  input ID     categ_1  $ categ_2 $    date :mmddyy10. ;

  format date yymmdd10.;

cards;

1             a            .             1/2/2015

1             b            x            2/14/2015

2             c y            10/9/2013

2             c            y             5/4/2012

;;;;

proc sql ;

create view middle as

select *,(categ_1 = 'a') as isa

from have

order by id, isa , date

;

quit;

data want ;

  update middle (obs=0) middle ;

  by id ;

  drop isa ;

run;

proc print; run;


Obs ID categ_1 categ_2 date

1 1 a x 2015-01-02

2 2 c y 2013-10-09

Ksharp
Super User

Code: Program1.sas

data have ;
  input ID   categ_1  $ categ_2 $   date :mmddyy10. ;
  format date yymmdd10.;
cards;
1 a . 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
;;;;
run;
proc sort data=have;by id date;run;
data want;
do until(last.id);
  set have;
  by id;
  if not missing(categ_2) then _categ_2=categ_2;
  if not found then do;
   _categ_1=categ_1;
   _date=date;
  end;
  if categ_1='a' then do;
   _categ_1 =categ_1 ;
   _date=date;
   found=1;
  end;
end;
keep id _:;
format _date mmddyy10.;
run;
karanfil
Fluorite | Level 6

Hello All,

 

I have file ( attached) .I have data like in first table and I would like to have data looks like second table. If anybody help me code for this I really appreciate it.

Thanks 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 10329 views
  • 0 likes
  • 5 in conversation