Solved: Re: Filling up and Down Efficiently

Katie · Posted 02-04-2014 02:48 PM

Hello,

This is the data I have:

ID_VAR1 ID_VAR2 ID_VAR3 OTHERVAR_1 OTHERVAR_2 OTHERVAR_3

1 xxx 1x1 x . y

1 xxx 1x1 . z y

2 zzz 2x2 p u g

2 zzz 3x3 r f h

3 nnn 4x4 . q .

3 nnn 4x4 t q w

This is the data I would like:

ID_VAR1 ID_VAR2 ID_VAR3 OTHERVAR_1 OTHERVAR_2 OTHERVAR_3

1 xxx 1x1 x z y

2 zzz 2x2 p u g

2 zzz 3x3 r f h

3 nnn 4x4 t q w

So basically per unique combination of ID variables, I need to fill up and down other variables. I have quite a few variables to do this on and would like the most efficient solution.

Thanks!

data_null__ · Posted 02-04-2014 03:17 PM

Efficient to write.

data un;
   input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_3)(:$1.);
   cards; 
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
;;;;
   run; 
proc print; 
   run; 
data down;
   update un(obs=0) un;
   by ID_:;
   array f first:;
   if f[dim(f)] then index=0; 
   index+1; 
   output; 
   run; 
proc sort data=down;
   key id_:;
   key index / descending; 
   run; 
data up;
   update down(obs=0) down;
   by id_:;
   output; 
   run; 
proc sort data=up;
   key id_: index;
   run; 
proc print; 
   run;

View solution in original post

data_null__ · Posted 02-04-2014 03:01 PM

Efficient to write or efficient to run?

Katie · Posted 02-04-2014 03:04 PM

I would prefer something that is efficient to write. As of right now I can only think of doing it variable by variable. When I would rather have some code cycle through all the variables in one step.

data_null__ · Posted 02-04-2014 03:17 PM

Efficient to write.

data un;
   input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_3)(:$1.);
   cards; 
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
;;;;
   run; 
proc print; 
   run; 
data down;
   update un(obs=0) un;
   by ID_:;
   array f first:;
   if f[dim(f)] then index=0; 
   index+1; 
   output; 
   run; 
proc sort data=down;
   key id_:;
   key index / descending; 
   run; 
data up;
   update down(obs=0) down;
   by id_:;
   output; 
   run; 
proc sort data=up;
   key id_: index;
   run; 
proc print; 
   run;

data_null__ · Posted 02-07-2014 03:37 PM

I should add that using an array of first variables will not work if there are other variables in the data that have first as the first part of there names. This can be avoided with the addion of validvarname=any as long as there are no names that begin with FIRST. (first dot). For example.

data un;
   input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (firstvar_1-firstvar_3)(:$1.);
   cards; 
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
;;;;
   run; 
proc print; 
   run; 
options validvarname=any;
data down;
   update un(obs=0) un;
   by ID_:;
   array f 
'first.'n:;
   if f[dim(f)] then index=0; 
   index+1; 
   output; 
   run;

options validvarname=v7;

proc sort data=down;

   key id_:;
   key index / descending; 
   run; 
data up;
   update down(obs=0) down;
   by id_:;
   output; 
   run; 
proc sort data=up;
   key id_: index;
   run; 
proc print; 
   run; 

Peter_C · Posted 02-08-2014 03:23 AM

thank you

That's a really neat way of generalising code to identify the last of (undefined) number of names in a list.

It's not hard to accept the restriction, reserving names beginning "first" for the special by-group indicators, without even needing validvarname=any

.

Peter

Katie · Posted 04-17-2014 12:02 PM

I am trying to adapt the code you have previously shown me to only fill down a couple of variables, however I cannot figure out how to make it work. Can you help?

This is what I have so far:

data down;

update try(obs=0) try;

by CPI;

array f[2] Race Gender first:;

if f[dim(f)] then index = 0;

indext+1;

ouput;

run;

The rest of the code I can do, it is just setting up the array to only update those two variables.

Thanks!

Katie

data_null__ · Posted 04-17-2014 01:40 PM

Consider this modification to my original program with the addition of othervar_4-othervar_6 specifically the CALL MISSING after the output statement. This effectively turns off the LOCF action for these variables.

data un;
   input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_6)(:$1.);
   cards; 
1 xxx 1x1 x . y x . y
1 xxx 1x1 . z y . z y
2 zzz 2x2 p u g p u g
2 zzz 3x3 r f h r f h
3 nnn 4x4 . q . . q .
3 nnn 4x4 t q w t q w
;;;;
   run; 
proc print; 
   run; 
data down;
   update un(obs=0) un;
   by ID_:;
   array f first:;
   if f[dim(f)] then index=0; 
   index+1; 
   output; 
   call missing(of othervar_4-othervar_6);
   run; 
proc sort data=down;
   key id_:;
   key index / descending; 
   run; 
data up;
   update down(obs=0) down;
   by id_:;
   output; 
   call missing(of othervar_4-othervar_6);
   run; 
proc sort data=up;
   key id_: index;
   run; 
proc print; 
   run;

Katie · Posted 04-17-2014 01:56 PM

Thank you so much for your reply. What I'm finding is the variables in the Call statement are the ones NOT getting filled, while the others are. Is there a way to specify only the ones you want to be filled?

data_null__ · Posted 04-17-2014 02:20 PM

See if this is more suitable. You create two macro the BY variables and the LOCF variables and drop and keep them as follows. Notice the variables that are not keys or LOCF now come from SET statement.

data un;
   input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_6)(:$1.);
   cards; 
1 xxx 1x1 x . y x . y
1 xxx 1x1 . z y . z y
2 zzz 2x2 p u g p u g
2 zzz 3x3 r f h r f h
3 nnn 4x4 . q . . q .
3 nnn 4x4 t q w t q w
;;;;
   run; 
proc print; 
   run; 
%let by=ID_:;
%let locf=othervar_1-othervar_3;
data down;
   update un(obs=0 keep=&by) un(keep=&by &locf);
   by ID_:;
   set un(drop=&by &locf);
   array f first:;
   if f[dim(f)] then index=0; 
   index+1; 
   output; 
   run; 
proc sort data=down;
   key id_:;
   key index / descending; 
   run; 
data up;
   update down(obs=0 keep=&by) down(keep=&by &locf);
   by id_:;
   set down(drop=&by &locf);
   output; 
   run; 
proc sort data=up;
   key id_: index;
   run; 
proc print; 
   run;

Ksharp · Posted 04-18-2014 04:33 AM

Null,

Why not DOW skill ?

data un;
   input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_3)(:$1.);
   cards; 
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
;;;;
   run; 
data want;
 length a1 a2 a3 $ 10;
 do until(last.ID_VAR3);
  set un;
  by ID_VAR1-ID_VAR3;
  a1=coalescec(a1,OTHERVAR_1);
  a2=coalescec(a2,OTHERVAR_2);
  a3=coalescec(a3,OTHERVAR_3);
 end;
 do until(last.ID_VAR3);
  set un;
  by ID_VAR1-ID_VAR3;
  OTHERVAR_1=a1;
  OTHERVAR_2=a2;
  OTHERVAR_3=a3;
  output;
 end;
 drop a1 a2 a3;
run;

Xia Keshan

data_null__ · Posted 04-18-2014 07:24 AM

Because it doesn't work. Add some observations so that the data is more realistic and you will see.

Ksharp · Posted 04-18-2014 07:29 AM

Null,

With all due respect , I agree with you . OP should add some data to explain his question clearer .

Message was edited by: xia keshan

Haikuo · Posted 04-18-2014 10:11 AM

If you only have less or equal two rows per (ID_VAR1 ID_VAR2 ID_VAR3), then PROC SQL and some unconventional usage of Max() function could be put into your advantage here: (Raw input was stolen from _null_'s post)

data un;

input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_6)(:$1.);

cards;

1 xxx 1x1 x . y x . y

1 xxx 1x1 . z y . z y

2 zzz 2x2 p u g p u g

2 zzz 3x3 r f h r f h

3 nnn 4x4 . q . . q .

3 nnn 4x4 t q w t q w

;;;;

run;

proc sql;

select CAT('COALESCEC(',name,',', 'MAX(',NAME,')',') AS ',NAME) into :names SEPARATED BY ' , ' from DICTIONARY.COLUMNS

WHERE LIBNAME='WORK'

AND MEMNAME='UN'

AND NAME NOT IN ('ID_VAR1' 'ID_VAR3' 'ID_VAR2')

;QUIT;

proc sql;

create table want as

select ID_VAR1, ID_VAR2, ID_VAR3 ,&NAMES FROM UN

GROUP BY ID_VAR1, ID_VAR2, ID_VAR3

;

QUIT;

Haikuo

Katie · Posted 04-18-2014 10:26 AM

Here is an example of what my data looks like:

Data Have -

CPI EnrollDate VisitDate BP1 Race Gender

123 1/1/2001 1/1/2001 120/80 White Female

123 1/1/2001 2/1/2001 119/85

123 9/9/2009 9/9/2009 115/75 White

123 9/9/2009 10/9/2009

456 2/2/2002 2/2/2002 118/85 Black Male

456 2/2/2002 3/2/2002 110/82 Black Male

789 3/3/2003 3/3/2003 135/81 Hispanic

789 3/3/2003 4/3/2003 132/83 Hispanic Male

789 3/3/2003 5/3/2003 Male

789 3/3/2003 6/3/2003 128/81

Data Want -

CPI EnrollDate VisitDate BP1 Race Gender

123 1/1/2001 1/1/2001 120/80 White Female

123 1/1/2001 2/1/2001 119/85 White Female

123 9/9/2009 9/9/2009 115/75 White Female

123 9/9/2009 10/9/2009 White Female

456 2/2/2002 2/2/2002 118/85 Black Male

456 2/2/2002 3/2/2002 110/82 Black Male

789 3/3/2003 3/3/2003 135/81 Hispanic Male

789 3/3/2003 4/3/2003 132/83 Hispanic Male

789 3/3/2003 5/3/2003 Hispanic Male

789 3/3/2003 6/3/2003 128/81 Hispanic Male

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away