Hello,
This is the data I have:
ID_VAR1 ID_VAR2 ID_VAR3 OTHERVAR_1 OTHERVAR_2 OTHERVAR_3
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
This is the data I would like:
ID_VAR1 ID_VAR2 ID_VAR3 OTHERVAR_1 OTHERVAR_2 OTHERVAR_3
1 xxx 1x1 x z y
1 xxx 1x1 x z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 t q w
3 nnn 4x4 t q w
So basically per unique combination of ID variables, I need to fill up and down other variables. I have quite a few variables to do this on and would like the most efficient solution.
Thanks!
Efficient to write.
data un;
input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_3)(:$1.);
cards;
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
;;;;
run;
proc print;
run;
data down;
update un(obs=0) un;
by ID_:;
array f first:;
if f[dim(f)] then index=0;
index+1;
output;
run;
proc sort data=down;
key id_:;
key index / descending;
run;
data up;
update down(obs=0) down;
by id_:;
output;
run;
proc sort data=up;
key id_: index;
run;
proc print;
run;
Efficient to write or efficient to run?
I would prefer something that is efficient to write. As of right now I can only think of doing it variable by variable. When I would rather have some code cycle through all the variables in one step.
Efficient to write.
data un;
input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_3)(:$1.);
cards;
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
;;;;
run;
proc print;
run;
data down;
update un(obs=0) un;
by ID_:;
array f first:;
if f[dim(f)] then index=0;
index+1;
output;
run;
proc sort data=down;
key id_:;
key index / descending;
run;
data up;
update down(obs=0) down;
by id_:;
output;
run;
proc sort data=up;
key id_: index;
run;
proc print;
run;
I should add that using an array of first variables will not work if there are other variables in the data that have first as the first part of there names. This can be avoided with the addion of validvarname=any as long as there are no names that begin with FIRST. (first dot). For example.
options validvarname=v7;
I am trying to adapt the code you have previously shown me to only fill down a couple of variables, however I cannot figure out how to make it work. Can you help?
This is what I have so far:
data down;
update try(obs=0) try;
by CPI;
array f[2] Race Gender first:;
if f[dim(f)] then index = 0;
indext+1;
ouput;
run;
The rest of the code I can do, it is just setting up the array to only update those two variables.
Thanks!
Katie
Consider this modification to my original program with the addition of othervar_4-othervar_6 specifically the CALL MISSING after the output statement. This effectively turns off the LOCF action for these variables.
Thank you so much for your reply. What I'm finding is the variables in the Call statement are the ones NOT getting filled, while the others are. Is there a way to specify only the ones you want to be filled?
See if this is more suitable. You create two macro the BY variables and the LOCF variables and drop and keep them as follows. Notice the variables that are not keys or LOCF now come from SET statement.
Null,
Why not DOW skill ?
data un; input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_3)(:$1.); cards; 1 xxx 1x1 x . y 1 xxx 1x1 . z y 2 zzz 2x2 p u g 2 zzz 3x3 r f h 3 nnn 4x4 . q . 3 nnn 4x4 t q w ;;;; run; data want; length a1 a2 a3 $ 10; do until(last.ID_VAR3); set un; by ID_VAR1-ID_VAR3; a1=coalescec(a1,OTHERVAR_1); a2=coalescec(a2,OTHERVAR_2); a3=coalescec(a3,OTHERVAR_3); end; do until(last.ID_VAR3); set un; by ID_VAR1-ID_VAR3; OTHERVAR_1=a1; OTHERVAR_2=a2; OTHERVAR_3=a3; output; end; drop a1 a2 a3; run;
Xia Keshan
Because it doesn't work. Add some observations so that the data is more realistic and you will see.
Null,
With all due respect , I agree with you . OP should add some data to explain his question clearer .
Message was edited by: xia keshan
If you only have less or equal two rows per (ID_VAR1 ID_VAR2 ID_VAR3), then PROC SQL and some unconventional usage of Max() function could be put into your advantage here: (Raw input was stolen from _null_'s post)
data un;
input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_6)(:$1.);
cards;
1 xxx 1x1 x . y x . y
1 xxx 1x1 . z y . z y
2 zzz 2x2 p u g p u g
2 zzz 3x3 r f h r f h
3 nnn 4x4 . q . . q .
3 nnn 4x4 t q w t q w
;;;;
run;
proc sql;
select CAT('COALESCEC(',name,',', 'MAX(',NAME,')',') AS ',NAME) into :names SEPARATED BY ' , ' from DICTIONARY.COLUMNS
WHERE LIBNAME='WORK'
AND MEMNAME='UN'
AND NAME NOT IN ('ID_VAR1' 'ID_VAR3' 'ID_VAR2')
;QUIT;
proc sql;
create table want as
select ID_VAR1, ID_VAR2, ID_VAR3 ,&NAMES FROM UN
GROUP BY ID_VAR1, ID_VAR2, ID_VAR3
;
QUIT;
Haikuo
Here is an example of what my data looks like:
Data Have -
CPI EnrollDate VisitDate BP1 Race Gender
123 1/1/2001 1/1/2001 120/80 White Female
123 1/1/2001 2/1/2001 119/85
123 9/9/2009 9/9/2009 115/75 White
123 9/9/2009 10/9/2009
456 2/2/2002 2/2/2002 118/85 Black Male
456 2/2/2002 3/2/2002 110/82 Black Male
789 3/3/2003 3/3/2003 135/81 Hispanic
789 3/3/2003 4/3/2003 132/83 Hispanic Male
789 3/3/2003 5/3/2003 Male
789 3/3/2003 6/3/2003 128/81
Data Want -
CPI EnrollDate VisitDate BP1 Race Gender
123 1/1/2001 1/1/2001 120/80 White Female
123 1/1/2001 2/1/2001 119/85 White Female
123 9/9/2009 9/9/2009 115/75 White Female
123 9/9/2009 10/9/2009 White Female
456 2/2/2002 2/2/2002 118/85 Black Male
456 2/2/2002 3/2/2002 110/82 Black Male
789 3/3/2003 3/3/2003 135/81 Hispanic Male
789 3/3/2003 4/3/2003 132/83 Hispanic Male
789 3/3/2003 5/3/2003 Hispanic Male
789 3/3/2003 6/3/2003 128/81 Hispanic Male
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.