My data is from a survey where members of a household were entered into blocks of variables. For example, if someone meets X criteria they are entered into the variables 1-3, starting with variable 1; if someone meets Y criteria then they are entered into the variables 4-6, starting with variable 4; if someone meets Z criteria they are entered into variables 7-9, starting with variable 7. I am now tasked with creating one set of variables to hold all the information, but I am not sure how to restructure the data.
Example of current data structure:
Fname_1 | Lname_1 | Age_1 | Gender_1 | Fname_2 | Lname_2 | Age_2 | Gender_2 | Fname_3 | Lname_3 | Age_3 | Gender_3 | Fname_4 | Lname_4 | Age_4 | Gender_4 | Fname_5 | Lname_5 | Age_5 | Gender_5 | Fname_6 | Lname_6 | Age_6 | Gender_6 | Fname_7 | Lname_7 | Age_7 | Gender_7 | Fname_8 | Lname_8 | Age_8 | Gender_8 | Fname_9 | Lname_9 | Age_9 | Gender_9 |
Kate | Black | 20 | F | . | . | . | . | . | . | . | . | Louis | White | 35 | M | Kitty | White | 36 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
Lucy | Red | 18 | F | Tom | Red | 17 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | Paul | Red | 76 | M | Sarah | Red | 77 | F | . | . | . | . |
. | . | . | . | . | . | . | . | . | . | . | . | Patricia | Yellow | 60 | F | Frank | Yellow | 62 | M | . | . | . | . | Kate | Yellow | 11 | F | . | . | . | . | . | . | . | . |
Jeff | Blue | 15 | M | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | Lillian | Blue | 25 | F | . | . | . | . | . | . | . | . |
This leaves a lot of missing data. I want to be able to make one household roster, removing the missing data. So that it looks like this:
Fname_1 | Lname_1 | Age_1 | Gender_1 | Fname_2 | Lname_2 | Age_2 | Gender_2 | Fname_3 | Lname_3 | Age_3 | Gender_3 | Fname_4 | Lname_4 | Age_4 | Gender_4 | Fname_5 | Lname_5 | Age_5 | Gender_5 | Fname_6 | Lname_6 | Age_6 | Gender_6 | Fname_7 | Lname_7 | Age_7 | Gender_7 | Fname_8 | Lname_8 | Age_8 | Gender_8 | Fname_9 | Lname_9 | Age_9 | Gender_9 |
Kate | Black | 20 | F | Louis | White | 35 | M | Kitty | White | 36 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
Lucy | Red | 18 | F | Tom | Red | 17 | Paul | Red | 76 | M | Sarah | Red | 77 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
Patricia | Yellow | 60 | F | Frank | Yellow | 62 | M | Kate | Yellow | 11 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
Jeff | Blue | 15 | M | Lillian | Blue | 25 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
My question is: how do I pull out the data that exists and put it into variables as per my example.
I am familiar with arrays and do-loops, and have tried some manipulations with these, without success.
Any guidance, advice, or suggestions are much appreciated!
Thank you!
data work.a;
input
Fname_1 $ Lname_1 $ Age_1 Gender_1 $
Fname_2 $ Lname_2 $ Age_2 Gender_2 $
Fname_3 $ Lname_3 $ Age_3 Gender_3 $
Fname_4 $ Lname_4 $ Age_4 Gender_4 $
Fname_5 $ Lname_5 $ Age_5 Gender_5 $
Fname_6 $ Lname_6 $ Age_6 Gender_6 $
Fname_7 $ Lname_7 $ Age_7 Gender_7 $
Fname_8 $ Lname_8 $ Age_8 Gender_8 $
Fname_9 $ Lname_9 $ Age_9 Gender_9 $
;
cards;
Kate Black 20 F . . . . . . . . Louis White 35 M Kitty White 36 F . . . . . . . . . . . . . . . .
Lucy Red 18 F Tom Red 17 . . . . . . . . . . . . . . . . . Paul Red 76 M Sarah Red 77 F . . . .
. . . . . . . . . . . . Patricia Yellow 60 F Frank Yellow 62 M . . . . Kate Yellow 11 F . . . . . . . .
Jeff Blue 15 M Lillian Blue 25 F . . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run;
data work.result;
set work.a;
array array_Fname Fname:;
array array_Lname Lname:;
array array_Age Age:;
array array_Gender Gender:;
do i=1 to dim(array_Fname);
if missing(array_Fname(i)) then do j=i+1 to dim(array_Fname);
if not missing(array_Fname(j)) then do;
array_Fname(i)=array_Fname(j);array_Lname(i)=array_Lname(j);array_Age(i)=array_Age(j);array_Gender(i)=array_Gender(j);
call missing(array_Fname(j),array_Lname(j),array_Age(j),array_Gender(j));
leave;
end;
end;
end;
drop i j;
run;
Why?
Looks like a report to me. Try the PANELS= option on PROC REPORT to display the records in multiple columns on the page.
You would be well advised to change the plan. Create a data set with one observation per person:
fname, lname, age, gender
And add two variables:
family, criteria
That would be much more flexible and would support both analysis and reporting.
One way to start implementing @Astounding's suggestion:
data roster; set have; array fn fname_: ;/* or fname_1 - fname_9*/ array ln lname_: ; array af age_: ; /* similar for the repeated variables*/ do i= 1 to dim(fn); if not missing(fn[i] ) then do; firstname=fn[i]; lastname =ln[i]; age =ag[i]; /*continue with other arrays*/ output; end; end; keep firstname lastname age ; run;
the keep list should include any variables you want. Note the location of the OUTPUT. Any other modified/created/ manipulated variables you need for each individual need to be set at that point.
data work.a;
input
Fname_1 $ Lname_1 $ Age_1 Gender_1 $
Fname_2 $ Lname_2 $ Age_2 Gender_2 $
Fname_3 $ Lname_3 $ Age_3 Gender_3 $
Fname_4 $ Lname_4 $ Age_4 Gender_4 $
Fname_5 $ Lname_5 $ Age_5 Gender_5 $
Fname_6 $ Lname_6 $ Age_6 Gender_6 $
Fname_7 $ Lname_7 $ Age_7 Gender_7 $
Fname_8 $ Lname_8 $ Age_8 Gender_8 $
Fname_9 $ Lname_9 $ Age_9 Gender_9 $
;
cards;
Kate Black 20 F . . . . . . . . Louis White 35 M Kitty White 36 F . . . . . . . . . . . . . . . .
Lucy Red 18 F Tom Red 17 . . . . . . . . . . . . . . . . . Paul Red 76 M Sarah Red 77 F . . . .
. . . . . . . . . . . . Patricia Yellow 60 F Frank Yellow 62 M . . . . Kate Yellow 11 F . . . . . . . .
Jeff Blue 15 M Lillian Blue 25 F . . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run;
data work.result;
set work.a;
array array_Fname Fname:;
array array_Lname Lname:;
array array_Age Age:;
array array_Gender Gender:;
do i=1 to dim(array_Fname);
if missing(array_Fname(i)) then do j=i+1 to dim(array_Fname);
if not missing(array_Fname(j)) then do;
array_Fname(i)=array_Fname(j);array_Lname(i)=array_Lname(j);array_Age(i)=array_Age(j);array_Gender(i)=array_Gender(j);
call missing(array_Fname(j),array_Lname(j),array_Age(j),array_Gender(j));
leave;
end;
end;
end;
drop i j;
run;
Thanks learsaas this is exactly what I was looking for. This solution works perfectly!!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.