My data is from a survey where members of a household were entered into blocks of variables. For example, if someone meets X criteria they are entered into the variables 1-3, starting with variable 1; if someone meets Y criteria then they are entered into the variables 4-6, starting with variable 4; if someone meets Z criteria they are entered into variables 7-9, starting with variable 7. I am now tasked with creating one set of variables to hold all the information, but I am not sure how to restructure the data.
Example of current data structure:
| Fname_1 | Lname_1 | Age_1 | Gender_1 | Fname_2 | Lname_2 | Age_2 | Gender_2 | Fname_3 | Lname_3 | Age_3 | Gender_3 | Fname_4 | Lname_4 | Age_4 | Gender_4 | Fname_5 | Lname_5 | Age_5 | Gender_5 | Fname_6 | Lname_6 | Age_6 | Gender_6 | Fname_7 | Lname_7 | Age_7 | Gender_7 | Fname_8 | Lname_8 | Age_8 | Gender_8 | Fname_9 | Lname_9 | Age_9 | Gender_9 |
| Kate | Black | 20 | F | . | . | . | . | . | . | . | . | Louis | White | 35 | M | Kitty | White | 36 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| Lucy | Red | 18 | F | Tom | Red | 17 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | Paul | Red | 76 | M | Sarah | Red | 77 | F | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | Patricia | Yellow | 60 | F | Frank | Yellow | 62 | M | . | . | . | . | Kate | Yellow | 11 | F | . | . | . | . | . | . | . | . |
| Jeff | Blue | 15 | M | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | Lillian | Blue | 25 | F | . | . | . | . | . | . | . | . |
This leaves a lot of missing data. I want to be able to make one household roster, removing the missing data. So that it looks like this:
| Fname_1 | Lname_1 | Age_1 | Gender_1 | Fname_2 | Lname_2 | Age_2 | Gender_2 | Fname_3 | Lname_3 | Age_3 | Gender_3 | Fname_4 | Lname_4 | Age_4 | Gender_4 | Fname_5 | Lname_5 | Age_5 | Gender_5 | Fname_6 | Lname_6 | Age_6 | Gender_6 | Fname_7 | Lname_7 | Age_7 | Gender_7 | Fname_8 | Lname_8 | Age_8 | Gender_8 | Fname_9 | Lname_9 | Age_9 | Gender_9 |
| Kate | Black | 20 | F | Louis | White | 35 | M | Kitty | White | 36 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| Lucy | Red | 18 | F | Tom | Red | 17 | Paul | Red | 76 | M | Sarah | Red | 77 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| Patricia | Yellow | 60 | F | Frank | Yellow | 62 | M | Kate | Yellow | 11 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| Jeff | Blue | 15 | M | Lillian | Blue | 25 | F | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
My question is: how do I pull out the data that exists and put it into variables as per my example.
I am familiar with arrays and do-loops, and have tried some manipulations with these, without success.
Any guidance, advice, or suggestions are much appreciated!
Thank you!
data work.a;
input
Fname_1 $ Lname_1 $ Age_1 Gender_1 $
Fname_2 $ Lname_2 $ Age_2 Gender_2 $
Fname_3 $ Lname_3 $ Age_3 Gender_3 $
Fname_4 $ Lname_4 $ Age_4 Gender_4 $
Fname_5 $ Lname_5 $ Age_5 Gender_5 $
Fname_6 $ Lname_6 $ Age_6 Gender_6 $
Fname_7 $ Lname_7 $ Age_7 Gender_7 $
Fname_8 $ Lname_8 $ Age_8 Gender_8 $
Fname_9 $ Lname_9 $ Age_9 Gender_9 $
;
cards;
Kate Black 20 F . . . . . . . . Louis White 35 M Kitty White 36 F . . . . . . . . . . . . . . . .
Lucy Red 18 F Tom Red 17 . . . . . . . . . . . . . . . . . Paul Red 76 M Sarah Red 77 F . . . .
. . . . . . . . . . . . Patricia Yellow 60 F Frank Yellow 62 M . . . . Kate Yellow 11 F . . . . . . . .
Jeff Blue 15 M Lillian Blue 25 F . . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run;
data work.result;
set work.a;
array array_Fname Fname:;
array array_Lname Lname:;
array array_Age Age:;
array array_Gender Gender:;
do i=1 to dim(array_Fname);
if missing(array_Fname(i)) then do j=i+1 to dim(array_Fname);
if not missing(array_Fname(j)) then do;
array_Fname(i)=array_Fname(j);array_Lname(i)=array_Lname(j);array_Age(i)=array_Age(j);array_Gender(i)=array_Gender(j);
call missing(array_Fname(j),array_Lname(j),array_Age(j),array_Gender(j));
leave;
end;
end;
end;
drop i j;
run;
Why?
Looks like a report to me. Try the PANELS= option on PROC REPORT to display the records in multiple columns on the page.
You would be well advised to change the plan. Create a data set with one observation per person:
fname, lname, age, gender
And add two variables:
family, criteria
That would be much more flexible and would support both analysis and reporting.
One way to start implementing @Astounding's suggestion:
data roster;
set have;
array fn fname_: ;/* or fname_1 - fname_9*/
array ln lname_: ;
array af age_: ;
/* similar for the repeated variables*/
do i= 1 to dim(fn);
if not missing(fn[i] ) then do;
firstname=fn[i];
lastname =ln[i];
age =ag[i];
/*continue with other arrays*/
output;
end;
end;
keep firstname lastname age ;
run;
the keep list should include any variables you want. Note the location of the OUTPUT. Any other modified/created/ manipulated variables you need for each individual need to be set at that point.
data work.a;
input
Fname_1 $ Lname_1 $ Age_1 Gender_1 $
Fname_2 $ Lname_2 $ Age_2 Gender_2 $
Fname_3 $ Lname_3 $ Age_3 Gender_3 $
Fname_4 $ Lname_4 $ Age_4 Gender_4 $
Fname_5 $ Lname_5 $ Age_5 Gender_5 $
Fname_6 $ Lname_6 $ Age_6 Gender_6 $
Fname_7 $ Lname_7 $ Age_7 Gender_7 $
Fname_8 $ Lname_8 $ Age_8 Gender_8 $
Fname_9 $ Lname_9 $ Age_9 Gender_9 $
;
cards;
Kate Black 20 F . . . . . . . . Louis White 35 M Kitty White 36 F . . . . . . . . . . . . . . . .
Lucy Red 18 F Tom Red 17 . . . . . . . . . . . . . . . . . Paul Red 76 M Sarah Red 77 F . . . .
. . . . . . . . . . . . Patricia Yellow 60 F Frank Yellow 62 M . . . . Kate Yellow 11 F . . . . . . . .
Jeff Blue 15 M Lillian Blue 25 F . . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run;
data work.result;
set work.a;
array array_Fname Fname:;
array array_Lname Lname:;
array array_Age Age:;
array array_Gender Gender:;
do i=1 to dim(array_Fname);
if missing(array_Fname(i)) then do j=i+1 to dim(array_Fname);
if not missing(array_Fname(j)) then do;
array_Fname(i)=array_Fname(j);array_Lname(i)=array_Lname(j);array_Age(i)=array_Age(j);array_Gender(i)=array_Gender(j);
call missing(array_Fname(j),array_Lname(j),array_Age(j),array_Gender(j));
leave;
end;
end;
end;
drop i j;
run;
Thanks learsaas this is exactly what I was looking for. This solution works perfectly!!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.