I have a dataset in the following format:
ID | Counts | State |
2997 | 18 | FL |
2997 | 4 | TX |
2997 | 31 | NM |
2997 | 17 | NY |
2997 | 11 | GA |
2998 | 32 | FL |
2998 | 48 | NY |
2998 | 36 | CA |
2998 | 4 | GA |
2998 | 4 | NM |
2999 | 12 | FL |
2999 | 42 | CA |
2999 | 67 | GA |
2999 | 23 | NM |
2999 | 9 | TX |
How do I manipulate it using proc SQL (etc.) if I want it like this?
FL | TX | NM | NY | GA | NY | CA | |
2997 | |||||||
2998 | |||||||
2999 |
Thank you all in advance.
proc transpose
by id
var counts
id state
proc transpose
by id
var counts
id state
use proc transpose
Don't do it with PROC SQL. Use PROC TRANSPOSE.
data have;
input ID Counts State $;
cards;
2997 18 FL
2997 4 TX
2997 31 NM
2997 17 NY
2997 11 GA
2998 32 FL
2998 48 NY
2998 36 CA
2998 4 GA
2998 4 NM
2999 12 FL
2999 42 CA
2999 67 GA
2999 23 NM
2999 9 TX
;
proc transpose data=have out=want;
by id;
id state;
var counts;
run;
I think most users in here think you should use Proc Transpose 🙂
But why on earth would you want to use Proc SQL anyway?
proc sql;
select distinct state into :state separated by ' '
from have;
quit;
%put &state;
data want1;
do _n_=1 by 1 until(last.id);
set have;
by id;
array st &state;
if _n_=1 then call missing(of st(*));
do _i_=1 to dim(st);
if vname(st(_i_))=state then st(_i_)=counts;
end;
end;
run;
As everyone suggested, this is easy and clean using PROC transpose. But if you want to do in PROC sql. you can do it if you have one distinct state per id by lot of hard coding as shown below
proc sql;
create table want as
select id,
max( case when state = "FL" then counts else 0 end) as FL,
max( case when state = "TX" then counts else 0 end) as TX,
max( case when state = "NM" then counts else 0 end) as NM,
max( case when state = "NY" then counts else 0 end) as NY,
max( case when state = "GA" then counts else 0 end) as GA,
max( case when state = "CA" then counts else 0 end) as CA
from have
group by 1;
Thank you for all your quick and helpful responses. The proc transpose threw up some error messages saying ' The ID value "FL" occurs twice in the same BY group'.
The PROC SQL worked fine for this purpose. I didn't try the second suggested SQL procedure since the first worked for my data.
proc sql;
select distinct state into :state separated by ' '
from have;
quit;
%put &state;
data want1;
do _n_=1 by 1 until(last.id);
set have;
by id;
array st &state;
if _n_=1 then call missing(of st(*));
do _i_=1 to dim(st);
if vname(st(_i_))=state then st(_i_)=counts;
end;
end;
run;
I appreciate your efforts. Thank you all once again.
@vdfdd: Check your input dataset! Sounds like you have multiple records for one or more IDs for at least one of your by groups.
Either that's in error, or you have to sum the same-state records before transposing the dataset.
Art, CEO, AnalystFinder.com
This error message from proc transpose also means that your results from the other approaches will not be correct, as values will be overwritten and lost. Another reason why proc transpose is THE tool for this, as it will alert you.
When I added the Let option to the proc transpose, I got the following:
WARNING: 4 observations omitted due to missing ID values.
NOTE: There were 99187 observations read from the data set WORK.CLEANY.
NOTE: The data set WORK.WANTY has 26568 observations and 38 variables.
The same number of observations as the proc sql. Any thought on this?
So you have missing ID values. Decide how you want to treat those.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.