Really a newbie here to SAS and just getting familiar with SAS Enterprise Guide. I am trying to merge multiple records into one record based on an ID field that is in a text format. Along with the ID variables included in the csv file I imported I also have a STATE field. The STATE field provided the 2 digit state abbreviation for most of the records, but some of the records are missing this value.
I'd like to merge the records to create a new dataset. For example, the current table looks like this:
However, I would like to change this so that no record is duplicated. I would like to assign values as follows:
0 - All of the STATE values associated with the ID are blank.
1 - If the STATE value(s) associated with the record equals TX and no other value is present for the STATE field.
2 - If the STATE field includes one or more STATE values and none of the the STATE value(s) associated with the record equals TX.
3 - If the STATE field includes values for both TX and at least one other state.
The new table would look like this:
Any guidance and help would be appreciated. I don't see a way to do this using the EG wizards, but could be missing it.
Thanks!
data have;
infile cards dlm=',';
input (ID STATE) (:$100.);
cards;
KREE990001,TX
KREE990001,TX
KREE990001,
GHTR880001,
GHTR880001,
JHRT330001,NC
JHRT330001,
WQUT110001,TX
WQUT110001,OH
;
proc sort data=have;
by id;
run;
data want;
c=0;
do n=1 by 1 until(last.id);
set have;
by id ;
if missing(state) then c+1;
if state='TX' then tx=1;
else if not missing(state) then no_tx=1;
end;
do until(last.id);
set have;
by id;
if n=c then _state=0;
else if tx and not no_tx then _state=1;
else if not tx and no_tx then _state=2;
else if tx and no_tx then _state=3;
end;
keep id _state;
run;
proc print;run;
SAS Output
GHTR880001 | 0 |
JHRT330001 | 2 |
KREE990001 | 1 |
WQUT110001 | 3 |
Please post the sample as plain text for community members to easily copy paste
I'd like to merge the records to create a new dataset. For example, the current data looks like this:
ID,STATE
KREE990001,TX
KREE990001,TX
KREE990001,
GHTR880001,
GHTR880001,
JHRT330001,NC
JHRT330001,
WQUT110001,TX
WQUT110001,OH
However, I would like to change this so that no record is duplicated. I would like to assign values as follows:
0 - All of the STATE values associated with the ID are blank.
1 - If the STATE value(s) associated with the record equals TX and no other value is present for the STATE field.
2 - If the STATE field includes one or more STATE values and none of the the STATE value(s) associated with the record equals TX.
3 - If the STATE field includes values for both TX and at least one other state.
The new data would look like this:
ID,STATE
KREE990001,1
GHTR880001,0
JHRT330001,2
WQUT110001,3
data have;
infile cards dlm=',';
input (ID STATE) (:$100.);
cards;
KREE990001,TX
KREE990001,TX
KREE990001,
GHTR880001,
GHTR880001,
JHRT330001,NC
JHRT330001,
WQUT110001,TX
WQUT110001,OH
;
proc sort data=have;
by id;
run;
data want;
c=0;
do n=1 by 1 until(last.id);
set have;
by id ;
if missing(state) then c+1;
if state='TX' then tx=1;
else if not missing(state) then no_tx=1;
end;
do until(last.id);
set have;
by id;
if n=c then _state=0;
else if tx and not no_tx then _state=1;
else if not tx and no_tx then _state=2;
else if tx and no_tx then _state=3;
end;
keep id _state;
run;
proc print;run;
SAS Output
GHTR880001 | 0 |
JHRT330001 | 2 |
KREE990001 | 1 |
WQUT110001 | 3 |
Solution worked perfectly - thank you!
Not sure if I need to post this as a new, separate question, but since it is related to the prior solution I thought I'd add it here.
Using the problem identified, but adding one additional variable I'd like to do the same thing but unduplicate the records factoring in the additional variable. For example, the current data looks like this:
ID,STATE,YEAR
KREE990001,TX,2017
KREE990001,TX,2017
KREE990001,TX,2017
KREE990001,TX,2017
KREE990001,TX,2016
KREE990001,TX,2016
KREE990001,
GHTR880001,
GHTR880001,
JHRT330001,NC,2016
JHRT330001,
WQUT110001,TX,2017
WQUT110001,OH,2017
WQUT110001,OH,2017
WQUT110001,OH,2017
WQUT110001,TX,2016
WQUT110001,TX,2016
However, I would like to change this so that no record is duplicated in a single YEAR. I would like to assign values as follows taking the YEAR variable into consideration:
0 - All of the STATE values associated with the ID are blank.
1 - If the STATE value(s) associated with the record equals TX and no other value is present for the STATE field in that YEAR.
2 - If the STATE field includes one or more STATE values and none of the STATE value(s) associated with the record equals TX in that YEAR.
3 - If the STATE field includes values for both TX and at least one other state in that YEAR.
The new data would look like this:
ID,STATE,YEAR
KREE990001,1,2017
KREE990001,1,2016
KREE990001,0
GHTR880001,0
JHRT330001,2,2016
JHRT330001,0
WQUT110001,3,2017
WQUT110001,1,2016
Could the previous solution be modified to accomplish this or is a different method recommended?
@runningjay Sure, will give you the solution. However, I am a little busy today as I am have an exam at my college. If it;s not urgent, can you hang in there for a day plz
Sure - no problem. Appreciate your expertise and insight. Thanks!
@runningjay First off, sorry for the delay as I was feeling very down on account of my poor performance on my college exam last night. My apologies.
Ok,Anyways back to question
Just a minor change to include year in by group processing. See if this works and let me know. Also, I wonder why you have missed values in year column. That makes it inconceivable to do any great analysis in the future using the dataset. Well!
data have;
infile cards dlm=',' truncover;
input (ID STATE) (:$50.) year ;
cards;
KREE990001,TX,2017
KREE990001,TX,2017
KREE990001,TX,2017
KREE990001,TX,2017
KREE990001,TX,2016
KREE990001,TX,2016
KREE990001,
GHTR880001,
GHTR880001,
JHRT330001,NC,2016
JHRT330001,
WQUT110001,TX,2017
WQUT110001,OH,2017
WQUT110001,OH,2017
WQUT110001,OH,2017
WQUT110001,TX,2016
WQUT110001,TX,2016
;
proc sort data=have;
by id year;
run;
data want;
c=0;
do n=1 by 1 until(last.year);
set have;
by id year;
if missing(state) then c+1;
if state='TX' then tx=1;
else if not missing(state) then no_tx=1;
end;
do until(last.year);
set have;
by id year;
if n=c then _state=0;
else if tx and not no_tx then _state=1;
else if not tx and no_tx then _state=2;
else if tx and no_tx then _state=3;
end;
keep id _state year;
run;
Perfect - many thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.