BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
runningjay
Fluorite | Level 6

Really a newbie here to SAS and just getting familiar with SAS Enterprise Guide. I am trying to merge multiple records into one record based on an ID field that is in a text format. Along with the ID variables included in the csv file I imported I also have a STATE field. The STATE field provided the 2 digit state abbreviation for most of the records, but some of the records are missing this value.

 

I'd like to merge the records to create a new dataset. For example, the current table looks like this:

 current table.jpg

 

However, I would like to change this so that no record is duplicated. I would like to assign values as follows:

0 - All of the STATE values associated with the ID are blank.

1 - If the STATE value(s) associated with the record equals TX and no other value is present for the STATE field.

2 - If the STATE field includes one or more STATE values and none of the the STATE value(s) associated with the record equals TX.

3 - If the STATE field includes values for both TX and at least one other state.

 

The new table would look like this:

desired table.jpg

 

Any guidance and help would be appreciated. I don't see a way to do this using the EG wizards, but could be missing it.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
infile cards dlm=',';
input (ID STATE) (:$100.);
cards;
KREE990001,TX
KREE990001,TX
KREE990001,   
GHTR880001,   
GHTR880001,   
JHRT330001,NC
JHRT330001,   
WQUT110001,TX
WQUT110001,OH
;
proc sort data=have;
by id;
run;
data want;
c=0;
do n=1 by 1 until(last.id);
set have;
by id ;
if missing(state) then c+1;
if state='TX' then tx=1;
else if not missing(state) then no_tx=1;
end;
do until(last.id);
set have;
by id;
if n=c then _state=0;
else if tx and not no_tx then _state=1;
else if not tx and no_tx then _state=2;
else if tx and no_tx then _state=3;
end;
keep id _state;
run;

proc print;run;

SAS Output

Obs ID _state1234
GHTR8800010
JHRT3300012
KREE9900011
WQUT1100013

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20

Please post the sample as plain text for community members to easily copy paste 

runningjay
Fluorite | Level 6

I'd like to merge the records to create a new dataset. For example, the current data looks like this:

ID,STATE
KREE990001,TX
KREE990001,TX
KREE990001,   
GHTR880001,   
GHTR880001,   
JHRT330001,NC
JHRT330001,   
WQUT110001,TX
WQUT110001,OH

However, I would like to change this so that no record is duplicated. I would like to assign values as follows:

0 - All of the STATE values associated with the ID are blank.
1 - If the STATE value(s) associated with the record equals TX and no other value is present for the STATE field.
2 - If the STATE field includes one or more STATE values and none of the the STATE value(s) associated with the record equals TX.
3 - If the STATE field includes values for both TX and at least one other state.

The new data would look like this:

ID,STATE
KREE990001,1
GHTR880001,0
JHRT330001,2
WQUT110001,3

novinosrin
Tourmaline | Level 20
data have;
infile cards dlm=',';
input (ID STATE) (:$100.);
cards;
KREE990001,TX
KREE990001,TX
KREE990001,   
GHTR880001,   
GHTR880001,   
JHRT330001,NC
JHRT330001,   
WQUT110001,TX
WQUT110001,OH
;
proc sort data=have;
by id;
run;
data want;
c=0;
do n=1 by 1 until(last.id);
set have;
by id ;
if missing(state) then c+1;
if state='TX' then tx=1;
else if not missing(state) then no_tx=1;
end;
do until(last.id);
set have;
by id;
if n=c then _state=0;
else if tx and not no_tx then _state=1;
else if not tx and no_tx then _state=2;
else if tx and no_tx then _state=3;
end;
keep id _state;
run;

proc print;run;

SAS Output

Obs ID _state1234
GHTR8800010
JHRT3300012
KREE9900011
WQUT1100013
runningjay
Fluorite | Level 6

Solution worked perfectly - thank you!

 

Not sure if I need to post this as a new, separate question, but since it is related to the prior solution I thought I'd add it here.

 

Using the problem identified, but adding one additional variable I'd like to do the same thing but unduplicate the records factoring in the additional variable. For example, the current data looks like this:

 

ID,STATE,YEAR

KREE990001,TX,2017

KREE990001,TX,2017

KREE990001,TX,2017

KREE990001,TX,2017

KREE990001,TX,2016

KREE990001,TX,2016

KREE990001,  

GHTR880001,  

GHTR880001,  

JHRT330001,NC,2016

JHRT330001,  

WQUT110001,TX,2017

WQUT110001,OH,2017

WQUT110001,OH,2017

WQUT110001,OH,2017

WQUT110001,TX,2016

WQUT110001,TX,2016

 

However, I would like to change this so that no record is duplicated in a single YEAR. I would like to assign values as follows taking the YEAR variable into consideration:

 

0 - All of the STATE values associated with the ID are blank.

1 - If the STATE value(s) associated with the record equals TX and no other value is present for the STATE field in that YEAR.

2 - If the STATE field includes one or more STATE values and none of the STATE value(s) associated with the record equals TX in that YEAR.

3 - If the STATE field includes values for both TX and at least one other state in that YEAR.

 

The new data would look like this:

 

ID,STATE,YEAR

KREE990001,1,2017

KREE990001,1,2016

KREE990001,0

GHTR880001,0

JHRT330001,2,2016

JHRT330001,0

WQUT110001,3,2017

WQUT110001,1,2016

 

Could the previous solution be modified to accomplish this or is a different method recommended?

novinosrin
Tourmaline | Level 20

@runningjay Sure, will give you the solution. However, I am a little busy today as I am have an exam at my college. If it;s not urgent, can you hang in there for a day plz

runningjay
Fluorite | Level 6

Sure - no problem. Appreciate your expertise and insight. Thanks!

novinosrin
Tourmaline | Level 20

@runningjay First off, sorry for the delay as I was feeling very down on account of my poor performance on my college exam last night. My apologies. 

Ok,Anyways back to question

Just a minor change to include year in by group processing. See if this works and let me know. Also, I wonder why you have missed values in year column. That makes it inconceivable to do any great analysis in the future using the dataset. Well!

 

data have;
infile cards dlm=',' truncover;
input (ID STATE) (:$50.) year ;
cards;
KREE990001,TX,2017
KREE990001,TX,2017
KREE990001,TX,2017
KREE990001,TX,2017
KREE990001,TX,2016
KREE990001,TX,2016
KREE990001,  
GHTR880001,  
GHTR880001,  
JHRT330001,NC,2016
JHRT330001,  
WQUT110001,TX,2017
WQUT110001,OH,2017
WQUT110001,OH,2017
WQUT110001,OH,2017
WQUT110001,TX,2016
WQUT110001,TX,2016
;

proc sort data=have;
by id year;
run;
data want;
c=0;
do n=1 by 1 until(last.year);
set have;
by id year;
if missing(state) then c+1;
if state='TX' then tx=1;
else if not missing(state) then no_tx=1;
end;
do until(last.year);
set have;
by id year;
if n=c then _state=0;
else if tx and not no_tx then _state=1;
else if not tx and no_tx then _state=2;
else if tx and no_tx then _state=3;
end;
keep id _state year;
run;

 

runningjay
Fluorite | Level 6

Perfect - many thanks!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1527 views
  • 1 like
  • 2 in conversation