G'd a y. I need to create a single list from two separate lists. One list has >10,000 observations; the other has a couple thousand.
Both lists have 7 digit character ID#s (I will list as single digit ID# below to make it easier to understand) . Some IDs match and some don't between the two lists.
I'd like to keep the ID, Name, Phys Street, PhysCity, PhysZip, and Email for the final list. How would I get a final list where each ID is listed once?
List One
ID SchoolName PhysStreet PhysCity PhsZip Email
1
2
3
4
5
List Two
ID SchoolName PhysStreet PhysCity PhsZip Email
2
4
5
9
8
Any help you can give is much appreciated!
proc sort data=one;
by id;
run;
proc sort data=two;
by id;
run;
data want;
update one two;
by id;
run;
Appreciate your response! Correct me if I'm wrong, but I thought the update statetment updates values in dataset one with the values in dataset two that match but doesn't add the observations from dataset two into dataset one that don't match?
Final List
ID SchoolName PhysStreet PhysCity PhsZip Email
1
2
3
4
5
9
8
It does both. Observations in list one will be updated across the variables with the values of list two based on common ids and new ids from list two will also be added to list one to make it final list. I am assuming that ids are unique in both list one and list two.
Appreciate you helping me understand the UPDATE statement better!!
If the IDS match between lists, will the corresponding information, ie school name, street address automatically match as well?
If not, how do you want to deal with that? If using the second file is fine the solution above is appropriate. If you want the first file, flip the data sets on the update statement. If its some other logic you'll need different code.
Good point. The addresses most likely will match between the two lists. How can you tell if they don't match (i.e, keep the ones in the final list that have the same ID but different addresses)?
If this is manually entered data you'll also have issue with the address, where one person spells out street and the other uses st.
I would append and take a look and see if the multiples have differences first to determine what needs to be done to clean the data. The check dataset below will be all ID's with multiple records.
data temp;
set one two indsname=source;
DSET=source;
run;
proc sort data=temp; by id schoolname phystreet;
run;
data check;
set temp;
by id;
if first.id ne last.id;
run;
Great -- appreciate you helping understand how to check the dataset!!
If I understood what you mean.
data a; set sashelp.class; id+1; run; data b; set sashelp.class end=last; id+1; output; if last then do;id+1;name='Arthur.T'; output;end; run; data once; set a b; by id; if first.id and last.id ; run;
Xia Keshan
Ksharp wrote:
If I understood what you mean.
data a; set sashelp.class; id+1; run; data b; set sashelp.class end=last; id+1; output; if last then do;id+1;name='Arthur.T'; output;end; run; data once; set a b; by id; if first.id and last.id ; run;Xia Keshan
What does the following code refer to or do? name='Arthur.T';
It adds a record in to data set B, with the name set as Arthur.T, this record would not be data set A.
It's a way of simulating your data.
Thanks. Reeza . Yes. I just make a dummy table to simulate your data .
Reeza , You gonna surpass Arthur.T to be number one of sas user list . Congratulations !
Xia Keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.