BookmarkSubscribeRSS Feed
tofov2
Calcite | Level 5

Hello everyone.

 

I have large dataset (named "one"). I noticed that some of the observations (but not all) for a particular variable (named "variable") are incorrect. I created a second dataset (named "two") with the same variable and the correct observations.

 

How can I replace the observations for the specific variable in dataset one with those from dataset two while stilling keepign everything else the same in the original dataset (named "one")?

 

To explain in more detail, I have created the scenario below. Essentially, how do I replace the variable values for Carol and David (4 and 5 respectively) in dataset one with those in dataset two (3 and 4 respectively) without affecting Anne, Bill, and Erik or the inputs for Car?

 

data one;

input name$ variable car$;

datalines;

Anne    1     Acura

Bill        2    Buick

Carol    4    Cadillac

David   5    Dodge

Erik      5    Equus; run;

 

data two;

input name variable;

datalines;

Carol    3

David   4; run;

 

3 REPLIES 3
Reeza
Super User

The UPDATE statement does exactly this.

 

data one;
input name$ variable car$;
datalines;
Anne    1     Acura
Bill        2    Buick
Carol    4    Cadillac
David   5    Dodge
Erik      5    Equus
; 
run;

 

data two;
input name $ variable;
datalines;
Carol    3
David   4
; 
run;

 data three;
 update one two;
 by name;
 run;
 
ballardw
Super User

First, two issues with the example data. The datalines end with a semicolon on a row by itself. If you have the semicolon on a row with data that record will not be in the output data set. Second, you did not have Name as the same type.

 

data one;
input name$ variable car$;
datalines;
Anne    1     Acura
Bill        2    Buick
Carol    4    Cadillac
David   5    Dodge
Erik      5    Equus
; run;
 
data two;
input name $ variable;
datalines;
Carol    3
David   4
; run;

/* if not sorted by variables that uniquely identify records do so*/

data want;
   update one two;
   by name;
run;

The Update statement will replace values and add records from Two to the data set One. Match By variables that uniquely identify records in data set one. If the By variables duplicate in One this will fail.

The default behavior for Update is that missing values in the second set will not replace values in the first. If you want that behavior you would add the option UPDATEMODE=NOMISSINGCHECK.

 

Suggestion: You can replace the first the data set but I would create a new data set until you are very familiar with the behavior the update statement.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 449 views
  • 0 likes
  • 4 in conversation