BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
BW18
Fluorite | Level 6

Hi all, 

 

SAS amateur here! I am trying to figure out how to create 2 new variables and assign values to each observation based on the data contained in a reference file. I have created a mock example below. The first dataset is the main one I need to keep. It contains data on individual stores and several stores will have the same corporate ID. The second dataset is a reference that contains approximately 500 Corporate IDs and addresses, including the State and City. I would like to create 2 new variables for CorpState and CorpCity in Dataset 1 based on the CorpID contained in both files. Is this even possible to do and where do I start? Any help would be greatly appreciated. 

 

Dataset 1 (Have): Main dataset with individual store information

Obs#

CorpID

StoreID

StoreState

1

4002

40021234

AL

2

4002

40021237

MS

3

4003

40035678

TN

4

4100

41007890

WV

5

4110

41103456

NC

6

4110

41103478

NC

7

4597

45972345

WA

8

4975

49750023

WY

 

 

 Dataset 2 (Have): Reference dataset with corporate information

Obs# 

CorpID

CorpState

CorpCity

1

4002

AL

Birmingham

2

4003

GA

Atlanta

3

4004

TN

Nashville

4

4100

SC

Charleston

5

4110

NC

Charlotte

6

4111

FL

Tampa

7

4597

CA

San Francisco

8

4975

WY

Jackson

 

 

Want: Dataset 1 file containing store and corporate information

Obs#

CorpID

CorpState

CorpCity

StoreID

StoreState

1

4002

AL

Birmingham

40021234

AL

2

4002

AL

Birmingham

40021237

MS

3

4003

GA

Atlanta

40035678

TN

4

4100

SC

Charleston

41007890

WV

5

4110

NC

Charlotte

41103456

NC

6

4110

NC

Charlotte

41103478

NC

7

4597

CA

San Francisco

45972345

WA

8

4975

WY

Jackson

49750023

WY

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User
Proc sort data=data1;
   by corpid;
run;

Proc sort data=data2;
  by corpid;
run;

data want;
   merge data1 data2;
   by corpid;
run;

Basic match merge. To merge sets on the value of one or more variables generally sort by them and then merge as shown.

 

Proc Sql can also do a join but can require a bit more typing.

View solution in original post

11 REPLIES 11
PaigeMiller
Diamond | Level 26

NOTE: UNTESTED CODE as the data is not in a usable form. This is a MERGE in a DATA step:

 

data want;
    merge have1(in=in1) have2;
    by corpid;
    if in1;
run;

 

This assumes both data sets are sorted by CORPID

--
Paige Miller
ballardw
Super User
Proc sort data=data1;
   by corpid;
run;

Proc sort data=data2;
  by corpid;
run;

data want;
   merge data1 data2;
   by corpid;
run;

Basic match merge. To merge sets on the value of one or more variables generally sort by them and then merge as shown.

 

Proc Sql can also do a join but can require a bit more typing.

BW18
Fluorite | Level 6

So basically I was overthinking by trying to create a new variable and assign values when all I needed to do was merge and keep the variables I want or drop those I don't. Thank you!

 

Unfortunately, I am still having an issue with the merge. It keeps saying that the CorpID is being treated as both character and numeric. I used the following code to convert it to character to match the main dataset, sort the data, and try to merge.  The proc contents results show it as character following the first code, but when I tried to merge, I am still getting the same error. 

 

/* Change the CorpID variable in CorpAddresses from Numeric to character and end up with the original variable name */
data N.CorpAdresses;
set N.CorpAddresses;
CorpID_char = put(CorpID, 4.);
drop CorpID;
rename CorpID_char = CorpID;
run;
proc contents data=N.CorpAdresses;
run;

 


Proc sort data=N.StoreInformation;
by CorpID;
run;

Proc sort data=N.CorpAddresses;
by CorpID;
run;

 

data StoreInfoWithCorp;
merge N.StoreInformation N.CorpAddresses;
by CorpID;
run;

 

ERROR: Variable CorpID has been defined as both character and numeric.

PaigeMiller
Diamond | Level 26

@BW18 wrote:

It keeps saying that the CorpID is being treated as both character and numeric.


This is not allowed, as you have discovered. We can't really debug your code as we don't have the data, but still one of the two data sets has numeric CorpID and one has character CorpID.

 

We cannot use data provided as you did in your initial post. Data must be provided as working SAS data step code, following these instructions and examples. Data provided via any other method (like in your first post) is unreliable, and may fail to represent your actual SAS data sets. So I am going to have to insist on you providing data using the instructions at the link.

 

 

--
Paige Miller
Tom
Super User Tom
Super User

Why did you make it numeric in one dataset and character in the other?

I would suggest you fix that first.

 

In general ID values should be character as you will not be doing arithmetic with the values.

But in converting from numeric to character you need take care that you generate the digit strings in the same way so they match.  Values like '123', '  123', and '0123' might look like the same number to a human but will be three different values as a character string.

BW18
Fluorite | Level 6
I did not purposely make it numeric in one dataset and character in the other. As I said previously, I tried to convert the one that SAS was treating as numeric to character to match the main dataset. This data was imported from a text file, so I am not sure if that is why it was reading it as numeric instead of character.
Tom
Super User Tom
Super User

@BW18 wrote:
I did not purposely make it numeric in one dataset and character in the other. As I said previously, I tried to convert the one that SAS was treating as numeric to character to match the main dataset. This data was imported from a text file, so I am not sure if that is why it was reading it as numeric instead of character.

If you are reading in a TEXT file then you have control over how the variables are created.

If you use PROC IMPORT to read the file then it will have to GUESS how to read the file and the only information is has is what it sees in that particular file.

If you are not sure how to read in a text file you could use the code that PROC IMPORT generated as a starting point.  (But note that PROC IMPORT generates poorly designed code that attaches INFORMATs and FORMATs to many variables that do not need them.)

 

In general to read a delimited file (like a CSV file) the code just needs to look like:

data want;
  infile 'myfile.txt' dsd truncover firstobs=2;
  length varA $20 varB 8 varC $5 ;
  input varA -- varC ;
run;

The DATA statement names the dataset to create. 

 

The INFILE statement names the file to read and sets some processing options. If the delimiter is something other than a comma add the DLM= option to specify the delimiter.  If there is no header (or the header is mote than one line) then remove the FIRSTOBS= option or change the number.  If the lines are longer than 32K bytes then add the LRECL= option to something that is longer than the longest possible line.

 

The LENGTH statement defines the variables type and storage length.  If you define the variable is the order they appear then the INPUT statement can just use the simple positional variable list as shown.

 

If any of the variables require special instructions for converting the text in the file into the value to be stored then add an INFORMAT statement and attach the needed informat.  Most variables do not need this.  The main exceptions are DATE, TIME and DATETIME values.  You might also see some text files that have thousands separators or $ added for numbers which can be read using the COMMA informat.

 

Similarly for FORMAT statement.  Most variables will not need to have a format attached.  The main exception is DATE, TIME and DATETIME values.

 

You could also add LABEL statement to attach a descriptive label to one or more of the variables.

Tom
Super User Tom
Super User

So for example if you have this CSV file.  Let's call it main.csv.

CorpID,StoreID,StoreState
4002,40021234,AL
4002,40021237,MS
4003,40035678,TN
4100,41007890,WV
4110,41103456,NC
4110,41103478,NC
4597,45972345,WA
4975,49750023,WY

Then the code to make your MAIN dataset would simply be:

data main;
  infile 'main.csv' dsd truncover firstobs=2;
  length CorpID $20 StoreID $20 StoreState $2 ;
  input CorpID -- StoreState;
run;

Which is almost less code that it would take to call PROC IMPORT to make guesses about what data the file has.

Patrick
Opal | Level 21

@BW18 

Please share the output running below code:

proc contents data=N.CorpAdresses(keep=CorpID);
run;
proc contents data=N.StoreInformation(keep=CorpID);
run;

 

Do your CorpId values in the data set where the variable is of type character have any leading zeros?

 

BW18
Fluorite | Level 6

There are no leading zeros in the CorpIDs. They range from 4001-4975.  However, I closed out SAS and reimported the text file containing CorpAddresses. I performed the transformation to a character variable for CorpIDs and it appeared to work this time. Following this, I was able to perform the merge successfully. Thank you!

Patrick
Opal | Level 21

@BW18 wrote:

There are no leading zeros in the CorpIDs. They range from 4001-4975.  However, I closed out SAS and reimported the text file containing CorpAddresses. I performed the transformation to a character variable for CorpIDs and it appeared to work this time. Following this, I was able to perform the merge successfully. Thank you!


Good that you've got something working.

Just as a comment: Ideally you read the source string already into a character variable as part of the import step. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 678 views
  • 7 likes
  • 5 in conversation