DATA Step, Macro, Functions and more

Replacing variable with corresponding code

Reply
Regular Contributor
Posts: 215

Replacing variable with corresponding code

[ Edited ]

Hi All,

I have students record where I have to replace them with proper country code(I have 280 country code), language code(20 language code) and citizenship_status(6 citizenship status). Is 'Format' an efficient way because I have such a long list for country?

Can you please tell me what should be the best way to proceed. Thanks,

 

ID      Last_Name    Country      Language    Citizenship_status

101   Mike               USA            English        Citizen          

102   Solomon        Canada       French        Permanent Resident 

103   Husain           Alzeria         Arabic         Refugee

 

Output table will look like this:

 

ID      Last_Name    Country      Language    Citizenship_status

101   Mike               234             6                  1          

102   Solomon        23               8                  2 

103   Husain           8                 1                  4

 

Respected Advisor
Posts: 3,124

Re: Replacing variable with corresponding code

I vote for 'Format'.

Super User
Posts: 17,784

Re: Replacing variable with corresponding code

+1 for Format -> but making sure that you're using the CNTLIN option to create the formats from a table, rather than via typing in the codes. 

 

Example 4 here shows how to create the code for proc format

Example 8 shows how to use the CNTLIN data set for proc format.

 

http://www2.sas.com/proceedings/sugi30/001-30.pdf

 

 

Super User
Super User
Posts: 7,392

Re: Replacing variable with corresponding code

Personally I avoid formats where at all possible.  My suggestion would be to create a lookup dataset:

CODE_LIST       CODE       DECODE

COUNTRY         USA          6

COUNTRY         CANADA    8

...

LANGUAGE       ENGLISH   6

...

 

With this you can use is to either merge the data on, or as a lookup in SQL:

proc sql;
  create table WANT as
  select  A.ID,
A.LAST_NAME,
(select DECODE from CODES where upcase(CODE_LIST)="COUNTRY" and CODE=A.CODE) as COUNTRY_CODE
...

If you ever have to deal with outside technology, i.e. XML, databases etc. you will find the above a quick transition.

Super User
Posts: 10,483

Re: Replacing variable with corresponding code


RW9 wrote:

Personally I avoid formats where at all possible.  My suggestion would be to create a lookup dataset:

CODE_LIST       CODE       DECODE

COUNTRY         USA          6

COUNTRY         CANADA    8

...

LANGUAGE       ENGLISH   6

...

 

With this you can use is to either merge the data on, or as a lookup in SQL:

proc sql;
  create table WANT as
  select  A.ID,
A.LAST_NAME,
(select DECODE from CODES where upcase(CODE_LIST)="COUNTRY" and CODE=A.CODE) as COUNTRY_CODE
...

If you ever have to deal with outside technology, i.e. XML, databases etc. you will find the above a quick transition.


 

If the data is somewhat volatile then a data set and merge or join as @RW9 suggests or hash is probably best. If the data actually fixed then I vote for format. And if the the is read from a "nice" data source possibly even an INFORMAT to create the codes when read avoiding an extra step of replacing them later.

Regular Contributor
Posts: 215

Re: Replacing variable with corresponding code

Hi RW9, I can't see your whole code here. Would you please complete it for me. Thanks,
Super User
Super User
Posts: 7,392

Re: Replacing variable with corresponding code

Hi,

 

You can't see it because I didn't post any.  Its whats called a snippet.  The salient part of the topic was that you can keep your formats, codelist (whatever you call them) as data in datasets and "lookup" the value using either a subquery (which is this part in the snippet):

(select DECODE from CODES where upcase(CODE_LIST)="COUNTRY" and CODE=A.CODE) as COUNTRY_CODE

Or by merging the data on. 

Super User
Posts: 17,784

Re: Replacing variable with corresponding code

We can't provide code because you haven't provided data.

I'm assuming that you would have predefined coding systems for the countries and other variables. 

Or would at least need to generate them...it's a multistep process. 

Super Contributor
Posts: 408

Re: Replacing variable with corresponding code

This question is the SAS equivalent of "who's the best guitar player after Jimi Hendrix?" and "what is the best text editor?".

 

In this case the popular contendors for table lookup techniques are:

 

  • Datastep merge, key lookup
  • SQL
  • Formats
  • Hash table

It is a matter of taste, the impact on performance, maintainability and so on in relationship to your specific use case. No exact generally applicable answer can be given. Quite a few SUGI/SGF papers exist on the topic.

 

- Jan.

 

(PS: Obviously, David Gilmour is the best guitar player after Jimi Hendrix.)

Ask a Question
Discussion stats
  • 8 replies
  • 413 views
  • 6 likes
  • 6 in conversation