BookmarkSubscribeRSS Feed
mlogan
Lapis Lazuli | Level 10

Hi All,

I have students record where I have to replace them with proper country code(I have 280 country code), language code(20 language code) and citizenship_status(6 citizenship status). Is 'Format' an efficient way because I have such a long list for country?

Can you please tell me what should be the best way to proceed. Thanks,

 

ID      Last_Name    Country      Language    Citizenship_status

101   Mike               USA            English        Citizen          

102   Solomon        Canada       French        Permanent Resident 

103   Husain           Alzeria         Arabic         Refugee

 

Output table will look like this:

 

ID      Last_Name    Country      Language    Citizenship_status

101   Mike               234             6                  1          

102   Solomon        23               8                  2 

103   Husain           8                 1                  4

 

8 REPLIES 8
Haikuo
Onyx | Level 15

I vote for 'Format'.

Reeza
Super User

+1 for Format -> but making sure that you're using the CNTLIN option to create the formats from a table, rather than via typing in the codes. 

 

Example 4 here shows how to create the code for proc format

Example 8 shows how to use the CNTLIN data set for proc format.

 

http://www2.sas.com/proceedings/sugi30/001-30.pdf

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Personally I avoid formats where at all possible.  My suggestion would be to create a lookup dataset:

CODE_LIST       CODE       DECODE

COUNTRY         USA          6

COUNTRY         CANADA    8

...

LANGUAGE       ENGLISH   6

...

 

With this you can use is to either merge the data on, or as a lookup in SQL:

proc sql;
  create table WANT as
  select  A.ID,
A.LAST_NAME,
(select DECODE from CODES where upcase(CODE_LIST)="COUNTRY" and CODE=A.CODE) as COUNTRY_CODE
...

If you ever have to deal with outside technology, i.e. XML, databases etc. you will find the above a quick transition.

ballardw
Super User

@RW9 wrote:

Personally I avoid formats where at all possible.  My suggestion would be to create a lookup dataset:

CODE_LIST       CODE       DECODE

COUNTRY         USA          6

COUNTRY         CANADA    8

...

LANGUAGE       ENGLISH   6

...

 

With this you can use is to either merge the data on, or as a lookup in SQL:

proc sql;
  create table WANT as
  select  A.ID,
A.LAST_NAME,
(select DECODE from CODES where upcase(CODE_LIST)="COUNTRY" and CODE=A.CODE) as COUNTRY_CODE
...

If you ever have to deal with outside technology, i.e. XML, databases etc. you will find the above a quick transition.


 

If the data is somewhat volatile then a data set and merge or join as @RW9 suggests or hash is probably best. If the data actually fixed then I vote for format. And if the the is read from a "nice" data source possibly even an INFORMAT to create the codes when read avoiding an extra step of replacing them later.

mlogan
Lapis Lazuli | Level 10
Hi RW9, I can't see your whole code here. Would you please complete it for me. Thanks,
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

You can't see it because I didn't post any.  Its whats called a snippet.  The salient part of the topic was that you can keep your formats, codelist (whatever you call them) as data in datasets and "lookup" the value using either a subquery (which is this part in the snippet):

(select DECODE from CODES where upcase(CODE_LIST)="COUNTRY" and CODE=A.CODE) as COUNTRY_CODE

Or by merging the data on. 

Reeza
Super User

We can't provide code because you haven't provided data.

I'm assuming that you would have predefined coding systems for the countries and other variables. 

Or would at least need to generate them...it's a multistep process. 

jklaverstijn
Rhodochrosite | Level 12

This question is the SAS equivalent of "who's the best guitar player after Jimi Hendrix?" and "what is the best text editor?".

 

In this case the popular contendors for table lookup techniques are:

 

  • Datastep merge, key lookup
  • SQL
  • Formats
  • Hash table

It is a matter of taste, the impact on performance, maintainability and so on in relationship to your specific use case. No exact generally applicable answer can be given. Quite a few SUGI/SGF papers exist on the topic.

 

- Jan.

 

(PS: Obviously, David Gilmour is the best guitar player after Jimi Hendrix.)

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1118 views
  • 6 likes
  • 6 in conversation