Hi All,
I have students record where I have to replace them with proper country code(I have 280 country code), language code(20 language code) and citizenship_status(6 citizenship status). Is 'Format' an efficient way because I have such a long list for country?
Can you please tell me what should be the best way to proceed. Thanks,
ID Last_Name Country Language Citizenship_status
101 Mike USA English Citizen
102 Solomon Canada French Permanent Resident
103 Husain Alzeria Arabic Refugee
Output table will look like this:
ID Last_Name Country Language Citizenship_status
101 Mike 234 6 1
102 Solomon 23 8 2
103 Husain 8 1 4
I vote for 'Format'.
+1 for Format -> but making sure that you're using the CNTLIN option to create the formats from a table, rather than via typing in the codes.
Example 4 here shows how to create the code for proc format
Example 8 shows how to use the CNTLIN data set for proc format.
http://www2.sas.com/proceedings/sugi30/001-30.pdf
Personally I avoid formats where at all possible. My suggestion would be to create a lookup dataset:
CODE_LIST CODE DECODE
COUNTRY USA 6
COUNTRY CANADA 8
...
LANGUAGE ENGLISH 6
...
With this you can use is to either merge the data on, or as a lookup in SQL:
proc sql; create table WANT as select A.ID,
A.LAST_NAME,
(select DECODE from CODES where upcase(CODE_LIST)="COUNTRY" and CODE=A.CODE) as COUNTRY_CODE
...
If you ever have to deal with outside technology, i.e. XML, databases etc. you will find the above a quick transition.
@RW9 wrote:
Personally I avoid formats where at all possible. My suggestion would be to create a lookup dataset:
CODE_LIST CODE DECODE
COUNTRY USA 6
COUNTRY CANADA 8
...
LANGUAGE ENGLISH 6
...
With this you can use is to either merge the data on, or as a lookup in SQL:
proc sql; create table WANT as select A.ID,
A.LAST_NAME,
(select DECODE from CODES where upcase(CODE_LIST)="COUNTRY" and CODE=A.CODE) as COUNTRY_CODE
...If you ever have to deal with outside technology, i.e. XML, databases etc. you will find the above a quick transition.
If the data is somewhat volatile then a data set and merge or join as @RW9 suggests or hash is probably best. If the data actually fixed then I vote for format. And if the the is read from a "nice" data source possibly even an INFORMAT to create the codes when read avoiding an extra step of replacing them later.
Hi,
You can't see it because I didn't post any. Its whats called a snippet. The salient part of the topic was that you can keep your formats, codelist (whatever you call them) as data in datasets and "lookup" the value using either a subquery (which is this part in the snippet):
(select DECODE from CODES where upcase(CODE_LIST)="COUNTRY" and CODE=A.CODE) as COUNTRY_CODE
Or by merging the data on.
We can't provide code because you haven't provided data.
I'm assuming that you would have predefined coding systems for the countries and other variables.
Or would at least need to generate them...it's a multistep process.
This question is the SAS equivalent of "who's the best guitar player after Jimi Hendrix?" and "what is the best text editor?".
In this case the popular contendors for table lookup techniques are:
It is a matter of taste, the impact on performance, maintainability and so on in relationship to your specific use case. No exact generally applicable answer can be given. Quite a few SUGI/SGF papers exist on the topic.
- Jan.
(PS: Obviously, David Gilmour is the best guitar player after Jimi Hendrix.)
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.