Dears,
I am new to sas so please accept my appologies if you feel the question is a little too basis. Here is the problematic I am facing:
I am currently working with a fairly large dataset wich contains a column with an extended key or longkey variable. The longkey variable is always the concatenation of a prefix, a short key and a suffix. To a short key always corresponds a single longkey, i.e the associated prefix and suffix only depend on the short key. Here is a sample dataset:
data longKeys;
input @1 longkey $14.
@16 value 8.;
datalines;
HLKB_key7_BEGO 80926681
KGEH_key4_HLQB 88209836
KGEH_key9_HLQB 94053433
CHOA_key6_ABKQ 82703104
CHOA_key3_ABKQ 92174044
CHOA_key3_ABKQ 35849309
ABCD_key0_DFHG 17346360
CHOA_key6_ABKQ 28834207
ABCD_key0_DFHG 23090564
QOAG_key2_GCBL 74019736
KGEH_key4_HLQB 19951212
HLKB_key7_BEGO 11353465
CHOA_key3_ABKQ 33063093
CHOA_key1_ABKQ 28276788
QOAG_key2_GCBL 29987914
HLKB_key5_BEGO 07296540
CHOA_key3_ABKQ 74638169
CHOA_key3_ABKQ 71900903
ABCQ_key8_QOHG 90651346
CHOA_key3_ABKQ 48446188
KGEH_key9_HLQB 64205650
ABCD_key0_DFHG 58012348
HLKB_key7_BEGO 73923915
KGEH_key4_HLQB 97173720
QOAG_key2_GCBL 51450320
ABCQ_key8_QOHG 56699883
CHOA_key1_ABKQ 64852517
QOAG_key2_GCBL 74892582
HLKB_key7_BEGO 08070758
CHOA_key3_ABKQ 46544906
CHOA_key3_ABKQ 22765177
ABCQ_key8_QOHG 36334812
ABCQ_key8_QOHG 78318514
KGEH_key4_HLQB 17058785
HLKB_key7_BEGO 15515760
ABCQ_key8_QOHG 55507915
HLKB_key5_BEGO 74279902
ABCD_key0_DFHG 52472477
ABCD_key0_DFHG 52305871
CHOA_key3_ABKQ 49451187
ABCQ_key8_QOHG 59462988
CHOA_key3_ABKQ 08492549
HLKB_key7_BEGO 02140768
QOAG_key2_GCBL 36946856
QOAG_key2_GCBL 50707226
;
What I am trying to do is reconstruct the mapping table between the list of short keys and the list of long keys, by reading the longkey variable. The expected answer should be as below. Any idea how this could be effeciently implemented in SAS?
In advance, many many thanks for your help...
Within a DATA step:
length shortkey $ 5; /* or whatever you think is a reasonable length */
shortkey = scan(longkey, 2, '_');
Thank you so much for your answer! I feel I am getting closer already. Here is what I did following your advice:
data mapping;
set longKeys;
length shortkey $ 4; /* or whatever you think is a reasonable length */
shortkey = scan(longkey, 2, '_');
;
Which produced the following output
Obs longkey value shortkey 1 HLKB_key7_BEGO 80926681 key7 2 KGEH_key4_HLQB 88209836 key4 3 KGEH_key9_HLQB 94053433 key9 4 CHOA_key6_ABKQ 82703104 key6 5 CHOA_key3_ABKQ 92174044 key3 6 CHOA_key3_ABKQ 35849309 key3 7 ABCD_key0_DFHG 17346360 key0 8 CHOA_key6_ABKQ 28834207 key6 9 ABCD_key0_DFHG 23090564 key0 10 QOAG_key2_GCBL 74019736 key2 11 KGEH_key4_HLQB 19951212 key4 12 HLKB_key7_BEGO 11353465 key7 13 CHOA_key3_ABKQ 33063093 key3 14 CHOA_key1_ABKQ 28276788 key1 15 QOAG_key2_GCBL 29987914 key2 16 HLKB_key5_BEGO 7296540 key5 17 CHOA_key3_ABKQ 74638169 key3 18 CHOA_key3_ABKQ 71900903 key3 19 ABCQ_key8_QOHG 90651346 key8 20 CHOA_key3_ABKQ 48446188 key3 21 KGEH_key9_HLQB 64205650 key9 22 ABCD_key0_DFHG 58012348 key0 23 HLKB_key7_BEGO 73923915 key7 24 KGEH_key4_HLQB 97173720 key4 25 QOAG_key2_GCBL 51450320 key2 26 ABCQ_key8_QOHG 56699883 key8 27 CHOA_key1_ABKQ 64852517 key1 28 QOAG_key2_GCBL 74892582 key2 29 HLKB_key7_BEGO 8070758 key7 30 CHOA_key3_ABKQ 46544906 key3 31 CHOA_key3_ABKQ 22765177 key3 32 ABCQ_key8_QOHG 36334812 key8 33 ABCQ_key8_QOHG 78318514 key8 34 KGEH_key4_HLQB 17058785 key4 35 HLKB_key7_BEGO 15515760 key7 36 ABCQ_key8_QOHG 55507915 key8 37 HLKB_key5_BEGO 74279902 key5 38 ABCD_key0_DFHG 52472477 key0 39 ABCD_key0_DFHG 52305871 key0 40 CHOA_key3_ABKQ 49451187 key3 41 ABCQ_key8_QOHG 59462988 key8 42 CHOA_key3_ABKQ 8492549 key3 43 HLKB_key7_BEGO 2140768 key7 44 QOAG_key2_GCBL 36946856 key2 45 QOAG_key2_GCBL 50707226 key2
However I am only interested in the unique mapping between shortkey and longkey. As the origninal dataset is very large (not this sample one), how could one create a new dataset by filtering such that each key only appears once in combination to its associated longkey. (such as to generate the table able).
Thank you so much for your help!
Yes it did work! 🙂
Thank you so much
For a few reasons, I would switch from a DATA step to SQL. In one step, it can select only the variables that are needed, eliminate duplicates, and compute the short key. Here's what it could look like:
proc sql;
create table mapping as
select distinct longkey, scan(longkey, 2, '_') as shortkey
from longkeys;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.