BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aobara
Fluorite | Level 6

Dears, 

 

I am new to sas so please accept my appologies if you feel the question is a little too basis. Here is the problematic I am facing:

 

I am currently working with a fairly large dataset wich contains a column with an extended key or longkey variable. The longkey variable is always the concatenation of a prefix, a short key and a suffix. To a short key always corresponds a single longkey, i.e the associated prefix and suffix only depend on the short key. Here is a sample dataset:

 

data longKeys;
 input  @1 longkey  $14.
 		@16 value   8.;
   datalines;
HLKB_key7_BEGO 80926681
KGEH_key4_HLQB 88209836
KGEH_key9_HLQB 94053433
CHOA_key6_ABKQ 82703104
CHOA_key3_ABKQ 92174044
CHOA_key3_ABKQ 35849309
ABCD_key0_DFHG 17346360
CHOA_key6_ABKQ 28834207
ABCD_key0_DFHG 23090564
QOAG_key2_GCBL 74019736
KGEH_key4_HLQB 19951212
HLKB_key7_BEGO 11353465
CHOA_key3_ABKQ 33063093
CHOA_key1_ABKQ 28276788
QOAG_key2_GCBL 29987914
HLKB_key5_BEGO 07296540
CHOA_key3_ABKQ 74638169
CHOA_key3_ABKQ 71900903
ABCQ_key8_QOHG 90651346
CHOA_key3_ABKQ 48446188
KGEH_key9_HLQB 64205650
ABCD_key0_DFHG 58012348
HLKB_key7_BEGO 73923915
KGEH_key4_HLQB 97173720
QOAG_key2_GCBL 51450320
ABCQ_key8_QOHG 56699883
CHOA_key1_ABKQ 64852517
QOAG_key2_GCBL 74892582
HLKB_key7_BEGO 08070758
CHOA_key3_ABKQ 46544906
CHOA_key3_ABKQ 22765177
ABCQ_key8_QOHG 36334812
ABCQ_key8_QOHG 78318514
KGEH_key4_HLQB 17058785
HLKB_key7_BEGO 15515760
ABCQ_key8_QOHG 55507915
HLKB_key5_BEGO 74279902
ABCD_key0_DFHG 52472477
ABCD_key0_DFHG 52305871
CHOA_key3_ABKQ 49451187
ABCQ_key8_QOHG 59462988
CHOA_key3_ABKQ 08492549
HLKB_key7_BEGO 02140768
QOAG_key2_GCBL 36946856
QOAG_key2_GCBL 50707226
   ;

 

What I am trying to do is reconstruct the mapping table between the list of short keys and the list of long keys, by reading the longkey variable. The expected answer should be as below.  Any idea how this could be effeciently implemented in SAS?

 

 mapping_table.jpg

 

In advance, many many thanks for your help...

1 ACCEPTED SOLUTION

Accepted Solutions
Peter_C
Rhodochrosite | Level 12
Create a VIEW (VSHORT) of the longkeys data which extracts the shortkey then run this
Proc sort nodupkeys data=vshort out= wanted;
by shortkey ;
run ;
Tho only problem is that you might run out of sort work utility disk space

View solution in original post

5 REPLIES 5
Astounding
PROC Star

Within a DATA step:

 

length shortkey $ 5;   /* or whatever  you think is a reasonable length */

shortkey = scan(longkey, 2, '_');

 

aobara
Fluorite | Level 6

Thank you so much for your answer! I feel I am getting closer already. Here is what I did following your advice:

 

data mapping;
	set longKeys;
	length shortkey $ 4;   /* or whatever  you think is a reasonable length */
	shortkey = scan(longkey, 2, '_');
;

Which produced the following output

 

                                           Obs       longkey          value     shortkey

                                             1    HLKB_key7_BEGO    80926681      key7  
                                             2    KGEH_key4_HLQB    88209836      key4  
                                             3    KGEH_key9_HLQB    94053433      key9  
                                             4    CHOA_key6_ABKQ    82703104      key6  
                                             5    CHOA_key3_ABKQ    92174044      key3  
                                             6    CHOA_key3_ABKQ    35849309      key3  
                                             7    ABCD_key0_DFHG    17346360      key0  
                                             8    CHOA_key6_ABKQ    28834207      key6  
                                             9    ABCD_key0_DFHG    23090564      key0  
                                            10    QOAG_key2_GCBL    74019736      key2  
                                            11    KGEH_key4_HLQB    19951212      key4  
                                            12    HLKB_key7_BEGO    11353465      key7  
                                            13    CHOA_key3_ABKQ    33063093      key3  
                                            14    CHOA_key1_ABKQ    28276788      key1  
                                            15    QOAG_key2_GCBL    29987914      key2  
                                            16    HLKB_key5_BEGO     7296540      key5  
                                            17    CHOA_key3_ABKQ    74638169      key3  
                                            18    CHOA_key3_ABKQ    71900903      key3  
                                            19    ABCQ_key8_QOHG    90651346      key8  
                                            20    CHOA_key3_ABKQ    48446188      key3  
                                            21    KGEH_key9_HLQB    64205650      key9  
                                            22    ABCD_key0_DFHG    58012348      key0  
                                            23    HLKB_key7_BEGO    73923915      key7  
                                            24    KGEH_key4_HLQB    97173720      key4  
                                            25    QOAG_key2_GCBL    51450320      key2  
                                            26    ABCQ_key8_QOHG    56699883      key8  
                                            27    CHOA_key1_ABKQ    64852517      key1  
                                            28    QOAG_key2_GCBL    74892582      key2  
                                            29    HLKB_key7_BEGO     8070758      key7  
                                            30    CHOA_key3_ABKQ    46544906      key3  
                                            31    CHOA_key3_ABKQ    22765177      key3  
                                            32    ABCQ_key8_QOHG    36334812      key8  
                                            33    ABCQ_key8_QOHG    78318514      key8  
                                            34    KGEH_key4_HLQB    17058785      key4  
                                            35    HLKB_key7_BEGO    15515760      key7  
                                            36    ABCQ_key8_QOHG    55507915      key8  
                                            37    HLKB_key5_BEGO    74279902      key5  
                                            38    ABCD_key0_DFHG    52472477      key0  
                                            39    ABCD_key0_DFHG    52305871      key0  
                                            40    CHOA_key3_ABKQ    49451187      key3  
                                            41    ABCQ_key8_QOHG    59462988      key8  
                                            42    CHOA_key3_ABKQ     8492549      key3  
                                            43    HLKB_key7_BEGO     2140768      key7  
                                            44    QOAG_key2_GCBL    36946856      key2  
                                            45    QOAG_key2_GCBL    50707226      key2  

However I am only interested in the unique mapping between shortkey and longkey. As the origninal dataset is very large (not this sample one), how could one create a new dataset by filtering such that each key only appears once in combination to its associated longkey. (such as to generate the table able).

 

Thank you so much for your help!

 

Peter_C
Rhodochrosite | Level 12
Create a VIEW (VSHORT) of the longkeys data which extracts the shortkey then run this
Proc sort nodupkeys data=vshort out= wanted;
by shortkey ;
run ;
Tho only problem is that you might run out of sort work utility disk space
aobara
Fluorite | Level 6

Yes it did work! 🙂

 

Thank you so much

Astounding
PROC Star

For a few reasons, I would switch from a DATA step to SQL.  In one step, it can select only the variables that are needed, eliminate duplicates, and compute the short key.  Here's what it could look like:

 

proc sql;

create table mapping as

select distinct longkey, scan(longkey, 2, '_') as shortkey

from longkeys;

quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 532 views
  • 2 likes
  • 3 in conversation