Hello,
I have a table which contains 2 columns:
What I want is to split the second column into multiple columns(my delimiter is space) but in order.
Therefore my new table should look just like this:
Country OPT1 OPT2 OTP3 OTP4 OPT5............. OPTn
US #02 #05 #38 Z82
US #02 #16 #33 ZY1
.
.
.
US 1SZ
Any ideas?
Thank you,
Claudia.
@Claudia_SAS wrote:
My dataset is string, and I have no numeric variables.
I need to keep the # or 0 when leads the variable.
Please try the logic in the linked answer on your data there is nothing in that answer that has anything to do with numeric variables. Post the SAS log if you have any problems.
My code is:
/****************************************************
data split;
input rpo $743.;
datalines;
;
data split2 (drop=rpo i);
set test;
length opt1-opt186 $3;
array opt(186) $;
do i = 1 to dim(opt);
opt[i]=scan(rpo,i,' ','M');
end;
run;
/******************************************************
But, this is not what i was looking for
What I am looking for is to put variable "#AA" in a different column and not under the same column as "06D"
I don't know if I make any sense....I was trying to many ways and nothing came the way I wanted.
In the end I want to transpose the columns as leading headers and when I have a blank that should be counted as "0" and when I have a value that should be counted as "1".
Something like this...
Country | #AA | 06D | 1D3 | 1SZ | 2LZ | 4AA | 6X1 |
US | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
US | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
US | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
Please help!
Do you know what the number of OPT variables is that needs to be created? What is the length of the longest OPT value?
Do you expect the same OPT value to be in the same OPT variable for each record?
Your posted "example" of what you seems to imply that each value should be in the same variable. Which leads to the question of just how many VALUES need to be considered? Do you have a list of them somewhere, like another data set?
@Claudia_SAS wrote:
thank you for your reply .....yes...I calculated the length of the rpo column, and then the length of opt(i) always equals to 3, delimiter is always one space.
Not a trivial exercise but not extremely difficult. However a more important question may be what you will do with all of those variables when created. For many types of analysis a single variable with different values is sufficient.
So here are the steps involved to one approach:
1) get a list of ALL of the codes involved. If you have others appear in a later data step that will be a serious complication to your "in order". If you do not have such a list already and have to create one from your existing data the first step is go from one wide variable to a bunch of single values.
2) create a temporary array in a data step with the values
3) identify which order each value in the list has from the temporary array
4) assign the variable the value.
Here is one way. Note the example data is small because I am not going to TYPE from pictures a larger number of values.
data start; infile datalines truncover; input country $ opt $20.; datalines; US #02 #05 #38 Z82 US #02 #16 #33 ZY1 ; /* get each code value as a single record*/ data temp; set start; length o $ 3; do i=1 to countw(opt); o = scan(opt,i); output; end; keep o; run; /* create ordered list of codes suitable for use in temporary array definition */ Proc sql noprint; select distinct quote(o) into: codelist separated by ',' from temp; quit; %let codecount=&sqlobs; %put list of codes = &codelist.; %put number of codes = &codecount.; data want; set start; array opt_(&codecount.) $ 3 opt1 - opt&codecount. ; array t(&codecount.) $ 3 _temporary_( &codelist); do i= 1 to countw(opt); opt_(whichc(scan(opt,i),of t(*))) = scan(opt,i); end; drop i; run;
Some key elements. The temp data set gets all of the values as a single record per value.
The Proc sql select distinct will get the unique values, and by default in an order. If you expect the aphabetic Z82 to appear before #02 that will add some work. There are two macro variables created that you can examine in the log that will be used to write part of the code needed for the last step. The &sqlobs is an automatic SAS variable that returns the number of observations used in the immediately previous Proc SQL step. Assign the value to a known variable so you can retrieve the value later.
The macro variable codecount has how many codes are appear in the data. This is the issue if you run this later with other data that the value of OPTx may change because of the sorting if new code values are added.
In the last data step we use the the codecount to set the size of arrays and create the variable names in the array statement.
The marginally ugly code in the do loop involving WHICHC is a function that finds the position of a value in the list of values. In this case the list of values is the temporary array T and the "of t(*)" say to search all of the values in the array. It returns the number of the found element.
Another approach would be create numeric 0/1 coded variables assigning labels to the variables Opt1 to Optn of the value represented, with 1 indicating "found" and 0 "not found". Which coding makes more sense depends on how you intend to use the data.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.