Dear friends at SAS community
I have a dataset that has 4000 observations.
In this dataset, I have a variable var1 which contains text data like this"[123/121/324/345]". I need to extract the numbers as separate numeric variables. For any observation, there can be up to 30 numbers in var1 that I need to extract:
so I am trying to extract new1=123, new2=121, new3=324 and new4=345. I tried the following code:
new1=scan(compress(var1, '[]'),1,'/');
new2=scan(compress(var1, '[]'),2,'/');
new3=scan(compress(var1, '[]'),3,'/');
new4=scan(compress(var1, '[]'),4,'/');
but the results do not seem correct. The last number is missing and never extracted. Also, when an observation only has one number in var1, e.g. var1="[123]", new1 is missing rather than 123 that I want to have.
so for var1="[123/121/324/345]"; I got new1=123, new2=121, new3=324, new4=. when I want new4=345
for var1="[123]", I got new1=. when I want new1=123
any thoughts of a solution?
thanks
Raymond
data w;
k="[123/121/324/345]";
k1=compress(k,'[]');
do _n_=1 to countw(k1);
num=scan(k1,_n_,'/');
output;
end;
drop k1;
run;
data w;
k="[123/121/324/345]";
k1=compress(k,'[]');
do _n_=1 to countw(k1);
num=scan(k1,_n_,'/');
output;
end;
drop k1;
run;
dear novinosrin,
thanks so much for your help. one of such "var1" variables is cardiac_nuclear_tests_year so I ran the code below
data w;
k1=compress(cardiac_nuclear_tests_year,'[]');
do _n_=1 to countw(k1);
num=scan(k1,_n_,'/');
output;
end;
drop k1;
run;
I am still having problems, because this code puts num in a long format, instead of as separate variables year1, year2....year5. Also other than cardiac_nuclear_tests_year, I have 10 other similar variables to extract from. Any other thought? many thanks
data w;
k="[123/121/324/345]";
k1=compress(k,'[]');
grp=0;
do _n_=1 to countw(k1);
num=scan(k1,_n_,'/');
grp+1;
output;
end;
drop k1;
run;
proc transpose data=w out=want prefix=year;
by k;
var num;
id grp;
run;
@rykwong wrote:
dear novinosrin,
thanks so much for your help. one of such "var1" variables is cardiac_nuclear_tests_year so I ran the code below
data w; k1=compress(cardiac_nuclear_tests_year,'[]'); do _n_=1 to countw(k1); num=scan(k1,_n_,'/'); output; end; drop k1; run;
I am still having problems, because this code puts num in a long format, instead of as separate variables year1, year2....year5. Also other than cardiac_nuclear_tests_year, I have 10 other similar variables to extract from. Any other thought? many thanks
Perhaps:
data w; set have; k1=compress(cardiac_nuclear_tests_year,'[]'); array num{30}; do i=1 to countw(k1); num[i]=scan(k1,i,'/'); end; drop k1 i; run;
I consider creating a variable _n_ to be suboptimal as SAS has an automatic variable _n_ that indicates the iteration of the data step and could 1) be confusing in code and 2) mess up the count if needed.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.