Hello,
I want to extract only string or characters from the variable.
my data looks like this:
data have1;
input id name$ gen$ ID DATE A B C D$;
datalines;
1 ram male 1 1.1 1 . 1.23456 1/n
2 mohan male 1 1.1 99 1 1.23456 1.1/n
3 shyam male 2 2.1 1 99 1.23456 16/dr
4 radha female 2 2.1 90 1 99 2.00/mn
5 gita female 2 2.1 99 95 1.23456 ?/prp
6 sita female 2 3.1 . 1 1.23456 10mg
7 ranu female 3 2.1 1 . 1.23456 1000mg
8 tina female 3 2.1 . 1 1.23456 n/q
9 shan male 4 8.1 1 1 1.6789 2:3
;
run;
I want data look like this:
data have1;
id name gen ID DATE A B C D E ;
1 ram male 1 1.1 1 . 1.23456 1 n
2 mohan male 1 1.1 99 1 1.23456 1 n
3 shyam male 2 2.1 1 99 1.23456 16 dr
4 radha female 2 2.1 90 1 99 2.00 mn
5 gita female 2 2.1 99 95 1.23456 ? prp
6 sita female 2 3.1 . 1 1.23456 10 mg
7 ranu female 3 2.1 1 . 1.23456 1000 mg
8 tina female 3 2.1 . 1 1.23456 nq
9 shan male 4 8.1 1 1 1.6789 2
I have tried following code
data want; set have1;
_numb= scan (D,1,'/','A');
run;
This removes all the string characters. But I need a separate variable with remaining characters (Also want to remove '?' from the dataset )
Kindly help.
Thanks in advance
data have1;
input id name $ gen $ ID DATE A B C _D $;
D=scan(_D,1,'/:','a');
E=compress(_D,'/:' || D,'d');
drop _D;
cards;
1 ram male 1 1.1 1 . 1.23456 1/n
2 mohan male 1 1.1 99 1 1.23456 1.1/n
3 shyam male 2 2.1 1 99 1.23456 16/dr
4 radha female 2 2.1 90 1 99 2.00/mn
5 gita female 2 2.1 99 95 1.23456 ?/prp
6 sita female 2 3.1 . 1 1.23456 10mg
7 ranu female 3 2.1 1 . 1.23456 1000mg
8 tina female 3 2.1 . 1 1.23456 n/q
9 shan male 4 8.1 1 1 1.6789 2:3
;
run;
Obs | id | name | gen | DATE | A | B | C | D | E |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | ram | male | 1.1 | 1 | . | 1.2346 | 1 | n |
2 | 1 | mohan | male | 1.1 | 99 | 1 | 1.2346 | 1.1 | n |
3 | 2 | shyam | male | 2.1 | 1 | 99 | 1.2346 | 16 | dr |
4 | 2 | radha | female | 2.1 | 90 | 1 | 99.0000 | 2.00 | mn |
5 | 2 | gita | female | 2.1 | 99 | 95 | 1.2346 | ? | prp |
6 | 2 | sita | female | 3.1 | . | 1 | 1.2346 | 10 | mg |
7 | 3 | ranu | female | 2.1 | 1 | . | 1.2346 | 1000 | mg |
8 | 3 | tina | female | 2.1 | . | 1 | 1.2346 | nq | |
9 | 4 | shan | male | 8.1 | 1 | 1 | 1.6789 | 2 |
Should the last record be
9 shan male 4 8.1 1 1 1.6789 2 3
or does the : instead of the / mean disregard what is after?
If you are looking to get rid of ?'s what should this records look like?
5 gita female 2 2.1 99 95 1.23456 ? prp
Should it be
5 gita female 2 2.1 99 95 1.23456 prp
I want to separate numbers and characters as sometimes string characters are present before / also
Use scan():
data want; set have1; e=scan(d,2,"/"); d=scan(d,1,"/"); run;
I don't think that will split 1000mg into 1000 and mg.
In which case you need to two logical operations, one for delimited, one for not:
data want; set have1;
if index(d,"/") then do;
e=scan(d,2,"/"); d=scan(d,1,"/");
end;
else do;
e=compress(d," ","d");
d=tranwrd(d,e,"");
end; run;
So first, if a / appears use that, else compress out all numbers and set that as e, then remove e from d.
Thanks for the reply. I have tried scan() function, but it does not work for those instances where '/ ' is not there such as row number 6,7,and 9. As It removes all the characters instead.
data have1;
input id name$ gen$ ID DATE A B C D$;
datalines;
1 ram male 1 1.1 1 . 1.23456 1/n
2 mohan male 1 1.1 99 1 1.23456 1.1/n
3 shyam male 2 2.1 1 99 1.23456 16/dr
4 radha female 2 2.1 90 1 99 2.00/mn
5 gita female 2 2.1 99 95 1.23456 ?/prp
6 sita female 2 3.1 . 1 1.23456 10mg
7 ranu female 3 2.1 1 . 1.23456 1000mg
8 tina female 3 2.1 . 1 1.23456 n/q
9 shan male 4 8.1 1 1 1.6789 2:3
;
run;
data want;
set have1;
d1=compress(d,' ','kd');
e=compress(d,'?/','d');
run;
data have1;
input id name $ gen $ ID DATE A B C _D $;
D=scan(_D,1,'/:','a');
E=compress(_D,'/:' || D,'d');
drop _D;
cards;
1 ram male 1 1.1 1 . 1.23456 1/n
2 mohan male 1 1.1 99 1 1.23456 1.1/n
3 shyam male 2 2.1 1 99 1.23456 16/dr
4 radha female 2 2.1 90 1 99 2.00/mn
5 gita female 2 2.1 99 95 1.23456 ?/prp
6 sita female 2 3.1 . 1 1.23456 10mg
7 ranu female 3 2.1 1 . 1.23456 1000mg
8 tina female 3 2.1 . 1 1.23456 n/q
9 shan male 4 8.1 1 1 1.6789 2:3
;
run;
Obs | id | name | gen | DATE | A | B | C | D | E |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | ram | male | 1.1 | 1 | . | 1.2346 | 1 | n |
2 | 1 | mohan | male | 1.1 | 99 | 1 | 1.2346 | 1.1 | n |
3 | 2 | shyam | male | 2.1 | 1 | 99 | 1.2346 | 16 | dr |
4 | 2 | radha | female | 2.1 | 90 | 1 | 99.0000 | 2.00 | mn |
5 | 2 | gita | female | 2.1 | 99 | 95 | 1.2346 | ? | prp |
6 | 2 | sita | female | 3.1 | . | 1 | 1.2346 | 10 | mg |
7 | 3 | ranu | female | 2.1 | 1 | . | 1.2346 | 1000 | mg |
8 | 3 | tina | female | 2.1 | . | 1 | 1.2346 | nq | |
9 | 4 | shan | male | 8.1 | 1 | 1 | 1.6789 | 2 |
I think making that
'/:?'
gets OP to the desired result (dropping questions marks).
Although I do think 16/nq becoming 16 and nq while n/q becomes " " and nq is a little strange.
I agree, but it is what the OP showed as the desired output.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.