I have the following variable
id want
1 an-ad-abc
1 ad
2 an-ad
2 anc-anb
2 anc
3 anc-anr-an
3 ab-ad
3 anb-anc
3 anc-anb
4 anc-anb-ar
4 ar-anc-anb
I want to retain those that contain anc either alone or in combination, so my first output will look like this:
2 anc-anb
2 anc
3 anc-anr-an
3 anb-anc
3 anc-anb
4 anc-anb-ar
4 ar-anc-anb
next, if the same combination appeared for the same id in different order, I want to keep only one. Which mean anc-anb= anb-anc so keep only one
and the same for anc-anb-ar= ar-anc-anb
so my seconf output would eliminate those two
2 anc-anb
2 anc
3 anc-anr-an
3 anb-anc
4 anc-anb-ar
Well, I got to this, its a bit verbose:
data have; length want $200; input id want $; l=_n_; datalines; id want 1 an-ad-abc 1 ad 2 an-ad 2 anc-anb 2 anc 3 anc-anr-an 3 ab-ad 3 anb-anc 3 anc-anb 4 anc-anb-ar 4 ar-anc-anb ; run; data inter; length wrd $100; set have; do i=1 to countw(want,"-"); wrd=scan(want,i,"-"); output; end; run; proc sort data=inter (keep=id l wrd); by id l wrd; run; proc transpose data=inter out=want; by id l; var wrd; run; data want (keep=id res); set want; length res $200; res=catx('-',of col:); run; proc sort data=want nodupkey; by id res; where index(res,"anc")>0; run;
You essentially have text variables with "words" separated by dashes.
So examine the sas findw function (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002978282.htm).
Once you understand that then google
sas "subsetting if"
Well, I got to this, its a bit verbose:
data have; length want $200; input id want $; l=_n_; datalines; id want 1 an-ad-abc 1 ad 2 an-ad 2 anc-anb 2 anc 3 anc-anr-an 3 ab-ad 3 anb-anc 3 anc-anb 4 anc-anb-ar 4 ar-anc-anb ; run; data inter; length wrd $100; set have; do i=1 to countw(want,"-"); wrd=scan(want,i,"-"); output; end; run; proc sort data=inter (keep=id l wrd); by id l wrd; run; proc transpose data=inter out=want; by id l; var wrd; run; data want (keep=id res); set want; length res $200; res=catx('-',of col:); run; proc sort data=want nodupkey; by id res; where index(res,"anc")>0; run;
Thank you RW9 for taking the time to write the code. It works fine until your last statement. The statement does not retain anything. Any suggestions?
Thank you!!
proc sort data=want nodupkey; by id res; where index(res,"anc")>0; run;
Explain what you mean by "doesn't retain anything." I ran @RW9's code and it creates a file called want that is exactly like the results you said you wanted, with the one exception that was incorrect in your example, namely that it leave out the last record of your example as it was already listed in the previous record.
4 anc-anb-ar
4 ar-anc-anb
should only be:
4 anc-anb-ar
which is what @RW9's code outputs.
Art, CEO, AnalystFinder.com
I agree with Art and RW9. The code looks OK. I would make a small change to select the observations, adding a "w":
where indexw(res, "anc") > 0;
Using INDEX instead of INDEXW, you might pick up "anc" within a longer string such as:
dollar-franc-peso
Thank you all! there was a mistake on my side! all worked fine thanks for the help
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.