Greetings, I'm fairly new to SAS and I was wondering if this would be possible
On the image that I have attached, I am showing a column from an excel file in which several polymorphisms have been found in some patient samples and are separated by commas. I was wondering if there was a way to extract each polymorphism into its own column as and then count how many of those repeat or are unique.
I modified the first part of the code because I think you meant to use COUNTW() instead of COUNT()
I have attached a screenshot with some of the results, i was wondering if the idea here was to extract only one value into the polyMorph column according to the value of i. Would it be possible of instead of having one row per polymoprhism, you could have several columns for each polymorphism.
data long;
set mydata;
id=_n_;
n_loop=countw(Pr_poly, ",");
do i=1 to n_loop;
polyMorph=scan(pr_poly, i, ',');
output;
end;
run;
I sort of managed to take them into their own column using PROC TRANSPOSE. I suppose I could use other procedures that allow me to count each individual polymorphism
proc transpose data=long out=ordered prefix=polymorph;
by id;
var polymorph;
run;
Eventually I would like to associate all the different PR_polymorphisms under a single category
To continue this thread in case somebody faces the same issue (or as a reference to myself who might need it later), I managed to count unique polymorphisms after I transposed them each one in their own column using the following code
data PrWant;
set hivfull.pr_polymorph;
array pr_polymorph{24}pr_polymorph1-pr_polymorph24;
array new {24} $20 _temporary_;
do _n_=1 to 24;
new{_n_}=pr_polymorph{_n_};
end;
call sortc(of new{*});
count=(new{1}>'');
do _n_=2 to 24;
if new{_n_} ne new{_n_-1} then
count + 1;
end;
run;
The next step would be to evaluate which polymorphisms are the most common regardless of where they are placed in the column, it is likely that polymorphism A in column 23 could be found in a different column number for another patient, which is why i think it would've been more useful to have each polymorphism as its own column name and then rows would check if it is present or not.
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.