I have a data set (.cvs), that I use on a daily basis (includes about 60-75 variables). The source of this data set has gone through some upgrades, and now automatically pulls data into the system. It will not override data, so the system just adds a new iteration (to those questions that allow it) even if it is the exact same data that is already in there. So for each person (row) some of my variables have a string of data separated by commas. I have attached an example excel spreadsheet, showing the data I have and what I want (I work with health data, so this is an example of what my data set looks like). Now, if the unique variables cannot be separated into their own columns, I am okay with having them in the same column.
My SAS skills (SAS 9.4) are fairly minimal (proc freqs, some simple SQL, merging, concatenation, sorting, pulling first obs of a string, etc.)
I have searched on line and tried the following:
data want;
set have;
Want_date=scan(Have_Date, 1, ' ');
do i=2 to countw(Have_Date,' ');
word=scan(Have_Date, i, ' ');
found=find(Want_date, word, 'it');
if found=0 then Want_date=catx(' ', Want_date, word);
end;
run;
This did not work. But to be honest I wasn't sure what this was exactly doing, so I may not have done it right.
When that didn't work I also found this code online and tried it:
data want;
if not prxId then prxId + prxParse("s/\b(\w{2,})\b(.*)\b(\1\s*)\b/\1\2/io");
set have;
Want_date= Have_Date;
do i = 1 to 100 until (times=0);
call prxChange(prxId,1,Want_date,Want_date,len,trunc,times);
end;
drop i prxId len trunc times;
run;
This was close, I ended up with a single date but it was followed by all the commas and the slash marks from the other dates in the field. (example: 05/30/2019,//,//,//,//,//,//,//,//). Again, it could be because I am not sure what the code is all doing, so I wasn't able to correct it to remove the commas and slash marks.
I just sent this one example but I have multiple columns that all have this same problem. So any help would be appreciated.
Thank you
So in your input are mainly 3 variables (I omit prefix HAVE_ ) :
person_id, date and store.
Try next code:
data temp(keep=person_ID datex store);
set have(rename=(have_person_id = person_ID));
i=1;
/* assuming number of date = number of stores in strings */
do until (missing(date));
datex = scan(have_date,i,',');
store = scan(have_store,i,',');
output;
i+1;
end;
run;
proc sort data=temp out=temp1 nodupkey;
by person_id datex;
run;
You can use the result dataset as is for any analysis you need, without replacing the long format to wide format, as shown in wanted output.
If you insist to have the wide format use PROC TRANSPOSE.
Hi @Tracy_Bis
Here is an attempt to achieve this:
data have;
infile datalines dlm="09"x;
input Have_PersonID $ Have_Date:$200. Have_Store:$200.;
datalines;
123 04/12/2019,04/12/2019,04/12/2019,04/12/2019,04/12/2019,04/12/2019,04/12/2019,04/12/2019 Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store
457 07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019 ,,,,,,,,,,,,,,
789 03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019 ,,,,,,,,,
101 05/18/2019,05/18/2019 Mickeys,Mickeys
234 08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019 Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND
557 05/19/2019,05/19/2019,05/19/2019,05/19/2019,05/30/2019,05/30/2019,05/30/2019 Daisy's, TX,Daisy's, TX,Daisy's, TX,Daisy's, TX,Pluto, MT,Pluto, MT,Pluto, MT
891 06/21/2019,06/21/2019,06/21/2019,07/10/2019 Hewy, Dewy, And Lewy,Hewy, Dewy, And Lewy,Hewy, Dewy, And Lewy,Goofy, Daffy, And Minnie
;
run;
%macro _mymac(param);
data _have_¶m. (keep= Have_PersonID have_¶m._i i);
set have;
do i=1 to countw(Have_¶m.);
have_¶m._i = scan(Have_¶m.,i,",");
if not missing(have_¶m._i) then output _have_¶m.;
end;
run;
proc sort data=_have_¶m. nodupkey;
by Have_PersonID have_¶m._i;
run;
proc sort data=_have_¶m.;
by Have_PersonID i;
run;
proc transpose data=_have_¶m. out=_have_¶m._tr (drop=_:);
var have_¶m._i;
by Have_PersonID;
run;
data _have_¶m._final;
set _have_¶m._tr;
length have_¶m._i $ 200;
want_¶m. = catx(", ", of col:);
keep Have_PersonID want_¶m.;
run;
%mend;
%_mymac(date)
%_mymac(store)
data want;
merge _have_date_final _have_store_final;
by Have_PersonID;
run;
Output:
All the best,
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.