BookmarkSubscribeRSS Feed
Tracy_Bis
Calcite | Level 5

I have a data set (.cvs), that I use on a daily basis (includes about 60-75 variables).  The source of this data set has gone through some upgrades, and now automatically pulls data into the system.  It will not override data, so the system just adds a new iteration (to those questions that allow it) even if it is the exact same data that is already in there. So for each person (row) some of my variables have a string of data separated by commas. I have attached an example excel spreadsheet, showing the data I have and what I want (I work with health data, so this is an example of what my data set looks like). Now, if the unique variables cannot be separated into their own columns, I am okay with having them in the same column.

My SAS skills (SAS 9.4) are fairly minimal (proc freqs, some simple SQL, merging, concatenation, sorting, pulling first obs of a string, etc.)

I have searched on line and tried the following:

data want;
set have;
Want_date=scan(Have_Date, 1, ' ');
do i=2 to countw(Have_Date,' ');
word=scan(Have_Date, i, ' ');
found=find(Want_date, word, 'it');
if found=0 then Want_date=catx(' ', Want_date, word);
end;
run; 

 

This did not work.  But to be honest I wasn't sure what this was exactly doing, so I may not have done it right.

When that didn't work I also found this code online and tried it:

data want;
if not prxId then prxId + prxParse("s/\b(\w{2,})\b(.*)\b(\1\s*)\b/\1\2/io");
set have;
Want_date= Have_Date;
do i = 1 to 100 until (times=0);
call prxChange(prxId,1,Want_date,Want_date,len,trunc,times);
end;
drop i prxId len trunc times;
run;

 

This was close, I ended up with a single date but it was followed by all the commas and the slash marks from the other dates in the field. (example: 05/30/2019,//,//,//,//,//,//,//,//).  Again, it could be because I am not sure what the code is all doing, so I wasn't able to correct it to remove the commas and slash marks.  

I just sent this one example but I have multiple columns that all have this same problem.  So any help would be appreciated.

Thank you 

2 REPLIES 2
Shmuel
Garnet | Level 18

So in your input are mainly 3 variables (I omit prefix HAVE_ ) :

  person_id, date and store.

Try next code:

data temp(keep=person_ID datex store);
 set have(rename=(have_person_id = person_ID));
       i=1;
      /* assuming number of date = number of stores in strings */
      do until (missing(date));
           datex = scan(have_date,i,',');
           store = scan(have_store,i,',');
           output;
           i+1;
      end;
run;
proc sort data=temp out=temp1 nodupkey;
  by person_id datex;
run;

You can use the result dataset as is for any analysis you need, without replacing the  long format to wide format, as shown in wanted output.

 

If you insist to have the wide format use PROC TRANSPOSE.

 

ed_sas_member
Meteorite | Level 14

Hi @Tracy_Bis 

 

Here is an attempt to achieve this:

data have;
	infile datalines dlm="09"x;
	input Have_PersonID	$ Have_Date:$200. Have_Store:$200.;
	datalines;
123	04/12/2019,04/12/2019,04/12/2019,04/12/2019,04/12/2019,04/12/2019,04/12/2019,04/12/2019	Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store,Tracy's Store
457	07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019,07/05/2019	,,,,,,,,,,,,,,
789	03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019,03/19/2019	,,,,,,,,,
101	05/18/2019,05/18/2019	Mickeys,Mickeys
234	08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019,08/02/2019	Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND,Donalds, ND
557	05/19/2019,05/19/2019,05/19/2019,05/19/2019,05/30/2019,05/30/2019,05/30/2019	Daisy's, TX,Daisy's, TX,Daisy's, TX,Daisy's, TX,Pluto, MT,Pluto, MT,Pluto, MT
891	06/21/2019,06/21/2019,06/21/2019,07/10/2019	Hewy, Dewy, And Lewy,Hewy, Dewy, And Lewy,Hewy, Dewy, And Lewy,Goofy, Daffy, And Minnie
;
run;

%macro _mymac(param);
	data _have_&param. (keep= Have_PersonID have_&param._i i);
		set have;
		do i=1 to countw(Have_&param.);
			have_&param._i = scan(Have_&param.,i,",");
			if not missing(have_&param._i) then output _have_&param.;
		end;
	run;
	
	proc sort data=_have_&param. nodupkey;
		by Have_PersonID have_&param._i;
	run;
	
	proc sort data=_have_&param.;
		by Have_PersonID i;
	run;
	proc transpose data=_have_&param. out=_have_&param._tr (drop=_:);
		var have_&param._i;
		by Have_PersonID;
	run;
	
	data _have_&param._final;
		set _have_&param._tr;
		length have_&param._i $ 200;
		want_&param. = catx(", ", of col:);
		keep Have_PersonID want_&param.;
	run;
%mend;

%_mymac(date)
%_mymac(store)

data want;
	merge _have_date_final _have_store_final;
	by Have_PersonID;
run;

Output:

Capture d’écran 2020-07-05 à 10.08.39.png

 

 

 

 

 

All the best,

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 682 views
  • 1 like
  • 3 in conversation