I have an issue with my data hope u can help me in this. In my data set one variable data is data is populated lik car; tv; washing machine; mobile; cam . I have 15mill records in that data set. For this variable I need to separate each product into new variable.
Data a1;
set a;
Input var $50;
Product="car; tv; fridge; washingmachine; mobile"; output;
Product="Car; fridge; phone"; output;
Run;
This is sample data. Each product is seperated by " ; ". Next to ; is new product which need to be created in seperate variable
I'm looking for output like
x1 x2 x3 x4 product product1 product2 product3 product 4 product5
1 a a a car; tv; fridge; washingmachine; mobile car tv fridge washing machine Mobile
2 b b b Car; fridge; phone car Fridge Phone
x1 x2 x3 x4 are the other variables which are present in my data set.
Regard,
venky
It would help if you know how many products there can be as a maximum.
Could be solved by using an array, do-loop with scan.
Thanks for the reply. The problem here is we don't know how many products are there a person may took finance for single or multiple products.
I recommend a different structure, as that will be sparse. This is also easier to generate and summarize later on.
id product
1 car
1 tv
1 fridge
1 washing machine
2 car
2 fridge
2 phone
You are correct but my data in that column is defined like that. I found of alternative way. the problem is if this variable becomes significant at time of modeling its problem we need to share the logic that apples on there DB.
I agree with https://communities.sas.com/people/Reeza, much better layout for reporting/analysis.
And Beauty of it is that to accomplish that, you don't need to know about how many Products there may be, and you don't have to involve arrays. Just use do - until with explicit output.
This could be a way forward - mainly depending in what kind of data resides in the other variables.
I'm confused. It sounds like you have the data in one format. Are you uploading it into a database that is restricted to that format?
There is no issue with the Have data set, I have issues with the Want data set.
Hi.. I think this might work for you..
Data have;
infile datalines missover;
input x1 x2 $ x3 $ x4 $ products :$100.;
datalines4;
1 a a a car;tv;fridge;washingmachine;mobile
2 b b b Car;fridge;phone
;;;;
run;
proc sql noprint;
select put((max(count(products,";"))+1),best.) into :tot from have;
quit;
data want(drop=i);
set have;
array pro(*) $30 product1-product%sysfunc(strip(&tot));
do i=1 to %sysfunc(strip(&tot));
pro(i)=scan(products,i,";");
end;
run;
try this,
Data a1;
length product $10000;
Product="car; tv; fridge; washingmachine; mobile"; output;
product="Car; fridge; phone"; output;
Product="car1; tv; fridge; washingmachine; mobile;car; tv; fridge; washingmachine; mobile"; output;
Run;
%macro test;
data a1;set a1;
dlm= countw(product,';');
run;
proc sql noprint;
select max(dlm) into :max_dlm from a1;
quit;
data want;
set a1;
%do i=1 %to &max_dlm.;
product&i=scan(product,&i,';');
%end;
run;
%mend;
%test;
Hi Pallis,
I agree with pradeepalankar's solution. Sometimes we cannot change the data source. Architects build databases for reasons that are best for their purposes. P's solution uses the data source as it is given to you. Also, P's solution allows for the use of a scan function to start looking for new products after the position of the fourth semicolon (fifth word) "scan(productlist,4,';')" if you want to ignore the first four products.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.