BookmarkSubscribeRSS Feed
pallis
Fluorite | Level 6

I have an issue with my data hope u can help me in this. In my data set one variable data is data is populated lik car; tv; washing machine; mobile; cam . I have 15mill records in that data set. For this variable I need to separate each product into new variable.


Data a1;

set a;
Input var $50;
Product="car; tv; fridge; washingmachine; mobile"; output;

Product="Car; fridge; phone"; output;
Run;
This is sample data. Each product is seperated by " ; ".  Next to ; is new product which need to be created in seperate variable


I'm looking for output like


x1 x2 x3 x4 product                                                            product1 product2  product3   product 4                      product5

  1  a   a   a  car; tv; fridge; washingmachine; mobile     car             tv            fridge         washing machine     Mobile        

  2   b   b  b Car; fridge; phone                                           car              Fridge    Phone                                                    



x1 x2 x3 x4 are the other variables which are present in my data set.


Regard,

venky


9 REPLIES 9
LinusH
Tourmaline | Level 20

It would help if you know how many products there can be as a maximum.

Could be solved by using an array, do-loop with scan.

Data never sleeps
pallis
Fluorite | Level 6

Thanks for the reply. The problem here is we don't know how many products are there a person may took finance for single or multiple products.

Reeza
Super User

I recommend a different structure, as that will be sparse.  This is also easier to generate and summarize later on. 

id product

1 car

1 tv

1 fridge

1 washing machine

2 car

2 fridge

2 phone

pallis
Fluorite | Level 6

You are correct but  my data in that column is defined like that. I found of alternative way. the problem is if this variable becomes significant at time of modeling its problem we need to share the logic  that apples on there DB.

LinusH
Tourmaline | Level 20

I agree with https://communities.sas.com/people/Reeza, much better layout for reporting/analysis.

And Beauty of it is that to accomplish that, you don't need to know about how many Products there may be, and you don't have to involve arrays. Just use do - until with explicit output.

This could be a way forward - mainly depending in what kind of data resides in the other variables.

Data never sleeps
Reeza
Super User

I'm confused. It sounds like you have the data in one format. Are you uploading it into a database that is restricted to that format?

There is no issue with the Have data set, I have issues with the Want data set.

SKK
Calcite | Level 5 SKK
Calcite | Level 5

Hi.. I think this might work for you..

Data have;
infile datalines missover;
input x1 x2 $ x3 $ x4 $ products :$100.;
datalines4;
1  a   a   a  car;tv;fridge;washingmachine;mobile          
2   b   b  b Car;fridge;phone    
;;;;
run;


proc sql noprint;
select put((max(count(products,";"))+1),best.) into :tot from have;
quit;

data want(drop=i);
set have;
array pro(*) $30 product1-product%sysfunc(strip(&tot));
do i=1 to %sysfunc(strip(&tot));
pro(i)=scan(products,i,";");
end;
run;

pradeepalankar
Obsidian | Level 7

try this,

Data a1;

length product $10000;

Product="car; tv; fridge; washingmachine; mobile"; output;

product="Car; fridge; phone"; output;

Product="car1; tv; fridge; washingmachine; mobile;car; tv; fridge; washingmachine; mobile"; output;

Run;

%macro test;

data a1;set a1;

dlm= countw(product,';');

run;

proc sql noprint;

select max(dlm) into :max_dlm from a1;

quit;

data want;

set a1;

%do i=1 %to &max_dlm.;

product&i=scan(product,&i,';');

%end;

run;

%mend;

%test;

jwillis
Quartz | Level 8

Hi Pallis,

I agree with pradeepalankar's solution.  Sometimes we cannot change the data source.  Architects build databases for reasons that are best for their purposes.  P's solution uses the data source as it is given to you.  Also, P's solution allows for the use of a scan function to start looking for new products after the position of the fourth semicolon (fifth word)  "scan(productlist,4,';')" if you want to ignore the first four products.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1170 views
  • 0 likes
  • 6 in conversation