DATA Step, Macro, Functions and more

Not getting correct output during match merging

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Not getting correct output during match merging

Hi Expert,

I am not getting correct output when testing this code in huge data. The programming is running with zero output. 

 

I want all records from child sheet when country, product and dosage matches between two datasets. 

            or country-product matches between two dataset only when dosage values is null for child dataset

 

Product data column should be considered as pattern matching.

Can you please help me where I have done the mistake or is there any altenative approach to retrieve data in a faster way.

 

data Child;
input pid country $ product & $25. dosage $;
cards;
101 France Atorvastatin          Oral
102 France PaRacetamol            .
103 Russia Nimusulide injection  sc
104 China DIovan                 Tablet
105 Japan Gelusil                Tablet
106 India Codeine                Tablet 
;
run;

data Parent;
input country $ product & $25. dosage $;
cards;
JapaN Atorvastatin calcium 		Oral
China Atorvastatin 				Oral   
France Atorvastatin calcium 	Oral   
Russia Paracetamol 				Oral   
India Paracetamol 				Oral   
France Paracetamol 				Oral   
RusSia Nimusulide 				sc   
China diovan Syrup   			.
China diovan 					.
China Gelusil 					.
india Codeine 					.
Japan Gelusil Tablet			.
;
run;

proc sql;
create table match_country_prod_dos as
			select c.* 
				from child c, parent p
				where upper(compress(c.country)) in (select upper(compress(p.country)) from parent)
				and (upper(compress(c.product)) like '%' || upper(compress(p.product)) || '%'  or upper(compress(p.product)) like '%' ||upper(compress(c.product))|| '%')
				and upper(compress(c.dosage)) in (select upper(compress(p.dosage)) from parent);
quit;
 

proc sql;
create table match_country_prod_dosagenull as
			select c.* 
				from child c, parent p
				where c.dosage is null
				and upper(compress(c.country)) in (select upper(compress(p.country)) from parent)
				and (upper(compress(c.product)) like '%' || upper(compress(p.product)) || '%'  or upper(compress(p.product)) like '%' ||upper(compress(c.product))|| '%')
				;
quit;
 

data final;
set  match_country_prod_dos match_country_prod_dosagenull;
run;

 

 


Accepted Solutions
Solution
‎11-15-2017 09:38 AM
Super User
Super User
Posts: 9,227

Re: Not getting correct output during match merging

Add in the upcase(strip()) part and it should work:

proc sql;
  create table match_country_prod_dos as
  select A.*
  from   CHILD A
  where  exists(select distinct B.COUNTRY from PARENT B where (upcase(strip(B.COUNTRY))=upcase(strip(A.COUNTRY)) 
                  and upcase(strip(B.DOSAGE))=upcase(strip(A.DOSAGE)) and index(upcase(strip(B.PRODUCT)),upcase(strip(A.PRODUCT)))>0));
quit;

However, I still advise you, if you are doing anything with this data, to first process it into a usable, standardised format - this will help you so much later on.  I.e. make sure the data is all formatted correctly - upcase country in both, apply standardised codes to the countries so different spellings no longer factor in.  Then merge on WHODrug or a drug library to get the various group levels to provide an acurate mergable variable.

View solution in original post


All Replies
Super User
Super User
Posts: 9,227

Re: Not getting correct output during match merging

Compress is not the function you want to use.  You want to remove blanks from start and end, compress will trim out all spaces.  If you modify your code to:

proc sql;
  create table match_country_prod_dos as
select c.* from child c, parent p where upper(strip(c.country)) in (select upper(strip(p.country)) from parent)
and (upper(strip(c.product)) like cats('%',upper(p.product),'%') or upper(strip(p.product)) like cats('%',upper(c.product),'%')) and upper(strip(c.dosage)) in (select upper(strip(p.dosage)) from parent); quit;

Then you get results.

Occasional Contributor
Posts: 9

Re: Not getting correct output during match merging

Thanks in looking into my concern.

I would like to know whether the below code is correct. it means whether I should use alias p.country or only country to get correct result.

 

upper(strip(c.country)) in (select upper(strip(p.country)) from parent)

But I am not getting output while testing the same in my project. 

 

Is there any alternative approach to retreive the data in  a more faster way.

 

 

Super User
Super User
Posts: 9,227

Re: Not getting correct output during match merging

You could try:

proc sql;
  create table match_country_prod_dos as
select * from CHILD where exists(select B.COUNTRY from PARENT B where upcase(strip(B.COUNTRY))=upcase(strip(COUNTRY)) and index(B.PRODUCT,PRODUCT)>0 and upcase(strip(B.DOSAGE))=upcase(strip(DOSAGE))); quit;

I would question why you have drug terms, which are not coded.  I do not see how you will merge these correctly.  Drugnames should be code with the appropriate coding dictionary WHODrug for instance.  This will give you the various coded levels for you to properly merge on.

Occasional Contributor
Posts: 9

Re: Not getting correct output during match merging

The exists operator is not working and I am getting all rows from child dataset.

Actually the drug code is not available for this raw data for which I am comparing the product name.

 

 

Solution
‎11-15-2017 09:38 AM
Super User
Super User
Posts: 9,227

Re: Not getting correct output during match merging

Add in the upcase(strip()) part and it should work:

proc sql;
  create table match_country_prod_dos as
  select A.*
  from   CHILD A
  where  exists(select distinct B.COUNTRY from PARENT B where (upcase(strip(B.COUNTRY))=upcase(strip(A.COUNTRY)) 
                  and upcase(strip(B.DOSAGE))=upcase(strip(A.DOSAGE)) and index(upcase(strip(B.PRODUCT)),upcase(strip(A.PRODUCT)))>0));
quit;

However, I still advise you, if you are doing anything with this data, to first process it into a usable, standardised format - this will help you so much later on.  I.e. make sure the data is all formatted correctly - upcase country in both, apply standardised codes to the countries so different spellings no longer factor in.  Then merge on WHODrug or a drug library to get the various group levels to provide an acurate mergable variable.

Occasional Contributor
Posts: 9

Re: Not getting correct output during match merging

Any help please
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 120 views
  • 2 likes
  • 2 in conversation