Solved: Re: Not getting correct output during match merging

Sauda · Posted 11-14-2017 09:50 AM

Hi Expert,

I am not getting correct output when testing this code in huge data. The programming is running with zero output.

I want all records from child sheet when country, product and dosage matches between two datasets.

or country-product matches between two dataset only when dosage values is null for child dataset

Product data column should be considered as pattern matching.

Can you please help me where I have done the mistake or is there any altenative approach to retrieve data in a faster way.

data Child;
input pid country $ product & $25. dosage $;
cards;
101 France Atorvastatin          Oral
102 France PaRacetamol            .
103 Russia Nimusulide injection  sc
104 China DIovan                 Tablet
105 Japan Gelusil                Tablet
106 India Codeine                Tablet 
;
run;

data Parent;
input country $ product & $25. dosage $;
cards;
JapaN Atorvastatin calcium 		Oral
China Atorvastatin 				Oral   
France Atorvastatin calcium 	Oral   
Russia Paracetamol 				Oral   
India Paracetamol 				Oral   
France Paracetamol 				Oral   
RusSia Nimusulide 				sc   
China diovan Syrup   			.
China diovan 					.
China Gelusil 					.
india Codeine 					.
Japan Gelusil Tablet			.
;
run;

proc sql;
create table match_country_prod_dos as
			select c.* 
				from child c, parent p
				where upper(compress(c.country)) in (select upper(compress(p.country)) from parent)
				and (upper(compress(c.product)) like '%' || upper(compress(p.product)) || '%'  or upper(compress(p.product)) like '%' ||upper(compress(c.product))|| '%')
				and upper(compress(c.dosage)) in (select upper(compress(p.dosage)) from parent);
quit;
 

proc sql;
create table match_country_prod_dosagenull as
			select c.* 
				from child c, parent p
				where c.dosage is null
				and upper(compress(c.country)) in (select upper(compress(p.country)) from parent)
				and (upper(compress(c.product)) like '%' || upper(compress(p.product)) || '%'  or upper(compress(p.product)) like '%' ||upper(compress(c.product))|| '%')
				;
quit;
 

data final;
set  match_country_prod_dos match_country_prod_dosagenull;
run;

RW9 · Posted 11-15-2017 04:46 AM

Add in the upcase(strip()) part and it should work:

proc sql;
  create table match_country_prod_dos as
  select A.*
  from   CHILD A
  where  exists(select distinct B.COUNTRY from PARENT B where (upcase(strip(B.COUNTRY))=upcase(strip(A.COUNTRY)) 
                  and upcase(strip(B.DOSAGE))=upcase(strip(A.DOSAGE)) and index(upcase(strip(B.PRODUCT)),upcase(strip(A.PRODUCT)))>0));
quit;

However, I still advise you, if you are doing anything with this data, to first process it into a usable, standardised format - this will help you so much later on. I.e. make sure the data is all formatted correctly - upcase country in both, apply standardised codes to the countries so different spellings no longer factor in. Then merge on WHODrug or a drug library to get the various group levels to provide an acurate mergable variable.

View solution in original post

RW9 · Posted 11-14-2017 09:56 AM

Compress is not the function you want to use. You want to remove blanks from start and end, compress will trim out all spaces. If you modify your code to:

proc sql;
  create table match_country_prod_dos as
  select c.* 
  from child c, parent p
  where upper(strip(c.country)) in (select upper(strip(p.country)) from parent)
    and (upper(strip(c.product)) like cats('%',upper(p.product),'%')  or upper(strip(p.product)) like cats('%',upper(c.product),'%'))
    and upper(strip(c.dosage)) in (select upper(strip(p.dosage)) from parent);
quit;

Then you get results.

Sauda · Posted 11-14-2017 10:01 AM

Thanks in looking into my concern.

I would like to know whether the below code is correct. it means whether I should use alias p.country or only country to get correct result.

upper(strip(c.country)) in (select upper(strip(p.country)) from parent)

But I am not getting output while testing the same in my project.

Is there any alternative approach to retreive the data in a more faster way.

RW9 · Posted 11-14-2017 10:10 AM

You could try:

proc sql;
  create table match_country_prod_dos as
  select * 
  from   CHILD
  where  exists(select B.COUNTRY from PARENT B where upcase(strip(B.COUNTRY))=upcase(strip(COUNTRY)) and index(B.PRODUCT,PRODUCT)>0 and upcase(strip(B.DOSAGE))=upcase(strip(DOSAGE)));
quit;

I would question why you have drug terms, which are not coded. I do not see how you will merge these correctly. Drugnames should be code with the appropriate coding dictionary WHODrug for instance. This will give you the various coded levels for you to properly merge on.

Sauda · Posted 11-14-2017 12:35 PM

The exists operator is not working and I am getting all rows from child dataset.

Actually the drug code is not available for this raw data for which I am comparing the product name.

RW9 · Posted 11-15-2017 04:46 AM

Add in the upcase(strip()) part and it should work:

proc sql;
  create table match_country_prod_dos as
  select A.*
  from   CHILD A
  where  exists(select distinct B.COUNTRY from PARENT B where (upcase(strip(B.COUNTRY))=upcase(strip(A.COUNTRY)) 
                  and upcase(strip(B.DOSAGE))=upcase(strip(A.DOSAGE)) and index(upcase(strip(B.PRODUCT)),upcase(strip(A.PRODUCT)))>0));
quit;

However, I still advise you, if you are doing anything with this data, to first process it into a usable, standardised format - this will help you so much later on. I.e. make sure the data is all formatted correctly - upcase country in both, apply standardised codes to the countries so different spellings no longer factor in. Then merge on WHODrug or a drug library to get the various group levels to provide an acurate mergable variable.

Sauda · Posted 11-14-2017 10:04 PM

Any help please

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away