Hi Expert,
I am not getting correct output when testing this code in huge data. The programming is running with zero output.
I want all records from child sheet when country, product and dosage matches between two datasets.
or country-product matches between two dataset only when dosage values is null for child dataset
Product data column should be considered as pattern matching.
Can you please help me where I have done the mistake or is there any altenative approach to retrieve data in a faster way.
data Child;
input pid country $ product & $25. dosage $;
cards;
101 France Atorvastatin Oral
102 France PaRacetamol .
103 Russia Nimusulide injection sc
104 China DIovan Tablet
105 Japan Gelusil Tablet
106 India Codeine Tablet
;
run;
data Parent;
input country $ product & $25. dosage $;
cards;
JapaN Atorvastatin calcium Oral
China Atorvastatin Oral
France Atorvastatin calcium Oral
Russia Paracetamol Oral
India Paracetamol Oral
France Paracetamol Oral
RusSia Nimusulide sc
China diovan Syrup .
China diovan .
China Gelusil .
india Codeine .
Japan Gelusil Tablet .
;
run;
proc sql;
create table match_country_prod_dos as
select c.*
from child c, parent p
where upper(compress(c.country)) in (select upper(compress(p.country)) from parent)
and (upper(compress(c.product)) like '%' || upper(compress(p.product)) || '%' or upper(compress(p.product)) like '%' ||upper(compress(c.product))|| '%')
and upper(compress(c.dosage)) in (select upper(compress(p.dosage)) from parent);
quit;
proc sql;
create table match_country_prod_dosagenull as
select c.*
from child c, parent p
where c.dosage is null
and upper(compress(c.country)) in (select upper(compress(p.country)) from parent)
and (upper(compress(c.product)) like '%' || upper(compress(p.product)) || '%' or upper(compress(p.product)) like '%' ||upper(compress(c.product))|| '%')
;
quit;
data final;
set match_country_prod_dos match_country_prod_dosagenull;
run;
Add in the upcase(strip()) part and it should work:
proc sql; create table match_country_prod_dos as select A.* from CHILD A where exists(select distinct B.COUNTRY from PARENT B where (upcase(strip(B.COUNTRY))=upcase(strip(A.COUNTRY)) and upcase(strip(B.DOSAGE))=upcase(strip(A.DOSAGE)) and index(upcase(strip(B.PRODUCT)),upcase(strip(A.PRODUCT)))>0)); quit;
However, I still advise you, if you are doing anything with this data, to first process it into a usable, standardised format - this will help you so much later on. I.e. make sure the data is all formatted correctly - upcase country in both, apply standardised codes to the countries so different spellings no longer factor in. Then merge on WHODrug or a drug library to get the various group levels to provide an acurate mergable variable.
Compress is not the function you want to use. You want to remove blanks from start and end, compress will trim out all spaces. If you modify your code to:
proc sql; create table match_country_prod_dos as
select c.* from child c, parent p where upper(strip(c.country)) in (select upper(strip(p.country)) from parent)
and (upper(strip(c.product)) like cats('%',upper(p.product),'%') or upper(strip(p.product)) like cats('%',upper(c.product),'%')) and upper(strip(c.dosage)) in (select upper(strip(p.dosage)) from parent); quit;
Then you get results.
Thanks in looking into my concern.
I would like to know whether the below code is correct. it means whether I should use alias p.country or only country to get correct result.
upper(strip(c.country)) in (select upper(strip(p.country)) from parent)
But I am not getting output while testing the same in my project.
Is there any alternative approach to retreive the data in a more faster way.
You could try:
proc sql; create table match_country_prod_dos as
select * from CHILD where exists(select B.COUNTRY from PARENT B where upcase(strip(B.COUNTRY))=upcase(strip(COUNTRY)) and index(B.PRODUCT,PRODUCT)>0 and upcase(strip(B.DOSAGE))=upcase(strip(DOSAGE))); quit;
I would question why you have drug terms, which are not coded. I do not see how you will merge these correctly. Drugnames should be code with the appropriate coding dictionary WHODrug for instance. This will give you the various coded levels for you to properly merge on.
The exists operator is not working and I am getting all rows from child dataset.
Actually the drug code is not available for this raw data for which I am comparing the product name.
Add in the upcase(strip()) part and it should work:
proc sql; create table match_country_prod_dos as select A.* from CHILD A where exists(select distinct B.COUNTRY from PARENT B where (upcase(strip(B.COUNTRY))=upcase(strip(A.COUNTRY)) and upcase(strip(B.DOSAGE))=upcase(strip(A.DOSAGE)) and index(upcase(strip(B.PRODUCT)),upcase(strip(A.PRODUCT)))>0)); quit;
However, I still advise you, if you are doing anything with this data, to first process it into a usable, standardised format - this will help you so much later on. I.e. make sure the data is all formatted correctly - upcase country in both, apply standardised codes to the countries so different spellings no longer factor in. Then merge on WHODrug or a drug library to get the various group levels to provide an acurate mergable variable.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.