About Pumpp

Pumpp · 11-03-2020

I have approximately 12-20 words in the Address field. My Location field in the Country dataset has 3-5 words. I cannot use FIND because the spellings in both datasets are incorrect, so I use functions like COMPGED and COMPLEV to match the words. Below is the code:

    /* Split each Address into one observation per word */
    data work.namedata;
        set work.namedata;
        n+1;
        do i=1 to countw(Address," ");
            Address_Split=compress(scan(Address,i," ","m"),", / : @ # & ( ) ; \ . ! 0 1 2 3 4 5 6 7 8 9");
            output;
        end;
        drop n i;
    run;

    /* Previous word */
    data work.namedata;
        set work.namedata;
        Add_split_lag = lag(Address_Split);
        Location_Count = _N_;
    run;

    proc sort data=work.namedata;
        by descending Location_Count;
    run;

    /* Next word: LAG on the reversed order acts as a lead */
    data work.namedata;
        set work.namedata;
        Add_split_lead = lag(Address_split);
    run;

    proc sql;
        create table Name_Country_Cart as
        select *
        from Namedata as a
        left join Countrydata as b
        on a.Country=b.Country;
    quit;

    /* Split Location and Sub_Location into words */
    data Name_Country_Cart_01;
        set Name_Country_Cart;
        m+1;
        do j=1 to countw(Location," ");
            n+1;
            do i=1 to countw(Sub_Location," ");
                Sub_Split=compress(scan(Sub_Location,i," ","m"),", / - : @ # & ( ) ; \ . ! 0 1 2 3 4 5 6 7 8 9");
                output;
                Location_Split=compress(scan(Location,j," ","m"),", / - : @ # & ( ) ; \ . ! 0 1 2 3 4 5 6 7 8 9");
                output;
            end;
        end;
        drop j m i n;
    run;

    proc sort data=Name_Country_Cart_01 nodupkey;
        by Name Address Address_split Add_split_lag Add_split_lead Location Location_Split Sub_Split;
        where Sub_split is not missing;
    run;

    /* Fuzzy-match scores between address words and location words */
    data Name_Country_Cart_02;
        set Name_Country_Cart_01;
        where Location_Split is not missing;
        Ged_Score_SDT = compged(Address_split, Location_Split);
        Lev_Score_SDT = complev(Address_split, Location_Split);
    run;

The code runs for some observations but gets stuck after a certain point while running Name_Country_Cart_01. That is why I decided to split the Country dataset and run the Name dataset one name at a time.

Pumpp · 10-31-2020

I initially tried without splitting the dataset, but the dataset gets too heavy after the join, and I need to do further analysis after this joining step. My analysis involves splitting each word of the Address field (from the Name dataset) into separate observations and trying to match it with the Location field from the Country dataset. This process takes a lot of time and crashes the SAS server, which is why I decided to split the datasets and then do the further analysis. Finally, I append the smaller datasets into one (using PROC APPEND) and delete the small datasets after my analysis using PROC DATASETS.

Pumpp · 10-31-2020

Hi, I have 2 datasets (sample is attached). One is the Name dataset (more than 600,000 observations) and the other is the Country dataset (more than 200,000 observations). I split the Country dataset into multiple datasets based on the country name, giving me close to 100 unique datasets. I want to join 2 datasets, one selected by Name and the other by Country. For example: in the Name dataset, I first want to filter on the name Andy; then, since Andy's country is US, I want to join this with the US dataset. In the 2nd step, I want to filter on the name Andrew and join it with the New Zealand dataset. In the 3rd step, I want to filter on Luca and join it with the Italy dataset. And so on for all the unique names. Is there any way to do it? Or any macro which could simplify my coding? Thanks
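One way the chunked processing described here might be sketched is a macro loop that filters both datasets on one country at a time instead of creating about 100 physical datasets. This is only a sketch under assumptions: it loops per country rather than per name, the dataset names work.Namedata and work.Countrydata and the columns Country, Location and Sub_Location are taken from code posted elsewhere in this thread, and the macro name match_by_country and the datasets work._chunk and work.all_matches are hypothetical.

```sas
%macro match_by_country;
    %local i n;

    /* One macro variable per distinct country: country1, country2, ... */
    proc sql noprint;
        select distinct Country into :country1-
            from work.Countrydata;
    quit;
    %let n = &sqlobs;

    %do i = 1 %to &n;
        /* Join only the rows that belong to the current country */
        proc sql;
            create table work._chunk as
            select a.*, b.Location, b.Sub_Location
            from work.Namedata(where=(Country = "&&country&i")) as a
                 inner join
                 work.Countrydata(where=(Country = "&&country&i")) as b
                 on a.Country = b.Country;
        quit;

        /* Per-country analysis of work._chunk would go here */

        /* Collect the per-country results in one dataset */
        proc append base=work.all_matches data=work._chunk;
        run;
    %end;

    /* Remove the temporary chunk */
    proc datasets library=work nolist;
        delete _chunk;
    quit;
%mend match_by_country;

%match_by_country;
```

The sketch assumes country values contain no ampersands, percent signs or double quotes, and that work.all_matches does not already exist from an earlier run (PROC APPEND would otherwise add to it).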

Pumpp · 07-08-2020

Thank you for the response. Currently I am also joining both datasets and trying to perform the actions. But I wanted to know of any method that avoids joining the 2 datasets, because each dataset has more than 1 lakh (100,000) records and joining causes the data to explode. The code below is of help, though. I used to break each string into separate observations and then perform the analysis with combinations of COMPGED, COMPLEV, SPEDIS and SOUNDEX, but this involves a lot of data exploding. Your code can at least help me avoid exploding the data twice.

    data tmp2;
        set temp;
        wn = countw(row);
        flag=0;
        /* Compare word against every word in row */
        do i=1 to wn;
            cmpl1 = complev(word,scan(row,i));
            if cmpl1 =1 then flag=1;
        end;
        *if flag=1;
        keep word row flag;
    run;

Pumpp · 07-08-2020

It's a good answer, but it only works if both datasets are spelled correctly. Is there any way I can combine this with some fuzzy functions like COMPGED, SPEDIS, SOUNDEX, etc.?

Pumpp · 07-08-2020

I have 2 datasets: DataA is a Sentence field, DataB is a Name field. DataA has observations like "This Apple was bought yesterday", "The Orange is delicious", "Those Carrots are spoilt". DataB has observations like "Strawberries", "Oranges", "Potatoes", "Aple", "Grapes", "Carot". (Note: all the characters are in uppercase.) I want to fuzzy match both datasets to see if any DataB value matches a word in the DataA strings. If it matches, I want DataA to have a new column which says Match = "YES". Can we match both datasets without merging?
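A minimal sketch of one way this could be approached without a merge: load the DataB names into a temporary array once, then compare each word of every sentence against them with COMPLEV. The sample data mirror the examples above. The variable names Sentence and Name, the array bound of 1000 and the distance threshold of 2 are all assumptions that would need tuning for real data; a loose threshold can produce false matches on short words.

```sas
/* Hypothetical sample data based on the examples in the post */
data dataA;
    infile datalines truncover;
    input Sentence $char60.;
    datalines;
THIS APPLE WAS BOUGHT YESTERDAY
THE ORANGE IS DELICIOUS
THOSE CARROTS ARE SPOILT
;
run;

data dataB;
    input Name :$20.;
    datalines;
STRAWBERRIES
ORANGES
POTATOES
APLE
GRAPES
CAROT
;
run;

data want;
    /* Assumed upper bound on the number of DataB rows; raise as needed */
    array names[1000] $20 _temporary_;
    length Match $3;

    /* Load all DataB names into memory once, on the first iteration */
    if _n_ = 1 then do k = 1 to n_names;
        set dataB nobs=n_names point=k;
        names[k] = Name;
    end;

    set dataA;
    Match = "NO";

    /* Compare every word of the sentence with every stored name */
    do i = 1 to countw(Sentence, " ");
        word = scan(Sentence, i, " ");
        do j = 1 to n_names while (Match ne "YES");
            /* Assumed threshold: at most 2 single-character edits */
            if complev(word, names[j]) <= 2 then Match = "YES";
        end;
    end;

    drop i j word Name;
run;
```

Each DataA row is read once and no joined table is created, so the data does not grow. The number of comparisons is still the number of words times the number of names.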

Pumpp · 03-19-2020

I have produced some results using SAS Visual Analytics. I want to schedule an email every day that sends these results using SAS. Can anyone help me?

Pumpp · 02-25-2020

Thank you very much. The BY statement actually worked.

Pumpp · 02-24-2020

I do not want Concat to appear in the TABLE statement. I want it in the same format as shown in the code. Also, I do not want the total across all Concat values. I want the WHERE condition on Concat to be one of those 30+ distinct values, and I want 30+ different results displayed rather than one single result. I want to see the results for each individual Concat value.

Pumpp · 02-23-2020

I am getting an error message: ERROR 180-322: Statement is not valid or it is used out of proper order. Where do I put the %macro Variables(class1=); statement? Also, I want the results (in tabular form), not output. Is it possible to use the DOSUBL or CALL EXECUTE syntax with PROC TABULATE? As I said, I have 30+ unique Concat values, and I want 30+ separate results, and then to export those 30+ results to Excel. The method I used gave me the desired output, but I had to manually write all those 30+ class values to get it. I want code using DOSUBL or CALL EXECUTE that can be used with PROC TABULATE to get the same results.

Pumpp · 02-21-2020

Below is the code, which works fine and gives the expected results.

    %macro Variables(class1=);
        proc tabulate data=work.policy_consolidated01 missing;
            class year;
            class quarter;
            class type;
            class Concat;
            var No_of_policies;
            var premium;
            var Avg_Premium / weight = No_of_Policies;
            table (year=' ')*(quarter=' ' all='Sub Total'),
                  sum=' '*(no_of_policies='NOP'*Format=comma16.0 premium='Prem'*Format=comma16.0)
                  mean=' '*(Avg_Premium='Avg Prem'*Format=comma16.0)
                  / printmiss nocellmerge box='year';
            title &class1;
            where year in ('2017','2016','2015', '2014', '2013') and Concat = &class1;
        run;
    %Mend;

    %Variables(class1='AA');
    %Variables(class1='AB');
    %Variables(class1='AC');
    %Variables(class1='BA');
    %Variables(class1='BB');
    %Variables(class1='CE');
    etc....

There are 30 unique Concat values. Instead of manually writing all the &class1 values, I want code that loops through each of the 30 unique Concat values and produces the same results. Also, is this looping possible if there is more than 1 class variable?
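One possible way to avoid typing the 30 calls by hand is to generate them from the data with CALL EXECUTE. The sketch below assumes the %Variables macro above has already been compiled and that no Concat value contains a single quote; the dataset name work.concat_values is hypothetical.

```sas
/* One row per distinct Concat value */
proc sort data=work.policy_consolidated01(keep=Concat)
          out=work.concat_values nodupkey;
    by Concat;
run;

/* Queue one macro call per value, e.g. %Variables(class1='AA') */
data _null_;
    set work.concat_values;
    /* %nrstr delays macro execution until this DATA step has finished */
    call execute(cats('%nrstr(%Variables(class1=', "'", Concat, "'", '))'));
run;
```

The queued calls run one after another once the DATA step ends, so each Concat value gets its own PROC TABULATE result. With more than one class variable, the same idea could be extended by sorting on the combination of variables and passing an additional macro parameter.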

Pumpp · 02-07-2020

Your answer was actually helpful. I modified the code a bit and got my desired output. Thank you.

Pumpp · 02-07-2020

This is not what I wanted. Here the Exposure count stops after 250. What I want, taking the 1st row of the dataset Have as an example, is 45690 rows, where the 1st column has Name A, the 2nd column is Exposure with value 1 (45690 times), and the 3rd column is Count, where Count has value 1 in only the first 250 rows and value 0 in the remaining rows (up to row number 45690). I hope my problem statement makes sense?

Pumpp · 02-07-2020

With regard to the same subject, my data is now:

    Name  Exposure  Count
    A        45690    250
    B         3320    120
    C        12356    354
    D        73987    456
    E         5467     90

I want to do the same thing, but with Count included as a 3rd column. For example, for Name A, Exposure has 45690 rows, each with value 1. What I want is a 3rd column, Count, where only 250 rows have value 1 and the remaining rows have value 0. Then do the same for Name B, but with only 120 rows having Count as 1 and the remaining rows having Count as 0. What do I do for this?
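A minimal sketch of the expansion described here, assuming the input dataset is called have (as in the follow-up post) and the result want. The renamed variables n_rows and n_events and the loop counter row are hypothetical helper names.

```sas
/* Sample input mirroring the table above */
data have;
    input Name $ Exposure Count;
    datalines;
A 45690 250
B 3320 120
C 12356 354
D 73987 456
E 5467 90
;
run;

data want;
    /* Keep the original totals under temporary names */
    set have(rename=(Exposure=n_rows Count=n_events));
    Exposure = 1;
    do row = 1 to n_rows;
        /* 1 for the first n_events rows of each Name, 0 afterwards */
        Count = (row <= n_events);
        output;
    end;
    drop n_rows n_events row;
run;
```

Each input row is written out as many times as its original Exposure value, so Name A yields 45690 rows with Exposure = 1, of which the first 250 have Count = 1.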

Pumpp · 02-07-2020

Thank you very much. This was of great help.

Date Last Visited: 11-03-2020 03:57 AM

Re: Re: Joining 2 datasets based on Variable and dataset name

Re: Joining 2 datasets based on Variable and dataset name

Joining 2 datasets based on Variable and dataset name

Re: Matching two datasets without merging

Re: Matching two datasets without merging

Matching two datasets without merging

Scheduling a daily mail on the output

Re: Looping through a Macro variable

Re: Looping through a Macro variable

Re: Looping through a Macro variable

Re: Matching two datasets without merging

Re: Looping through a Macro variable

Re: Looping through a Macro variable

Re: Looping through a Macro variable

Re: Spilt a number into that many number of rows
