Help using Base SAS procedures

Compare categorical data over months at individual level the change, and overall % change for vars

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Compare categorical data over months at individual level the change, and overall % change for vars

[ Edited ]

Good day,

 

I have a data few data sets of the same information, each table represents a different month.  I consolidated the data into one using SAS, and I am trying to track changes for each variable for each person each month. Here is a mock of the data--I created PK combining LIC ID and First and Last Name. 

LIC_ID PK LIC_Num LIC_Desc Last_Name First_Name Middle_Name Business_Name CountyID County Expire_Date address City State ZipCode
5001 5001DoeJohn 11111   Doe John E Family Med 88 COOK 10/1/2017 47500   North Memphis TN 45566
5665 5665DoeJane 22222 VOID Doe Jane M   44 Dupage 12/20/2018 817 Terrain  Bensonville IL 97887
488 488SmithJoe 33333 HOLD Smith Joe D     Out of State 9/1/2017     NY 75566
5899 5899ParkSam 44444 Expired Park Sam M   48 ALACHUA 9/30/2017 58088 23rd street Orlando FL 84998
5001 5001DoeJohn 11111 LIC Change Doe John E Family Med 88 COOK 10/1/2020 47500   North Memphis TN 45566
5665 5665DoeJane 22222 VOID Doe Jane M Corner Clinic 44 Dupage 12/20/2018 817 Terrain  Bensonville IL 97887
488 488SmithJoe 33333   Smith Joe D   46 NEW YORK 1/1/2020 342 Washington New York NY 75566
5899 5899ParkSam 44444 Expired Park Sam M   48 ALACHUA 9/30/2017 58088 23rd street Orlando FL 84998
48889 48889LockJohn 55555 NEW Lock John   Lock Clinic   MIAMI-DADE 10/25/2019   Atlanta GA 79914

 

I am not sure if it was easier to do this as one table, or keep them as separate.  This will be an ongoing analysis and would like to create a macro to do this, but am unsure where to start, or if it is possible. I am looking to have two outputs. One being a SAS dataset which will be the change for each individual for each variable by month, the second being some table using proc tabulate, or proc sgpanel.   Basically, in my head the output would be like the below for the sgpanel:

 

Column1 JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
LIC_ID 0% 10% 0.05% 2% 0% 0% 5% 25% 1% 0% 0% 2%
PK 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
LIC_NUM 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
LIC_DESC 1% 0% 5% 0% 10% 2% 0% 0% 1.50% 1% 0% 2%
LAST_NAME 0% 0.10% 0% 0.01% 0% 0% 0% 0% 0% 0% 0% 0%
FIRST_NAME 0% 0% 0% 0% 0% 0.01% 0% 0% 0% 0% 0% 0%
MIDDLE_NAME 0% 0% 0% 0% 0% 0.01% 0% 0% 0% 0% 0% 0%
BUSINESS_NAME etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc.
COUNTY_ID etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc.
COUNTYID etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc.
COUNTY etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc.
EXPIRE_DATE etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc.
ADDRESS etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc.
CITY etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc.
STATE etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc.
ZIPCODE etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc.

 

 

Any help is much appreciated, I have been struggling trying to score categorical data over time as well as trying to figure out how to create a data set that would show here is the specific records, over this many months, this specific variable changed this many times, and it show at which month it changed.  Thank you again! (The percentages do not correlate to the original table, just used for visual purposes).

 

 


Accepted Solutions
Solution
‎10-17-2017 09:38 AM
Respected Advisor
Posts: 4,736

Re: Compare categorical data over months at individual level the change, and overall % change for v

Posted in reply to khoffmann

@khoffmann

What about something along the line of below?

data have;
  infile datalines dlm='|' dsd truncover;
  input LICID report_dt:date9. (Last_Name LN_CharCT LIC_Desc LIC_desc_CharCT) (:$30.);
  format report_dt date9.;
  datalines;
111|01May2017|Doe|3|Expired|7
111|01Jun2017|Smith|5|Renewed|7
222|01May2017|Parc|4|VOID|4
222|01Jun2017|Park|4|VOID|4
;
run;

proc sql;
  create table inter as
  select
    l.licid,
    l.report_dt,
    case
      when (not missing(r.licid)) then compged(r.Last_Name, l.Last_Name) 
      else .
      end as dif_Last_Name,
    case
      when (not missing(r.licid)) then compged(r.LN_CharCT, l.LN_CharCT) 
      else .
      end as dif_LN_CharCT,
    case
      when (not missing(r.LIC_Desc)) then compged(r.LIC_Desc, l.LIC_Desc) 
      else .
      end as dif_LIC_Desc,
    case
      when (not missing(r.LIC_desc_CharCT)) then compged(r.LIC_desc_CharCT, l.LIC_desc_CharCT) 
      else .
      end as dif_LIC_desc_CharCT
  from 
    have l left join have r
      on l.licid=r.licid and r.report_dt=intnx('month',l.report_dt,-1,'b')
  order by l.licid, l.report_dt 
  ;
quit;

proc transpose data=inter out=want(rename=(_name_=Var_Name));
  by licid;
  id report_dt;
  format report_dt monyy5.;
  var dif_Last_Name dif_LN_CharCT dif_LIC_Desc dif_LIC_desc_CharCT
  ;
run;
  

View solution in original post


All Replies
Super User
Posts: 13,583

Re: Compare categorical data over months at individual level the change, and overall % change for v

Posted in reply to khoffmann

Whatever you are going to do it will probably work out much easier if you have a single record for each individual and point in time. Since you say this is "on going", which implies to me that you will be adding records at later dates, that point in time variable should be a SAS Date valued variable as it is much easier to calculate durations and intervals from SAS date values than from "May".

 

Here is an example using a SAS supplied data set you should have available. The first step summarized a daily data set to monthly and then the differences for a couple of variables are calculated from the monthly summaries. Then Tabulate is used to display them.

proc summary data=sashelp.snacks nway;
   class product date;
   Format date Monyy7.;
   var QtySold Price;
   output out=work.snacksum (drop= _:) sum(QtySold)=MonthSale mean(Price)=MonthMeanPrice;
run;

data work.difference;
   set work.snacksum;
   by product date;

   QuantityDif = dif(MonthSale);
   PriceDif    = round( dif(MonthMeanPrice), 0.01);
   if first.product then do;
      QuantityDif =0;
      PriceDif=0;
   end;
   
run;

proc tabulate data=work.difference;
   class product date;
   var QuantityDif PriceDif;
   tables product *(QuantityDif PriceDif),
          date * max=''*f=best8.;
   label
      QuantityDif='Units sold difference'
      priceDif   ='Difference mean monthly price'
   ;
run;

If you add data to the starting data set, such as sashelp.snacks then the intermediate steps will summarize using the date value and then tabulate (or proc report) shows the results including the new data. No macro needed. The Product would be replaced by your individual identifier. Notice that the report table handles the display of the monthly differences in  a "nice" manner: sorted by value which does not happen with values like "May" and "August" as text varaibles.

 

 

I do wonder what you may have been thinking about % for Last name though???

Occasional Contributor
Posts: 10

Re: Compare categorical data over months at individual level the change, and overall % change for v

That definitely makes sense! Thank you for the reply! 

 

That is the piece I am struggling with, I want to almost do a letter by letter macro analysis for the characters to say for instance below within a matrix style table:

 

LICID  Month  Last Name   LN_CharCT    LIC_Desc   LIC_desc_CharCT

111      May     Doe                  3                   Expired        7

111     June     Smith               5                   Renewed     7

222     May     Parc                 4                    VOID           4

222     May     Park                 4                    VOID           4

 

Matrix:

                     Month   

LIC ID    VAR                 .,..    MAY        JUNE 

111         Last_Name             0%           160 %

               LIC_Desc               0%           100%

222        Last_Name              0%           25%

              LIC_Desc                0%            0%

 

Even if the percentage is a stretch, or an exaggerated solution, it would be nice to see then instead of the percentage in the matix simply [changed, no_changed].  These will be monthly files that really what I care about is this last name changed from May to June, but the ability to show this in a standardized manner for both qualitative and quantitative pieces.  I have no problem with the change/no_change in the table but I would like it for both pieces of data.  

Super User
Posts: 13,583

Re: Compare categorical data over months at individual level the change, and overall % change for v

Posted in reply to khoffmann

khoffmann wrote:

That definitely makes sense! Thank you for the reply! 

 

That is the piece I am struggling with, I want to almost do a letter by letter macro analysis for the characters to say for instance below within a matrix style table:

 

LICID  Month  Last Name   LN_CharCT    LIC_Desc   LIC_desc_CharCT

111      May     Doe                  3                   Expired        7

111     June     Smith               5                   Renewed     7

222     May     Parc                 4                    VOID           4

222     May     Park                 4                    VOID           4

 

Matrix:

                     Month   

LIC ID    VAR                 .,..    MAY        JUNE 

111         Last_Name             0%           160 %

               LIC_Desc               0%           100%

222        Last_Name              0%           25%

              LIC_Desc                0%            0%

 

Even if the percentage is a stretch, or an exaggerated solution, it would be nice to see then instead of the percentage in the matix simply [changed, no_changed].  These will be monthly files that really what I care about is this last name changed from May to June, but the ability to show this in a standardized manner for both qualitative and quantitative pieces.  I have no problem with the change/no_change in the table but I would like it for both pieces of data.  


If you are looking for differences in spelling there are a number of functions that will return a numeric "distance" between words.

Compged and Complev are two of them.

 

In a data set sorted as you need it you could use

 

LastNameDif = compged(lag(Lastname), Lastname));

 

With a corresponding: If first.personid then LastNameDif=0.

 

If the names are identical then compged will return 0. Small differences "Jones" vs "Jone" will have small values for the result (50 in this case) "Jones" and "Freddy" have a compged value of 610.

So you could see successive changes in a magnitude sense. You can customize the results by using the function Call COMPCOST to set how much importance you see for a deleted/inserted letter or punctuation (set low if you want "ABC Company, Inc" to match "ABC Company Inc" for instance.

Solution
‎10-17-2017 09:38 AM
Respected Advisor
Posts: 4,736

Re: Compare categorical data over months at individual level the change, and overall % change for v

Posted in reply to khoffmann

@khoffmann

What about something along the line of below?

data have;
  infile datalines dlm='|' dsd truncover;
  input LICID report_dt:date9. (Last_Name LN_CharCT LIC_Desc LIC_desc_CharCT) (:$30.);
  format report_dt date9.;
  datalines;
111|01May2017|Doe|3|Expired|7
111|01Jun2017|Smith|5|Renewed|7
222|01May2017|Parc|4|VOID|4
222|01Jun2017|Park|4|VOID|4
;
run;

proc sql;
  create table inter as
  select
    l.licid,
    l.report_dt,
    case
      when (not missing(r.licid)) then compged(r.Last_Name, l.Last_Name) 
      else .
      end as dif_Last_Name,
    case
      when (not missing(r.licid)) then compged(r.LN_CharCT, l.LN_CharCT) 
      else .
      end as dif_LN_CharCT,
    case
      when (not missing(r.LIC_Desc)) then compged(r.LIC_Desc, l.LIC_Desc) 
      else .
      end as dif_LIC_Desc,
    case
      when (not missing(r.LIC_desc_CharCT)) then compged(r.LIC_desc_CharCT, l.LIC_desc_CharCT) 
      else .
      end as dif_LIC_desc_CharCT
  from 
    have l left join have r
      on l.licid=r.licid and r.report_dt=intnx('month',l.report_dt,-1,'b')
  order by l.licid, l.report_dt 
  ;
quit;

proc transpose data=inter out=want(rename=(_name_=Var_Name));
  by licid;
  id report_dt;
  format report_dt monyy5.;
  var dif_Last_Name dif_LN_CharCT dif_LIC_Desc dif_LIC_desc_CharCT
  ;
run;
  
Occasional Contributor
Posts: 10

Re: Compare categorical data over months at individual level the change, and overall % change for v

Thank you Patrick!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 344 views
  • 3 likes
  • 3 in conversation