BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I am wanting to convert a series of ID values that are formatted as text to numeric using

Data SASCHART.Date_Fix_Data (drop = Contact_Person_ID);
  Retain Day_mon Case_ID County Day_mon1 Contact_Person_ID County_T;
  Set SASCHART.Date_Fix_Data;
  Contact_ID = input(Contact_Person_ID, 8.);
run;

The vast majority of the entries in Contact_Person_ID field are all numbers like

54632784

53627928

64538272   and so on.

 

But there are some entries that begin with an alpha like

 

A324577

A564732

 

How do I remove the alpha character in the first place so I am left with just the numbers and can perform the  input conversion?

 

Thank you for your help.

 

wlierman

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
if anyalpha(Contact_Person_ID)=1 then Contact_Person_ID=substr(Contact_Person_ID,2);
--
Paige Miller

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26
if anyalpha(Contact_Person_ID)=1 then Contact_Person_ID=substr(Contact_Person_ID,2);
--
Paige Miller
wlierman
Lapis Lazuli | Level 10

Thank you for the function anyalpha

ballardw
Super User

Why do you want an Id variable to be numeric? Are you going to do arithmetic with it? If you aren't doing arithmetic then it is probably better to leave the variable as character.

 

Here is one example of removing alphabetic characters from a variable.

data example;
  input Contact_Person_ID $ ;
  contact_person_id = compress(Contact_Person_ID,,'a');
datalines;
A324577
A564732
;
wlierman
Lapis Lazuli | Level 10

I need the numeric form so that I can use the first. last. method to count all individual cases and contacts by day_month over the time frame that Public Health has data from 2020 to current.  I then hope to create a line graph of cases and contacts using SAS Graph.

 

Thank you for the method to remove alpha character.

 

wlierman

Tom
Super User Tom
Super User

Using BY group processing does not require numeric variables.

proc sort data=sashelp.class out=class;
  by sex age ;
run;

data want;
  set class;
  by sex;
  if first.sex then youngest=name;
  retain youngest;
  if last.sex;
  oldest = name;
  keep sex youngest oldest;
run;

proc print;
run;

image.png

wlierman
Lapis Lazuli | Level 10

Thank you.

 

That looks to be a simpler solution to the counting up of cases and contacts.

 

wlierman

Tom
Super User Tom
Super User

@wlierman wrote:

I need the numeric form so that I can use the first. last. method to count all individual cases and contacts by day_month over the time frame that Public Health has data from 2020 to current.  I then hope to create a line graph of cases and contacts using SAS Graph.

 


You can use the DISTINCT keyword in SQL code.  So if you have one record per contact that includes both a DATE and a CASE id then you should be able to use COUNT(*) for number of  contacts and COUNT(DISTINCT CASE) as number of cases.   Not sure what a day month is, perhaps you want "01-01" for January first?

proc sql;
create want as
  select substr(put(date,yymmdd10.),6) as MMDD 
          , count(distinct cases) as n_cases
          , count(*) as n_contacts
  from have
  group by MMDD
;
quit;
wlierman
Lapis Lazuli | Level 10
Thank you for the proc sql code insight.
My data set looks like this

Day_mon     CaseID     County    Day_mon1    Contact_Person_ID    County_T
01MAR20   5463278      Baker     11SEP20            4536272                Yamhill05MAR20   5367382     Jackson   11SEP20           5436275                Washington
and so on.   I need to keep both Day_mon since the data on contacts doesn't begin until Sept 2020whereas cases start in March 2020 and in the data set I am using the last contact is roughly October 12, 2021 while cases run through Oct 22, 2021.  Once the graph has been created and everything looks good I am going to refresh both datasets which will then have November data.
With the data I have then I wrote the code as follows:

Proc sql;
Create Table SASCHART.Setup_Case as
  Select substr(put(day_mon,yymmdd10.),6) as MMDD
          , count(distinct CaseID) as n_cases
          /*, count(*) as n_contacts*/
  from SASCHART.Date_Fix_Data
  group by MMDD
;
quit;
but received the following error

ERROR: Numeric format YYMMDD in PUT function requires a numeric argument.


The wanted data set would look like
Day_mon      n_cases         Day_mon1        N_contacts1MAR20              1               11SEP20                355MAR20              9               12SEP20               67     

and so on through the end of the time series for each
Your guidance is very appreciated.
wlierman

Tom
Super User Tom
Super User

Your DAY_MON1 variable looks like a date, but apparently it is NOT a date.  You can convert it to a date using the INPUT() function.  Why is it called "DAY_MON" if it includes day, month and year?  Note that if you want to use character strings for dates it is best to use YYMMDD. format instead of others because then lexicographic order will also be chronological order. 

 

So use something like:

create Table SASCHART.Setup_Case as
  select input(day_mon1,date9.) as date format=yymmdd10.
          , count(distinct CaseID) as n_cases
          /*, count(*) as n_contacts*/
  from SASCHART.Date_Fix_Data
  group by 1
;

To count at the individual date level.

 

If you want to aggregate to MONTH of other time division use INTNX() function to all of the dates in the same interval to the same value.

create Table SASCHART.Setup_Case as
  select intnx('month',input(day_mon1,date9.),0) 
             as month format=yymmd7.
       , count(distinct CaseID) as n_cases_per_month
     /*, count(*) as n_contacts*/
  from SASCHART.Date_Fix_Data
  group by 1
;

Or if you are looking to see months from some base date perhaps you want to use intck() function instead?

create Table SASCHART.Setup_Case as
  select intck('month','01OCT2020'd,input(day_mon1,date9.)) 
             as month_num
       , count(distinct CaseID) as n_cases_per_month
     /*, count(*) as n_contacts*/
  from SASCHART.Date_Fix_Data
  where input(day_mon1,date9.) >= '01OCT2020'd
  group by 1
;
wlierman
Lapis Lazuli | Level 10
Tom

Thank you very much.  This is the solution that I am needing.I tried indicating as a solution but it is part of a thread that already has a solution. 
How can I mark your latest answer as a solution so you can receive the credit for this work?


Since you have helped me tremendously, I have one more step and that is to create a graph.The cases time line has 601 rows that sounds about right since cases were measured from patient 1 on March 1, 2020.Contacts have 396 rows which reflects the effort it took to build up the contact effort from scratch.

The data sets look like this
Cases                                                      Contacts        
    date                n_cases                           date                n_contacts

2020-03-01                1                         2020-09-11               253                        
2020-03-05                3                         2020-09-12               350
2020-03-06                5                         2020-09-13               185
      .                                                              .
      .                                                              .
      .                                                              .
2021-06-07              300                       2021-06-07                 41                
      .                                                              .
      .                                                              .
      .                                                              .
2021-10-26            1360                       2021-10-12                  71

Would SAS graph be able to produce a good graph?  Having 601 dates for cases and 396 dates for contactswould make the x-axis illegible.  Maybe choosing an interval to show just a few dates. How would you suggest presenting the visualization?  Then the contacts time line needs to overlay the cases time line to keep the chronological order.

Again thank you for all your knowledge and experience.

wlierman              

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 976 views
  • 5 likes
  • 4 in conversation