How to read this data? Date in multiple formats?
Person ID Exp_Date Origin Case ID Name Type of Exposure
1 12/20/20 100 Joe Smith Household
1 9/30/20 135 Joe Smith Workplace
2 1/5/21 168 Maria Rojas Education
3 11/4/20 96 Mary Hernandez Workplace
3 11/4/20 121 Mary Hernandez Workplace
3 1/25/21 365 Mary Hernandez Household
Your DATE values don't look too bad. They all appear to be in MDY order. So use the MMDDYY. informat.
Your NAME field is going to be a problem however. Your examples do not have any way to tell where the NAME ends and where the TypeOfExposure field starts. If the last field can only ever be one word then you might be able to find a way to read the data even if some names had three (or four or one) words.
data want ;
infile cards truncover ;
length PersonID Exp_Date OriginCaseID 8 Name $80 TypeofExposure $20;
input PersonID Exp_Date OriginCaseID Name $80. ;
TypeofExposure = scan(name,-1,' ');
name = substrn(name,1,length(name)-length(typeofexposure));
informat exp_date mmddyy.;
format exp_date yymmdd10. ;
cards;
1 12/20/20 100 Joe Smith Household
1 9/30/20 135 Joe Smith Workplace
2 1/5/21 168 Maria Rojas Education
3 11/4/20 96 Mary Hernandez Workplace
3 11/4/20 121 Mary Hernandez Workplace
3 1/25/21 365 Mary Hernandez Household
;
proc print;
run;
results
Person Origin Typeof Obs ID Exp_Date CaseID Name Exposure 1 1 2020-12-20 100 Joe Smith Household 2 1 2020-09-30 135 Joe Smith Workplace 3 2 2021-01-05 168 Maria Rojas Education 4 3 2020-11-04 96 Mary Hernandez Workplace 5 3 2020-11-04 121 Mary Hernandez Workplace 6 3 2021-01-25 365 Mary Hernandez Household
Not seeing multiple formats, can you please elaborate?
Does an informat of MMDDYY10 not work?
@rimal_mahesh wrote:
How to read this data? Date in multiple formats?
Person ID Exp_Date Origin Case ID Name Type of Exposure
1 12/20/20 100 Joe Smith Household
1 9/30/20 135 Joe Smith Workplace
2 1/5/21 168 Maria Rojas Education
3 11/4/20 96 Mary Hernandez Workplace
3 11/4/20 121 Mary Hernandez Workplace
3 1/25/21 365 Mary Hernandez Household
This
11/4/20
might be
2020-04--11 2020-11-04 2011-04-20 1920-04-11 1920-11-04 1911-04-20
to state just a few of the possible dates.
Even if you know the source data is in MDY order, it is still ambiguous with regards to the century.
Using 2-digit years is extremely stupid.
So I would return this garbage to sender and request data instead.
Your DATE values don't look too bad. They all appear to be in MDY order. So use the MMDDYY. informat.
Your NAME field is going to be a problem however. Your examples do not have any way to tell where the NAME ends and where the TypeOfExposure field starts. If the last field can only ever be one word then you might be able to find a way to read the data even if some names had three (or four or one) words.
data want ;
infile cards truncover ;
length PersonID Exp_Date OriginCaseID 8 Name $80 TypeofExposure $20;
input PersonID Exp_Date OriginCaseID Name $80. ;
TypeofExposure = scan(name,-1,' ');
name = substrn(name,1,length(name)-length(typeofexposure));
informat exp_date mmddyy.;
format exp_date yymmdd10. ;
cards;
1 12/20/20 100 Joe Smith Household
1 9/30/20 135 Joe Smith Workplace
2 1/5/21 168 Maria Rojas Education
3 11/4/20 96 Mary Hernandez Workplace
3 11/4/20 121 Mary Hernandez Workplace
3 1/25/21 365 Mary Hernandez Household
;
proc print;
run;
results
Person Origin Typeof Obs ID Exp_Date CaseID Name Exposure 1 1 2020-12-20 100 Joe Smith Household 2 1 2020-09-30 135 Joe Smith Workplace 3 2 2021-01-05 168 Maria Rojas Education 4 3 2020-11-04 96 Mary Hernandez Workplace 5 3 2020-11-04 121 Mary Hernandez Workplace 6 3 2021-01-25 365 Mary Hernandez Household
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.