DATA Step, Macro, Functions and more

how do I input 4 different variables in the same datalines and only keep those lines that matches?

Reply
Occasional Contributor
Posts: 6

how do I input 4 different variables in the same datalines and only keep those lines that matches?

Hello everyone

How do I input the below dataline into sas with variable : gender, birthday, name, status(last letter of each line) for fisrt dataset

datalines;

M10/21/46CADYA
F11/11/50CLINEB
M11/11/52SMITHA
F10/10/80OPPENHEIMERB
M04/04/60JOSEA

 

second dataset with variables gender birthday name height

datalines:

M10/21/46CODY155
F11/11/50CLEIN166
F11/11/52SMITH185
F10/10/80OPPENHAIMER199
M02/07/60JOSA173

 

And I also want to keep those rows that are same(matches) in two datalines with all variables(including status and height)

Frequent Contributor
Posts: 75

Re: how do I input 4 different variables in the same datalines and only keep those lines that matche

Hi, see code below. None of your data matches up the way you listed it because either names are different or gender is different, I don't know if those are typos in the names or if that's how your data is. But I changed the first name in the second dataset so you can see what the dataset 'want' looks like if all three values match.

data have1;
input Gender $ 1 DOBin $ 2-9 NameStatus $ 10-25;
datalines;
M10/21/46CADYA
F11/11/50CLINEB
M11/11/52SMITHA
F10/10/80OPPENHEIMERB
M04/04/60JOSEA
;
run;

data have1;
set have1;
format DOB mmddyy8.;
DOB=input(DOBin,mmddyy8.);
Name=substr(NameStatus,1,length(NameStatus)-1);
Status=substr(NameStatus,length(NameStatus));
drop DOBin NameStatus;
run;

proc sort data=have1;
by gender DOB Name;
run;

data have2;
input Gender $ 1 DOBin $ 2-9 NameHeight $ 10-25;
datalines;
M10/21/46CADY155
F11/11/50CLEIN166
F11/11/52SMITH185
F10/10/80OPPENHAIMER199
M02/07/60JOSA173
;
run;

data have2;
set have2;
format DOB mmddyy8.;
DOB=input(DOBin,mmddyy8.);
Name=compress(NameHeight,'','d');
Height=input(compress(NameHeight,'','a'),3.);
drop DOBin NameHeight;
run;

proc sort data=have2;
by gender DOB Name;
run;

data want;
merge have1(in=a) have2(in=b);
by gender DOB Name;
if a and b;
run;
Contributor
Posts: 43

Re: how do I input 4 different variables in the same datalines and only keep those lines that matche

[ Edited ]

i prepared 2 different datasets have and have2. i tried they are correct. if you want you can change data formats.


/*                 part one             */

data have;
INPUT comment $ 30.;
DATALINES;
M10/21/46CADYA
F11/11/50CLINEB
M11/11/52SMITHA
F10/10/80OPPENHEIMERB
M04/04/60JOSEA
;
RUN;

PROC SQL;
CREATE TABLE WORK.QUERY_FOR_HAVE AS
SELECT /* gender */
(SUBSTR(t1.comment, 1, 1)) AS gender,
/* birthday */
(SUBSTR(t1.comment,2,8)) AS birthday,
/* status */
(substr(t1.comment,(LENGTH(t1.comment)),1)) AS status,
/* name */
(substr(t1.comment,10, LENGTH(t1.comment)-10)) AS name
FROM WORK.HAVE t1;
QUIT;

 

/*                        part two                         */


data have2;
INPUT comment2 $ 30.;
DATALINES;

M10/21/46CODY155
F11/11/50CLEIN166
F11/11/52SMITH185
;
run;

 

PROC SQL;
CREATE TABLE WORK.QUERY_FOR_HAVE_2 AS
SELECT /* gender */
(SUBSTR(t1.comment2, 1, 1)) AS gender,
/* birthday */
(SUBSTR(t1.comment2,2,8)) AS birthday,
/* status */
(substr(t1.comment2,(LENGTH(t1.comment2))-2,3)) AS status,
/* name */
(substr(t1.comment2,10, LENGTH(t1.comment2)-12)) AS name
FROM WORK.HAVE2 t1;
QUIT;

 

Respected Advisor
Posts: 3,124

Re: how do I input 4 different variables in the same datalines and only keep those lines that matche

[ Edited ]

This kinda of question always brought me back to the good old days when programming does matter Smiley Happy, You will need to read SAS docs on input, infile, informats etc.

 

 

filename FT15F001 temp lrecl=512; 

data test;
infile FT15F001  length=len ;
input @;
len=len-10;
input gender $1.
       birthday mmddyy8. 
        name $varying100. len status $1.;

format birthday mmddyy8.;
parmcards4;;;;
M10/21/46CADYA
F11/11/50CLINEB
M11/11/52SMITHA
F10/10/80OPPENHEIMERB
M04/04/60JOSEA
;;;;
run;

filename FT15F001 clear;


filename FT15F001 temp lrecl=512; 

data test2;
infile FT15F001  length=len ;
input @;
len=len-12;
input gender $1.
       birthday mmddyy8. 
        name $varying100. len height;

format birthday mmddyy8.;
parmcards4;;;;
M10/21/46CODY155
F11/11/50CLEIN166
F11/11/52SMITH185
F10/10/80OPPENHAIMER199
M02/07/60JOSA173
;;;;
run;

filename FT15F001 clear;
Respected Advisor
Posts: 4,646

Re: how do I input 4 different variables in the same datalines and only keep those lines that matche

I added one dataline to show a matching case:

 

data a;
length gender $1 name $20 birthday 8 status $1 line $100;
format birthday yymmdd10.;
if not prxId then prxId + prxParse("@([MF])(\d{1,2}/\d{1,2}/\d{1,2})(\D+)(\w)@");
input line;
if prxMatch(prxId, line) then do;
    gender = prxPosn(prxId, 1, line);
    birthday = input(prxPosn(prxId, 2, line), mmddyy8.);
    name = prxPosn(prxId, 3, line);
    status = prxPosn(prxId, 4, line);
    output;
    end;
drop line prxId;
datalines;
M10/21/46CADYA
F11/11/50CLINEB
M11/11/52SMITHA
F10/10/80OPPENHEIMERB
M04/04/60JOSEA
F11/11/52SMITHA
;

data b;
length gender $1 name $20 birthday 8 weight 8 line $100;
format birthday yymmdd10.;
if not prxId then prxId + prxParse("@([MF])(\d{1,2}/\d{1,2}/\d{1,2})(\D+)(\d+)@");
input line;
if prxMatch(prxId, line) then do;
    gender = prxPosn(prxId, 1, line);
    birthday = input(prxPosn(prxId, 2, line), mmddyy8.);
    name = prxPosn(prxId, 3, line);
    weight = input(prxPosn(prxId, 4, line), best.);
    output;
    end;
drop line prxId;
datalines;
M10/21/46CODY155
F11/11/50CLEIN166
F11/11/52SMITH185
F10/10/80OPPENHAIMER199
M02/07/60JOSA173
;

proc sql;
create table ab as
select a.*, b.weight
from a natural join b;
select * from ab;
quit;

PG
Ask a Question
Discussion stats
  • 4 replies
  • 140 views
  • 1 like
  • 5 in conversation