BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Colleen9400
Fluorite | Level 6

Colleen9400_0-1696039356380.png

 

I've been struggling with this homework question for hours and I can't figure it out because the raw data has both * and - as delimiters as well as missing data. We are supposed to combine the raw datasets by the "set" function and maybe the "by" function using the concat or interleaving method. 

I am brand new to this so I can't use a complicated method which I haven't been taught yet. I tried the missover syntax to help with the missing data and it still doesn't work. Also tried DSD and DLM with no success either. this is currently the best I've gotten and as you can see with the output it is completely wrong.

libname emplib "C:\Users\Colle\Documents\Grad School\A BINF 5210 Health Data Analytics with SAS\Employee Library";

filename F1 "C:\Users\Colle\Documents\Grad School\A BINF 5210 Health Data Analytics with SAS\Employee Library\demo8-1-1.txt";
filename F2 "C:\Users\Colle\Documents\Grad School\A BINF 5210 Health Data Analytics with SAS\Employee Library\demo9-1-1.txt";

data emplib.demo1;
    infile F1 dlm='*-' missover;
    input id gender $ age salary;
run;

proc sort data=emplib.demo1;
    by id;
run;

proc print data=emplib.demo1;
run;

data emplib.demo2;
    infile F2 dlm='*-' missover;
    input id gender $ age salary;
run;

proc sort data=emplib.demo2;
    by id;
run;

proc print data=emplib.demo2;
run;

data emplib.final;
    set emplib.demo1 emplib.demo2;
    by id;
run;

proc print data=emplib.final;
run;



Colleen9400_0-1696041827078.png

 

 

I am also attaching my lecture notes so you can see what I am supposed to be working with or maybe where I am going wrong. 

Attached are both sets of my raw data too.

 

Thank you so much!!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The files use THREE delimiters. Can you see what they are?

 

You are using the troublesome old MISSOVER option instead of the modern (30 years old?) TRUNCOVER option.

 

But the main thing is you didn't include the DSD option which will allow adjacent delimiters to be seen as indicating a missing value.  Without that SAS will treat adjacent delimiters as one.  Like it would in a normal file delimited with spaces where users will put in extra spaces to make the columns look neater.

Name  Age Sex
Jane   12  F
Harry   9  M

 

Look carefully at the requested output.  How are the observations ordered?

 

Spoiler
%let path=c:\downloads ;
filename src ("&path/demo9-1-1.txt","&path/demo8-1-1.txt");

data want;
  infile src dsd dlm=' -*' truncover;
  input Emp_Id $ Gender $ Age Salary;
run;

proc sort;
  by salary;
run;
Obs    Emp_Id    Gender    Age    Salary

  1      19                 90         .
  2      10                 89         .
  3      13        F        78      1000
  4      1         F        45      1000
  5      6         F        67      2000
  6      11        M        67      3000
  7      3                  78      4000
  8      15        F        45      7000
  9      2         F       100     10000
 10      7         M        90     10000
 11      4         F        75     20000
 12      14                 87     40000
 13      20        M        90     50000
 14      21        F        91     70000

Just because the employee ID values LOOK like numbers does not mean you should read them as such.  What is the meaning of the MEAN() or MAX() of an ID number?

 

View solution in original post

1 REPLY 1
Tom
Super User Tom
Super User

The files use THREE delimiters. Can you see what they are?

 

You are using the troublesome old MISSOVER option instead of the modern (30 years old?) TRUNCOVER option.

 

But the main thing is you didn't include the DSD option which will allow adjacent delimiters to be seen as indicating a missing value.  Without that SAS will treat adjacent delimiters as one.  Like it would in a normal file delimited with spaces where users will put in extra spaces to make the columns look neater.

Name  Age Sex
Jane   12  F
Harry   9  M

 

Look carefully at the requested output.  How are the observations ordered?

 

Spoiler
%let path=c:\downloads ;
filename src ("&path/demo9-1-1.txt","&path/demo8-1-1.txt");

data want;
  infile src dsd dlm=' -*' truncover;
  input Emp_Id $ Gender $ Age Salary;
run;

proc sort;
  by salary;
run;
Obs    Emp_Id    Gender    Age    Salary

  1      19                 90         .
  2      10                 89         .
  3      13        F        78      1000
  4      1         F        45      1000
  5      6         F        67      2000
  6      11        M        67      3000
  7      3                  78      4000
  8      15        F        45      7000
  9      2         F       100     10000
 10      7         M        90     10000
 11      4         F        75     20000
 12      14                 87     40000
 13      20        M        90     50000
 14      21        F        91     70000

Just because the employee ID values LOOK like numbers does not mean you should read them as such.  What is the meaning of the MEAN() or MAX() of an ID number?

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 809 views
  • 1 like
  • 2 in conversation