Solved: Re: Missing observations

shami · Posted 04-08-2024 12:34 PM

I've got two datasets that are similar. I've run frequency counts for the same variable, however I noticed that one dataset has two less observations. What's the best way/approach to identify what those two observations that are missing from the other data set are?

Patrick · Posted 04-08-2024 07:19 PM

The in= option lets you capture which source table contributes data to the target table.

data raw3;
  merge raw1(in=in1) raw2(in=in2);
  by patient_local_id;
  if in1=0 or in2=0 then
    do;
      in_raw1=in1;
      in_raw2=in2;
      output;
    end;
run;

View solution in original post

PaigeMiller · Posted 04-08-2024 12:52 PM

If there is a common ID variable (such as customer ID or patient ID or similar), you can merge the two data sets and see which observations don't line up.

Example:

proc sql;
    create table combined as select
        coalesce(a.id,b.id) as id
        ,a.variable as variable_in_a
        ,b.variable as variable_in_b
    from dataset_a as a full join dataset_b as b
    on a.id=b.id;
quit;

Then it is easy to search this data set (or use a WHERE clause in the above SQL) for observations that are missing values in either dataset_a or dataset_b.

You would be wise to provide sample data along with your explanation; we really can't write code for your situation without sample data. Sample data should be provided as working SAS data step code (examples and instructions).

--
Paige Miller

shami · Posted 04-08-2024 12:57 PM

I merged the dataset with the following code by the patient ID:

data raw3;
merge raw1 raw2;
by patient_local_id;

run;

I'm not sure how I'd code after this to get the missing observations...

PaigeMiller · Posted 04-08-2024 01:03 PM

You can just look at data set RAW3 in a viewer (such as viewtable) and scroll up and down to find missing observations. Or you can do a PROC PRINT with a WHERE statement to print only observations that are missing some values.

Also

@shami wrote:

data raw3;
merge raw1 raw2;
by patient_local_id;

run;

Unless the variable names are different in RAW1 and RAW2, or you force them to be different, this won't work.

--
Paige Miller

Patrick · Posted 04-08-2024 07:19 PM

The in= option lets you capture which source table contributes data to the target table.

data raw3;
  merge raw1(in=in1) raw2(in=in2);
  by patient_local_id;
  if in1=0 or in2=0 then
    do;
      in_raw1=in1;
      in_raw2=in2;
      output;
    end;
run;

shami · Posted 04-09-2024 06:23 PM

Thank you this is helpful