BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ucdcrush
Obsidian | Level 7

Hello all. I am working with data similar to the following

(this is the individual row level data with many other columns for a given lab test received):

labname     received_date    

Quest            2/1/17              

Quest            2/4/17                 

Quest            2/4/17

Labcorp         2/1/17

Labcorp        2/2/17

 

and I would like to produce a final summary table that looks like this, which includes rows for all days in a range, not only

the days that had data associated with them for that lab, i.e.,

 

labname        date     count 

Quest           2/1/17       1

Quest           2/2/17       0

Quest           2/3/17       0

Quest           2/4/17       2

Labcorp       2/1/17        1  

Labcorp       2/2/17        1  

Labcorp       2/3/17        0

Labcorp       2/4/17        0

...

 

I am able to get the counts in a table from proc freq where there were records for a given lab on a particular day. But I would like to produce the final table/report to include all days within a range, so that we can look at the report and see when there were no records sent on particular days.

 

I am not sure how to approach this. Something in my brain says I could create a separate table of all dates of interest, then somehow join or merge to have them included in the data, but I don't know how to do that nor am I sure that's the path to go down.

 

Any ideas are appreciated, thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assuming:

 

  1. data are grouped (not necessarily sorted) by labname
  2. data are sorted by received_date within labname
  3. every labname begins with are received_date=2/1/17
  4. the maximum date is known to be 2/4/17

then:

 

data want;
  do count=1 by 1 until (last.received_date);
    set have ;
    by labname received_date notsorted;
    if eod=0 then set have (firstobs=2 keep=received_date rename=(received_date=nxtdate)) end=eod;
  end;
  output;

  lastdate=ifn(last.labname,'04feb2017'd,nxtdate-1);
  count=0;
  if lastdate>received_date then do received_date=received_date+1 to lastdate;
    output;
  end;
run;

 

 

If not all lab's begin on 2/1/2017, then you could modify to:

 

%let begdate=01feb2017;
%let enddate=04feb2017;

data want;
  do count=1 by 1 until (last.received_date);
    set have ;
    by labname received_date notsorted;
    if eod=0 then set have (firstobs=2 keep=received_date rename=(received_date=nxtdate)) end=eod;
  end;

  /* Fill any gaps at the beginning of a lab */
  if lag(labname)^=labname and received_date> "&begdate"d then do;
    store_count=count; store_date=received_date;
    count=0;
    do received_date="&begdate"d to store_date-1;
      output;
    end;
    count=store_count; received_date=store_date;
  end;

  /* Output current count */
  output;

  /* Take care of any gaps following this date */
  lastdate=ifn(last.labname,"&enddate"d,nxtdate-1);
  count=0;
  if lastdate>received_date then do received_date=received_date+1 to lastdate;
    output;
  end;
run;

 

In this program only the expected date range is known in advance (see the %let begdate=  and %let enddate= statements).  You could also run a PROC SQL with "select into :" to gets those dates, as in:

 

proc sql noprint;

  select min(received_date) format=date9., max(received_date) format=date9.

  into :begdate , :enddate

  from have;

quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
jondowns
Calcite | Level 5

Hi UCDCRush,

 

So many firsts today! Had my first question and my first (hopefully helpful) answer.

 

I've quickly written some code you may find useful. In general, proc sql works great for these kinds of issues. The only downside is I don't really know how to generate 0's for days where you have no data for a particular lab.

 

*Input data;
data test;
	input lab: $ 
		  date: mmddyy10.;
		format date mmddyy10.;
	datalines;
Quest	2/1/17
Quest	2/4/17
Quest	2/4/17
Labcorp	2/1/17
Labcorp	2/2/17
;

run;

*Print to make sure results are as expected;
proc print data=test;
	attrib _all_ format=;
run;

*Use SQL to create a table. You want the count of total entries for each day grouped by both the date and the lab;
proc sql;
	create table sums as
	select lab as lab,  date as date, count(lab) as count
	from test
	group by date, lab;
quit;

*Print new table, check for accuracy;
proc print data=sums;
run;

You can use "where" statements after you generate your count data to limit to the dates you want. So, if you wanted to limit to dates between 2/1/17 and 2/4/17, you could use this code:

 

*Print new table, check for accuracy;
proc print data=sums;
	where 20851<=date<=20854;
run;

You just have to be familiar with how SAS codes dates. using "attrib _all_ format=.;" is a statement you can add to almost any proc to see the unformatted values of variables.

 

 

Hope this helps!

jondowns
Calcite | Level 5

Sorry for the double post. Here is the table you get when you print the "sums" dataset:

 

Obs

lab

date

count

1

Labcorp

02/01/2017

1

2

Quest

02/01/2017

1

3

Labcorp

02/02/2017

1

4

Quest

02/04/2017

2

mkeintz
PROC Star

Assuming:

 

  1. data are grouped (not necessarily sorted) by labname
  2. data are sorted by received_date within labname
  3. every labname begins with are received_date=2/1/17
  4. the maximum date is known to be 2/4/17

then:

 

data want;
  do count=1 by 1 until (last.received_date);
    set have ;
    by labname received_date notsorted;
    if eod=0 then set have (firstobs=2 keep=received_date rename=(received_date=nxtdate)) end=eod;
  end;
  output;

  lastdate=ifn(last.labname,'04feb2017'd,nxtdate-1);
  count=0;
  if lastdate>received_date then do received_date=received_date+1 to lastdate;
    output;
  end;
run;

 

 

If not all lab's begin on 2/1/2017, then you could modify to:

 

%let begdate=01feb2017;
%let enddate=04feb2017;

data want;
  do count=1 by 1 until (last.received_date);
    set have ;
    by labname received_date notsorted;
    if eod=0 then set have (firstobs=2 keep=received_date rename=(received_date=nxtdate)) end=eod;
  end;

  /* Fill any gaps at the beginning of a lab */
  if lag(labname)^=labname and received_date> "&begdate"d then do;
    store_count=count; store_date=received_date;
    count=0;
    do received_date="&begdate"d to store_date-1;
      output;
    end;
    count=store_count; received_date=store_date;
  end;

  /* Output current count */
  output;

  /* Take care of any gaps following this date */
  lastdate=ifn(last.labname,"&enddate"d,nxtdate-1);
  count=0;
  if lastdate>received_date then do received_date=received_date+1 to lastdate;
    output;
  end;
run;

 

In this program only the expected date range is known in advance (see the %let begdate=  and %let enddate= statements).  You could also run a PROC SQL with "select into :" to gets those dates, as in:

 

proc sql noprint;

  select min(received_date) format=date9., max(received_date) format=date9.

  into :begdate , :enddate

  from have;

quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ucdcrush
Obsidian | Level 7
mkeintz - thank you very much, and for providing the alternate methods to specify the date range. It works perfectly!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 885 views
  • 0 likes
  • 3 in conversation