BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I am continuing from an earlier question.

 

My data input set looks like the following:

 

ID              County        Race_Ethnicity    NAICS_Sector    Sector_Type       Date

1056200    Yamhill                                      Goods           Construction       10Jun2020

1054324    Baker                                        Goods     Agriculture Forestry  10Jun2020

2314536    Grant                                         Goods         Construction         10Jun2020                     .

                    .

2145735     Curry                                         Goods          Construction       22Jun2020

                    .

                    .

176834       Columbia                                   Goods        Manufacturing     13Nov2020

170034       Columbia                                    Goods       Manufacturing     13Nov2020

198784       Columbia                                    Goods        Manufacturing    13Nov2020

 

and so on.  The main characteristic is that Contact ID  can appear any number of times for any particular date - it is dependent on the success of the contact tracing.

 

I want to keep the fields shown but for the next sql code I need to have the total IDs

for each date totaled.  Again the number of contacts for any date depends on the contact tracing effort and success of those efforts.

 

I used the following code.

Proc sql noprint;
CREATE Table SAS_EMPL.POE_Goods_Trend AS
   SELECT Contact_Person_ID,
          County_1,
		  Race_Ethnicity,
		  NAICS_Sector,
		  Sector_Type,
		  datepart(Created_On) AS Date label = "Date" format=date9.,
		  count(distinct Contact_Person_ID) AS N
   FROM SAS_EMPL.POE_GOODS_FINI
   Where '01JAN20'd <= Calculated Date <= '31MAR22'd
   And NAICS_Sector = "GOODS"
   Group by date;
   *Order by 1;
   quit;

However it didn't give what I need

 

This what I got:

 

 

    Date                 N

04JUN2020          1

05JUN2020          1

09JUN2020          1

09JUN2020          1

09JUN2020          1

11JUN2020          1

11JUN2020          1

11JUN2020          1

11JUN2020          1

11JUN2020          1

11JUN2020          1

 

This what I need/want:

 

    Date                  N

04JUN2020           1

05JUN2020           1

09JUN2020           3

11JUN2020           6

 

So I need the output to look like the last little snippet for all the dates that are in the

dataset (the cumulative total for each date).

 

Thank you.

 

For information the log accompanying the above code is

 Proc sql noprint;
328  CREATE Table SAS_EMPL.POE_Goods_Trend AS
329     SELECT Contact_Person_ID,
330            County_1,
331            Race_Ethnicity,
332            NAICS_Sector,
333            Sector_Type,
334            datepart(Created_On) AS Date label = "Date" format=date9.,
335            count(distinct Contact_Person_ID) AS N
336     FROM SAS_EMPL.POE_GOODS_FINI
337     Where '01JAN20'd <= Calculated Date <= '31MAR22'd
338     And NAICS_Sector = "GOODS"
339     Group by date;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Compressing data set SAS_EMPL.POE_GOODS_TREND decreased size by 60.00 percent.
      Compressed is 4 pages; un-compressed would require 10 pages.
NOTE: Table SAS_EMPL.POE_GOODS_TREND created, with 2467 rows and 7 columns.

340     *Order by 1;
341     quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.04 seconds
      cpu time            0.04 seconds

Thank you for your help in answering this specific request.

 

wlierman

1 ACCEPTED SOLUTION

Accepted Solutions
10 REPLIES 10
andreas_lds
Jade | Level 19

Please post data in usable form (data step with datalines, or proc sql with insert), so that we can run the code you have provided.

The expected output doesn't match the data posted and i don't see "I want to keep the fields shown" in the expected output, too.

 

wlierman
Lapis Lazuli | Level 10

Okay here is the input data set (in xlsx format)

Created_On	Contact_Person_ID	NAICS_Sector
7/15/2020 9:38	1000850	GOODS
10/16/2020 10:14	1011546	GOODS
9/22/2020 8:41	1012367	GOODS
6/13/2020 9:30	1012517	GOODS
11/2/2020 8:52	1014369	GOODS
8/11/2020 9:55	101964	GOODS
10/13/2020 11:33	1025507	GOODS
12/5/2020 9:35	1026268	GOODS
10/31/2020 11:28	1027330	GOODS
9/1/2020 10:12	1028370	GOODS
10/15/2020 10:57	1050427	GOODS
12/2/2020 8:15	1050453	GOODS

The code that I used is as follows

 

Proc sql noprint;
CREATE Table SAS_EMPL.POE_Goods_Trend_2 AS
   SELECT Contact_Person_ID,
		  NAICS_Sector,
		  datepart(Created_On) AS Date label = "Date" format=date9.,
		  count(distinct Contact_Person_ID) AS N
   FROM SAS_EMPL.POE_GOODS_FINI_2
   Where '01JAN20'd <= datepart(Created_On) <= '30APR22'd
   And NAICS_Sector = "GOODS"
   Group by date
   Order by date;
   quit;

 The output dataset (partial and again in xlsx format) is

 

Contact_Person_ID	NAICS_Sector	Date	N
1056200	GOODS	10-Jun-20	1
1069490	GOODS	13-Jun-20	2
1012517	GOODS	13-Jun-20	2
1074374	GOODS	14-Jun-20	6
1072985	GOODS	14-Jun-20	6
1074445	GOODS	14-Jun-20	6
1074884	GOODS	14-Jun-20	6
951206	GOODS	14-Jun-20	6
1074312	GOODS	14-Jun-20	6
1077220	GOODS	15-Jun-20	1
1079736	GOODS	16-Jun-20	1
1083450	GOODS	17-Jun-20	1
1089620	GOODS	18-Jun-20	5
1090413	GOODS	18-Jun-20	5
9991088408	GOODS	18-Jun-20	5
1090459	GOODS	18-Jun-20	5
1087981	GOODS	18-Jun-20	5
1092420	GOODS	19-Jun-20	2
9991088337	GOODS	19-Jun-20	2
1098412	GOODS	20-Jun-20	7
1098376	GOODS	20-Jun-20	7
1098407	GOODS	20-Jun-20	7
1097263	GOODS	20-Jun-20	7
1080505	GOODS	20-Jun-20	7
1098432	GOODS	20-Jun-20	7
1081349	GOODS	20-Jun-20	7
1101656	GOODS	21-Jun-20	3
1102603	GOODS	21-Jun-20	3
666589	GOODS	21-Jun-20	3
1107510	GOODS	22-Jun-20	4
1106453	GOODS	22-Jun-20	4
1107392	GOODS	22-Jun-20	4
1107703	GOODS	22-Jun-20	4
933588	GOODS	23-Jun-20	5
1110589	GOODS	23-Jun-20	5
1110335	GOODS	23-Jun-20	5
1109851	GOODS	23-Jun-20	5
1109712	GOODS	23-Jun-20	5

This not what I want.  I want this result

 

 Date          N
10-Jun-20   1
13-Jun-20   2
14-Jun-20   6
15-Jun-20   1
16-Jun-20   1
17-Jun-20   1
15-Jun-20   3
18-Jun-20   5

and so on for the
entire data
set

I need the cumulative total of contacts by each distinct date.  Notice here that there is a lone 15-JUN-20 1  in the 4th obs but should be in the

second to last line so that 15-JUN-20 should be  4

 

What changes in the code can produce that total / cumulative count of contacts (N) for each distinct / unique date.

 

Thank you

 

Kurt_Bremser
Super User

I ask once again (and for the very last time) why do you select additional variables, when all you want is the date and the count?

 

PS I edited your post down to a size that the browsers can easily handle. No need to swamp the post with hundreds of lines when a few are sufficient to illustrate your issue. Nobody answers to a post that takes more than a minute to load.

And you were specifically asked to provide data in a usable form (as a data step with datalines or a proc sql with insert). A copy/paste directly out of Excel is not usable (it forces us to put import code around it, making assumptions about things we must know). Right now we can't tell if you have dates and times or datetimes, and if those are stored as SAS numeric values or strings.

wlierman
Lapis Lazuli | Level 10
I need those variables in subsequent calculations so I carry them along to help speed up the process.
I used the usual data step process to get the output that I need for the time series charts.
Thank you for your help.

wlierman
Kurt_Bremser
Super User

Then you will never get your expected data as you posted it. You can only get that with

select date, count(...) as count
from have
group by date;
wlierman
Lapis Lazuli | Level 10

Right.

 

Thank you for that insight.

 

wlierman

Tom
Super User Tom
Super User

Your example data does not have any replication of dates.

So first thing is to fix that.  

Here is how to supply example data.  I modified the data to have multiple contacts on the same day.

data have;
  input Created_On :datetime. Contact_Person_ID :$20. NAICS_Sector :$20.;
  format Created_On datetime19. ;
cards;
15JUL2020:09:38:00 1000850 GOODS
15OCT2020:10:14:00 1011546 GOODS
22SEP2020:08:41:00 1012367 GOODS
13JUN2020:09:30:00 1012517 GOODS
02NOV2020:08:52:00 1014369 GOODS
11AUG2020:09:55:00  101964 GOODS
15OCT2020:11:33:00 1025507 GOODS
05DEC2020:09:35:00 1026268 GOODS
31OCT2020:11:28:00 1027330 GOODS
01SEP2020:10:12:00 1028370 GOODS
15OCT2020:10:57:00 1050427 GOODS
02DEC2020:08:15:00 1050453 GOODS
;

Now your code should work fine.

proc sql ;
create table want AS
  select Contact_Person_ID
       , NAICS_Sector
       , datepart(Created_On) AS Date format=date9.
       , count(distinct Contact_Person_ID) AS N
  from have
  where '01JAN2020'd <= datepart(Created_On) <= '30APR2022'd
    and NAICS_Sector = "GOODS"
  group by date
  order by date
;
quit;

Results

       Contact_
       Person_     NAICS_
Obs       ID       Sector         Date    N

  1    1012517     GOODS     13JUN2020    1
  2    1000850     GOODS     15JUL2020    1
  3    101964      GOODS     11AUG2020    1
  4    1028370     GOODS     01SEP2020    1
  5    1012367     GOODS     22SEP2020    1
  6    1025507     GOODS     15OCT2020    3
  7    1050427     GOODS     15OCT2020    3
  8    1011546     GOODS     15OCT2020    3
  9    1027330     GOODS     31OCT2020    1
 10    1014369     GOODS     02NOV2020    1
 11    1050453     GOODS     02DEC2020    1
 12    1026268     GOODS     05DEC2020    1

So you can see that 15OCT2020 has 3 different customers.

 

PS It does seem silly to order the data by DATE and not have DATE as the FIRST variable in the dataset.  It makes it hard for humans to review the dataset to check if it was created properly.

wlierman
Lapis Lazuli | Level 10
Thank you for your explanation and set up.
I really appreciate that.
wlierman

wlierman
Lapis Lazuli | Level 10

Thank you very much.

 

This is exactly what I was looking for.

 

This should also be labeled as SOLUTION

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 848 views
  • 1 like
  • 4 in conversation