DATA Step, Macro, Functions and more

Creating an alternative table from unique counts and rates from original dataset

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 77
Accepted Solution

Creating an alternative table from unique counts and rates from original dataset

Hello. I have a dataset that contains 5 variables. "ID" is a unique identifier of a person in my dataset. "Gender" is male (1) or female (2). AgeRange contains only 1 through 6, which indicates which age range they fall into based on a categorical scheme for my report. "Screen" is if they were ever medically screened. "Status" has 3 entries, positive, negative, or blank. It is blank when the patient has not been screened. Here is the sample dataset:

 

 

data JANFEB.SAMPLEUNIQUE;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
;;;;

 

How do I create a table from this dataset that groups screening and status results in this following way? I would like to primarily group my new data table by age range (1-6) and gender (1-2).

 

Age RangeGenderPositiveScreenedAllAll Screened DifferenceScreened Positive DifferenceScreening ratePrevalence Rate
1110278039351155277071%0.4%
12623253302977231970%0.3%
212511721386214114785%2.1%
222230983592494307686%0.7%
3191951114919886083%9.6%
323918352221386179683%2.1%
411891073131724488481%17.6%
4227945119024591879%2.9%
5119815861921335138883%12.5%
524211321413281109080%3.7%
6117939024793891372381%4.6%
6217539324862930375781%4.5%

 

Here are the explanation of the columns:

- Positive: Unique count under each age range and specific gender where the status variable had 'Positive' in it.

- Screened: Unique count under each age range and specific gender where the screened variable had 'Yes' in it.

- All: Unique count under each age range and specific gender in the dataset.

- All Screened Difference: All - Screened

Screened Positive Difference: Screened - Positive

- Screening rate: Screened/Total

- Positive prevalence; Positive/Screen

 

Thanks so much for your help!


Accepted Solutions
Solution
‎04-25-2017 03:35 AM
Frequent Contributor
Posts: 77

Re: Creating an alternative table from unique counts and rates from original dataset

I did a roundabout way of getting to your solution. I separated genders into two different tables and then merged them together and proc sort them according to age range and gender. I'm not sure what exactly happened though. The output of your code and mine are off. Do you know what is wrong with the code? I added a bigger dataset so that all age ranges for all genders would have some kind of entry.

 

/*extended sample dataset of 30 entries*/

data sampleunique;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
15,1,1,Yes,Negative
16,1,2,Yes,Negative
17,1,3,Yes,Negative
18,1,4,No,
19,1,5,No,
20,1,6,No,
21,2,1,No,
22,2,2,No,
23,2,3,No,
24,2,4,No,
25,2,5,Yes,Positive
26,2,6,Yes,Positive
27,2,1,Yes,Positive
28,2,6,Yes,Negative
29,1,3,Yes,Negative
30,2,1,Yes,Positive
;;;;

/*this was your code*/

proc sql;
  create table WANT as
  select  AGERANGE,
          GENDER,
          count(case when char(RESULT,1)="P" then 1 else 0 end) as POSITIVE,
          count(case when char(SCREEN,1)="Y" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          CALCULATED TOTAL - CALCULATED SCREENED as TOTAL_SCREEN_DIFF,
          CALCULATED SCREENED - CALCULATED POSITIVE as SCREEN_POS_DIFF,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREEN_RATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POS_PREV
  from    SAMPLEUNIQUE
  group by AGERANGE,
           GENDER;
quit;

/*this is mine*/

proc sql;
  create table sampleMALE as
  select  agerange,
  		  sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    sampleunique
  where gender=1
  group by agerange;
quit;

data samplemale;
set samplemale;
gender=1;
run;

proc sql;
  create table sampleFEMALE as
  select  agerange,
  		  sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    sampleunique
  where gender=2
  group by agerange;
quit;

data sampleFEMALE;
set sampleFEMALE;
Gender=2;
run;

data sampleMERGE;
retain agerange gender positive;
set sampleMALE sampleFEMALE;
run;

proc sort data=sampleMERGE;
by agerange gender;
run;

View solution in original post


All Replies
Super User
Super User
Posts: 7,977

Re: Creating an alternative table from unique counts and rates from original dataset

Frequent Contributor
Posts: 77

Re: Creating an alternative table from unique counts and rates from original dataset

I got most of the code to work, but not quite sure how to group genders in. Whenenver I select agerange and gender and group by the same two variables, the table creates 31,000+ entries.

 

I am using my real dataset in here, so the variables are a bit different

 

proc sql;
  create table janfeb.Janhosptypescreeningresults as
  select  agerange,
  		  sum(case when char(hcvresult,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when hcvscreening="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    janfeb.jfgmi
  where dcdeathdate >= dhms(mdy(02,01,2017),0,0,0) and dcdeathdate <= dhms(mdy(02,28,2017),0,0,0)
  group by agerange;
quit;

How do I include the count for agerange and gender into this new table? Thanks!

Super User
Super User
Posts: 7,977

Re: Creating an alternative table from unique counts and rates from original dataset

Mabye restart your question by asking what it is your actually trying to do.  From what you have said it sounds like your creating a table based on demography data.  The simplest way to do this would be to do a proc freq on your data, then post process the resulting output from that, something like:

data SAMPLEUNIQUE;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
;;;;
run;

proc freq data=sampleunique noprint;
  tables gender * agerange * screen * result / out=want;
run;

The want table can then be transposed up if needed, and you can add screening counts to that and then calculate the percentages.

Frequent Contributor
Posts: 77

Re: Creating an alternative table from unique counts and rates from original dataset

Thanks for your suggestions. How do I transpose it and add the difference and percentages? I proc freq that table and got this dataset:

 

data WORK.WANT;
  infile datalines dsd truncover;
  input agerange:32. gender:$1. HCVScreening:$9. HCVresult:$27. COUNT:32. PERCENT:32.;
datalines4;
1,1,No,,1114,
1,1,No,Negative,41,0.1619209352
1,1,Yes,Negative,2770,10.939536353
1,1,Yes,Positive,10,0.039492911
1,2,No,,931,
1,2,No,Negative,46,0.1816673907
1,2,Yes,Negative,2319,9.1584060661
1,2,Yes,Positive,6,0.0236957466
2,1,,,1,
2,1,No,,193,
2,1,No,Negative,18,0.0710872398
2,1,No,Positive,2,0.0078985822
2,1,Yes,Negative,1147,4.5298368943
2,1,Yes,Positive,25,0.0987322776
2,2,No,,389,
2,2,No,Negative,105,0.4146755657
2,2,Yes,Negative,3076,12.148019431
2,2,Yes,Positive,22,0.0868844042
3,1,,,1,
3,1,No,,173,
3,1,No,Negative,23,0.0908336954
3,1,No,Positive,1,0.0039492911
3,1,Yes,Negative,860,3.3963903479
3,1,Yes,Positive,91,0.3593854903
3,2,No,,314,
3,2,No,Negative,72,0.2843489594
3,2,Yes,Negative,1796,7.0929268196
3,2,Yes,Positive,39,0.154022353
4,1,,,4,
4,1,No,,221,
4,1,No,Negative,18,0.0710872398
4,1,No,Positive,1,0.0039492911
4,1,Yes,Negative,884,3.4911733344
4,1,Yes,Positive,189,0.7464160183
4,2,No,,195,
4,2,No,Negative,50,0.1974645551
4,2,Yes,Negative,918,3.6254492319
4,2,Yes,Positive,27,0.1066308598
5,1,,,3,
5,1,No,,305,
5,1,No,Negative,27,0.1066308598
5,1,Yes,Negative,1388,5.4816160499
5,1,Yes,Positive,198,0.7819596382
5,2,No,,241,
5,2,No,Negative,40,0.1579716441
5,2,Yes,Negative,1090,4.3047273014
5,2,Yes,Positive,42,0.1658702263
6,1,,,2,
6,1,No,,825,
6,1,No,Negative,62,0.2448560483
6,1,Yes,Negative,3725,14.711109356
6,1,Yes,Positive,179,0.7069231073
6,2,,,1,
6,2,No,,846,
6,2,No,Negative,82,0.3238418704
6,2,Yes,,1,
6,2,Yes,Negative,3757,14.837486671
6,2,Yes,Positive,175,0.6911259429
;;;;

Is there no way to do this on proc sql? I tried to select and group by agerange and gender, but it gave me 31,000 entries. Not quite sure what happened.

Super User
Super User
Posts: 7,977

Re: Creating an alternative table from unique counts and rates from original dataset

Well, you could do it in SQL, it just becomes a bit fiddly see below.  Thats why its easier to do a proc freq, then in datastep move your data up (see methods to reshape data, a ton of results on Google).  Note your test data doesn't seem to have any differences in it or maybe I misunderstand the total?

data SAMPLEUNIQUE;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
;;;;
run;

proc sql;
  create table WANT as
  select  AGERANGE,
          GENDER,
          count(case when char(RESULT,1)="P" then 1 else 0 end) as POSITIVE,
          count(case when char(SCREEN,1)="Y" then 1 else 0 end) as SCREENED,
          count(*) as ALL,
          CALCULATED ALL - CALCULATED SCREENED as ALL_DIFF,
          CALCULATED SCREENED - CALCULATED POSITIVE as SCREEN_POS_DIFF,
          (CALCULATED SCREENED / CALCULATED ALL) * 100 as SCREEN_RATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POS_PREV
  from    SAMPLEUNIQUE
  group by AGERANGE,
           GENDER;
quit;
Frequent Contributor
Posts: 77

Re: Creating an alternative table from unique counts and rates from original dataset

There are no differences in the test data. The difference variables are basically the following:

 

All_Screened_Diff= Distinct count of EVERYONE in that agerange and gender minus distinct count of those SCREENED in that agerange and gender

 

Screened_Pos_Diff= Distinct count of SCREENED in that agerange and gender minus distinct count of those POSITIVE in result in that agerange and gender

Solution
‎04-25-2017 03:35 AM
Frequent Contributor
Posts: 77

Re: Creating an alternative table from unique counts and rates from original dataset

I did a roundabout way of getting to your solution. I separated genders into two different tables and then merged them together and proc sort them according to age range and gender. I'm not sure what exactly happened though. The output of your code and mine are off. Do you know what is wrong with the code? I added a bigger dataset so that all age ranges for all genders would have some kind of entry.

 

/*extended sample dataset of 30 entries*/

data sampleunique;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
15,1,1,Yes,Negative
16,1,2,Yes,Negative
17,1,3,Yes,Negative
18,1,4,No,
19,1,5,No,
20,1,6,No,
21,2,1,No,
22,2,2,No,
23,2,3,No,
24,2,4,No,
25,2,5,Yes,Positive
26,2,6,Yes,Positive
27,2,1,Yes,Positive
28,2,6,Yes,Negative
29,1,3,Yes,Negative
30,2,1,Yes,Positive
;;;;

/*this was your code*/

proc sql;
  create table WANT as
  select  AGERANGE,
          GENDER,
          count(case when char(RESULT,1)="P" then 1 else 0 end) as POSITIVE,
          count(case when char(SCREEN,1)="Y" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          CALCULATED TOTAL - CALCULATED SCREENED as TOTAL_SCREEN_DIFF,
          CALCULATED SCREENED - CALCULATED POSITIVE as SCREEN_POS_DIFF,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREEN_RATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POS_PREV
  from    SAMPLEUNIQUE
  group by AGERANGE,
           GENDER;
quit;

/*this is mine*/

proc sql;
  create table sampleMALE as
  select  agerange,
  		  sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    sampleunique
  where gender=1
  group by agerange;
quit;

data samplemale;
set samplemale;
gender=1;
run;

proc sql;
  create table sampleFEMALE as
  select  agerange,
  		  sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    sampleunique
  where gender=2
  group by agerange;
quit;

data sampleFEMALE;
set sampleFEMALE;
Gender=2;
run;

data sampleMERGE;
retain agerange gender positive;
set sampleMALE sampleFEMALE;
run;

proc sort data=sampleMERGE;
by agerange gender;
run;
Super User
Super User
Posts: 7,977

Re: Creating an alternative table from unique counts and rates from original dataset

You should not put where clauses with group by's.  If you need to restrict it then (select * from have where xyz) subquery it.  As for different counts, sure your doing different grouping, you will get different output?

Frequent Contributor
Posts: 77

Re: Creating an alternative table from unique counts and rates from original dataset

Thanks. How would you put the where statement here?

 

proc sql;
  create table janfeb.Febscreeningresults as
  select  agerange, gender,
  		  sum(case when char(hcvresult,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when hcvscreening="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    janfeb.jfgmi
  where dcdeathdate >= dhms(mdy(02,01,2017),0,0,0) and dcdeathdate <= dhms(mdy(02,28,2017),0,0,0)
  group by agerange, gender;
quit;

It seems to work fine on sas when i put the where statement below the from statement. Thanks again.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 207 views
  • 0 likes
  • 2 in conversation