BookmarkSubscribeRSS Feed
soumri
Quartz | Level 8

Hi,
I want to share individuals in classes by respecting certain conditions. I have the following data:
ID: identifier;
G: group;
W: work
TD: date of the test;
OBS: the measure.
Each ID normally belongs to a single group (G) but can have 1, 2, 3, 4 or 5 Works (W) and in each Work, it has been verified 3, 4, ... or 10 times (number of TD):

 

ID     G           W            TD                   OBS

b1     H1         1             28/02/2008      13.5

b1     H1         1             01/04/2008      17.2

b1     H1         1             17/05/2008      16.8

b1     H1         1             05/08/2008      12.5

b1     H1         1             22/09/2008      10.0

b1     H1         3             27/03/2009      22.3

b1     H1         3             23/02/2009      20.1

b1     H1         3             11/05/2009      18.3

b1     H1         3             19/06/2009      18.9

b1     H1         3             29/07/2009      16.9

b2     H1         1             02/10/2009      20.0

b2     H1         1             28/11/2009      22.2

b2     H1         1             20/12/2009      24.6

b2     H1         1             31/01/2010      23.0

b2     H1         1             27/02/2010      20.5

....

(this is just an example, you can't test the code (it requires more data)
What I am trying to do is to form classes of [Group * Year * Month of the test date] (GYM ).
In the new data I am looking for I only want to keep observations that respect the following conditions::
- Each GYM class must contain at least 6 observations.
- Each group G kept must have at least 4 different IDs.
- Each selected ID must be present with at least its first 3 observations for a given Work.

Please note that we may lose whole works for a given ID, or some IDs for a given group and all their controls, or even entire groups with all the IDs they contain.

data new; set have;
Y = year (TD);
M = month (TD);
GYM = compress (G || Y || M);
proc sort; by ID W;
run;


proc sql; 
create table new2 as
select *, count (distinct(TD)) as N_TD
from new
group by GYM;
quit;

/ ****** Here, I noticed that the number of observations by GYM classes varies from 1 to hundreds. (I want to keep only the groups with at least 6 observations, by respecting the conditions above) ***/

 

Can someone help me formulate this code or tell me if there is another way?
Thank you for your expertise and insight.

4 REPLIES 4
Reeza
Super User

I don't think SQL will work for this condition:

- Each selected ID must be present with at least its first 3 observations for a given Work.

 

Your best bet is to summarize the data, so calculate each of your statistics (# of Gym, # of distinct ID, first three obs) and then you can filter that data sets. The # of Gym and # of distinct ID can be calculated inline with SQL and then you can check the first three obs with a data step. I'm also assuming that the 'first three' is when sorted by date?

 

This won't work, because I don't know what a GYM is, or how you identify it in your data but it should give you the idea of how to set up your data. 

 

proc sql;
create table summary_stats as
select *, count(G) as N_G, count(distinct ID) as unique_ID
from have
group by GYM;
quit;

data want;
set summary_stats;
by gym;
if first.gym then do; count=0; nmissing=0; end;
count+1;
if not missing(obs) then nmissing+1;
if count=3 and nmissing=0 then output ID;
run;
soumri
Quartz | Level 8

Yes, I understood what you did, but it did not work by adjusting it to my actual data. find attached a part of my file in csv format that can still help (dates are given in SAS format).

Reeza
Super User

What is a 'GYM' and which variable represents a GYM?

Saying 'it did not work' doesn't really explain what didn't work. Please show your code and explain how your results don't match what you want - which I may be interpreting incorrectly because that was never specified either. 

 

I'm sorry, I don't have time to download a file, write an import and then answer your question. If you post it as an example with full code I'll take another look, otherwise I'm relatively sure someone else will help out shortly. 

 

 

soumri
Quartz | Level 8

a GYM is a new variable created by putting (Group || Test Year || Test Month), Test year  (Y) and Test Month (M) were taken from TD variable by Y = year (TD) and M = month (TD)

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1788 views
  • 0 likes
  • 2 in conversation