BookmarkSubscribeRSS Feed
msf2021
Fluorite | Level 6

I am new to SAS and would like any help for the following:

I have one dataset "table1":

 

data table1;
  input id $ yearmonth:yymmn6. var1 $ var2 var3 Target;
  format yearmonth yymmn6.;
datalines;
A 202002 xxx 1 2 1
B 202004 zz 2 23 1
C 202011 ee 4 4 1
D 202011 ff 5 5 1
E 202103 tt 6 6 1
F 202103 aa 9 89 1
;

And other dataset "table2":

 

data table2;
  input id $ yearmonth:yymmn6. var1 $ var2 var3 Target;
  format yearmonth yymmn6.;
datalines;
E 202002 aac 23 13 0
F 202004 jsk 4 14 0
A 202011 skal 5 18 0
F 202011 aks 9 13 0
B 202103 xka 1 7 0
G 202103 als 12 9 0
;

And I would like to get a stratified sample from table2 by yearmonth ("Want" table) but with following restrictions:

  • I cannot have same id in table2 that i have in table1
  • I cannot have duplicated id's in want table from table 2 after stratified sample (in table 2 they are duplicated because they can be in two different months)
  • I would like to choose the dimension of the stratified sample (n) to achieve the following:
    • appending the dataset1 and dataset2 the count of id of Target 0 data (coming from dataset2) should be 80% of the total data

The output would then be :

id  yearmonth   Var1    Var2    Var3    Target
A   202002      xxx     1        2       1
B   202004      zz      2       23       1
C   202011      ee      4        4       1
D   202011      ff      5        5       1
E   202103      tt      6        6       1
F   202103      aa      9       89       1
G   202103      als     12       9       0

Any help would be very much appreciated!

Thanks!!

4 REPLIES 4
Oligolas
Barite | Level 11

Hi,

you get the output with this:

PROC SQL;
   CREATE TABLE want AS
      SELECT *
      FROM table1 
      UNION CORRESPONDING 
      (SELECT * 
       FROM table2 
       WHERE ID not in
         (SELECT ID 
          FROM table1)
      )
   ;
QUIT;

I do not understand what you mean with:

@msf2021 wrote:

I would like to choose the dimension of the stratified sample (n) to achieve the following:

appending the dataset1 and dataset2 the count of id of Target 0 data (coming from dataset2) should be 80% of the total data

________________________

- Cheers -

msf2021
Fluorite | Level 6
Hi! In that way i am not having a stratified sample. 😞
What i mean is i want to get a new dataset "want" that basically is the appendix of dataset1 and the sample (stratified) from dataset2. And i want that the rows in the appendix that come from dataset2 represent 20% of total rows. I hope that i was clear enough, if not, let me know 🙂
Oligolas
Barite | Level 11

Hi,

nope it's not clear to me. Try to post a most explicit example of what you would like to achieve and how you would select the rows to display if the rows from dataset2 exceed the 20% of total rows.

________________________

- Cheers -

ballardw
Super User

Are you going to add selection probabilities or weights to this sample?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1389 views
  • 0 likes
  • 3 in conversation