BookmarkSubscribeRSS Feed
msf2021
Fluorite | Level 6

I am new to SAS and would like any help for the following:

I have one dataset "table1":

 

data table1;
  input id $ yearmonth:yymmn6. var1 $ var2 var3 Target;
  format yearmonth yymmn6.;
datalines;
A 202002 xxx 1 2 1
B 202004 zz 2 23 1
C 202011 ee 4 4 1
D 202011 ff 5 5 1
E 202103 tt 6 6 1
F 202103 aa 9 89 1
;

And other dataset "table2":

 

data table2;
  input id $ yearmonth:yymmn6. var1 $ var2 var3 Target;
  format yearmonth yymmn6.;
datalines;
E 202002 aac 23 13 0
F 202004 jsk 4 14 0
A 202011 skal 5 18 0
F 202011 aks 9 13 0
B 202103 xka 1 7 0
G 202103 als 12 9 0
;

And I would like to get a stratified sample from table2 by yearmonth ("Want" table) but with following restrictions:

  • I cannot have same id in table2 that i have in table1
  • I cannot have duplicated id's in want table from table 2 after stratified sample (in table 2 they are duplicated because they can be in two different months)
  • I would like to choose the dimension of the stratified sample (n) to achieve the following:
    • appending the dataset1 and dataset2 the count of id of Target 0 data (coming from dataset2) should be 80% of the total data

The output would then be :

id  yearmonth   Var1    Var2    Var3    Target
A   202002      xxx     1        2       1
B   202004      zz      2       23       1
C   202011      ee      4        4       1
D   202011      ff      5        5       1
E   202103      tt      6        6       1
F   202103      aa      9       89       1
G   202103      als     12       9       0

Any help would be very much appreciated!

Thanks!!

4 REPLIES 4
Oligolas
Barite | Level 11

Hi,

you get the output with this:

PROC SQL;
   CREATE TABLE want AS
      SELECT *
      FROM table1 
      UNION CORRESPONDING 
      (SELECT * 
       FROM table2 
       WHERE ID not in
         (SELECT ID 
          FROM table1)
      )
   ;
QUIT;

I do not understand what you mean with:

@msf2021 wrote:

I would like to choose the dimension of the stratified sample (n) to achieve the following:

appending the dataset1 and dataset2 the count of id of Target 0 data (coming from dataset2) should be 80% of the total data

________________________

- Cheers -

msf2021
Fluorite | Level 6
Hi! In that way i am not having a stratified sample. 😞
What i mean is i want to get a new dataset "want" that basically is the appendix of dataset1 and the sample (stratified) from dataset2. And i want that the rows in the appendix that come from dataset2 represent 20% of total rows. I hope that i was clear enough, if not, let me know 🙂
Oligolas
Barite | Level 11

Hi,

nope it's not clear to me. Try to post a most explicit example of what you would like to achieve and how you would select the rows to display if the rows from dataset2 exceed the 20% of total rows.

________________________

- Cheers -

ballardw
Super User

Are you going to add selection probabilities or weights to this sample?

sas-innovate-white.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Early bird rate extended! Save $200 when you sign up by March 31.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1032 views
  • 0 likes
  • 3 in conversation