BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hello Folks,


I'm trying to subset a large data to fourty smaller datasets based on the "location"  AND "code" variables.

Codes are taking different values in datasteps depending on whether sites are A or B or C. 

I'd like to reduce human error by automating this subsetting process.

Is there any way? I'm familiar to macro/mend process. But I don't know how when character variable is involved taking different values. 

 

DATA HAVE; 
INPUT ID CODE $ LOCATION $;
CARDS;
1	154	A
1	157	A
1	C25	A
1	153	A
2	183	B
2	185	B
2	196	B
2	196	B
3       200     C
3       100     C 
;

DATA SUBSET_A(COMPRESS=YES);
SET HAVE(WHERE=(LOCATION="A"));
IF CODE IN ('153','154') THEN SITE=1; ELSE SITE=0;
IF CODE IN ('153','154') THEN WEIGHT=1; ELSE
IF CODE IN ('157') THEN WEIGHT=2; ELSE WEIGHT=99;
RUN;

DATA SUBSET_B(COMPRESS=YES);
SET HAVE(WHERE=(LOCATION="B"));
IF CODE IN ('183','185') THEN SITE=1; ELSE SITE=0;
IF CODE IN ('183','185') THEN WEIGHT=1; ELSE
IF CODE IN ('196') THEN WEIGHT=2; ELSE WEIGHT=99;
RUN;

DATA SUBSET_C(COMPRESS=YES);
SET HAVE(WHERE=(LOCATION="C"));
IF CODE IN ('200') THEN SITE=1; ELSE SITE=0;
IF CODE IN ('200') THEN WEIGHT=1; ELSE
IF CODE IN ('100') THEN WEIGHT=2; ELSE WEIGHT=99;
RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

May be you can define user defined formats using proc format which would be a one time effort to type carefully. In any case, I really do think that's gonna take some boring effort at least once. 

View solution in original post

4 REPLIES 4
novinosrin
Tourmaline | Level 20

splitting into many data sets based on LOCATION is pretty straight forward  however in your case I wonder how would you automate the condition of the CODE values that constitutes to SITE variables as each of seems to be mutually exclusive 

Cruise
Ammonite | Level 13

That's the problem. I wonder if it's possible to define a specific set of 'codes' for the specific combinations of 'Location-site-weight'? as a dictionary to refer? and SAS subsets data with the reference of info in that dictionary? something like that?

novinosrin
Tourmaline | Level 20

May be you can define user defined formats using proc format which would be a one time effort to type carefully. In any case, I really do think that's gonna take some boring effort at least once. 

Astounding
PROC Star

It's certainly possible to set up the known translations as a data set.  For example:

 

data master_list;
input location $ code $ site weight;
datalines;
A 153 1 1
A 154 1 1
A 157 0 2
B 183 1 1
B 185 1 1
B 196 0 2
C 200 1 1
C 100 0 2
;

Then the approach would be to use this as a hash table, where the key is the combination of CODE and LOCATION, and the data is the SITE and WEIGHT.

 

However, note that the logic is not just a matter of look-up.  You have to check whether the table actually contains the combination you are searching for.  If it does, great you can use the values found via look-up.  But if it doesn't, you still need to set SITE to 0 and WEIGHT to 99.

 

This is something that @novinosrin can do in his sleep, so I will abandon the coding efforts.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 675 views
  • 3 likes
  • 3 in conversation