About r4321

r4321 · 06-07-2016

PG, sorry for the delayed reply. I've actually been poring over this code and dataset trying to figure out how to get things right. In reference to the first piece of code you helped me with, I used the following:

    data adtc.ss7 (drop = i);
      set adtc.ss6;
      count1 = 1;
      dm1 = sic1;
      array items{35} sic36-sic70;
      do i = 1 to 35;
        if count1 then if items{i} = sic1 then call missing(count1, dm1);
      end;
    run;

    data adtc.ss7 (drop = i);
      set adtc.ss7;
      count2 = 1;
      dm2 = sic2;
      array items{35} sic36-sic70;
      do i = 1 to 35;
        if count2 then if items{i} = sic2 then call missing(count2, dm2);
      end;
    run;

I repeated this step for each SIC code I need to compare (sic1-sic35). However, when I'm done and add up count1-count35, I don't get an accurate count, for two reasons. First, the array (sic36-sic70) may have missing cells (sometimes only some are missing, sometimes all of them), and in that case any value from sic1-sic35 that is compared is treated as though it is not in the array, so it is counted. Second, the values I'm comparing (sic1-sic35) may contain duplicates; for example, sic1, sic2, and sic3 may all be 5149. So I need to modify the code so that it takes missing values in the array (sic36-sic70) into account when comparing, and so that it only counts the unique SIC codes that didn't match. It seems like I need more complicated code that does this in one step rather than the two steps I'm currently using, or maybe there's a better two-step approach. Either way, I'm not sure how to expand on my code.
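A one-step sketch of that logic, assuming the SIC codes are numeric: it skips blank codes in sic1-sic35, counts each distinct code only once, and never lets a blank cell in sic36-sic70 count as a match. The `have`/`want` data sets, the sample values, and the names `nomatch_count` and `dm1-dm35` are hypothetical. Rows where sic36-sic70 are entirely blank get a missing count here, which is only one possible rule.

```sas
/* Hypothetical sample: sic1 and sic2 repeat 5149, sic3 = 2834 also sits in sic36,
   every other cell is missing. */
data have;
  array s{70} sic1-sic70;
  sic1 = 5149; sic2 = 5149; sic3 = 2834; sic36 = 2834;
run;

/* Count the distinct, non-missing codes in sic1-sic35 that appear
   nowhere among the non-missing cells of sic36-sic70. */
data want (drop = i j found dup);
  set have;
  array base{35} sic1-sic35;     /* codes to check                   */
  array items{35} sic36-sic70;   /* codes to check against           */
  array dm{35} dm1-dm35;         /* each unmatched code, listed once */

  nomatch_count = 0;
  if n(of items{*}) = 0 then nomatch_count = .;  /* assumption: nothing to compare against */
  else do i = 1 to 35;
    if missing(base{i}) then continue;           /* skip blank codes */

    dup = 0;                                     /* already seen earlier in sic1-sic35? */
    do j = 1 to i - 1;
      if base{j} = base{i} then dup = 1;
    end;
    if dup then continue;

    found = 0;                                   /* present anywhere in sic36-sic70? */
    do j = 1 to 35;
      if items{j} = base{i} then found = 1;      /* base{i} is non-missing, so blanks never match */
    end;

    if not found then do;
      nomatch_count = nomatch_count + 1;
      dm{nomatch_count} = base{i};
    end;
  end;
run;
```

For the sample row this gives nomatch_count = 1 and dm1 = 5149: the second 5149 is skipped as a duplicate, and 2834 is found in sic36.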

r4321 · 06-03-2016

PG Stats, thanks so much for your quick reply. This has worked great. Another question for you: what if I now wanted to find out whether a code (e.g., sic71) matches ANY of the codes within a range of columns (sic1-sic35)? If it does match, I want to set a count variable, and I want another variable to repeat the code it matched (just to double-check the match). Furthermore, there could be blanks within the range of columns (sic1-sic35), and some observations of sic71 could also be blank. I tried to slightly modify the code you gave me. Let me know what you think. And thanks again!

    data adtn.match1 (drop = i);
      set adtn.match;
      matchcount1 = 1;
      sicmatch = sic71;
      array items{35} sic1-sic35;
      do i = 1 to 35;
        if matchcount1 then if items{i} = sic71 then call missing(matchcount1, sicmatch);
      end;
    run;
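Note that the code above starts the flag at 1 and clears it on a match, so it marks codes that are *not* found. A sketch of the match version, assuming numeric codes: WHICHN returns the position of its first argument among the remaining arguments (0 when absent), and blank sic71 values are skipped so they never count as a match. The `have`/`match1` data sets and the sample values are hypothetical; for character codes, WHICHC is the counterpart.

```sas
/* Hypothetical sample: a match, a non-match, and a blank sic71 */
data have;
  array s{71} sic1-sic71;
  sic1 = 5149; sic2 = 2834; sic71 = 2834; output;
  call missing(of s{*});
  sic1 = 5149; sic71 = 3711; output;
  call missing(of s{*});
  sic1 = 5149; output;
run;

data match1 (drop = pos);
  set have;
  array items{35} sic1-sic35;
  matchcount1 = 0;      /* 1 when sic71 appears in sic1-sic35 */
  sicmatch = .;         /* the code that matched              */
  if not missing(sic71) then do;
    pos = whichn(sic71, of items{*});   /* 0 when sic71 is not in the array */
    if pos > 0 then do;
      matchcount1 = 1;
      sicmatch = sic71;
    end;
  end;
run;
```

Blank cells inside sic1-sic35 need no special handling here, because a non-missing sic71 never equals a missing value.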

r4321 · 06-03-2016

Ballard, thanks a lot; this code is great, and I appreciate your reply as well! It just wasn't exactly what I was looking for. I wish I could accept both as solutions, though! Sorry for the late reply, too.

r4321 · 06-01-2016

Hello everyone, I have a dataset with many columns, and I would like to compare the contents of each column with some other columns and create a variable when they do not match. Specifically, I want to create both a count variable and a new variable that holds the specific code that did not match. This is the code I'm currently using:

    data adtn.ss7 (drop = i);
      set adtn.ss6;
      count = .;
      check = .;
      array items{35} sic36-sic70;
      do i = 1 to 35;
        if items{i} ~= sic1 then count = 1;
        if items{i} ~= sic1 then check = sic1;
      end;
    run;

The problem I'm having is that SAS sets count to 1 if ANY of the variables in my items array (sic36-sic70) does not match sic1. However, I only want count to be 1, and my check variable to be populated, if there is no match anywhere in the array. To put it differently, I only want the counter to count if sic1 matches none of sic36-sic70; it is fine if only one column within the array matches sic1. Furthermore, my array sic36-sic70 has a lot of missing values. Your help is greatly appreciated! Thanks!
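The loop above sets count as soon as any single column differs from sic1, which is why any mismatch triggers it. A minimal sketch of the intended test, assuming numeric codes, asks once whether sic1 appears anywhere in sic36-sic70 and only then decides. Leaving blank sic1 values uncounted is an assumption, and the sample data set `have` is hypothetical.

```sas
/* Hypothetical sample: sic1 is found in row 1 and not found in row 2 */
data have;
  array s{70} sic1-sic70;
  sic1 = 5149; sic40 = 5149; output;
  call missing(of s{*});
  sic1 = 2834; sic40 = 5149; output;
run;

data ss7;
  set have;
  array items{35} sic36-sic70;
  count = .;
  check = .;
  /* WHICHN returns 0 only when sic1 matches none of sic36-sic70;
     missing cells in the array simply never match a non-missing sic1. */
  if not missing(sic1) and whichn(sic1, of items{*}) = 0 then do;
    count = 1;
    check = sic1;
  end;
run;
```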

r4321 · 05-19-2016

At least a few letters, but the important thing is that it matches on the main part of the name, e.g., Ford = ford motor.

r4321 · 05-19-2016

Thanks for your reply. I will be matching on both variables eventually, but only one at a time. I now have two datasets, each with a firm name and a year, and I need to merge them on those two variables because I want to bring in the rest of the information each observation contains. Because they are firm names, the names are similar but not always exact. Something like:

    Data set 1           Data set 2
    Firm                 Firm
    Google Inc           Google
    Ford Motor           Ford
    Motorola holdings    Motorola solutions

Sometimes they will be an exact match, sometimes they will not. I would like a reasonable margin of error, but obviously not so much that it creates too many unnecessary matches. And yes, ideally it would be iterative. Bear in mind that I am by no means an advanced programmer, but I do use SAS to get around (probably often using simpler code in more steps than someone else would use in one step). I usually use a step like this to merge:

    proc sql;
      create table adtr.a1 as
      select a.*, b.address1, b.address2
      from adtr.stage as a
           left join adtr.test as b
           on a.code = b.code and a.year = b.year;
    quit;

I would like to fuzzy match on firm name and also on year. Thanks for your help!

r4321 · 05-19-2016

It took a very long time to run (more than an hour) and then failed with an error. I'm kind of stuck on this.

    3092  proc sql;
    3093  select a.*, b.firm1 as matched_stores
    3094  from adtr.stage as a, adtr.test as b
    3095  group by a.conm
    3096  having spedis(a.conm,b.firm1)=min(spedis(a.conm,b.firm1));
    NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
    NOTE: The query requires remerging summary statistics back with the original data.
    NOTE: Writing HTML Body file: sashtml.htm
    ERROR: Sort execution failure.
    3097  quit;
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: PROCEDURE SQL used (Total process time):
          real time           1:08:53.67
          cpu time            26:32.45
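The Cartesian product in that log pairs every firm in adtr.stage with every firm in adtr.test before any filtering. One way to cut that down is sketched below. It assumes both tables carry a `year` column, as in the earlier merge. The sketch joins on year exactly, scores the remaining name pairs with COMPGED (a generalized edit distance; SPEDIS would also work), drops pairs above a cutoff, and keeps the closest candidate per firm-year. The sample tables `stage` and `test`, the output names, and the cutoff of 400 are hypothetical and need tuning against real names.

```sas
/* Hypothetical samples mirroring adtr.stage (conm) and adtr.test (firm1) */
data stage;
  length conm $40;
  conm = 'Google Inc'; year = 2010; output;
  conm = 'Ford Motor'; year = 2010; output;
run;

data test;
  length firm1 $40;
  firm1 = 'Google'; year = 2010; output;
  firm1 = 'Ford';   year = 2010; output;
run;

proc sql;
  /* Exact join on year keeps the pair count far below a full Cartesian product */
  create table candidates as
  select a.*, b.firm1 as matched_firm,
         compged(upcase(strip(a.conm)), upcase(strip(b.firm1))) as dist
  from stage as a
       inner join test as b
       on a.year = b.year
  where calculated dist <= 400;   /* hypothetical cutoff; tune on real data */

  /* Keep the closest candidate(s) for each firm-year; ties keep several rows */
  create table best as
  select *
  from candidates
  group by conm, year
  having dist = min(dist);
quit;
```

Firms with no candidate under the cutoff drop out of `best`; a left join from the original table back onto `best` would keep them with a blank match for manual review.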

r4321 · 05-18-2016

Yes, I did some reading about those, but I don't know how to go about this or what it would look like. I'm fairly used to merges of this type:

    proc sql;
      create table adtr.a1 as
      select a.*, b.acode, b.bcode
      from adtr.bld2 as a
           left join adtr.all4 as b
           on a.year_code = b.year_code;
    quit;

However, I want to merge on the firm name (which won't always be identical) and the year.

r4321 · 05-18-2016

Thanks, Ksharp, that part worked great! I'm still struggling with the fuzzy-match part, though.

r4321 · 05-18-2016

r4321 · 05-18-2016

Hello everyone, I have an Excel file that I imported into SAS. It contains information on companies that I need to separate into columns, and then I want to match each observation on those company names against the company names I have in another data file. Each cell contains at least 2 and up to 19 company names. For example:

    row   cname
    1     Microsoft Inc Google Inc
    2     General electric General motors
    3     General motors Ford General electric

What I want is:

    row   cname1             cname2           cname3
    1     Microsoft Inc      Google Inc
    2     General electric   General Motors
    3     General motors     Ford             General electric

AND THEN I want to match each observation on the company names stored in the other file. However, the company names in the other file won't always be exact matches, e.g.:

    Microsoft Inc = Microsoft
    Google inc = Google
    Ford = Ford motors
    Motorola Inc = motorola communications
    du pont = ei dupont de nemours

I've attached some screenshots so that you can get a sense of what I'm working with. Thanks in advance for any help!
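The names in each cell are separated by a character the post doesn't spell out. The sketch of the split step below assumes a comma; an in-cell line break from Excel would be '0A'x instead. COUNTW counts the names in the cell, and SCAN pulls out the k-th one. The data sets `have` and `split` and the sample values are hypothetical.

```sas
/* Hypothetical sample: several company names per cell, comma-separated (assumed delimiter) */
data have;
  length cname $400;
  cname = 'Microsoft Inc,Google Inc'; output;
  cname = 'General electric,General motors'; output;
  cname = 'General motors,Ford,General electric'; output;
run;

data split (drop = k);
  set have;
  length cname1-cname19 $100;
  array nm{19} cname1-cname19;            /* up to 19 names per cell */
  do k = 1 to min(countw(cname, ','), 19);
    nm{k} = strip(scan(cname, k, ','));   /* k-th name, surrounding blanks removed */
  end;
run;
```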

r4321 · 05-18-2016

Awesome... This worked great! So that I'm getting something out of this, would you mind checking my understanding?

    data adtr.as8;
      set adtr.as7;
      array col {131} nation_1-nation_131;   /* creates an array for nation_1-nation_131 */
      array new {131} $20 _temporary_;       /* creates an identical array for nation_1-nation_131 in order to do comparisons */
      do _n_ = 1 to 131;                     /* tells it to go through 131 iterations */
        new{_n_} = col{_n_};                 /* comparing the cells */
      end;
      call sortc(of new{*});                 /* sorts the log to facilitate comparisons */
      count = (new{1} > ' ');                /* only count if not missing */
      do _n_ = 2 to 131;                     /* this array tells it to count the number of unique values in the array */
        if new{_n_} ne new{_n_-1} then count + 1;
      end;
    run;
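To check the counting logic on a small scale, here is a worked run on five made-up values (a hypothetical `check` data set, not the real nation_ columns). The loop copies the values into the temporary array so the original columns keep their order. After CALL SORTC the blank sorts first, so `count` starts at 0, and each change between neighboring sorted values then adds one distinct non-blank value.

```sas
data check (keep = count);
  array col{5} $20 ('France' ' ' 'Switzerland' 'France' 'United States');
  array new{5} $20 _temporary_;
  do _n_ = 1 to 5;
    new{_n_} = col{_n_};          /* copy, so sorting does not reorder col1-col5 */
  end;
  call sortc(of new{*});          /* ascending: ' ', France, France, Switzerland, United States */
  count = (new{1} > ' ');         /* 1 only if the smallest value is non-blank */
  do _n_ = 2 to 5;
    if new{_n_} ne new{_n_ - 1} then count + 1;   /* each change = a new distinct value */
  end;
run;
```

Here `count` ends at 3: France, Switzerland, and United States, with the repeated France and the blank not counted.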

r4321 · 05-18-2016

r4321 · 05-18-2016

It seems as though the code runs, but it doesn't quite return the desired result; the count is inaccurate.

r4321 · 05-18-2016

Thank you... I tried this:

    data adtr.as8;
      set adtr.as7;
      array Country(*) nation_1-nation_131;
      CntryCount = 0;
      do i = 2 to 131;
        if Country(i) ne . and Country(i) ne Country(i-1) then CntryCount + 1;
      end;
    run;

However, I get a number of errors. Sample:

    NOTE: Invalid numeric data, 'France' , at line 2142 column 8.
    NOTE: Invalid numeric data, 'Switzerland' , at line 2142 column 8.
    NOTE: Invalid numeric data, 'United States' , at line 2142 column 8.

Thanks
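Those notes come from `Country(i) ne .`: the nation_ columns are character, so SAS tries to convert values like 'France' to numbers. The MISSING function works for both character and numeric values. Separately, comparing each value only with its left-hand neighbor never counts nation_1 on its own and re-counts a country whenever it reappears after a different one. The sketch below instead checks each non-blank value against all earlier values in the row. The data sets `have` and `as8` and the sample values are hypothetical, and the comparison is case-sensitive.

```sas
/* Hypothetical sample: 131 character columns, most of them blank */
data have;
  length nation_1-nation_131 $20;
  nation_1 = 'France'; nation_2 = 'Switzerland';
  nation_3 = 'France'; nation_5 = 'United States';
run;

data as8 (drop = i j seen);
  set have;
  array country{131} nation_1-nation_131;
  cntrycount = 0;
  do i = 1 to 131;
    if missing(country{i}) then continue;   /* MISSING() works for character values */
    seen = 0;
    do j = 1 to i - 1;                      /* has this value appeared earlier in the row? */
      if country{j} = country{i} then do;
        seen = 1;
        leave;                              /* exits only the inner DO loop */
      end;
    end;
    if not seen then cntrycount = cntrycount + 1;
  end;
run;
```

For the sample row, cntrycount is 3. The sort-based version with CALL SORTC gives the same count; this one avoids the temporary array at the cost of more comparisons per row.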

Date Last Visited: 05-11-2021 12:21 AM

Re: Enhancing code that reorganizes data around time based events

Re: Enhancing code that reorganizes data around time based events

Enhancing code that reorganizes data around time based events

Re: Computing the change in values across time and state

Re: Computing the change in values across time and state

Re: Computing the change in values across time and state

Re: Computing the change in values across time and state

Computing the change in values across time and state

Re: Looking to create indicator variables that help me organize a matc...

Re: Looking to create indicator variables that help me organize a matc...

Re: Enhancing code that reorganizes data around time based events

Re: Enhancing code that reorganizes data around time based events

Re: Computing the change in values across time and state

Re: Computing the change in values across time and state

Re: Looking to create indicator variables that help me organize a matc...

Re: How to fill in current year's variable with last year's variable v...

Re: comparing cells in sas

Re: comparing cells in sas

Re: comparing cells in sas

comparing cells in sas

Re: Separating names in a cell and then fuzzy merging

Re: Separating names in a cell and then fuzzy merging

Re: Separating names in a cell and then fuzzy merging

Re: Separating names in a cell and then fuzzy merging

Re: Separating names in a cell and then fuzzy merging

Re: Separating names in a cell and then fuzzy merging

Separating names in a cell and then fuzzy merging

Re: count unique variable values across columns (within same row)

Re: count unique variable values across columns (within same row)

Re: count unique variable values across columns (within same row)

Re: count unique variable values across columns (within same row)