Getting different values in R and in a MicroStrategy metric

Feb 25, 2015 at 8:15 AM
Edited Feb 25, 2015 at 8:19 AM
We have written an R script for RFM scoring. The script has try-catch functionality enabled and also saves an R workspace. We deployed the script using the deployR utility, created a metric in MicroStrategy from the generated metric expression, and executed a report with this metric. The metric displays the Total Score a customer gets under the RFM model.

We then loaded the R workspace that was saved during report execution. The value displayed for a particular customer differs between the MicroStrategy report and the R workspace.

Any idea why this is happening?
Feb 26, 2015 at 12:54 PM
Hi Nidhi,

That's interesting. Can you provide some more information on how the two values are differing, maybe with a few examples? Also, which script are you using? Is it off the shelf or a homemade R script? What packages are called for in the script?

Thank you!
Mar 4, 2015 at 9:40 AM
Edited Mar 4, 2015 at 9:40 AM
Hi Erik,

We intend to perform scoring (RFM analysis) on the customer data. The shelf doesn't have a script for this, so we are using a homemade script (below).
When I execute the script in R using RStudio, it gives me the same score for a customer even if the order in which the inputs are passed changes. But with MicroStrategy, when I change the order of the inputs in the deployR utility and then execute the MicroStrategy report, the customer scores differ.

As an example, for Customer ID = 1001128:
Script executed through R: Score = 555
Script executed through MicroStrategy: Score = 155
################################################################################
#To Create the RFM Score

################################################################################


################################################################################
# Function
#   scoring(df,column,r=5)
#
# Description
#   A function to be invoked by the getIndependentScore function
#######################################
scoring <- function (df,column,r=5){

#get the length of rows of df
len <- dim(df)[1]

score <- rep(0,times=len)

# get the quantity of rows per 1/r e.g. 1/5
nr <- round(len / r)
if (nr > 0){

    # separate the rows into r aliquots
    rStart <-0
    rEnd <- 0
    for (i in 1:r){
    
        #set the start row number and end row number
        rStart = rEnd+1
        
        #skip one "i" if the rStart is already in the i+1 or i+2 or ...scope.
        if (rStart> i*nr) next

        if (i == r){
            if(rStart<=len ) rEnd <- len else next
        }else{
            rEnd <- i*nr
        }

        # set the Recency score
        score[rStart:rEnd]<- r-i+1

        # make sure the customer who have the same recency have the same score
        s <- rEnd+1
        if(i<r && s <= len){
            for(u in s: len){
                if(df[rEnd,column]==df[u,column]){
                    score[u]<- r-i+1
                    rEnd <- u
                }else{
                    break;
                }
            }
            
        }

    }

}
    return(score)

} #end of function Scoring

################################################################################
# Function
#   getIndependentScore(df,r=5,f=5,m=5)
#
# Description
#   Scoring the Recency, Frequency, and Monetary in r, f, and m in aliquots independently
#
# Arguments
#   df - A data frame returned by the function of getDataFrame
#   r -  The highest point of Recency
#   f -  The highest point of Frequency
#   m -  The highest point of Monetary
#
# Return Value
#   Returns a new data frame with four new columns of "R_Score","F_Score","M_Score", and "Total_Score".
#################################################################################

getIndependentScore <- function(df,r=5,f=5,m=5) {

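# bail out on invalid bucket counts; a bare `return` without () does not exit the function in R
if (r<=0 || f<=0 || m<=0) return(NULL)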

#order and the score
df <- df[order(df$Recency,-df$Frequency,-df$Monetary),]
R_Score <- scoring(df,"Recency",r)
df <- cbind(df, R_Score)

df <- df[order(-df$Frequency,df$Recency,-df$Monetary),]
F_Score <- scoring(df,"Frequency",f)
df <- cbind(df, F_Score)

df <- df[order(-df$Monetary,df$Recency,-df$Frequency),]
M_Score <- scoring(df,"Monetary",m)
df <- cbind(df, M_Score)

#order the dataframe by R_Score, F_Score, and M_Score desc
df <- df[order(-df$R_Score,-df$F_Score,-df$M_Score),]

# calculate the total score
Total_Score <- c(100*df$R_Score + 10*df$F_Score+df$M_Score)

df <- cbind(df,Total_Score)

return (df)

} # end of function getIndependentScore



df1
df1 <-getIndependentScore(df)
head(df1[-(2:3)])
Mar 4, 2015 at 9:40 AM
Edited Mar 4, 2015 at 9:41 AM
Let me know if you need any further details.
Mar 5, 2015 at 2:06 PM
Edited Mar 5, 2015 at 2:09 PM
Hi Nidhi,

Thanks for your reply! Really impressive custom script! One thing that could be changing the results would be the SortBy functionality that MSTR uses when passing data to and from R. If you don't enable Sort By in deployR() and designate a specific field, the deployR() utility assigns the first parameter input as the Sort By. Changing the order of parameters would then change the way the rows are sorted and input into R. There is some good documentation on this in the User Guide.
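To make the row-order point concrete, here is a minimal R sketch with made-up data. It assumes, as a simplification, that the values R returns are matched back to report rows purely by position. `toy_score` is a hypothetical stand-in that sorts internally, not your script.

```r
# Hypothetical toy data: three customers with distinct recency values.
customers <- data.frame(
  CustomerID = c("A", "B", "C"),
  Recency    = c(30, 10, 20),
  stringsAsFactors = FALSE
)

# Stand-in scorer (assumption): sorts by Recency internally and returns the
# scores in that sorted order, highest score for the most recent customer.
toy_score <- function(df) {
  sorted <- df[order(df$Recency), ]
  rev(seq_len(nrow(sorted)))
}

# Scores come back in sorted order (B, C, A), not input order (A, B, C).
scores <- toy_score(customers)

# Attaching them by position to the unsorted input mislabels the rows.
by_position <- data.frame(CustomerID = customers$CustomerID, Score = scores)

# Attaching them to the IDs in the same sorted order gives the intended result.
sorted_ids <- customers$CustomerID[order(customers$Recency)]
by_key <- data.frame(CustomerID = sorted_ids, Score = scores)
```

Comparing `by_position` with `by_key` shows customer A receiving the top score in one and the bottom score in the other, even though the scoring logic itself never changed.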

One thing that I've found helpful is to create a synthetic row index as a unique 'key', by creating Metric 1: Max(1) and then Metric 2: RunningSum(Metric 1). I then use that as the SortBy for inputting into R. You might be able to use CustomerID for the same purpose.

I also noticed quite a few order by statements towards the end of your R code when you're appending the scores. Could one of these be changing the order of the entire data frame? Also, do you need to return the entire df or can you just get a vector for Total Score, or one metric each for the four return variables? I've had luck in splitting out results like that.
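One way to test that idea is to tag each row with its input position, run the scoring, and put the rows back in input order before returning only the Total Score vector. The sketch below is an assumption about how this could be done, not a confirmed fix: `score_in_input_order`, the `.row_id` column, and `toy_scorer` are hypothetical names, and `toy_scorer` only exists to make the example runnable in place of getIndependentScore.

```r
# Wrapper: remember the input position of each row, score, then restore it.
# `score_fn` is any function that returns the data frame (possibly reordered)
# with a Total_Score column added.
score_in_input_order <- function(df, score_fn) {
  df$.row_id <- seq_len(nrow(df))            # hypothetical helper column
  scored <- score_fn(df)
  scored <- scored[order(scored$.row_id), ]  # back to the input order
  scored$Total_Score                         # plain vector, one value per row
}

# Toy stand-in for the real scorer: reorders rows and adds Total_Score.
toy_scorer <- function(df) {
  df <- df[order(-df$Monetary), ]
  df$Total_Score <- rev(seq_len(nrow(df))) * 111
  df
}

toy <- data.frame(CustomerID = c("A", "B", "C"), Monetary = c(50, 200, 100))
total <- score_in_input_order(toy, toy_scorer)
```

Here `total` lines up with rows A, B, C as they were passed in, regardless of how the scorer sorted them along the way.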

I find I get the best results when my return statement to MSTR is the last thing in my R script. I'm not sure if the last three lines are hurting anything, but you might want to take them out if they're extraneous to running the code in MSTR.

One last bit of advice would be to create metrics to test your input/output assumptions explicitly. For example, you could create a simple metric that returns the length of a vector, or the class of an object, or put one of your functions into a separate metric, and make sure that what you think is happening when MSTR runs the code is actually what's happening. This is the only real way that I've found to look under the hood and take a peek at what is going on when MSTR talks to and listens to R. It's time-consuming to debug this way, but keep in mind the great thing about this R Integration is that when you do get it working -- you've just created a powerful, automated way to score your entire customer base! :)
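As a rough sketch of what such test metrics might compute, the helpers below each return one value per input row so the result can sit next to the data being checked. The function names are hypothetical, and how each one is wired into a MicroStrategy metric is not shown here.

```r
# Number of values R actually received, repeated once per row.
diag_length <- function(x) rep(length(x), length(x))

# 1-based position of each value as R received it (reveals the sort order).
diag_position <- function(x) seq_along(x)

# Class of the incoming vector, e.g. "numeric" or "character".
diag_class <- function(x) rep(class(x)[1], length(x))

recency <- c(30, 10, 20)  # made-up input for illustration
len_check   <- diag_length(recency)
pos_check   <- diag_position(recency)
class_check <- diag_class(recency)
```

Displaying the position check alongside CustomerID in a report would show directly which order the rows arrived in.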

I hope these help--please let me know what you find out!

Erik