Marketing-QA159

Marketing-QA159 Online Services

 

1. Business Background

 

About Channel Marketing Solutions (CMS)
 

Channel Marketing Solutions (CMS) provides national and regional brands with an online channel-marketing automation platform that supports the creation, execution and tracking of marketing programs through local partners/retailers.

 

CMS clients typically are national and regional brands who leverage local partners to create local awareness about the brand and increase product sales.
 

Local partners are typically small, independent and usually family-owned businesses selling, exclusively or not, the brand’s products. Some local partners, however, are mid-size businesses that own multiple local stores.

 

The CMS channel-marketing automation platform provides online tools for brands and their local partners to solve the problems inherent in traditional local channel marketing programs:
 

  • Lack of channel insights. A channel marketing program could have hundreds or thousands of partners, each executing multiple independent marketing tactics over a period of time. This decentralization is one of the primary reasons why getting insights out of a channel program has traditionally been so complicated.
  • Unknown marketing spend. Because of this lack of insight, brands do not have a clear understanding of their marketing spend through channel programs.
  • Low partner participation. Most local partners are small, independent and usually family-owned stores that are more concerned about running their business than promoting the brand.
  • Poor local marketing execution. Local partners, in most cases, are not marketing experts, so their marketing development funds are often underused or spent on marketing tactics with poor ROI.
  • Brand compliance and standardization. Local partners usually don’t comply with the brand marketing guidelines, resulting in tactics with distorted or conflicting messages.


 

Through CMS platform, brands can manage their channel programs, standardize and enforce brand compliance, offer a comprehensive set of marketing tactics across different media, provide marketing development funds, consolidate tactic execution and track the marketing spend and results of their programs.
 

CMS’ goal is to simplify local channel marketing execution, optimize marketing spend and accelerate local channel sales, while making it easy for local partners to market the brand, products or services.

​​ 

The Problem

 

As stated previously, local partners are small businesses that are usually more concerned about running their businesses than participating in the brand’s channel programs. To encourage participation, brands provide incentives in the form of marketing development funds (MDF) that local partners can use to pay for their marketing tactics.
 

While this is very effective at engaging local partners, the problem with this approach is that in most cases those funds are underutilized, and in other cases not utilized at all. Some of the reasons for this behavior are:

  • Local partners are not marketing experts.
  • Local partners have gotten used to running the same tactics all the time.
  • Local partners are more comfortable running traditional tactics (e.g. direct mail) than digital tactics because of previous experience.
  • Local partners do not know they have access to MDF.

CMS wants to provide recommendations to local partners using the platform, lowering the barriers they face when deciding where, when and how to use their MDF and get the most out of it.

​​ 

Current Solution
 

The current solution in place involves CMS’ customer service department. Here “marketing assistants” provide suggestions either by phone or the company online support system to local partners.
 

The current solution helps provide guidance to those local partners who proactively reach out to the customer service department. However, it is far from optimal for the following reasons:

  • There is a limited number of marketing assistants, so it is not a scalable solution.
  • Only a few local partners reach out for marketing assistance.
  • Marketing assistants do not have data-driven decision-making tools.
  • It increases the operating cost for CMS and brands.

 

Business Objective
 

The primary goal of this project is to develop an automated recommendation system able to provide insightful suggestions to local partners on where, when and how they should spend their marketing development funds to obtain the best ROI.

​​ 

Secondary Goals
 

In addition to the primary goal, CMS has established the following secondary goals for this project

  1. Increase the platform value for brands by helping them to maximize their marketing spend.
  2. Increase the platform value for local partners, by providing insightful suggestions on how to improve their marketing efforts.
  3. Create a fast, reliable and scalable solution, able to provide suggestions to the thousands of local partners currently using the platform.
  4. Lower the customer service cost for brands.
  5. Lower the operational cost for CMS.

 
Success Criteria
 
The following criteria will be used to measure the success of the implemented solution

     

  • Increase the return on investment for local partners by 10%.
  • Increase the adoption of digital tactics by 10%.
  • Decrease the amount of unused MDF to 25%.
  • Reduce the time the customer service department spends answering calls or tickets related to marketing spend by 50%.
  • Increase the platform’s Net Promoter Score (NPS) by 10 points.

 

Resources
 
Personnel
 
The data mining project is an initiative of the Executive Team to maintain a competitive advantage over other channel-marketing platforms.
 
The owner of the project is the Product Manager, who is responsible for leading and monitoring the project and ensuring its goals are met.

 

A cross-functional team, including members of IT, Client Management, Customer Service, Media Services and Finance, acts as the stakeholder group for the project. This team works closely with the Product Manager on the development of the solution.
 
A group composed of Data Analysts, BI Developers and Software Developers will be in charge of collecting, preparing and processing data to build and validate models that can be used to solve the problem statement.
 
The DevOps team is responsible for provisioning resources and managing the infrastructure required for this project.
 


 
Personnel Resources
 
The following table shows the personnel resources considered for the development and first two years of operation of the project.

 

Resource | Year 1 | Year 2 | Year 3
Project Leader | 1 | 0 | 0
Project Manager | 1 | 0 | 0
Data Miner | 1 | 1 | 1
Data Expert | 1 | 1 | 0
BI Developers | 4 | 2 | 0
Software Developers | 4 | 2 | 1
Dev Ops Engineers | 2 | 1 | 1

 

Internal Data Repositories
 
The CMS platform leverages two main data repositories to keep track of the transactions happening in the platform as well as provide descriptive analytics to local partners and brands. The two main repositories are described below.

 

CMS Transactional DB
 
CMS transactional database uses a Relational Database to keep a record of all transactions performed through the platform. More specifically this database contains information about

     

  • The brand
  • The brand’s marketing programs

  • The brand’s marketing message/design template
  • The brand’s local partners
  • The local partner’s marketing tactics
  • The local partner’s MDF

 

CMS Data Warehouse
 
The data warehouse consolidates the tracking data for all the local partner marketing tactics executed through the platform. Marketing performance metrics from different marketing service providers are retrieved, processed and loaded into the data warehouse every day. These metrics are tied to each marketing tactic to give both local partners and brands an overview of how the tactic is performing.
 
The metrics collected depend on the media used for a particular tactic. Some of the most common metrics include

     

  • Number of calls (call tracking for traditional media)
  • Number of impressions
  • Number of views
  • Number of conversions
  • Total reach

 
External Data Repositories
 
In addition to the internal data sources, CMS might leverage data repositories managed by third parties. The external repositories include
 

The Brand’s System
 
Brands have their own partner management systems, where they keep track of their network of local partners. Usually these management systems have more robust information about the local partner profile and sales information.
 
In most cases brands share the information stored on their systems with CMS through simple data pipes. However, there are exceptions: some brands are not able to integrate with the available data pipes, and others prefer to keep this information confidential.

 

Local Partner POS
 
These are the different Point of Sale systems used by local partners. Given the many different systems in use, it is not currently possible for CMS to access data from these POS systems consistently and reliably.

Infrastructure
 
The CMS platform is a SaaS using cloud services and infrastructure to support its operations. The infrastructure is managed by a dedicated team who is in charge of provisioning, configuring, deploying and monitoring these cloud services.

 

Software
 
Open source software is widely used throughout the organization. The CMS platform handles mission-critical operations using open source technologies such as

     

  • PostgreSQL
  • MongoDB
  • Apache Spark
  • PHP
  • Java

 

The platform currently offers embedded descriptive analytics through Looker and a dedicated data warehouse running on PostgreSQL. Internal power users and brands have access to a self-service analytics powered by the same technologies.
 
H2O has been selected as the tool to support the deployment of predictive analytics. There are no official releases for predictive analytics yet; however, CMS has developed several POCs to integrate H2O into the platform’s architecture.
 

Requirements, Assumptions & Constraints

 
Requirements

     

  1. In order to maintain CMS’s competitive advantage, a functional proof of concept should be completed by the end of 2017. The initial POC will be further improved in subsequent iterations using the feedback provided by beta users.
  2. The release date for the final product is expected by the end of Q1 2018.
  3. The product will be used by local partners, so it should provide timely and relevant suggestions.
  4. The product  should be

 

 

Assumptions

     

  1. To ensure the solution works across different verticals and brands, local partners with different profiles and under different brands will be selected as beta users.
  2. The solution will be able to provide suggestions for those local partners where CMS has access to their profile and sales data.
  3. Local partners will receive suggestions on their marketing spend through the platform.
  4. Different models might be used to accommodate different verticals or brands.

 
Constraints

     

  1. Local partner’s profile and sales data might not be available for some brands using the platform.
  2. The solution should not offer duplicate suggestions. For example, it should not suggest running a TV ad when a TV ad is already running.
  3. The solution must use the current platform architecture and infrastructure.
  4. Response time is critical, therefore the final product must be able to provide suggestions to local partners almost instantaneously.
  5. The product must handle the current platform traffic and seamlessly scale if required.
  6. The product must adhere to CMS security and privacy policies.
     
    Risks & Contingencies
     
    The following risks and corresponding contingencies have been identified for this project
     

    1. CMS’s team lacks experience in data mining projects

    CMS has decided to follow the CRISP-DM model, a proven and popular blueprint for data mining projects, to reduce the risks associated with this project, including the company’s lack of previous experience with data mining projects.

    2. Not all brands are similar

    The CMS platform is used by brands in different verticals, with different budget sizes and very diverse local partners. The final solution will require dedicated models built from the relevant data for each brand.

    To simplify the problem scope, the initial POC will focus on a single brand. The lessons learned from this POC will be used to develop a process for building custom models for different types of brands.

    3. Brands cannot share profile data due to technical constraints

    The CMS team will work with brands to provide a robust set of data pipes that allow brands to share data by different methods (text files, data streams, web services, etc.).

    The brand selected for the POC will be one of the brands already sharing partner’s profile data with CMS.

    4. Brands cannot share profile data due to contractual reasons

    Where possible, a substitute profile will be agreed upon. In cases where this is not possible, the feature will not be activated for these brands and their local partners.

    5. Not enough performance data for some tactics

    Through the CMS platform, brands make a large catalog of marketing tactics available to their network of local partners. Some of these tactics are widely used by the network, while others are not.

    To tackle this situation, two sets of models will be created: one built on the entire data set and the other on only the most popular tactics. During the validation phase the team will determine which models provide the best results.

    A more detailed list of risks and contingencies for this project is provided under the Project Plan section in this document.

     

    Terminology

     
    Business Terminology

       

    • Brand

    A national or regional brand that leverages local partners to create local awareness about the brand and increase product sales.

    • Channel Marketing

    Channel marketing involves finding new partners to help transfer goods from producers to consumers.

    • Local Partner

    A small, independent and usually family-owned business selling, exclusively or not, the brand’s products or services.

    • Local Partner Profile

    A set of attributes, defined and maintained by the brand, used to categorize and sometimes identify the local partner lifetime value.

    • Marketing Budget

    The annual budget allocated by the brand to execute channel marketing programs.

    • Marketing Program

    A collection of marketing tactics following a preset schedule or triggered in response to predefined events. Marketing programs are defined by brands and they are used by local partners to acquire new customers and/or keep existing ones.

    • Marketing Spend

    The monetary amount used to pay for the execution of marketing tactics.

    • Marketing Tactic

    The method used to promote the goods and services of a brand with the goal of increasing sales and maintaining a competitive product.

    • Marketing Development Funds

    Market development funds or MDF are used in an indirect sales channel where funds are made available by a manufacturer or brand to help affiliates, channel partners, resellers, VARs, or distributors, etc. sell its products and create local awareness about the national brand.

    • Media Type

    All the modes of advertisement that are used to reach out to the consumer are called media channels, e.g., print media, radio, television, and internet.

    • Tactic Cost

    The tactic cost is the total amount paid for the execution of a marketing tactic.

    • Tactic Creative (Template)

    A template is a predefined creative design or message provided by brands to their local partners. These templates can be customized with localized messages to fine tune the tactic to the local market.

    • Tactic Performance

    A series of metrics providing insights on the execution of the marketing tactic. Performance metrics are used to measure the effectiveness and the ROI of the marketing tactic.

     

    Marketing Performance Terminology

       

    • Clickthrough Rate (CTR)

    CTR is the number of clicks that pay-per-click ads receive divided by the total number of impressions served. The higher the CTR, the lower the pay-per-click costs are.

    • Cost Per View (CPV)

    CPV is the total cost of the video ads divided by the number of views they receive.

    • Cost Per Click (CPC)

    CPC is the total cost of the pay-per-click ads divided by the number of clicks they receive.

    • Pay Per Call (PPC)

    Pay Per Call is the total cost of the ads divided by the number of calls they generate.

    • Cost Per Lead (CPL)

    CPL is the total cost of a marketing tactic divided by the number of leads it generates, giving the business owner or marketer insight into how cost-effective the tactic is.

    • Conversion Rate (CVR)

    This is the percentage of users who take the desired action after viewing an ad.

    • Return On Investment (ROI)

    ROI is the net gain attributable to a marketing tactic (the revenue from the new paying customers or leads it produces, minus the tactic cost) divided by the tactic cost.
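
As an illustration only, the cost-based metrics in this list could be computed from the raw counts stored per tactic along the following lines. This is a minimal sketch using the standard ratio definitions; the function names, the parameter names and the attributed revenue figure are assumptions made for the example and are not taken from the CMS data warehouse.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def tactic_metrics(cost, impressions, clicks, views, calls, leads, revenue):
    """Compute ratio metrics for one marketing tactic.

    All inputs are hypothetical per-tactic totals; `revenue` is the revenue
    attributed to the tactic, which the platform may or may not have.
    """
    return {
        "ctr": safe_div(clicks, impressions),     # clicks per impression
        "cpc": safe_div(cost, clicks),            # cost per click
        "cpv": safe_div(cost, views),             # cost per view
        "cost_per_call": safe_div(cost, calls),   # cost per tracked call
        "cpl": safe_div(cost, leads),             # cost per lead
        "roi": safe_div(revenue - cost, cost),    # net gain per unit of cost
    }


example = tactic_metrics(
    cost=500.0, impressions=10_000, clicks=250,
    views=2_000, calls=20, leads=25, revenue=1_500.0,
)
print(example)
```

Returning None for a zero denominator keeps tactics with no clicks, views or calls from raising an error; how such tactics should be ranked is left open here.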

     

    Technical Terminology

    • Affinity Index

    The affinity index is an indicator in media that shows the relative weight of a target audience compared to the total population for a specific program or tactic.

    • Data Modeling

    Data modeling is the process of creating a data model for an information system by applying data mining techniques.

    • Classification

    A data modeling process that attempts to predict, for each individual in a population, which class that individual belongs to.

    • Clustering

    A data modeling process that attempts to group a set of individuals in such a way that individuals in the same group are more similar (in some sense or another) to each other than to those in other groups (clusters).

     

    Cost & Benefits
     
    The following table lists the costs and benefits associated with this project, including the first three years of operations
     

    Item | Year 1 | Year 2 | Year 3
    Benefits
    Customer support reduction | $100,000 | $250,000 | $350,000
    Increase transactions in platform | $700,000 | $1,500,000 | $2,000,000
    Intangible benefits | $0 | $500,000 | $500,000
    Total Benefits | $800,000 | $2,250,000 | $2,850,000
    Costs
    Development | $915,000 | $0 | $0
    Operational | $515,000 | $550,000 | $475,000
    Software & Equipment | $50,000 | $55,000 | $60,000
    Training | $100,000 | $75,000 | $50,000
    Total Cost | $1,930,000 | $680,000 | $585,000
    Cost-Benefit
    Discount Factor (15% p.a.) | 100% | 87% | 76%
    PV Benefits | $800,000 | $1,957,500 | $2,166,000
    PV Costs | ($1,930,000) | ($591,600) | ($444,600)
    Net PV (Benefits+Costs) | ($1,130,000) | $1,365,900 | $1,721,400
    Cumulative PV Benefits | $800,000 | $2,757,500 | $4,923,500
    Cumulative PV Costs | ($1,930,000) | ($2,521,600) | ($2,966,200)
    Cumulative Net PV | ($1,130,000) | $235,900 | $1,957,300
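
The present-value rows of the table above can be re-derived from the total benefits, the total costs and the discount factors. The sketch below is one way to do that check. It assumes the rounded discount percentages shown in the table (87% and 76%, which approximate 1/1.15 and 1/1.15²) rather than the exact factors, and the variable and function names are illustrative.

```python
from itertools import accumulate

# Totals copied from the cost-benefit table, one entry per year.
TOTAL_BENEFITS = [800_000, 2_250_000, 2_850_000]
TOTAL_COSTS = [1_930_000, 680_000, 585_000]
# Rounded discount factors as printed in the table, in percent.
DISCOUNT_PCT = [100, 87, 76]


def present_values(amounts, discount_pct):
    """Discount each yearly amount by that year's percentage factor."""
    return [amount * pct // 100 for amount, pct in zip(amounts, discount_pct)]


def cost_benefit(benefits, costs, discount_pct):
    """Return the PV rows of the table as lists of whole dollars."""
    pv_benefits = present_values(benefits, discount_pct)
    pv_costs = present_values(costs, discount_pct)
    net_pv = [b - c for b, c in zip(pv_benefits, pv_costs)]
    return {
        "pv_benefits": pv_benefits,
        "pv_costs": pv_costs,
        "net_pv": net_pv,
        "cumulative_net_pv": list(accumulate(net_pv)),
    }


rows = cost_benefit(TOTAL_BENEFITS, TOTAL_COSTS, DISCOUNT_PCT)
for name, values in rows.items():
    print(name, values)
```

The first year in which the cumulative net PV turns positive is the payback year under these assumptions.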

     

    Data Mining Goals
     
    The main goal is to leverage the data collected in the CMS platform to generate a data model capable of accurately calculating the affinity score between a local partner and the top-performing marketing tactics that would result in the best ROI for the brand’s marketing spend.

     

    Data Mining Process
     
    The solution requires segmenting a brand’s network of local partners into groups that share similar profiles, using clustering algorithms.

    The resulting groups, in combination with the local partners’ profiles, are used to create a model that can predict the group for a new local partner joining the platform. A classification algorithm will be used for this.

    Historic tactic performance data is used to build a list of the top-performing tactics that have been used by each local partner group, and an association rule mining algorithm then generates an affinity score between the local partner groups and the marketing tactics.

    The latter model is used to recommend marketing tactics to local partners in real time, based on the performance of such tactics for other partners in the same group.
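
To make the last two steps more concrete, the sketch below scores group-to-tactic affinity with lift, a common association-rule measure, and ranks tactics for a given group. It is an assumption-laden illustration and not the project's chosen algorithm: it evaluates only single-antecedent rules of the form group → tactic instead of running a full miner such as Apriori, and the record layout, group labels and tactic names are made up.

```python
from collections import Counter


def group_tactic_lift(records):
    """Compute lift for every (group, tactic) pair.

    `records` is a list of (group, tactic) tuples, one per top-performing
    tactic execution (a hypothetical layout).
    lift = P(tactic | group) / P(tactic); values above 1 mean the tactic is
    over-represented in that group.
    """
    records = list(records)
    total = len(records)
    group_counts = Counter(group for group, _ in records)
    tactic_counts = Counter(tactic for _, tactic in records)
    pair_counts = Counter(records)
    lift = {}
    for (group, tactic), count in pair_counts.items():
        confidence = count / group_counts[group]
        tactic_support = tactic_counts[tactic] / total
        lift[(group, tactic)] = confidence / tactic_support
    return lift


def recommend(group, lift, top_n=3):
    """Return up to `top_n` tactics for `group`, highest lift first."""
    scored = [(tactic, score) for (g, tactic), score in lift.items() if g == group]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [tactic for tactic, _ in scored[:top_n]]


history = [
    ("A", "email"), ("A", "email"), ("A", "tv"),
    ("B", "tv"), ("B", "tv"), ("B", "email"),
]
scores = group_tactic_lift(history)
print(recommend("A", scores))
```

Because the scores can be precomputed per group, serving a recommendation reduces to a lookup once the classifier has assigned the partner to a group.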
     
    Data Mining Success Criteria

       

    • The target number of groups (clusters) resulting from applying the clustering algorithm to the local partner population is between four and eight.
    • The target purity of the clusters is 80%. Additional techniques using contingency tables will be applied to further evaluate the clusters.
    • The target accuracy for the classification algorithm in charge of determining the group (cluster) of new local partners is 85%. Additionally, RMSE and Gain & Lift charts will be used to further evaluate the model.
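
Cluster purity, as used in the criteria above, can be computed from a contingency table of cluster assignments against a reference labelling. The sketch below assumes such reference labels exist (for example a segmentation maintained by the brand, which is a hypothetical input here) and uses illustrative names throughout.

```python
from collections import Counter, defaultdict


def contingency_table(cluster_ids, reference_labels):
    """Count reference labels per cluster: {cluster: Counter({label: n})}."""
    if len(cluster_ids) != len(reference_labels):
        raise ValueError("cluster_ids and reference_labels must have equal length")
    table = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, reference_labels):
        table[cluster][label] += 1
    return table


def cluster_purity(cluster_ids, reference_labels):
    """Fraction of items that carry the majority label of their cluster."""
    if not cluster_ids:
        raise ValueError("at least one item is required")
    table = contingency_table(cluster_ids, reference_labels)
    majority_total = sum(max(counts.values()) for counts in table.values())
    return majority_total / len(cluster_ids)


clusters = [0, 0, 0, 1, 1, 1]
labels = ["urban", "urban", "rural", "rural", "rural", "rural"]
purity = cluster_purity(clusters, labels)
print(f"purity = {purity:.2f}, meets 80% target: {purity >= 0.80}")
```

Purity tends to rise as the number of clusters grows, which is one reason to read it together with the four-to-eight cluster target rather than on its own.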

     

    Project Plan
     
    The Channel Marketing Solutions (CMS) Project Plan will provide a definition of the project, including the project’s goals and objectives. Additionally, the Plan will serve as an agreement between the following parties: Project Sponsor, Steering Committee, Project Manager, Project Team, and other personnel associated with and/or affected by the project.
     
    The Project Plan defines the following

       

    • Project purpose
    • Business and project goals and objectives
    • Scope and expectations
    • Roles and responsibilities
    • Assumptions and constraints
    • Project management approach
    • Ground rules for the project
    • Project budget
    • Project timeline
    • The conceptual design of new technology

     

    Project Approach
     
    The project will be rolled out in a phased approach, as listed below
     

    • Phase I:     Assessment
    • Phase II:    Data Mining, Analysis, and Testing
    • Phase III:    Strategy and System Implementation
    • Phase IV:    Training and Education

     
    Phase I: Assessment
     
    First, CMS will utilize resources from internal and external data repositories to understand and assess the current data environment for existing brands.
     
    Internal repositories include the following

       

    • CMS Transactional Database
    • CMS Data Warehouse

     
    External repositories include the following

       

    • The Brand’s System
    • Local Partner POS

     
    Phase II: Data Mining, Analysis, and Testing
     
    CMS will research a variety of data modeling techniques and strategies and apply them to existing brand platforms. The new initiatives will be tested and the outputs will be analyzed according to the metrics and success criteria defined in the business plan.

     

    Phase III: Strategy and System Implementation
     
    Successful strategies will be implemented into the proof of concept, or POC. Metrics will be pulled from the working dataset to confirm success criteria, and a visualization to showcase the effective information output will be created.

     

    Phase IV: Training and Education
     
    CMS will train and educate brands on the implemented solution and fill in any information gaps on how new data and marketing concepts provide effective and potentially cost-saving recommendations.
     
    ​​

    Goals And Objectives

     
    Business Goals and Objectives
     
    The business goals and objectives for this project will focus on developing an automated recommendation system able to provide insightful suggestions to local partners on where, when and how they should spend their marketing development funds to obtain the best ROI.

     

    Project Goals and Objectives
     
    In addition to the primary goal, CMS has established the following secondary goals for this project

       

    1. Increase the platform value for brands by helping them to maximize their marketing spend.
    2. Increase the platform value for local partners, by providing insightful suggestions on how to improve their marketing efforts.
    3. Create a fast, reliable and scalable solution, able to provide suggestions to the thousands of local partners currently using the platform.
    4. Lower the customer service cost for brands.
    5. Lower the operational cost for CMS.

     
    Project Scope
     
    Scope Definition
     
    The Project will incorporate effective data modeling techniques to the local partners’ current platforms to provide directional data and a scalable solution to optimize their marketing dollars.
     
    A complete and functional proof of concept will be developed as the deliverable to optimize the current marketing platform.

     

    Items Beyond Scope
     
    The project does not include the following

       

    • Acquisition of new technology or infrastructure
    • Updates to existing marketing tactics

     
    Projected Budget
     
    The table below outlines the cost information and projected budget associated with the project, including the first 3 years of operations.

     

    Costs | Year 1 | Year 2 | Year 3
    Development | $915,000 | $0 | $0
    Operational | $515,000 | $550,000 | $475,000
    Software & Equipment | $50,000 | $55,000 | $60,000
    Training | $100,000 | $75,000 | $50,000
    Total Budget | $1,930,000 | $680,000 | $585,000

     

    Risk Assessment
     
    The initial Risk Assessment (below) attempts to identify, characterize and prioritize the risks that can be identified prior to the start of the project, and to document a mitigation approach for each.

    The Risk Assessment will be continuously monitored and updated throughout the life of the project, and open to amendment by the Product Manager.

    Because mitigation approaches must be agreed upon by project leadership (based on the assessed impact of the risk, the project’s ability to accept the risk, and the feasibility of mitigating the risk), it is necessary to allocate time into each Steering Committee meeting, dedicated to identifying new risks and discussing mitigation strategies.

    The Product Manager will convey amendments and recommended contingencies to the Steering Committee monthly, or more frequently, as conditions may warrant.
     
    Initial Project Risk Assessment

     

    Risk | Risk Level (L/M/H) | Likelihood of Event | Mitigation Strategy
    Project Size
    Estimated Project Schedule | H: 3 months | Certainty | Created comprehensive project timeline with frequent baseline reviews
    Project Definition
    Narrow knowledge level of users | M: Knowledgeable of user area only | Likely | Assigned Project Manager(s) to assess global implications
    Project Scope Creep | L: Scope generally defined, subject to revision | Unlikely | Scope initially defined in project plan, reviewed monthly by Project Manager and Steering Committee to prevent undetected scope creep
    CMS project deliverables unclear | L: Well defined | Unlikely | Included in project plan, subject to amendment
    Cost estimates unrealistic | L: Thoroughly discussed with local partners | Unlikely | Included in project plan, subject to amendment as new details regarding project scope are revealed
    Timeline estimates unrealistic | M: Timeline assumes no derailment | Somewhat likely | Timeline reviewed monthly by three groups (Product Manager and Steering Committee) to prevent undetected timeline departures
    Local partners not well versed in marketing strategies | L: Team well versed in business operations impacted by technology | Unlikely | Product Manager and project team members to identify knowledge gaps and provide education and training, as necessary
    Project Leadership
    Steering Committee existence | L: Identified and enthusiastic | Unlikely | Frequently seek feedback to ensure continued support
    Absence of commitment level/Attitude of management | L: Understands value & supports project | Unlikely | Frequently seek feedback to ensure continued support
    Absence of commitment level/Attitude of users | L: Understands value & supports project | Unlikely | Frequently seek feedback to ensure continued support
    Project Team Availability | M: Distributed team makes availability questionable | Somewhat likely | Continuous review of project momentum by all levels. Consultant to identify any impacts caused by unavailability. If necessary, increase commitment by participants to full time status
    Physical location of team prevents effective management | M: Team is dispersed among several sites | Likely | Use of Intranet project website, comprehensive Communications Plan
    Number of Times Team Has Done Prior Work with Partners Creates Foreign Relationship | L: Existing local partners | Unlikely | The POC is to provide enhancements based on the platforms of existing local partners
    Team lacks experience in data mining projects | L: Conceptual understanding; following CRISP-DM model | Somewhat likely | CMS has decided to follow the CRISP-DM model, a proven and popular blueprint for data mining projects, to reduce the risks associated with this project, including the company’s lack of previous experience with data mining projects.
    Not all brands are similar | H: CMS has diverse local partners | Certainty | The initial POC will focus on a single brand. The lessons learned from this POC will be used to develop a process for building custom models for different types of brands.
    Brands cannot share profile data due to technical constraints | M: Brands will use different data methods | Certainty | The brand selected for the POC will be one of the brands already sharing partner’s profile data with CMS.
    Brands cannot share profile data due to contractual reasons | L: Understanding of contract restrictions by both parties | Unlikely | Where possible, a substitute profile will be agreed upon. In cases where this is not possible, the feature will not be activated for these brands and their local partners.
    Not enough performance data for some tactics | M: Varies | Somewhat likely | Through the CMS platform, brands make a large catalog of marketing tactics available to their network of local partners. Some of these tactics are widely used by the network, while others are not.

     

    Project Management Approach
     
    Project Roles and Responsibilities

     

    Role Responsibilities
    Project Sponsor
    • Ultimate decision-maker and tie-breaker
    • Provide project oversight and guidance
    • Review/approve some project elements
    Steering Committee
    • A cross-functional team that includes members of IT, Client Management, Customer Service, Media Services and Finance
    • Commits department resources
    • Approves major funding and resource allocation strategies, and significant changes to funding/resource allocation
    • Resolves conflicts and issues
    • Provides direction and feedback to the Product Manager around solution development
    • Reviews project deliverables
    Project Manager
    • Leads and monitors the project and ensures its goals are met
    • Serves as liaison to the Steering Committee
    • Receives guidance from the Steering Committee
    • Supervises the project team
    • Provides overall project direction
    • Handles problem resolution
    • Manages the project budget
    Project Team:

    Data Analysts
    BI Developers
    Software Developers

    • Understand the user needs and business processes of their area
    • Communicate project goals, status and progress throughout the project to personnel in their area
    • Collect, prepare, and process data
    • Provide knowledge and recommendations
    • Helps identify and remove project barriers
    • Assure quality of products that will meet the project goals and objectives
    • Identify risks and issues and help in resolutions
    DevOps Engineers
    • Provision resources and manage the infrastructure required for this project

     
    Issue Management
     
    The information contained within the Project Plan will likely change as the project progresses. While change is both certain and required, it is important to note that any changes to the Project Plan will impact at least one of three critical success factors: Available Time, Available Resources (Financial, Personnel), or Project Quality. Decisions to modify the Project Plan (including project scope and resources) should be coordinated using the following process:

       

    • Step 1:  As soon as a change which impacts project scope, schedule, staffing or spending is identified, the Project Manager will document the issue.
    • Step 2: The Project Manager will review the change and determine the associated impact to the project and will forward the issue, along with a recommendation, to the Steering Committee for review and decision.
    • Step 3: Upon receipt, the Steering Committee should reach a consensus opinion on whether to approve, reject or modify the request based upon the information contained within the project website, the Project Manager’s recommendation and their own judgment. Should the Steering Committee be unable to reach consensus on the approval or denial of a change, the issue will be forwarded to the Project Sponsor, with a written summation of the issue, for ultimate resolution.
    • Step 4: If required under the decision matrix or due to a lack of consensus, the Project Sponsor shall review the issue(s) and render a final decision on the approval or denial of a change.
    • Step 5: Following an approval or denial (by the Steering Committee or Project Sponsor), the Project Manager will notify the original requestor of the action taken. There is no appeal process.

     

    2. Data Understanding

     

    Initial Data Collection
     
    The CMS platform leverages two main data repositories to keep track of the transactions happening in the platform as well as provide descriptive analytics to local partners and brands. An ERD of each internal data repository is displayed below.

     

    Transactional Database
     
    The transactional database is the main repository of transactional information such as marketing programs, and marketing tactics executed by local partners. This repository will be primarily used for getting programs and tactics that will be suggested to the local partners.
     
    The ERD of the transactional database is shown in Fig 1.
     
    Fig 1. CMS Transactional DB ERD

     

    Data Warehouse
     
    CMS’s data warehouse is used to aggregate performance and tracking data provided by third-party service providers such as USPS, Salesforce Marketing Cloud, Google, Bing, Yelp, YellowPages, Facebook and Twitter, among others. This repository will be used to gather performance information on the tactics and use it to identify the top-performing tactics.
     
    The ERD of the data warehouse is shown in Fig 2.

    Fig 2. CMS Data Warehouse ERD
     
    External Sources
     
    In addition to the internal data sources, CMS might leverage data repositories managed by third parties. The external repositories include the brand’s system and the local partner’s POS. A brand’s system stores useful data such as sales information or local partner profiles, as seen below as a CSV file.

     

     

    The information gathered through these sources is then extracted into a single file type, either Excel or CSV format (figure 3), where data is then scrubbed and massaged to serve as the baseline for data mining techniques like data classification or clustering.
     
    The cleansed and processed data is loaded into the data warehouse from where it will be incorporated by descriptive and predictive modeling techniques to gain insights and draw conclusions.

    ​​ 

    Data Description
     
    The data analyzed under the CRISP-DM model will be taken from historical datasets in the transactional database. The following data is required to compute the affinity score between a local partner and the top-performing marketing tactics, i.e. those that would yield the best ROI for the brand’s marketing spend:

    • Local Partner’s Demographic and Profile Information
    • Local Partner’s Products
    • Target Market Segment
    • Sales Figures Before Branding
    • Sales Figures After Branding
    • Marketing Tactics Used
    • Advertising Budgets

     

    All the data will be collected in CSV format to support data exploration and examination and to improve its quality. The data will be analyzed to find the marketing tactics that provide the maximum return to the brand’s local partners. Each data element supports the project objectives: the local partners’ profile, demographic, product and target segment data help cluster local partners with similar profiles, whereas the sales figures before and after the run of marketing tactics help calculate the improvement attributable to the solutions provided by CMS. The data on marketing tactics used helps determine the impact of the various marketing solutions on local partners. Additionally, cluster analysis will be performed to group local partners sharing common profiles and similar results so that the algorithm can recommend similar marketing tactics to new local partners looking for marketing solutions. The flowchart below shows the entire process.

     

    Data Exploration
     
    The next step in this section is Data Exploration. In this stage we will analyze whether all the profiles and transactions meet our eligibility requirements. Only the profiles and transactions that meet these eligibility criteria will be considered. The first criterion is that a local partner must have a monthly budget greater than $1000; anything less will not be considered for analysis. In addition, we will consider only those transactions where the amount spent is greater than 1000, i.e. any campaign with a spend of less than $100 will not be considered for analysis. The data will then be clustered using the K-means algorithm, which is widely deployed because of its ability to identify distinct clusters in a data set. Finally, we will use decision trees, neural networks, logistic regression, polynomial stepwise regression and support vector machines for detailed modeling.
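
    The eligibility screening described above could be sketched as follows. The field names (`partner_id`, `monthly_budget`, `amount_spent`) and the sample rows are hypothetical, and both thresholds are deliberately left as required parameters so that the values agreed for the project can be supplied.

```python
def filter_eligible(profiles, transactions, *, min_budget, min_spend):
    """Return the profiles and transactions that pass both eligibility thresholds."""
    # Keep partners whose monthly budget is strictly above the threshold.
    eligible_profiles = [p for p in profiles if p["monthly_budget"] > min_budget]
    eligible_ids = {p["partner_id"] for p in eligible_profiles}
    # Keep transactions of eligible partners whose spend is strictly above the threshold.
    eligible_transactions = [
        t for t in transactions
        if t["partner_id"] in eligible_ids and t["amount_spent"] > min_spend
    ]
    return eligible_profiles, eligible_transactions


# Hypothetical sample rows (not taken from the CMS data).
profiles = [
    {"partner_id": "P1", "monthly_budget": 2500},
    {"partner_id": "P2", "monthly_budget": 800},
]
transactions = [
    {"partner_id": "P1", "amount_spent": 1500},
    {"partner_id": "P1", "amount_spent": 90},
    {"partner_id": "P2", "amount_spent": 4000},
]
kept_profiles, kept_transactions = filter_eligible(
    profiles, transactions, min_budget=1000, min_spend=1000
)
```

    In this sketch a transaction is dropped when its partner is ineligible, even if the transaction itself is above the spend threshold.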

     
    Data Verification & Quality
     
    Once we explore the data, the next step is to check the data for errors and inconsistencies. In this stage we will do the following checks to ensure accuracy, completeness and reliability of data.

       

    • Check whether all fields in the performance data sets are filled. Any empty field must be filled with the average performance value for that field.
    • Check for missing profile data for each local partner. Any missing value must be filled with the value of the same field from a competitor’s dataset.
    • Check for outliers in transactions, i.e. inconsistencies such as a sudden increase in budget. Outliers will be rechecked to ensure their accuracy.

     
    Once all the checks are done, we will verify that the data sets are sound, contain the required number of transactions and cover a long enough period for analysis. This helps ensure that the analysis is done for the relevant periods with enough transactions and profiles.
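
    Two of the checks above, filling empty performance fields with the field average and flagging suspicious budget values for rechecking, might look like the sketch below. The field name `clicks`, the sample values and the outlier rule (a value more than `factor` times the median) are assumptions made for illustration; the plan does not prescribe a specific outlier test.

```python
from statistics import mean, median


def impute_with_mean(rows, field):
    """Fill missing (None) values of `field` with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    if not observed:
        return rows  # nothing to base an average on
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def flag_outliers(values, factor=3.0):
    """Return the indexes of values above factor * median; these are rechecked, not dropped."""
    mid = median(values)
    return [i for i, v in enumerate(values) if mid > 0 and v > factor * mid]


# Hypothetical performance rows and monthly budgets.
performance = [{"clicks": 10}, {"clicks": None}, {"clicks": 20}]
filled = impute_with_mean(performance, "clicks")
suspicious = flag_outliers([100, 120, 110, 900])
```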

     

    3. Data Preparation

     

    Files extracted from the database will be utilized as directional data for the demo. Specifically, a profiles file will outline the customer accounts and their associated spend. This will be coupled with the transaction data file, which tracks customers’ specific branding, media, and the associated program, along with other cost information and descriptive data. The two files are associated through the Account ID in the profiles data file and the Partner ID in the transactions data file. The integration of these data sets will create the connection needed to track the effectiveness of the different marketing tactics utilized for different brands and customers.

     

    Profiles CSV

     

     

    Transaction CSV

     

    Data will be reviewed for accuracy and completeness, ensuring all accounts include their associated transaction line items. For the purpose of the demo, a sampling of customers will be utilized rather than total population. A random sampling method will be used as part of this effort. Number and text formats will be screened for consistency – formatting for dates, converting decimals to percentages as appropriate, numeric texts converted to number, and so forth. The transaction file also includes order statuses of Cancelled, Complete, and Production. Any cancelled orders will be removed prior to sampling to ensure we are purely looking at a complete dataset of launched programs. Essentially, the two data files will be merged, linked by the Account ID/Partner ID. The new file will retain all metadata and be ordered by Account/Partner and then by Brand.

     

    3.3. ​ ​Data​ ​Cleansing

     

    Both datasets are reviewed for accuracy and completeness, ensuring all accounts include their associated transaction line items. For the purpose of the proof of concept, a sampling of customers is utilized rather than the total population. A random sampling method is used as part of this effort.

     
    Number​ ​and​ ​text​ ​formats​ ​are​ ​screened​ ​for​ ​consistency​ ​–​ ​formatting​ ​for​ ​dates, ​ ​converting decimals​ ​to​ ​percentages​ ​as​ ​appropriate, ​ ​numeric​ ​texts​ ​converted​ ​to​ ​number, ​ ​and​ ​so​ ​forth.  Date​ ​fields​ ​are​ ​broken​ ​down​ ​into​ ​year, ​ ​month, ​ ​day, ​ ​day​ ​of​ ​the​ ​week, ​ ​day​ ​of​ ​the​ ​month​ ​and hour​ ​of​ ​the​ ​day.
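
    As an illustration of the date breakdown, the sketch below expands one timestamp into its components. The input format string and the output key names are assumptions, and the day of the week is encoded as an ISO number rather than a name.

```python
from datetime import datetime


def expand_date(value, fmt="%Y-%m-%d %H:%M"):
    """Split a timestamp string into the calendar components used as features."""
    moment = datetime.strptime(value, fmt)
    return {
        "year": moment.year,
        "month": moment.month,
        "day_of_month": moment.day,
        "day_of_week": moment.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "hour": moment.hour,
    }


parts = expand_date("2017-03-15 14:30")
```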

     

    The​ ​transaction​ ​file​ ​also​ ​includes​ ​order​ ​statuses​ ​of​ ​Cancelled, ​ ​Complete, ​ ​and​ ​Production.  Any​ ​cancelled​ ​orders​ ​will​ ​be​ ​removed​ ​prior​ ​to​ ​sampling​ ​to​ ​ensure​ ​we​ ​are​ ​purely​ ​looking​ ​at a​ ​complete​ ​dataset​ ​of​ ​launched​ ​programs.

     
    Essentially, ​ ​the​ ​two​ ​data​ ​files​ ​will​ ​be​ ​merged, ​​linked​ ​by​ ​the​ ​Account​ ​ID/Partner​ ​ID.​ ​The​ ​new file​ ​will​ ​retain​ ​all​ ​metadata​ ​and​ ​be​ ​ordered​ ​by​ ​Account/Partner​ ​and​ ​then​ ​by​ ​Brand.
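
    The cleansing and merge steps described in this section could be sketched as below. The column names `Status`, `Brand` and `Spend` are assumptions about the two CSV files; only `Account ID` and `Partner ID` are named in this plan.

```python
import random


def prepare_dataset(profiles, transactions, sample_size, seed=0):
    """Drop cancelled orders, sample accounts, join the two files and sort the result."""
    # 1. Remove cancelled orders before sampling.
    launched = [t for t in transactions if t["Status"] != "Cancelled"]
    # 2. Draw a random sample of accounts (seeded so the sample is repeatable).
    rng = random.Random(seed)
    accounts = sorted(p["Account ID"] for p in profiles)
    sampled = set(rng.sample(accounts, min(sample_size, len(accounts))))
    # 3. Join on Account ID (profiles) = Partner ID (transactions).
    by_account = {p["Account ID"]: p for p in profiles if p["Account ID"] in sampled}
    merged = [
        {**by_account[t["Partner ID"]], **t}
        for t in launched
        if t["Partner ID"] in by_account
    ]
    # 4. Order by Account/Partner and then by Brand.
    merged.sort(key=lambda row: (row["Partner ID"], row["Brand"]))
    return merged


# Hypothetical rows standing in for the Profiles and Transaction CSV files.
profiles = [{"Account ID": "A2", "Spend": 900}, {"Account ID": "A1", "Spend": 1200}]
transactions = [
    {"Partner ID": "A2", "Brand": "BrandX", "Status": "Complete"},
    {"Partner ID": "A1", "Brand": "BrandY", "Status": "Cancelled"},
    {"Partner ID": "A1", "Brand": "BrandX", "Status": "Production"},
]
dataset = prepare_dataset(profiles, transactions, sample_size=2)
```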

     

    4.​ ​Modeling

     
    The​ ​data​ ​mining​ ​solution​ ​would​ ​require​ ​the​ ​following​ ​modeling​ ​techniques:

     

    1. Clustering
    2. Classification
    3. Association​ ​Rules

     

    4.1.​ ​Clustering

     

    Clustering is an unsupervised learning technique used to analyze a population and divide it into a number of groups such that points are more similar to other points in the same group than to points in other groups.

     

    Two​ ​well​ ​known​ ​hard​ ​clustering​ ​algorithms​ ​(K-means​ ​and​ ​Hierarchical​ ​Clustering)​ ​are​ ​used  to​ ​analyze​ ​the​ ​profile​ ​dataset​ ​to​ ​find​ ​commonalities​ ​between​ ​local​ ​partners​ ​and​ ​group  them​ ​into​ ​well​ ​defined​ ​groups.

     

    Both​ ​clustering​ ​algorithms​ ​have​ ​a​ ​target​ ​of​ ​five​ ​groups,​ ​as​ ​specified​ ​by​ ​Channel​ ​Marketing  Solutions.

     

    4.1.1​ ​K-Means

     

    This is a very popular algorithm because of its simplicity and effectiveness. The algorithm expects a target number of clusters (k) and the training data as input. The entire partner profile dataset, containing only real-valued features, is used as training data.

     

    At runtime, k-means selects k random points as the initial centroids of the clusters. It then repeats the following two steps iteratively until the cluster assignments stop changing:

     

    1. Each point in the dataset is assigned to the cluster represented by the closest centroid.

    2. For each cluster a new centroid is selected using the mean of all points in the cluster.
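
    The two steps above can be sketched in plain Python as follows. This is a minimal illustration with random initial centroids, not the Orange implementation used in the project; the sample points, the iteration cap and the seed are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Plain k-means: returns the final centroids and the cluster index of each point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k random points as initial centroids
    assignment = None
    for _ in range(iterations):
        # Step 1: assign each point to the cluster with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        # Step 2: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Two hypothetical, well-separated groups of partner profiles.
points = [(0, 0), (0, 1), (1, 0), (20, 20), (20, 21), (21, 20)]
centroids, assignment = k_means(points, k=2)
```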

     

    One of the downsides of K-means is that it can produce different clusters depending on the selection of the initial centroids. For this reason the dataset is analyzed ten times using the K-means++ variant, which still selects the initial centroids at random, but each new centroid is chosen with probability proportional to its squared distance from the nearest centroid already selected.
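
    A minimal sketch of the K-means++ seeding step is shown below, assuming the dataset contains at least k distinct points. The function name and the sample points are illustrative only.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means_pp_init(points, k, seed=0):
    """Pick k initial centroids using K-means++ style weighted sampling."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # the first centroid is uniformly random
    while len(centroids) < k:
        # Weight every point by its squared distance to the nearest chosen centroid,
        # so points far from the existing centroids are more likely to be picked.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


# Hypothetical data with only two distinct locations.
seeds = k_means_pp_init([(0, 0), (0, 0), (5, 5)], k=2)
```

    Because a point that coincides with an already selected centroid has weight zero, the two seeds in this example always land on different locations.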

     

    For each run, the resulting clusters are inspected for compactness and separation. The Silhouette Coefficient is used to compare the ten models.
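
    For reference, the Silhouette Coefficient of a clustering can be computed as in the sketch below, using Euclidean distance and assuming at least two clusters. For each point, a is the mean distance to the other points of its own cluster, b is the mean distance to the points of the nearest other cluster, and the score is (b - a) / max(a, b). The sample points are hypothetical.

```python
from math import dist


def silhouette_coefficient(points, labels):
    """Mean silhouette score over all points (requires at least two clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        # a: mean distance to the other members (the point itself contributes 0).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(points, [0, 0, 1, 1])
bad = silhouette_coefficient(points, [0, 1, 0, 1])
```

    Scores close to 1 indicate compact, well-separated clusters, while negative scores indicate points that sit closer to another cluster than to their own.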

     

    4.1.2 Hierarchical Agglomerative Clustering

     

    Hierarchical clustering builds a hierarchy of clusters either top-down (divisive) or bottom-up (agglomerative). In the bottom-up approach used here, each point in the dataset starts as its own cluster, and the two nearest clusters are merged repeatedly until all clusters have been merged into one.

     

    The algorithm is trained using the complete partner profile dataset. HAC does not require the target number of clusters to be specified in advance and, in contrast to k-means, produces the same results on every run.

     

    A​ ​dendrogram​ ​is​ ​used​ ​to​ ​visualize​ ​how​ ​clusters​ ​are​ ​merged​ ​as​ ​well​ ​as​ ​to​ ​decide​ ​when​ ​to stop​ ​merging​ ​clusters​ ​and​ ​keep​ ​the​ ​most​ ​relevant​ ​ones.

     

    The decision to merge clusters is based on a distance function. The algorithm is executed once for each of the following functions measuring the distance between two clusters:

     

    1. Euclidean​ ​distance
    2. Squared​ ​Euclidean​ ​distance
    3. Manhattan​ ​distance
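
    The bottom-up merging and the three distance functions can be sketched as follows. The linkage rule (single linkage, i.e. the distance between the closest pair of points of two clusters) is an assumption, since the plan does not state which linkage is used; the sample points are hypothetical.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def agglomerate(points, metric, target_clusters):
    """Merge the two nearest clusters until only `target_clusters` remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merge_distances = []  # heights that a dendrogram would display
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of points.
                d = min(metric(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merge_distances.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merge_distances


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
results = {
    metric.__name__: agglomerate(points, metric, target_clusters=3)[0]
    for metric in (euclidean, squared_euclidean, manhattan)
}
```

    Stopping at a target number of clusters corresponds to cutting the dendrogram at the height of the last merge that was performed.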

     

    The resulting clusters are inspected and compared using the silhouette coefficient. The result showing the most well-defined clusters is selected.

     

    4.2.​ ​Classification

     

    Classification​ ​algorithms​ ​will​ ​be​ ​used​ ​to​ ​predict​ ​the​ ​partner​ ​group​ ​for​ ​new​ ​members.​ ​The  target​ ​groups​ ​have​ ​been​ ​previously​ ​determined​ ​for​ ​existing​ ​partners​ ​using​ ​clustering  algorithms.​ ​The​ ​partner​ ​profile​ ​is​ ​matched​ ​with​ ​the​ ​corresponding​ ​group​ ​to​ ​create​ ​the  dataset​ ​for​ ​the​ ​classification.

     

    For​ ​this​ ​task,​ ​three​ ​classification​ ​algorithms​ ​are​ ​used​ ​to​ ​analyze​ ​the​ ​data:​ ​Naive​ ​Bayes,  Support​ ​Vector​ ​Machine​ ​and​ ​Artificial​ ​Neural​ ​Network.

     

    4.2.1.​ ​Naive​ ​Bayes

     

    This popular supervised learning algorithm for classification is based on Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the class. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.

     

    Naive Bayes can handle numeric values and can therefore be used with the dataset without the need to discretize features. A 10-fold cross-validation is performed on the dataset to determine the average performance and accuracy of the model.
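
    One common way for Naive Bayes to handle numeric features is to model each feature with a per-class Gaussian. The sketch below assumes that variant (the plan does not say which likelihood is used); the sample rows, the group labels and the small variance floor are illustrative.

```python
from math import log, pi
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Estimate a prior plus per-feature mean and variance for every class."""
    model = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        columns = list(zip(*members))
        model[label] = {
            "prior": len(members) / len(rows),
            "means": [mean(c) for c in columns],
            "vars": [pvariance(c) + var_floor for c in columns],
        }
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for `row`."""
    def log_posterior(stats):
        total = log(stats["prior"])
        # Naive assumption: per-feature log likelihoods simply add up.
        for x, m, v in zip(row, stats["means"], stats["vars"]):
            total += -0.5 * log(2 * pi * v) - (x - m) ** 2 / (2 * v)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical real-valued partner profiles with their cluster-derived groups.
rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
groups = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(rows, groups)
```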

     

    4.2.2​ ​Support​ ​Vector​ ​Machine

     

    Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection. SVMs are based on the concept of decision planes that define decision boundaries.

     

    The SVM algorithm uses the concept of a maximal-margin hyperplane. The distance between the separating hyperplane (a line, in a two-dimensional plane) and the closest data points is referred to as the margin. The best or optimal line separating the two classes is the one with the largest margin.

     

    Since our data might not be linearly separable, a polynomial kernel is used to analyze the data. Linear and quadratic polynomial functions will be used in a 10-fold cross-validation process to determine the average performance and accuracy of the models.
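
    For illustration, a polynomial kernel of the usual form (gamma * x·y + coef0) ** degree is sketched below, with degree 1 giving the linear function and degree 2 the quadratic one. The parameter names and default values are assumptions, not settings taken from the project.

```python
def polynomial_kernel(x, y, degree, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


x, y = (1.0, 2.0), (3.0, 4.0)
linear = polynomial_kernel(x, y, degree=1)
quadratic = polynomial_kernel(x, y, degree=2)
```

    The kernel lets the SVM search for a maximal-margin hyperplane in an implicit higher-dimensional feature space without computing the expanded features explicitly.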
     

    4.2.3. ​ ​Artificial​ ​Neural​ ​Network

     

    An artificial neural network is a supervised learning model used for pattern recognition and classification. The model is inspired by biological nervous systems such as the brain, where learning involves adjustments to the synaptic connections between neurons.

     

    A typical neural network contains a large number of artificial neurons, called units, arranged in a series of layers. A multilayer perceptron with more than one hidden layer of neurons analyzes the data using a back-propagation learning algorithm.

     

    Data​ ​input​ ​can​ ​be​ ​discrete​ ​or​ ​real-valued​ ​while​ ​the​ ​output​ ​is​ ​in​ ​the​ ​form​ ​of​ ​a​ ​vector​ ​of values​ ​and​ ​can​ ​be​ ​discrete​ ​or​ ​real-valued​ ​as​ ​well.
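
    The forward pass of a small multilayer perceptron with two hidden layers is sketched below to show how a real-valued input is turned into a vector of output values, one per class. All weights, layer sizes and activation choices are hypothetical, and training by back-propagation is omitted.

```python
from math import exp


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per unit plus its bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def sigmoid(values):
    return [1.0 / (1.0 + exp(-v)) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """Run x through the hidden layers (sigmoid) and the output layer (softmax)."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = sigmoid(dense(x, weights, biases))
    return softmax(dense(x, *output))


# Hypothetical weights: 2 inputs -> 3 units -> 2 units -> 3 output classes.
layers = [
    ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]], [0.0, 0.1, -0.1]),
    ([[0.6, -0.2, 0.3], [-0.3, 0.8, 0.5]], [0.05, -0.05]),
    ([[1.0, -1.0], [0.5, 0.5], [-0.7, 0.9]], [0.0, 0.0, 0.0]),
]
probabilities = forward([0.5, -1.0], layers)
```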

     

    4.3.​ ​Association​ ​Rules

     

    Association rule mining, commonly known as Market Basket Analysis, focuses on finding frequently co-occurring associations among a collection of items.

     

    In our solution an association rule algorithm (Apriori) is used to examine the characteristics of the marketing tactics executed by groups of local partners to determine which combination of tactics produces the best results (conversions). The results are used to provide recommendations to new local partners joining the platform based on their group.

     
    4.3.1​ ​Apriori​ ​Algorithm

     

    This algorithm is commonly used in frequent itemset mining and association rule mining. In this algorithm, frequent itemsets containing k elements are used to explore candidate itemsets of k+1 elements in order to identify the itemsets that occur frequently in a dataset.

     

    Apriori identifies frequent individual items in the dataset and extends them to larger itemsets, one item at a time, in a process known as candidate generation. Items are added to candidates as long as the resulting itemset appears frequently in the dataset. Candidate itemsets are counted using a breadth-first search method and a hash tree structure.

     

    The candidates are analyzed to ensure they meet the thresholds defined for support and confidence. Support indicates how frequently the itemset appears in the dataset, while confidence indicates how often the rule has been found to be true. The minimum support for this algorithm is 0.10 and the minimum confidence is 0.75.
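
    A compact sketch of Apriori with the thresholds stated above (support 0.10, confidence 0.75) follows. It counts candidates by scanning the transactions directly rather than with a hash tree, and the tactic names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return every frequent itemset with its support, built level by level."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / total

    frequent = {}
    current = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in current if support(s) >= min_support}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates with an infrequent subset, then check their support.
        current = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence) from frequent itemsets."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((set(antecedent), set(itemset - antecedent), sup, confidence))
    return found


# Hypothetical tactic combinations executed by partners of one group.
executed = [
    {"email", "search"},
    {"email", "search", "social"},
    {"email", "search"},
    {"direct_mail"},
    {"email", "social"},
]
frequent = apriori(executed, min_support=0.10)
rules = association_rules(frequent, min_confidence=0.75)
```

    In this toy data every partner that ran `search` also ran `email`, so the rule search → email has a confidence of 1.0.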

     

    5. Evaluation

     
    The proposed models for this project are evaluated on two aspects:

    1. Performance Evaluation
    2. Business Success Evaluation

     
    5.1. Model Performance Evaluation
     
    The performance of each one of the proposed models is measured using relevant metrics for the problem domain they will be used for and then compared against the results of other models in the same problem domain.
     
    The model evaluation and testing is performed using Orange. Specific “workflows” for each problem domain are built using this tool to prepare the dataset and subsequently train, evaluate, test and compare the models.

     

    5.1.1. Clustering
     
    As previously stated in section 4, different variations of K-means and HAC are trained using the entire partner profile dataset.

     

    A group of subject domain experts is included in the evaluation process. The role of this group is to provide some “ground truth” by classifying local partners into five target segments based on their empirical business knowledge.

     

    The resulting clusters from K-means and HAC are evaluated and compared against the “ground truth” using the following metrics:

    • Homogeneity: each cluster contains only members of a single class.
    • Completeness: all members of the same class are assigned to the same cluster.
    • Silhouette Coefficient: measures cluster density and separation by comparing the mean distance between a point and the other points in its cluster with the mean distance to the points in the nearest cluster.

    The model producing the most relevant and well defined clusters for the business domain is used to classify local partners into the five target clusters/groups.
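
    Homogeneity and completeness against the expert “ground truth” can be computed from conditional entropies, as in the sketch below. It follows the common entropy-based definitions (homogeneity = 1 - H(class | cluster) / H(class), with completeness as the symmetric counterpart); the label lists are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    total = len(labels)
    return -sum(c / total * log(c / total) for c in Counter(labels).values())


def conditional_entropy(labels, given):
    """H(labels | given), estimated from the joint label counts."""
    total = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum(c / total * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity(classes, clusters):
    """1.0 when every cluster contains members of a single class only."""
    h = entropy(classes)
    return 1.0 if h == 0 else 1.0 - conditional_entropy(classes, clusters) / h


def completeness(classes, clusters):
    """1.0 when all members of a class end up in the same cluster."""
    return homogeneity(clusters, classes)


expert_segments = [0, 0, 1, 1]
perfect = [0, 0, 1, 1]
over_split = [0, 1, 2, 3]  # every partner in its own cluster
```

    Splitting every partner into its own cluster keeps homogeneity at 1.0 but halves completeness in this example, which is why both metrics are reported together.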

    ​ 

    5.1.2. Classification
     

    The classified partner profile dataset (result from clustering) is split into a training dataset (75%) and a hold-out dataset (25%).
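
    A sketch of the 75%/25% split and of stratified fold assignment for cross-validation is given below. Stratifying the hold-out split by group is an assumption (the plan only states stratification for cross-validation), and the labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction=0.25, seed=0):
    """Return (train_indexes, holdout_indexes), keeping class proportions in both."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    train, holdout = [], []
    for indexes in by_label.values():
        rng.shuffle(indexes)
        cut = round(len(indexes) * holdout_fraction)
        holdout.extend(indexes[:cut])
        train.extend(indexes[cut:])
    return train, holdout


def stratified_folds(labels, k=10, seed=0):
    """Deal the indexes of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indexes in by_label.values():
        rng.shuffle(indexes)
        for index in indexes:
            folds[position % k].append(index)
            position += 1
    return folds


groups = ["a"] * 40 + ["b"] * 20
train, holdout = stratified_split(groups)
folds = stratified_folds(groups, k=10)
```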

     

    The candidate models are trained using 10-fold cross validation with stratification. During training the “learning” of the models is monitored using fit statistics calculated for each fold and averaged at the end of the cross-validation. The following classification metrics are used during this phase:

       

    • Accuracy: the ability of a model to correctly label a sample as positive or negative
    • Precision: the ability of a model not to label as positive a sample that is negative
    • Recall: the ability of the classifier to find all the positive samples
    • F1-score: the harmonic mean of precision and recall
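
    The four metrics above can be computed for one partner group at a time by treating that group as the positive class and all other groups as negative. The sketch below assumes this one-vs-rest treatment; the labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


actual = ["A", "A", "A", "A", "B", "B", "C", "C"]
predicted = ["A", "A", "A", "B", "A", "B", "C", "C"]
metrics_for_a = binary_metrics(actual, predicted, positive="A")
```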

     

    For the next phase a random classification model is used as a baseline for the model comparison. In the validation phase the candidate models are validated using the hold-out data. In addition to the fit statistics used during training, the following metrics are used to evaluate the predictions on the hold-out data:

       

    • Coverage Error: the average number of labels that have to be included in the final prediction such that all true labels are predicted.
    • ROC: the Receiver Operating Characteristic curve plots the true positive rate against the false positive rate. ROC curves are typically used with binary classifiers. In order to use ROC curves with our multi-class dataset, it is necessary to binarize the output; one ROC curve is generated per class. ROC curves typically feature the true positive rate on the Y axis and the false positive rate on the X axis. This means that the top left corner of the plot is the “ideal” point – a false positive rate of zero, and a true positive rate of one.
    • AUC: the Area Under the ROC Curve assesses the performance of the classifier over its entire operating range.
    • Cumulative Gain: this chart shows the percentage of the overall number of samples in a given class “gained” by targeting a percentage of the total number of samples.
    • Lift Curve: measures the performance of a model against the random classification model. The curve shows on the y axis the values corresponding to the ratio of the cumulative gain in relation to the baseline.

     

    The ROC curve is used to select the threshold that maximizes the true positives, while minimizing the false positives, for each one of the binarized class outputs. The overall performance of a candidate model is then calculated by averaging the AUC and the selected thresholds.
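
    For one binarized class, the ROC points, the AUC, a threshold choice and the cumulative gain could be computed as sketched below. Selecting the threshold that maximizes the true positive rate minus the false positive rate (Youden's J) is an assumed rule, one of several that fit the description above; the scores and labels are hypothetical, and both classes must be present.

```python
def roc_points(scores, is_positive):
    """Return (threshold, fpr, tpr) triples, sweeping the threshold from high to low."""
    ranked = sorted(zip(scores, is_positive), key=lambda pair: -pair[0])
    positives = sum(is_positive)
    negatives = len(is_positive) - positives
    points = [(float("inf"), 0.0, 0.0)]
    tp = fp = 0
    for index, (score, positive) in enumerate(ranked):
        if positive:
            tp += 1
        else:
            fp += 1
        # Emit one point per distinct score so tied scores share a threshold.
        if index + 1 == len(ranked) or ranked[index + 1][0] != score:
            points.append((score, fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (_, x0, y0), (_, x1, y1) in zip(points, points[1:])
    )


def best_threshold(points):
    """Threshold maximizing tpr - fpr (assumed selection rule)."""
    return max(points, key=lambda p: p[2] - p[1])[0]


def cumulative_gain(scores, is_positive, fraction):
    """Share of all positives found in the top `fraction` of samples by score."""
    ranked = sorted(zip(scores, is_positive), key=lambda pair: -pair[0])
    top = ranked[: round(len(ranked) * fraction)]
    return sum(1 for _, positive in top if positive) / sum(is_positive)


# Hypothetical scores for membership of one group (one-vs-rest).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
in_group = [True, True, True, False, True, False]
curve = roc_points(scores, in_group)
area = auc(curve)
threshold = best_threshold(curve)
gain = cumulative_gain(scores, in_group, fraction=0.5)
lift = gain / 0.5  # gain relative to the random baseline at the same fraction
```

    Repeating this per class and averaging the resulting AUC values gives a single figure for comparing candidate models.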

     

    5.2. Business Success Evaluation
     
    Placeholder

     
    5.3. Considerations/Recommendations
     
    Placeholder

     

     

    Summary