INFS 630 - Data Mining Assignment 1: Association Rule Mining with RapidMiner Studio

INFS 630 – Data Mining, Winter 2023

Assignment 1: Association Rule Mining with RapidMiner Studio

This is an individual assignment, not a group assignment. In this assignment, you will learn frequent itemset mining and association rule mining in RapidMiner Studio with a real-world transaction dataset that contains grocery purchase records. The data can be downloaded from MyCourses. By completing the following instructions for Assignment 1, you will learn how to:

  • Install RapidMiner Studio and get familiar with its UI components.
  • Use RapidMiner to read transaction data.
  • Transform transaction data to binominal data.
  • Discover frequent item sets.
  • Create association rules.

Then answer the questions and put your answers in a Word document. Submit your Word (or PDF) file via MyCourses before 2:30pm on February 22 (Tue). The instructions may seem a bit lengthy, but the steps are not difficult. I just want to provide sufficient detail so that you do not miss any steps.

1. Installation.

    a. Download RapidMiner Studio v9.10.011 (or a later version) from https://rapidminer.com/platform/educational/.
    b. Click on Download Studio. You will be asked to create an account. Select Educational purposes. You have to verify your email address before logging in. You may want to use your McGill email address. Follow the instructions to get an educational license.
    c. Install RapidMiner by following the installation wizard.
    d. Open RapidMiner. Install the Text Processing package from the RapidMiner Marketplace. The Marketplace can be accessed in RapidMiner Studio through the main menu: Extensions / Marketplace. Search for Text Processing in the search box.

2. Introducing UI components in RapidMiner Studio:
  a. The upper-left Repository panel. This is a local storage repository on your computer where you can access your saved scripts and data.
  b. The central Process panel. This is the main canvas where you set up flows of operators to complete a data mining task. An operator executes a specific action. It is drawn as a rectangle, with input connectors on its left and output connectors on its right.
  c. The lower-left Operators panel. Here, you can search for a specific operator and include it in your process by dragging it to your Process panel.
  d. The upper-right Parameters panel. By clicking a specific operator in the Process panel, you can configure the settings for this operator. For example, you can specify the file to be read for an Open File operator.
  e. The lower-right Help panel. By clicking a specific operator in the Process panel, you can find information about its actions, required settings, and the types of its input/output data.

3. Data preparation:

  a. Use Notepad, Excel, or any text editor to open the transaction file Assignment1 - Data - groceries.csv and take a look at its contents.
  b. Open RapidMiner Studio.
  c. Select the menu item File / New Process and create a Blank Process.
  d. In the Operators panel, search for the Open File operator. Drag the Open File operator to the Process panel.

  e. Click the Open File operator in the Process panel. In the Parameters panel, select your input transaction file for the filename option using the button.
  f. In the Operators panel, search for the Read CSV operator. Drag the Read CSV operator to the Process panel.
  g. In the Process panel, connect the fil output of the Open File operator to the fil input of the Read CSV operator by dragging a line between them.

  h. Select the Read CSV operator in the Process panel. In the Parameters panel, click the Import Configuration Wizard button.
  i. In the first step of the wizard, select your input transaction file again. Then move to the second step by clicking Next.
  j. In the second step of the wizard, uncheck Header Row and select Semicolon as the Column Separator. Because the items are separated by commas, this keeps each transaction in a single column. Then move to the third step by clicking Next.
  k. In the third step, click Finish to complete the wizard.
  l. In the Parameters panel, scroll down to data set meta data information and click the Edit List (1)... button. (If you do not see it, click Show advanced parameters.)
  m. Change the type of att1 from polynominal to text. Then click Apply.
  n. In the Process panel, connect the out output of the Read CSV operator to the res result connector on the right edge of the Process panel. Click the Run button above the Process panel to run your process and see the result of the Read CSV operator. At this stage, the result should be a table with two columns. The first column, Row No., is the row identification number. The second column, att1, lists the items included in a transaction, separated by commas. Switch back to your process by selecting the Design view above the Process panel.

  o. Next, in the Operators panel, search for the Process Documents from Data operator and drag it to your process. Connect the out output of the Read CSV operator to the exa input of the Process Documents from Data operator. Click the Process Documents from Data operator, and then in the Parameters panel, set the vector creation option to Term Occurrences.
  p. Double-click the Process Documents from Data operator to open its inner flow. Here, we need to specify how to create a document from a transaction. A document is defined as a list of tokens. Search for the Tokenize operator and drag it into the flow.
  q. Connect the doc connector on the left edge of the Process panel to the input of the Tokenize operator. Connect the output of the Tokenize operator to the doc connector on the right edge of the Process panel. Click the Tokenize operator and, in the Parameters panel, set the mode option to specify characters. Set the characters option to a comma by typing , in the input box. Go back to your main process by clicking the Process link above the Process panel.

  r. Connect the exa output of your Process Documents from Data operator to the res connector on the right edge of the Process panel. Hit the Run button to see the result. You should have a table that consists of multiple numeric attributes. Each row represents a transaction and each column represents a grocery item. If a transaction contains an item, the attribute corresponding to that item is 1, otherwise 0. Switch back to your process by selecting the Design view at the top of the Process panel.
  s. Next, we transform the numeric table into binominal data. Search for the Numerical to Binominal operator and drag it to your process. Connect the exa output of your Process Documents from Data operator to the input of the Numerical to Binominal operator.

  t. Click the Numerical to Binominal operator and set the min option to -0.5 and the max option to 0.5 in the Parameters panel. A numeric value that falls within this range is replaced by the binominal value false; any other value is replaced by true.
  u. Inspect your result by connecting the exa output of the Numerical to Binominal operator to the res connector on the right edge of the Process panel. After clicking the Run button, you should have a table that consists of multiple binominal attributes. Each row represents a transaction and each column represents a grocery item. If a transaction contains an item, the attribute corresponding to that item is true, otherwise false. Switch back to your process by selecting the Design view above the Process panel.
  v. If you see the table with binominal data, your data is ready for frequent itemset mining and association rule mining. Otherwise, please go back to the previous steps and check your process.
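
For readers who want to see what the data preparation steps compute, the transformation can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not how RapidMiner implements its operators: the sample lines, the item names, and the function names are all hypothetical, and the real operators have further options that are ignored here.

```python
# Hypothetical stand-in for a few lines of the groceries file:
# one transaction per line, items separated by commas.
SAMPLE = """whole milk,rolls/buns,yogurt
whole milk,other vegetables
rolls/buns,soda
whole milk,yogurt,soda
"""

def read_transactions(text):
    """Split each non-empty line on commas (the role Tokenize plays above)."""
    transactions = []
    for line in text.splitlines():
        tokens = [tok.strip() for tok in line.split(",")]
        tokens = [tok for tok in tokens if tok]
        if tokens:
            transactions.append(tokens)
    return transactions

def term_occurrences(transactions):
    """Build a numeric table: one row per transaction, one column per item."""
    items = sorted({item for t in transactions for item in t})
    rows = [[t.count(item) for item in items] for t in transactions]
    return items, rows

def numerical_to_binominal(rows, lo=-0.5, hi=0.5):
    """Values inside [lo, hi] become False; every other value becomes True."""
    return [[not (lo <= value <= hi) for value in row] for row in rows]

transactions = read_transactions(SAMPLE)
items, counts = term_occurrences(transactions)
binary = numerical_to_binominal(counts)

print(items)
for row in binary:
    print(row)
```

With min set to -0.5 and max set to 0.5, a count of 0 maps to false and any count of 1 or more maps to true, which is why that range is used to turn occurrence counts into presence flags.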

4. Discover frequent item sets.

  a. Search for the FP-Growth operator in the Operators panel and drag it to your process.
  b. Connect the exa output of the Numerical to Binominal operator to the exa input of the FP-Growth operator. Click the FP-Growth operator and uncheck the find min number of itemsets option. Start by setting the min support option to 0.01 (an itemset must then appear in at least 1% of the transactions), since we have a large dataset.
  c. Connect the fre output of the FP-Growth operator to the res connector on the right edge of the Process panel. Click the Run button to see the result. You should see a list of frequent itemsets. If not, go back to the previous steps and check your process and configurations.
  d. Switch back to your process by selecting the Design view above the Process panel.
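
To make the min support threshold concrete: the support of an itemset is the fraction of transactions that contain all of its items, and an itemset is frequent when its support reaches the threshold. The sketch below finds frequent itemsets with a simple level-wise (Apriori-style) search, not with the FP-tree that FP-Growth builds; for a given threshold both approaches yield the same frequent itemsets. The toy transactions and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy transactions; each set is one basket.
TRANSACTIONS = [
    {"whole milk", "rolls/buns", "yogurt"},
    {"whole milk", "other vegetables"},
    {"rolls/buns", "soda"},
    {"whole milk", "yogurt", "soda"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow candidate itemsets by one item per pass."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    result = {}
    while current:
        # Keep only the candidates that meet the support threshold.
        kept = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        result.update(kept)
        # Join kept k-itemsets into (k+1)-item candidates and drop any
        # candidate that has an infrequent k-item subset.
        size = len(next(iter(kept))) + 1 if kept else 0
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        current = [
            c for c in joined
            if all(frozenset(sub) in kept for sub in combinations(c, size - 1))
        ]
    return result

freq = frequent_itemsets(TRANSACTIONS, min_support=0.5)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On these four toy baskets with a threshold of 0.5, 'other vegetables' (support 0.25) is dropped, and the only frequent pair is whole milk with yogurt (support 0.5).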

5. Create association rules.

  a. Search for the Create Association Rules operator in the Operators panel and drag it to your process.
  b. Connect the fre output of the FP-Growth operator to the ite input of the Create Association Rules operator. Click the Create Association Rules operator and set the minimum confidence to 0.5 in the Parameters panel.
  c. Connect the rul output of the Create Association Rules operator to the res connector on the right edge of the Process panel. Click the Run button to see the result. You should see a list of association rules. If not, go back to the previous steps and check your process and configurations.
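
As a rough illustration of what the minimum confidence setting filters on: the confidence of a rule premise -> conclusion is the support of the combined itemset divided by the support of the premise. The sketch below derives rules from a small table of frequent itemsets. The itemsets, their support values, and the function name are hypothetical toy values, and the code is not RapidMiner's implementation.

```python
from itertools import combinations

# Hypothetical frequent itemsets with their supports (toy values).
FREQUENT = {
    frozenset({"whole milk"}): 0.75,
    frozenset({"yogurt"}): 0.5,
    frozenset({"rolls/buns"}): 0.5,
    frozenset({"soda"}): 0.5,
    frozenset({"whole milk", "yogurt"}): 0.5,
}

def create_rules(frequent, min_confidence):
    """Split each frequent itemset into premise -> conclusion and keep
    the rules whose confidence reaches min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for k in range(1, len(itemset)):
            for premise_items in combinations(sorted(itemset), k):
                premise = frozenset(premise_items)
                conclusion = itemset - premise
                # Every subset of a frequent itemset is itself frequent,
                # so the premise's support is available in the table.
                confidence = supp / frequent[premise]
                if confidence >= min_confidence:
                    rules.append((premise, conclusion, supp, confidence))
    return rules

rules = create_rules(FREQUENT, min_confidence=0.7)
for premise, conclusion, supp, conf in rules:
    print(sorted(premise), "->", sorted(conclusion),
          f"support={supp:.2f} confidence={conf:.2f}")
```

With these toy numbers, yogurt -> whole milk has confidence 0.5 / 0.5 = 1.0 and passes a threshold of 0.7, while whole milk -> yogurt has confidence 0.5 / 0.75 ≈ 0.67 and is filtered out. Both rules come from the same itemset, so they share the same support but differ in confidence.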

6. You can vary the min support option and the minimum confidence option to see different results.

Questions

  1. Briefly describe the format of the input data. How is the data arranged? What does each row represent? What is the expected input format of the FP-Growth operator in RapidMiner?
  2. Get familiar with the rich tools provided in RapidMiner for data transformation and data cleaning. Convert the data into a table that meets the expected input format of the frequent itemset mining operator. Follow the above instructions to set up the processes for frequent itemset mining and association rule mining on top of your data transformation process. Capture your “Process” and paste it into your assignment document. (Note: you can capture the process by right-clicking on the white space of the “Process” pane and selecting “Print/Export image”. You may also capture the screen by pressing Alt-Print Screen or using the Snipping Tool in Windows.)
  3. Briefly describe each operator in your process in one or two sentences. List three association rules that satisfy a support value of 0.02 and a minimum confidence of 0.4. Set the Min. Criterion in the lower-left corner to minimal by sliding the knob to the left. Under this setting, how many association rules contain the item ‘whole milk’?
  4. Experiment with different minimum support and minimum confidence values. Describe your observations and comment on how the results differ across settings. What happens when you increase/decrease the minimum support value? What happens when you increase/decrease the minimum confidence value? What happens to the popular items, such as 'whole milk', when you have a low minimum support value and a high minimum confidence value? What happens to the popular items when you have a high minimum support value and a low minimum confidence value?
