1. Homepage
  2. Programming
  3. COMP10002 Foundations of Algorithms Semester 2, 2022 Assignment 2: Process Model Discovering Algorithm

COMP10002 Foundations of Algorithms Semester 2, 2022 Assignment 2: Process Model Discovering Algorithm

Engage in a Conversation
AustraliaUniversity of MelbourneCOMP10002Foundations of AlgorithmsProcess Model Discovering AlgorithmCC++

comp10002 Foundations of Algorithms Semester 2, 2022 CourseNana.COM

Assignment 2 CourseNana.COM

In this project, you will demonstrate your understanding of dynamic memory and linked data structures (Chapter 10) and extend your program design, testing, and debugging skills. You will also learn about the problem of process discovery and implement a simple algorithm for discovering a process model from event data. CourseNana.COM

Process Discovery CourseNana.COM

Process mining is a relatively young research discipline that combines studies of inferences from data in data mining and machine learning with process modeling and analysis to discover, monitor, and improve real-world processes. The studied processes get formally captured in process models. CourseNana.COM

Figure 1 shows an example process model captured inBusiness Process Model and Notation (BPMN) language. Inthe figure, actions are drawn as rectangles with rounded coners. Diamonds represent gateways. An exclusive gateway has the “×” marker, while a parallel gateway has the “+” marker inside the diamond shape. Directed arcs encode control flow dependencies. The circles with thin and thick borderlines denote the start and end of the process, respectively. f CourseNana.COM

A process model describes a collection of possible executions, or traces. A trace is a sequence of events that represent actions encountered when following control flow arcs and gateway routing decisions from the start to the end element of the model. If execution reaches an exclusive gateway, it continues along exactly one of its outgoing arcs. If execution reaches a parallel gate- way with multiple outgoing arcs, all the outgoing arcs are followed simultaneously. A parallel gateway with multiple incoming arcs “waits” for all the incoming arcs to complete and then “passes” the execution to its outgoing arcs. The fact that several paths in a model are executed simultaneously means that the ac- tions encountered on these paths are independent and can occur in any order. CourseNana.COM

In the real-world, different loan applications can be processed by following the same sequence of actions; thus, a trace can be observed multiple times. The problem of process discovery is a core problem in process mining that, given an event log, that is, a collection of observed traces, aims to automatically construct a process model that describes the traces. A process model constructed in this way aims to aggregate and generalize information on how processes were executed in the real-world. CourseNana.COM

In this assignment, you will implement a program that reads an event log from input, analyzes patterns of actions in the input traces, and discovers and reports model fragments that could have produced the traces. CourseNana.COM

Input Data CourseNana.COM

Your program should read input from stdin and write output to stdout. The input will list traces, one per input line, where each trace is provided as a comma-separated list of events that terminates either with “\n”, that is, one newline character, or EOF, that is, the end of file constant defined in the <stdio.h> header file. An event is CourseNana.COM

cd abe CourseNana.COM

Figure 1: A process model. CourseNana.COM

encoded as a single alphabetic character that when provided as input to the isalpha() function defined in the <ctype.h> header file returns a non-zero integer. CourseNana.COM

The following file test0.txt uses fifteen lines to specify fifteen traces, each composed of four events. CourseNana.COM

1 a,b,c,d 2 a,b,f,e 3 a,b,f,e CourseNana.COM

4 a,b,f,e 5 a,b,c,d 6 a,b,c,d CourseNana.COM

7 a,b,c,d 8 a,b,c,d 9 a,b,c,d CourseNana.COM

10 a,b,e,f 11 a,b,e,f 12 a,b,e,f CourseNana.COM

13 a,b,f,e 14 a,b,c,d 15 a,b,e,f CourseNana.COM

In general, input data can contain an arbitrary number of traces, with each trace composed of at least one event. CourseNana.COM

Stage 0 – Reading, Analyzing, and Printing Input Data (10/20 marks) CourseNana.COM

The first version of your program should read an event log from input, analyze it, and print basic statistics about the event log to the output. The first thirteen lines from the listing below correspond to the output your program should generate for the test0.txt input file in Stage 0. CourseNana.COM

1 ==STAGE 0============================ 2 Number of distinct events: 6
3 Number of distinct traces: 3
4 Total number of events: 60 CourseNana.COM

5 Total number of traces: 15
6 Most frequent trace frequency: 7 7 abcd
8 a = 15
9 b = 15 CourseNana.COM

10 c = 7 CourseNana.COM

26 d=7
27 e=8
28 f=8
29 256=15
30 =====================================
31 c d e f256
32c07000 57 256 257 258 0700 0000 0004 256 0 15 CourseNana.COM

40e=8 41f=8 42256=15 43 257=7 CourseNana.COM

58 256 0 7 8
59 257 0 0 0
60 258 0 0 0
61 ------------------------------------- 62 259 = CHC(257,258) CourseNana.COM

==STAGE 2============================ ef256257 CourseNana.COM

e0400 CourseNana.COM

d0 e0 f0 CourseNana.COM

0 0 0 0 CourseNana.COM

256 7 ------------------------------------- 257 = SEQ(c,d)
Number of events removed: 7
CourseNana.COM

f4 256 4 257 0 CourseNana.COM

00 0 40 7 00 0 CourseNana.COM

74 ==THE END============================ 75 CourseNana.COM

------------------------------------- CourseNana.COM

Lines 2 and 3 of the output report the number of distinct events and traces in the event log, respectively. The event log contains six distinct events, a through f, and three distinct traces, a, b, c, d, a, b, e, f, and a, b, c, d. Lines 4 and 5 of the output report the total number of events and traces in the event log. As there are fifteen traces in the log, each composed of four events, in total, there are sixty events in the event log. Line 6 provides information about the number of times the most frequent trace appears in the event log. Trace a, b, c, d is the most frequent trace; it appears seven times in the event log. The subsequent lines of the output should list all most frequent traces of the event log (one occurrence of each trace). You should not make assumptions about the maximum number of traces and events supplied. Use dynamic memory and data structures of your choice, for example, arrays or linked lists, to store the input event log. CourseNana.COM

The test input will always follow the proposed format. CourseNana.COM

Stage 1 – Identifying Sequences (16/20 marks) CourseNana.COM

Extend your program from Stage 0 to identify the most likely sequential (SEQ) patterns of actions based on the directly follows relationships between events in the event log. Two events are in the directly follows relationship if there is a trace in the log in which these events appear consecutively. First, your program should compute and print out all the directly follows relationships. Lines 15 to 21 of the output listing report the directly follows relationships between the events together with their support, that is, the number of times the events follow each other in the traces of the event log, given as a two dimensional matrix. For example, event b directly follows event a once in each trace of the event log. Thus, its support is 15; see 15 in the third column of line 16 of the output; that is, sup(a, b) = 15. Each matrix column uses five characters, and each cell entry is right-justified. CourseNana.COM

Now, use the directly follows matrix to identify pairs of actions that most likely were arranged in sequences in the model that generated the event log. Given two events x and y, x ̸= y, if sup(x, y) > sup(y, x), then there is a chance that the model that generated the event log has actions x and y arranged in a sequence. To measure this chance, compute two values pd(x, y) = (100 × abs(sup(x, y) sup(y, x)))/max(sup(x, y), sup(y, x)) and w(x,y) = abs(50 pd(x,y)) × max(sup(x,y),sup(y,x)); use integer division to obtain the truncated result (for example, 15/4 should result in 3). Here, pd(x, y) is percent difference of sup(x, y) and sup(y, x) that reflects confidence that x is directly followed by y, while w (x, y) is weight that further scales this confidence by the number of times y directly follows x; here, we assume that indeed sup(x, y) > sup(y, x). CourseNana.COM

Lines 23 to 29 of the output report the identified sequential pattern of actions and summarize information on the abstraction of the pattern in the event log. The most likely sequential pattern identified according to the proposed rules in the input event log is defined by the pair of events (a,b); note that pd(a,b) = 100 and w (a, b) = 750. Line 23 reports the identified pattern, assigning it the code of 256; note that the code of each subsequently identified pattern of actions should be incremented by one; see lines 38, 51, 62, and 71 of the output. Next, the identified pattern is abstracted in the event log to prepare it for discovering subsequent patterns. To abstract a pattern identified by a pair of events, rewrite each occurrence of one of these events in the traces of the event log with the abstract event identified by the code of the pattern. Then, in every trace, replace each subsequence composed of only new abstract events with one such abstract event without changing the order of the other events. For example, the abstraction of pattern 256 in the input event log leads to the below event log; keep the resulting event log in memory and do not write it to a file. CourseNana.COM

1 256,c,d 2 256,f,e 3 256,f,e CourseNana.COM

4 256,f,e 5 256,c,d 6 256,c,d CourseNana.COM

7 256,c,d 8 256,c,d 9 256,c,d CourseNana.COM

10 256,e,f 11 256,e,f 12 256,e,f CourseNana.COM

13 256,f,e 14 256,c,d 15 256,e,f CourseNana.COM

Line 24 of the output reports the number of events removed from the event log as the result of abstracting pattern 256. One event was removed from every trace to result in 15 removed events. Finally, lines 25 to 29 report all distinct events, both original and abstract, and their frequencies in the resulting event log, in the ascending order of their codes, where codes of the original events are ASCII codes of the corresponding characters. Lines 31 to 43 use the same template to report information on the next identified sequential pattern of actions. CourseNana.COM

The output of two consecutive patterns is separated by a sequence of “=” characters (see, for example, line 30), and the directly follows matrix is separated from the subsequent information on the identified pattern with a sequence of “-” characters (see, for instance, lines 22 and 37). The identification of patterns should continue from the resulting log until no pattern can be discovered based on the proposed rules. NB: In Stage 1, only patterns based on two events from the original input log, that is, non-abstract events, should be discovered. CourseNana.COM

Stage 2 – Identifying Concurrency and Choices (20/20 marks) CourseNana.COM

Extend your program from Stage 1 to allow discovering concurrency (CON) and choice (CHC) patterns of actions. Stage 2 of your program should continue from the state reached in Stage 1. CourseNana.COM

Two actions that can be executed concurrently are independent and, thus, can occur in a trace in any order. Given two events x and y, x ̸= y, if sup(x,y) > 0 and sup(y,x) > 0, there is a chance that the model that generated the event log supports their concurrent execution. Each pair of events (u, v), u ̸= v, sup(u, v) > 0, sup(v, u) > 0, for which pd(u, v) < 30, is a candidate for defining a concurrency pattern of actions. The weight of a concurrency pattern should be computed the same way as the weight of a sequential pattern. CourseNana.COM

In Stage 2, to prioritize the discovery of concurrency and sequential patterns, the weight of each sequential pattern defined over two non-abstract events from the original input event log and the weight of any concurrency pattern should be additionally multiplied by 100. CourseNana.COM

In each iteration, your program should attempt to discover one pattern of actions. Each pair of distinct events should be checked for defining a sequential, concurrency, or choice pattern. Check a pair of events for being a choice pattern candidate first and, if confirmed, do not check it for defining other patterns. Among all identified candidate patterns, the one with the highest weight should be selected. If multiple candidates with the same weight exist, the one identified first when traversing the directly follows matrix in row-major order should be chosen. When proceeding to the next iteration, abstract the pattern in the log by following the rules from Stage 1. CourseNana.COM

Lines 45 to 73 of the output report three further discovered patterns using the same output template as in Stage 1. The patterns reported on lines 23, 38, 51, 62, and 71 of the output induce the hierarchy of patterns SEQ(SEQ(a,b),CHC(SEQ(c,d),CON(e,f))), which rediscovers the model in Figure 1. Note that, in general, the rules proposed in this assignment do not guarantee the rediscoverability of the model. Your program should terminate once no further patterns can be discovered. CourseNana.COM

Get in Touch with Our Experts

WeChat (微信) WeChat (微信)
Whatsapp WhatsApp
Australia代写,University of Melbourne代写,COMP10002代写,Foundations of Algorithms代写,Process Model Discovering Algorithm代写,C代写,C++代写,Australia代编,University of Melbourne代编,COMP10002代编,Foundations of Algorithms代编,Process Model Discovering Algorithm代编,C代编,C++代编,Australia代考,University of Melbourne代考,COMP10002代考,Foundations of Algorithms代考,Process Model Discovering Algorithm代考,C代考,C++代考,Australiahelp,University of Melbournehelp,COMP10002help,Foundations of Algorithmshelp,Process Model Discovering Algorithmhelp,Chelp,C++help,Australia作业代写,University of Melbourne作业代写,COMP10002作业代写,Foundations of Algorithms作业代写,Process Model Discovering Algorithm作业代写,C作业代写,C++作业代写,Australia编程代写,University of Melbourne编程代写,COMP10002编程代写,Foundations of Algorithms编程代写,Process Model Discovering Algorithm编程代写,C编程代写,C++编程代写,Australiaprogramming help,University of Melbourneprogramming help,COMP10002programming help,Foundations of Algorithmsprogramming help,Process Model Discovering Algorithmprogramming help,Cprogramming help,C++programming help,Australiaassignment help,University of Melbourneassignment help,COMP10002assignment help,Foundations of Algorithmsassignment help,Process Model Discovering Algorithmassignment help,Cassignment help,C++assignment help,Australiasolution,University of Melbournesolution,COMP10002solution,Foundations of Algorithmssolution,Process Model Discovering Algorithmsolution,Csolution,C++solution,