1. Find the two missing numbers in an array that holds N-2 of the N values 1..N

Scala Code :

```scala
def findMissingNums(datax: Array[Int], len: Int): Unit = {
  // The two missing numbers add up to the expected sum of 1..len minus the actual sum.
  val diff = (1 to len).sum - datax.sum
  val present = datax.toSet
  // The smaller missing number is the first value in 1..len that is absent from the array.
  val firstElem = (1 to len).find(i => !present.contains(i)).getOrElse(0)
  // The other missing number follows from the sum of the two.
  val secondElem = diff - firstElem
  println(s"$firstElem,$secondElem")
}
```
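The same puzzle can also be solved without any membership lookups. The following is a minimal Java sketch, assuming the array holds distinct values from 1..n with exactly two missing; the class and method names (`MissingTwo`, `findMissingTwo`) are illustrative only. It uses the sum of the two missing numbers to pick a pivot that separates them, then recovers the smaller one from a partial sum.

```java
class MissingTwo {
    /** Returns the two values of 1..n absent from data (length n - 2), smaller first. */
    static int[] findMissingTwo(int[] data, int n) {
        long expected = (long) n * (n + 1) / 2;      // sum of 1..n
        long actual = 0;
        for (int v : data) actual += v;
        long sumOfMissing = expected - actual;       // a + b

        // Because a != b, the smaller one satisfies a <= pivot < b.
        long pivot = sumOfMissing / 2;
        long expectedLow = pivot * (pivot + 1) / 2;  // sum of 1..pivot
        long actualLow = 0;
        for (int v : data) if (v <= pivot) actualLow += v;

        int a = (int) (expectedLow - actualLow);     // the only value <= pivot that is missing
        int b = (int) (sumOfMissing - a);
        return new int[] {a, b};
    }

    public static void main(String[] args) {
        int[] result = findMissingTwo(new int[] {1, 3, 4, 6}, 6);
        System.out.println(result[0] + "," + result[1]);
    }
}
```

This variant makes two passes over the array and needs only constant extra space.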

2. Swap two numbers without using a temporary variable

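def swapNum(num1: Int, num2: Int): Unit = {
  var number1 = num1
  var number2 = num2
  // number1 temporarily holds the difference between the two values
  number1 = number2 - number1
  // number2 becomes the original number1
  number2 = number2 - number1
  // number1 becomes the original number2
  number1 = number1 + number2
  println("num1 = " + number1 + ", num2 = " + number2)
}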
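A common alternative to the arithmetic swap uses XOR. The sketch below is a Java illustration rather than part of the original Scala answer; since Java passes `int` arguments by value, it swaps two slots of an array instead. The class name `SwapWithoutTemp` is made up for this example.

```java
class SwapWithoutTemp {
    /** Swaps values[i] and values[j] in place using XOR, without a temporary variable. */
    static void swap(int[] values, int i, int j) {
        if (i == j) return;       // XOR-ing a slot with itself would zero it
        values[i] ^= values[j];   // values[i] = a ^ b
        values[j] ^= values[i];   // values[j] = b ^ (a ^ b) = a
        values[i] ^= values[j];   // values[i] = (a ^ b) ^ a = b
    }

    public static void main(String[] args) {
        int[] pair = {7, 42};
        swap(pair, 0, 1);
        System.out.println("num1 = " + pair[0] + ", num2 = " + pair[1]);
    }
}
```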

Priority Queue – Data Structure

We know that a `Queue` follows the First-In-First-Out model, but sometimes we need to process the objects in the queue based on priority. That is when Java's `PriorityQueue` is used.

For example, let's say we have an application that generates stock reports for a daily trading session. This application processes a lot of data and takes time to do so. Customer requests to the application are therefore queued, but we want to process premium customers first and standard customers after them. In this case the `PriorityQueue` implementation in Java can be really helpful.
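The premium-first scenario can be sketched with `java.util.PriorityQueue` and a `Comparator`. The `Request` type, its fields, and the customer names below are hypothetical and exist only for this illustration; the arrival number is an assumed tie-breaker so that requests within the same tier keep their order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ReportRequestQueue {
    /** Hypothetical request: customer name, tier flag, and arrival sequence number. */
    static final class Request {
        final String customer;
        final boolean premium;
        final long arrival;

        Request(String customer, boolean premium, long arrival) {
            this.customer = customer;
            this.premium = premium;
            this.arrival = arrival;
        }
    }

    /** Premium requests first; within the same tier, earlier arrivals first. */
    static final Comparator<Request> PREMIUM_FIRST =
            Comparator.comparing((Request r) -> !r.premium)
                      .thenComparingLong(r -> r.arrival);

    /** Returns customer names in the order the queue would hand them out. */
    static List<String> processingOrder(List<Request> requests) {
        PriorityQueue<Request> queue = new PriorityQueue<>(PREMIUM_FIRST);
        queue.addAll(requests);
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().customer);
        }
        return order;
    }

    public static void main(String[] args) {
        List<Request> requests = Arrays.asList(
                new Request("alice", false, 1),
                new Request("bob", true, 2),
                new Request("carol", false, 3),
                new Request("dave", true, 4));
        System.out.println(processingOrder(requests));
    }
}
```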

`PriorityQueue` is an unbounded queue based on a priority heap, and by default its elements are ordered by their natural ordering. We can provide a `Comparator` for ordering when the priority queue is instantiated.

Java's `PriorityQueue` doesn't allow `null` values, and we can't create a `PriorityQueue` of objects that are non-comparable unless a `Comparator` is supplied. We use Java's `Comparable` and `Comparator` for sorting objects, and the priority queue uses them to prioritize its elements.
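Both restrictions can be observed directly. In this small sketch the `Ticket` class is a made-up type that deliberately does not implement `Comparable`; two tickets are added because some JDK versions only compare elements once the queue already holds one.

```java
import java.util.PriorityQueue;

class PriorityQueueConstraints {
    /** Hypothetical type that does not implement Comparable. */
    static final class Ticket { }

    /** Returns true if adding null is rejected with a NullPointerException. */
    static boolean rejectsNull() {
        PriorityQueue<String> queue = new PriorityQueue<>();
        try {
            queue.add(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** Returns true if non-comparable elements are rejected with a ClassCastException. */
    static boolean rejectsNonComparable() {
        PriorityQueue<Ticket> queue = new PriorityQueue<>();
        try {
            queue.add(new Ticket());
            queue.add(new Ticket()); // the second insert forces a comparison
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null rejected: " + rejectsNull());
        System.out.println("non-comparable rejected: " + rejectsNonComparable());
    }
}
```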

The simplest way to implement a priority queue data type is to keep an associative array mapping each priority to a list of elements with that priority. If association lists or hash tables are used to implement the associative array, adding an element takes constant time but removing or peeking at the element of highest priority takes linear (O(n)) time, because we must search all keys for the largest one. If a self-balancing binary search tree is used, all three operations take O(log n) time; this is a popular solution in environments that already provide balanced trees but nothing more sophisticated.
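The balanced-tree variant described above can be sketched in Java with a `TreeMap` (a red-black tree) that maps each priority to a FIFO list of elements. `BucketPriorityQueue` and its method names are illustrative assumptions, not an existing library class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

class BucketPriorityQueue<T> {
    // Each priority maps to the elements that share it, in insertion order.
    private final TreeMap<Integer, Deque<T>> buckets = new TreeMap<>();

    /** Adds an item under the given priority. */
    void insert(int priority, T item) {
        buckets.computeIfAbsent(priority, p -> new ArrayDeque<>()).addLast(item);
    }

    /** Returns, without removing, the oldest item of the highest priority. */
    T peekMax() {
        return highestBucket().getValue().peekFirst();
    }

    /** Removes and returns the oldest item of the highest priority. */
    T removeMax() {
        Map.Entry<Integer, Deque<T>> top = highestBucket();
        T item = top.getValue().pollFirst();
        if (top.getValue().isEmpty()) {
            buckets.remove(top.getKey()); // drop empty buckets so lastEntry stays meaningful
        }
        return item;
    }

    boolean isEmpty() {
        return buckets.isEmpty();
    }

    private Map.Entry<Integer, Deque<T>> highestBucket() {
        Map.Entry<Integer, Deque<T>> top = buckets.lastEntry();
        if (top == null) {
            throw new NoSuchElementException("empty priority queue");
        }
        return top;
    }

    public static void main(String[] args) {
        BucketPriorityQueue<String> queue = new BucketPriorityQueue<>();
        queue.insert(1, "low");
        queue.insert(5, "high-a");
        queue.insert(5, "high-b");
        queue.insert(3, "mid");
        while (!queue.isEmpty()) {
            System.out.println(queue.removeMax());
        }
    }
}
```

With this layout the cost of each operation depends on the number of distinct priorities in the tree rather than on the total number of elements.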

There are a number of specialized heap data structures that either supply additional operations or outperform the above approaches. The binary heap uses O(log n) time for both insertion and removal, but allows peeking at the element of highest priority without removing it in constant time. Binomial heaps add several more operations, but require O(log n) time for peeking. Fibonacci heaps can insert elements, peek at the maximum priority element, and decrease an element's priority in amortized constant time (deletions are still O(log n)).

// BinaryHeap class
//
// CONSTRUCTION: empty or with initial array.
//
// ******************PUBLIC OPERATIONS*********************
// void insert( x )       --> Insert x
// Comparable deleteMin( )--> Return and remove smallest item
// Comparable findMin( )  --> Return smallest item
// boolean isEmpty( )     --> Return true if empty; else false
// void makeEmpty( )      --> Remove all items
// ******************ERRORS********************************
// Throws UnderflowException for findMin and deleteMin when empty

/**
* Implements a binary heap.
* Note that all "matching" is based on the compareTo method.
*/
public class BinaryHeap implements PriorityQueue {
/**
* Construct the binary heap.
*/
public BinaryHeap( ) {
currentSize = 0;
array = new Comparable[ DEFAULT_CAPACITY + 1 ];
}

/**
* Construct the binary heap from an array.
* @param items the initial items in the binary heap.
*/
public BinaryHeap( Comparable [ ] items ) {
currentSize = items.length;
array = new Comparable[ items.length + 1 ];

for( int i = 0; i < items.length; i++ )
array[ i + 1 ] = items[ i ];
buildHeap( );
}

/**
* Insert into the priority queue.
* Duplicates are allowed.
* @param x the item to insert.
* @return null, signifying that decreaseKey cannot be used.
*/
public PriorityQueue.Position insert( Comparable x ) {
if( currentSize + 1 == array.length )
doubleArray( );

// Percolate up
int hole = ++currentSize;
array[ 0 ] = x;

for( ; x.compareTo( array[ hole / 2 ] ) < 0; hole /= 2 )
array[ hole ] = array[ hole / 2 ];
array[ hole ] = x;

return null;
}

/**
* @throws UnsupportedOperationException because no Positions are returned
* by the insert method for BinaryHeap.
*/
public void decreaseKey( PriorityQueue.Position p, Comparable newVal ) {
throw new UnsupportedOperationException( "Cannot use decreaseKey for binary heap" );
}

/**
* Find the smallest item in the priority queue.
* @return the smallest item.
* @throws UnderflowException if empty.
*/
public Comparable findMin( ) {
if( isEmpty( ) )
throw new UnderflowException( "Empty binary heap" );
return array[ 1 ];
}

/**
* Remove the smallest item from the priority queue.
* @return the smallest item.
* @throws UnderflowException if empty.
*/
public Comparable deleteMin( ) {
Comparable minItem = findMin( );
array[ 1 ] = array[ currentSize-- ];
percolateDown( 1 );

return minItem;
}

/**
* Establish heap order property from an arbitrary
* arrangement of items. Runs in linear time.
*/
private void buildHeap( ) {
for( int i = currentSize / 2; i > 0; i-- )
percolateDown( i );
}

/**
* Test if the priority queue is logically empty.
* @return true if empty, false otherwise.
*/
public boolean isEmpty( ) {
return currentSize == 0;
}

/**
* Returns size.
* @return current size.
*/
public int size( ) {
return currentSize;
}

/**
* Make the priority queue logically empty.
*/
public void makeEmpty( ) {
currentSize = 0;
}

private static final int DEFAULT_CAPACITY = 100;

private int currentSize;      // Number of elements in heap
private Comparable [ ] array; // The heap array

/**
* Internal method to percolate down in the heap.
* @param hole the index at which the percolate begins.
*/
private void percolateDown( int hole ) {
int child;
Comparable tmp = array[ hole ];

for( ; hole * 2 <= currentSize; hole = child ) {
child = hole * 2;
if( child != currentSize &&
array[ child + 1 ].compareTo( array[ child ] ) < 0 )
child++;
if( array[ child ].compareTo( tmp ) < 0 )
array[ hole ] = array[ child ];
else
break;
}
array[ hole ] = tmp;
}

/**
* Internal method to extend array.
*/
private void doubleArray( ) {
Comparable [ ] newArray;

newArray = new Comparable[ array.length * 2 ];
for( int i = 0; i < array.length; i++ )
newArray[ i ] = array[ i ];
array = newArray;
}

// Test program
public static void main( String [ ] args ) {
int numItems = 10000;
BinaryHeap h1 = new BinaryHeap( );
Integer [ ] items = new Integer[ numItems - 1 ];

int i = 37;
int j;

for( i = 37, j = 0; i != 0; i = ( i + 37 ) % numItems, j++ ) {
h1.insert( new Integer( i ) );
items[ j ] = new Integer( i );
}

for( i = 1; i < numItems; i++ )
if( ((Integer)( h1.deleteMin( ) )).intValue( ) != i )
System.out.println( "Oops! " + i );

BinaryHeap h2 = new BinaryHeap( items );
for( i = 1; i < numItems; i++ )
if( ((Integer)( h2.deleteMin( ) )).intValue( ) != i )
System.out.println( "Oops! " + i );
}
}

// PriorityQueue interface
//
// ******************PUBLIC OPERATIONS*********************
// Position insert( x )   --> Insert x
// Comparable deleteMin( )--> Return and remove smallest item
// Comparable findMin( )  --> Return smallest item
// boolean isEmpty( )     --> Return true if empty; else false
// void makeEmpty( )      --> Remove all items
// int size( )            --> Return size
// void decreaseKey( p, v)--> Decrease value in p to v
// ******************ERRORS********************************
// Throws UnderflowException for findMin and deleteMin when empty

/**
* PriorityQueue interface.
* Some priority queues may support a decreaseKey operation,
* but this is considered an advanced operation. If so,
* a Position is returned by insert.
* Note that all "matching" is based on the compareTo method.
*/
public interface PriorityQueue {
/**
* The Position interface represents a type that can
* be used for the decreaseKey operation.
*/
public interface Position {
/**
* Returns the value stored at this position.
* @return the value stored at this position.
*/
Comparable getValue( );
}

/**
* Insert into the priority queue, maintaining heap order.
* Duplicates are allowed.
* @param x the item to insert.
* @return may return a Position useful for decreaseKey.
*/
Position insert( Comparable x );

/**
* Find the smallest item in the priority queue.
* @return the smallest item.
* @throws UnderflowException if empty.
*/
Comparable findMin( );

/**
* Remove the smallest item from the priority queue.
* @return the smallest item.
* @throws UnderflowException if empty.
*/
Comparable deleteMin( );

/**
* Test if the priority queue is logically empty.
* @return true if empty, false otherwise.
*/
boolean isEmpty( );

/**
* Make the priority queue logically empty.
*/
void makeEmpty( );

/**
* Returns the size.
* @return current size.
*/
int size( );

/**
* Change the value of the item stored in the pairing heap.
* This is considered an advanced operation and might not
* be supported by all priority queues. A priority queue
* will signal its intention to not support decreaseKey by
* having insert return null consistently.
* @param p any non-null Position returned by insert.
* @param newVal the new value, which must be smaller
*    than the currently stored value.
* @throws IllegalArgumentException if p invalid.
* @throws UnsupportedOperationException if appropriate.
*/
void decreaseKey( Position p, Comparable newVal );
}

/**
* Exception class for access in empty containers
* such as stacks, queues, and priority queues.
*/
public class UnderflowException extends RuntimeException {
/**
* Construct this exception object.
* @param message the error message.
*/
public UnderflowException( String message ) {
super( message );
}
}

Built-in Java implementation of `PriorityQueue`:

```java
package com.journaldev.collections;

public class Customer {

private int id;
private String name;

public Customer(int i, String n){
this.id=i;
this.name=n;
}

public int getId() {
return id;
}

public String getName() {
return name;
}

}

```

We will use Java random number generation to generate random customer objects. For natural ordering, I will use `Integer`, a Java wrapper class that implements `Comparable`.

Here is our final test code that shows how to use priority queue in java.

```java
package com.journaldev.collections;

import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;
import java.util.Random;

public class PriorityQueueExample {

    public static void main(String[] args) {

        // natural ordering example of priority queue
        Queue<Integer> integerPriorityQueue = new PriorityQueue<>(7);
        Random rand = new Random();
        for (int i = 0; i < 7; i++) {
            integerPriorityQueue.add(rand.nextInt(100));
        }
        for (int i = 0; i < 7; i++) {
            Integer in = integerPriorityQueue.poll();
            System.out.println("Processing Integer:" + in);
        }

        // PriorityQueue example with Comparator
        Queue<Customer> customerPriorityQueue = new PriorityQueue<>(7, idComparator);
        addDataToQueue(customerPriorityQueue);

        pollDataFromQueue(customerPriorityQueue);
    }

    // Comparator anonymous class implementation: smaller ids are processed first
    public static Comparator<Customer> idComparator = new Comparator<Customer>() {

        @Override
        public int compare(Customer c1, Customer c2) {
            return Integer.compare(c1.getId(), c2.getId());
        }
    };

    // utility method to add random data to Queue
    private static void addDataToQueue(Queue<Customer> customerPriorityQueue) {
        Random rand = new Random();
        for (int i = 0; i < 7; i++) {
            int id = rand.nextInt(100);
            // the name is a placeholder; only the id matters for ordering
            customerPriorityQueue.add(new Customer(id, "Customer " + id));
        }
    }

    // utility method to poll data from queue
    private static void pollDataFromQueue(Queue<Customer> customerPriorityQueue) {
        while (true) {
            Customer cust = customerPriorityQueue.poll();
            if (cust == null) break;
            System.out.println("Processing Customer with ID=" + cust.getId());
        }
    }
}
```

Apache Spark GraphX

GraphX is a component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators (e.g., `subgraph`, `joinVertices`, and `aggregateMessages`).

Spark GraphX is a graph processing framework built on top of Spark.

GraphX models graphs as property graphs where vertices and edges can have properties.

GraphX comes with its own package `org.apache.spark.graphx`.

Graph

`Graph` abstract class represents a collection of `vertices` and `edges`.

abstract class Graph[VD: ClassTag, ED: ClassTag]

`vertices` attribute is of type `VertexRDD` while `edges` is of type `EdgeRDD`.

Standard GraphX API

The `Graph` class comes with a small API.

• Transformations
  • `mapVertices`
  • `mapEdges`
  • `mapTriplets`
  • `reverse`
  • `subgraph`
  • `mask`
  • `groupEdges`

• Joins
  • `outerJoinVertices`

• Computation
  • `aggregateMessages`

Creating Graphs (Graph object)

The `Graph` object comes with the following factory methods to create instances of `Graph`: `apply`, `fromEdges`, and `fromEdgeTuples`.

Main classes and interfaces in GraphX:

| Class | Description |
|---|---|
| Edge | A single directed edge consisting of a source id, target id, and the data associated with the edge. |
| EdgeContext | Represents an edge along with its neighboring vertices and allows sending messages along the edge. |
| EdgeDirection | The direction of a directed edge relative to a vertex. |
| EdgeRDD | `EdgeRDD[ED, VD]` extends `RDD[Edge[ED]]` by storing the edges in columnar format on each partition for performance. |
| EdgeTriplet | An edge triplet represents an edge along with the vertex attributes of its neighboring vertices. |
| Graph | The Graph abstractly represents a graph with arbitrary objects associated with vertices and edges. |
| GraphKryoRegistrator | Registers GraphX classes with Kryo for improved performance. |
| GraphLoader | Provides utilities for loading `Graph`s from files. |
| GraphOps | Contains additional functionality for `Graph`. |
| GraphXUtils | |
| PartitionStrategy.CanonicalRandomVertexCut$ | Assigns edges to partitions by hashing the source and destination vertex IDs in a canonical direction, resulting in a random vertex cut that colocates all edges between two vertices, regardless of direction. |
| PartitionStrategy.EdgePartition1D$ | Assigns edges to partitions using only the source vertex ID, colocating edges with the same source. |
| PartitionStrategy.EdgePartition2D$ | Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix, guaranteeing a `2 * sqrt(numParts) - 1` bound on vertex replication. |
| PartitionStrategy.RandomVertexCut$ | Assigns edges to partitions by hashing the source and destination vertex IDs, resulting in a random vertex cut that colocates all same-direction edges between two vertices. |
| Pregel | Implements a Pregel-like bulk-synchronous message-passing API. |
| TripletFields | Represents a subset of the fields of an `EdgeTriplet` or `EdgeContext`. |
| VertexRDD | Extends `RDD[(VertexId, VD)]` by ensuring that there is only one entry for each vertex and by pre-indexing the entries for fast, efficient joins. |

Example Property Graph

Suppose we want to construct a property graph consisting of the various collaborators on the GraphX project. The vertex property might contain the username and occupation. We could annotate edges with a string describing the relationships between collaborators. The resulting graph would have the type signature `Graph[(String, String), String]`.

There are numerous ways to construct a property graph from raw files, RDDs, and even synthetic generators. Probably the most general method is to use the `Graph` object: for example, passing an RDD of `(VertexId, VD)` pairs and an RDD of `Edge[ED]` objects to `Graph.apply` constructs a graph from a collection of RDDs.

Such a construction makes use of the `Edge` case class. Edges have a `srcId` and a `dstId` corresponding to the source and destination vertex identifiers. In addition, the `Edge` class has an `attr` member which stores the edge property.

We can deconstruct a graph into the respective vertex and edge views by using the `graph.vertices` and `graph.edges` members respectively.

Note that `graph.vertices` returns a `VertexRDD[(String, String)]`, which extends `RDD[(VertexId, (String, String))]`, so a Scala `case` expression can be used to deconstruct the tuple. On the other hand, `graph.edges` returns an `EdgeRDD` containing `Edge[String]` objects, which can also be pattern matched with the `Edge` case class constructor.

In addition to the vertex and edge views of the property graph, GraphX also exposes a triplet view. The triplet view logically joins the vertex and edge properties, yielding an `RDD[EdgeTriplet[VD, ED]]` containing instances of the `EdgeTriplet` class. This join can be expressed in SQL as `SELECT src.id, dst.id, src.attr, e.attr, dst.attr FROM edges AS e LEFT JOIN vertices AS src, vertices AS dst ON e.srcId = src.Id AND e.dstId = dst.Id`.

The `EdgeTriplet` class extends the `Edge` class by adding the `srcAttr` and `dstAttr` members which contain the source and destination properties respectively. We can use the triplet view of a graph to render a collection of strings describing relationships between users.
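To make the logical join behind the triplet view concrete, here is a plain-Java sketch that does not use the GraphX API at all. It looks up the source and destination attributes for each edge and renders one sentence per triplet; the `Edge` class, the method name, and the sample users are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TripletViewSketch {
    /** Hypothetical stand-in for an edge: source id, destination id, edge attribute. */
    static final class Edge {
        final long srcId;
        final long dstId;
        final String attr;

        Edge(long srcId, long dstId, String attr) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.attr = attr;
        }
    }

    /** Joins each edge with the attributes of its two endpoints and describes the relationship. */
    static List<String> describeRelationships(Map<Long, String> vertices, List<Edge> edges) {
        List<String> facts = new ArrayList<>();
        for (Edge e : edges) {
            String srcAttr = vertices.get(e.srcId); // plays the role of srcAttr
            String dstAttr = vertices.get(e.dstId); // plays the role of dstAttr
            facts.add(srcAttr + " is the " + e.attr + " of " + dstAttr);
        }
        return facts;
    }

    public static void main(String[] args) {
        Map<Long, String> users = new HashMap<>();
        users.put(1L, "alice");
        users.put(2L, "bob");
        List<Edge> relationships = Arrays.asList(new Edge(1L, 2L, "advisor"));
        describeRelationships(users, relationships).forEach(System.out::println);
    }
}
```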

Graph Operators

Just as RDDs have basic operations like `map`, `filter`, and `reduceByKey`, property graphs also have a collection of basic operators that take user-defined functions and produce new graphs with transformed properties and structure. The core operators that have optimized implementations are defined in `Graph`, and convenient operators that are expressed as compositions of the core operators are defined in `GraphOps`. However, thanks to Scala implicits the operators in `GraphOps` are automatically available as members of `Graph`. For example, we can compute the in-degree of each vertex (defined in `GraphOps`) by calling `graph.inDegrees`, which returns a `VertexRDD[Int]`.

The reason for differentiating between core graph operations and `GraphOps` is to be able to support different graph representations in the future. Each graph representation must provide implementations of the core operations and reuse many of the useful operations defined in `GraphOps`.

Summary List of Operators

The following is a quick summary of the functionality defined in both `Graph` and `GraphOps` but presented as members of Graph for simplicity. Note that some function signatures have been simplified (e.g., default arguments and type constraints removed) and some more advanced functionality has been removed so please consult the API docs for the official list of operations.

Property Operators

Like the RDD `map` operator, the property graph provides `mapVertices`, `mapEdges`, and `mapTriplets`.

Each of these operators yields a new graph with the vertex or edge properties modified by the user defined `map` function.

Note that in each case the graph structure is unaffected. This is a key feature of these operators which allows the resulting graph to reuse the structural indices of the original graph. Mapping over `graph.vertices` and building a new `Graph` from the result is logically equivalent to calling `mapVertices`, but it does not preserve the structural indices and would not benefit from the GraphX system optimizations.

Instead, use `mapVertices` to preserve the indices.

These operators are often used to initialize the graph for a particular computation or project away unnecessary properties. For example, given a graph with the out-degrees as the vertex properties, we can initialize it for PageRank by using `mapTriplets` to set each edge weight to the reciprocal of the source vertex's out-degree and `mapVertices` to set each vertex value to 1.0.

Structural Operators

Currently GraphX supports only a simple set of commonly used structural operators and we expect to add more in the future. The basic structural operators are `reverse`, `subgraph`, `mask`, and `groupEdges`.

The `reverse` operator returns a new graph with all the edge directions reversed. This can be useful when, for example, trying to compute the inverse PageRank. Because the reverse operation does not modify vertex or edge properties or change the number of edges, it can be implemented efficiently without data movement or duplication.

The `subgraph` operator takes vertex and edge predicates and returns the graph containing only the vertices that satisfy the vertex predicate (evaluate to true) and edges that satisfy the edge predicate and connect vertices that satisfy the vertex predicate. The `subgraph` operator can be used in a number of situations to restrict the graph to the vertices and edges of interest or to eliminate broken links. For example, broken links can be removed by passing a vertex predicate that rejects vertices whose attributes are missing.

Note that only the vertex predicate needs to be provided in that case. The `subgraph` operator defaults to `true` if the vertex or edge predicates are not provided.

The `mask` operator constructs a subgraph by returning a graph that contains the vertices and edges that are also found in the input graph. This can be used in conjunction with the `subgraph` operator to restrict a graph based on the properties in another related graph. For example, we might run connected components using the graph with missing vertices and then restrict the answer to the valid subgraph.

The `groupEdges` operator merges parallel edges (i.e., duplicate edges between pairs of vertices) in the multigraph. In many numerical applications, parallel edges can be added (their weights combined) into a single edge thereby reducing the size of the graph.
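The effect of merging parallel edges can be illustrated without Spark. This plain-Java sketch is not the GraphX implementation; it simply sums the weights of edges that share the same (source, destination) pair, and the `Edge` class and method name are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupEdgesSketch {
    /** Hypothetical weighted, directed edge. */
    static final class Edge {
        final long srcId;
        final long dstId;
        final double weight;

        Edge(long srcId, long dstId, double weight) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.weight = weight;
        }
    }

    /** Combines parallel edges by adding their weights; the key is the (srcId, dstId) pair. */
    static Map<List<Long>, Double> groupEdges(List<Edge> edges) {
        Map<List<Long>, Double> merged = new LinkedHashMap<>();
        for (Edge e : edges) {
            merged.merge(Arrays.asList(e.srcId, e.dstId), e.weight, Double::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
                new Edge(1, 2, 1.0),
                new Edge(1, 2, 2.5),  // parallel to the first edge
                new Edge(2, 1, 4.0)); // opposite direction, so it stays separate
        System.out.println(groupEdges(edges));
    }
}
```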

Join Operators

In many cases it is necessary to join data from external collections (RDDs) with graphs. For example, we might have extra user properties that we want to merge with an existing graph or we might want to pull vertex properties from one graph into another. These tasks can be accomplished using the join operators. The key join operators are `joinVertices` and `outerJoinVertices`.

The `joinVertices` operator joins the vertices with the input RDD and returns a new graph with the vertex properties obtained by applying the user defined `map` function to the result of the joined vertices. Vertices without a matching value in the RDD retain their original value.

Note that if the RDD contains more than one value for a given vertex, only one will be used. It is therefore recommended that the input RDD be made unique, for example with `graph.vertices.aggregateUsingIndex`, which will also pre-index the resulting values to substantially accelerate the subsequent join.

The more general `outerJoinVertices` behaves similarly to `joinVertices` except that the user-defined `map` function is applied to all vertices and can change the vertex property type. Because not all vertices may have a matching value in the input RDD, the `map` function takes an `Option` type. For example, we can set up a graph for PageRank by initializing vertex properties with their `outDegree`.

You may have noticed the curried function pattern with multiple parameter lists (e.g., `f(a)(b)`) used by these operators. While we could have equally written `f(a)(b)` as `f(a,b)`, this would mean that type inference on `b` would not depend on `a`. As a consequence, the user would need to provide type annotations for the user-defined function.

Neighborhood Aggregation

A key step in many graph analytics tasks is aggregating information about the neighborhood of each vertex. For example, we might want to know the number of followers each user has or the average age of the followers of each user. Many iterative graph algorithms (e.g., PageRank, Shortest Path, and connected components) repeatedly aggregate properties of neighboring vertices (e.g., current PageRank value, shortest path to the source, and smallest reachable vertex id).

To improve performance, the primary aggregation operator changed from `graph.mapReduceTriplets` to the new `graph.aggregateMessages`. While the changes in the API are relatively small, we provide a transition guide below.

Aggregate Messages (aggregateMessages)

The core aggregation operation in GraphX is `aggregateMessages`. This operator applies a user defined `sendMsg` function to each edge triplet in the graph and then uses the `mergeMsg` function to aggregate those messages at their destination vertex.

The user-defined `sendMsg` function takes an `EdgeContext`, which exposes the source and destination attributes along with the edge attribute and functions (`sendToSrc` and `sendToDst`) to send messages to the source and destination attributes. Think of `sendMsg` as the map function in map-reduce. The user-defined `mergeMsg` function takes two messages destined to the same vertex and yields a single message. Think of `mergeMsg` as the reduce function in map-reduce. The `aggregateMessages` operator returns a `VertexRDD[Msg]` containing the aggregate message (of type `Msg`) destined to each vertex. Vertices that did not receive a message are not included in the returned `VertexRDD`.

In addition, `aggregateMessages` takes an optional `tripletFields` argument which indicates what data is accessed in the `EdgeContext` (e.g., the source vertex attribute but not the destination vertex attribute). The possible options for `tripletFields` are defined in `TripletFields`, and the default value is `TripletFields.All`, which indicates that the user-defined `sendMsg` function may access any of the fields in the `EdgeContext`. The `tripletFields` argument can be used to notify GraphX that only part of the `EdgeContext` will be needed, allowing GraphX to select an optimized join strategy. For example, if we are computing the average age of the followers of each user, we would only require the source field and so we would use `TripletFields.Src`.

In earlier versions of GraphX we used bytecode inspection to infer the `TripletFields`; however, we found bytecode inspection to be slightly unreliable and instead opted for more explicit user control.

In the following example we use the `aggregateMessages` operator to compute the average age of the more senior followers of each user.

```scala
import org.apache.spark.graphx.{Graph, VertexRDD}
import org.apache.spark.graphx.util.GraphGenerators

// Create a graph with "age" as the vertex property.
// Here we use a random graph for simplicity.
val graph: Graph[Double, Int] =
GraphGenerators.logNormalGraph(sc, numVertices = 100).mapVertices( (id, _) => id.toDouble )
// Compute the number of older followers and their total age
val olderFollowers: VertexRDD[(Int, Double)] = graph.aggregateMessages[(Int, Double)](
triplet => { // Map Function
if (triplet.srcAttr > triplet.dstAttr) {
// Send message to destination vertex containing counter and age
triplet.sendToDst((1, triplet.srcAttr))
}
},
(a, b) => (a._1 + b._1, a._2 + b._2) // Reduce Function
)
// Divide total age by number of older followers to get average age of older followers
val avgAgeOfOlderFollowers: VertexRDD[Double] =
olderFollowers.mapValues( (id, value) =>
value match { case (count, totalAge) => totalAge / count } )
// Display the results
avgAgeOfOlderFollowers.collect.foreach(println(_))
```
Find full example code at `examples/src/main/scala/org/apache/spark/examples/graphx/AggregateMessagesExample.scala` in the Spark repo.

The `aggregateMessages` operation performs optimally when the messages (and the sums of messages) are constant sized (e.g., floats and addition instead of lists and concatenation).
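The send-then-merge semantics of `aggregateMessages` can be mimicked on a single machine. The plain-Java sketch below does not call GraphX; it reproduces the "average age of older followers" computation with a map from destination vertex to a constant-sized `{count, totalAge}` message. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AggregateMessagesSketch {
    /** Hypothetical "follows" edge: srcId follows dstId. */
    static final class Edge {
        final long srcId;
        final long dstId;

        Edge(long srcId, long dstId) {
            this.srcId = srcId;
            this.dstId = dstId;
        }
    }

    /** Returns, per vertex, the average age of its followers that are older than the vertex. */
    static Map<Long, Double> avgAgeOfOlderFollowers(Map<Long, Double> age, List<Edge> follows) {
        Map<Long, double[]> merged = new HashMap<>(); // destination -> {count, totalAge}
        for (Edge e : follows) {
            double srcAge = age.get(e.srcId);
            double dstAge = age.get(e.dstId);
            if (srcAge > dstAge) {
                // "sendMsg": send (1, srcAge) to the destination vertex.
                // "mergeMsg": add counts and ages of messages for the same vertex.
                merged.merge(e.dstId, new double[] {1, srcAge},
                        (a, b) -> new double[] {a[0] + b[0], a[1] + b[1]});
            }
        }
        Map<Long, Double> average = new HashMap<>();
        for (Map.Entry<Long, double[]> m : merged.entrySet()) {
            average.put(m.getKey(), m.getValue()[1] / m.getValue()[0]);
        }
        return average; // vertices that received no message are absent
    }

    public static void main(String[] args) {
        Map<Long, Double> age = new HashMap<>();
        age.put(1L, 30.0);
        age.put(2L, 20.0);
        age.put(3L, 40.0);
        age.put(4L, 10.0);
        List<Edge> follows = Arrays.asList(
                new Edge(1, 2), new Edge(3, 2), new Edge(4, 2), new Edge(2, 4));
        System.out.println(avgAgeOfOlderFollowers(age, follows));
    }
}
```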

Map Reduce Triplets Transition Guide (Legacy)

In earlier versions of GraphX neighborhood aggregation was accomplished using the `mapReduceTriplets` operator.

The `mapReduceTriplets` operator takes a user-defined map function which is applied to each triplet and can yield messages which are aggregated using the user-defined `reduce` function. However, we found the use of the returned iterator to be expensive, and it inhibited our ability to apply additional optimizations (e.g., local vertex renumbering). In `aggregateMessages` we introduced the `EdgeContext`, which exposes the triplet fields and also functions to explicitly send messages to the source and destination vertex. Furthermore, we removed bytecode inspection and instead require the user to indicate what fields in the triplet are actually required.

Code that uses `mapReduceTriplets` can be rewritten using `aggregateMessages` by replacing the returned iterator of messages with explicit `sendToSrc` and `sendToDst` calls on the `EdgeContext`.

Computing Degree Information

A common aggregation task is computing the degree of each vertex: the number of edges adjacent to each vertex. In the context of directed graphs it is often necessary to know the in-degree, out-degree, and the total degree of each vertex. The `GraphOps` class contains a collection of operators to compute the degrees of each vertex. For example, `graph.inDegrees`, `graph.outDegrees`, and `graph.degrees` each return a `VertexRDD[Int]` that can be reduced to find the maximum in-, out-, and total degree.
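As a single-machine illustration of what these degree computations produce, the following plain-Java sketch counts in-, out-, and total degrees from an edge list and reports the maxima. It does not use GraphX, and the names `DegreeSketch` and `maxDegrees` are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class DegreeSketch {
    /** Returns {maxInDegree, maxOutDegree, maxTotalDegree} for directed edges given as {srcId, dstId}. */
    static int[] maxDegrees(long[][] edges) {
        Map<Long, Integer> in = new HashMap<>();
        Map<Long, Integer> out = new HashMap<>();
        Map<Long, Integer> total = new HashMap<>();
        for (long[] e : edges) {
            out.merge(e[0], 1, Integer::sum);   // the source gains an outgoing edge
            in.merge(e[1], 1, Integer::sum);    // the destination gains an incoming edge
            total.merge(e[0], 1, Integer::sum);
            total.merge(e[1], 1, Integer::sum);
        }
        return new int[] {max(in), max(out), max(total)};
    }

    private static int max(Map<Long, Integer> degrees) {
        int best = 0;
        for (int d : degrees.values()) {
            best = Math.max(best, d);
        }
        return best;
    }

    public static void main(String[] args) {
        long[][] edges = {{1, 2}, {1, 3}, {2, 3}, {3, 1}};
        int[] result = maxDegrees(edges);
        System.out.println("max in=" + result[0] + ", out=" + result[1] + ", total=" + result[2]);
    }
}
```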

Collecting Neighbors

In some cases it may be easier to express computation by collecting neighboring vertices and their attributes at each vertex. This can be easily accomplished using the `collectNeighborIds` and the `collectNeighbors` operators.

These operators can be quite costly as they duplicate information and require substantial communication. If possible try expressing the same computation using the `aggregateMessages` operator directly.

GraphX is typically faster than a naive implementation on plain Spark RDDs when graph computation is needed.

Complexity analysis – Big o notation table

Searching

| Algorithm | Data Structure | Time Complexity (Average) | Time Complexity (Worst) | Space Complexity (Worst) |
|---|---|---|---|---|
| Depth First Search (DFS) | Graph of \|V\| vertices and \|E\| edges | `-` | `O(\|E\| + \|V\|)` | `O(\|V\|)` |
| Breadth First Search (BFS) | Graph of \|V\| vertices and \|E\| edges | `-` | `O(\|E\| + \|V\|)` | `O(\|V\|)` |
| Binary search | Sorted array of n elements | `O(log(n))` | `O(log(n))` | `O(1)` |
| Linear (Brute Force) | Array | `O(n)` | `O(n)` | `O(1)` |
| Shortest path by Dijkstra, using a Min-heap as priority queue | Graph with \|V\| vertices and \|E\| edges | `O((\|V\| + \|E\|) log \|V\|)` | `O((\|V\| + \|E\|) log \|V\|)` | `O(\|V\|)` |
| Shortest path by Dijkstra, using an unsorted array as priority queue | Graph with \|V\| vertices and \|E\| edges | `O(\|V\|^2)` | `O(\|V\|^2)` | `O(\|V\|)` |
| Shortest path by Bellman-Ford | Graph with \|V\| vertices and \|E\| edges | `O(\|V\|\|E\|)` | `O(\|V\|\|E\|)` | `O(\|V\|)` |
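The "Dijkstra with a min-heap" row ties the priority queue and graph topics together. Here is a minimal Java sketch using `java.util.PriorityQueue`, assuming non-negative edge weights and an adjacency-list input where `graph[u]` holds `{v, weight}` pairs; this input layout and the class name are assumptions made for the example. Stale heap entries are skipped instead of being decreased in place, since `PriorityQueue` has no decrease-key operation.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class DijkstraSketch {
    /** Returns shortest distances from source; unreachable vertices keep Long.MAX_VALUE. */
    static long[] shortestPaths(int[][][] graph, int source) {
        long[] dist = new long[graph.length];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;

        // Heap entries are {distance, vertex}, ordered by distance.
        PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        heap.add(new long[] {0, source});

        while (!heap.isEmpty()) {
            long[] top = heap.poll();
            int u = (int) top[1];
            if (top[0] > dist[u]) {
                continue; // a shorter path to u was already processed
            }
            for (int[] edge : graph[u]) {
                int v = edge[0];
                long candidate = dist[u] + edge[1];
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    heap.add(new long[] {candidate, v});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Edges: 0->1 (4), 0->2 (1), 1->3 (1), 2->1 (2)
        int[][][] graph = {{{1, 4}, {2, 1}}, {{3, 1}}, {{1, 2}}, {}};
        System.out.println(Arrays.toString(shortestPaths(graph, 0)));
    }
}
```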

Sorting

| Algorithm | Data Structure | Time Complexity (Best) | Time Complexity (Average) | Time Complexity (Worst) | Worst Case Auxiliary Space Complexity |
|---|---|---|---|---|---|
| Quicksort | Array | `O(n log(n))` | `O(n log(n))` | `O(n^2)` | `O(log(n))` |
| Mergesort | Array | `O(n log(n))` | `O(n log(n))` | `O(n log(n))` | `O(n)` |
| Heapsort | Array | `O(n log(n))` | `O(n log(n))` | `O(n log(n))` | `O(1)` |
| Bubble Sort | Array | `O(n)` | `O(n^2)` | `O(n^2)` | `O(1)` |
| Insertion Sort | Array | `O(n)` | `O(n^2)` | `O(n^2)` | `O(1)` |
| Selection Sort | Array | `O(n^2)` | `O(n^2)` | `O(n^2)` | `O(1)` |
| Bucket Sort | Array | `O(n+k)` | `O(n+k)` | `O(n^2)` | `O(n+k)` |
| Radix Sort | Array | `O(nk)` | `O(nk)` | `O(nk)` | `O(n+k)` |

Data Structures

| Data Structure | Indexing (Average) | Search (Average) | Insertion (Average) | Deletion (Average) | Indexing (Worst) | Search (Worst) | Insertion (Worst) | Deletion (Worst) | Space Complexity (Worst) |
|---|---|---|---|---|---|---|---|---|---|
| Basic Array | `O(1)` | `O(n)` | `-` | `-` | `O(1)` | `O(n)` | `-` | `-` | `O(n)` |
| Dynamic Array | `O(1)` | `O(n)` | `O(n)` | `-` | `O(1)` | `O(n)` | `O(n)` | `-` | `O(n)` |
| Singly-Linked List | `O(n)` | `O(n)` | `O(1)` | `O(1)` | `O(n)` | `O(n)` | `O(1)` | `O(1)` | `O(n)` |
| Doubly-Linked List | `O(n)` | `O(n)` | `O(1)` | `O(1)` | `O(n)` | `O(n)` | `O(1)` | `O(1)` | `O(n)` |
| Skip List | `O(n)` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `O(n)` | `O(n)` | `O(n)` | `O(n)` | `O(n log(n))` |
| Hash Table | `-` | `O(1)` | `O(1)` | `O(1)` | `-` | `O(n)` | `O(n)` | `O(n)` | `O(n)` |
| Binary Search Tree | `-` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `-` | `O(n)` | `O(n)` | `O(n)` | `O(n)` |
| B-Tree | `-` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `-` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `O(n)` |
| Red-Black Tree | `-` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `-` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `O(n)` |
| AVL Tree | `-` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `-` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `O(n)` |

Heaps

| Heap | Heapify | Find Max | Extract Max | Increase Key | Insert | Delete | Merge |
|---|---|---|---|---|---|---|---|
| Linked List (sorted) | `-` | `O(1)` | `O(1)` | `O(n)` | `O(n)` | `O(1)` | `O(m+n)` |
| Linked List (unsorted) | `-` | `O(n)` | `O(n)` | `O(1)` | `O(1)` | `O(1)` | `O(1)` |
| Binary Heap | `O(n)` | `O(1)` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `O(m+n)` |
| Binomial Heap | `-` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `O(log(n))` | `O(log(n))` |
| Fibonacci Heap | `-` | `O(1)` | `O(log(n))*` | `O(1)*` | `O(1)` | `O(log(n))*` | `O(1)` |

`*` marks amortized time.

Graphs

| Node / Edge Management | Storage | Add Vertex | Add Edge | Remove Vertex | Remove Edge | Query |
|---|---|---|---|---|---|---|
| Adjacency list | `O(\|V\|+\|E\|)` | `O(1)` | `O(1)` | `O(\|V\| + \|E\|)` | `O(\|E\|)` | `O(\|V\|)` |
| Incidence list | `O(\|V\|+\|E\|)` | `O(1)` | `O(1)` | `O(\|E\|)` | `O(\|E\|)` | `O(\|E\|)` |
| Adjacency matrix | `O(\|V\|^2)` | `O(\|V\|^2)` | `O(1)` | `O(\|V\|^2)` | `O(1)` | `O(1)` |
| Incidence matrix | `O(\|V\| ⋅ \|E\|)` | `O(\|V\| ⋅ \|E\|)` | `O(\|V\| ⋅ \|E\|)` | `O(\|V\| ⋅ \|E\|)` | `O(\|V\| ⋅ \|E\|)` | `O(\|E\|)` |

Reference :

http://sandbox.runjs.cn/show/vsr4wsy7