 Find the 2 missing number from an array of N element given N2 element
Scala Code :

def findMissingNums(datax:Array[Int], len:Integer) = { val total = datax.sum val actual_total = (1 to len).sum val diff = actual_total  total var i = diff var firstElem=0 var secondElem=0 while (i > 0) { if (!datax.contains(i) && diff < len) { firstElem = i secondElem = diff  firstElem i = 0 } else if(!datax.contains(i) && diff > len){ if(i>=diff){ } else{ firstElem = i secondElem = diff  firstElem } } i = i  1 } println(firstElem+","+secondElem) }
2. Swap the 2 numbers without using temp variable
def swapNum(num1:Int,num2:Int){
var number1=num1
var number2=num2
number1 = number2 – number1
number2 = number2 – number1
number1 = number1 + number2
println(“num1 = “+number1+”, num2 = “+number2)}
Category: algorithm
Priority Queue – Data Structure
We know that Queue
follows FirstInFirstOut model but sometimes we need to process the objects in the queue based on the priority. That is when JavaPriorityQueue
is used.
For example, let’s say we have an application that generates stocks reports for daily trading session. This application processes a lot of data and takes time to process it. So customers are sending request to the application that is actually getting queued but we want to process premium customers first and standard customers after them. So in this case PriorityQueue implementation in java can be really helpful.
PriorityQueue is an unbounded queue based on a priority heap and the elements of the priority queue are ordered by default in natural order. We can provide a Comparator for ordering at the time of instantiation of priority queue.
Java Priority Queue doesn’t allow null
values and we can’t create PriorityQueue of Objects that are noncomparable. We use java Comparable and Comparator for sorting Objects and Priority Queue use them for priority processing of it’s elements.
The simplest way to implement a priority queue data type is to keep an associative array mapping each priority to a list of elements with that priority. If association lists or hash tables are used to implement the associative array, adding an element takes constant time but removing or peeking at the element of highest priority takes linear (O(n)) time, because we must search all keys for the largest one. If a selfbalancing binary search tree is used, all three operations take O(log n) time; this is a popular solution in environments that already provide balanced trees but nothing more sophisticated.
There are a number of specialized heap data structures that either supply additional operations or outperform the above approaches. The binary heap uses O(log n) time for both operations, but allows peeking at the element of highest priority without removing it in constant time. Binomial heaps add several more operations, but require O(log n) time for peeking. Fibonacci heaps can insert elements, peek at the maximum priority element, and decrease an element’s priority in amortized constant time (deletions are still O(log n)).
// BinaryHeap class
//
// CONSTRUCTION: empty or with initial array.
//
// ******************PUBLIC OPERATIONS*********************
// void insert( x ) –> Insert x
// Comparable deleteMin( )–> Return and remove smallest item
// Comparable findMin( ) –> Return smallest item
// boolean isEmpty( ) –> Return true if empty; else false
// void makeEmpty( ) –> Remove all items
// ******************ERRORS********************************
// Throws UnderflowException for findMin and deleteMin when empty
/**
* Implements a binary heap.
* Note that all “matching” is based on the compareTo method.
* @author Bhavesh Gadoya
*/
public class BinaryHeap implements PriorityQueue {
/**
* Construct the binary heap.
*/
public BinaryHeap( ) {
currentSize = 0;
array = new Comparable[ DEFAULT_CAPACITY + 1 ];
}
/**
* Construct the binary heap from an array.
* @param items the inital items in the binary heap.
*/
public BinaryHeap( Comparable [ ] items ) {
currentSize = items.length;
array = new Comparable[ items.length + 1 ];
for( int i = 0; i < items.length; i++ )
array[ i + 1 ] = items[ i ];
buildHeap( );
}
/**
* Insert into the priority queue.
* Duplicates are allowed.
* @param x the item to insert.
* @return null, signifying that decreaseKey cannot be used.
*/
public PriorityQueue.Position insert( Comparable x ) {
if( currentSize + 1 == array.length )
doubleArray( );
// Percolate up
int hole = ++currentSize;
array[ 0 ] = x;
for( ; x.compareTo( array[ hole / 2 ] ) < 0; hole /= 2 )
array[ hole ] = array[ hole / 2 ];
array[ hole ] = x;
return null;
}
/**
* @throws UnsupportedOperationException because no Positions are returned
* by the insert method for BinaryHeap.
*/
public void decreaseKey( PriorityQueue.Position p, Comparable newVal ) {
throw new UnsupportedOperationException( “Cannot use decreaseKey for binary heap” );
}
/**
* Find the smallest item in the priority queue.
* @return the smallest item.
* @throws UnderflowException if empty.
*/
public Comparable findMin( ) {
if( isEmpty( ) )
throw new UnderflowException( “Empty binary heap” );
return array[ 1 ];
}
/**
* Remove the smallest item from the priority queue.
* @return the smallest item.
* @throws UnderflowException if empty.
*/
public Comparable deleteMin( ) {
Comparable minItem = findMin( );
array[ 1 ] = array[ currentSize– ];
percolateDown( 1 );
return minItem;
}
/**
* Establish heap order property from an arbitrary
* arrangement of items. Runs in linear time.
*/
private void buildHeap( ) {
for( int i = currentSize / 2; i > 0; i– )
percolateDown( i );
}
/**
* Test if the priority queue is logically empty.
* @return true if empty, false otherwise.
*/
public boolean isEmpty( ) {
return currentSize == 0;
}
/**
* Returns size.
* @return current size.
*/
public int size( ) {
return currentSize;
}
/**
* Make the priority queue logically empty.
*/
public void makeEmpty( ) {
currentSize = 0;
}
private static final int DEFAULT_CAPACITY = 100;
private int currentSize; // Number of elements in heap
private Comparable [ ] array; // The heap array
/**
* Internal method to percolate down in the heap.
* @param hole the index at which the percolate begins.
*/
private void percolateDown( int hole ) {
int child;
Comparable tmp = array[ hole ];
for( ; hole * 2 <= currentSize; hole = child ) {
child = hole * 2;
if( child != currentSize &&
array[ child + 1 ].compareTo( array[ child ] ) < 0 )
child++;
if( array[ child ].compareTo( tmp ) < 0 )
array[ hole ] = array[ child ];
else
break;
}
array[ hole ] = tmp;
}
/**
* Internal method to extend array.
*/
private void doubleArray( ) {
Comparable [ ] newArray;
newArray = new Comparable[ array.length * 2 ];
for( int i = 0; i < array.length; i++ )
newArray[ i ] = array[ i ];
array = newArray;
}
// Test program
public static void main( String [ ] args ) {
int numItems = 10000;
BinaryHeap h1 = new BinaryHeap( );
Integer [ ] items = new Integer[ numItems – 1 ];
int i = 37;
int j;
for( i = 37, j = 0; i != 0; i = ( i + 37 ) % numItems, j++ ) {
h1.insert( new Integer( i ) );
items[ j ] = new Integer( i );
}
for( i = 1; i < numItems; i++ )
if( ((Integer)( h1.deleteMin( ) )).intValue( ) != i )
System.out.println( “Oops! ” + i );
BinaryHeap h2 = new BinaryHeap( items );
for( i = 1; i < numItems; i++ )
if( ((Integer)( h2.deleteMin( ) )).intValue( ) != i )
System.out.println( “Oops! ” + i );
}
}
// PriorityQueue interface
//
// ******************PUBLIC OPERATIONS*********************
// Position insert( x ) –> Insert x
// Comparable deleteMin( )–> Return and remove smallest item
// Comparable findMin( ) –> Return smallest item
// boolean isEmpty( ) –> Return true if empty; else false
// void makeEmpty( ) –> Remove all items
// int size( ) –> Return size
// void decreaseKey( p, v)–> Decrease value in p to v
// ******************ERRORS********************************
// Throws UnderflowException for findMin and deleteMin when empty
/**
* PriorityQueue interface.
* Some priority queues may support a decreaseKey operation,
* but this is considered an advanced operation. If so,
* a Position is returned by insert.
* Note that all “matching” is based on the compareTo method.
* @author Bhavesh Gadoya
*/
public interface PriorityQueue {
/**
* The Position interface represents a type that can
* be used for the decreaseKey operation.
*/
public interface Position {
/**
* Returns the value stored at this position.
* @return the value stored at this position.
*/
Comparable getValue( );
}
/**
* Insert into the priority queue, maintaining heap order.
* Duplicates are allowed.
* @param x the item to insert.
* @return may return a Position useful for decreaseKey.
*/
Position insert( Comparable x );
/**
* Find the smallest item in the priority queue.
* @return the smallest item.
* @throws UnderflowException if empty.
*/
Comparable findMin( );
/**
* Remove the smallest item from the priority queue.
* @return the smallest item.
* @throws UnderflowException if empty.
*/
Comparable deleteMin( );
/**
* Test if the priority queue is logically empty.
* @return true if empty, false otherwise.
*/
boolean isEmpty( );
/**
* Make the priority queue logically empty.
*/
void makeEmpty( );
/**
* Returns the size.
* @return current size.
*/
int size( );
/**
* Change the value of the item stored in the pairing heap.
* This is considered an advanced operation and might not
* be supported by all priority queues. A priority queue
* will signal its intention to not support decreaseKey by
* having insert return null consistently.
* @param p any nonnull Position returned by insert.
* @param newVal the new value, which must be smaller
* than the currently stored value.
* @throws IllegalArgumentException if p invalid.
* @throws UnsupportedOperationException if appropriate.
*/
void decreaseKey( Position p, Comparable newVal );
}
/**
* Exception class for access in empty containers
* such as stacks, queues, and priority queues.
* @author Bhavesh Gadoya
*/
public class UnderflowException extends RuntimeException {
/**
* Construct this exception object.
* @param message the error message.
*/
public UnderflowException( String message ) {
super( message );
}
}
Inbuilt java implementation of priorityQueue :
package com.journaldev.collections; public class Customer { private int id; private String name; public Customer(int i, String n){ this.id=i; this.name=n; } public int getId() { return id; } public String getName() { return name; } }
We will use java random number generation to generate random customer objects. For natural ordering, I will use Integer that is also a java wrapper class.
Here is our final test code that shows how to use priority queue in java.
package com.journaldev.collections; import java.util.Comparator; import java.util.PriorityQueue; import java.util.Queue; import java.util.Random; public class PriorityQueueExample { public static void main(String[] args) { //natural ordering example of priority queue Queue<Integer> integerPriorityQueue = new PriorityQueue<>(7); Random rand = new Random(); for(int i=0;i<7;i++){ integerPriorityQueue.add(new Integer(rand.nextInt(100))); } for(int i=0;i<7;i++){ Integer in = integerPriorityQueue.poll(); System.out.println("Processing Integer:"+in); } //PriorityQueue example with Comparator Queue<Customer> customerPriorityQueue = new PriorityQueue<>(7, idComparator); addDataToQueue(customerPriorityQueue); pollDataFromQueue(customerPriorityQueue); } //Comparator anonymous class implementation public static Comparator<Customer> idComparator = new Comparator<Customer>(){ @Override public int compare(Customer c1, Customer c2) { return (int) (c1.getId()  c2.getId()); } }; //utility method to add random data to Queue private static void addDataToQueue(Queue<Customer> customerPriorityQueue) { Random rand = new Random(); for(int i=0; i<7; i++){ int id = rand.nextInt(100); customerPriorityQueue.add(new Customer(id, "Pankaj "+id)); } } //utility method to poll data from queue private static void pollDataFromQueue(Queue<Customer> customerPriorityQueue) { while(true){ Customer cust = customerPriorityQueue.poll(); if(cust == null) break; System.out.println("Processing Customer with ID="+cust.getId()); } } }
Apache Graphx
GraphX is a new component in Spark for graphs and graphparallel computation. At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators
Spark GraphX is a graph processing framework built on top of Spark.
GraphX models graphs as property graphs where vertices and edges can have properties.
GraphX comes with its own package org.apache.spark.graphx
.
Graph
Graph
abstract class represents a collection of vertices
and edges
.
abstract class Graph[VD: ClassTag, ED: ClassTag]
vertices
attribute is of type VertexRDD
while edges
is of type EdgeRDD
.
Standard GraphX API
Graph
class comes with a small set of API.

Transformations

mapVertices

mapEdges

mapTriplets

reverse

subgraph
mask

groupEdges


Joins

outerJoinVertices


Computation

aggregateMessages

Creating Graphs (Graph object)
Graph
object comes with the following factory methods to create instances of Graph
:

fromEdgeTuples

fromEdges

apply
Package summery for apache graphx
https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/graphx/packagesummary.html
Class  Description 

Edge 
A single directed edge consisting of a source id, target id, and the data associated with the edge.

EdgeContext 
Represents an edge along with its neighboring vertices and allows sending messages along the edge.

EdgeDirection 
The direction of a directed edge relative to a vertex.

EdgeRDD 
EdgeRDD[ED, VD] extends RDD[Edge[ED} by storing the edges in columnar format on each partition for performance. 
EdgeTriplet 
An edge triplet represents an edge along with the vertex attributes of its neighboring vertices.

Graph 
The Graph abstractly represents a graph with arbitrary objects associated with vertices and edges.

GraphKryoRegistrator 
Registers GraphX classes with Kryo for improved performance.

GraphLoader 
Provides utilities for loading
Graph s from files. 
GraphOps 
Contains additional functionality for
Graph . 
GraphXUtils  
PartitionStrategy.CanonicalRandomVertexCut$ 
Assigns edges to partitions by hashing the source and destination vertex IDs in a canonical direction, resulting in a random vertex cut that colocates all edges between two vertices, regardless of direction.

PartitionStrategy.EdgePartition1D$ 
Assigns edges to partitions using only the source vertex ID, colocating edges with the same source.

PartitionStrategy.EdgePartition2D$ 
Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix, guaranteeing a
2 * sqrt(numParts)  1 bound on vertex replication. 
PartitionStrategy.RandomVertexCut$ 
Assigns edges to partitions by hashing the source and destination vertex IDs, resulting in a random vertex cut that colocates all samedirection edges between two vertices.

Pregel 
Implements a Pregellike bulksynchronous messagepassing API.

TripletFields 
Represents a subset of the fields of an [[EdgeTriplet]] or [[EdgeContext]].

VertexRDD 
Extends
RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by preindexing the entries for fast, efficient joins. 
Example Property Graph
Suppose we want to construct a property graph consisting of the various collaborators on the GraphX project. The vertex property might contain the username and occupation. We could annotate edges with a string describing the relationships between collaborators:
The resulting graph would have the type signature:
There are numerous ways to construct a property graph from raw files, RDDs, and even synthetic generators and these are discussed in more detail in the section on graph builders. Probably the most general method is to use the Graph object. For example the following code constructs a graph from a collection of RDDs:
In the above example we make use of the Edge
case class. Edges have a srcId
and a dstId
corresponding to the source and destination vertex identifiers. In addition, the Edge
class has an attr
member which stores the edge property.
We can deconstruct a graph into the respective vertex and edge views by using the graph.vertices
and graph.edges
members respectively.
Note that
graph.vertices
returns anVertexRDD[(String, String)]
which extendsRDD[(VertexId, (String, String))]
and so we use the scalacase
expression to deconstruct the tuple. On the other hand,graph.edges
returns anEdgeRDD
containingEdge[String]
objects. We could have also used the case class type constructor as in the following:
In addition to the vertex and edge views of the property graph, GraphX also exposes a triplet view. The triplet view logically joins the vertex and edge properties yielding an RDD[EdgeTriplet[VD, ED]]
containing instances of the EdgeTriplet
class. This join can be expressed in the following SQL expression:
or graphically as:
The EdgeTriplet
class extends the Edge
class by adding the srcAttr
and dstAttr
members which contain the source and destination properties respectively. We can use the triplet view of a graph to render a collection of strings describing relationships between users.
Graph Operators
Just as RDDs have basic operations like map
, filter
, and reduceByKey
, property graphs also have a collection of basic operators that take user defined functions and produce new graphs with transformed properties and structure. The core operators that have optimized implementations are defined in Graph
and convenient operators that are expressed as a compositions of the core operators are defined in GraphOps
. However, thanks to Scala implicits the operators in GraphOps
are automatically available as members of Graph
. For example, we can compute the indegree of each vertex (defined in GraphOps
) by the following:
The reason for differentiating between core graph operations and GraphOps
is to be able to support different graph representations in the future. Each graph representation must provide implementations of the core operations and reuse many of the useful operations defined in GraphOps
.
Summary List of Operators
The following is a quick summary of the functionality defined in both Graph
and GraphOps
but presented as members of Graph for simplicity. Note that some function signatures have been simplified (e.g., default arguments and type constraints removed) and some more advanced functionality has been removed so please consult the API docs for the official list of operations.
Property Operators
Like the RDD map
operator, the property graph contains the following:
Each of these operators yields a new graph with the vertex or edge properties modified by the user defined map
function.
Note that in each case the graph structure is unaffected. This is a key feature of these operators which allows the resulting graph to reuse the structural indices of the original graph. The following snippets are logically equivalent, but the first one does not preserve the structural indices and would not benefit from the GraphX system optimizations:
Instead, use
mapVertices
to preserve the indices:
These operators are often used to initialize the graph for a particular computation or project away unnecessary properties. For example, given a graph with the out degrees as the vertex properties (we describe how to construct such a graph later), we initialize it for PageRank:
Structural Operators
Currently GraphX supports only a simple set of commonly used structural operators and we expect to add more in the future. The following is a list of the basic structural operators.
The reverse
operator returns a new graph with all the edge directions reversed. This can be useful when, for example, trying to compute the inverse PageRank. Because the reverse operation does not modify vertex or edge properties or change the number of edges, it can be implemented efficiently without data movement or duplication.
The subgraph
operator takes vertex and edge predicates and returns the graph containing only the vertices that satisfy the vertex predicate (evaluate to true) and edges that satisfy the edge predicate and connect vertices that satisfy the vertex predicate. The subgraph
operator can be used in number of situations to restrict the graph to the vertices and edges of interest or eliminate broken links. For example in the following code we remove broken links:
Note in the above example only the vertex predicate is provided. The
subgraph
operator defaults totrue
if the vertex or edge predicates are not provided.
The mask
operator constructs a subgraph by returning a graph that contains the vertices and edges that are also found in the input graph. This can be used in conjunction with the subgraph
operator to restrict a graph based on the properties in another related graph. For example, we might run connected components using the graph with missing vertices and then restrict the answer to the valid subgraph.
The groupEdges
operator merges parallel edges (i.e., duplicate edges between pairs of vertices) in the multigraph. In many numerical applications, parallel edges can be added (their weights combined) into a single edge thereby reducing the size of the graph.
Join Operators
In many cases it is necessary to join data from external collections (RDDs) with graphs. For example, we might have extra user properties that we want to merge with an existing graph or we might want to pull vertex properties from one graph into another. These tasks can be accomplished using the join operators. Below we list the key join operators:
The joinVertices
operator joins the vertices with the input RDD and returns a new graph with the vertex properties obtained by applying the user defined map
function to the result of the joined vertices. Vertices without a matching value in the RDD retain their original value.
Note that if the RDD contains more than one value for a given vertex only one will be used. It is therefore recommended that the input RDD be made unique using the following which will also preindex the resulting values to substantially accelerate the subsequent join.
The more general outerJoinVertices
behaves similarly to joinVertices
except that the user defined map
function is applied to all vertices and can change the vertex property type. Because not all vertices may have a matching value in the input RDD the map
function takes an Option
type. For example, we can setup a graph for PageRank by initializing vertex properties with their outDegree
.
You may have noticed the multiple parameter lists (e.g.,
f(a)(b)
) curried function pattern used in the above examples. While we could have equally writtenf(a)(b)
asf(a,b)
this would mean that type inference onb
would not depend ona
. As a consequence, the user would need to provide type annotation for the user defined function:
Neighborhood Aggregation
A key step in many graph analytics tasks is aggregating information about the neighborhood of each vertex. For example, we might want to know the number of followers each user has or the average age of the the followers of each user. Many iterative graph algorithms (e.g., PageRank, Shortest Path, and connected components) repeatedly aggregate properties of neighboring vertices (e.g., current PageRank Value, shortest path to the source, and smallest reachable vertex id).
To improve performance the primary aggregation operator changed from
graph.mapReduceTriplets
to the newgraph.AggregateMessages
. While the changes in the API are relatively small, we provide a transition guide below.
Aggregate Messages (aggregateMessages)
The core aggregation operation in GraphX is aggregateMessages
. This operator applies a user defined sendMsg
function to each edge triplet in the graph and then uses the mergeMsg
function to aggregate those messages at their destination vertex.
The user defined sendMsg
function takes an EdgeContext
, which exposes the source and destination attributes along with the edge attribute and functions (sendToSrc
, and sendToDst
) to send messages to the source and destination attributes. Think of sendMsg
as the map function in mapreduce. The user defined mergeMsg
function takes two messages destined to the same vertex and yields a single message. Think of mergeMsg
as the reduce function in mapreduce. The aggregateMessages
operator returns a VertexRDD[Msg]
containing the aggregate message (of type Msg
) destined to each vertex. Vertices that did not receive a message are not included in the returned VertexRDD
VertexRDD.
In addition, aggregateMessages
takes an optional tripletsFields
which indicates what data is accessed in the EdgeContext
(i.e., the source vertex attribute but not the destination vertex attribute). The possible options for the tripletsFields
are defined in TripletFields
and the default value is TripletFields.All
which indicates that the user defined sendMsg
function may access any of the fields in the EdgeContext
. ThetripletFields
argument can be used to notify GraphX that only part of the EdgeContext
will be needed allowing GraphX to select an optimized join strategy. For example if we are computing the average age of the followers of each user we would only require the source field and so we would use TripletFields.Src
to indicate that we only require the source field
In earlier versions of GraphX we used byte code inspection to infer the
TripletFields
however we have found that bytecode inspection to be slightly unreliable and instead opted for more explicit user control.
In the following example we use the aggregateMessages
operator to compute the average age of the more senior followers of each user.
import org.apache.spark.graphx.{Graph, VertexRDD}
import org.apache.spark.graphx.util.GraphGenerators
// Create a graph with "age" as the vertex property.
// Here we use a random graph for simplicity.
val graph: Graph[Double, Int] =
GraphGenerators.logNormalGraph(sc, numVertices = 100).mapVertices( (id, _) => id.toDouble )
// Compute the number of older followers and their total age
val olderFollowers: VertexRDD[(Int, Double)] = graph.aggregateMessages[(Int, Double)](
triplet => { // Map Function
if (triplet.srcAttr > triplet.dstAttr) {
// Send message to destination vertex containing counter and age
triplet.sendToDst(1, triplet.srcAttr)
}
},
// Add counter and age
(a, b) => (a._1 + b._1, a._2 + b._2) // Reduce Function
)
// Divide total age by number of older followers to get average age of older followers
val avgAgeOfOlderFollowers: VertexRDD[Double] =
olderFollowers.mapValues( (id, value) =>
value match { case (count, totalAge) => totalAge / count } )
// Display the results
avgAgeOfOlderFollowers.collect.foreach(println(_))
The
aggregateMessages
operation performs optimally when the messages (and the sums of messages) are constant sized (e.g., floats and addition instead of lists and concatenation).
Map Reduce Triplets Transition Guide (Legacy)
In earlier versions of GraphX neighborhood aggregation was accomplished using the mapReduceTriplets
operator:
The mapReduceTriplets
operator takes a user defined map function which is applied to each triplet and can yield messages which are aggregated using the user defined reduce
function. However, we found the user of the returned iterator to be expensive and it inhibited our ability to apply additional optimizations (e.g., local vertex renumbering). In aggregateMessages
we introduced the EdgeContext which exposes the triplet fields and also functions to explicitly send messages to the source and destination vertex. Furthermore we removed bytecode inspection and instead require the user to indicate what fields in the triplet are actually required.
The following code block using mapReduceTriplets
:
can be rewritten using aggregateMessages
as:
Computing Degree Information
A common aggregation task is computing the degree of each vertex: the number of edges adjacent to each vertex. In the context of directed graphs it is often necessary to know the indegree, outdegree, and the total degree of each vertex. The GraphOps
class contains a collection of operators to compute the degrees of each vertex. For example in the following we compute the max in, out, and total degrees:
Collecting Neighbors
In some cases it may be easier to express computation by collecting neighboring vertices and their attributes at each vertex. This can be easily accomplished using the collectNeighborIds
and the collectNeighbors
operators.
These operators can be quite costly as they duplicate information and require substantial communication. If possible try expressing the same computation using the
aggregateMessages
operator directly.
Graphx is more faster then Spark naive when graph computation is needed.
Complexity analysis – Big o notation table
Searching
Algorithm  Data Structure  Time Complexity  Space Complexity  

Average  Worst  Worst  
Depth First Search (DFS)  Graph of V vertices and E edges   
O(E + V) 
O(V) 

Breadth First Search (BFS)  Graph of V vertices and E edges   
O(E + V) 
O(V) 

Binary search  Sorted array of n elements  O(log(n)) 
O(log(n)) 
O(1) 

Linear (Brute Force)  Array  O(n) 
O(n) 
O(1) 

Shortest path by Dijkstra, using a Minheap as priority queue 
Graph with V vertices and E edges  O((V + E) log V) 
O((V + E) log V) 
O(V) 

Shortest path by Dijkstra, using an unsorted array as priority queue 
Graph with V vertices and E edges  O(V^2) 
O(V^2) 
O(V) 

Shortest path by BellmanFord  Graph with V vertices and E edges  O(VE) 
O(VE) 
O(V) 
Sorting
Algorithm  Data Structure  Time Complexity  Worst Case Auxiliary Space Complexity  

Best  Average  Worst  Worst  
Quicksort  Array  O(n log(n)) 
O(n log(n)) 
O(n^2) 
O(log(n)) 

Mergesort  Array  O(n log(n)) 
O(n log(n)) 
O(n log(n)) 
O(n) 

Heapsort  Array  O(n log(n)) 
O(n log(n)) 
O(n log(n)) 
O(1) 

Bubble Sort  Array  O(n) 
O(n^2) 
O(n^2) 
O(1) 

Insertion Sort  Array  O(n) 
O(n^2) 
O(n^2) 
O(1) 

Select Sort  Array  O(n^2) 
O(n^2) 
O(n^2) 
O(1) 

Bucket Sort  Array  O(n+k) 
O(n+k) 
O(n^2) 
O(nk) 

Radix Sort  Array  O(nk) 
O(nk) 
O(nk) 
O(n+k) 
Data Structures
Data Structure  Time Complexity  Space Complexity  

Average  Worst  Worst  
Indexing  Search  Insertion  Deletion  Indexing  Search  Insertion  Deletion  
Basic Array  O(1) 
O(n) 
 
 
O(1) 
O(n) 
 
 
O(n) 
Dynamic Array  O(1) 
O(n) 
O(n) 
 
O(1) 
O(n) 
O(n) 
 
O(n) 
SinglyLinked List  O(n) 
O(n) 
O(1) 
O(1) 
O(n) 
O(n) 
O(1) 
O(1) 
O(n) 
DoublyLinked List  O(n) 
O(n) 
O(1) 
O(1) 
O(n) 
O(n) 
O(1) 
O(1) 
O(n) 
Skip List  O(n) 
O(log(n)) 
O(log(n)) 
O(log(n)) 
O(n) 
O(n) 
O(n) 
O(n) 
O(n log(n)) 
Hash Table   
O(1) 
O(1) 
O(1) 
 
O(n) 
O(n) 
O(n) 
O(n) 
Binary Search Tree   
O(log(n)) 
O(log(n)) 
O(log(n)) 
 
O(n) 
O(n) 
O(n) 
O(n) 
BTree   
O(log(n)) 
O(log(n)) 
O(log(n)) 
 
O(log(n)) 
O(log(n)) 
O(log(n)) 
O(n) 
RedBlack Tree   
O(log(n)) 
O(log(n)) 
O(log(n)) 
 
O(log(n)) 
O(log(n)) 
O(log(n)) 
O(n) 
AVL Tree   
O(log(n)) 
O(log(n)) 
O(log(n)) 
 
O(log(n)) 
O(log(n)) 
O(log(n)) 
O(n) 
Heaps
Heaps  Time Complexity  

Heapify  Find Max  Extract Max  Increase Key  Insert  Delete  Merge  
Linked List (sorted)   
O(1) 
O(1) 
O(n) 
O(n) 
O(1) 
O(m+n) 

Linked List (unsorted)   
O(n) 
O(n) 
O(1) 
O(1) 
O(1) 
O(1) 

Binary Heap  O(log(n)) 
O(1) 
O(log(n)) 
O(log(n)) 
O(log(n)) 
O(log(n)) 
O(m+n) 

Binomial Heap   
O(log(n)) 
O(log(n)) 
O(log(n)) 
O(log(n)) 
O(log(n)) 
O(log(n)) 

Fibonacci Heap   
O(1) 
O(log(n))* 
O(1)* 
O(1) 
O(log(n))* 
O(1) 
Graphs
Node / Edge Management  Storage  Add Vertex  Add Edge  Remove Vertex  Remove Edge  Query 

Adjacency list  O(V+E) 
O(1) 
O(1) 
O(V + E) 
O(E) 
O(V) 
Incidence list  O(V+E) 
O(1) 
O(1) 
O(E) 
O(E) 
O(E) 
Adjacency matrix  O(V^2) 
O(V^2) 
O(1) 
O(V^2) 
O(1) 
O(1) 
Incidence matrix  O(V ⋅ E) 
O(V ⋅ E) 
O(V ⋅ E) 
O(V ⋅ E) 
O(V ⋅ E) 
O(E) 
Reference :