Scala collections fall into two categories: mutable and immutable. The collection framework is one of Scala’s core strengths. Let’s see its diagram below.
We know that a Queue follows the First-In-First-Out model, but sometimes we need to process the objects in the queue based on priority. That is when Java’s PriorityQueue is used. For example, let’s say we have an application that generates stock reports for the daily trading session. This application processes a lot of data and takes time to … More Priority Queue – Data Structure
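As a minimal sketch of the idea in the excerpt above, the snippet below orders report tasks by priority using `java.util.PriorityQueue`. The `ReportTask` type, the task names, and the convention that a lower number means higher priority are assumptions made for illustration; they are not taken from the original post.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ReportQueueSketch {
    // Hypothetical task type: a lower priority number means "process sooner".
    static final class ReportTask {
        final String name;
        final int priority;

        ReportTask(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    // Drains the tasks in priority order and returns their names.
    static List<String> processingOrder(List<ReportTask> tasks) {
        PriorityQueue<ReportTask> queue =
                new PriorityQueue<>(Comparator.comparingInt((ReportTask t) -> t.priority));
        queue.addAll(tasks);

        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            // poll() removes the head, i.e. the task with the smallest priority value.
            order.add(queue.poll().name);
        }
        return order;
    }

    public static void main(String[] args) {
        List<ReportTask> tasks = List.of(
                new ReportTask("archive-old-reports", 5),
                new ReportTask("vip-client-report", 1),
                new ReportTask("daily-summary", 3));
        // Insertion order is ignored; tasks come out by priority.
        System.out.println(processingOrder(tasks));
    }
}
```

Unlike a FIFO queue, the order of insertion does not matter here: the comparator alone decides which element `poll()` returns next.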
GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators. Spark GraphX is a graph processing … More Apache Graphx
CueSheet is a framework for writing Apache Spark 2.x applications more conveniently, designed to neatly separate the concerns of the business logic and the deployment environment, as well as to minimize the usage of shell scripts which are inconvenient to write and do not support validation. To jump-start, check out cuesheet-starter-kit which provides the skeleton … More CueSheet – Easy spark application deployment guide
Searching algorithms (columns: algorithm; data structure; time complexity, average; time complexity, worst; space complexity, worst)
Depth First Search (DFS); graph of |V| vertices and |E| edges; –; O(|E| + |V|); O(|V|)
Breadth First Search (BFS); graph of |V| vertices and |E| edges; –; O(|E| + |V|); O(|V|)
Binary search; sorted array of n elements; O(log(n)); O(log(n)); O(1)
Linear (Brute … More Complexity analysis – Big o notation table
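To make the binary search row concrete, here is a small iterative sketch. It is one common way to write the algorithm, not necessarily the version the original post has in mind; the class and method names are made up for this example. Each loop iteration halves the search range, which gives the O(log(n)) time bound, and only a few index variables are used, which gives the O(1) space bound.

```java
class BinarySearchSketch {
    // Returns the index of target in the sorted array, or -1 if it is absent.
    static int search(int[] sorted, int target) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            // Written this way to avoid integer overflow of (lo + hi).
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                lo = mid + 1;   // target can only be in the right half
            } else {
                hi = mid - 1;   // target can only be in the left half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {2, 5, 8, 13, 21, 34};
        System.out.println(search(data, 13)); // index of 13
        System.out.println(search(data, 7));  // not present
    }
}
```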
In Hadoop, partitioning data allows huge volumes of data to be processed in parallel, so that the entire dataset is processed in the minimum amount of time. Apache Spark decides partitioning based on several factors. Factors that decide default partitioning: On Hadoop, split by HDFS cores. Filter or map functions don’t change partitioning. Number of … More Re-partitioning & partition in spark
Asynchronous programming is very popular these days, primarily because of its ability to improve the overall throughput on a multi-core system. Asynchronous programming is a programming paradigm that facilitates fast and responsive user interfaces. The asynchronous programming model in Java provides a consistent programming model to write programs that support asynchrony. Asynchronous programming provides a … More Asynchronous processing in java
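One way to illustrate the asynchronous style described above is with `java.util.concurrent.CompletableFuture`, which is part of the standard library. This is only a sketch: the class name, the method names, and the trivial "sum" workload are assumptions chosen to keep the example short.

```java
import java.util.concurrent.CompletableFuture;

class AsyncSketch {
    // Starts the computation on a background thread and returns immediately.
    static CompletableFuture<Integer> sumAsync(int a, int b) {
        return CompletableFuture.supplyAsync(() -> a + b);
    }

    // Chains a follow-up step that runs once the sum is available.
    static CompletableFuture<String> describeAsync(int a, int b) {
        return sumAsync(a, b).thenApply(sum -> "sum=" + sum);
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = describeAsync(2, 3);

        // The calling thread is free to do other work here.
        System.out.println("doing other work while the sum is computed");

        // join() blocks only at the point where the value is actually needed.
        System.out.println(result.join());
    }
}
```

The caller is not blocked while the work runs; it waits only at `join()`, which is what keeps an interface responsive or lets several independent tasks overlap.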