
Abstract- With the rapid growth in the volume, complexity, variety and velocity of data in organizations, the need to handle unstructured data is increasing continuously. NoSQL databases are well suited to big data applications. The enormous amount of data generated on the web is highly unstructured in nature. Relational databases are designed to manage structured data and are not designed to manage unstructured data or very high data volumes. This paper presents a comparative analysis of Oracle Database and the NoSQL document-oriented database management system MongoDB. The comparison describes key features, theoretical differences and restrictions, and focuses on the basic CRUD operations in MongoDB.

 

Key Words- Big data, NoSQL, MongoDB, RDBMS, CRUD


 

I. Introduction

The term NoSQL was first introduced by Carlo Strozzi in 1998. NoSQL stands for “Not Only SQL”. The rapid growth of data, and the massive amount of data produced every day by web and business applications, has become hard for RDBMSs to handle. This has increased interest in alternatives to RDBMSs. NoSQL databases are commonly characterized as distributed, horizontally scalable and open source [5].

 

Relational database management systems define a fixed schema, and data is inserted strictly according to that schema. NoSQL databases are built to allow data to be inserted without a predefined schema, which makes it easy to make significant application changes in real time and makes development faster. NoSQL databases are high-performance, scalable systems [1]. Within a standard RDBMS it is difficult to handle both the size of the data and the concurrent actions on it. Some of the reasons to employ NoSQL techniques are scalability, high availability, support for distributed architectures, a flexible schema, varied data structures, fault tolerance and consistency.

 

MongoDB is an open-source project developed by 10gen (now MongoDB, Inc.). It is a document-oriented, schema-less database that stores data in BSON (Binary JSON) format. Unlike an RDBMS, MongoDB can deal with structured, semi-structured and unstructured data. MongoDB documents can vary in structure, and fields can vary from document to document. Similar documents are stored in collections: a collection corresponds to a table, and a document corresponds to a record. MongoDB can add, remove or change a field in one document without affecting the other documents in the same collection. This avoids expensive ALTER TABLE operations, which can lead to redesigning the entire set of schemas and migrating the existing database to the new schema.
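To make the schema flexibility above concrete, here is a minimal Python sketch that models a collection as a list of dictionaries. It is an illustration under stated assumptions, not MongoDB's implementation; `accounts` and `add_field` are hypothetical names.

```python
# Hypothetical sketch: a "collection" modelled as a list of dicts,
# where each dict plays the role of a document.
accounts = [
    {"_id": 1, "name": "abc", "age": 26, "address": "indore"},
    {"_id": 2, "name": "xyz", "email": "xyz@example.com"},  # different fields, same collection
]

def add_field(collection, doc_id, field, value):
    """Add or change a field in one document only; no ALTER TABLE-style migration."""
    for doc in collection:
        if doc["_id"] == doc_id:
            doc[field] = value
            return True
    return False

add_field(accounts, 1, "phone", "555-0100")

for doc in accounts:
    print(sorted(doc))  # only the first document gained the "phone" field
```

In the MongoDB shell, the corresponding change to a single document would be an update such as `db.accounts.updateOne({ _id: 1 }, { $set: { phone: "555-0100" } })`, which leaves the other documents in the collection untouched.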

 

A MongoDB document holds all the data for a given record, whereas in a relational database the data for a single record is spread across different tables. Data in MongoDB is therefore more localized, which reduces the need to JOIN separate tables [3]. Joins are avoided in MongoDB by embedding documents within a document. The result is better performance and scalability, because a single read from the database can retrieve the entire record. MongoDB also provides horizontal scalability through a technique called auto-sharding, and its replication keeps copies of the data on several nodes, so the failure of a single node need not cause data loss or downtime. Several research studies report that MongoDB is considerably faster than MS SQL for writes (inserts/updates) and reads (retrievals) [1].
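The difference between spreading a record across tables and embedding it in one document can be sketched in Python as follows. The customer and order data are hypothetical, and plain dictionaries stand in for tables and documents.

```python
# Normalised (relational) layout: a customer's orders live in a separate "table".
customers = {1: {"name": "abc"}}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250},
    {"order_id": 11, "customer_id": 1, "total": 90},
]

def customer_with_orders_relational(customer_id):
    """Reassemble the record with a join: one lookup plus a scan of the orders table."""
    record = dict(customers[customer_id])
    record["orders"] = [o for o in orders if o["customer_id"] == customer_id]
    return record

# Embedded (document) layout: the orders are stored inside the customer document.
customer_docs = {
    1: {"name": "abc", "orders": [
        {"order_id": 10, "total": 250},
        {"order_id": 11, "total": 90},
    ]},
}

def customer_with_orders_embedded(customer_id):
    """A single read returns the whole record; no join is needed."""
    return customer_docs[customer_id]

rel = customer_with_orders_relational(1)
emb = customer_with_orders_embedded(1)
print(len(rel["orders"]), len(emb["orders"]))  # 2 2
```

Embedding suits data that is read together; if the same sub-document is shared by many records, embedding duplicates it, which is why references between documents are still used in some MongoDB schemas.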

 

II. NoSQL Databases (Classification)

 

NoSQL databases are classified as follows [6]:

i. Document-oriented store
ii. Key-value store
iii. Column-oriented store
iv. Graph-oriented store

 

A. Document-Oriented

Document-oriented stores are like key-value stores, with the distinction that the values are visible and can be queried. Data formats such as JSON or XML are used to store document-oriented datasets. Document stores provide a flexible schema, so documents are not required to contain the same fields or follow the same schema. In contrast to key-value stores, they offer indexing and querying based on values. These databases store their data in the form of documents, and each document is identified by a unique key, much as in key-value databases. Document stores are schema-free and variable in nature [6], [14].
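The value-level querying and indexing that distinguish document stores from key-value stores can be sketched with a toy class. `TinyDocumentStore` and its methods are hypothetical and only support equality queries; real document databases offer far richer query and index types.

```python
from collections import defaultdict

class TinyDocumentStore:
    """Toy document store: documents are dicts keyed by a unique _id, and their
    field values can be queried and indexed (unlike an opaque key-value store).
    Assumes an _id is never re-inserted, so index entries are never stale."""

    def __init__(self):
        self.docs = {}
        self.indexes = {}  # field name -> {value: set of _ids}

    def insert(self, doc):
        self.docs[doc["_id"]] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc["_id"])

    def create_index(self, field):
        index = defaultdict(set)
        for _id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(_id)
        self.indexes[field] = index

    def find(self, field, value):
        """Equality query on a field value; uses an index when one exists, else scans."""
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        return [d for d in self.docs.values() if d.get(field) == value]

store = TinyDocumentStore()
store.insert({"_id": 1, "name": "abc", "city": "indore"})
store.insert({"_id": 2, "name": "xyz"})  # no "city" field: the schema is not fixed
store.create_index("city")
store.insert({"_id": 3, "name": "pqr", "city": "indore"})
print([d["_id"] for d in store.find("city", "indore")])  # [1, 3]
```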

 

Other characteristics of document-oriented stores are horizontal scalability and sharding across cluster nodes. Examples of document-oriented stores include MongoDB, Amazon DynamoDB, CouchDB, Couchbase, MarkLogic, OrientDB, RethinkDB, Cloudant, RavenDB and Microsoft Azure DocumentDB [6].

 

B. Key-Value

A key-value store combines two entities: keys and values. It is one of the traditional database models and gave rise to the other kinds of NoSQL databases. It has a concrete application programming interface (API) and lets its users store data in a schema-less manner. Each stored item has two parts: the key, which is a unique identifier for a particular data entry and must not be repeated, and the value, which is the data that the key points to [14].
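The two-part key/value model can be illustrated with a minimal Python sketch. `KeyValueStore` is a hypothetical class backed by a dictionary; it shows that keys are unique and that the store treats values as opaque.

```python
class KeyValueStore:
    """Toy key-value store: each unique key maps to one opaque value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Create / Update: a key can appear only once,
        # so writing an existing key replaces its value.
        self._data[key] = value

    def get(self, key, default=None):
        # Read: lookup is by key only; the value is never inspected.
        return self._data.get(key, default)

    def delete(self, key):
        # Delete: removing a missing key is a no-op.
        self._data.pop(key, None)

kv = KeyValueStore()
kv.put("user:1", b'{"name": "abc", "age": 26}')  # the store does not interpret the bytes
kv.put("user:1", b'{"name": "abc", "age": 27}')  # same key: overwritten, not duplicated
print(kv.get("user:1"))
kv.delete("user:1")
print(kv.get("user:1"))  # None
```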

 

The key-value store is the least complex storage paradigm among NoSQL databases. Key-value stores give the best performance on basic CRUD (Create, Read, Update and Delete) operations. They also offer scalability and sharding across cluster nodes. Sharding is a horizontal partitioning technique that splits a large amount of data into smaller, more easily managed parts called shards. However, key-value databases are less flexible for querying and indexing complex and connected data, because queries in this category are typically based on keys rather than values. Examples of key-value stores are Redis, Memcached, Riak KV, Hazelcast, Ehcache, OrientDB, Aerospike and Amazon SimpleDB [6].
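Sharding across cluster nodes is often done by hashing the key. The sketch below assumes simple modulo hash partitioning with `zlib.crc32`; the class and its parameters are hypothetical, and each shard is just an in-process dictionary rather than a separate node.

```python
import zlib

class ShardedKeyValueStore:
    """Toy sharded key-value store: keys are hash-partitioned across N shards."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # crc32 is deterministic across runs (Python's built-in hash() of str is not).
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)

    def delete(self, key):
        self.shards[self._shard_for(key)].pop(key, None)

store = ShardedKeyValueStore(num_shards=3)
for i in range(9):
    store.put(f"user:{i}", {"id": i})

print([len(s) for s in store.shards])  # how many keys landed on each shard
print(store.get("user:4"))             # {'id': 4}
```

Modulo partitioning moves most keys when the number of shards changes, so production systems usually rely on schemes such as consistent hashing or range-based chunks; MongoDB, for example, splits a sharded collection into chunks by a ranged or hashed shard key.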

 

C. Column-Oriented

Column-oriented databases are also referred to as column-family databases. Column-oriented stores are useful when there is a need to handle distributed and huge quantities of data. Unlike pure relational column databases, column stores in NoSQL are primarily hybrid row/column stores: although they use columnar extensions, rather than storing data in tables they store it in a widely distributed architecture. Columns are grouped according to the relationships among the data. In column stores, each key is related to one or more attributes (columns). A column-oriented data store stores its data in such a way that it can be aggregated rapidly with less I/O activity. It focuses on high scalability in data storage. The data is stored in sorted order within each column family.
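Why a column layout reduces I/O for aggregation can be shown with two in-memory layouts of the same hypothetical records. This is a simplification: it ignores column families, compression and on-disk formats, and only contrasts reading whole rows with reading a single column.

```python
# The same three hypothetical records in two physical layouts.
rows = [
    {"id": 1, "name": "abc", "city": "indore", "balance": 7000},
    {"id": 2, "name": "xyz", "city": "bhopal", "balance": 3000},
    {"id": 3, "name": "pqr", "city": "indore", "balance": 9000},
]

# Column-oriented layout: each column is stored contiguously on its own.
columns = {field: [r[field] for r in rows] for field in rows[0]}

def total_balance_row_store(rows):
    # Every whole row is visited even though only one field is needed.
    return sum(r["balance"] for r in rows)

def total_balance_column_store(columns):
    # Only the "balance" column is read; the other columns are never touched.
    return sum(columns["balance"])

print(total_balance_row_store(rows), total_balance_column_store(columns))  # 19000 19000
```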

 

Compared with row-oriented databases, column-oriented databases have better capabilities for managing data and storage space. Horizontal scalability is one of their most prominent characteristics. Typical use cases for column-oriented databases include blogging and event logging. Examples of column-oriented stores are HBase, Accumulo, Hypertable, Google Cloud Bigtable, Sqrrl, ScyllaDB and MapR-DB [6], [14].

 

D. Graph-Oriented

Graph databases evolved from graph theory and are designed to represent entities and their relationships as nodes and edges respectively: nodes act as the objects, and edges act as the relationships between the objects. Graph databases replace relational tables with structured relational graphs of interconnected key-value pairings. The graph can also hold properties attached to nodes. Graph databases use a technique called index-free adjacency, in which each node holds a direct pointer to each adjacent node, so that millions of records can be traversed efficiently. In a graph database, the focus is on the relationships established between data through these pointers. Graph databases provide schema-less and efficient storage of semi-structured data. Queries are expressed as traversals, which makes graph databases faster than relational databases for relationship-heavy queries. They are easy to scale and whiteboard friendly. Graph databases generally support ACID transactions and rollback [14]. Because graphs have strong expressive power and modeling characteristics, many real-world situations can be represented as graphs and modeled in a graph database. Graph data can be queried more efficiently because expensive joins are not needed in graph query languages [6].
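Index-free adjacency can be sketched with nodes that store direct references to their neighbours; a traversal then simply follows those references. The `Node` class, the `reachable` function and the sample names are hypothetical and not taken from any particular graph database.

```python
from collections import deque

class Node:
    """A graph node that keeps direct references to its neighbours (index-free adjacency)."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties  # key-value properties attached to the node
        self.edges = []               # list of (relationship, target Node)

    def connect(self, relationship, other):
        self.edges.append((relationship, other))

def reachable(start, max_depth):
    """Breadth-first traversal that follows stored pointers; no index or join is consulted.
    Returns {node name: hop distance from start} for nodes within max_depth hops."""
    seen = {start.name: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node.name]
        if depth == max_depth:
            continue
        for _rel, neighbour in node.edges:
            if neighbour.name not in seen:
                seen[neighbour.name] = depth + 1
                queue.append(neighbour)
    return seen

alice, bob, carol, dave = (Node(n) for n in ("alice", "bob", "carol", "dave"))
alice.connect("FRIEND", bob)
bob.connect("FRIEND", carol)
carol.connect("FRIEND", dave)

print(reachable(alice, max_depth=2))  # {'alice': 0, 'bob': 1, 'carol': 2}
```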

Fig. 1 NoSQL database types

III. Comparison of Oracle and MongoDB

MongoDB is a NoSQL database management system released in 2009. It stores data as JSON-like documents with dynamic schemas (the format is called BSON). NoSQL is a category of database management systems that differs from traditional relational databases in that data is not stored using fixed table schemas. Its primary purpose is to serve as the database system for huge web-scale applications, where it outperforms traditional relational databases.

MongoDB focuses on four factors: flexibility, power, speed and ease of use. It supports indexing and offers drivers for multiple programming languages. The data model of MongoDB is schema-less and document-oriented, whereas Oracle Database supports the relational model. Oracle Database has a standard query language, SQL, whereas MongoDB is queried through API calls.
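Where SQL passes a query string, MongoDB's API calls such as `find()` take a filter document. The sketch below evaluates a small subset of that filter syntax (field equality plus `$gt`, `$gte`, `$lt` and `$lte`) against Python dicts. It is a hypothetical toy, not MongoDB's query engine, and it ignores arrays, nested fields, null handling and every other operator.

```python
import operator

# A small, hypothetical subset of MongoDB's comparison operators.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}

def matches(doc, query):
    """Return True if every condition in the filter document holds for doc."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):  # operator form, e.g. {"$gt": 5000}
            for op, operand in condition.items():
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:         # plain equality, e.g. "D"
            return False
    return True

def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]

accounts = [
    {"_id": 1, "dep_wid": "D", "balance": 7000},
    {"_id": 2, "dep_wid": "D", "balance": 3000},
    {"_id": 3, "dep_wid": "W", "balance": 9000},
]

# Python counterpart of db.accounts.find({ dep_wid: "D", balance: { $gt: 5000 } })
print(find(accounts, {"dep_wid": "D", "balance": {"$gt": 5000}}))
```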

MongoDB has aggregation functions, and a built-in map-reduce function can be used to aggregate large amounts of data. MongoDB also accepts larger data items: Oracle Database supports a maximum value size of 4 KB, whereas the maximum size of a MongoDB document is 16 MB. The integrity model used by Oracle Database is ACID, whereas MongoDB uses BASE. MongoDB offers consistency, durability and conditional atomicity. Oracle Database provides integrity features that MongoDB does not offer, such as isolation, transactions, referential integrity and revision control (multi-document ACID transactions were added to MongoDB only in version 4.0). In terms of distribution, both MongoDB and Oracle Database are horizontally scalable and support data replication. However, MongoDB offers sharding support, whereas Oracle Database does not. Both MongoDB and Oracle Database are cross-platform database management systems. Oracle Database is written in C++, C and Java, whereas MongoDB is written in C++. MongoDB is available free of charge, whereas a license is required to use Oracle Database [17].
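The map-reduce style of aggregation mentioned above can be sketched in pure Python: a map step emits key/value pairs and a reduce step combines the values for each key. The order data and function names are hypothetical.

```python
from collections import defaultdict

orders = [
    {"cust_id": "A", "amount": 500},
    {"cust_id": "B", "amount": 250},
    {"cust_id": "A", "amount": 300},
]

def map_phase(doc):
    # Emit (key, value) pairs, analogous to emit(this.cust_id, this.amount) in a map function.
    yield doc["cust_id"], doc["amount"]

def reduce_phase(key, values):
    # Combine all values emitted for one key.
    return sum(values)

def map_reduce(collection):
    grouped = defaultdict(list)
    for doc in collection:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    return {key: reduce_phase(key, values) for key, values in grouped.items()}

print(map_reduce(orders))  # {'A': 800, 'B': 250}
```

In current MongoDB versions the aggregation pipeline is the recommended alternative (map-reduce is deprecated as of MongoDB 5.0); the same total per customer could be written as `db.orders.aggregate([{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } }])`.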

A. Features of MongoDB

•       MongoDB provides high performance.

•       It has a rich query language, supports all major CRUD operations, and provides aggregation features.

•       MongoDB provides high availability through automatic replication: if a server fails, the data is served from a replica (copy).

•       It provides an automatic failover mechanism.

•       Sharding is a major feature, and it makes horizontal scalability possible.

•       A record in MongoDB is a document.

•       A database holds collections of documents.
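Automatic replication and failover can be illustrated with a toy replica set. This is a hypothetical, heavily simplified sketch: writes are copied synchronously to every live member, and the new primary is chosen by sorting member names, whereas real MongoDB replica sets hold an election among the members.

```python
class ReplicaSet:
    """Toy replica set: one primary takes writes and copies them to the other members;
    if the primary fails, a surviving member is promoted automatically."""

    def __init__(self, members):
        self.data = {m: {} for m in members}  # each member holds its own copy
        self.alive = set(members)
        self.primary = members[0]

    def write(self, key, value):
        # Simplification: replicate synchronously to every live member.
        for member in self.alive:
            self.data[member][key] = value

    def read(self, key):
        return self.data[self.primary].get(key)

    def fail(self, member):
        self.alive.discard(member)
        if member == self.primary:
            if not self.alive:
                raise RuntimeError("no live members left")
            # Simplified failover: pick a live member deterministically (no election).
            self.primary = sorted(self.alive)[0]

rs = ReplicaSet(["node1", "node2", "node3"])
rs.write("user:1", "abc")
rs.fail("node1")           # the primary goes down
print(rs.primary)          # node2
print(rs.read("user:1"))   # abc, served from the member that took over
```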

B. Advantages of MongoDB

•       MongoDB is simple and extremely easy to install and set up.

•       MongoDB provides a schema-less structure.

•       The document query language supported by MongoDB plays a significant role in supporting dynamic queries.

•       It is very easy to scale.

•       No complex joins are required in MongoDB, because data is kept in BSON format as key-value pairs, with related data stored in the same document.

•       It keeps frequently accessed data in internal memory (RAM), which allows faster data access.

•       Performance can often be improved more easily in MongoDB than in relational databases.

•       There is no need to map application objects to database objects.

•       MongoDB supports sharding, which enables horizontal scaling, whereas relational databases typically rely on vertical scaling.
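The point about not mapping application objects to database objects can be sketched with a Python dataclass whose instances convert directly to a document-shaped dict. `Account` is a hypothetical class; with a driver such as PyMongo, a dict like `doc` could be passed to `insert_one()` as-is.

```python
from dataclasses import dataclass, asdict, field

@dataclass
class Account:
    name: str
    age: int
    addresses: list = field(default_factory=list)  # nested data stays inside the object

acct = Account("abc", 26, [{"city": "indore", "type": "home"}])

# The object maps directly to one document; no separate addresses table or ORM layer.
doc = asdict(acct)
print(doc)

# ...and the document maps straight back to an object.
restored = Account(**doc)
print(restored == acct)  # True
```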

 

Table 1 Comparison of MongoDB and Oracle [14]

Key Feature | Oracle | MongoDB
Data Model | Stores data in tables; follows a fixed schema | Follows a document-based data model; schema-less and handles unstructured data efficiently
Scalability | Provides both vertical and horizontal scalability | Provides effective horizontal scalability
Transaction Reliability | Follows ACID rules, hence more reliable | Follows BASE rules
Complexity | More complex | Less complex
Security | Very secure mechanism | Less secure
Crash Recovery | Ensures crash recovery through its ACID properties | Depends on replication as a backup to recover from crashes
Cloud | Not suitable for cloud applications | Suitable for cloud applications
Big Data Handling | Unable to handle the big data problem | Designed to deal with the big data problem effectively
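The BASE entry in Table 1 implies eventual consistency: a write is acknowledged before every copy has it, so a replica can briefly return stale data. The hypothetical class below models that with one primary, one replica and a pending replication log. It sketches the idea only, not MongoDB's replication protocol, where whether a reader sees stale data also depends on settings such as read preference and write concern.

```python
class EventuallyConsistentStore:
    """Toy BASE-style store: writes are acknowledged by the primary at once and
    shipped to a replica later, so replica reads can be stale for a while."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # replication log not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))  # acknowledged before the replica has it

    def read_replica(self, key):
        return self.replica.get(key)

    def sync(self):
        # Background replication eventually applies the log in order.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("balance", 5000)
print(store.read_replica("balance"))  # None: the replica has not caught up yet
store.sync()
print(store.read_replica("balance"))  # 5000: the copies have converged
```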

 

IV. CRUD Operations

 

This section focuses on the basic CRUD operations. Two databases, one in Oracle and one in MongoDB, are created to compare how data is created, selected, inserted and deleted in each [21]. MongoDB is a fast-responding database management system; if you want a simple database that responds very quickly, MongoDB is a good choice. MongoDB supports all major CRUD operations and provides aggregation features. The major CRUD operations are as follows:

 

Table 2 CRUD Operations

Create table
  Oracle:  CREATE TABLE accounts (id NUMBER PRIMARY KEY, first_name VARCHAR2(64), last_name VARCHAR2(45));
  MongoDB: db.accounts.insertOne({ name: "abc", age: 26, address: "indore" })  // the collection is created implicitly by the first insert

Delete a table
  Oracle:  DROP TABLE accounts;
  MongoDB: db.accounts.drop()

Insert
  Oracle:  INSERT INTO accounts (name, age, address) VALUES ('abc', 26, 'indore');
  MongoDB: db.accounts.insertOne({ name: "abc", age: 26, address: "indore" })

Select
  Oracle:  SELECT * FROM accounts;
  MongoDB: db.accounts.find()

Select fields
  Oracle:  SELECT first_name, last_name FROM accounts;
  MongoDB: db.accounts.find({}, { first_name: 1, last_name: 1, _id: 0 })

Conditional select
  Oracle:  SELECT * FROM accounts WHERE dep_wid = 'D' AND balance > 5000;
  MongoDB: db.accounts.find({ dep_wid: "D", balance: { $gt: 5000 } })

Ordered select (ascending)
  Oracle:  SELECT * FROM accounts ORDER BY user_id ASC;
  MongoDB: db.accounts.find({}).sort({ user_id: 1 })

Ordered select (descending)
  Oracle:  SELECT * FROM accounts ORDER BY user_id DESC;
  MongoDB: db.accounts.find({}).sort({ user_id: -1 })

Select with count
  Oracle:  SELECT COUNT(*) FROM accounts;
  MongoDB: db.accounts.countDocuments({})

Update
  Oracle:  UPDATE student SET section = 'F' WHERE marks