CS301 – Data Structures
___________________________________________________________________
Introduction to Data
Structures
Let’s discuss why we need data structures and what sort of
problems can be solved
with their use. Data structures help us to organize the data in
the computer, resulting
in more efficient programs. An efficient program executes faster
and helps minimize
the usage of resources like memory, disk. Computers are getting
more powerful with
the passage of time with the increase in CPU speed in GHz,
availability of faster
network and the maximization of disk space. Therefore people have
started solving
more and more complex problems. As computer applications are
becoming complex,
so there is need for more resources. This does not mean that we
should buy a new
computer to make the application execute faster. Our effort should
be to ensue that the
solution is achieved with the help of programming, data structures
and algorithm.
What does organizing the data mean? It means that the data should
be arranged in a
way that it is easily accessible. The data is inside the computer
and we want to see it.
We may also perform some calculations on it. Suppose the data
contains some
numbers and the programmer wants to calculate the average,
standard deviation etc.
May be we have a list of names and want to search a particular
name in it. To solve
such problems, data structures and algorithm are used. Sometimes
you may realize
that the application is too slow and taking more time. There are
chances that it may be
due to the data structure used, not due to the CPU speed and
memory. We will see
such examples. In the assignments, you will also check whether the
data structure in
the program is beneficial or not. You may have two data structures
and try to decide
which one is more suitable for the resolution of the problem.
As discussed earlier, a solution is said to be efficient if it
solves the problem within its
resource constraints. What does it mean? In the computer, we have
hard disk, memory
and other hardware. Secondly we have time. Suppose you have some
program that
solves the problem but takes two months. It will be of no use.
Usually, you don’t have
this much time and cannot wait for two months. Suppose the data is
too huge to be
stored in disk. Here we have also the problem of resources. This
means that we have
to write programs considering the resources to achieve some
solution as soon as
possible. There is always cost associated with these resources. We
may need a faster
and better CPU which can be purchased. Sometimes, we may need to
buy memory.
As long as data structures and programs are concerned, you have to
invest your own
time for this. While working in a company, you will be paid for
this. All these
requirements including computer, your time and computer time will
decide that the
solution you have provided is suitable or not. If its advantages
are not obtained, then
either program or computer is not good.
So the purchase of a faster computer, while studying this course,
does not necessarily
help us in the resolution of the problem. In the course of “Computer
Architecture”
you will see how the more efficient solutions can be prepared with
the hardware. In
this course, we will use the software i.e. data structures,
algorithms and the recipes
through which the computer problems may be resolved with a faster
solution.
Selecting a Data
Structure
How can we select the data structure needed to solve a problem?
You have already
studied where to use array and the size of array and when and
where to use the
4
CS301 – Data Structures
___________________________________________________________________
pointers etc. First of all, we have to analyze the problem to
determine the resource
constraints that a solution must meet. Suppose, the data is so
huge i.e. in Gega bytes
(GBs) while the disc space available with us is just 200 Mega
bytes. This problem can
not be solved with programming. Rather, we will have to buy a new
disk.
Secondly, it is necessary to determine the basic operations that
must be supported.
Quantify the resource constraints for each operation. What does it
mean? Suppose you
have to insert the data in the computer or database and have to
search some data item.
Let’s take the example of telephone directory. Suppose there are
eight million names
in the directory. Now someone asks you about the name of some
particular person.
You want that this query should be answered as soon as possible.
You may add or
delete some data. It will be advisable to consider all these
operations when you select
some data structure.
Finally select the data structure that meets these requirements
the maximum. Without,
sufficient experience, it will be difficult to determine which one
is the best data
structure. We can get the help from internet, books or from
someone whom you know
for already getting the problems solved. We may find a similar
example and try to use
it. After this course, you will be familiar with the data
structures and algorithms that
are used to solve the computer problems.
Now you have selected the data structure. Suppose a programmer has
inserted some
data and wants to insert more data. This data will be inserted in
the beginning of the
existing data, or in the middle or in the end of the data. Let’s
talk about the arrays and
suppose you have an array of size hundred. Data may be lying in
the first fifty
locations of this array. Now you have to insert data in the start
of this array. What will
you do? You have to move the existing data (fifty locations) to
the right so that we get
space to insert new data. Other way round, there is no space in
the start. Suppose you
have to insert the data at 25th location. For this purpose,
it is better to move the data
from 26th to 50th locations; otherwise we
will not have space to insert this new data at
25th location.
Now we have to see whether the data can be deleted or not. Suppose
you are asked to
delete the data at 27th position. How can we do
that? What will we do with the space
created at 27th position?
Thirdly, is all the data processed in some well-defined order or
random access
allowed? Again take the example of arrays. We can get the data
from 0th position and
traverse the array till its 50th position. Suppose we want
to get the data, at first from
50th location and then from 13th. It means that there is no
order or sequence. We want
to access the data randomly. Random access means that we can’t say
what will be the
next position to get the data or insert the data.
Data Structure
Philosophy
Let’s talk about the philosophy of data structure. Each data
structure has costs and
benefits. Any data structure used in your program will have some
benefits. For this,
you have to pay price. That can be computer resources or the time.
Also keep in mind
5
CS301 – Data Structures
___________________________________________________________________
that you are solving this problem for some client. If the program
is not efficient, the
client will not buy it.
In rare cases, a data structure may be better than another one in
all situations. It means
that you may think that the array is good enough for all the
problems. Yet this is not
necessary. In different situations, different data structures will
be suitable. Sometimes
you will realize that two different data structures are suitable
for the problem. In such
a case, you have to choose the one that is more appropriate. An
important skill this
course is going to lend to the students is use the data structure
according to the
situation. You will learn the programming in a way that it will be
possible to replace
the one data structure with the other one if it does not prove
suitable. We will replace
the data structure so that the rest of the program is not
affected. You will also have to
attain this skill as a good programmer.
There are three basic things associated with data structures. A
data structure requires:
– space for each data item it
stores
– time to perform each basic
operation
– programming effort
Goals of this Course
Reinforce the concept that costs and benefits exist for every data
structure. We will
learn this with practice.
Learn the commonly used data structures. These form a programmer's
basic data
structure “toolkit”. In the previous course, you have learned how
to form a loop,
functions, use of arrays, classes and how to write programs for
different problems. In
this course, you will make use of data structures and have a
feeling that there is bag
full of different data structures. In case of some problem, you will
get a data structure
from the toolkit and use some suitable data structure.
Understand how to measure the cost of a data structure or program.
These techniques
also allow you to judge the merits of new data structures that you
or others might
develop. At times, you may have two suitable data structures for
some problem. These
can be tried one by one to adjudge which one is better one. How
can you decide
which data structure is better than other. Firstly, a programmer
can do it by writing
two programs using different data structure while solving the same
problem. Now
execute both data structures. One gives the result before the
other. The data structure
that gives results first is better than the other one. But
sometimes, the data grows too
large in the problem. Suppose we want to solve some problem having
names and the
data of names grows to10 lakhs (one million). Now when you run
both programs, the
second program runs faster. What does it mean? Is the data
structure used in program
one not correct? This is not true. The size of the data, being
manipulated in the
program can grow or shrink. You will also see that some data
structures are good for
small data while the others may suit to huge data. But the problem
is how can we
determine that the data in future will increase or decrease. We
should have some way
to take decision in this regard. In this course we will do some
mathematical analysis
and see which data structure is better one.
6
CS301 – Data Structures
___________________________________________________________________
Arrays
You have already studied about arrays and are well-versed with the
techniques to
utilize these data structures. Here we will discuss how arrays can
be used to solve
computer problems. Consider the following program:
main( int argc, char** argv )
{
int x[6];
int j;
for(j = 0; j < 6; j++)
x[j] = 2 * j;
}
We have declared an int array of six elements and
initialized it in the loop.
Let’s revise some of the array concepts. The declaration of array
is as int x[6]; or
float x[6]; or double x[6]; You have
already done these in your programming
assignments. An array is collection of cells of the same type. In
the above program,
we have array x of type int of six elements. We can only store integers in this array.
We cannot put int in first location, float in second location and double in third
location. What is x? x is a name of collection of items. Its individual items are
numbered from zero to one less than array size. To access a cell,
use the array name
and an index as under:
x[0], x[1], x[2], x[3], x[4], x[5]
To manipulate the first element, we will use the index zero as x[0] and so on. The
arrays look like in the memory as follows:
X[0]
X[1]
X[2]
X[3]
X[4]
X[5]
Array cells are
contiguous in
computer memory
Array occupies contiguous memory area in the computer. In case of
the above
example, if some location is assigned to x[0], the next location can not contain data
other than x[1]. The computer memory can
be thought of as an array. It is a very big
array. Suppose a computer has memory of 2MB, you can think it as
an array of size 2
7
CS301 – Data Structures
___________________________________________________________________
million and the size of each item is 32 bits. You will study in
detail about it in the
computer organization, and Assembly language courses. In this
array, we will put our
programs, data and other things.
In the above program, we have declared an array named x. ‘x’ is an array’s name but
there is no variable x. ‘x’ is not an lvalue. If some variable can be
written on the lefthand
side of an assignment statement, this is lvalue variable. It means it has some
memory associated with it and some value can be assigned to it.
For example, if we
have the code int a, b; it can be
written as b = 2; it means that put 2 in the
memory location named b. We can also write as a = b; it means whatever b has assign
it to a, that is a copy operation. If we write as a = 5; it means put the number 5 in the
memory location which is named as a. But we cannot write 2 = a; that is to put at
number 2 what ever the value of a is. Why can’t we do that? Number
2 is a constant.
If we allow assignment to constants what will happen? Suppose ‘a’ has the value
number 3. Now we assigned number 2 the number 3 i.e. all the
number 2 will become
number 3 and the result of 2 + 2 will become 6. Therefore it
is not allowed.
‘x’ is a name of array and not an lvalue. So it cannot be used on the left hand side in
an assignment statement. Consider the following statements
int x[6];
int n;
x[0] = 5; x[1] = 2;
x = 3; //not allowed
x = a + b; // not allowed
x = &n; // not allowed
In the above code snippet, we have declared an array x of int. Now we can assign
values to the elements of x as x[0] = 5 or x[1] = 2 and so
on. The last three
statements are not allowed. What does the statement x = 3; mean? As x is a name of
array and this statement is not clear, what we are trying to do
here? Are we trying to
assign 3 to each element of the array? This statement is not clear.
Resultantly, it can
not be allowed. The statement x = a + b is also
not allowed. There is nothing wrong
with a + b. But we cannot assign the sum of values of a and b to x. In the statement x
= &n, we are trying to assign the memory address of n to x which is not allowed. The
reason is the name x is not lvalue and we cannot assign any value to it. For
understanding purposes, consider x as a constant. Its name or
memory location can
not be changed. This is a collective name for six locations. We
can access these
locations as x[0], x[1] up to x[5]. This is the way arrays are manipulated.
Sometimes, you would like to use an array data structure but may
lack the information
about the size of the array at compile time. Take the example of
telephone directory.
You have to store one lakh (100,000) names in an array. But you
never know that the
number of entries may get double or decline in future. Similarly,
you can not say that
the total population of the country is one crore (10 million) and
declare an array of
one crore names. You can use one lakh locations now and remaining
will be used as
the need arrives. But this is not a good way of using the computer
resources. You
have declared a very big array while using a very small chunk of
it. Thus the
remaining space goes waste which can, otherwise, be used by some
other programs.
8
CS301 – Data Structures
___________________________________________________________________
We will see what can be the possible solution of this problem?
Suppose you need an integer array of size n after the execution of the program. We
have studied that if it is known at the execution of the program
that an array of size 20
or 30 is needed, it is allocated dynamically. The programming
statement is as follows:
int* y = new int[20];
It means we are requesting computer to find twenty memory
locations. On finding it,
the computer will give the address of first location to the
programmer which will be
stored in y. Arrays locations are
contiguous i.e. these are adjacent. These twenty
locations will be contiguous, meaning that they will be neighbors
to each other. Now
y has become an array and we can say y[0] =1 or y[5] = 15. Here y is an lvalue.
Being a pointer, it is a variable where we can store the address
of some variable.
When we said int* y = new int[20]; the new returns the memory address of first of
the twenty locations and we store that address into y. As y is a pointer variable so it
can be used on the left-hand side. We can write it as:
y = &x[0];
In the above statement, we get the address of the fist location of
the array x and store
it in y. As y is lvalue, so it can be used on left hand side. This means
that the above
statement is correct.
y = x;
Similarly, the statement y = x is also correct. x is an array of six elements that holds
the address of the first element. But we cannot change this
address. However we can
get that address and store it in some other variable. As y is a pointer variable and
lvalue so the above operation is legal. We have dynamically allocated the
memory for
the array. This memory, after the use, can be released so that
other programs can use
it. We can use the delete keyword to release the
memory. The syntax is:
delete[ ] y;
We are releasing the memory, making it available for use by other
programs. We will
not do it in case of x array, as ‘new’ was not
used for its creation. So it is not our
responsibility to delete x.
List data structure
This is a new data structure for you. The List data structure is among the most generic
of data structures. In daily life, we use shopping list, groceries
list, list of people to
invite to a dinner, list of presents to give etc. In this course,
we will see how we use
lists in programming.
A list is the collection of items of the same type (grocery items,
integers, names). The
data in arrays are also of same type. When we say int x[6]; it means that only the
integers can be stored in it. The same is true for list. The data
which we store in list
should be of same nature. The items, or elements of the list, are
stored in some
9
CS301 – Data Structures
___________________________________________________________________
particular order. What does this mean? Suppose in the list, you
have the fruit first
which are also in some order. You may have names in some
alphabetical order i.e. the
names which starts with A should come first followed
by the name starting with B and
so on. The order will be reserved when you enter data in the list.
It is possible to insert new elements at various positions in the
list and remove any
element of the list. You have done the same thing while dealing
with arrays. You
enter the data in the array, delete data from the array. Sometimes
the array size grows
and at times, it is reduced. We will do this with the lists too.
List is a set of elements in a linear order. Suppose we have four
names a1, a2, a3, a4
and their order is as (a3, a1, a2, a4) i.e. a3, is the first element, a1 is the second
element, and so on. We want to maintain that order in the list
when data is stored in
the list. We don’t want to disturb this order. The order is
important here; this is not
just a random collection of elements but an ordered one. Sometimes, this order is due
to sorting i.e. the things that start with A come first. At occasions, the order may be
due to the importance of the data items. We will discuss this in
detail while dealing
with the examples.
Now we will see what kind of operations a programmer performs with
a list data
structure. Following long list of operations may help you
understand the things in a
comprehensive manner.
Operation Name
Description
createList() Create a new list (presumably empty)
copy() Set one list to be a copy of another
clear(); Clear a list (remove all elements)
insert(X, ?) Insert element X at a particular position in the list
remove(?) Remove element at some position in the list
get(?) Get element at a given position
update(X, ?) Replace the element at a given position with X
find(X) Determine if the element X is in the list
length() Returns the length of the list.
createList() is a function which creates a new list. For
example to create an array, we
use int x[6] or int* y =
new int[20]; we need similar
functionality in lists too. The
copy() function will create a copy of a list. The function clear() will remove all the
elements from a list. We want to insert a new element in the list,
we also have to tell
where to put it in the list. For this purpose insert(X, position) function is used.
Similarly the function remove(position) will remove the element at position. To get an
element from the list get(position) function
is used which will return the element at
position. To replace an element in the list at some position the
function update(X,
position) is used. The function find(X) will search X in the list. The function length()
tells us about the number of elements in the list.
We need to know what is meant by “particular position” we have
used “?” for this in
the above table. There are two possibilities:
Use the actual index of element: i.e. insert it after element 3,
get element
10
No comments:
Post a Comment