CS301 – Data Structures

___________________________________________________________________

Introduction to Data Structures

Let’s discuss why we need data structures and what sort of problems can be solved

with their use. Data structures help us to organize the data in the computer, resulting

in more efficient programs. An efficient program executes faster and helps minimize

the usage of resources like memory, disk. Computers are getting more powerful with

the passage of time with the increase in CPU speed in GHz, availability of faster

network and the maximization of disk space. Therefore people have started solving

more and more complex problems. As computer applications are becoming complex,

so there is need for more resources. This does not mean that we should buy a new

computer to make the application execute faster. Our effort should be to ensue that the

solution is achieved with the help of programming, data structures and algorithm.

What does organizing the data mean? It means that the data should be arranged in a

way that it is easily accessible. The data is inside the computer and we want to see it.

We may also perform some calculations on it. Suppose the data contains some

numbers and the programmer wants to calculate the average, standard deviation etc.

May be we have a list of names and want to search a particular name in it. To solve

such problems, data structures and algorithm are used. Sometimes you may realize

that the application is too slow and taking more time. There are chances that it may be

due to the data structure used, not due to the CPU speed and memory. We will see

such examples. In the assignments, you will also check whether the data structure in

the program is beneficial or not. You may have two data structures and try to decide

which one is more suitable for the resolution of the problem.

As discussed earlier, a solution is said to be efficient if it solves the problem within its

resource constraints. What does it mean? In the computer, we have hard disk, memory

and other hardware. Secondly we have time. Suppose you have some program that

solves the problem but takes two months. It will be of no use. Usually, you don’t have

this much time and cannot wait for two months. Suppose the data is too huge to be

stored in disk. Here we have also the problem of resources. This means that we have

to write programs considering the resources to achieve some solution as soon as

possible. There is always cost associated with these resources. We may need a faster

and better CPU which can be purchased. Sometimes, we may need to buy memory.

As long as data structures and programs are concerned, you have to invest your own

time for this. While working in a company, you will be paid for this. All these

requirements including computer, your time and computer time will decide that the

solution you have provided is suitable or not. If its advantages are not obtained, then

either program or computer is not good.

So the purchase of a faster computer, while studying this course, does not necessarily

help us in the resolution of the problem. In the course of “Computer Architecture”

you will see how the more efficient solutions can be prepared with the hardware. In

this course, we will use the software i.e. data structures, algorithms and the recipes

through which the computer problems may be resolved with a faster solution.

Selecting a Data Structure

How can we select the data structure needed to solve a problem? You have already

studied where to use array and the size of array and when and where to use the

CS301 – Data Structures

___________________________________________________________________

pointers etc. First of all, we have to analyze the problem to determine the resource

constraints that a solution must meet. Suppose, the data is so huge i.e. in Gega bytes

(GBs) while the disc space available with us is just 200 Mega bytes. This problem can

not be solved with programming. Rather, we will have to buy a new disk.

Secondly, it is necessary to determine the basic operations that must be supported.

Quantify the resource constraints for each operation. What does it mean? Suppose you

have to insert the data in the computer or database and have to search some data item.

Let’s take the example of telephone directory. Suppose there are eight million names

in the directory. Now someone asks you about the name of some particular person.

You want that this query should be answered as soon as possible. You may add or

delete some data. It will be advisable to consider all these operations when you select

some data structure.

Finally select the data structure that meets these requirements the maximum. Without,

sufficient experience, it will be difficult to determine which one is the best data

structure. We can get the help from internet, books or from someone whom you know

for already getting the problems solved. We may find a similar example and try to use

it. After this course, you will be familiar with the data structures and algorithms that

are used to solve the computer problems.

Now you have selected the data structure. Suppose a programmer has inserted some

data and wants to insert more data. This data will be inserted in the beginning of the

existing data, or in the middle or in the end of the data. Let’s talk about the arrays and

suppose you have an array of size hundred. Data may be lying in the first fifty

locations of this array. Now you have to insert data in the start of this array. What will

you do? You have to move the existing data (fifty locations) to the right so that we get

space to insert new data. Other way round, there is no space in the start. Suppose you

have to insert the data at 25th location. For this purpose, it is better to move the data

from 26th to 50th locations; otherwise we will not have space to insert this new data at

25th location.

Now we have to see whether the data can be deleted or not. Suppose you are asked to

delete the data at 27th position. How can we do that? What will we do with the space

created at 27th position?

Thirdly, is all the data processed in some well-defined order or random access

allowed? Again take the example of arrays. We can get the data from 0th position and

traverse the array till its 50th position. Suppose we want to get the data, at first from

50th location and then from 13th. It means that there is no order or sequence. We want

to access the data randomly. Random access means that we can’t say what will be the

next position to get the data or insert the data.

Data Structure Philosophy

Let’s talk about the philosophy of data structure. Each data structure has costs and

benefits. Any data structure used in your program will have some benefits. For this,

you have to pay price. That can be computer resources or the time. Also keep in mind

CS301 – Data Structures

___________________________________________________________________

that you are solving this problem for some client. If the program is not efficient, the

client will not buy it.

In rare cases, a data structure may be better than another one in all situations. It means

that you may think that the array is good enough for all the problems. Yet this is not

necessary. In different situations, different data structures will be suitable. Sometimes

you will realize that two different data structures are suitable for the problem. In such

a case, you have to choose the one that is more appropriate. An important skill this

course is going to lend to the students is use the data structure according to the

situation. You will learn the programming in a way that it will be possible to replace

the one data structure with the other one if it does not prove suitable. We will replace

the data structure so that the rest of the program is not affected. You will also have to

attain this skill as a good programmer.

There are three basic things associated with data structures. A data structure requires:

– space for each data item it stores

– time to perform each basic operation

– programming effort

Goals of this Course

Reinforce the concept that costs and benefits exist for every data structure. We will

learn this with practice.

Learn the commonly used data structures. These form a programmer's basic data

structure “toolkit”. In the previous course, you have learned how to form a loop,

functions, use of arrays, classes and how to write programs for different problems. In

this course, you will make use of data structures and have a feeling that there is bag

full of different data structures. In case of some problem, you will get a data structure

from the toolkit and use some suitable data structure.

Understand how to measure the cost of a data structure or program. These techniques

also allow you to judge the merits of new data structures that you or others might

develop. At times, you may have two suitable data structures for some problem. These

can be tried one by one to adjudge which one is better one. How can you decide

which data structure is better than other. Firstly, a programmer can do it by writing

two programs using different data structure while solving the same problem. Now

execute both data structures. One gives the result before the other. The data structure

that gives results first is better than the other one. But sometimes, the data grows too

large in the problem. Suppose we want to solve some problem having names and the

data of names grows to10 lakhs (one million). Now when you run both programs, the

second program runs faster. What does it mean? Is the data structure used in program

one not correct? This is not true. The size of the data, being manipulated in the

program can grow or shrink. You will also see that some data structures are good for

small data while the others may suit to huge data. But the problem is how can we

determine that the data in future will increase or decrease. We should have some way

to take decision in this regard. In this course we will do some mathematical analysis

and see which data structure is better one.

CS301 – Data Structures

___________________________________________________________________

Arrays

You have already studied about arrays and are well-versed with the techniques to

utilize these data structures. Here we will discuss how arrays can be used to solve

computer problems. Consider the following program:

main( int argc, char** argv )

{

int x[6];

int j;

for(j = 0; j < 6; j++)

x[j] = 2 * j;

}

We have declared an int array of six elements and initialized it in the loop.

Let’s revise some of the array concepts. The declaration of array is as int x[6]; or

float x[6]; or double x[6]; You have already done these in your programming

assignments. An array is collection of cells of the same type. In the above program,

we have array x of type int of six elements. We can only store integers in this array.

We cannot put int in first location, float in second location and double in third

location. What is x? x is a name of collection of items. Its individual items are

numbered from zero to one less than array size. To access a cell, use the array name

and an index as under:

x[0], x[1], x[2], x[3], x[4], x[5]

To manipulate the first element, we will use the index zero as x[0] and so on. The

arrays look like in the memory as follows:

X[0]

X[1]

X[2]

X[3]

X[4]

X[5]

Array cells are

contiguous in

computer memory

Array occupies contiguous memory area in the computer. In case of the above

example, if some location is assigned to x[0], the next location can not contain data

other than x[1]. The computer memory can be thought of as an array. It is a very big

array. Suppose a computer has memory of 2MB, you can think it as an array of size 2

CS301 – Data Structures

___________________________________________________________________

million and the size of each item is 32 bits. You will study in detail about it in the

computer organization, and Assembly language courses. In this array, we will put our

programs, data and other things.

In the above program, we have declared an array named x. ‘x’ is an array’s name but

there is no variable x. ‘x’ is not an lvalue. If some variable can be written on the lefthand

side of an assignment statement, this is lvalue variable. It means it has some

memory associated with it and some value can be assigned to it. For example, if we

have the code int a, b; it can be written as b = 2; it means that put 2 in the

memory location named b. We can also write as a = b; it means whatever b has assign

it to a, that is a copy operation. If we write as a = 5; it means put the number 5 in the

memory location which is named as a. But we cannot write 2 = a; that is to put at

number 2 what ever the value of a is. Why can’t we do that? Number 2 is a constant.

If we allow assignment to constants what will happen? Suppose ‘a’ has the value

number 3. Now we assigned number 2 the number 3 i.e. all the number 2 will become

number 3 and the result of 2 + 2 will become 6. Therefore it is not allowed.

‘x’ is a name of array and not an lvalue. So it cannot be used on the left hand side in

an assignment statement. Consider the following statements

int x[6];

int n;

x[0] = 5; x[1] = 2;

x = 3; //not allowed

x = a + b; // not allowed

x = &n; // not allowed

In the above code snippet, we have declared an array x of int. Now we can assign

values to the elements of x as x[0] = 5 or x[1] = 2 and so on. The last three

statements are not allowed. What does the statement x = 3; mean? As x is a name of

array and this statement is not clear, what we are trying to do here? Are we trying to

assign 3 to each element of the array? This statement is not clear. Resultantly, it can

not be allowed. The statement x = a + b is also not allowed. There is nothing wrong

with a + b. But we cannot assign the sum of values of a and b to x. In the statement x

= &n, we are trying to assign the memory address of n to x which is not allowed. The

reason is the name x is not lvalue and we cannot assign any value to it. For

understanding purposes, consider x as a constant. Its name or memory location can

not be changed. This is a collective name for six locations. We can access these

locations as x[0], x[1] up to x[5]. This is the way arrays are manipulated.

Sometimes, you would like to use an array data structure but may lack the information

about the size of the array at compile time. Take the example of telephone directory.

You have to store one lakh (100,000) names in an array. But you never know that the

number of entries may get double or decline in future. Similarly, you can not say that

the total population of the country is one crore (10 million) and declare an array of

one crore names. You can use one lakh locations now and remaining will be used as

the need arrives. But this is not a good way of using the computer resources. You

have declared a very big array while using a very small chunk of it. Thus the

remaining space goes waste which can, otherwise, be used by some other programs.

CS301 – Data Structures

___________________________________________________________________

We will see what can be the possible solution of this problem?

Suppose you need an integer array of size n after the execution of the program. We

have studied that if it is known at the execution of the program that an array of size 20

or 30 is needed, it is allocated dynamically. The programming statement is as follows:

int* y = new int[20];

It means we are requesting computer to find twenty memory locations. On finding it,

the computer will give the address of first location to the programmer which will be

stored in y. Arrays locations are contiguous i.e. these are adjacent. These twenty

locations will be contiguous, meaning that they will be neighbors to each other. Now

y has become an array and we can say y[0] =1 or y[5] = 15. Here y is an lvalue.

Being a pointer, it is a variable where we can store the address of some variable.

When we said int* y = new int[20]; the new returns the memory address of first of

the twenty locations and we store that address into y. As y is a pointer variable so it

can be used on the left-hand side. We can write it as:

y = &x[0];

In the above statement, we get the address of the fist location of the array x and store

it in y. As y is lvalue, so it can be used on left hand side. This means that the above

statement is correct.

y = x;

Similarly, the statement y = x is also correct. x is an array of six elements that holds

the address of the first element. But we cannot change this address. However we can

get that address and store it in some other variable. As y is a pointer variable and

lvalue so the above operation is legal. We have dynamically allocated the memory for

the array. This memory, after the use, can be released so that other programs can use

it. We can use the delete keyword to release the memory. The syntax is:

delete[ ] y;

We are releasing the memory, making it available for use by other programs. We will

not do it in case of x array, as ‘new’ was not used for its creation. So it is not our

responsibility to delete x.

List data structure

This is a new data structure for you. The List data structure is among the most generic

of data structures. In daily life, we use shopping list, groceries list, list of people to

invite to a dinner, list of presents to give etc. In this course, we will see how we use

lists in programming.

A list is the collection of items of the same type (grocery items, integers, names). The

data in arrays are also of same type. When we say int x[6]; it means that only the

integers can be stored in it. The same is true for list. The data which we store in list

should be of same nature. The items, or elements of the list, are stored in some

CS301 – Data Structures

___________________________________________________________________

particular order. What does this mean? Suppose in the list, you have the fruit first

which are also in some order. You may have names in some alphabetical order i.e. the

names which starts with A should come first followed by the name starting with B and

so on. The order will be reserved when you enter data in the list.

It is possible to insert new elements at various positions in the list and remove any

element of the list. You have done the same thing while dealing with arrays. You

enter the data in the array, delete data from the array. Sometimes the array size grows

and at times, it is reduced. We will do this with the lists too.

List is a set of elements in a linear order. Suppose we have four names a1, a2, a3, a4

and their order is as (a3, a1, a2, a4) i.e. a3, is the first element, a1 is the second

element, and so on. We want to maintain that order in the list when data is stored in

the list. We don’t want to disturb this order. The order is important here; this is not

just a random collection of elements but an ordered one. Sometimes, this order is due

to sorting i.e. the things that start with A come first. At occasions, the order may be

due to the importance of the data items. We will discuss this in detail while dealing

with the examples.

Now we will see what kind of operations a programmer performs with a list data

structure. Following long list of operations may help you understand the things in a

comprehensive manner.

Operation Name Description

createList() Create a new list (presumably empty)

copy() Set one list to be a copy of another

clear(); Clear a list (remove all elements)

insert(X, ?) Insert element X at a particular position in the list

remove(?) Remove element at some position in the list

get(?) Get element at a given position

update(X, ?) Replace the element at a given position with X

find(X) Determine if the element X is in the list

length() Returns the length of the list.

createList() is a function which creates a new list. For example to create an array, we

use int x[6] or int* y = new int[20]; we need similar functionality in lists too. The

copy() function will create a copy of a list. The function clear() will remove all the

elements from a list. We want to insert a new element in the list, we also have to tell

where to put it in the list. For this purpose insert(X, position) function is used.

Similarly the function remove(position) will remove the element at position. To get an

element from the list get(position) function is used which will return the element at

position. To replace an element in the list at some position the function update(X,

position) is used. The function find(X) will search X in the list. The function length()

tells us about the number of elements in the list.

We need to know what is meant by “particular position” we have used “?” for this in

the above table. There are two possibilities:

􀂃 Use the actual index of element: i.e. insert it after element 3, get element

CS301 – Data Structures

No comments:

Post a Comment

ShareThis

push up dialog