-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathPython-DataScience
More file actions
90 lines (53 loc) · 7.07 KB
/
Copy pathPython-DataScience
File metadata and controls
90 lines (53 loc) · 7.07 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
Introduction to NumPy
You will learn about the two most important and popular Python libraries for handling data: NumPy and Pandas.
you will learn about the basics of NumPy, which is the fundamental package for scientific computing in Python. NumPy consists of a robust data structure called multidimensional arrays. Pandas is another powerful Python library that provides a fast and easy-to-use data analysis platform.
To enhance the learning outcome, you are expected to pause the videos and code along with the instructor. You will be provided with a structured and blank IPython notebook for coding. It is a must for you to answer certain in-segment questions, as they serve the purpose of practice, and the final notebook will have practice tasks for you to solve.
You will understand the advantages of using NumPy. You will also learn how to:
Create NumPy arrays,
Convert lists and tuples to NumPy arrays,
Inspect the structure and content of arrays, and
Subset, slice, index and iterate through arrays.
Before we get into the technicalities of a NumPy array, explore its useful functions and understand how it is implemented in Python, it is crucial to understand why NumPy is an important library for working with data.
NumPy, an acronym for the term ‘Numerical Python’, is a library in Python which is used extensively for efficient mathematical computing. This library allows users to store large amounts of data using less memory and perform extensive operations efficiently. It provides optimised and simpler functionalities to perform aforementioned operations using homogenous, one-dimensional and multidimensional arrays (You will learn more about this later.).
Now, before delving deep into the concept of NumPy arrays, it is important to note that Python lists can very well perform all the actions that NumPy arrays perform; it is simply the fact that NumPy arrays are faster and more convenient than lists when it comes to extensive computations, which make them extremely useful, especially when you are working with large amounts of data.
Basics of NumPy
NumPy, which stands for ‘Numerical Python’, is a library meant for scientific calculations. The basic data structure of NumPy is an array. A NumPy array is a collection of values stored together, similar to a list.
Ability to operate on individual elements in the array without using loops or list comprehension
Speed of execution
The demonstration in the video above did not cover the aspect of speed, so for now, you can assume that a NumPy array is faster than a list. Later in this session, you will be able to take a look at a detailed demonstration to compare the speed of NumPy arrays.
Creating NumPy Arrays
There are two ways to create NumPy arrays, which are mentioned below.
By converting the existing lists or tuples to arrays using np.array
By initialising fixed-length arrays using the NumPy functions
In this session, you will learn about both these methods.
The key advantage of using NumPy arrays over lists is that arrays allow you to operate over the entire data, unlike lists. However, in terms of structure, NumPy arrays are extremely similar to lists. If you try to run the print() command over a NumPy array, then you will get the following output:
[element_1 element_2 element_3…]
The only difference between a NumPy array and a list is that the elements in the NumPy array are separated by a space instead of a comma. Hence, this is an aesthetic feature that differentiates a list and a NumPy array.
An important point to note here is that the array given above is a one-dimensional array. You will learn about multidimensional arrays in the subsequent segments.
Another feature of NumPy arrays is that they are homogeneous in nature. By homogenous, we mean that all the elements in a NumPy array have to be of the same data type, which could be an integer, float, string, etc.
Creating an array
Is it possible to create an array from a tuple?
Yes, NumPy arrays can be created using a tuple. Any sequence that has an array-like structure can be passed to the np.array function.
To summarise, similar to lists, you can subset your data through conditions based on your requirements in NumPy arrays. To do this, you need to use logical operators such as ‘<’ and ‘>’. NumPy also has a few inbuilt functions such as max(), min() and mean(), which allow you to calculate statistically important data over the data directly. In the following video, Behzad will explore these functions and also summarise the learnings of this segment.
A multidimensional array is an array of arrays. For example, a two-dimensional array would be an array with each element as a one-dimensional array.
1-D array : [1, 2, 3, 4, 5]
2-D array : [ [1, 2, 3, 4, 5], [6, 7, 8, 9, 10] ]
NumPy arrays have certain features that help in analysing multidimensional arrays. Some features are as follows:
shape: It represents the shape of an array as the number of elements in each dimension.
ndim: It represents the number of dimensions of an array. For a 2D array, ndim = 2.
Similar to 1D arrays, nD arrays can also operate on individual elements without using list comprehensions or loops. The following video will give you a small demonstration of operating on nD arrays.
you learnt how to convert lists or tuples to arrays using np.array(). There are other ways in which you can create arrays. The following ways are commonly used when you know the size of the array beforehand:
np.ones(): It is used to create an array of 1s.
np.zeros(): It is used to create an array of 0s.
np.random.randint(): It is used to create a random array of integers within a particular range.
np.random.random(): It is used to create an array of random numbers.
np.arange(): It is used to create an array with increments of fixed step size.
np.linspace(): It is used to create an array of fixed length.
np.full(): It is used to create a constant array of any number ‘n’.
np.tile(): It is used to create a new array by repeating an existing array for a particular number of times.
np.eye(): It is used to create an identity matrix of any dimension
hstack: It puts two arrays with the same number of rows together. By using this command, the number of rows stays the same, while the number of columns increases.
vstack: It puts two arrays on top of each other. This command works only when the number of columns in both the arrays is the same. This command can only change the number rows of an array.
reshape: It can change the shape of an array as long as the number of elements in the array before and after the reshape operation is the same.
Once you have created an array, you may also want to run aggregation operations on the data stored in it. An aggregation function helps you summarise the numerical dat
The objective of this segment is to cover the mathematical capabilities of NumPy. Note that these mathematical functions might not be of direct use for you. In actual practice as a data scientist, you might not use these functions, but the advanced functions that you will use will be built using these functions. So, it would help if you remember that the NumPy library has all of these capabilities.