SAS Viya. Kevin D. Smith
each variable included in the variable list based
on a user-specified ranking order
Let’s run the summary action on our CAS table.
In [18]: summ = iris.summary()
In [19]: summ
Out[19]:
[Summary]
 Descriptive Statistics for IRIS
         Column  Min  Max      N  NMiss      Mean    Sum       Std  \
0   SepalLength  4.3  7.9  150.0    0.0  5.843333  876.5  0.828066
1    SepalWidth  2.0  4.4  150.0    0.0  3.054000  458.1  0.433594
2   PetalLength  1.0  6.9  150.0    0.0  3.758667  563.8  1.764420
3    PetalWidth  0.1  2.5  150.0    0.0  1.198667  179.8  0.763161
     StdErr       Var      USS         CSS         CV     TValue  \
0  0.067611  0.685694  5223.85  102.168333  14.171126  86.425375
1  0.035403  0.188004  1427.05   28.012600  14.197587  86.264297
2  0.144064  3.113179  2583.00  463.863733  46.942721  26.090198
3  0.062312  0.582414   302.30   86.779733  63.667470  19.236588
           ProbT
0  3.331256e-129
1  4.374977e-129
2   1.994305e-57
3   3.209704e-42
+ Elapsed: 0.0256s, user: 0.019s, sys: 0.009s, mem: 1.74mb
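Several columns in this output follow from N, Mean, and Std alone. The sketch below recomputes them for the SepalLength row using the textbook definitions. That the summary action computes them in exactly this way is an assumption, although the figures in the table are consistent with these formulas. The function name derived_stats is made up for illustration.

```python
import math


def derived_stats(n, mean, std):
    """Derive further summary columns from N, Mean, and Std.

    These are the textbook definitions; whether the summary action
    uses exactly these formulas is an assumption.
    """
    var = std ** 2                 # sample variance
    stderr = std / math.sqrt(n)    # standard error of the mean
    css = var * (n - 1)            # corrected sum of squares
    uss = css + n * mean ** 2      # uncorrected sum of squares
    cv = 100.0 * std / mean        # coefficient of variation, in percent
    tvalue = mean / stderr         # t statistic for the hypothesis mean == 0
    return {
        "StdErr": stderr,
        "Var": var,
        "USS": uss,
        "CSS": css,
        "CV": cv,
        "TValue": tvalue,
    }


# N, Mean, and Std copied from the SepalLength row of the table.
stats = derived_stats(n=150, mean=5.843333, std=0.828066)
for name, value in stats.items():
    print(f"{name:>6}: {value:.6f}")
```

Because the inputs are the rounded values printed in the table, the results agree with the reported columns only to within rounding.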
The summary action reports its statistics in a layout that is familiar to SAS users. If you prefer the layout that Pandas users are used to, call the describe method, just as you would on a DataFrame.
In [20]: iris.describe()
Out[20]:
       SepalLength  SepalWidth  PetalLength  PetalWidth
count   150.000000  150.000000   150.000000  150.000000
mean      5.843333    3.054000     3.758667    1.198667
std       0.828066    0.433594     1.764420    0.763161
min       4.300000    2.000000     1.000000    0.100000
25%       5.100000    2.800000     1.600000    0.300000
50%       5.800000    3.000000     4.350000    1.300000
75%       6.400000    3.300000     5.100000    1.800000
max       7.900000    4.400000     6.900000    2.500000
Note that when you call the describe method on a CASTable object, it calls several CAS actions in the background to do the calculations. These include the summary, percentile, and topk actions. The output of those actions is combined into a DataFrame in the same form that the Pandas DataFrame describe method returns. As a result, you can use CASTable objects and DataFrame objects interchangeably in your workflow for this method and many other methods.
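One practical consequence is that a helper written against the describe layout does not need to know which kind of table it receives. The following is a minimal sketch that runs on a local Pandas DataFrame only. The helper name numeric_overview and the sample values are hypothetical, and passing a CASTable instead is assumed to work because its describe method returns the same layout.

```python
import pandas as pd


def numeric_overview(table):
    """Return the mean and std rows of table.describe().

    Relies only on describe() returning the Pandas layout, so any
    object that provides that method in that form can be passed in.
    """
    stats = table.describe()
    return stats.loc[["mean", "std"]]


# Local stand-in data (hypothetical values, not the Iris table).
local = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 6.3, 5.8],
    "SepalWidth": [3.5, 3.0, 3.3, 2.7],
})

overview = numeric_overview(local)
print(overview)
```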
Data Visualization
Because the tables that come back from the CAS server are instances of a Pandas DataFrame subclass, you can do anything with them that works on DataFrames. You can plot the results of your actions using the plot method or use them as input to more advanced packages such as Matplotlib and Bokeh, which are covered in more detail in a later section.
The following example uses the plot method to download the entire data set and plot it using the default options.
In [21]: iris.plot()
Out[21]: <matplotlib.axes.AxesSubplot at 0x5339050>
If the plot doesn’t show up automatically, you might have to tell Matplotlib to display it.
In [22]: import matplotlib.pyplot as plt
In [23]: plt.show()
The output that is created by the plot method follows.
Even if you loaded the same data set that we have used in this example, your plot might look different because CAS stores data in a distributed manner. As a result, the order in which rows arrive from the server is not deterministic unless you sort the data when it is fetched. The following command plots the data sorted by SepalLength and SepalWidth.
In [24]: iris.sort_values(['SepalLength', 'SepalWidth']).plot()
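The effect of sorting can be reproduced locally. In the sketch below, two shuffles of a small Pandas DataFrame stand in for two fetches that might arrive in different row orders. The data values and variable names are hypothetical, and nothing here talks to a CAS server.

```python
import pandas as pd

# Hypothetical sample rows; every (SepalLength, SepalWidth) pair is unique.
rows = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 6.3, 4.9, 5.8],
    "SepalWidth": [3.5, 3.0, 3.3, 2.4, 2.7],
})

# Two shuffles stand in for two fetches whose row order may differ.
fetch_a = rows.sample(frac=1, random_state=1)
fetch_b = rows.sample(frac=1, random_state=2)

# Sorting on the same keys gives both fetches the same row order.
keys = ["SepalLength", "SepalWidth"]
sorted_a = fetch_a.sort_values(keys).reset_index(drop=True)
sorted_b = fetch_b.sort_values(keys).reset_index(drop=True)

print(sorted_a.equals(sorted_b))
```

Rows that tie on every sort key can still appear in either order, so if an exactly reproducible plot matters, include enough columns in the sort list to break ties.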
Closing the Connection
As with any network or file resource in Python, you should close your CAS connections when you are finished. They time out and disappear eventually if left open, but it’s always a good idea to clean them up explicitly.
In [25]: conn.close()
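If an error is raised earlier in a script, a close call placed at the end is never reached. Python's standard contextlib.closing helper calls close() when its with block exits, whether or not an exception occurred. The sketch below demonstrates this with a hypothetical FakeConnection class, which is not part of the CAS client. It assumes only that the real connection object offers a close() method that takes no arguments, as in the call above.

```python
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for a connection; not part of the CAS client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


conn = FakeConnection()
try:
    # closing() calls conn.close() when the block exits, even on error.
    with closing(conn):
        raise RuntimeError("analysis failed")
except RuntimeError:
    pass

print(conn.closed)
```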
Conclusion
Hopefully this 10-minute guide was enough to give you an idea of the basic workflow and capabilities of the Python CAS client. In the following chapters, we dig deeper into the details of the Python CAS client and how to blend the power of SAS analytics with the tools that are available in the Python environment.
1 Later in the book, we show you how to store your password so that you do not need to specify it in your programs.
Chapter 3: The Fundamentals of Using Python with CAS