SAS Viya. Kevin D. Smith


      distinct values of each variable included in the variable list based on a user-specified ranking order

      Let’s run the summary action on our CAS table.

      In [18]: summ = iris.summary()

      In [19]: summ

       Out[19]:

      [Summary]

      Descriptive Statistics for IRIS

              Column  Min  Max      N  NMiss      Mean    Sum       Std  \
      0  SepalLength  4.3  7.9  150.0    0.0  5.843333  876.5  0.828066
      1   SepalWidth  2.0  4.4  150.0    0.0  3.054000  458.1  0.433594
      2  PetalLength  1.0  6.9  150.0    0.0  3.758667  563.8  1.764420
      3   PetalWidth  0.1  2.5  150.0    0.0  1.198667  179.8  0.763161

           StdErr       Var      USS         CSS         CV     TValue  \
      0  0.067611  0.685694  5223.85  102.168333  14.171126  86.425375
      1  0.035403  0.188004  1427.05   28.012600  14.197587  86.264297
      2  0.144064  3.113179  2583.00  463.863733  46.942721  26.090198
      3  0.062312  0.582414   302.30   86.779733  63.667470  19.236588

                 ProbT
      0  3.331256e-129
      1  4.374977e-129
      2   1.994305e-57
      3   3.209704e-42

      + Elapsed: 0.0256s, user: 0.019s, sys: 0.009s, mem: 1.74mb
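      The statistic columns in this table are related to one another, and a short sketch can make those relationships concrete. The code below is plain Python using only the standard library. It is not how CAS computes its results, and the function name summary_stats is our own. It assumes the usual definitions: Var and Std use the n - 1 divisor, StdErr is Std divided by the square root of N, USS and CSS are the uncorrected and corrected sums of squares, CV is Std as a percentage of Mean, and TValue is Mean divided by StdErr.

```python
import math
import statistics


def summary_stats(values):
    """Compute summary-style statistics for one numeric column.

    Illustrative only: the definitions below are the conventional ones
    and are assumed, not taken from the CAS implementation.
    """
    n = len(values)
    total = math.fsum(values)
    mean = total / n
    var = statistics.variance(values)  # sample variance, divisor n - 1
    std = math.sqrt(var)
    stderr = std / math.sqrt(n)
    uss = math.fsum(v * v for v in values)            # uncorrected sum of squares
    css = math.fsum((v - mean) ** 2 for v in values)  # corrected sum of squares
    return {
        "Min": min(values),
        "Max": max(values),
        "N": n,
        "Mean": mean,
        "Sum": total,
        "Std": std,
        "StdErr": stderr,
        "Var": var,
        "USS": uss,
        "CSS": css,
        "CV": 100.0 * std / mean,   # undefined when the mean is zero
        "TValue": mean / stderr,    # t statistic for the hypothesis mean = 0
    }


# A tiny made-up sample, not the iris data.
stats = summary_stats([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in stats.items():
    print(name, value)
```

      The same relationships hold in the output above. For SepalLength, for example, 0.828066 divided by the square root of 150 is approximately 0.067611, which is the StdErr value that is reported.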

      The summary action displays summary statistics in a form that is familiar to SAS users. If you prefer the layout that Pandas users are used to, you can call the describe method, just as you would on a DataFrame.

      In [20]: iris.describe()

       Out[20]:

             SepalLength  SepalWidth  PetalLength  PetalWidth
      count   150.000000  150.000000   150.000000  150.000000
      mean      5.843333    3.054000     3.758667    1.198667
      std       0.828066    0.433594     1.764420    0.763161
      min       4.300000    2.000000     1.000000    0.100000
      25%       5.100000    2.800000     1.600000    0.300000
      50%       5.800000    3.000000     4.350000    1.300000
      75%       6.400000    3.300000     5.100000    1.800000
      max       7.900000    4.400000     6.900000    2.500000

      Note that when you call the describe method on a CASTable object, it calls several CAS actions in the background to do the calculations, including the summary, percentile, and topk actions. The output of those actions is combined into a DataFrame in the same form that the Pandas DataFrame describe method returns. This enables you to use CASTable objects and DataFrame objects interchangeably in your workflow for this method and many other methods.
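      To picture how separately computed pieces can be merged into a describe-style result, consider the following sketch. It uses only the Python standard library and does not reproduce the client's actual implementation. The function name describe_column is hypothetical, and the quartile interpolation chosen here (the "inclusive" method of statistics.quantiles) is an assumption that might not match the percentile definition that CAS uses.

```python
import statistics


def describe_column(values):
    """Assemble describe-style statistics for one numeric column."""
    # Piece 1: count, moments, and extremes (what a summary step provides).
    moments = {
        "count": float(len(values)),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }
    # Piece 2: quartiles (what a percentile step provides).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    quartiles = {"25%": q1, "50%": q2, "75%": q3}
    # Combine both pieces in the row order that describe displays.
    order = ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]
    combined = {**moments, **quartiles}
    return {name: combined[name] for name in order}


# A tiny made-up sample, not the iris data.
result = describe_column([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in result.items():
    print(f"{name:<5} {value:10.6f}")
```

      The point of the sketch is the final step: each piece is computed independently, and only the merge decides the layout that the caller sees.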

      Because the tables that come back from the CAS server are instances of a subclass of the Pandas DataFrame, you can do anything with them that works on DataFrames. You can plot the results of your actions using the plot method or use them as input to more advanced packages such as Matplotlib and Bokeh, which are covered in more detail in a later section.

      The following example uses the plot method to download the entire data set and plot it using the default options.

      In [21]: iris.plot()

      Out[21]: <matplotlib.axes.AxesSubplot at 0x5339050>

      If the plot doesn’t show up automatically, you might have to tell Matplotlib to display it.

      In [22]: import matplotlib.pyplot as plt

      In [23]: plt.show()

      The output that is created by the plot method follows.

      [Figure: output of iris.plot()]

      Even if you loaded the same data set that we have used in this example, your plot might look different because CAS stores data in a distributed manner. As a result, the order of the rows returned by the server is not deterministic unless you sort the data when it is fetched. The following command plots the data sorted by SepalLength and SepalWidth.

      In [24]: iris.sort_values(['SepalLength', 'SepalWidth']).plot()

      [Figure: output of iris.sort_values(['SepalLength', 'SepalWidth']).plot()]
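      The effect of sorting on row order can be illustrated without a server. The sketch below uses plain Python lists and a few made-up rows. It simulates two fetches that return the same rows in different orders and shows that sorting on the two columns produces the same sequence each time. The helper name sort_rows is our own. Note the assumption that each pair of key values is unique: rows that tie on both columns would still keep whatever relative order they arrived in.

```python
import random

# Made-up rows for illustration; each (SepalLength, SepalWidth) pair is unique.
rows = [
    {"SepalLength": 5.1, "SepalWidth": 3.5},
    {"SepalLength": 4.9, "SepalWidth": 3.0},
    {"SepalLength": 5.1, "SepalWidth": 3.3},
    {"SepalLength": 4.7, "SepalWidth": 3.2},
]


def sort_rows(batch):
    """Order rows by SepalLength, then by SepalWidth."""
    return sorted(batch, key=lambda row: (row["SepalLength"], row["SepalWidth"]))


# Simulate two fetches that return the same rows in arbitrary order.
fetch_a = random.sample(rows, k=len(rows))
fetch_b = random.sample(rows, k=len(rows))

sorted_a = sort_rows(fetch_a)
sorted_b = sort_rows(fetch_b)

# The arrival order can differ, but the sorted order cannot.
print(sorted_a == sorted_b)  # prints True
```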

      As with any network or file resource in Python, you should close your CAS connections when you are finished. They time out and disappear eventually if left open, but it’s always a good idea to clean them up explicitly.

      In [25]: conn.close()
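      In a script, as opposed to an interactive session, an error partway through can skip the call to close. One general Python pattern for guarding against that is contextlib.closing, which calls close on any object when the with block exits. The sketch below uses a hypothetical stand-in class, StandInConnection, so that it runs without a server. It is not the CAS connection class, and we make no claim here about which context-manager features the CAS client itself supports.

```python
from contextlib import closing


class StandInConnection:
    """Hypothetical stand-in for a connection object; not the real CAS class."""

    def __init__(self):
        self.closed = False

    def run(self, fail=False):
        # Pretend to do some work, optionally failing partway through.
        if fail:
            raise RuntimeError("simulated failure during analysis")
        return "ok"

    def close(self):
        self.closed = True


conn = StandInConnection()
try:
    # closing() guarantees that conn.close() runs when the block exits,
    # even if the body raises an exception.
    with closing(conn) as session:
        session.run(fail=True)
except RuntimeError:
    pass

print(conn.closed)  # prints True
```

      A try/finally block that calls close in the finally clause achieves the same result if you prefer to avoid the extra import.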

      Hopefully this 10-minute guide was enough to give you an idea of the basic workflow and capabilities of the Python CAS client. In the following chapters, we dig deeper into the details of the Python CAS client and how to blend the power of SAS analytics with the tools that are available in the Python environment.

      Chapter 3: The Fundamentals of Using Python with CAS

       Connecting to CAS

       Running CAS Actions

       Specifying Action Parameters

       CAS Action Results

       Working with CAS Action Sets

       Details
