Python Distilled
David M. Beazley, author of Python Essential Reference
Contents
Preface
Acknowledgments
About the Author
Chapter 1: Python Basics
Chapter 2: Operators, Expressions, and Data Manipulation
Chapter 3: Program Structure and Control Flow
Chapter 4: Objects, Types, and Protocols
Chapter 5: Functions
Chapter 6: Generators
Chapter 7: Classes and Object-Oriented Programming
Chapter 8: Modules and Packages
Chapter 9: Input and Output
Chapter 10: Built-in Functions and the Standard Library
Table of Contents
Preface
Acknowledgments
About the Author
Chapter 1: Python Basics
    Running Python
    Python Programs
    Primitives, Variables, and Expressions
    Arithmetic Operators
    Conditionals and Control Flow
    Text Strings
    File Input and Output
    Lists
    Tuples
    Sets
    Dictionaries
    Iteration and Loops
    Functions
    Exceptions
    Program Termination
    Objects and Classes
    Modules
    Scripting
    Packages
    Structuring an Application
    Third-Party Package Management
    Python: Fits Your Brain
Chapter 2: Operators, Expressions, and Data Manipulation
    Literals
    Expressions and Locations
    Standard Operators
    In-Place Assignment
    Object Comparison
    Ordered Comparison Operators
    Boolean Expressions and Truth Values
    Conditional Expressions
    Operations on Iterables
    Operations on Sequences
    Operations on Mutable Sequences
    Operations on Sets
    Operations on Mappings
    List, Set, and Dictionary Comprehensions
    Generator Expressions
    The Attribute (.) Operator
    The Function Call () Operator
    Order of Evaluation
    Final Words: The Secret Life of Data
Chapter 3: Program Structure and Control Flow
    Program Structure and Execution
    Conditional Execution
    Loops and Iteration
    Exceptions
    Context Managers and the with Statement
    Assertions and __debug__
    Final Words
Chapter 4: Objects, Types, and Protocols
    Essential Concepts
    Object Identity and Type
    Reference Counting and Garbage Collection
    References and Copies
    Object Representation and Printing
    First-Class Objects
    Using None for Optional or Missing Data
    Object Protocols and Data Abstraction
    Object Protocol
    Number Protocol
    Comparison Protocol
    Conversion Protocols
    Container Protocols
    Iteration Protocol
    Attribute Protocol
    Function Protocol
    Context Manager Protocol
    Final Words: On Being Pythonic
Chapter 5: Functions
    Function Definitions
    Default Arguments
    Variadic Arguments
    Keyword Arguments
    Variadic Keyword Arguments
    Functions Accepting All Inputs
    Positional-Only Arguments
    Names, Documentation Strings, and Type Hints
    Function Application and Parameter Passing
    Return Values
    Error Handling
    Scoping Rules
    Recursion
    The lambda Expression
    Higher-Order Functions
    Argument Passing in Callback Functions
    Returning Results from Callbacks
    Decorators
    Map, Filter, and Reduce
    Function Introspection, Attributes, and Signatures
    Environment Inspection
    Dynamic Code Execution and Creation
    Asynchronous Functions and await
    Final Words: Thoughts on Functions and Composition
Chapter 6: Generators
    Generators and yield
    Restartable Generators
    Generator Delegation
    Using Generators in Practice
    Enhanced Generators and yield Expressions
    Applications of Enhanced Generators
    Generators and the Bridge to Awaiting
    Final Words: A Brief History of Generators and Looking Forward
Chapter 7: Classes and Object-Oriented Programming
    Objects
    The class Statement
    Instances
    Attribute Access
    Scoping Rules
    Operator Overloading and Protocols
    Inheritance
    Avoiding Inheritance via Composition
    Avoiding Inheritance via Functions
    Dynamic Binding and Duck Typing
    The Danger of Inheriting from Built-in Types
    Class Variables and Methods
    Static Methods
    A Word about Design Patterns
    Data Encapsulation and Private Attributes
    Type Hinting
    Properties
    Types, Interfaces, and Abstract Base Classes
    Multiple Inheritance, Interfaces, and Mixins
    Type-Based Dispatch
    Class Decorators
    Supervised Inheritance
    The Object Life Cycle and Memory Management
    Weak References
    Internal Object Representation and Attribute Binding
    Proxies, Wrappers, and Delegation
    Reducing Memory Use with __slots__
    Descriptors
    The Class Definition Process
    Dynamic Class Creation
    Metaclasses
    Built-in Objects for Instances and Classes
    Final Words: Keep It Simple
Chapter 8: Modules and Packages
    Modules and the import Statement
    Module Caching
    Importing Selected Names from a Module
    Circular Imports
    Module Reloading and Unloading
    Module Compilation
    The Module Search Path
    Execution as the Main Program
    Packages
    Running a Package Submodule as a Script
    Controlling the Package Namespace
    Controlling Package Exports
    Package Data
    Module Objects
    Deploying Python Packages
Chapter 9: Input and Output
    Data Representation
    Text Encoding and Decoding
    Text and Byte Formatting
    Reading Command-Line Options
    Environment Variables
    Files and File Objects
    I/O Abstraction Layers
    Standard Input, Output, and Error
    Directories
    Concurrency and Blocking Operations
    Standard Library Modules
    Final Words
Chapter 10: Built-in Functions and the Standard Library
    Built-in Functions
    Built-in Exceptions
    Standard Library Modules
    Final Words: Use the Built-ins
Preface

More than twenty years have passed since I wrote the Python Essential Reference. At the time, Python was a much smaller language, and it came with a useful set of batteries in its standard library. It was something that could mostly fit in your brain. The Essential Reference reflected that era. It was meant to be a small book that you could take with you to write Python code on a desert island or inside a secret vault. Over three subsequent revisions, the Essential Reference more or less evolved along with this vision of being a compact but complete language reference — because if you're going to program in Python on vacation, why not use everything?

Today, more than a decade has passed since the last edition was published, and the Python world is much different. Python is no longer a niche language; it has become one of the most popular programming languages in the world. Python programmers also have a wealth of information at their fingertips in the form of advanced editors, IDEs, notebooks, web pages, and more. In fact, there is probably little need to consult a "reference book" when almost any reference material you might want can be summoned before your eyes with the touch of a few keys.

If anything, the ease of information retrieval and the sheer scale of the Python universe present a different kind of challenge. If you're just starting to learn Python or need to solve a new problem, it can be overwhelming to know where to begin. It can also be difficult to separate the features of various tools from the core language itself. These kinds of problems are the premise of this book.

Python Distilled is a book about programming in Python. However, rather than trying to document absolutely "everything" that's possible or has ever been done in Python, the focus is on presenting a modern yet curated (or distilled) core of the language. Much of this has been informed by my years of teaching Python to scientists, engineers, and software professionals. However, it's also a product of writing software
libraries, pushing the edges of what makes Python work, and finding out what's most useful.

For the most part, the book focuses on the topic of Python programming itself. This includes abstraction techniques, program structure, data, functions, objects, modules, and so on — topics that will serve programmers working on Python projects of any size. Pure reference material that can easily be obtained via an IDE (such as lists of functions, names of commands, arguments, and so forth) is generally omitted. I've also made a conscious choice to not describe the fast-changing world of Python tooling — editors, IDEs, deployment, and related matters.

Perhaps controversially, I don't generally focus on language features related to large-scale software project management. Python is sometimes used for big and serious things comprised of millions upon millions of lines of code. Maintaining such applications requires tooling, design, features, types, committees, meetings, and decisions to be made about very important matters. Matters of such importance are too important for this little book. So, they are left as an exercise for the reader. However, the honest answer is that I don't use Python to write such applications — and neither should you.

In writing a book, there is always a cutoff for the ever-evolving features of the language. This book was written during the era of Python 3.9. As such, it does not include some of the major additions planned for later releases — for example, structural pattern matching. That's a topic for a different time and place.

Last, but not least, I think it's important that programming remain fun. I hope that this book will not only help you become a productive Python programmer but also capture some of the magic that has inspired people to use Python for exploring the stars, flying helicopters on Mars, and spraying squirrels with a water cannon in the backyard.
Acknowledgments

[This content is currently under development.]
About the Author

[This content is currently under development.]
Python Basics

This chapter gives an overview of the core of the Python language. Topics include variables, data types, expressions, control flow, functions, classes, and input/output. The chapter concludes with a discussion of modules, scripting, packages, and a few tips on organizing larger programs. This chapter is not trying to cover every feature in full depth, nor does it concern all of the tools that might surround a larger Python project. However, experienced programmers should be able to extrapolate from the material here to write more advanced programs. Newcomers are encouraged to try the examples in a simple environment, such as a terminal window and a basic text editor.
Running Python

Python programs are executed by an interpreter. There are many different environments in which the Python interpreter might run — an IDE, a browser, or a terminal window. However, underneath all of that, the core of the interpreter is a text-based application that can be started by typing python in a command shell such as bash. Since Python 2 and Python 3 might both be installed on the same machine, you might need to type python2 or python3 to pick a version. This book assumes Python 3.8 or newer.

When the interpreter starts, it presents a prompt where you can type programs into a so-called "read-evaluate-print loop" (or REPL). For example, in the following output, the interpreter displays its copyright message and presents the user with the >>> prompt, at which the user types the familiar "Hello World" program:

Python 3.8.0 (default, Jan 3 2019, 05:53:21)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print('Hello World')
Hello World
>>>
Certain environments may display a different prompt. For example, here's the output from ipython (an alternate shell for Python):

Python 3.8.0 (default, Feb 4 2019, 07:39:16)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: print('Hello World')
Hello World

In [2]:
Regardless of the exact form of output you see, the underlying principle is the same. You type a command, it runs, and you immediately see the output. Python's interactive mode is one of its most useful features, because you can type any valid statement and immediately see the result. This is useful for debugging and experimentation. Many people, including the author, even use interactive Python as their desktop calculator. For example:

>>> 6000 + 4523.50 + 134.25
10657.75
>>> _ + 8192.75
18850.5
>>>
When you use Python interactively, the variable _ holds the result of the last operation. This is useful if you want to use that result in subsequent statements. This variable only gets defined when working interactively, so don't use it in saved programs. You can exit the interactive shell by typing quit() or the EOF (end of file) character. On UNIX, EOF is Ctrl+D; on Windows, it's Ctrl+Z.
Python Programs

If you want to create a program that you can run repeatedly, put statements like the following in a text file:
# hello.py
print('Hello World')
Python source files are UTF-8-encoded text files that normally have a .py suffix. The # character denotes a comment that extends to the end of the line. International (Unicode) characters can be freely used in your source code as long as you remember to use the UTF-8 encoding (this is usually the default in most editors, but it never hurts to check your editor settings if you're not sure). To execute the hello.py file, give the filename to the interpreter as follows:

shell % python3 hello.py
Hello World
shell %
It is common to use #! to specify the interpreter on the first line of a program, like this:

#!/usr/bin/env python3
print('Hello World')
If you give this file execute permission on UNIX (for example, with chmod +x hello.py), you can run the program by typing hello.py into your shell. On Windows, you can double-click a .py file or type the name of the program into the Run command on the Windows Start menu to launch it. The #! line, if present, is used to pick the interpreter version (Python 2 versus 3). Be aware that a program's execution might take place in a console window that disappears immediately after the program completes — often before you can read its output. For debugging, it might be better to run the program within a Python development environment.

The interpreter runs statements in order until it reaches the end of the input file. At that point, the program terminates and Python exits.
Primitives, Variables, and Expressions

Python provides a collection of primitive types such as integers, floats, and strings:
42            # int
4.2           # float
'forty-two'   # str
True          # bool
A variable is a name that refers to a value. A value represents an object of a specific type (e.g. an integer, float, string, etc.). x = 42
Sometimes you might see a type explicitly attached to a name. For example:

x: int = 42
The presence of a type hint is purely informational — an aid to code readability and a possible input for third-party code-checking tools. Otherwise, it is completely ignored; it does not prevent you from later assigning a value of a different type.

An expression is a combination of primitives, names, and operators that produces a value:

2 + 3 * 4     # -> 14
The following program demonstrates the use of variables and expressions by performing a compound-interest calculation:

# interest.py

principal = 1000    # Initial amount
rate = 0.05         # Interest rate
numyears = 5        # Number of years
year = 1
while year <= numyears:
    principal = principal * (1 + rate)
    print(year, principal)
    year += 1

Python also provides a full set of bitwise operators for manipulating the bits of integers. For example:

a = 0b11001001
mask = 0b11110000
x = (a & mask) >> 4   # x = 0b1100 (12)
In this example, the number 0b11001001 is written as an integer in binary. You could have written it as decimal 201 or hexadecimal 0xc9, but if you're fiddling with bits, binary makes it easier to visualize what you're doing. The semantics of the bitwise operators assume that integers use a two's complement binary representation and that the sign bit extends infinitely to the left. Some care is required if you are working with raw bit patterns that are intended to map to native integers on the hardware. This is because Python doesn't truncate the bits or allow values to overflow — instead, a result will grow arbitrarily large in magnitude. It's up to you to make sure the result is properly sized or truncated if needed.

To compare numbers, use the comparison operators in Table 4.

Table 4: Comparison Operators
The result of a comparison is a Boolean value, True or False. The and, or, and not operators (not to be confused with the bit-manipulation operators above) can form more complex Boolean expressions involving logical truth. The behavior of these operators is shown in Table 5.

Table 5: Logical Operators
A value is considered false if it is literally False, None, numerically zero, or empty. Otherwise, it's considered true.

It is common to write an expression that updates a value. For example:

x = x + 1
y = y * n
Instead, you can write the following shortened version:

x += 1
y *= n
This shortened form of update can be used with all of the operators +, -, *, **, /, //, %, &, |, and ^. Python does not have the increment (++) or decrement (--) operators found in some other languages.
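One related subtlety, noted here as an aside: for mutable objects such as lists, the in-place form a += b updates the object itself, whereas a = a + b creates a new object. A minimal sketch:

a = [1, 2, 3]
b = a          # b refers to the same list as a
a += [4]       # In-place update: b sees the change
print(b)       # [1, 2, 3, 4]

a = [1, 2, 3]
b = a
a = a + [4]    # Builds a new list and rebinds a
print(b)       # [1, 2, 3]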
Conditionals and Control Flow

The while, if, and else statements are used for looping and conditional code execution. Here's an example:

if a < b:
    print('Computer says yes')
else:
    print('Computer says no')
The bodies of the if and else clauses are denoted by indentation. The else clause is optional. To create an empty clause, use the pass statement, like this:

if a < b:
    pass    # Do nothing
else:
    print('Computer says no')
To handle multiple test cases, use the elif statement, like this:

if suffix == '.htm':
    content = 'text/html'
elif suffix == '.jpg':
    content = 'image/jpeg'
elif suffix == '.png':
    content = 'image/png'
else:
    raise RuntimeError(f'Unknown content type {suffix!r}')
If you are assigning a value in combination with a test, use a conditional expression:

maxval = a if a > b else b
This is the same as the longer:

if a > b:
    maxval = a
else:
    maxval = b
Sometimes, you may see the assignment of a variable and a conditional combined using the := operator. This is known as an assignment expression (or, more colloquially, the "walrus operator," because := looks like a walrus tipped over on its side, presumably playing dead). For example:

x = 0
while (x := x + 1) < 10:    # Prints 1, 2, 3, ..., 9
    print(x)
Parentheses are always required to surround an assignment expression.

The break statement can be used to abort a loop early. It only applies to the innermost loop in use. For example:

x = 0
while x < 10:
    if x == 5:
        break        # Stops the loop. Moves on to 'Done' below
    print(x)
    x += 1
print('Done')
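Because break applies only to the innermost loop, breaking out of an inner loop does not terminate an enclosing one. A small sketch:

for i in range(3):
    for j in range(3):
        if j == 1:
            break          # Exits the inner loop only
    print('outer', i)      # Still runs for i = 0, 1, 2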
The continue statement skips the rest of the loop body and goes back to the top of the loop. For example:

x = 0
while x < 10:
    x += 1
    if x == 5:
        continue    # Skips print(x). Goes back to loop start
    print(x)
print('Done')
Text Strings

To define a string literal, enclose it in single, double, or triple quotes as follows:

a = 'Hello World'
b = "Python is groovy"
c = '''Computer says no'''
d = """Computer still says no"""
The same type of quote used to start a string must be used to terminate it. Triple-quoted strings capture all of the text up to the terminating triple quote, unlike single- and double-quoted strings, which must be specified on one logical line. Triple-quoted strings are useful when the contents of a string literal span multiple lines of text, such as the following:

print('''Content-type: text/html

<h1> Hello World </h1>
Click <a href="http://www.python.org">here</a>.
''')
Immediately adjacent string literals are concatenated into a single string. Thus, the above example could also be written as:

print(
'Content-type: text/html\n'
'\n'
'<h1> Hello World </h1>\n'
'Click <a href="http://www.python.org">here</a>\n'
)
If the opening quotation of a string is prefaced by an f, expressions enclosed in braces within the string are evaluated and substituted. For example, in earlier examples, the following statement was used to print values of a calculation:

print(f'{year:>3d} {principal:0.2f}')
Although only simple variable names are used here, any valid expression can appear. For example:
base_year = 2020
...
print(f'{base_year + year:>4d} {principal:0.2f}')
As an alternative to f-strings, the format() method and the % operator are also sometimes used to format strings. For example:

print('{0:>3d} {1:0.2f}'.format(year, principal))
print('%3d %0.2f' % (year, principal))
See Chapter 9 for more information on string formatting. Strings are stored as sequences of Unicode characters indexed by integers, starting at zero. Negative indices index from the end of the string. The length of a string s is computed using len(s). To extract a single character, use the indexing operator s[i] where i is the index:

a = 'Hello World'
print(len(a))    # 11
b = a[4]         # b = 'o'
c = a[-1]        # c = 'd'
To extract a substring, use the slicing operator s[i:j]. It extracts all characters from s whose index k is in the range i <= k < j. If either index is omitted, the beginning or end of the string is assumed, respectively:

c = a[:5]     # c = 'Hello'
d = a[6:]     # d = 'World'
e = a[3:8]    # e = 'lo Wo'

Strings can be converted to printable text in two ways — str() produces the text itself, whereas repr() produces a representation that shows quoting and escape codes:

>>> s = 'hello\nworld'
>>> print(str(s))
hello
world
>>> print(repr(s))
'hello\nworld'
>>>
When debugging, it is recommended to use repr(s) when generating output, because repr() shows you more information about a value and its type. The format() function is used to convert a single value to a string with a specific formatting applied. For example:

>>> x = 12.34567
>>> format(x, '0.2f')
'12.35'
>>>
The formatting code given to format() is the same code you would use with f-strings when producing formatted output. For example, the above code could be replaced by the following:

>>> f'{x:0.2f}'
'12.35'
>>>
File Input and Output

The following program opens a file and reads its contents line by line as text strings:

with open('data.txt') as file:
    for line in file:
        print(line, end='')   # end='' omits the extra newline
The open() function returns a new file object. The with statement that precedes it declares a block of statements (or context) where the file is going to be used. Once control leaves this block, the file is automatically closed. If you don't use the with statement, the code would need to look like this:

file = open('data.txt')
for line in file:
    print(line, end='')   # end='' omits the extra newline
file.close()
It's easy to forget the extra step of calling close(), so it's usually better to use the with statement and have the file closed for you. The for loop iterates over the file line by line until no more data is available. If you want to read the file in its entirety as a string, use the read() method like this:

with open('data.txt') as file:
    data = file.read()
If you want to read a large file in chunks, give the read() method a size hint, as follows:

with open('data.txt') as file:
    while (chunk := file.read(10000)):
        print(chunk, end='')
The := operator used in this example assigns to a variable and returns its value so that it can be tested by the while loop to break out. When the end of a file is reached, read() returns an empty string. An alternate way to write the same function is with break, like this:

with open('data.txt') as file:
    while True:
        chunk = file.read(10000)
        if not chunk:
            break
        print(chunk, end='')
To make the output of a program go to a file, supply a file argument to the print() function, as shown in this example:

with open('out.txt', 'wt') as out:
    while year <= numyears:
        principal = principal * (1 + rate)
        print(f'{year:>3d} {principal:0.2f}', file=out)
        year += 1

Lists

Lists are an ordered collection of arbitrary objects. Create a list by enclosing values in square brackets. Individual items and slices of a list can be reassigned:

names = ['Dave', 'Paula', 'Thomas', 'Lewis']
names[1] = 'Becky'                      # Replace 'Paula' with 'Becky'
names[0:2] = ['Dave', 'Mark', 'Jeff']   # Replace the first two elements
                                        # with ['Dave', 'Mark', 'Jeff']
Use the plus (+) operator to concatenate lists:

a = ['x','y'] + ['z','z','y']   # Result is ['x','y','z','z','y']
An empty list is created in one of two ways:

names = []       # An empty list
names = list()   # An empty list
Specifying [] for an empty list is more idiomatic. list is the name of the class associated with the list type. It's more common to see it used when performing conversions of data to a list. For example:

letters = list('Dave')   # letters = ['D', 'a', 'v', 'e']
Most of the time, all of the items of a list are of the same type (for example, a list of numbers or a list of strings). However, lists can contain any mix of Python objects, including other lists, as in the following example:

a = [1, 'Dave', 3.14, ['Mark', 7, 9, [100, 101]], 10]
Items contained in nested lists are accessed by applying more than one indexing operation:

a[1]         # Returns 'Dave'
a[3][2]      # Returns 9
a[3][3][1]   # Returns 101
The following pcost.py program shows how to read data into a list and perform a simple calculation. In this example, lines are assumed to contain comma-separated values. The program computes the sum of the product of two columns.

# pcost.py
#
# Reads input lines of the form 'NAME,SHARES,PRICE'.
# For example:
#
#   SYM,123,456.78

import sys

if len(sys.argv) != 2:
    raise SystemExit(f'Usage: {sys.argv[0]} filename')

rows = []
with open(sys.argv[1], 'rt') as file:
    for line in file:
        rows.append(line.split(','))

# rows is a list of this form
# [
#   ['SYM', '123', '456.78']
#   ...
# ]

total = sum([int(row[1]) * float(row[2]) for row in rows])
print(f'Total cost: {total:0.2f}')
The first line of this program uses the import statement to load the sys module from the Python library. This module is used to obtain command-line arguments, which are found in the list sys.argv. The initial check makes sure that a filename has been provided. If not, a SystemExit exception is raised with a helpful error message. In this message, sys.argv[0] contains the name of the program being run.

The open() function uses the filename that was specified on the command line. The for line in file loop reads the file line by line. Each line is split into a small list using a comma as a separator. This list is appended to rows. The final result, rows, is a list of lists — remember that a list can hold anything, including other lists.

The expression [int(row[1]) * float(row[2]) for row in rows] constructs a new list by looping over all of the lists in rows and computing the product of the second and third items. This useful technique for constructing a list is known as a list comprehension. The same computation could have been expressed more verbosely as follows:

values = []
for row in rows:
    values.append(int(row[1]) * float(row[2]))
total = sum(values)
List comprehensions are usually a preferred technique for performing simple calculations. The built-in function sum() calculates the sum of all elements of a sequence.
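As an aside, sum() also accepts a generator expression directly, which avoids building the intermediate list. A minimal variant of the earlier computation:

total = sum(int(row[1]) * float(row[2]) for row in rows)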
Tuples

To create simple data structures, you can pack a collection of values together into an immutable object known as a tuple. You create a tuple by enclosing a group of values in parentheses, like this:

holding = ('GOOG', 100, 490.10)
address = ('www.python.org', 80)
For completeness, 0- and 1-element tuples can be defined, but they have special syntax:

a = ()          # 0-tuple (empty tuple)
b = (item,)     # 1-tuple (note the trailing comma)
The values in a tuple can be extracted by numerical index, just like a list. However, it is more common to unpack tuples into a set of variables, like this:

name, shares, price = holding
host, port = address
Although tuples support most of the same operations as lists (such as indexing, slicing, and concatenation), the elements of a tuple cannot be changed after creation — that is, you cannot replace, delete, or append new elements to an existing tuple. This reflects the fact that a tuple is best viewed as a single immutable object that consists of several parts, not as a collection of distinct objects like a list.

Tuples and lists are often used together to represent data. For example, this program shows how you might read a file consisting of columns of data separated by commas:

# File containing lines of the form "name,shares,price"
filename = 'portfolio.csv'
portfolio = []
with open(filename) as file:
    for line in file:
        row = line.split(',')
        name = row[0]
        shares = int(row[1])
        price = float(row[2])
        holding = (name, shares, price)
        portfolio.append(holding)
The resulting portfolio list created by this program looks like a two-dimensional array of rows and columns. Each row is represented by a tuple and can be accessed as follows:

>>> portfolio[0]
('AA', 100, 32.2)
>>> portfolio[1]
('IBM', 50, 91.1)
>>>
Individual items of data can be accessed like this:

>>> portfolio[1][1]
50
>>> portfolio[1][2]
91.1
>>>
Here's how to loop over all of the records and unpack fields into a set of variables:

total = 0.0
for name, shares, price in portfolio:
    total += shares * price
Alternatively, you could use a list comprehension to perform this calculation:

total = sum([shares * price for _, shares, price in portfolio])
When iterating over tuples, the variable _ is sometimes used to indicate a discarded value. In the calculation above, this means that we ignore the first element (the name).
Sets

A set is an unordered collection of unique objects. Sets are used to find distinct values or to manage problems related to membership. To create a set, enclose a collection of values in curly braces or give an existing collection of items to set(). For example:

names1 = { 'IBM', 'MSFT', 'AA' }
names2 = set(['IBM', 'MSFT', 'HPE', 'IBM', 'CAT'])
The elements of a set are typically restricted to immutable objects. For example, you can make a set of numbers, strings, or tuples. However, you can't make a set containing lists. Most common objects will probably work with a set, however — when in doubt, try it.

Unlike lists and tuples, sets are unordered and cannot be indexed by numbers. Moreover, the elements of a set are never duplicated. For example, if you inspect the value of names2 from the preceding code, you get the following:

>>> names2
{'CAT', 'IBM', 'MSFT', 'HPE'}
>>>
Note that "IBM" occurs only once. Also remember that the order of items cannot be predicted and the result may differ from what is shown. The order may even change from one execution to another of the interpreter. If you are working with existing data, you can also create a set with a set comprehension. For example, this statement converts all stock names in the data from the previous section into an array: names = { s[0] for s in portfolio }
To create an empty set, use set() with no arguments:

r = set()   # Initially empty set
Sets support a standard collection of operations, including union, intersection, difference, and symmetric difference. Here's an example:

s = {'IBM', 'MSFT', 'AA'}
t = {'IBM', 'MSFT', 'HPE', 'CAT'}

a = t | s   # Union {'MSFT', 'CAT', 'HPE', 'AA', 'IBM'}
b = t & s   # Intersection {'IBM', 'MSFT'}
c = t - s   # Difference {'CAT', 'HPE'}
d = s - t   # Difference {'AA'}
e = t ^ s   # Symmetric difference {'CAT', 'HPE', 'AA'}
The difference operation s - t gives the elements of s that are not in t. The symmetric difference s ^ t gives the elements that are in either s or t but not in both.
New items can be added to a set using add() or update():

t.add('DIS')                     # Add a single item
s.update({'JJ', 'GE', 'ACME'})   # Add multiple items to s
An item can be removed using remove() or discard():

t.remove('IBM')      # Remove 'IBM'; raises KeyError if absent
s.discard('SCOX')    # Remove 'SCOX' if it exists

The difference between remove() and discard() is that discard() doesn't raise an exception if the item isn't present.
Dictionaries

A dictionary is a mapping between keys and values. You create a dictionary by enclosing the key-value pairs, each separated by a colon, in curly braces ({ }), like this:

s = {
    'name' : 'GOOG',
    'shares' : 100,
    'price' : 490.10
}
To access members of a dictionary, use the indexing operator as follows:

name = s['name']
cost = s['shares'] * s['price']
Inserting or modifying objects works like this:

s['shares'] = 75
s['date'] = '2007-06-07'
A dictionary is a useful way to define an object that consists of named fields, as shown. However, dictionaries are also commonly used as a mapping for performing fast lookups on unordered data. For example, here's a dictionary of stock prices:

prices = {
    'GOOG' : 490.1,
    'AAPL' : 123.5,
    'IBM' : 91.5,
    'MSFT' : 52.13
}
In such a dictionary, you can look up a price with an expression like p = prices['IBM'].
Dictionary membership is tested with the in operator, as in the following example:

if 'IBM' in prices:
    p = prices['IBM']
else:
    p = 0.0
This particular sequence of steps can also be performed more compactly using the get() method:

p = prices.get('IBM', 0.0)   # prices['IBM'] if it exists, else 0.0
Use the del statement to remove an element of a dictionary:

del prices['GOOG']
Although strings are the most common type of key, you can use many other Python objects, including numbers and tuples. For example, tuples are often used to construct composite or multipart keys:

prices = {}
prices[('IBM', '2015-02-03')] = 91.23
prices['IBM', '2015-02-04'] = 91.42   # Parens omitted
Any kind of object can be placed into a dictionary, including other dictionaries. However, mutable data structures such as lists, sets, and dictionaries cannot be used as keys.

Dictionaries are often used as building blocks for various algorithms and data-handling problems. One such problem is tabulation. For example, you could count the total number of shares for each stock name in the earlier data:
portfolio = [
    ('ACME', 50, 92.34),
    ('IBM', 75, 102.25),
    ('PHP', 40, 74.50),
    ('IBM', 50, 124.75)
]

total_shares = { s[0]: 0 for s in portfolio }
for name, shares, _ in portfolio:
    total_shares[name] += shares

# total_shares = {'ACME': 50, 'IBM': 125, 'PHP': 40}
In this example, { s[0]: 0 for s in portfolio } is an example of a dictionary comprehension. It creates a dictionary of key-value pairs from another collection of data. In this case, it's making an initial dictionary mapping stock names to 0. The for loop that follows iterates over the dictionary and adds up all of the held shares for each stock symbol.

Many common data-processing tasks such as this one have already been implemented by library modules. For example, the collections module has a Counter object that can be used for this exact task:

from collections import Counter

total_shares = Counter()
for name, shares, _ in portfolio:
    total_shares[name] += shares

# total_shares = Counter({'IBM': 125, 'ACME': 50, 'PHP': 40})
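A Counter also provides conveniences that a plain dictionary lacks. For example, the most_common() method reports the largest tallies:

print(total_shares.most_common(1))   # [('IBM', 125)]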
An empty dictionary is created in one of two ways:

prices = {}       # An empty dict
prices = dict()   # An empty dict
It's more idiomatic to use {} for an empty dictionary — although some caution is required, since it might look like you're trying to create an empty set (use set() for that instead). dict() is commonly used to create a dictionary from key-value pairs. For example:
pairs = [('IBM', 125), ('ACME', 50), ('PHP', 40)]
d = dict(pairs)
To obtain a list of dictionary keys, convert a dictionary to a list:

syms = list(prices)   # syms = ['AAPL', 'MSFT', 'IBM', 'GOOG']
Alternatively, you can obtain the keys using dict.keys():

syms = prices.keys()
The difference between these two methods is that keys() returns a special "keys view" that is attached to the dictionary and actively reflects changes made to the dictionary. For example:

>>> d = { 'x': 2, 'y': 3 }
>>> k = d.keys()
>>> k
dict_keys(['x', 'y'])
>>> d['z'] = 4
>>> k
dict_keys(['x', 'y', 'z'])
>>>
Keys always appear in the same order as the items were initially inserted into the dictionary. The list conversion above will preserve this order. This can be useful when dictionaries are used to represent key-value data read from files and other data sources — the dictionary will preserve the input order, which may aid readability and debugging. It's also nice if you want to write the data back to a file. Before Python 3.6, however, this ordering was not guaranteed, so it should not be relied upon if compatibility with older versions of Python is required; nor was the ordering guaranteed there after multiple deletions and insertions.

To retrieve the values stored in a dictionary, use the dict.values() method. To retrieve key-value pairs, use dict.items(). For example, here's how to iterate over the entire contents of a dictionary as key-value pairs:

for sym, price in prices.items():
    print(f'{sym} = {price}')
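Like keys(), the values() method returns a live view that reflects later changes to the dictionary. A short sketch:

vals = prices.values()
print(list(vals))        # Current values
prices['HPE'] = 38.2     # Illustrative entry, not from the text above
print(list(vals))        # The view now includes the new value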
Iteration and Loops

The most widely used looping construct is the for statement, which is used to iterate over a collection of items. One common form of iteration is to loop over all of the members of a sequence such as a string, list, or tuple. Here's an example:

for n in [1, 2, 3, 4, 5, 6, 7, 8, 9]:
    print(f'2 to the {n} power is {2**n}')
In this example, the variable n will be assigned successive items from the list [1, 2, 3, 4, ..., 9] on each iteration. Since looping over ranges of integers is quite common, the following shortcut is often used for that purpose:

for n in range(1, 10):
    print(f'2 to the {n} power is {2**n}')
The range(i, j [,step]) function creates an object that represents a range of integers with values from i up to, but not including, j. If the starting value is omitted, it's taken to be zero. An optional stride can also be given as a third argument. Here are some examples:

a = range(5)          # a = 0, 1, 2, 3, 4
b = range(1, 8)       # b = 1, 2, 3, 4, 5, 6, 7
c = range(0, 14, 3)   # c = 0, 3, 6, 9, 12
d = range(8, 1, -1)   # d = 8, 7, 6, 5, 4, 3, 2
The object created by range() computes the values it represents on demand when lookups are requested. Thus, it's efficient to use even with a very large range of numbers. The for statement is not limited to sequences of integers; it can be used to iterate over many kinds of objects, including strings, lists, dictionaries, and files. Here's an example:

message = 'Hello World'
# Print out the individual characters in message
for c in message:
    print(c)

names = ['Dave', 'Mark', 'Ann', 'Phil']
# Print out the members of a list
for name in names:
    print(name)

prices = { 'GOOG' : 490.10, 'IBM' : 91.50, 'AAPL' : 123.15 }
# Print out all of the members of a dictionary
for key in prices:
    print(key, '=', prices[key])

# Print out all of the lines in a file
with open('foo.txt') as file:
    for line in file:
        print(line, end='')
The for loop is one of Python's most powerful language features because you can create custom iterator objects and generator functions that supply it with sequences of values. More details about iterators and generators can be found later in this chapter and in Chapter 6.
Functions

Use the def statement to define a function, as shown in the following example:

def remainder(a, b):
    q = a // b    # // is truncating division
    r = a - q * b
    return r
To invoke a function, use the function name followed by its arguments in parentheses — for example, result = remainder(37, 15).
It is common practice for a function to include a documentation string (docstring) as its first statement. This string feeds the help() command and may be used by IDEs and other development tools to assist the programmer. For example:
def remainder(a, b):
    '''
    Computes the remainder of dividing a by b
    '''
    q = a // b
    r = a - q * b
    return r
If the inputs and outputs of a function aren't clear from their names, they might be annotated with types:

def remainder(a: int, b: int) -> int:
    '''
    Computes the remainder of dividing a by b
    '''
    q = a // b
    r = a - q * b
    return r
Such annotations are merely informational and are not actually enforced at runtime. Someone could still call the above function with non-integer values, such as result = remainder(37.5, 3.2).

Use a tuple to return multiple values from a function, as shown here:

def divide(a, b):
    q = a // b    # If a and b are integers, q is an integer
    r = a - q * b
    return (q, r)
When multiple values are returned in a tuple, they can be unpacked into separate variables like this:

quotient, remainder = divide(1456, 33)
To assign a default value to a function parameter, use assignment in the function definition:

def connect(hostname, port, timeout=300):
    # Function body
    ...
When default values are given in a function definition, they can be omitted from subsequent function calls, in which case the argument takes the given default value. Here's an example:

connect('www.python.org', 80)
connect('www.python.org', 80, 500)
Default arguments are often used to enable optional features. If there are many such arguments, readability can suffer. It's therefore recommended to specify such arguments using keyword arguments. For example:

connect('www.python.org', 80, timeout=500)
If you know the names of the arguments, all of them can be named when making a function call. When they are named, the order in which they are listed doesn't matter. For example, this is fine:

connect(port=80, hostname='www.python.org')
When variables are created or assigned inside a function, their scope is local. That is, the variable is only defined inside the body of the function and is destroyed when the function returns. Functions can also reference variables defined outside of a function, as long as those are defined in the same file. For example:

debug = True    # Global variable

def read_data(filename):
    if debug:
        print('Reading', filename)
    ...
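One caveat implied here: assigning to a name inside a function makes it local. To rebind a global variable from within a function, you must declare it with global — a minimal sketch:

debug = True

def set_debug(value):
    global debug    # Without this, debug would become a new local variable
    debug = value

set_debug(False)
print(debug)        # False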
See Chapter 5 (Functions) for more information on scoping rules.
Exceptions

If an error occurs in your program, an exception is raised and a traceback message such as the following appears:

Traceback (most recent call last):
  File "readport.py", line 9, in <module>
    shares = int(row[1])
ValueError: invalid literal for int() with base 10: 'N/A'
The traceback message indicates the type of error that occurred, along with its location. Normally, errors cause a program to terminate. However, you can catch and handle exceptions using try and except statements, like this:

portfolio = []
with open('portfolio.csv') as file:
    for line in file:
        row = line.split(',')
        try:
            name = row[0]
            shares = int(row[1])
            price = float(row[2])
            holding = (name, shares, price)
            portfolio.append(holding)
        except ValueError as err:
            print('Bad row:', row)
            print('Reason:', err)
In this code, if a ValueError occurs, details concerning the cause of the error are placed in err, and control passes to the code in the except block. If some other kind of exception is raised, the program crashes as usual. If no errors occur, the code in the except block is ignored. When an exception is handled, program execution resumes with the statement that immediately follows the last except block. The program does not return back to the location where the exception occurred.

The raise statement is used to signal an exception. When raising an exception, you need to give the name of an exception, such as RuntimeError, a built-in exception, in the following example:

raise RuntimeError('Computer says no')
Proper management of system resources such as locks, files, and network connections is often tricky when combined with exception handling. Sometimes there are actions that must be performed no matter what happens. For this, use try-finally. Here is an example involving a lock that must be released to avoid deadlock:

import threading
lock = threading.Lock()
...
lock.acquire()
# If a lock has been acquired, it MUST be released
try:
    ...
    statements
    ...
finally:
    lock.release()    # Always runs
To simplify such programming, most objects that involve resource management also support the with statement. Here's a modified version of the above code:

with lock:
    ...
    statements
    ...
In this example, the lock object is automatically acquired when the with statement executes. When execution leaves the context of the with block, the lock is automatically released. This management happens regardless of what occurs inside the with block. For example, if an exception occurs, the lock is released when control leaves the context of the block.

The with statement is normally only compatible with objects related to system resources or the execution environment, such as files, connections, and locks. However, user-defined objects can define their own custom processing, as described further in Chapter 3.
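As a preview of that material, here is a minimal sketch (not from the text) of a user-defined class that works with the with statement by implementing the __enter__() and __exit__() methods:

import time

class Timer:
    def __enter__(self):
        self._start = time.time()   # Record the start time on entry
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on exit, even if an exception occurred in the block
        print(f'Elapsed: {time.time() - self._start:0.3f}s')

with Timer():
    total = sum(range(1000000))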
Program Termination

A program terminates when no more statements exist to execute in the input program or when an uncaught SystemExit exception is raised. If you want to force a program to quit, do this:
raise SystemExit()                       # Exit with no error message
raise SystemExit("Something is wrong")   # Exit with an error message
On exit, the interpreter makes a best effort to garbage-collect all active objects. However, if you need to perform a specific cleanup action (remove files, close a connection, etc.), you can register it with the atexit module as follows:

import atexit

# Example
connection = open_connection('deaddot.com')

def cleanup():
    print('Going away...')
    close_connection(connection)

atexit.register(cleanup)
Objects and Classes

All values used in a program are objects. An object consists of internal data and methods that perform various kinds of operations involving that data. You have already used objects and methods when working with the built-in types such as strings and lists. For example:

items = [37, 42]    # Create a list object
items.append(73)    # Call the append() method
The dir() function lists the methods available on an object and is a useful tool for interactive experimentation when a fancy IDE is not available. For example:

>>> items = [37, 42]
>>> dir(items)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
 ...
 'append', 'count', 'extend', 'index', 'insert', 'pop',
 'remove', 'reverse', 'sort']
>>>
When inspecting objects, you will see familiar methods such as append() and insert() listed. However, you will also see special methods whose names begin and end with a double underscore. These methods implement various operators. For example, the __add__() method is used to implement the + operator. These methods are explained in more detail in later chapters.

>>> items.__add__([73, 101])
[37, 42, 73, 101]
>>>
The class statement is used to define new types of objects and for object-oriented programming. For example, the following class defines a stack with push() and pop() operations:

class Stack:
    def __init__(self):          # Initialize the stack
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __repr__(self):
        return f'<{type(self).__name__} at 0x{id(self):x}, size={len(self)}>'

    def __len__(self):
        return len(self._items)
Inside the class definition, methods are defined using the def statement. The first argument of each method always refers to the object itself. By convention, self is the name used for this argument. All operations involving the attributes of an object must explicitly refer to the self variable. Methods with leading and trailing double underscores are special
methods. For example, __init__ is used to initialize an object. In this case, __init__ creates an internal list for storing the stack data. To use a class, write code such as this:

s = Stack()            # Create a stack
s.push('Dave')         # Push some things onto it
s.push(42)
s.push([3, 4, 5])
x = s.pop()            # x gets [3, 4, 5]
y = s.pop()            # y gets 42
Within the class, you will notice that the methods use an internal _items variable. Python does not have any mechanism for hiding or protecting data. However, there is a programming convention wherein names preceded by a single underscore are taken to be "private." In this example, _items should be treated (by you) as internal implementation and not used outside of the Stack class itself. Be aware that this convention is not actually enforced — if you want to access _items, you can do so at any time. You'll just have to answer to your coworkers when they review your code.

The __repr__() and __len__() methods are there to make the object play nicely with the rest of the environment. Here, __len__() makes a Stack work with the built-in len() function, and __repr__() changes the way that a Stack is displayed and printed. It's generally a good idea to define __repr__(), as it can simplify debugging.

>>> s = Stack()
>>> s.push('Dave')
>>> s.push(42)
>>> len(s)
2
>>> s
<Stack at 0x10108c1d0, size=2>
>>>
An important feature of objects is that you can extend or override the capabilities of existing classes through inheritance.
Suppose you wanted to add a method that swaps the top two items on the stack. You might write a class like this:

class MyStack(Stack):
    def swap(self):
        a = self.pop()
        b = self.pop()
        self.push(a)
        self.push(b)

MyStack is identical to Stack except that it has a new method, swap().
>>> s = MyStack()
>>> s.push('Dave')
>>> s.push(42)
>>> s.swap()
>>> s.pop()
'Dave'
>>> s.pop()
42
>>>
Inheritance can also be used to change the behavior of an existing method. Suppose you want to restrict the stack to only hold numeric data. You could write a class like this:

class NumericStack(Stack):
    def push(self, item):
        if not isinstance(item, (int, float)):
            raise TypeError('Expected an int or float')
        super().push(item)
In this example, the push() method has been redefined to add extra checking. The super() operation is a way to invoke the prior definition of push(). Here's how this class would work:

>>> s = NumericStack()
>>> s.push(42)
>>> s.push('Dave')
Traceback (most recent call last):
  ...
TypeError: Expected an int or float
>>>
Often, inheritance is not the best solution. Suppose you wanted to define a simple stack-based 4-function calculator that worked like this:

>>> # Calculate 2 + 3 * 4
>>> calc = Calculator()
>>> calc.push(2)
>>> calc.push(3)
>>> calc.push(4)
>>> calc.mul()
>>> calc.add()
>>> calc.pop()
14
>>>
You might look at this code, see the use of push() and pop(), and think that Calculator could be defined by inheriting from Stack. Although that would work, it is probably better to define Calculator as a completely separate class:

class Calculator:
    def __init__(self):
        self._stack = Stack()

    def push(self, item):
        self._stack.push(item)

    def pop(self):
        return self._stack.pop()

    def add(self):
        self.push(self.pop() + self.pop())

    def mul(self):
        self.push(self.pop() * self.pop())

    def sub(self):
        right = self.pop()
        self.push(self.pop() - right)

    def div(self):
        right = self.pop()
        self.push(self.pop() / right)
In this implementation, a Calculator contains a Stack as an internal implementation detail. This is an example of "composition." The push() and pop() methods delegate to the internal Stack. The main reason for taking this approach is that you don't really think of a Calculator as a Stack — it's a separate concept, a different kind of object. By analogy, your phone contains a central processing unit (CPU), but you don't usually think of your phone as a type of CPU.
Modules

As your programs grow in size, you will want to break them into multiple files for easier maintenance. To do this, use the import statement. To create a module, put the relevant statements and definitions into a file that has the same name as the module. (Note that the file must have a .py suffix.) Here's an example:

# readport.py
#
# Reads a file of 'NAME,SHARES,PRICE' data

def read_portfolio(filename):
    portfolio = []
    with open(filename) as file:
        for line in file:
            row = line.split(',')
            try:
                name = row[0]
                shares = int(row[1])
                price = float(row[2])
                holding = (name, shares, price)
                portfolio.append(holding)
            except ValueError as err:
                print('Bad row:', row)
                print('Reason:', err)
    return portfolio
To use your module in other files, use the import statement. For example, here is a module pcost.py that uses the above read_portfolio() function:

# pcost.py

import readport

def portfolio_cost(filename):
    '''
    Computes the total shares*price of a portfolio
    '''
    port = readport.read_portfolio(filename)
    return sum(shares * price for _, shares, price in port)
The import statement creates a new namespace (or environment) and executes all of the statements in the associated .py file within that namespace. To access the contents of the namespace after import, use the name of the module as a prefix, as in readport.read_portfolio() in the preceding example.

If the import statement fails with an ImportError exception, a few things in your environment need checking. First, make sure you created a file called readport.py. Next, check the directories listed on sys.path. If your file isn't saved in one of those directories, Python won't be able to find it.

If you want to import a module under a different name, supply the import statement with an optional as qualifier, like this:

import readport as rp
port = rp.read_portfolio('portfolio.dat')
To import specific definitions into the current namespace, use the from statement:

from readport import read_portfolio
port = read_portfolio('portfolio.dat')
As with objects, the dir() function lists the contents of a module and is a useful tool for interactive experimentation:

>>> import readport
>>> dir(readport)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__',
 '__name__', '__package__', '__spec__', 'read_portfolio']
>>>
Python provides a large standard library of modules that simplify certain programming tasks. For example, the csv module is a standard library module for dealing with files of comma-separated values. You could use it in your program as follows:

# readport.py
#
# Reads a file of 'NAME,SHARES,PRICE' data

import csv

def read_portfolio(filename):
    portfolio = []
    with open(filename) as file:
        rows = csv.reader(file)
        for row in rows:
            try:
                name = row[0]
                shares = int(row[1])
                price = float(row[2])
                holding = (name, shares, price)
                portfolio.append(holding)
            except ValueError as err:
                print('Bad row:', row)
                print('Reason:', err)
    return portfolio
Python also has a large number of third-party modules that can be installed to solve almost any task imaginable, including reading CSV files. See https://pypi.org.
Scripting

Any file can execute either as a script or as a library imported with import. To better support imports, script code is often enclosed in a conditional check against the module name, like this:

# readport.py
#
# Reads a file of 'NAME,SHARES,PRICE' data

import csv

def read_portfolio(filename):
    ...

def main():
    portfolio = read_portfolio('portfolio.csv')
    for name, shares, price in portfolio:
        print(f'{name:>10s} {shares:10d} {price:10.2f}')

if __name__ == '__main__':
    main()
__name__ is a built-in variable that always contains the name of the enclosing module. If a program is run as the main script with a command such as python readport.py, the __name__ variable is set to '__main__'. Otherwise, if the code is imported using a statement such as import readport, the __name__ variable is set to 'readport'.
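A quick way to observe this behavior is with a tiny module (the filename here is hypothetical):

# whoami.py
print(f'__name__ is {__name__!r}')

Running python whoami.py prints __name__ is '__main__', whereas import whoami prints __name__ is 'whoami'.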
As shown, the program is hardcoded to use the filename 'portfolio.csv'. Instead, you may want to prompt the user for a filename or accept the filename as a command-line argument. To do this, use the built-in input() function or the sys.argv list. For example, here's a modified version of the main() function:

def main(argv):
    if len(argv) == 1:
        filename = input('Enter filename: ')
    elif len(argv) == 2:
        filename = argv[1]
    else:
        raise SystemExit(f'Usage: {argv[0]} [ filename ]')
    portfolio = read_portfolio(filename)
    for name, shares, price in portfolio:
        print(f'{name:>10s} {shares:10d} {price:10.2f}')

if __name__ == '__main__':
    import sys
    main(sys.argv)
This program can be run in two different ways from the command line:

bash % python readport.py
Enter filename: portfolio.csv
...
bash % python readport.py portfolio.csv
...
bash % python readport.py a b c
Usage: readport.py [ filename ]
bash %
For very simple programs, it is often enough to process arguments in sys.argv as shown. For more advanced usage, the argparse standard library module can be used.
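For illustration, here is a minimal argparse sketch of the same interface (this exact code does not appear in the text):

import argparse

def main():
    parser = argparse.ArgumentParser(description='Report total portfolio value')
    parser.add_argument('filename', nargs='?', help='portfolio data file')
    args = parser.parse_args()
    # Fall back to prompting if no filename was given
    filename = args.filename or input('Enter filename: ')
    portfolio = read_portfolio(filename)
    for name, shares, price in portfolio:
        print(f'{name:>10s} {shares:10d} {price:10.2f}')

if __name__ == '__main__':
    main()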
Packages

In large programs, it's common to organize code into a package structure. A package is a hierarchical collection of modules. To do this, organize your code on the filesystem as a collection of files in a directory like this:

tutorial/
    __init__.py
    readport.py
    pcost.py
    stack.py
    ...
The directory should have an __init__.py file, which may be empty. Once you've done this, you should be able to make nested import statements. For example:

import tutorial.readport
port = tutorial.readport.read_portfolio('portfolio.dat')
If you don't like the long names, you can shorten things using an import like this:

from tutorial.readport import read_portfolio
port = read_portfolio('portfolio.dat')
One tricky issue with packages concerns imports between files within the same package. In an earlier example, a pcost.py module was shown that started with an import like this:

# pcost.py
import readport
...
If the pcost.py and readport.py files are moved into a package, this import statement breaks. To fix it, you must use a fully qualified module import like this:

# pcost.py
from tutorial import readport
...
Alternatively, you can use a package-relative import like this:

# pcost.py
from . import readport
...
The latter form has the benefit of not hardcoding the package name. That makes it easier to later rename a package or move it around within your project. There are many other subtle details related to packages that are covered later (see Chapter 8).
Structuring an Application

As you start to write more Python code, you may find yourself working on larger applications that involve a mix of your own code and third-party dependencies. Managing all of this is a complex topic that continues to evolve. There are also many conflicting opinions about what constitutes "best practice." However, there are some essential facets of it that you should know.

First, it is standard practice to organize large code bases into packages, as described in the previous section (that is, directories of .py files that include the special __init__.py file). When doing this, pick a unique package name for the top-level directory. The primary purpose of the package directory is to manage import statements and the namespaces of modules used while programming. You want your code isolated from everyone else's code.

Aside from your main project source code, you might additionally have tests, examples, scripts, and documentation. This additional material usually lives in a separate set of directories than the package containing your source code. Thus, it is common to create an enclosing top-level directory for your project and to put all of your work underneath that. For example, a fairly typical project organization might look like this:

tutorial-project/
    tutorial/
        __init__.py
        readport.py
        pcost.py
        stack.py
        ...
    tests/
        test_stack.py
        test_pcost.py
        ...
    examples/
        sample.py
        ...
    doc/
        tutorial.txt
        ...
Note that there is more than one way to do this, and the nature of the problem you are solving may dictate a different structure. However, as long as your main source code is in a proper package (again, the directory with the __init__.py file), you should be fine.
Third-Party Package Management Python has a large library of contributed packages, which can be found in the Python Package Index (https://pypi.org). You may want to use some of these packages in your own code. To install a third-party package, you typically use a pip command like this:

bash % python3 -m pip install somepackage
Installed packages are placed in a special site-packages directory, which you can find by inspecting the value of sys.path. For example, on a Unix machine, packages might be placed in /usr/local/lib/python3.8/site-packages. If you're ever wondering where a package came from, check its __file__ attribute after importing it in the interpreter:

>>> import pandas
>>> pandas.__file__
'/usr/local/lib/python3.8/site-packages/pandas/__init__.py'
>>>
A potential problem with installing a package is that you may not have permission to modify the locally installed version of Python. Even if you had permission, it might not be a good idea; for example, many systems already come with Python installed for use by various system utilities, and altering that installation is usually unwise. To create a sandbox where you can install and work with packages without worrying about breaking anything, create a so-called "virtual environment" with a command like this:

bash % python3 -m venv myproject
This sets up a dedicated Python installation in a directory called myproject/. Within that directory, you'll find an interpreter executable and a library where you can safely install packages. For example, if you run myproject/bin/python3, you get an interpreter configured for your personal use. You can install packages into this interpreter without worrying about breaking any part of the default Python installation. To install a package, use pip as before, but take care to specify the correct interpreter:

bash % ./myproject/bin/python3 -m pip install somepackage
There are several tools aimed at making pip and venv easier to use. These issues can also be handled automatically by your IDE. Since this is a fluid and constantly evolving part of Python, no further advice is given here.
Python: Fits Your Brain In the early days of Python, "fits your brain" was a catchphrase often used to describe it. Even today, the core of Python is a small programming language together with a useful collection of built-in objects such as lists, sets, and dictionaries. A vast range of practical problems can be solved using nothing more than the basic features presented in this chapter. It is often good to keep this in mind as you begin your Python adventure: although there are always more complicated ways to solve a problem, there may also be a simple way to do it using nothing but Python's basic features. When in doubt, your future self will probably thank you for it.
Operators, Expressions, and Data Manipulation This chapter describes Python's expressions, operators, and evaluation rules related to data manipulation. Expressions are essential to performing useful computations. In addition, much of this behavior can be customized; third-party libraries often take advantage of this to provide a better user experience. This chapter describes expressions at a high level. Chapter 4 describes the underlying protocols that can be used to customize the interpreter's behavior.
Literals A literal is a value typed directly into a program, such as 42, 4.2, or 'forty-two'. Integer literals represent a signed integer value of arbitrary size. It is possible to write integers in binary, octal, or hexadecimal, as shown here:

42          # Decimal integer
0b101010    # Binary integer
0o52        # Octal integer
0x2a        # Hexadecimal integer
The base is not stored as part of the integer value; all of the above literals display as 42 when printed. You can use the built-in functions bin(x), oct(x), or hex(x) to convert an integer to a string representing its value in a different base. Floating-point numbers are written with a decimal point or with scientific notation, in which an e or E indicates an exponent. All of the following are floating-point numbers:

4.2
42.
.42
4.2e+2
4.2E2
-4.2e-2
Internally, floating-point numbers are represented as IEEE 754 double-precision (64-bit) values. When writing numeric literals, a single underscore (_) can be used as a visible separator between digits. For example: 123_456_789 0x1234_5678 0b111_00_101 123.789_012
The digit separator is not stored as part of the number; This is just a syntax to make large numeric literals more readable in source code. Boolean literals are written as True and False. String literals can be written by enclosing characters in single, double, or triple quotes. Strings enclosed in single and double quotes must appear on the same line. Strings enclosed in triple quotes can span multiple lines. For example: 'Hello World' "Hello World" '''Hello World''' """Hello World"""
Tuple, list, set, and dictionary literals are written as follows:

(1, 2, 3)                   # Tuple
[1, 2, 3]                   # List
{1, 2, 3}                   # Set
{'x': 1, 'y': 2, 'z': 3}    # Dictionary
Expressions and Locations An expression represents a computation that evaluates to a concrete value. It consists of a combination of literals, names, operators, and function or method calls. An expression can always appear on the right-hand side of an assignment statement, be used as an operand in operations with other expressions, or be passed as a function argument. For example:

value = 2 + 3 * 5 + sqrt(6 + 7)
Operators such as + and * represent operations performed on the objects provided as operands, such as addition or multiplication. sqrt() is a function that operates on input arguments. The left-hand side of an assignment represents a location where a reference to an object is stored. A location can be a simple identifier, like value in the previous example. It can also be an attribute of an object or an index within a container. For example:

a = 4 + 2
b[1] = 4 + 2
c['key'] = 4 + 2
d.value = 4 + 2
Reading a value from a location is also an expression. For example:

value = a + b[1] + c['key']
Assigning a value and evaluating an expression are separate concepts. In particular, you can't use the assignment operator as part of an expression:

while line = file.readline():    # Syntax error
    print(line)
However, an "assignment-expression" operator (:=) can be used to perform this combined action of evaluating assignment and expression. For example: while (line:=file.readline()): print(line)
The := operator is generally only used in combination with statements such as if and while. In fact, trying to use it as a normal assignment operator will result in a syntax error unless you enclose it in parentheses.
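For example, := often appears in an if statement when you want to both test a computed value and use it. A minimal sketch (the pattern and data here are made up for illustration):

import re

data = 'name=Dave'
if (m := re.match(r'name=(\w+)', data)):
    print(m.group(1))    # -> Dave

# As a bare statement without parentheses, this is a syntax error:
# m := re.match(r'name=(\w+)', data)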
Standard Operators
Python objects can be made to work with any of the operators in Table 1. Table 1: Standard operators
Usually these operators have a numerical interpretation. However, there are notable special cases. For example, the + operator also concatenates sequences, the * operator replicates sequences, - is used for set difference, and % performs string formatting:
[1, 2, 3] + [4, 5]                     # [1, 2, 3, 4, 5]
[1, 2, 3] * 4                          # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
'%s has %d messages' % ('Dave', 37)    # 'Dave has 37 messages'
Operator checking is a dynamic process. Operations involving mixed data types will often "just work" when there is an intuitive sense that the operation should work. For example, you can add integers and fractions:

>>> from fractions import Fraction
>>> a = Fraction(2, 3)
>>> b = 5
>>> a + b
Fraction(17, 3)
>>>
However, it's not always foolproof. For example, it doesn't work with decimals:

>>> from decimal import Decimal
>>> from fractions import Fraction
>>> a = Fraction(2, 3)
>>> b = Decimal('5')
>>> a + b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'Fraction' and 'decimal.Decimal'
>>>
However, for most combinations of numbers, Python follows a standard numeric hierarchy of Booleans, integers, fractions, floating-point numbers, and complex numbers. Mixed-type operations simply "work" and you don't have to worry about them.
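For instance, the following mixed operations all work without complaint; each result is promoted to the "wider" of the two types involved:

from fractions import Fraction

print(True + 1)                 # 2 (a Boolean behaves as an integer)
print(Fraction(1, 2) + 0.5)     # 1.0 (the fraction converts to a float)
print(2 + 3.0)                  # 5.0
print(1.5 + 2j)                 # (1.5+2j)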
In-Place Assignment Python provides the "in-place" or "augmented" assignment operators shown in Table 2: Table 2: Augmented assignment operators

These are not expressions. Instead, they are a syntactic convenience for updating a value "in place". For example:

a = 3
a = a + 1    # a = 4
a += 1       # a = 5
Mutable objects can use these operators to perform an in-place mutation of their data as an optimization. Consider this example:

>>> a = [1, 2, 3]
>>> b = a            # Create a new reference to a
>>> a += [4, 5]      # In-place update (does not create a new list)
>>> a
[1, 2, 3, 4, 5]
>>> b
[1, 2, 3, 4, 5]
>>>
In this example, a and b are references to the same list. When a += [4, 5] is executed, it updates the existing list object without creating a new list.
Therefore, b also sees this update. This is usually surprising.
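For contrast, the non-augmented form a = a + [4, 5] creates a brand-new list and rebinds a, leaving other references to the old list untouched:

a = [1, 2, 3]
b = a
a = a + [4, 5]    # Creates a NEW list and rebinds a
print(a)          # [1, 2, 3, 4, 5]
print(b)          # [1, 2, 3] -- b still refers to the original list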
Object Comparison The equality operator (x == y) tests the values of x and y for equality. In the case of lists and tuples, they must be the same size, have equal elements, and be in the same order. For dictionaries, True is returned only if x and y have the same set of keys and all objects with the same key have equal values. Two sets are equal if they have the same elements. An equality comparison between objects of incompatible types, such as a file and a floating-point number, does not raise an error but returns False. However, a comparison between objects of different types sometimes yields True. For example, comparing an integer and a floating-point number with the same value:

>>> 2 == 2.0
True
>>>
The identity operators (x is y and x is not y) test whether two values literally refer to the same object in memory (i.e., id(x) == id(y)). In general, it may be that x == y, but x is not y. For example:

>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
>>> a == b
True
>>>
In practice, comparing objects with the is operator is almost never what you want. Use the == operator for all comparisons unless you have very good reason to believe that two objects have the same identity.
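The main legitimate use of is is comparing against singletons such as None:

x = None
if x is None:    # Identity check against the None singleton
    print('no value supplied')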
Ordered Comparison Operators
The ordered comparison operators in Table 3 have the standard mathematical interpretation of numbers and return a Boolean value. Table 3: Ordered comparison operators
For sets, x < y tests whether x is a strict subset of y (i.e., all of x's elements are contained in y, but x is not equal to y). When comparing two sequences, the first elements of each sequence are compared. If they differ, this determines the result. If they're equal, the comparison moves to the second element of each sequence. This process continues until either two different elements are found or no more elements exist in either of the sequences. If the end of both sequences is reached, the sequences are considered equal. If a is a subsequence of b, then a < b. Strings and bytes are compared using lexicographical ordering. Each character is assigned a unique numerical index determined by the character set (such as ASCII or Unicode). A character is less than another character if its index is less. Not all types support ordered comparison. For example, attempting to use < on dictionaries is undefined and results in a TypeError. Likewise, applying ordered comparison to incompatible types (such as between a string and a number) results in a TypeError.
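A few examples of these rules in action:

print({1, 2} < {1, 2, 3})        # True: a strict subset
print([1, 2, 3] < [1, 2, 4])     # True: decided at the third element
print((1, 2) < (1, 2, 3))        # True: (1, 2) is a subsequence
print('abc' < 'abd')             # True: lexicographic ordering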
Boolean Expressions and Truth Values The and, or, and not operators can form more complex Boolean expressions. The behavior of these operators is shown in Table 4. Table 4: Logical operators
When you use an expression to determine a true or false value, any nonzero number, nonempty string, list, tuple, or dictionary is taken to be true. Zero, None, and empty strings, lists, tuples, and dictionaries all evaluate as false. Boolean expressions are evaluated from left to right, consuming the right operand only when it's needed to determine the final value. For example, a and b evaluates b only if a is true. This is known as "short-circuit" evaluation. It can be a useful way to simplify code involving a test and a subsequent operation. For example:

if y != 0:
    result = x / y
else:
    result = 0

# Alternative
result = y and x / y

In the second version, the division x / y is only performed if y is not zero.
Relying on the implicit "truthiness" of objects can sometimes be a source of extremely hard-to-find errors. For example, consider this function:

def f(x, items=None):
    if not items:
        items = []
    items.append(x)
    return items
This function has an optional argument that, if omitted, causes a new list to be created and returned. For example:

>>> f(4)
[4]
>>>
However, the function behaves rather strangely if you pass it an existing empty list as an argument:

>>> a = []
>>> f(3, a)
[3]
>>> a          # Notice that a was NOT updated
[]
>>>
This is a truth-checking bug. Empty lists evaluate as false, so the code created a new list instead of using the list a that was passed as an argument. To fix this, the check needs to be more precise, comparing against None:

def f(x, items=None):
    if items is None:
        items = []
    items.append(x)
    return items
It's always good practice to be specific when implementing conditional controls.
Conditional Expressions A common programming pattern is assigning a value conditionally based on the result of an expression. For example:

if a <= b:
    minvalue = a
else:
    minvalue = b

This code can be shortened using a conditional expression:

minvalue = a if a <= b else b

Operations on Sequences The operator s + t concatenates two sequences of the same type. For example:

>>> a = [3, 4, 5]
>>> b = [6, 7]
>>> a + b
[3, 4, 5, 6, 7]
>>>
The operator s * n makes n copies of a sequence. However, these are shallow copies that replicate elements by reference only. For example, consider the following code:

>>> a = [3, 4, 5]
>>> b = [a]
>>> c = 4 * b
>>> c
[[3, 4, 5], [3, 4, 5], [3, 4, 5], [3, 4, 5]]
>>> a[0] = -7
>>> c
[[-7, 4, 5], [-7, 4, 5], [-7, 4, 5], [-7, 4, 5]]
>>>
Notice how the change to a modified every item in the list c. In this case, a reference to the list a was placed in the list b. When b was replicated, four additional references to a were created. When a was finally changed, this change propagated to all the other "copies" of a. This behavior of sequence multiplication is often unintended and surprises programmers. One way to work around the problem is to build the replicated sequence manually, copying the contents of a. Here's an example:

a = [3, 4, 5]
c = [list(a) for _ in range(4)]    # list() makes a copy of a list
The indexing operator s[n] returns the nth object from a sequence, where s[0] is the first object. Negative indices can be used to fetch elements from the end of a sequence. For example, s[-1] returns the last item. Attempts to access elements that are out of range result in an IndexError exception. The slicing operator s[i:j] extracts a subsequence from s consisting of the elements with index k, where i <= k < j.

Operations on Sets The set operations include union (|), intersection (&), difference (-), and symmetric difference (^). For example:

>>> a = {'a', 'b', 'c'}
>>> b = {'c', 'd'}
>>> a | b
{'a', 'b', 'c', 'd'}
>>> a & b
{'c'}
>>> a - b
{'a', 'b'}
>>> b - a
{'d'}
>>> a ^ b
{'a', 'b', 'd'}
>>>
These set operations also work on the key-view and item-view objects of dictionaries. For example, if you want to find out which keys two dictionaries have in common, you can do this:

>>> a = { 'x': 1, 'y': 2, 'z': 3 }
>>> b = { 'z': 3, 'w': 4, 'q': 5 }
>>> a.keys() & b.keys()
{ 'z' }
>>>
Operations on Mappings A mapping is an association between keys and values. The built-in dict type is an example. The operations in Table 10 can be applied to mappings. Table 10: Operations on mappings
Keys may be any immutable object, such as strings, numbers, and tuples. When using a tuple as a key, you can omit the parentheses and write comma-separated values, like this:

d = { }
d[1, 2, 3] = "foo"
d[1, 0, 3] = "bar"

In this case, the key values represent tuples, making the above assignments identical to the following:

d[(1, 2, 3)] = "foo"
d[(1, 0, 3)] = "bar"
Using a tuple as a key is a common technique for creating compound keys in a mapping (for example, a key made up of a last name and a first name).
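For example, here is a small sketch of a phone book keyed by (last name, first name) pairs; the data is made up for illustration:

phone = {}
phone['Smith', 'John'] = '555-1234'    # Same as phone[('Smith', 'John')]
phone['Doe', 'Jane'] = '555-5678'
print(phone['Smith', 'John'])          # -> '555-1234'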
List, Set, and Dictionary Comprehensions One of the most common operations on data is transforming a collection of data into another data structure. For example, taking all the items of a list, applying an operation, and creating a new list:

nums = [1, 2, 3, 4, 5]
squares = []
for n in nums:
    squares.append(n * n)
Because this kind of operation is so common, it is available as an operator known as a list comprehension. Here is a more compact version of the same code:

nums = [1, 2, 3, 4, 5]
squares = [n * n for n in nums]
It is also possible to apply a filter as part of the operation:

squares = [n * n for n in nums if n > 2]    # [9, 16, 25]
The general syntax of a list comprehension is as follows:

[expression for item1 in iterable1 if condition1
            for item2 in iterable2 if condition2
            ...
            for itemN in iterableN if conditionN]
This syntax is equivalent to the following code:

result = []
for item1 in iterable1:
    if condition1:
        for item2 in iterable2:
            if condition2:
                ...
                for itemN in iterableN:
                    if conditionN:
                        result.append(expression)
List comprehensions are a very useful way to manipulate list data in many different ways. Here are some practical examples:

# Some data (a list of dictionaries)
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'MSFT', 'shares': 50, 'price': 45.67},
    {'name': 'HPE', 'shares': 75, 'price': 34.51},
    {'name': 'CAT', 'shares': 60, 'price': 67.89},
    {'name': 'IBM', 'shares': 200, 'price': 95.25}
]

# Collect all names   ['IBM', 'MSFT', 'HPE', 'CAT', 'IBM']
names = [s['name'] for s in portfolio]

# Find all entries with more than 100 shares   ['IBM']
more100 = [s['name'] for s in portfolio if s['shares'] > 100]

# Find total shares*price
cost = sum([s['shares'] * s['price'] for s in portfolio])

# Collect (name, shares) tuples
name_shares = [(s['name'], s['shares']) for s in portfolio]
All of the variables used inside a list comprehension are private to the comprehension, so you don't need to worry about such variables overwriting other variables with the same name. For example:

>>> x = 42
>>> squares = [x*x for x in [1, 2, 3]]
>>> squares
[1, 4, 9]
>>> x
42
>>>
Instead of creating a list, you can also create a set by changing the brackets to curly braces. This is known as a set comprehension. A set comprehension gives you a set of distinct values. For example:

# Set comprehension
names = { s['name'] for s in portfolio }
# names = { 'IBM', 'MSFT', 'HPE', 'CAT' }
If you specify key:value pairs, you create a dictionary instead. This is known as a dictionary comprehension. For example:

prices = { s['name']: s['price'] for s in portfolio }
# prices = { 'IBM': 95.25, 'MSFT': 45.67, 'HPE': 34.51, 'CAT': 67.89 }
When creating sets and dictionaries, be aware that later entries may overwrite earlier entries. For example, in the prices dictionary, you get the last price for 'IBM'; the first price is lost. Within a comprehension, it is not possible to include any kind of exception handling. If this is a concern, you'll need to wrap exceptions with a function, like this:

def toint(x):
    try:
        return int(x)
    except ValueError:
        return None

values = ['1', '2', '-4', 'n/a', '-3', '5']

data1 = [toint(x) for x in values]
# data1 = [1, 2, -4, None, -3, 5]

data2 = [toint(x) for x in values if toint(x) is not None]
# data2 = [1, 2, -4, -3, 5]
The double evaluation of toint(x) in the last example can be avoided by using the := operator. For example:

data3 = [v for x in values if (v := toint(x)) is not None]
# data3 = [1, 2, -4, -3, 5]

data4 = [v for x in values if (v := toint(x)) is not None and v >= 0]
# data4 = [1, 2, 5]
Generator Expressions A generator expression is an object that carries out the same computation as a list comprehension but produces its result iteratively. The syntax is the same as for list comprehensions except that parentheses are used instead of square brackets. Here's an example:

nums = [1, 2, 3, 4]
squares = (x*x for x in nums)
Unlike a list comprehension, a generator expression does not actually create a list or immediately evaluate the expression inside the parentheses. Instead, it creates a generator object that produces the values on demand via iteration. If you look at the result of the above example, you'll see the following:

>>> squares
<generator object <genexpr> at 0x10007a440>
>>> next(squares)
1
>>> next(squares)
4
...
>>> for n in squares:
...     print(n)
9
16
>>>
A generator expression can only be used once. If you try to iterate a second time, you get nothing:

>>> for n in squares:
...     print(n)
...
>>>
The difference between list comprehensions and generator expressions is important but subtle. With a list comprehension, Python actually creates a list containing the resulting data. With a generator expression, Python creates a generator that merely knows how to produce data on demand. In certain applications, this can greatly improve performance and memory use. Here's an example:

# Read a file
f = open('data.txt')                            # Open the file
lines = (t.strip() for t in f)                  # Read lines, strip
                                                # trailing/leading whitespace
comments = (t for t in lines if t[0] == '#')    # All comments
for c in comments:
    print(c)
In this example, the generator expression that extracts lines and strips whitespace does not actually read or store the entire file in memory. The same is true of the expression that extracts comments. Instead, the lines of the file are read one at a time when the program starts iterating in the for loop that follows. During this iteration, the lines of the file are produced on demand and filtered accordingly. In fact, at no time during this process is the entire file loaded into memory. This would therefore be a highly efficient way to extract comments from a gigabyte-sized Python source file. Unlike a list comprehension, a generator expression does not create an object that works like a sequence. It can't be indexed, and none of the usual list operations (such as append()) will work. However, the items produced by a generator expression can be converted into a list using list():

clist = list(comments)
When passed as a single function argument, one set of parentheses can be removed. For example, the following statements are equivalent:

sum((x*x for x in values))
sum(x*x for x in values)      # Extra parentheses removed

In both cases, a generator (x*x for x in values) is created and passed to the sum() function.
The Attribute Operator (.) The dot operator (.) is used to access the attributes of an object. Here's an example:

foo.x = 3
print(foo.y)
a = foo.bar(3, 4, 5)
More than one dot operator can appear in a single expression, as in foo.y.a.b. The dot operator can also be applied to the intermediate results of functions, as in a = foo.bar(3, 4, 5).spam. Stylistically, however, it is not common for programs to create long chains of attribute lookups.
The Function Call Operator () The operator f(args) is used to make a function call on f. Each argument to a function is an expression. Before the function is called, all of the argument expressions are fully evaluated from left to right. This is sometimes known as applicative order evaluation. See Chapter 5 for more information about functions.
Order of Evaluation Table 11 lists the order of operations (precedence rules) of Python operators. All operators except the power operator (**) are evaluated from left to right and are listed in the table from highest to lowest precedence. That is, operators listed first in the table are evaluated before operators listed later. (Note that operators included together within subsections, such as x * y, x / y, x // y, x @ y, and x % y, have equal precedence.) Table 11: Order of evaluation (highest to lowest precedence)
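A few quick illustrations of these rules:

print(2 + 3 * 4)     # 14, not 20: * has higher precedence than +
print(-2 ** 2)       # -4: ** binds tighter than unary minus
print(2 ** 3 ** 2)   # 512: ** groups right-to-left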
The order of evaluation is not determined by the types of x and y in Table 11. So, even though user-defined objects can redefine individual operators, it is not possible to customize the underlying evaluation order, precedence, and associativity rules. A common confusion with precedence rules arises when the bitwise-and (&) and bitwise-or (|) operators are used to stand in for the logical operations and and or. For example:

>>> a = 10
>>> a <= 10 and 1 < a
True
>>> a <= 10 & 1 < a
False
>>>

The problem here is that the last expression evaluates as a <= (10 & 1) < a, because the & operator has higher precedence than the comparisons. Since 10 & 1 is 0, the expression becomes a <= 0 < a, which is False.
This may seem like an esoteric edge case, but it arises quite often in data-oriented packages such as numpy and pandas. The logical and and or operators can't be customized, so the bitwise operators are used instead, even though they have higher precedence and evaluate differently when used in boolean relations.
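In such code, the usual fix is to parenthesize the comparisons explicitly; the same repair applies to the example above:

a = 10
print(a <= 10 & 1 < a)        # False: parsed as a <= (10 & 1) < a
print((a <= 10) & (1 < a))    # True: parentheses restore the intent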
Final Words: The Secret Life of Data One of the most common uses of Python is in applications involving data manipulation and analysis. In this role, Python provides a kind of "domain language" for thinking about your problem. The built-in operators and expressions are the core of that language; everything else builds upon them. So, if you can develop a kind of intuition for Python's built-in objects and operations, you will find that your intuition applies elsewhere. Suppose you're working with a database and want to iterate over the records returned by a query. Most likely, you'll use the for statement to do just that. Or suppose you're working with numeric arrays and want to perform element-wise math on them. You might guess that the standard mathematical operators work, and your intuition would be correct. Or suppose you're using a library to fetch data over HTTP and want to access the contents of the HTTP headers. There's a good chance the data will be presented in a way that makes it look like a dictionary. Chapter 4 provides more information about Python's internal protocols and how they can be customized.
Program Structure and Control Flow This chapter covers the details of the program structure and control flow. Topics include conditions, loops, exceptions, and context managers.
Program Structure and Execution Python programs are structured as a sequence of statements. All language features, including variable assignment, expressions, function definitions, classes, and module imports, are statements that have equal status with all other statements. This means that any statement can be placed almost anywhere in a program (although certain statements such as return can only appear inside a function). For example, this code defines two different versions of a function depending on a condition:

if debug:
    def square(x):
        if not isinstance(x, float):
            raise TypeError('Expected a float')
        return x * x
else:
    def square(x):
        return x * x
When loading source files, the interpreter executes instructions in the order in which they appear until there are no more instructions to execute. This execution model applies to both files that you run as the main program and library files that are loaded by import.
Conditional Execution The if, else, and elif statements control conditional code execution. The general format of a conditional statement is as follows:

if expression:
    statements
elif expression:
    statements
elif expression:
    statements
...
else:
    statements
If no action is to be taken, you can omit both the else and elif clauses of a conditional. Use the pass statement if no statements exist for a particular clause:

if expression:
    pass    # To do: please implement
else:
    statements
Loops and Iteration Loops are implemented using the for and while statements. Here's an example:

while expression:
    statements

for i in s:
    statements
The while statement executes statements until the associated expression evaluates as false. The for statement iterates over all of the elements of s until no more elements are available. The for statement works with any object that supports iteration. This includes the built-in sequence types such as lists, tuples, and strings, but also any object that implements the iterator protocol. In the statement for i in s, the variable i is known as the iteration variable. On each iteration of the loop, it receives a new value from s. The scope of the iteration variable is not private to the for statement. If a previously defined variable has the same name, its value will be overwritten. Moreover, the iteration variable retains its last value after the loop has completed. If the elements produced by iteration are iterables of identical size, you can unpack their values into separate iteration variables using a statement such as this:

s = [(1, 2, 3), (4, 5, 6)]
for x, y, z in s:
    statements
In this example, s must contain or produce iterables, each with three elements. On each iteration, the contents of the variables x, y, and z are assigned the items of the corresponding iterable. Although this usage is most commonly seen when s is a sequence of tuples, unpacking works if the items in s are any kind of iterable, including lists, generators, and strings. Sometimes a throwaway variable such as _ is used when unpacking. For example:

for x, _, z in s:
    statements
This example still places a value into the _ variable, but the name suggests that it isn't interesting or useful in the statements that follow. If the items produced by an iterable have varying sizes, you can use wildcard unpacking to place multiple values into one variable. For example:

s = [(1, 2), (3, 4, 5), (6, 7, 8, 9)]
for x, y, *extra in s:
    statements    # x = 1, y = 2, extra = []
                  # x = 3, y = 4, extra = [5]
                  # x = 6, y = 7, extra = [8, 9]
                  # ...
In this example, at least two values x and y are required, but *extra receives any additional values that might be present. These values are always placed into a list. At most one starred variable can appear in a single unpacking, although it can appear in any position. So, both of these variants are legal:

for *first, x, y in s:
    ...

for x, *middle, y in s:
    ...
When looping, it's sometimes useful to keep track of a numerical index in addition to the data values. Here's an example:

i = 0
for x in s:
    statements
    i += 1
Python provides a built-in function, enumerate(), that can be used to simplify this code:

for i, x in enumerate(s):
    statements
enumerate(s) creates an iterator that produces tuples (0, s[0]), (1, s[1]), (2, s[2]), and so on. If desired, a different starting value for the count can be supplied using the start keyword argument to enumerate(), as follows:

for i, x in enumerate(s, start=100):
    statements

In this case, tuples of the form (100, s[0]), (101, s[1]), and so on will be produced.
Another common looping problem concerns iterating in parallel over two or more iterables; for example, writing a loop where you want to take items from different sequences on each iteration, like this:

# s and t are two sequences
i = 0
while i < len(s) and i < len(t):
    x = s[i]    # Take an item from s
    y = t[i]    # Take an item from t
    statements
    i += 1
This code can be simplified using the zip() function. For example:

# s and t are two sequences
for x, y in zip(s, t):
    statements

zip(s, t) combines the iterables s and t into an iterable of tuples (s[0], t[0]), (s[1], t[1]), (s[2], t[2]), and so on, stopping with the shortest of s and t should they be of unequal length. The result of zip() is an iterator that produces the results when iterated. If you want the result as a list, use list(zip(s, t)).
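One common idiom built from zip() is pairing a list of field names with a row of values to build a dictionary; the data below is illustrative:

columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]
record = dict(zip(columns, values))
# record = {'name': 'GOOG', 'shares': 100, 'price': 490.1}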
To break out of a loop, use the break statement. For example, this code reads lines of text from a file until an empty line of text is encountered:

with open('foo.txt') as file:
    for line in file:
        stripped = line.strip()
        if not stripped:
            break    # A blank line, stop reading
        # Process the stripped line
        ...
To jump to the next iteration of a loop (skipping the remainder of the loop body), use the continue statement. This statement is useful when reversing a test and indenting another level would make the program too deeply nested or unnecessarily complicated. As an example, the following loop skips all of the blank lines in a file:

with open('foo.txt') as file:
    for line in file:
        stripped = line.strip()
        if not stripped:
            continue    # Skip the blank line
        # Process the stripped line
        ...
The break and continue statements apply only to the innermost loop being executed. If it's necessary to break out of a deeply nested loop structure, you can use an exception. Python doesn't provide a "goto" statement. You can also attach the else statement to loop constructs, as in the following example:

# for-else
with open('foo.txt') as file:
    for line in file:
        stripped = line.strip()
        if not stripped:
            break
        # Process the stripped line
        ...
    else:
        raise RuntimeError('Missing section separator')
The else clause of a loop executes only if the loop runs to completion. This either occurs immediately (if the loop doesn't execute at all) or after the last iteration. If the loop is terminated early using the break statement, the else clause is skipped. The primary use case for the looping else clause is in code that iterates over data but needs to set or check some kind of flag or condition if the loop breaks prematurely. For example, if you didn't use else, the previous code might have to be rewritten with a flag variable as follows:

found_separator = False

with open('foo.txt') as file:
    for line in file:
        stripped = line.strip()
        if not stripped:
            found_separator = True
            break
        # Process the stripped line
        ...

    if not found_separator:
        raise RuntimeError('Missing section separator')
Exceptions Exceptions indicate errors and break out of the normal control flow of a program. An exception is raised using the raise statement. The general format of the raise statement is raise Exception([value]), where Exception is the exception type and value is an optional value giving specific details about the exception. Here's an example:

raise RuntimeError('Unrecoverable Error')
To catch an exception, use the try and except statements, as shown here:

try:
    file = open('foo.txt', 'rt')
except FileNotFoundError as e:
    statements
When an exception occurs, the interpreter stops executing statements in the try block and looks for an except clause that matches the exception that has occurred. If one is found, control is passed to the first statement in the except clause. After the except clause is executed, control continues with the first statement that appears after the entire try-except block. A try statement does not have to match every possible exception that might occur. If no matching except clause can be found, the exception propagates and might be caught in a different try-except block elsewhere that can actually handle it. As a matter of programming style, you should only catch exceptions from which your code can actually recover. If recovery is not possible, it's often better to let the exception propagate. If an exception reaches the top level of a program uncaught, the interpreter aborts with an error message. If the raise statement is used by itself, the last exception raised is raised again (although this works only while handling a previously raised exception). For example:

try:
    file = open('foo.txt', 'rt')
except FileNotFoundError:
    print("Well, that didn't work.")
    raise    # Re-raises the current exception
Each except clause can be used with an as var modifier that supplies the name of a variable where an instance of the exception type is placed if an exception occurs. Exception handlers can examine this value to find out more about the cause of the exception. For example, you can use isinstance() to check the exception type. Exceptions have a few standard attributes that are sometimes useful in code that needs to take further action in response to an error:

e.args
    The tuple of arguments supplied when raising the exception. In most cases, this is a one-element tuple containing a string describing the error. For OSError exceptions, the value is a 2-tuple or 3-tuple containing an integer error number, a string error message, and an optional filename.

e.__cause__
    The previous exception if the exception was intentionally raised in response to handling another exception. See the later section on chained exceptions.

e.__context__
    The previous exception if the exception was raised unexpectedly while handling another exception.

e.__traceback__
    The stack traceback object associated with the exception.

The variable used to hold an exception value is only accessible inside the associated except block. Once control leaves the block, the variable becomes undefined. For example:

try:
    int('N/A')    # Raises ValueError
except ValueError as e:
    print('Failed:', e)

print(e)    # Fails -> NameError. 'e' is undefined.
Multiple exception-handling blocks are specified using multiple except clauses, as in the following example:

try:
    do_something()
except TypeError as e:
    # Handle Type error
    ...
except ValueError as e:
    # Handle Value error
    ...
A single handler clause can catch multiple exception types, like this:

try:
    do_something()
except (TypeError, ValueError) as e:
    # Handle Type or Value errors
    ...
To ignore an exception, use the pass statement, like this:

try:
    do_something()
except ValueError:
    pass    # Do nothing (shrug)
Silently ignoring errors is often dangerous and a source of hard-to-find bugs. Even if an error is ignored, it is often a good idea to at least report it to a log or other location where you can look at it later:

try:
    do_something()
except Exception as e:
    print(f'An error occurred: {e!r}')
If you're going to catch all exceptions, you should be very careful to report accurate information about the error to the user. For example, the previous code prints an error message along with the associated exception value. If you don't include any information about the exception value, it can make it very difficult to debug code that fails for reasons you don't expect. The try statement also supports an else clause, which must follow the last except clause. This code executes if the code in the try block doesn't raise an exception. Here's an example:

try:
    file = open('foo.txt', 'rt')
except FileNotFoundError as e:
    print(f'Unable to open foo: {e}')
    data = ''
else:
    data = file.read()
    file.close()
The finally statement defines a cleanup action that must execute regardless of what happens in a try-except block. Here's an example:

file = open('foo.txt', 'rt')
try:
    # Do some stuff
    ...
finally:
    file.close()    # File closed regardless of what happened
The finally clause isn't used to catch errors. Rather, it's used for code that must always run, regardless of whether an error occurs. If no exception is raised, the code in the finally clause executes immediately after the code in the try block. If an exception occurs, a matching except block (if any) executes first, and then control is passed to the first statement of the finally clause. If an exception is still pending after this code has run, that exception is reraised to be caught by another exception handler.
The Exception Hierarchy One of the challenges of working with exceptions is managing the sheer number of exceptions that could potentially occur in your program. For example, there are more than 60 built-in exceptions alone. Factor in the rest of the standard library, and there are hundreds of possible exceptions. Moreover, there is often no way to easily determine in advance what kinds of exceptions a given piece of code might raise. Exceptions are not recorded as part of a function's calling signature, nor is there any sort of compiler to check that your code handles exceptions correctly. As a result, exception handling can sometimes feel haphazard and disorganized. A useful tool for managing exceptions is to realize that exceptions are organized into a hierarchy via inheritance. Instead of targeting very specific errors, it might be easier to focus on more general categories of errors. For example, consider the different kinds of errors that might occur when looking up values in a container:

try:
    item = items[index]
except IndexError:      # Raised if items is a sequence
    ...
except KeyError:        # Raised if items is a mapping
    ...
Instead of writing code to handle two highly specific exceptions as shown, it might be easier to write the code like this:

try:
    item = items[index]
except LookupError:
    ...

LookupError is a class that represents a higher-level grouping of exceptions. IndexError and KeyError both inherit from LookupError, so this except clause would catch either one. Yet, LookupError is not so broad that it would also catch errors unrelated to lookup.
Table 1 describes the most common categories of built-in exceptions. Table 1: Exception categories
The BaseException class is rarely used directly in exception handling because it matches all possible exceptions whatsoever. This includes special exceptions that affect control flow, such as SystemExit, KeyboardInterrupt, and StopIteration. Catching these is rarely what you want. Instead, all normal program-related errors inherit from Exception. ArithmeticError is the base for all math-related errors such as ZeroDivisionError, FloatingPointError, and OverflowError. ImportError is a base for all import-related errors. LookupError is a base for all container lookup-related errors. OSError is a base for all errors originating from the operating system and environment. OSError encompasses a wide range of exceptions related to files, network connections, permissions, pipes, timeouts, and more. The ValueError exception is commonly raised when an operation is given an incorrect input value. UnicodeError is a subclass of ValueError, used to group all Unicode-related encoding and decoding errors. Table 2 shows some common built-in exceptions that simply inherit from Exception and are not part of a larger exception group. Table 2: Other built-in exceptions
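As an illustration of working with a whole category, here is a small sketch that catches the entire ArithmeticError family at once:

def average(values):
    try:
        return sum(values) / len(values)
    except ArithmeticError:    # Catches ZeroDivisionError, OverflowError, etc.
        return None

print(average([1, 2]))    # 1.5
print(average([]))        # None (the ZeroDivisionError was caught)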
Exceptions and Control Flow Exceptions are normally reserved for error handling. However, a few exceptions are used to alter a program's control flow. These exceptions, shown in Table 3, inherit from BaseException directly. Table 3: Exceptions used for control flow
The SystemExit exception is used to make a program terminate on purpose. As an argument, you can either provide an integer exit code or a string message. If a string is given, it is printed to sys.stderr and the program exits with an exit code of 1. Here's a typical example:

import sys

if len(sys.argv) != 2:
    raise SystemExit(f'Usage: {sys.argv[0]} filename')

filename = sys.argv[1]
The KeyboardInterrupt exception is raised when the program receives a SIGINT signal (typically via the user pressing Ctrl-C in the terminal). This exception is a bit unusual in that it is asynchronous, meaning that it can occur at almost any time and on any statement in your program. Python's default behavior is simply to terminate when this happens. If you want to control the delivery of SIGINT, you can use the signal library module (see Chapter 9). The StopIteration exception is part of the iteration protocol and signals the end of iteration.
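Returning to KeyboardInterrupt: if coarse control is enough, you can also catch it at the top level of a program. A minimal sketch:

def main():
    while True:
        line = input('> ')    # Blocks waiting for user input
        print(line)

if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print('Interrupted')
        raise SystemExit(1)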
Defining New Exceptions All built-in exceptions are defined in terms of classes. To create a new exception, create a new class definition that inherits from Exception as follows:
class NetworkError(Exception):
    pass
To use your new exception, use the raise statement like this: raise NetworkError('Cannot find host')
When an exception is raised, the optional values supplied with the raise statement are used as arguments to the exception's class constructor. Most of the time, this is a string containing some kind of error message. However, custom exceptions can be written to take one or more exception values, as shown in this example:

class DeviceError(Exception):
    def __init__(self, errno, msg):
        self.args = (errno, msg)
        self.errno = errno
        self.errmsg = msg

# Raises an exception (multiple arguments)
raise DeviceError(1, 'Not Responding')
When creating a custom exception class that redefines __init__(), it is important to assign a tuple containing the arguments to __init__() to the attribute self.args, as shown. This attribute is used when printing exception traceback messages. If you leave it undefined, users won't be able to see any useful information about the exception when an error occurs. Exceptions can be organized into a hierarchy using inheritance. For instance, the NetworkError exception defined earlier could serve as a base class for a variety of more specific errors. Here's an example:

class HostnameError(NetworkError):
    pass

class TimeoutError(NetworkError):
    pass

def error1():
    raise HostnameError('Unknown host')
def error2():
    raise TimeoutError('Timed out')

try:
    error1()
except NetworkError as e:
    if type(e) is HostnameError:
        # Perform special actions for this kind of error
        ...
In this case, the except NetworkError clause catches any exception derived from NetworkError. To find the specific type of error that was raised, examine the type of the exception value with type().
Chained Exceptions Sometimes, in response to an exception, you might want to raise a different exception. To do this, raise a chained exception, as shown in this example:

class ApplicationError(Exception):
    pass

def do_something():
    x = int('N/A')    # Raises ValueError
def spam():
    try:
        do_something()
    except Exception as e:
        raise ApplicationError('It failed') from e
If an uncaught ApplicationError occurs, you will get a message that includes both exceptions. For example:

>>> spam()
Traceback (most recent call last):
  File "c.py", line 9, in spam
    do_something()
  File "c.py", line 5, in do_something
    x = int('N/A')
ValueError: invalid literal for int() with base 10: 'N/A'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c.py", line 11, in spam
    raise ApplicationError('It failed') from e
__main__.ApplicationError: It failed
>>>
If you catch an ApplicationError, the __cause__ attribute of the resulting exception will contain the other exception. For example:

try:
    spam()
except ApplicationError as e:
    print('It failed. Reason:', e.__cause__)
If you want to raise a new exception without including the chain of other exceptions, raise an error from None, like this:

def spam():
    try:
        do_something()
    except Exception as e:
        raise ApplicationError('It failed') from None
Programming mistakes made within except blocks also result in chained exceptions, but they work slightly differently. For example, suppose you have some buggy code like this:

def spam():
    try:
        do_something()
    except Exception as e:
        print('It failed:', err)    # err undefined (a typo)
The resulting exception traceback message is slightly different:

>>> spam()
Traceback (most recent call last):
  File "d.py", line 9, in spam
    do_something()
  File "d.py", line 5, in do_something
    x = int('N/A')
ValueError: invalid literal for int() with base 10: 'N/A'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d.py", line 11, in spam
    print('It failed:', err)
NameError: name 'err' is not defined
>>>
If an unexpected exception is raised while handling another exception, the __context__ attribute (instead of __cause__) holds information about the exception that was being handled when the error occurred. For example:

try:
    spam()
except Exception as e:
    print('It failed. Reason:', e)
    if e.__context__:
        print('While handling:', e.__context__)
There is an important difference between expected and unexpected exceptions in exception chains. In the first example, the code was written to anticipate the possibility of an exception; the code was explicitly wrapped in a try-except block:

try:
    do_something()
except Exception as e:
    raise ApplicationError('It failed') from e
In the second case, the except block contains a programming mistake:

try:
    do_something()
except Exception as e:
    print('It failed:', err)    # err undefined
The distinction between these two cases is subtle but important. This is why exception-chaining information is placed in either the __cause__ or __context__ attribute. The __cause__ attribute is reserved for the case where you anticipated the possibility of a failure. The __context__ attribute covers the unexpected case where an uncaught exception occurs while handling another exception.
Exception Tracebacks Exceptions have an associated stack traceback that provides information about where an error occurred. The traceback is stored in the __traceback__ attribute of an exception. For reporting or debugging purposes, you might want to produce the traceback message yourself. The traceback module can be used for this. For example:

import traceback

try:
    spam()
except Exception as e:
    tblines = traceback.format_exception(type(e), e, e.__traceback__)
    tbmsg = ''.join(tblines)
    print('It failed:')
    print(tbmsg)
In this code, format_exception() produces a list of strings containing the output that Python would normally produce in a traceback message. As input, you supply the exception type, value, and traceback.
Exception Handling Tips
Exception handling is one of the most difficult things to get right in larger programs. However, there are a few rules of thumb that make it easier. The first rule is to not catch exceptions that can't be handled directly at that specific location in the code. Consider a function like this:

def read_data(filename):
    with open(filename, 'rt') as file:
        rows = []
        for line in file:
            row = line.split()
            rows.append((row[0], int(row[1]), float(row[2])))
    return rows
Suppose the open() function fails due to a bad filename. Is this an error that should be caught with a try-except statement in this function? Probably not. If the caller gives a bad filename, there is no sensible way to recover: there is no file to open, no data to read, and nothing else is possible. It's better to let the operation fail and report an exception back to the caller. Avoiding an error check in read_data() doesn't mean that the exception would never be handled anywhere; it just means that it's not the role of read_data() to do it. Perhaps the code that prompted a user for a filename would handle this exception. This advice might seem contrary to the experience of programmers accustomed to languages that rely upon special error codes or wrapped result types. In those languages, great care is taken to always check return codes for errors on all operations. You don't do this in Python. If an operation can fail and there's nothing you can do to recover, it's better to just let it fail. The exception will propagate to higher levels of the program, where it is usually the responsibility of some other code to deal with it. On the other hand, a function could recover from bad data. For example:

def read_data(filename):
    with open(filename, 'rt') as file:
        rows = []
        for line in file:
            row = line.split()
            try:
                rows.append((row[0], int(row[1]), float(row[2])))
            except ValueError as e:
                print('Bad row:', row)
                print('Reason:', e)
    return rows
When catching errors, try to make your except clauses as narrow as reasonable. The above code could have been written to catch all errors by using except Exception. However, doing that would make the code catch legitimate programming errors that probably shouldn't be ignored. Don't do that; it will make debugging difficult. Finally, if you are explicitly raising exceptions, consider making your own exception types. For example:

class ApplicationError(Exception):
    pass

class UnauthorizedUserError(ApplicationError):
    pass

def spam():
    ...
    raise UnauthorizedUserError('Go away')
    ...
It's subtle, but one of the harder problems in large code bases is assigning blame for failures. By making your own exceptions, you can better distinguish between intentionally raised errors and genuine bugs. If your program crashes with one of your custom ApplicationError exceptions, you'll know immediately why that error occurred, because you wrote the code to raise it. On the other hand, if the program crashes with one of Python's built-in exceptions (such as TypeError or ValueError), that might indicate a more serious problem.
Context Managers and the with Statement
Proper management of system resources such as files, locks, and connections is often a tricky problem when combined with exceptions. For example, a raised exception can cause control flow to bypass statements responsible for releasing critical resources, such as a lock. The with statement allows a series of statements to execute inside a runtime context that is controlled by an object serving as a context manager. Here's an example:

with open('debuglog', 'wt') as file:
    file.write('Debugging\n')
    statements
    file.write('Done\n')

import threading
lock = threading.Lock()
with lock:
    # Critical section
    statements
    # End of critical section
In the first example, the with statement automatically causes the opened file to be closed when control flow leaves the block of statements that follows. In the second example, the with statement automatically acquires and releases a lock when control enters and leaves the block of statements that follows. The with obj statement allows the object obj to manage what happens when control flow enters and exits the associated block of statements that follows. When the with obj statement executes, it calls the method obj.__enter__() to signal that a new context is being entered. When control flow leaves the context, the method obj.__exit__(type, value, traceback) executes. If no exception has been raised, all three arguments to __exit__() are set to None. Otherwise, they contain the type, value, and traceback associated with the exception that has caused control flow to leave the context. If __exit__() returns True, it indicates that the raised exception was handled and is no longer to be propagated. Returning None or False causes the exception to propagate.
The with obj statement accepts an optional as var specifier. If given, the value returned by obj.__enter__() is placed into var. This value is commonly the same as obj, since that lets you construct an object and use it as a context manager in the same step. For example, consider this class:

class Manager:
    def __init__(self, x):
        self.x = x

    def yow(self):
        pass

    def __enter__(self):
        return self

    def __exit__(self, ty, val, tb):
        pass
With this, you can create and use an instance as a context manager in a single step like this:

with Manager(42) as m:
    m.yow()
Here's a more interesting example involving list transactions:

class ListTransaction:
    def __init__(self, thelist):
        self.thelist = thelist

    def __enter__(self):
        self.workingcopy = list(self.thelist)
        return self.workingcopy

    def __exit__(self, type, value, tb):
        if type is None:
            self.thelist[:] = self.workingcopy
        return False
This class allows you to make a number of modifications to an existing list. However, the changes will only take effect if no exceptions are thrown. Otherwise, the original list remains unchanged. For example:
items = [1, 2, 3]

with ListTransaction(items) as working:
    working.append(4)
    working.append(5)

print(items)    # Produces [1, 2, 3, 4, 5]

try:
    with ListTransaction(items) as working:
        working.append(6)
        working.append(7)
        raise RuntimeError("We're hosed!")
except RuntimeError:
    pass

print(items)    # Produces [1, 2, 3, 4, 5]
The standard library module contextlib contains functions related to more advanced uses of context managers. If you create context managers regularly, this might be worth a look.
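As a taste, the contextlib.contextmanager decorator lets you write a context manager as a generator function. Here is a sketch that reimplements the list-transaction idea in that style:

from contextlib import contextmanager

@contextmanager
def list_transaction(thelist):
    workingcopy = list(thelist)
    yield workingcopy
    # Reached only if the with block raised no exception
    thelist[:] = workingcopy

items = [1, 2, 3]
with list_transaction(items) as working:
    working.append(4)
print(items)    # [1, 2, 3, 4]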
Assertions and __debug__ The assert statement can introduce debugging code into a program. The general form of assert is:

assert test [, msg]

where test is an expression that evaluates to True or False. If test evaluates to False, assert raises an AssertionError exception with the optional message msg supplied to the assert statement. Here's an example:

def write_data(file, data):
    assert file, 'write_data: file not defined!'
    ...
The assert statement should not be used for code that must be executed to make the program correct, because it won't be executed if Python runs in optimized mode (specified with the -O option to the interpreter). In particular, it's an error to use assert to check user input or the success of an important operation. Instead, assert statements are used to verify invariants that should always be true; if one is violated, it represents a bug in the program, not an error by the user. For example, if the function write_data(), shown above, were intended for use by an end user, the assert statement should be replaced by a conventional if statement and the desired error handling. A common use of assert is in testing. For example, you can use it to put a minimal test into a function:

def factorial(n):
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result

assert factorial(5) == 120
The purpose of such a test is not to be exhaustive but to act as a kind of "smoke test". If something is obviously broken in the function, the code will immediately crash with a failed assertion upon import. Assertions can also be useful as a kind of contract that documents expected inputs and outputs. For example:

def factorial(n):
    assert n > 0, 'must supply a positive value'
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result
Again, this is not intended to validate user input. It's more of a check of the internal consistency of the program. If other code tried to calculate negative factorials, the assertion would fail and point to the offending code for you to debug.
Final Words Although Python supports a variety of different programming styles involving functions and objects, the basic model of program execution is imperative programming. That is, programs consist of statements that execute one after another in the order they appear in a source file. There are only three basic control-flow constructs: the if statement, the while loop, and the for loop. In some sense, there are few mysteries in understanding how Python executes your program. By far, the most complicated and potentially error-prone feature is the use of exceptions; indeed, much of this chapter has focused on how to think about exception handling properly. Even if you heed this advice, exceptions remain a delicate part of designing libraries, frameworks, and APIs. Exceptions can also wreak havoc on proper resource management, a problem that is often addressed through the use of context managers and the with statement. Not discussed in this chapter are the techniques for customizing almost every feature of the Python language, including the built-in operators and even aspects of the control-flow constructs just described. Although Python programs often look simple in structure on the surface, a surprising amount of magic is often at work behind the scenes. Much of that is described in the next chapter.
Objects, Types, and Protocols Python programs manipulate objects of various types. There are a variety of built-in types such as numbers, strings, lists, sets, and dictionaries. You can also create your own types using classes. This chapter describes the underlying Python object model and the mechanisms that make all objects work. Particular attention is paid to the "protocols" that define the core behavior of various objects.
Basic Concepts Every piece of data stored in a program is an object. Each object has an identity, a type (also known as its class), and a value. For example, writing a = 42 creates an integer object with the value 42. The object's identity is a number representing its location in memory. a is a label that refers to that specific location, although the label is not part of the object itself. An object's type, also known as the object's class, defines the object's internal data representation as well as the methods it supports. When an object of a particular type is created, the object is called an "instance" of that type. After an instance is created, its identity never changes. If an object's value can be modified, the object is said to be mutable. If the value cannot be modified, the object is said to be immutable. An object that contains references to other objects is called a container. Objects are characterized by their attributes. An attribute is a value associated with an object that is accessed using the dot (.) operator. An attribute might be a simple data value such as a number. However, an attribute could also be a function that is called to carry out some operation. Such functions are called methods. The following example illustrates attribute access:

    a = 34             # Create an integer
    n = a.numerator    # Get the numerator (an attribute)

    b = [1, 2, 3]      # Create a list
    b.append(7)        # Add a new element using the append() method
Objects may also implement various operators, such as the + operator. For example:

    c = a + 10         # c = 44
    d = b + [4, 5]     # d = [1, 2, 3, 7, 4, 5]
Although operators are written with a different syntax, they ultimately map to methods. For example, writing a + 10 executes the method a.__add__(10).
Object Identity and Type The built-in function id() returns the identity of an object. The identity is an integer that usually corresponds to the object's location in memory. The is and is not operators compare the identities of two objects. type() returns the type of an object. Here is an example of different ways you might compare two objects:

    # Compare two objects
    def compare(a, b):
        if a is b:
            print('same object')
        if a == b:
            print('same value')
        if type(a) is type(b):
            print('same type')
Here's an example of how this function works:

    >>> a = [1, 2, 3]
    >>> b = [1, 2, 3]
    >>> compare(a, a)
    same object
    same value
    same type
    >>> compare(a, b)
    same value
    same type
    >>> compare(a, [4, 5, 6])
    same type
    >>>
The type of an object is itself an object, known as the object's class. This object is uniquely defined and is always the same for all instances of a given type. Classes usually have names (e.g. list, int, dict) that can be used for instantiation, type checking, and type hinting. For example:

    items = list()
    if isinstance(items, list):
        items.append(item)

    def removeall(items: list, item) -> list:
        return [i for i in items if i != item]
A "subtype" is a type defined by inheritance. It carries all of the functionality of the original type plus additional and/or redefined methods. Inheritance is covered in more detail in Chapter 7, but here's an example of a list subtype definition with a new method added. class mylist(list): def removeall(self, val): return [i for i in self if i != val] # example elements = mylist([5, 8, 2, 7, 2, 13, 9]) x = items.removeall(2) print(x) # [5, 8, 7, 13, 9]
The isinstance(instance, type) function is the preferred way to check a value against a type because it is aware of subtypes. It can also check against many possible types. For example:

    if isinstance(items, (list, tuple)):
        maxval = max(items)
Although type checks can be added to a program, type checking is often not as useful as you might imagine. First, excessive checking hurts performance. Second, programs don't always define objects that fit neatly into a type hierarchy. For instance, if the purpose of the above isinstance(items, list) statement is to test whether items is "list-like", it wouldn't work for objects that have the same programming interface as a list but don't directly inherit from the built-in list type (e.g. deque instances from the collections module).
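As a quick sketch of that pitfall, a deque supports much of the list API, yet an isinstance() check against list rejects it:

    from collections import deque

    items = deque([1, 2, 3])
    items.append(4)                  # Works exactly like a list here
    print(isinstance(items, list))   # False, although items is "list-like"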
Reference Counting and Garbage Collection Python manages objects through automatic garbage collection. All objects are reference counted. An object's reference count is increased whenever it's assigned to a new name or placed in a container such as a list, tuple, or dictionary, as shown here:

    a = 37        # Creates an object with value 37
    b = a         # Increases reference count on 37
    c = []
    c.append(b)   # Increases reference count on 37
This example creates a single object containing the value 37. a is a name that initially refers to the newly created object. When b is assigned a, b becomes a new name for the same object, and the object's reference count increases. Likewise, when you place b into a list, the object's reference count increases again. Throughout the example, only one object holds 37. All other operations create references to that object. An object's reference count is decreased by the del statement or whenever a reference goes out of scope (or is reassigned). Here is an example:

    del a         # Decrease reference count of 37
    b = 42        # Decrease reference count of 37
    c[0] = 2.0    # Decrease reference count of 37
The current reference count of an object can be obtained using the sys.getrefcount() function. For example:

    >>> a = 37
    >>> import sys
    >>> sys.getrefcount(a)
    7
    >>>
The reference count is often much higher than you might guess. For immutable data such as numbers and strings, the interpreter aggressively shares objects between different parts of the program in order to conserve memory. You just don't notice, because the objects are immutable. When an object's reference count reaches zero, it is garbage collected. However, in some cases a circular dependency may exist in a collection of objects that are no longer in use. Here is an example:

    a = { }
    b = { }
    a['b'] = b     # a contains a reference to b
    b['a'] = a     # b contains a reference to a
    del a
    del b
In this example, the del statements decrease the reference counts of a and b and destroy the names used to refer to the underlying objects. However, since each object holds a reference to the other, the reference counts don't drop to zero and the objects remain allocated. The interpreter won't leak memory, but the destruction of the objects is delayed until a cycle detector executes to find and delete the inaccessible objects. The cycle-detection algorithm runs periodically as the interpreter allocates more and more memory during execution. The exact behavior can be tuned and controlled via functions in the gc standard library module. The gc.collect() function can be used to immediately invoke the cyclic garbage collector. In most programs, garbage collection is something that just "happens" without you having to think much about it. However, there are situations where it can make sense to delete objects manually. One such scenario arises when working with gigantic data structures. For example, consider this code:

    def some_calculation():
        data = create_giant_data_structure()
        # Use data for some part of a calculation
        ...
        del data      # Release the data; it's no longer needed
        # Calculation continues
        ...
In this code, the del data statement indicates that the data variable is no longer needed. If this causes the reference count to reach 0, the object is garbage collected at that point. Without the del statement, the object would persist indefinitely until the data variable goes out of scope at the end of the function. You might only notice this if you were trying to work out why your program uses more memory than it should.
References and Copies When a program makes an assignment such as b = a, a new reference to a is created. For immutable objects such as numbers and strings, this assignment appears to create a copy of a (even though it doesn't). However, for mutable objects such as lists and dictionaries, the behavior appears quite different. Here is an example:

    >>> a = [1, 2, 3, 4]
    >>> b = a            # b is a reference to a
    >>> b is a
    True
    >>> b[2] = -100      # Change an element in b
    >>> a                # Notice how a also changed
    [1, 2, -100, 4]
    >>>
Because a and b in this example refer to the same object, a change made to one of the variables is reflected in the other. To avoid this, you create a copy of an object instead of a new reference. Two types of copy operations apply to container objects such as lists and dictionaries: a shallow copy and a deep copy. A shallow copy creates a new object but populates it with references to the same items contained in the original object. Here is an example:

    >>> a = [1, 2, [3, 4]]
    >>> b = list(a)        # Make a shallow copy of a
    >>> b is a
    False
    >>> b.append(100)      # Append element to b
    >>> b
    [1, 2, [3, 4], 100]
    >>> a                  # Notice that a is unchanged
    [1, 2, [3, 4]]
    >>> b[2][0] = -100     # Modify an element inside b
    >>> b
    [1, 2, [-100, 4], 100]
    >>> a                  # Notice the change inside a
    [1, 2, [-100, 4]]
    >>>
In this case, a and b are separate list objects, but the items they contain are shared. Therefore, a modification to one of the items of a also modifies an item of b, as shown. A deep copy creates a new object and recursively copies all the objects it contains. There is no built-in operation for creating deep copies of objects. However, you can use the copy.deepcopy() function in the standard library, as shown in the following example:

    >>> import copy
    >>> a = [1, 2, [3, 4]]
    >>> b = copy.deepcopy(a)
    >>> b[2][0] = -100
    >>> b
    [1, 2, [-100, 4]]
    >>> a                  # Notice that a is unchanged
    [1, 2, [3, 4]]
    >>>
Use of deepcopy() is actively discouraged in most programs. Copying an object is slow and often unnecessary. Reserve deepcopy() for situations where you know you actually need a copy because you're about to mutate data and you don't want your changes to affect the original object. Also be aware that deepcopy() fails for objects that involve system or runtime state (e.g. open files, network connections, threads, generators, etc.).
Object Representation and Printing Programs often need to display objects, for example to show data to a user or to print it for debugging. If you supply an object x to the print(x) function or convert it to a string using str(x), you usually get a "nice" human-readable representation of the object's value. For example, consider an example involving dates:

    >>> from datetime import date
    >>> d = date(2012, 12, 21)
    >>> print(d)
    2012-12-21
    >>> str(d)
    '2012-12-21'
    >>>
This "nice" representation of an object is often insufficient for debugging. For example, in the output of the code above, there is no obvious way to tell whether the variable d is a date instance or a simple string containing the text "2012-12-21". Use the repr(x) function for more information. repr(x) creates a string showing the representation of the object as it should be written in source code. For example: >>> d = date(2012, 12, 21) >>> repr(d) 'datetime.date(2012, 12, 21)' >>> print(repr(d)) datetime.date(2012 , 12, 21) >>> print(f'The date is: {d!r}') The date is: datetime.date(2012, 12, 21) >>>
When producing output with f-strings, the !r suffix can be added to a value to produce its repr() value instead of the normal string conversion.
First-Class Objects All objects in Python are said to be "first class". This means that all objects that can be assigned to a name can also be treated as data. As data, objects can be stored as variables, passed as arguments, returned from functions, compared against other objects, and more. For example, here is a simple dictionary containing two values:

    items = {
        'number': 42,
        'text': "Hello World"
    }
The first-class nature of objects becomes apparent when you add some more unusual items to this dictionary. Here are some examples:

    items['func'] = abs            # Add the abs() function
    import math
    items['mod'] = math            # Add a module
    items['error'] = ValueError    # Add an exception type
    nums = [1, 2, 3, 4]
    items['append'] = nums.append  # Add a method of another object
In this example, the items dictionary now contains a function, a module, an exception, and a method of another object. If you want, you can use dictionary lookups on items in place of the original names and the code will still work. For example:

    >>> items['func'](-45)         # Executes abs(-45)
    45
    >>> items['mod'].sqrt(4)       # Executes math.sqrt(4)
    2.0
    >>> try:
    ...     x = int('a lot')
    ... except items['error'] as e:    # Same as except ValueError as e
    ...     print("Couldn't convert")
    ...
    Couldn't convert
    >>> items['append'](100)       # Executes nums.append(100)
    >>> nums
    [1, 2, 3, 4, 100]
    >>>
The fact that everything in Python is first class is often not fully appreciated by newcomers. However, it can be used to write very compact and flexible code. Suppose you have a line of text such as 'ACME,100,490.10' and you want to convert it into a list of values with appropriate type conversions. Here's a clever way to do it by creating a list of types (which are first-class objects) and performing a few common list-processing operations:

    >>> line = 'ACME,100,490.10'
    >>> column_types = [str, int, float]
    >>> parts = line.split(',')
    >>> row = [ty(val) for ty, val in zip(column_types, parts)]
    >>> row
    ['ACME', 100, 490.1]
    >>>
Placing functions or classes in a dictionary is a common technique for eliminating complex if-elif-else statements. For example, instead of writing code like this:

    if format == 'text':
        formatter = TextFormatter()
    elif format == 'csv':
        formatter = CSVFormatter()
    elif format == 'html':
        formatter = HTMLFormatter()
    else:
        raise RuntimeError('Bad format')

you could rewrite the code using a dictionary:

    _formats = {
        'text': TextFormatter,
        'csv': CSVFormatter,
        'html': HTMLFormatter
    }

    if format in _formats:
        formatter = _formats[format]()
    else:
        raise RuntimeError('Bad format')
This latter form is also more flexible in that new cases can be added by adding more entries to the dictionary, rather than having to modify a large block of if-elif-else statements.
Use None for Optional or Missing Data Sometimes programs need to represent an optional or missing value. None is a special instance reserved for this purpose. None is returned by functions that don't explicitly return a value. None is also frequently used as the default value of optional arguments so that a function can detect whether the caller actually passed a value for that argument. None has no attributes and evaluates to False in Boolean expressions. Internally, None is stored as a singleton (i.e. there is only one None value in the interpreter). Therefore, a common way to test a value against None is to use the is operator like this:

    if value is None:
        statements
        ...
Testing for None with the == operator also works, but it's not a recommended style (and may be flagged as a "style bug" by code testing tools).
Object Protocols and Data Abstraction Most features of the Python language are defined by "protocols". To illustrate, consider the following function:
    def compute_cost(unit_price, num_units):
        return unit_price * num_units
Now ask yourself: what inputs are allowed? The answer is amazingly simple: everything is allowed! At first glance, this function looks like it could be applied to numbers:

    >>> compute_cost(1.25, 50)
    62.5
    >>>
Indeed, it works as expected. But the function works with much more. For example, you can use special kinds of numbers such as fractions or decimals:

    >>> from fractions import Fraction
    >>> compute_cost(Fraction(5, 4), 50)
    Fraction(125, 2)
    >>> from decimal import Decimal
    >>> compute_cost(Decimal('1.25'), Decimal('50'))
    Decimal('62.50')
    >>>
Moreover, the function works with arrays and other complex structures from packages such as numpy. For example:

    >>> import numpy as np
    >>> prices = np.array([1.25, 2.10, 3.05])
    >>> units = np.array([50, 20, 25])
    >>> compute_cost(prices, units)
    array([62.5, 42., 76.25])
    >>>
The function might even "work" in unexpected ways:

    >>> compute_cost('a lot', 10)
    'a lota lota lota lota lota lota lota lota lota lot'
    >>>
And yet, certain combinations of types fail:

    >>> compute_cost(Fraction(5, 4), Decimal('50'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in compute_cost
    TypeError: unsupported operand type(s) for *: 'Fraction' and 'decimal.Decimal'
    >>>
Unlike a compiler for a static language, Python does not verify correct program behavior in advance. Instead, the behavior of an object is determined by a dynamic process involving so-called "special" or "magic" methods. The names of these special methods are always preceded and followed by double underscores (__). The methods are automatically triggered by the interpreter as a program executes. For example, the operation x * y is carried out by a method x.__mul__(y). The names of these methods and their corresponding operators are hard-wired. The behavior of any given object depends entirely on the set of special methods that it implements. The next few sections describe the special methods associated with different categories of core interpreter features. These categories are sometimes called "protocols". An object, including a user-defined class, may define any combination of these features to make the object behave in different ways.
Object Protocol The methods in Table X relate to general object management. This includes the creation, initialization, destruction, and representation of objects.
The __new__() and __init__() methods are used together to create and initialize instances. When an object is created by invoking SomeClass(args), it is translated into the following steps:

    x = SomeClass.__new__(SomeClass, args)
    if isinstance(x, SomeClass):
        x.__init__(args)
Normally, these steps are carried out behind the scenes and you don't need to worry about them. The most common method implemented in a class is __init__(). Use of __new__() almost always indicates the presence of advanced magic related to instance creation (for example, it is used in class methods that want to bypass __init__(), or in certain creational design patterns such as singletons or caching). The implementation of __new__() doesn't have to return an instance of the class in question; if it doesn't, the subsequent call to __init__() is skipped. The __del__() method is invoked when an instance is about to be garbage collected. This method is invoked only when an instance is no longer in use. It's important to note that the statement del x only decrements the instance's reference count and doesn't necessarily result in a call to this function. __del__() is almost never defined unless an instance needs to perform extra resource-management steps upon destruction. The __repr__() method, called by the built-in repr() function, creates a string representation of an object that can be useful for debugging and printing.
This is also the method responsible for creating the output of values you inspect in the interactive shell. The convention is for __repr__() to return an expression string that can be evaluated with eval() to re-create the object. For example:

    a = [2, 3, 4, 5]     # Create a list
    s = repr(a)          # s = '[2, 3, 4, 5]'
    b = eval(s)          # Turn s back into a list
If a simple string expression can't be constructed, the convention is for __repr__() to return a string of the form <...message...>, as shown here:

    f = open('foo.txt')
    a = repr(f)
    # a = "<_io.TextIOWrapper name='foo.txt' mode='r' encoding='UTF-8'>"
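Returning to __new__(), here is a minimal caching sketch of the kind of creation-time magic mentioned above. The Cached class and its key argument are hypothetical, invented for this example:

    class Cached:
        _cache = {}

        def __new__(cls, key):
            if key in cls._cache:
                return cls._cache[key]      # Reuse an existing instance
            self = super().__new__(cls)
            cls._cache[key] = self
            return self

        def __init__(self, key):
            self.key = key   # Note: __init__() still runs for cached instances

    a = Cached('x')
    b = Cached('x')
    print(a is b)            # True -- the same cached instance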
Number Protocol Table X lists the special methods that objects must implement to provide mathematical operations. Table X Methods for Mathematical Operations
Given an expression such as x + y, the interpreter invokes a combination of the methods x.__add__(y) or y.__radd__(x) to carry out the operation. The initial choice is to try x.__add__(y) in all cases except the special one where y happens to be a subtype of x; in that case, y.__radd__(x) executes first. If the initial method returns NotImplemented, an attempt is made to invoke the operation with reversed operands, such as y.__radd__(x). If this second attempt fails, the entire operation fails. Here is an example that illustrates the process:
    >>> a = 42      # int
    >>> b = 3.7     # float
    >>> a.__add__(b)
    NotImplemented
    >>> b.__radd__(a)
    45.7
    >>>
This example might seem surprising, but it reflects the fact that integers don't actually know anything about floats. Floats, however, do know about integers (since, mathematically, integers are a special kind of float). The reversed operand therefore produces the correct answer. The methods __iadd__(), __isub__(), and so on are used to support in-place arithmetic operators such as a += b and a -= b (also known as augmented assignment). A distinction is made between these operators and the standard arithmetic methods because the implementation of the in-place operators might be able to make certain optimizations, such as improved performance. For example, if an object isn't shared, its value could be modified in place without having to allocate a newly created object for the result. If the in-place operators are left undefined, an operation such as a += b is evaluated as a = a + b instead. There are no methods to define the behavior of the logical operators and, or, or not. The and and or operators implement short-circuit evaluation, where evaluation stops as soon as the final result can be determined. For example:

    >>> True or 1/0      # Does not evaluate 1/0
    True
    >>>
This behavior with unevaluated subexpressions cannot be expressed by the evaluation rules of a normal function or method. Therefore, there is no protocol or set of methods available to redefine it (instead, it is handled as a special case deep inside Python's implementation).
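Returning to the in-place operators described above, here is a minimal sketch of an __iadd__() method that mutates the object instead of creating a new one. The IntList class is hypothetical:

    class IntList:
        def __init__(self, items):
            self.items = list(items)

        def __iadd__(self, other):
            self.items.extend(other)   # Mutate in place
            return self                # Must return the result of +=

    a = IntList([1, 2, 3])
    b = a
    a += [4, 5]                # Executes a.__iadd__([4, 5])
    print(b.items)             # [1, 2, 3, 4, 5] -- the same object was mutated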
Comparison Protocol Objects can be compared in various ways. The most basic check is an identity check with the is operator, for example a is b. Identity does not take into account the values stored inside an object, even if they happen to be the same. For example:

    >>> a = [1, 2, 3]
    >>> b = a
    >>> a is b
    True
    >>> c = [1, 2, 3]
    >>> a is c
    False
    >>>
The is operator is an internal part of Python that can't be redefined. All other comparisons on objects are implemented with the methods in Table X. Table X Special Methods for Comparison and Hashing
The __bool__() method, if present, is used to determine truth when an object is tested as part of a condition or conditional expression. For example:

    if a:          # Executes a.__bool__()
        ...
    if not a:      # Executes a.__bool__()
        ...
If __bool__() is undefined, __len__() is used as a fallback. If both __bool__() and __len__() are undefined, an object is simply considered to be True. The __eq__() method is used to determine basic equality for use with the == and != operators. The default implementation of __eq__() compares objects by identity using the is operator. The __ne__() method, if present, can be used to implement special processing for !=, but is usually not required as long as __eq__() is defined. Ordering is determined by the relational operators (<, >, <=, >=) using methods such as __lt__() and __gt__(). As with other mathematical operations, the evaluation rules are subtle. To evaluate a < b, the interpreter first tries to execute a.__lt__(b), except in the case where b is a subtype of a. In that special case, b.__gt__(a) executes first. If the initial method is undefined or returns NotImplemented, the interpreter tries a reversed comparison, for example invoking b.__gt__(a). Analogous rules apply to operators such as <= and >=. Here is an example:

    >>> a = 42       # int
    >>> b = 52.3     # float
    >>> a.__lt__(b)
    NotImplemented
    >>> b.__gt__(a)
    True
    >>>
An orderable object is not required to implement all of the comparison operations in Table X. However, if you want to sort objects or use functions such as min() or max(), __lt__() must be defined at a minimum. If you are adding comparison operators to a user-defined class, the @total_ordering class decorator in the functools module may be useful. It can generate all of the methods as long as you minimally implement __eq__() and one of the other comparisons. The __hash__() method is defined on instances that are to be placed into a set or used as keys in a mapping (dictionary). The return value is an integer that must be identical for two instances that compare as equal. Moreover, __eq__() should always be defined together with __hash__(), because the two methods work together. The value returned by __hash__() is typically used as an internal implementation detail of various data structures. However, it's possible for two different objects to have the same hash value. Therefore, __eq__() is necessary to resolve potential collisions.
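For example, here is a minimal sketch using @total_ordering; the Account class is made up for illustration:

    from functools import total_ordering

    @total_ordering
    class Account:
        def __init__(self, balance):
            self.balance = balance
        def __eq__(self, other):
            return self.balance == other.balance
        def __lt__(self, other):
            return self.balance < other.balance

    a = Account(50)
    b = Account(100)
    print(a <= b, a > b)    # True False -- <=, >, >=, != are generated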
Conversion Protocols Sometimes you need to convert objects to various built-in types, including strings and numbers. For this purpose, the methods in Table X can be defined:
The __str__() method is called by the built-in str() function and by print-related functions. The __format__() method is called by the format() function or the format() method of strings. The format_spec argument is a string containing the format specification. This string is the same as the format_spec argument to format(). For example:

    f'{x:spec}'                  # Calls x.__format__('spec')
    format(x, 'spec')            # Calls x.__format__('spec')
    'x is {0:spec}'.format(x)    # Calls x.__format__('spec')
The syntax of the format specification is arbitrary and can be customized on an object-by-object basis. However, there is a standard set of conventions used for the built-in types. More information on string formatting, including the general format of the specifier, can be found in Chapter 9, "Input and Output". The __bytes__() method is used to create a byte representation when an instance is passed to bytes(). Not all types support byte conversion.
The numeric conversion methods __bool__(), __int__(), __float__(), and __complex__() are expected to produce a value of the corresponding built-in type. Python never performs implicit type conversions using these methods. Thus, even if an object x implements an __int__() method, the expression 3 + x still raises a TypeError. The only way to execute __int__() is through an explicit use of the int() function. The __index__() method performs an integer conversion of an object when it's used in an operation that requires an integer value. This includes indexing in sequence operations. For example, if items is a list, performing an operation such as items[x] will attempt to execute items[x.__index__()] if x is not an integer. __index__() is also used in various base conversions such as oct(x) and hex(x).
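As an illustration, here is a hypothetical Pointer class whose instances can stand in for integers in indexing and base conversions via __index__():

    class Pointer:
        def __init__(self, addr):
            self.addr = addr
        def __index__(self):
            return self.addr

    p = Pointer(3)
    items = ['a', 'b', 'c', 'd', 'e']
    print(items[p])    # 'd' -- uses p.__index__()
    print(hex(p))      # '0x3' -- base conversions also use __index__()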
Container Protocol The methods in Table X are used by objects that want to implement containers of various kinds (lists, dictionaries, sets, and so on). Table X Methods for Containers
Here is an example:

    a = [1, 2, 3, 4, 5, 6]
    len(a)        # a.__len__()
    x = a[2]      # x = a.__getitem__(2)
    a[1] = 7      # a.__setitem__(1, 7)
    del a[2]      # a.__delitem__(2)
    5 in a        # a.__contains__(5)
The built-in len() function invokes the __len__() method to return a non-negative length. This function also determines truth values unless the __bool__() method has also been defined. For accessing individual items, the __getitem__() method can return an item by key value. The key can be any Python object, but for ordered sequences such as lists it is expected to be an integer. The __setitem__() method assigns a value to an item. The __delitem__() method is invoked whenever the del operation is applied to a single item. The __contains__() method is used to implement the in operator. Slicing operations such as x = s[i:j] are also implemented using __getitem__(), __setitem__(), and __delitem__(). For slices, a special slice instance is passed as the key. This instance has attributes that describe the range of the requested slice. For example:

    a = [1, 2, 3, 4, 5, 6]
    x = a[1:5]                 # x = a.__getitem__(slice(1, 5, None))
    a[1:3] = [10, 11, 12]      # a.__setitem__(slice(1, 3, None), [10, 11, 12])
    del a[1:4]                 # a.__delitem__(slice(1, 4, None))
The slicing features of Python are actually more powerful than many programmers realize. For example, the following variations of extended slicing are all supported and might be useful for working with multidimensional data structures such as matrices and arrays:

    a = m[0:100:10]             # Strided slice (stride=10)
    b = m[1:10, 3:20]           # Multidimensional slice
    c = m[0:100:10, 50:75:5]    # Multiple dimensions with strides
    m[0:5, 5:10] = n            # Extended slice assignment
    del m[:10, 15:]             # Extended slice deletion
The general format for each dimension of an extended slice is i:j[:stride], where stride is optional. As with ordinary slices, you can omit the starting or ending values for each part of a slice.
In addition, the Ellipsis (written as ...) is available to denote any number of leading or trailing dimensions in an extended slice:

    a = m[..., 10:20]      # Extended slice access with Ellipsis
    m[10:20, ...] = n
When using extended slices, the __getitem__(), __setitem__(), and __delitem__() methods implement access, modification, and deletion, respectively. However, instead of an integer, the value passed to these methods is a tuple containing a combination of slice objects or Ellipsis. For example, the statement

    a = m[0:10, 0:100:5, ...]

invokes __getitem__() like this:

    a = m.__getitem__((slice(0, 10, None), slice(0, 100, 5), Ellipsis))
Python strings, tuples, and lists currently provide some support for extended slices. No part of Python or its standard library makes use of multidimensional slicing or Ellipsis. Those features are reserved purely for third-party libraries and frameworks. Perhaps the most common place you will see them used is in a library such as numpy.
Iteration Protocol If an instance, obj, supports iteration, it provides a method, obj.__iter__(), that returns an iterator. An iterator, in turn, implements a single method, iter.__next__(), that either returns the next object or raises StopIteration to signal the end of iteration. Both of these methods are used in the implementation of the for statement as well as other operations that implicitly perform iteration. For example, the statement for x in s is carried out by performing steps equivalent to the following:

    _iter = s.__iter__()
    while True:
        try:
            x = _iter.__next__()
        except StopIteration:
            break
        # Statements in the body of the for loop
        ...
An object may optionally provide a reversed iterator by implementing the special method __reversed__(). This method should return an iterator object with the same interface as a normal iterator (that is, a __next__() method and the use of StopIteration). The method is used by the built-in reversed() function. For example:

    >>> for x in reversed([1, 2, 3]):
    ...     print(x)
    3
    2
    1
    >>>
A common implementation technique for iteration is to use a generator function involving yield. For example:

    class FRange:
        def __init__(self, start, stop, step):
            self.start = start
            self.stop = stop
            self.step = step

        def __iter__(self):
            x = self.start
            while x < self.stop:
                yield x
                x += self.step

    # Example use:
    nums = FRange(0.0, 1.0, 0.1)
    for x in nums:
        print(x)     # 0.0, 0.1, 0.2, 0.3, ...
This works because generator functions conform to the iteration protocol themselves. It's a bit easier to implement an iterator this way, since you only need to worry about the __iter__() method. The rest of the iteration machinery is already provided by the generator.
Attribute Protocol The methods in Table X read, write, and delete the attributes of an object using the dot (.) operator and the del operator, respectively. Table X Special Methods for Attribute Access

The __getattribute__() method is invoked on every attribute access. If the attribute is located, its value is returned. Otherwise, the __getattr__() method is invoked. The default behavior of __getattr__() is to raise an AttributeError exception. The __setattr__() method is always invoked when setting an attribute, and the __delattr__() method is always invoked when an attribute is deleted. These methods are fairly heavyweight hooks, in that they allow a type to completely redefine attribute access for all attributes. User-defined classes can define properties and descriptors that allow more fine-grained control over attribute access. This is discussed further in Chapter 7.
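For instance, here is a minimal sketch of __getattr__() used to build a proxy; the LoggedProxy class is hypothetical. Since __getattr__() is only invoked when normal lookup fails, attributes that aren't found on the proxy get forwarded to the wrapped object:

    class LoggedProxy:
        def __init__(self, obj):
            self._obj = obj

        def __getattr__(self, name):
            print('accessing', name)           # Log the lookup
            return getattr(self._obj, name)    # Forward to the wrapped object

    p = LoggedProxy([1, 2, 3])
    p.append(4)          # Prints "accessing append", then appends
    print(p._obj)        # [1, 2, 3, 4]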
Function Protocol An object can emulate a function by providing the __call__() method. If an object, x, provides this method, it can be invoked like a function. That is, x(arg1, arg2, ...) invokes x.__call__(arg1, arg2, ...). There are many built-in types that support function calls. For example, types implement __call__() to create new instances. Bound methods implement __call__() to pass the self argument to instance methods. Library functions such as functools.partial() also create objects that emulate functions.
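For example, here is a minimal sketch of a callable object; the Adder class is made up for illustration:

    class Adder:
        def __init__(self, n):
            self.n = n
        def __call__(self, x):
            return self.n + x

    add5 = Adder(5)
    print(add5(10))      # 15 -- executes add5.__call__(10)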
Context Manager Protocol The with statement allows a sequence of statements to execute under the control of an instance known as a context manager. The general syntax is as follows:

    with context [as var]:
        statements

The context object shown here is expected to implement the methods shown in Table X. Table X Special Methods for Context Managers
The __enter__() method is invoked when the with statement executes. The value returned by this method is placed into the variable specified with the optional as var specifier. The __exit__() method is called as soon as control flow leaves the block of statements associated with the with statement. As arguments, __exit__() receives the current exception type, value, and traceback if an exception was raised. If no exception was raised, all three values are set to None. The __exit__() method should return True or False to indicate whether a raised exception was handled or not. If True is returned, any pending exception is cleared and program execution continues normally with the first statement after the with block. The primary use of the context management interface is to provide simplified resource control for objects involving system state, such as open files, network connections, and locks. By implementing this interface, an object can safely clean up resources when execution leaves a context in which the object is used. See Chapter 3, "Program Structure and Control Flow," for more details.
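Here is a minimal sketch of the protocol; the Managed class is hypothetical:

    class Managed:
        def __enter__(self):
            print('acquire')
            return self           # Bound by the "as" clause
        def __exit__(self, ty, val, tb):
            print('release')
            return False          # Don't suppress exceptions

    with Managed() as m:
        print('working')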
Final Words: On Being Pythonic A frequently cited design goal is to write code that is "Pythonic". This can mean many things, but at its core it means following the established idioms used by the rest of the Python community. That means knowing Python's protocols for containers, iterables, resource management, and so on. Many of the most popular Python frameworks use these protocols to provide a good user experience, and you should strive for that too. Of the various protocols, three deserve special attention because of their wide use. The first is creating a proper object representation with the __repr__() method. Python programs are often debugged and experimented with at the interactive REPL. It is also common to output objects using print() or a logging library. If you make it easy to observe the state of your objects, you make all of this easier. Second, iterating over data is one of the most common programming tasks there is. If you're going to do it, you should make your code work with Python's for statement. Many core parts of Python and the standard library are designed to work with iterable objects. By supporting iteration in the
usual way, you'll automatically get a significant amount of extra functionality, and your code will be intuitive to other programmers. Finally, use context managers and the with statement for the common programming pattern of statements sandwiched between some kind of startup and teardown steps: for example, opening and closing resources, acquiring and releasing locks, subscribing and unsubscribing, and so on.
Functions Functions are a fundamental building block of most Python programs. This chapter describes function definitions, function application, scoping rules, closures, decorators, and other functional programming features. Particular attention is given to the programming idioms, evaluation models, and patterns associated with functions.
Function Definitions Functions are defined with the def statement:

    def add(x, y):
        return x + y
The first part of a function definition specifies the function name and the parameter names that represent input values. The body of a function is a sequence of statements that execute when the function is called or applied. You apply a function to arguments by writing the function name followed by the arguments enclosed in parentheses, for example a = add(3, 4). Arguments are fully evaluated left to right before the body of the function executes. For example, add(1+1, 2+2) is first reduced to add(2, 4) before calling the function. This is known as applicative evaluation order. The order and number of arguments must match the parameters given in the function definition. If a mismatch exists, a TypeError exception is raised. The structure of calling a function (such as the number of required arguments) is known as the function's call signature.
Default Arguments You can attach default values to function parameters by assigning values in the function definition. For example:

    def split(line, delimiter=','):
        statements
When a function defines a parameter with a default value, that parameter and all the parameters that follow it are optional. It is not possible to specify a parameter with no default value after any parameter with a default value. Default parameter values are evaluated once when the function is first defined, not each time the function is called. This often leads to surprising behavior if mutable objects are used as a default:

    def func(x, items=[]):
        items.append(x)
        return items

    func(1)      # returns [1]
    func(2)      # returns [1, 2]
    func(3)      # returns [1, 2, 3]
Notice how the default argument retains modifications made from previous invocations. To prevent this, it is better to use None and add a check as follows:

    def func(x, items=None):
        if items is None:
            items = []
        items.append(x)
        return items
As a general practice, only use immutable objects for default argument values: numbers, strings, Booleans, None, and so on. This avoids these kinds of surprises.
Variadic Arguments A function can accept a variable number of arguments if an asterisk (*) is used as a prefix on the last parameter name. For example:

    def product(first, *args):
        result = first
        for x in args:
            result = result * x
        return result

    product(10, 20)        # -> 200
    product(2, 3, 4, 5)    # -> 120

In this case, all of the extra arguments are placed into the args variable as a tuple. As a tuple, the arguments can be handled using the standard operations that apply to sequences (iteration, slicing, unpacking, and so on).
Keyword Arguments Function arguments can be supplied by explicitly naming each parameter and specifying a value. These are known as keyword arguments. Here is an example:

    def func(w, x, y, z):
        statements

    # Keyword argument invocation
    func(x=3, y=22, w='hello', z=[1, 2])
With keyword arguments, the order of the arguments doesn't matter as long as each required parameter gets a single value. If you omit any of the required arguments or if a keyword name doesn't match any of the parameter names in the function definition, a TypeError exception is raised. Keyword arguments are evaluated in the same order as they are specified in the function call. Positional arguments and keyword arguments can appear in the same function call, provided that all the positional arguments appear first, values are provided for all non-optional arguments, and no argument receives more than one value. Here is an example:

    func('hello', 3, z=[1, 2], y=22)
    func(3, 22, w='hello', z=[1, 2])    # TypeError. Multiple values for w
If desired, it is possible to force the use of keyword arguments. This is done by listing parameters after a * argument or by just including a single bare * in the definition. For example:
    def read_data(filename, *, debug=False):
        ...

    def product(first, *values, scale=1):
        result = first * scale
        for val in values:
            result = result * val
        return result
In this example, the debug argument to read_data() can only be specified by keyword. This restriction often improves code readability:

    data = read_data('Data.csv', True)          # NO. TypeError
    data = read_data('Data.csv', debug=True)    # Yes.
The product() function accepts any number of positional arguments and an optional keyword-only argument. For example:

    result = product(2, 3, 4)              # result = 24
    result = product(2, 3, 4, scale=10)    # result = 240
Variadic Keyword Arguments If the last argument of a function definition is prefixed with **, all the additional keyword arguments (those that don't match any of the other parameter names) are placed in a dictionary and passed to the function. The order of items in this dictionary is guaranteed to match the order in which the keyword arguments were provided. Accepting arbitrary keyword arguments can be a useful way to define functions that accept a large number of potentially open-ended configuration options that would be too unwieldy to list as parameters. Here is an example:

    def make_table(data, **parms):
        # Get configuration parameters from parms (a dict)
        fgcolor = parms.pop('fgcolor', 'black')
        bgcolor = parms.pop('bgcolor', 'white')
        width = parms.pop('width', None)
        ...
        # No more options
        if parms:
            raise TypeError(f'Unsupported configuration options {list(parms)}')

    make_table(items, fgcolor='black', bgcolor='white',
               border=1, borderstyle='grooved', cellpadding=10,
               width=400)
The pop() method of a dictionary removes an item from the dictionary, returning a possible default value if it's not defined. The parms.pop('fgcolor', 'black') expression used in this code mimics the behavior of a keyword argument specified with a default value.
Functions Accepting All Inputs By using both * and **, you can write a function that accepts any combination of arguments. The positional arguments are passed as a tuple and the keyword arguments as a dictionary. For example:

    # Accept a variable number of positional or keyword arguments
    def func(*args, **kwargs):
        # args is a tuple of positional arguments
        # kwargs is a dictionary of keyword arguments
        ...
This combined use of *args and **kwargs is commonly used to write wrappers, decorators, proxies, and similar functions. For example, suppose you have a function to parse lines of text taken from an iterable:

    def parse_lines(lines, separator=',', types=(), debug=False):
        for line in lines:
            ...
            statements
            ...
Suppose you want to create a special case function that parses data from a file specified by filename. For this you could write:
    def parse_file(filename, *args, **kwargs):
        with open(filename, 'rt') as file:
            return parse_lines(file, *args, **kwargs)
An advantage of this approach is that the parse_file() function does not need to know about the various arguments of parse_lines(). It accepts and passes any additional arguments provided by the caller. This also simplifies the maintenance of the parse_file() function. For example, if new arguments are added to parse_lines(), those arguments will also magically work with the parse_file() function.
Positional-Only Arguments Many of Python's built-in functions only accept arguments by position. This is indicated by the presence of a slash (/) in the calling signature of a function shown by various help utilities and IDEs. For example, you might see something like func(x, y, /). This means that all arguments appearing before the slash can only be specified by position. Thus, you could call the function as func(2, 3) but not as func(x=2, y=3). For completeness, this syntax may also be used when defining functions. For example, you can write:

    def func(x, y, /):
        pass

    func(1, 2)      # Ok
    func(1, y=2)    # Error
This definition form is rarely seen in code, since it was first supported only in Python 3.8. However, it can be a useful way to avoid potential name clashes between argument names. For example, consider the following code:

    import time

    def after(seconds, func, /, *args, **kwargs):
        time.sleep(seconds)
        return func(*args, **kwargs)

    def duration(*, seconds, minutes, hours):
        return seconds + 60 * minutes + 3600 * hours

    after(5, duration, seconds=20, minutes=3, hours=2)
In this code, seconds is being passed as a keyword argument, but it's intended for use with the duration function passed to after(). The use of positional-only arguments in after() prevents a name clash with the seconds parameter that appears first.
Names, Docstrings, and Type Hints The standard naming convention for functions is to use lowercase letters with an underscore (_) as a word separator, for example read_data() and not readData(). If a function is not meant to be used directly because it's a helper or some kind of internal implementation detail, its name usually has a single leading underscore, for example _helper(). These are only conventions, however. You are free to name a function whatever you want as long as the name is a valid identifier. The name of a function can be obtained via the __name__ attribute. This is sometimes useful for debugging.

    >>> def square(x):
    ...     return x * x
    ...
    >>> square.__name__
    'square'
    >>>
It is common for the first statement of a function to be a docstring describing its usage. For example:

    def factorial(n):
        '''
        Computes n factorial. For example:

        >>> factorial(5)
        120
        >>>
        '''
        result = 1
        while n > 1:
            result *= n
            n -= 1
        return result
Type hints can be attached to function parameters and the return value, and local variables can be hinted as well. For example:

    def factorial(n: int) -> int:
        result: int = 1     # Type-hinted local variable
        while n > 1:
            result *= n
            n -= 1
        return result
Such hints are completely ignored by the interpreter. They are not checked, stored, or evaluated. The purpose of hints is to help third-party code-checking tools. However, adding type hints to functions is not advised unless you are actively using code-checking tools that make use of them. It is easy to get type hints wrong, and unless you're actively using a tool that checks them, the errors will go undiscovered until someone else decides to run a type-checking tool over your code.
Function Application and Parameter Passing When a function is applied, the function parameters are local names that get bound to the passed input objects. Python passes the supplied objects to the function "as is" without any extra copying. Care is required if mutable objects, such as lists or dictionaries, are passed. If changes are made, those changes are reflected in the original object. Here is an example:

    def square(items):
        for i, x in enumerate(items):
            items[i] = x * x    # Modify items in-place

    a = [1, 2, 3, 4, 5]
    square(a)                   # Changes a to [1, 4, 9, 16, 25]
Functions that mutate their input values or change the state of other parts of the program behind the scenes are said to have "side effects". As a general rule, side effects are best avoided, because they can become a source of subtle programming errors as programs grow in size and complexity (for example, it's not obvious from reading a function call whether or not the function has side effects). Such functions also interact poorly with programs involving threads and concurrency, since side effects typically need to be protected by locks. It's important to distinguish between modifying an object and reassigning a variable name. For example, consider this function:

    def sum_squares(items):
        items = [x*x for x in items]    # Rebind the name "items"
        return sum(items)

    a = [1, 2, 3, 4, 5]
    result = sum_squares(a)
    print(a)          # [1, 2, 3, 4, 5]  (Unchanged)
In this example, it looks as if the sum_squares() function might be overwriting the passed items variable. Yes, the local label items is reassigned to a new value. However, the original input value (a) is not changed by that operation. Instead, the local variable name items is bound to a completely different object: the result of the inner list comprehension. There is a difference between assigning a variable name and modifying an object. When you assign a value to a name, you are not overwriting the object that was already there; you are just reassigning the name to a different object. Stylistically, it is common for functions with side effects to return None as a result. As an example, consider the sort() method of a list:

    >>> items = [10, 3, 2, 9, 5]
    >>> items.sort()    # Observe: no return value
    >>> items
    [2, 3, 5, 9, 10]
    >>>
The sort() method performs an in-place sort of list items. It returns no result. The lack of a result is a strong indication of a side effect; in this case, the list items got rearranged. Sometimes you already have data in a sequence or a mapping that you'd like to pass to a function. To do this, you can use * and ** in function invocations. For example:

    def func(x, y, z):
        ...

    s = (1, 2, 3)
    # Pass a sequence as arguments
    result = func(*s)

    # Pass a mapping as keyword arguments
    d = { 'x': 1, 'y': 2, 'z': 3 }
    result = func(**d)
If you take data from multiple sources, or even supply some of the arguments explicitly, everything works as long as the function gets all of its required arguments with no duplication and everything aligns properly in its calling signature. You can even use * and ** more than once in the same function call. If you leave out an argument or supply duplicate values for an argument, you'll get an error. Python never lets you call a function with arguments that don't match its signature.
Return Values The return statement returns a value from a function. If no value is specified, or you omit the return statement, None is returned. To return multiple values, place them in a tuple:

    def parse_value(text):
        '''
        Split text of the form name=value into (name, value)
        '''
        parts = text.split('=', 1)
        return (parts[0].strip(), parts[1].strip())
Values returned in a tuple can be unpacked to individual variables:

    name, value = parse_value('url=http://www.python.org')
Sometimes named tuples are used as an alternative:

    from typing import NamedTuple

    class ParseResult(NamedTuple):
        name: str
        value: str

    def parse_value(text):
        '''
        Split text of the form name=val into (name, val)
        '''
        parts = text.split('=', 1)
        return ParseResult(parts[0].strip(), parts[1].strip())
A named tuple works the same way as a normal tuple (the same operations and unpacking work), but you can also reference the returned values with named attributes:

    r = parse_value('url=http://www.python.org')
    print(r.name, r.value)
Error Handling
An issue with the parse_value() function of the previous section concerns error handling. What should you do if the input text is malformed and no proper result can be returned? One approach is to treat the result as optional; that is, the function either returns an answer or returns None, which is commonly used to indicate a missing value. For example, the function could be modified like this:

    def parse_value(text):
        parts = text.split('=', 1)
        if len(parts) == 2:
            return ParseResult(parts[0].strip(), parts[1].strip())
        else:
            return None
With this design, the burden of checking for the optional result is placed on the caller:

    result = parse_value(text)
    if result:
        name, value = result
Or, more compactly in Python 3.8+:

    if result := parse_value(text):
        name, value = result
Instead of returning None, you could treat malformed text as an error by raising an exception. For example:

    def parse_value(text):
        parts = text.split('=', 1)
        if len(parts) == 2:
            return ParseResult(parts[0].strip(), parts[1].strip())
        else:
            raise ValueError('Bad value')
In this case, the caller is given the option of handling bad values with try-except. For example:

    try:
        name, value = parse_value(text)
        ...
    except ValueError:
        ...
The choice of whether or not to use an exception is not always clear-cut. As a general rule, exceptions are the more common way to handle an abnormal result. However, exceptions are also expensive if they occur often. If you're writing code where performance matters, returning None, False, -1, or some other special value to indicate failure might be better.
Scoping Rules Each time a function executes, a local namespace is created. This namespace is an environment that contains the names and values of the function parameters as well as all variables that are assigned inside the function body. The binding of names is known in advance when a function is defined, and all names assigned within the function body are bound to the local environment. All other names that are used but not assigned in the function body (the free variables) are dynamically looked up in the global namespace, which is always the enclosing module in which a function was defined. Two types of name-related errors can occur during function execution. Looking up an undefined free variable name in the global environment results in a NameError exception. Looking up a local variable that hasn't yet been assigned a value results in an UnboundLocalError exception. This latter error is often a result of control-flow bugs. For example:

    def func(x):
        if x > 0:
            y = 42
        return x + y    # y not assigned if the conditional is false

    func(10)     # Returns 52
    func(-10)    # UnboundLocalError: y referenced before assignment
UnboundLocalError is also sometimes caused by a careless use of in-place assignment operators. A statement such as n += 1 is treated as n = n + 1. If used before n is assigned an initial value, it fails:

    def func():
        n += 1    # Error: UnboundLocalError
It's important to stress that variable names never change their scope: they are either global variables or local variables, and this is determined at function definition time. Here's an example that illustrates this:

    x = 42
    def func():
        print(x)    # Fails. UnboundLocalError
        x = 13

    func()
In this example, it might look as though the print() function would print the value of the global variable x. However, the assignment of x that appears later marks x as a local variable. The error is a result of accessing a local variable that hasn't yet been assigned a value. If you remove the print() function, you get code that looks like it might be reassigning the value of a global variable. For example, consider this:

    x = 42
    def func():
        x = 13

    func()
    # x is still 42
When this code executes, x retains its value of 42, despite the appearance that the code might be modifying the global variable x inside the function func. When variables are assigned inside a function, they're always bound as local variables; as a result, the variable x in the function body refers to an entirely new object containing the value 13, not the outer variable. To alter this behavior, use the global statement. global declares names as belonging to the global namespace, and it's necessary when you need to modify a global variable. Here's an example:

    x = 42
    y = 37

    def func():
        global x     # 'x' is in the global namespace
        x = 13
        y = 0

    func()
    # x is now 13. y is still 37.
It should be noted that use of the global statement is usually considered poor Python style. If you're writing code where a function needs to mutate state behind the scenes, consider using a class definition and modify the state by mutating an instance or a class variable instead. For example:

    class Config:
        x = 42

    def func():
        Config.x = 13
Python allows nested function definitions. Here's an example:

    def countdown(start):
        n = start
        def display():              # Nested function definition
            print('T-minus', n)
        while n > 0:
            display()
            n -= 1
Variables in nested functions are bound using lexical scoping. That is, names are resolved by first checking the local scope and then all enclosing scopes, from the innermost scope to the outermost scope. Again, this is not a dynamic process: the binding of names is determined once at function definition time based on syntax. As with global variables, inner functions can't reassign the value of a local variable defined in an outer function. For example, this code does not work:

    def countdown(start):
        n = start
        def display():
            print('T-minus', n)
        def decrement():
            n -= 1          # Fails: UnboundLocalError
        while n > 0:
            display()
            decrement()
To fix this, you can declare n as nonlocal like this:

    def countdown(start):
        n = start
        def display():
            print('T-minus', n)
        def decrement():
            nonlocal n
            n -= 1          # Modifies the outer n
        while n > 0:
            display()
            decrement()
nonlocal cannot be used to refer to a global variable; it must refer to a local variable in an outer scope. Thus, if a function is assigning to a global, you should still use the global declaration as described earlier.
Use of nested functions and nonlocal declarations is not a common programming style. For example, inner functions have no outside visibility, which can complicate testing and debugging. Nevertheless, nested functions are sometimes useful for breaking complex calculations into smaller parts and for hiding internal implementation details.
Recursion Python supports recursive functions. For example (named sumn here to avoid shadowing the built-in sum()):

    def sumn(n):
        if n == 0:
            return 0
        else:
            return n + sumn(n - 1)
However, there is a limit on the depth of recursive function calls. The function sys.getrecursionlimit() returns the current maximum recursion depth, and the function sys.setrecursionlimit() can be used to change the value. The default value is 1000. Although it is possible to increase the value, programs are still limited by the stack size limits enforced by the host operating system. When the recursion depth is exceeded, a RuntimeError exception is raised. If the limit is increased too much, Python may crash with a segmentation fault or another operating system error. In practice, issues with the recursion limit only appear when you work with deeply nested recursive data structures such as trees and graphs. Many algorithms involving trees naturally lend themselves to recursive solutions, and if your data structure is too large, you may blow the stack limit. However, there are some clever workarounds; see Chapter 6 on generators for an example.
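For example, you can inspect and raise the limit like this (use with care; the operating system stack limit still applies):

    import sys

    print(sys.getrecursionlimit())    # 1000 by default
    sys.setrecursionlimit(4000)       # Allow deeper recursion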
The lambda Expression
An anonymous (that is, unnamed) function can be defined with a lambda expression:

lambda args: expression
args is a comma-separated list of arguments, and expression is an expression involving those arguments. Here's an example:
a = lambda x, y: x + y
r = a(2, 3)       # r gets 5
The code defined with lambda must be a valid expression. Multiple statements and other non-expression statements, such as try and while, cannot appear in a lambda expression. lambda expressions follow the same scoping rules as functions. One of the main uses of lambda is to define small callback functions. For example, you might see it used with built-in operations such as sorted():
# Sort a list of words by the number of unique letters
result = sorted(words, key=lambda word: len(set(word)))
Caution is advised when a lambda expression contains free variables (not specified as parameters). Consider this example:

x = 2
f = lambda y: x * y
x = 3
g = lambda y: x * y
print(f(10))     # --> prints 30
print(g(10))     # --> prints 30
In this example, you might expect the call f(10) to return 20, reflecting the fact that x was 2 at the time of definition. That's not the case. Because x is a free variable, the evaluation of f(10) uses whatever value x has at the time of evaluation, which could differ from the value it had when the lambda was defined. Sometimes this behavior is referred to as late binding. If it's important to capture the value of a variable at the time of definition, use a default argument:

x = 2
f = lambda y, x=x: x * y
x = 3
g = lambda y, x=x: x * y
print(f(10))     # --> prints 20
print(g(10))     # --> prints 30
This works because default argument values are evaluated at function definition time, thereby capturing the value that x holds at that moment.
Higher-Order Functions
Python supports the concept of higher-order functions. This means that functions can be passed as arguments to other functions, placed in data structures, and returned by a function as a result. Functions are said to be first-class objects, meaning there is no difference between how you handle a function and any other kind of data. Here is an example of a function that accepts another function as input and calls it after a time delay, perhaps in an attempt to emulate the performance of a microservice in "the cloud":

import time

def after(seconds, func):
    time.sleep(seconds)
    func()

# Example use
def hello():
    print('Hello World')

after(10, hello)
# After 10 seconds, prints 'Hello World'
In this example, the func argument to after() is an example of what is sometimes called a callback function. This refers to the fact that the after() function "calls back" to the function supplied as an argument. When a function is passed as data, it implicitly carries information related to the environment in which it was defined. For example, suppose the greeting() function makes use of a variable like this:

def main():
    name = 'Guido'
    def greeting():
        print('Hello', name)
    after(10, greeting)      # Prints 'Hello Guido'

main()
In this example, greeting() uses the variable name, which is a local variable of the enclosing function main(). When greeting is passed to after(), the function remembers its environment and uses the value of the required name variable. This relies on a feature known as a closure. A closure is a function together with an environment containing all of the variables needed to execute the function body. Nested functions and closures are often useful when you want to write code based on the concept of lazy or delayed evaluation. The after() function shown above is an illustration of this concept. It receives a function that is not evaluated right away; instead, it executes at some later point in time. This is a common programming pattern that arises in other contexts. For example, a program might have functions that only execute in response to events (a key press, mouse movement, arrival of a network packet, and so on). In all of these cases, evaluation of a function is delayed until something interesting happens. When the function finally executes, a closure makes sure it gets everything it needs. You can also write functions that create and return other functions. For example:

def make_greeting(name):
    def greeting():
        print('Hello', name)
    return greeting
f = make_greeting('Guido')
g = make_greeting('Ada')

f()     # Prints 'Hello Guido'
g()     # Prints 'Hello Ada'
In this example, the make_greeting() function doesn't carry out any interesting computation. Instead, it creates and returns a function greeting() that does the actual work. That only happens when the returned function is evaluated later. In this example, the two variables f and g hold two different versions of the greeting() function. Even though the make_greeting() function that created them is no longer executing, the greeting() functions still remember the name variable that was defined; it's part of each function's closure. One caveat with closures is that binding to variable names is not a "snapshot" but a dynamic process, meaning the closure points to the name variable and the value that was most recently assigned to it. This is subtle, but here's an example that illustrates where problems can arise:

def make_greetings(names):
    funcs = []
    for name in names:
        funcs.append(lambda: print('Hello', name))
    return funcs

# Try it
a, b, c = make_greetings(['Guido', 'Ada', 'Margaret'])
a()     # Prints 'Hello Margaret'
b()     # Prints 'Hello Margaret'
c()     # Prints 'Hello Margaret'
In this example, a list of different functions is created (using lambda), and it might appear that each uses a unique value of name, since it changes on each iteration of the for loop. This is not the case: all of the functions end up using the same value of name, the one that name holds when the outer function make_greetings() returns. This is probably unexpected, and usually not what you want. If you want to capture a copy of a variable, remember to capture it as a default argument, as described earlier:

def make_greetings(names):
    funcs = []
    for name in names:
        funcs.append(lambda name=name: print('Hello', name))
    return funcs

# Try it
a, b, c = make_greetings(['Guido', 'Ada', 'Margaret'])
a()     # Prints 'Hello Guido'
b()     # Prints 'Hello Ada'
c()     # Prints 'Hello Margaret'
In these last two examples, the functions were defined using lambda. This is often used as a shortcut for creating small callback functions. However, it is not a strict requirement. You could also have rewritten the code like this:

def make_greetings(names):
    funcs = []
    for name in names:
        def greeting(name=name):
            print('Hello', name)
        funcs.append(greeting)
    return funcs
The choice of when and where to use lambda is usually a matter of personal preference and code clarity. If it makes the code harder to read, it should probably be avoided.
Passing Arguments in Callback Functions
One challenging programming problem with callback functions is that of passing arguments to the supplied function. Consider the after() function written earlier:

import time

def after(seconds, func):
    time.sleep(seconds)
    func()
In this code, func() is hardwired to be called with no arguments. If you want to pass extra arguments, you're out of luck. For example, you might try this:

def add(x, y):
    print(f'{x} + {y} -> {x+y}')
    return x + y

after(10, add(2, 3))    # Fails: add() is called immediately
In this example, the call add(2, 3) executes immediately and returns 5. The after() function then fails 10 seconds later when it tries to execute 5(). That's definitely not what you intended. Yet there seems to be no obvious way to make it work if the goal is to call add() with your desired arguments. This is indicative of a larger design issue concerning the use of functions and functional programming in general, namely the problem of function composition. When functions are mixed together in various ways, you need to think about how function inputs and outputs connect together. It's not always as simple as it seems.
In this case, one solution is to use lambda to wrap the computation in a zero-argument function. For example:

after(10, lambda: add(2, 3))
A small zero-argument function like this is sometimes known as a "thunk." Basically, it's an expression whose evaluation is deferred until it is eventually called as a zero-argument function. This can be a general-purpose way to delay the evaluation of any expression until a later point in time: put the expression in a lambda and call the function when you actually need the value. As an alternative to lambda, you can use functools.partial() to create a partially evaluated function like this:

from functools import partial

after(10, partial(add, 2, 3))
partial() creates a callable where one or more of the arguments have already been specified and are cached. This can be a useful way to make non-conforming functions match expected calling signatures in callbacks and other applications. Here are some further examples of using partial():
def func(a, b, c, d):
    print(a, b, c, d)

f = partial(func, 1, 2)          # Fix a=1, b=2
f(3, 4)                          # func(1, 2, 3, 4)
f(10, 20)                        # func(1, 2, 10, 20)

g = partial(func, 1, 2, d=4)     # Fix a=1, b=2, d=4
g(3)                             # func(1, 2, 3, 4)
g(10)                            # func(1, 2, 10, 4)
Both partial() and lambda can be used for a similar purpose, but there is an important semantic distinction between the two techniques. With partial(), the arguments are evaluated and bound at the time the partial function is defined. With a zero-argument lambda, the arguments are evaluated and bound when the lambda function actually executes later (everything is delayed). This can be illustrated with the following example:
>>> def func(x, y):
...     return x + y
...
>>> a = 2
>>> b = 3
>>> f = lambda: func(a, b)
>>> g = partial(func, a, b)
>>> a = 10
>>> b = 20
>>> f()      # Uses the current values of a, b
30
>>> g()      # Uses the initial values of a, b
5
>>>
Because partials are fully evaluated, the callables created by partial() are objects that can be serialized into bytes, saved in files, and even transmitted across network connections (for example, using the pickle standard library module). This is not possible with a lambda function. Thus, in applications where functions are passed around, possibly to Python interpreters running in different processes or on different machines, you'll find partial() to be a bit more adaptable. As an aside, partial function application is closely related to a concept known as currying. Currying is a functional programming technique where a multiple-argument function is expressed as a chain of nested single-argument functions. Here is an example:

# Function of three arguments
def f(x, y, z):
    return x + y + z

# Curried version
def fc(x):
    return lambda y: (lambda z: x + y + z)

# Example use
a = f(2, 3, 4)       # Three-argument function
b = fc(2)(3)(4)      # Curried version
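To illustrate the earlier point about serialization, here is a small sketch (using a hypothetical module-level add() function) showing that a partial can be pickled while an equivalent lambda cannot:

import pickle
from functools import partial

def add(x, y):
    return x + y

p = partial(add, 2, 3)
data = pickle.dumps(p)           # Works: partials can be pickled
restored = pickle.loads(data)
print(restored())                # Prints 5

try:
    pickle.dumps(lambda: add(2, 3))
except Exception as e:
    print('lambda fails:', e)    # Lambdas can't be pickled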
Currying is not a common Python programming style and there are few practical reasons for using it. However, sometimes you'll hear the word "currying" thrown around in conversations with programmers who've spent too much time warping their brains with things like lambda calculus. This technique of handling multiple arguments is named in honor of the famous logician Haskell Curry. Knowing what it is might come in handy should you ever encounter a group of functional programmers having a heated argument at a social event. Coming back to the original problem of argument passing, another option for passing arguments to a callback is to accept them separately as arguments to the outer calling function. Consider this version of the after() function:

def after(seconds, func, *args):
    time.sleep(seconds)
    func(*args)

after(10, add, 2, 3)    # Calls add(2, 3) after 10 seconds
Careful readers will notice that passing keyword arguments to func() is not supported. This is by design. One issue with keyword arguments is that the argument names of the given function might clash with argument names already in use (namely, seconds and func). Keyword arguments might also be reserved for specifying options to the after() function itself. For example:

def after(seconds, func, *args, debug=False):
    time.sleep(seconds)
    if debug:
        print('About to call', func, args)
    func(*args)
All is not lost, however: if you need to pass keyword arguments to func(), you can still do it with partial(). For example:

after(10, partial(add, y=3), 2)
If you want the after() function to accept and pass along keyword arguments, one way to do it safely is to make after()'s own named parameters positional-only or keyword-only so they can't clash with the callback's keywords. For example:

def after(seconds, func, /, *args, debug=False, **kwargs):
    time.sleep(seconds)
    if debug:
        print('About to call', func, args, kwargs)
    func(*args, **kwargs)

after(10, add, 2, y=3)
Another potentially mind-bending insight is realizing that after() really represents two different function calls merged together. Perhaps the problem of passing arguments can be decomposed into two functions like this:

def after(seconds, func, debug=False):
    def call(*args, **kwargs):
        time.sleep(seconds)
        if debug:
            print('About to call', func, args, kwargs)
        func(*args, **kwargs)
    return call

after(10, add)(2, y=3)
Now there is no conflict between the arguments of after() and the arguments of func. However, there is a possibility that this could lead to conflicts between you and your colleagues.
Returning the Results of Callbacks
Another problem not addressed in the previous section is that of returning the results of calculation. Consider this modified after() function:

def after(seconds, func, *args):
    time.sleep(seconds)
    return func(*args)
This works, but there are some very subtle corner cases that arise from the fact that two separate functions are involved: the after() function itself and the supplied callback func.
One issue concerns exception handling. For example, try these two examples:

after("1", add, 2, 3)   # Fails: TypeError (an integer is expected)
after(1, add, "2", 3)   # Fails: TypeError (can't concatenate int to str)
In this example, a TypeError is raised in both cases, but for very different reasons and in different functions. The first error is due to a problem in the after() function itself: a bad argument is being given to time.sleep(). The second error is due to a problem with the execution of the callback func(*args). If it's important to distinguish between these two cases, there are a few options. One option is to rely on chained exceptions. The idea is to package errors from the callback in a different way that allows them to be handled separately from other kinds of errors. For example:

class CallbackError(Exception):
    pass

def after(seconds, func, *args):
    time.sleep(seconds)
    try:
        return func(*args)
    except Exception as err:
        raise CallbackError('Callback function failed') from err
This modified code isolates errors from the supplied callback into their own exception category. You use it like this:

try:
    r = after(delay, add, x, y)
except CallbackError as err:
    print("It failed. Reason:", err.__cause__)
If there was a problem with the execution of after() itself, that exception propagates out, uncaught. On the other hand, problems related to the execution of the supplied callback are caught and reported as a CallbackError. All of this is pretty subtle, but in practice, error handling is hard. This approach makes the assignment of blame more precise and the behavior of after() easier to document. Specifically, if there is a problem in the callback, it is always reported as a CallbackError. Another option is to package the result of the callback function into some kind of result instance that holds both a value and an error. For example, you could define a class like this:

class Result:
    def __init__(self, value=None, exc=None):
        self._value = value
        self._exc = exc

    def result(self):
        if self._exc:
            raise self._exc
        else:
            return self._value
Then use this class to return results from the after() function:

def after(seconds, func, *args):
    time.sleep(seconds)
    try:
        return Result(value=func(*args))
    except Exception as err:
        return Result(exc=err)

# Example use:
r = after(1, add, 2, 3)
print(r.result())           # Prints 5

s = after("1", add, 2, 3)   # Immediately raises TypeError. Bad sleep() argument.

t = after(1, add, "2", 3)   # Returns a Result
print(t.result())           # Raises TypeError
This second approach works by deferring the reporting of the callback results to a separate step. If there is a problem with after(), it is reported immediately. If there is a problem with the callback func(), it is reported when a user tries to obtain the result by invoking the result() method. This style of boxing a result into a special instance and unpacking it later is an increasingly common pattern found in modern programming languages. One reason for its use is that it facilitates type checking. For example, if you were to put a type hint on after(), its behavior is fully defined: it always returns a Result and nothing else.

def after(seconds, func, *args) -> Result:
    ...
Although it's not so common to see this kind of pattern in most Python code, it does arise with some regularity when working with concurrency primitives such as threads and processes. For example, instances of a Future behave like this when working with thread pools. For example:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(16)
r = pool.submit(add, 2, 3)    # Returns a Future
print(r.result())             # Unwraps the Future result
Decorators
A decorator is a function that creates a wrapper around another function. The primary purpose of this wrapping is to alter or enhance the behavior of the wrapped object. Syntactically, decorators are denoted using the special @ symbol as follows:

@decorate
def func(x):
    ...
The preceding code is shorthand for the following:

def func(x):
    ...

func = decorate(func)
In the example, a function func() is defined. However, immediately after its definition, the function object itself is passed to the function decorate(), which returns an object that replaces the original func. As an example of a concrete implementation, here is a decorator @trace that adds debugging messages to a function:

def trace(func):
    def call(*args, **kwargs):
        print('Calling', func.__name__)
        return func(*args, **kwargs)
    return call

# Example use
@trace
def square(x):
    return x * x
In this code, trace() creates a wrapper function that writes some debugging output and then calls the original function object. Thus, if you call square(), you will see the output of the print() in the wrapper. If only it were this simple! In practice, functions also contain metadata such as the function name, docstring, and type hints. If you put a wrapper around a function, this information is hidden. When writing a decorator, it's considered best practice to use the @wraps() decorator as shown in this example:

from functools import wraps

def trace(func):
    @wraps(func)
    def call(*args, **kwargs):
        print('Calling', func.__name__)
        return func(*args, **kwargs)
    return call
The @wraps() decorator copies various function metadata to the replacement function. In this case, metadata from the given function func() is copied to the returned wrapper function call(). When decorators are applied, they must appear on their own line immediately prior to the function. More than one decorator can also be applied. Here's an example:

@decorator1
@decorator2
def func(x):
    pass
In this case, the decorators are applied as follows:

def func(x):
    pass

func = decorator1(decorator2(func))
The order in which decorators appear might matter. For example, in class definitions, decorators such as @classmethod and @staticmethod often have to be placed at the outermost level. For example:

class SomeClass(object):
    @classmethod      # Yes
    @trace
    def a(cls):
        pass

    @trace            # No. Fails.
    @classmethod
    def b(cls):
        pass
The reason for this placement restriction has to do with the values returned by @classmethod. Sometimes a decorator returns an object that's different than a normal function. If the outermost decorator doesn't anticipate this, things can break. In this case, @classmethod creates a classmethod descriptor object, as described in Chapter 7. Unless the @trace decorator was written to account for this, it will fail if the decorators are listed in the wrong order.
A decorator can also accept arguments. Suppose you want to change the @trace decorator to allow for a custom message like this:

@trace("You called {func.__name__}")
def func():
    pass
If arguments are supplied, the semantics of the decoration process are as follows:

def func():
    pass

# Create the decoration function
temp = trace("You called {func.__name__}")
# Apply it to func
func = temp(func)
In this case, the outermost function that accepts the arguments is responsible for creating a decoration function. That function is then called with the function to be decorated to obtain the final result. Here's what the decorator implementation might look like:

from functools import wraps

def trace(message):
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            print(message.format(func=func))
            return func(*args, **kwargs)
        return wrapper
    return decorate
An interesting feature of this implementation is that the outer function is actually a kind of "decorator factory." Suppose you found yourself writing code like this:

@trace('You called {func.__name__}')
def func1():
    pass

@trace('You called {func.__name__}')
def func2():
    pass
That would quickly get tedious. You could simplify it by calling the outer decorator function once and reusing the result like this:

logged = trace('You called {func.__name__}')

@logged
def func1():
    pass

@logged
def func2():
    pass
Decorators don't necessarily have to replace the original function. Sometimes a decorator merely performs an action such as registration. For example, if you're building a registry of event handlers, you could define a decorator that works like this:

@eventhandler('BUTTON')
def handle_button(msg):
    ...

@eventhandler('RESET')
def handle_reset(msg):
    ...
Here's a decorator that manages it:

# Event handler decorator
_event_handlers = { }

def eventhandler(event):
    def register_function(func):
        _event_handlers[event] = func
        return func
    return register_function
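The decorator only records the function and returns it unchanged. A dispatching step is not shown above; as a rough sketch (the dispatch() helper here is hypothetical, not part of the original example), events might be routed like this:

@eventhandler('BUTTON')
def handle_button(msg):
    print('Button pressed:', msg)

def dispatch(event, msg):
    # Look up the registered handler, if any, and invoke it
    func = _event_handlers.get(event)
    if func:
        func(msg)

dispatch('BUTTON', 'hello')    # Prints: Button pressed: hello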
Map, Filter, and Reduce
Programmers familiar with functional languages often ask about common list operations such as map, filter, and reduce. Much of this functionality is provided by list comprehensions and generator expressions. For example:

def square(x):
    return x * x

nums = [1, 2, 3, 4, 5]
squares = [square(x) for x in nums]    # [1, 4, 9, 16, 25]
Technically, you don't even need the short one-line function. You could write this instead:

squares = [x * x for x in nums]
Filtering can also be performed with a list comprehension:

a = [x for x in nums if x > 2]    # [3, 4, 5]
If you use a generator expression, you get a generator that produces the results incrementally through iteration. For example:

squares = (x*x for x in nums)     # Creates a generator
for n in squares:
    print(n)
Python provides a built-in map() function that is the same as mapping a function with a generator expression. For example, the above example could be written like this:

squares = map(lambda x: x*x, nums)
for n in squares:
    print(n)
The built-in filter() function creates a generator that filters values:

for n in filter(lambda x: x > 2, nums):
    print(n)
If you want to accumulate or reduce values, you can use functools.reduce(). For example:

from functools import reduce

total = reduce(lambda x, y: x + y, nums)
In its general form, reduce() accepts a two-argument function, an iterable, and an initial value. Here are some examples:

nums = [1, 2, 3, 4, 5]
total = reduce(lambda x, y: x + y, nums)          # 15
product = reduce(lambda x, y: x * y, nums, 1)     # 120
pairs = reduce(lambda x, y: (x, y), nums, None)
# (((((None, 1), 2), 3), 4), 5)
reduce() accumulates values left-to-right across the supplied iterable. This is sometimes known as a "fold-left" operation. Here is pseudocode for reduce(func, items, initial):

def reduce(func, items, initial):
    result = initial
    for item in items:
        result = func(result, item)
    return result
In practice, using reduce() can lead to confusing code. Moreover, common reduction operations such as sum(), min(), and max() are already built in. Your code will be easier to follow (and likely run faster) if you use one of those instead of trying to implement common operations with reduce().
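For instance, the built-ins handle the common cases directly:

nums = [1, 2, 3, 4, 5]
total = sum(nums)       # 15, clearer than reduce(lambda x, y: x + y, nums)
smallest = min(nums)    # 1
largest = max(nums)     # 5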
Introspecting Functions, Attributes, and Signatures
As you have seen, functions are objects, which means they can be assigned to variables, placed in data structures, and used in the same way as any other kind of data in a program. They can also be inspected in various ways. Table 1 shows some common attributes of functions. Many of these attributes are useful during debugging, logging, and other operations involving functions.

Table 1: Function Attributes
The f.__name__ attribute contains the name that was used when defining a function. f.__qualname__ is a longer name that includes additional information about the surrounding definition environment. The f.__module__ attribute is a string that holds the name of the module in which the function was defined. The f.__globals__ attribute is a dictionary that serves as the global namespace for the function. It is normally the same dictionary that's attached to the associated module object. f.__doc__ holds the function's documentation string. The f.__annotations__ attribute is a dictionary that holds type hints, if any.
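A quick interactive check (assuming the function is defined at the top level of a script) shows a few of these attributes in action:

def square(x):
    'Return the square of x'
    return x * x

print(square.__name__)      # Prints 'square'
print(square.__doc__)       # Prints 'Return the square of x'
print(square.__module__)    # Prints '__main__' when run as a script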
The f.__closure__ attribute holds references to the values of closure variables for nested functions. These are a bit buried, but the following example shows how to view them:

def add(x, y):
    def do_add():
        return x + y
    return do_add

>>> a = add(2, 3)
>>> a.__closure__
(<cell at 0x...: int object at 0x...>, <cell at 0x...: int object at 0x...>)
>>> a.__closure__[0].cell_contents
2
>>>
The f.__code__ object represents the compiled interpreter bytecode for the function body. Functions can have arbitrary attributes attached to them. Here's an example:

def func():
    statements

func.secure = 1
func.private = 1
Attributes are not visible within the function body; they are not local variables and they don't appear as names in the execution environment. The main use of function attributes is to store extra metadata. Sometimes frameworks or various metaprogramming techniques utilize function tagging, that is, attaching attributes to functions. One example is the @abstractmethod decorator that's used on methods within abstract base classes. All that decorator does is attach an attribute:

def abstractmethod(func):
    func.__isabstractmethod__ = True
    return func
Some other bit of code (a metaclass, in this case) looks for this attribute and uses it to add extra checks to instance creation. If you want to know more about a function's parameters, you can obtain its signature using the inspect.signature() function. For example:

import inspect

def func(x: int, y: float, debug=False) -> float:
    pass

sig = inspect.signature(func)
Signature objects provide many convenient features for printing and obtaining detailed information about the parameters. For example:

# Print the signature in a nice form
print(sig)     # Produces (x: int, y: float, debug=False) -> float

# Get a list of argument names
print(list(sig.parameters))    # Produces ['x', 'y', 'debug']

# Iterate over the parameters and print various metadata
for p in sig.parameters.values():
    print('name', p.name)
    print('annotation', p.annotation)
    print('kind', p.kind)
    print('default', p.default)
A signature is metadata that describes the nature of a function: how you would call it, type hints, and so on. There are various things that you can do with a signature. One useful signature operation is comparison. For example, here's how to check if two functions have the same signature:

def func1(x, y):
    pass

def func2(x, y):
    pass

assert inspect.signature(func1) == inspect.signature(func2)
This kind of comparison might be useful in frameworks. For example, a framework could use signature comparison to see if you're writing functions or methods that conform to an expected prototype. If stored in the __signature__ attribute of a function, a signature will show up in help messages and be returned on further uses of inspect.signature(). For example:
def func(x, y, z=None):
    ...

func.__signature__ = inspect.signature(lambda x, y: None)
In this example, further introspection of func would hide the optional argument z. Instead, the attached signature is what would be returned by inspect.signature().
Environment Inspection
Functions can inspect their execution environment using the built-in functions globals() and locals(). globals() returns the dictionary that serves as the global namespace. This is the same as the func.__globals__ attribute, and it's usually the same dictionary that holds the contents of the enclosing module. locals() returns a dictionary containing the values of all local and closure variables. This dictionary is not the actual data structure used to hold these variables. Local variables might come from outer functions (via a closure) or be defined internally. locals() collects all of these variables and puts them into a dictionary for you. Changing an item in the locals() dictionary has no effect on the underlying variable. For example:

def func():
    y = 20
    locs = locals()
    locs['y'] = 30       # Try changing y
    print(locs['y'])     # Prints 30
    print(y)             # Prints 20
If you wanted a change to take effect, you'd have to copy it back into the local variable using normal assignment:

def func():
    y = 20
    locs = locals()
    locs['y'] = 30
    y = locs['y']
A function can obtain its own stack frame using inspect.currentframe(). A function can obtain the stack frame of its caller by following the chain of frames via the frame's f_back attribute. Here's an example:

import inspect

def spam(x, y):
    z = x + y
    grok(z)

def grok(a):
    b = a * 10
    # Outputs: {'a': 5, 'b': 50}
    print(inspect.currentframe().f_locals)
    # Outputs: {'x': 2, 'y': 3, 'z': 5}
    print(inspect.currentframe().f_back.f_locals)

spam(2, 3)
Sometimes you will see stack frames obtained using the sys._getframe() function instead. For example:

import sys

def grok(a):
    b = a * 10
    print(sys._getframe(0).f_locals)    # myself
    print(sys._getframe(1).f_locals)    # my caller
The attributes in Table 2 can be useful when working with frames.

Table 2: Frame Attributes
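As a small sketch of a few standard frame attributes in use (the function name here is invented for illustration):

import inspect

def where_am_i():
    f = inspect.currentframe()
    print(f.f_code.co_name)    # Prints 'where_am_i'
    print(f.f_lineno)          # Line number currently being executed
    print(list(f.f_locals))    # Names of local variables (includes 'f')

where_am_i()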
Looking at stack frames is potentially useful for debugging and code inspection. For example, here's an interesting debugging function that lets you view the values of selected variables of the caller:

import inspect
from collections import ChainMap

def debug(*varnames):
    f = inspect.currentframe().f_back
    vars = ChainMap(f.f_locals, f.f_globals)
    print(f'{f.f_code.co_filename}:{f.f_lineno}')
    for name in varnames:
        print(f'    {name} = {vars[name]!r}')

# Example use
def func(x, y):
    z = x + y
    debug('x', 'y')     # Shows x and y along with file/line
    return z
Dynamic Code Execution and Creation
The exec(str [, globals [, locals]]) function executes a string containing arbitrary Python code. The code supplied to exec() executes as if the code actually appeared in place of the exec operation. Here's an example:

a = [3, 5, 10, 13]
exec('for i in a: print(i)')
The code given to exec() executes within the local and global namespace of the caller. However, be aware that changes to local variables have no effect. For example:

def func():
    x = 10
    exec("x = 20")
    print(x)      # Prints 10
The reasons for this have to do with locals being a dictionary of collected local variables, not the actual local variables (see the previous section for more details). Optionally, exec() can accept one or two dictionary objects that serve as the global and local namespaces for the code to be executed, respectively. Here's an example:

globs = {'x': 7,
         'y': 10,
         'birds': ['parrot', 'swallow', 'albatross']}
locs = { }

# Execute using the above dictionaries as the global and local namespace
exec('z = 3 * x + 4 * y', globs, locs)
exec('for b in birds: print(b)', globs, locs)
If you omit one or both namespaces, the current values of the global and local namespaces are used. If you only provide a dictionary for globals, it's used for both the globals and locals. A common use of dynamic code execution is for creating functions and methods. For example, here's a function that creates an __init__() method for a class, given a list of names:

def make_init(*names):
    parms = ','.join(names)
    code = f'def __init__(self, {parms}):\n'
    for name in names:
        code += f'    self.{name} = {name}\n'
    d = { }
    exec(code, d)
    return d['__init__']

# Example use
class Vector:
    __init__ = make_init('x', 'y', 'z')
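The generated __init__() behaves like a hand-written one:

v = Vector(1, 2, 3)
print(v.x, v.y, v.z)    # Prints: 1 2 3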
This type of technique is used in various parts of the standard library. For example, namedtuple(), @dataclass, and similar functions rely on building dynamic code with exec().
Asynchronous Functions and await
Python provides a number of language features related to the asynchronous execution of code. These include so-called asynchronous functions (or coroutines) and awaitables. They are mostly associated with programs involving concurrency and the asyncio module. However, other libraries can also build upon them. An asynchronous function, or coroutine function, is defined by prefacing a normal function definition with the extra keyword async. For example:

async def greeting(name):
    print(f'Hello {name}')
If you call such a function, you'll find that it doesn't execute in the usual manner; in fact, it doesn't execute at all. Instead, you get back an instance of a coroutine object. For example:

>>> greeting('Guido')
<coroutine object greeting at 0x...>
>>>
To make the function run, it must execute under the supervision of other code. A common option is asyncio. For example:

>>> import asyncio
>>> asyncio.run(greeting('Guido'))
Hello Guido
>>>
This example illustrates the most important feature of asynchronous functions: they never execute on their own. A manager or library code of some kind is always required to drive them. It's not necessarily asyncio as shown, but something is always involved in making async functions run. Aside from being managed, an asynchronous function evaluates in the same manner as any other Python function. Statements run in order, and all of the usual control-flow features work. If you want to return a result, use the usual return statement. For example:

async def make_greeting(name):
    return f'Hello {name}'
The value given to return is returned by the outer run() function used to execute the asynchronous function. For example:

>>> import asyncio
>>> a = asyncio.run(make_greeting('Paula'))
>>> a
'Hello Paula'
>>>
Asynchronous functions can call other asynchronous functions using an await expression like this:

async def make_greeting(name):
    return f'Hello {name}'

async def main():
    for name in ['Paula', 'Thomas', 'Lewis']:
        a = await make_greeting(name)
        print(a)

# Run it. Will see greetings for Paula, Thomas, and Lewis
asyncio.run(main())
Use of await is only valid within an enclosing asynchronous function definition. It is also a required part of making asynchronous functions execute. If you leave off the await, you'll find that the code breaks. The requirement of using await hints at a general usage issue with asynchronous functions: namely, their different evaluation model prevents them from being used in combination with many other parts of Python. Specifically, it is never possible to write code that directly calls an async function from a non-async function:

async def twice(x):
    return 2 * x

def main():
    print(twice(2))         # Error. Doesn't execute the function
    print(await twice(2))   # Error. Can't use await here
Combining async and non-async functionality in the same application can introduce significant complexity, especially if you consider some of the programming techniques involving higher-order functions, callbacks, and decorators. In most cases, support for asynchronous functions has to be built in as a special case. Python does precisely this for the iterator and context manager protocols. For example, an asynchronous context manager can be defined using __aenter__() and __aexit__() methods in a class like this:

class AsyncManager(object):
    def __init__(self, x):
        self.x = x

    async def yow(self):
        pass

    async def __aenter__(self):
        return self

    async def __aexit__(self, ty, val, tb):
        pass
Note that these methods are async functions and can therefore execute other async functions using await. To use such a manager, you must use the special async with syntax that is only legal within an async function, like this:

# Example use
async def main():
    async with AsyncManager(42) as m:
        await m.yow()

asyncio.run(main())
A class can similarly define an asynchronous iterator by defining the methods __aiter__() and __anext__(). These are used by the async for statement, which likewise may only appear inside an async function. From a practical point of view, know that an asynchronous function behaves exactly like a normal function; it just has to execute within a managed environment such as asyncio. Unless you've made a conscious decision to work in that environment, you should move along and ignore asynchronous functions. You'll be much happier.
Final Words: Thoughts on Functions and Composition
Any system is built as a composition of components. In Python, these components include various kinds of libraries and objects. Underlying everything are functions. Functions are the glue by which a system is put together and the basic mechanism for moving data around. Much of the discussion in this chapter has focused on the nature of functions and their interfaces. How are inputs presented to a function? How are outputs handled? How are errors reported? How can all of these things be more tightly controlled and better understood?
When working on larger projects, it pays to think about how functions interact with one another and about possible sources of complexity. Often, this can mean the difference between an intuitive, easy-to-use API and a disaster.
Generators
Generator functions are one of Python's most interesting and powerful features. Generators are often presented as a convenient way to define new kinds of iteration patterns. However, generators also fundamentally change the whole function execution model. Thus, you can do a lot more with them than merely iterate. This chapter covers generators, generator delegation, generator-based coroutines, and other generator internals.
Generators and yield
If a function uses the yield keyword, it defines an object known as a generator. The primary use of a generator is to produce values for use in iteration. Here's an example:

def countdown(n):
    print('Counting down from', n)
    while n > 0:
        yield n
        n -= 1

# Example use
for x in countdown(10):
    print('T-minus', x)
If you call this function, you will find that none of its code executes. For example:

>>> c = countdown(10)
>>> c
<generator object countdown at 0x...>
>>>
Instead, a generator object is created. The generator object, in turn, only executes the function when you start iterating on it. One way to do that is to call next() on it:

>>> next(c)
Counting down from 10
10
>>> next(c)
9
When next() is invoked, the generator function executes statements until it reaches a yield statement. The yield statement returns a result, at which point execution of the function is suspended until next() is invoked again. While it's suspended, the function retains all of its local variables and execution environment. When resumed, execution continues with the statement following the yield. next() is a shorthand for invoking the __next__() method on a generator. For example, you could also do this:
>>> c.__next__()
8
>>> c.__next__()
7
>>>
Typically, you don't call next() directly on a generator; instead, you use the for statement or some other operation that consumes the items. For example:

for n in countdown(10):
    statements

a = sum(countdown(10))
A generator function produces items until it returns, either by reaching the end of the function or by using a return statement. This raises a StopIteration exception that terminates a for loop. If a generator function returns a non-None value, it is attached to the StopIteration exception. For example, suppose you have this generator function that uses both yield and return:

def func():
    yield 37
    return 42
Here's how the code would execute:

>>> f = func()
>>> f
<generator object func at 0x...>
>>> next(f)
37
>>> next(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: 42
>>>
Observe carefully that the return value is attached to StopIteration. To collect this value, you need to explicitly catch StopIteration and extract the value:

try:
    next(f)
except StopIteration as e:
    value = e.value
Normally, generator functions don't return a value. Generators are almost always consumed by a for loop where there's no way to obtain the exception value. This means the only practical way to get the value is to drive the generator manually with explicit next() calls. Most code involving generators just doesn't do that. A subtle problem with generators concerns the case where a generator function is only partially consumed. For example, consider this code that abandons a loop early:

for n in countdown(10):
    if n == 2:
        break
    statements
In this example, the for loop ends by calling break, and the associated generator never runs to full completion. If it's important for your generator function to perform some kind of cleanup action, make sure you use try-finally or a context manager. For example:

def countdown(n):
    print('Counting down from', n)
    try:
        while n > 0:
            yield n
            n = n - 1
    finally:
        print('Only made it to', n)
Generators are guaranteed to execute the code in the finally block even if the generator is not fully consumed; it will execute when the abandoned generator is garbage-collected. Similarly, any cleanup code involving a context manager is also guaranteed to execute when a generator terminates:

def func(filename):
    with open(filename) as file:
        ...
        yield data
        ...
    # file closed here, even if the generator is abandoned
Properly cleaning up resources is a tricky problem. As long as you use constructs like try-finally or context managers, generators are guaranteed to do the right thing, even if they end prematurely.
Restartable Generators
Normally, a generator function executes only once. For example:

>>> c = countdown(3)
>>> for n in c:
...     print('T-minus', n)
...
T-minus 3
T-minus 2
T-minus 1
>>> for n in c:
...     print('T-minus', n)
...
>>>
If you want an object that allows repeated iteration, define it as a class and make the __iter__() method a generator:

class countdown:
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1
This works because __iter__() creates a new generator on each iteration.
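For example, iterating twice now works as expected:

>>> c = countdown(3)
>>> list(c)
[3, 2, 1]
>>> list(c)     # Works again: __iter__() makes a fresh generator
[3, 2, 1]
>>>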
Generator Delegation
An essential feature of generators is that a function involving yield never executes by itself; it always has to be driven by other code using a for loop or explicit next() calls. This makes it somewhat difficult to write library functions involving yield, because calling a generator function is not enough to make it execute. To address this, the yield from statement can be used. For example:

def countup(stop):
    n = 1
    while n <= stop:
        yield n
        n += 1

def countdown(start):
    n = start
    while n > 0:
        yield n
        n -= 1

def up_and_down(n):
    yield from countup(n)
    yield from countdown(n)

yield from effectively delegates the iteration process to an outer iteration. For example, you would write code like this to drive the iteration:
>>> for x in up_and_down(5):
...     print(x, end=' ')
1 2 3 4 5 5 4 3 2 1
>>>
yield from mainly saves you from having to drive the iteration yourself. Without this feature, you would have to write up_and_down(n) like this:

def up_and_down(n):
    for x in countup(n):
        yield x
    for x in countdown(n):
        yield x
yield from is especially useful when writing code that must recursively iterate through nested iterables. For example, this code flattens nested lists:

def flatten(items):
    for i in items:
        if isinstance(i, list):
            yield from flatten(i)
        else:
            yield i
Here's an example of how this works:

>>> a = [1, 2, [3, [4, 5], 6, 7], 8]
>>> for x in flatten(a):
...     print(x, end=' ')
...
1 2 3 4 5 6 7 8
>>>
One limitation of this implementation is that it is still subject to Python's recursion limit, so it can't handle deeply nested structures. That is addressed in the next section.
Using Generators in Practice
At first glance, it might not be obvious how to use generators for practical problems beyond defining simple iterators. However, generators are particularly effective at structuring various kinds of data handling problems related to pipelines and workflows. One useful application of generators is as a tool for restructuring code that consists of deeply nested for loops and conditionals. Consider this script that searches a directory of Python files for all comments containing the word "spam":

import pathlib
import re

for path in pathlib.Path('.').rglob('*.py'):
    if path.exists():
        with path.open('rt', encoding='latin-1') as file:
            for line in file:
                m = re.match('.*(#.*)$', line)
                if m:
                    comment = m.group(1)
                    if 'spam' in comment:
                        print(comment)
Notice the number of levels of nested control flow and how your eyes hurt just looking at the code. Now, consider this version using generators:

import pathlib
import re

def get_paths(topdir, pattern):
    for path in pathlib.Path(topdir).rglob(pattern):
        if path.exists():
            yield path

def get_files(paths):
    for path in paths:
        with path.open('rt', encoding='latin-1') as file:
            yield file

def get_lines(files):
    for file in files:
        yield from file

def get_comments(lines):
    for line in lines:
        m = re.match('.*(#.*)$', line)
        if m:
            yield m.group(1)

def print_matching(lines, substring):
    for line in lines:
        if substring in line:
            print(line)

paths = get_paths('.', '*.py')
files = get_files(paths)
lines = get_lines(files)
comments = get_comments(lines)
print_matching(comments, 'spam')
This code breaks the problem down into smaller, self-contained components. Each component concerns itself only with a specific task. For example, the get_paths() generator is only concerned with pathnames, the get_files() generator is only concerned with opening files, and so forth. Only at the end are the various generators hooked together into a workflow to solve a problem. Having each component be small and isolated is an interesting abstraction technique. For example, consider the get_comments() generator. As input, it takes any iterable of lines of text. That text could come from almost anywhere: a file, a list, a generator, and so on. As a result, this functionality is far more powerful and adaptable than when it was embedded in the middle of a deeply nested for loop involving files, as before. Generators thus encourage a useful kind of code reuse by breaking problems into small, well-defined computational tasks. Smaller tasks are also easier to reason about, debug, and test. Generators are also useful for altering the normal evaluation rules of function application. Normally, when you apply a function, it executes immediately, producing a result. Generators don't do that. When a generator function is applied, its execution is delayed until some other code invokes next() on it (either explicitly or via a for loop). As an example, consider the generator function for flattening nested lists presented earlier:

def flatten(items):
    for i in items:
        if isinstance(i, list):
            yield from flatten(i)
        else:
            yield i
A problem with this implementation is that it doesn't work with deeply nested structures due to Python's recursion limit. This can be fixed by driving the iteration in a different manner, using a stack. Consider this version:

def flatten(items):
    stack = [ iter(items) ]
    while stack:
        try:
            item = next(stack[-1])
            if isinstance(item, list):
                stack.append(iter(item))
            else:
                yield item
        except StopIteration:
            stack.pop()
This implementation builds an internal stack of iterators. It's not subject to Python's recursion limit because it stores data on an internal list rather than building frames on the interpreter's internal stack. Thus, if you need to flatten a few million layers of some ridiculously deep data structure, you'll find that it works just fine. Do these examples mean that you should rewrite all of your code using wild generator patterns? No. The main point is that the delayed evaluation of generators lets you alter the space-time dimensions of normal function evaluation. There are various real-world scenarios where these techniques can be useful and applied in unexpected ways.
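For instance, here's a quick check (a sketch) showing that the stack-based flatten() above handles nesting far beyond the default recursion limit, where the recursive version would fail:

# Build a list nested 10,000 levels deep
items = [1]
for _ in range(10000):
    items = [items]

print(sum(flatten(items)))    # Prints 1; no RecursionError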
Enhanced Generators and yield Expressions
Within a generator function, the yield statement can also be used as an expression appearing on the right-hand side of an assignment operator. For example:

def receiver():
    print('Ready to receive')
    while True:
        n = yield
        print('Got', n)
A function that uses yield in this manner is sometimes known as an "enhanced generator" or a "generator-based coroutine." Unfortunately, that terminology is a bit imprecise and made even more confusing because "coroutines" are more recently associated with async functions. To avoid that confusion, we'll use the term "enhanced generator" to make clear that we're still talking about standard functions that use yield. A function that uses yield as an expression is still a generator, but its usage is different. Instead of producing values, it executes in response to values sent to it. For example:

>>> r = receiver()
>>> r.send(None)      # Advances to the first yield
Ready to receive
>>> r.send(1)
Got 1
>>> r.send(2)
Got 2
>>> r.send('Hello')
Got Hello
>>>
In this example, the initial call to r.send(None) is necessary to advance the generator so that it executes statements leading to the first yield expression. At that point, the generator suspends, waiting for a value to be sent to it using the send() method of the associated generator object r. The value passed to send() is returned by the yield expression in the generator. Upon receiving a value, a generator executes statements until the next yield is reached. As written, the function runs indefinitely. A generator can be shut down using the close() method, like this:

>>> r.close()
>>> r.send(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
The close() operation raises a GeneratorExit exception inside the generator at the current yield. Normally this causes the generator to terminate silently, but if you want, you can catch it to perform cleanup actions. Once closed, a StopIteration exception is raised if further values are sent to a generator. Exceptions can be raised inside a generator using the throw(ty [, val [, tb]]) method, where ty is the exception type, val is the exception argument (or a tuple of arguments), and tb is an optional traceback. For example:

>>> r = receiver()
>>> r.send(None)
Ready to receive
>>> r.throw(RuntimeError, "Dead")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "receiver.py", line 14, in receiver
    n = yield
RuntimeError: Dead
>>>
An exception raised in either manner propagates from the currently executing yield statement in the generator. A generator can elect to catch the exception and handle it as appropriate. If a generator doesn't handle the exception, it propagates out of the generator to be handled at a higher level.
Applications of Enhanced Generators
Enhanced generators are an odd programming construct. Unlike a simple generator, which naturally feeds a for loop, there is no core language feature that drives an enhanced generator. Why, then, would you ever want a function that requires values to be sent to it? Is it purely academic? Historically, enhanced generators were used extensively in the context of concurrency libraries, especially those based on asynchronous I/O. In that context, they were usually referred to as "coroutines" or "generator-based coroutines." However, much of that functionality has since been folded into Python's async and await features, so there is little practical reason to use yield for that particular use case. That said, there are still some practical applications. Like generators, an enhanced generator can be used to implement different kinds of evaluation and control flow. One example is the @contextmanager decorator found in the contextlib module. For example:

from contextlib import contextmanager

@contextmanager
def manager():
    print("Entering")
    try:
        yield 'somevalue'
    except Exception as e:
        print("An error occurred", e)
    finally:
        print("Leaving")
Here, a generator is being used to glue together the two halves of a context manager. Recall that context managers are defined by objects implementing the following protocol:

class Manager:
    def __enter__(self):
        return somevalue

    def __exit__(self, ty, val, tb):
        if ty:
            # An exception occurred ...
            # Return True if handled, False otherwise
With the @contextmanager generator, everything prior to the yield statement executes when the manager enters (via the __enter__() method). Everything after the yield executes when the manager exits (via the __exit__() method). If an error occurred, it is reported as an exception at the yield statement. Here's an example:

>>> with manager() as val:
...     print(val)
...
Entering
somevalue
Leaving
>>> with manager() as val:
...     print(int(val))
...
Entering
An error occurred invalid literal for int() with base 10: 'somevalue'
Leaving
>>>
To implement this, a wrapper class is used. Here is a simplified implementation that illustrates the basic idea:

class Manager:
    def __init__(self, gen):
        self.gen = gen

    def __enter__(self):
        # Run to the yield
        return self.gen.send(None)

    def __exit__(self, ty, val, tb):
        # Propagate an exception (if any)
        try:
            if ty:
                try:
                    self.gen.throw(ty, val, tb)
                except ty:
                    return False
            else:
                self.gen.send(None)
        except StopIteration:
            return True
Another application of enhanced generators is using functions to encapsulate a "worker" task of some kind. One of the key features of a function call is that it sets up an environment of local variables. Access to local variables is highly optimized, much faster than accessing class and instance attributes. Since generators stay alive until explicitly closed or destroyed, a generator can be used to set up a long-lived task. Here's an example of a generator that receives chunks of bytes and assembles them into lines (the @consumer decorator, assumed to be defined elsewhere, primes the generator by calling send(None) first):

@consumer
def line_receiver():
    data = bytearray()
    line = None
    linecount = 0
    while True:
        part = yield line
        linecount += part.count(b'\n')
        data.extend(part)
        if linecount > 0:
            index = data.index(b'\n')
            line = bytes(data[:index+1])
            data = data[index+1:]
            linecount -= 1
        else:
            line = None
In this example, a generator has been programmed to receive chunks of bytes that are collected into a byte array. If the array contains a newline, a line is extracted and returned. Otherwise, None is returned. Here's an example illustrating how it works:

>>> r = line_receiver()
>>> r.send(b'hello')
>>> r.send(b'world\nit ')
b'helloworld\n'
>>> r.send(b'works!')
>>> r.send(b'\n')
b'it works!\n'
>>>
Similar code could be written as a class, like this:

class LineReceiver:
    def __init__(self):
        self.data = bytearray()
        self.linecount = 0

    def send(self, part):
        self.linecount += part.count(b'\n')
        self.data.extend(part)
        if self.linecount > 0:
            index = self.data.index(b'\n')
            line = bytes(self.data[:index+1])
            self.data = self.data[index+1:]
            self.linecount -= 1
            return line
        else:
            return None
Although writing a class might be more familiar, the code is more complicated in certain ways, and it also runs more slowly. In tests on the author's machine, feeding a large collection of chunks into a receiver is about 40-50% faster with a generator than with this class-based code. Most of those savings come from eliminating instance attribute lookups; local variables are faster.
Although there are many other potential applications, the important point to note is that if you see yield being used in a context that doesn't involve iteration, it's probably using enhanced features such as send() or throw().
Generators and the Bridge to await
A classic use of generator functions has been in libraries related to asynchronous I/O, such as the standard asyncio module. However, starting in Python 3.5, much of this functionality was moved into a different language feature related to async functions and the await statement (see the final part of Chapter 5). The await statement involves interaction with a generator under the covers. Here's an example that illustrates the underlying protocol used by await:

class Awaitable:
    def __await__(self):
        print('About to await')
        yield           # Must be a generator
        print('Resuming')

# Function compatible with await. Returns an "awaitable".
def function():
    return Awaitable()

async def main():
    await function()
Here's how you'd try the code using asyncio:

>>> import asyncio
>>> asyncio.run(main())
About to await
Resuming
>>>
Do you absolutely need to know how this works? Probably not. All of this machinery is normally hidden from view. However, if you ever find yourself using async functions, know that there's a generator function buried somewhere deep inside. You'll eventually find it if you just keep digging the hole of technical debt deep enough.
Final Words: A Brief History of Generators and Looking Forward
Generators are one of Python's more interesting success stories, but they are also part of a larger story concerning iteration. Iteration is one of the most common of all programming tasks. In early versions of Python, iteration was implemented using sequence indexing and the __getitem__() method. This later evolved into the current iteration protocol based on the __iter__() and __next__() methods. Generators soon appeared as a more convenient way to implement an iterator. In modern Python, there is almost no reason to implement an iterator using anything other than a generator. Even for iterable objects that you might define yourself, the __iter__() method is conveniently implemented this way. In later versions of Python, generators took on a new role as "enhanced" features related to coroutines were added, such as the send() and throw() methods. These were no longer limited to iteration but opened up possibilities for using generators in other contexts. Most notably, this formed the basis of many so-called "async" frameworks used for network programming and concurrency. However, as asynchronous programming has evolved, most of this has morphed into the newer features that use the async/await syntax. Thus, it is no longer common to see generator functions used outside the context of iteration, their original purpose. If you find yourself defining a generator function and you're NOT performing iteration, you should probably pause and reconsider your approach. There might be a better or more modern way to accomplish what you're doing.
Classes and Object-Oriented Programming
Classes are used to create new kinds of objects. This chapter covers the details of classes, but is not intended to be an extensive reference on object-oriented programming and design. Some programming patterns common in Python are discussed, as well as ways to customize classes so they behave in interesting ways. The overall structure of this chapter is top-down. First, general concepts and techniques for using classes are described. The material becomes more technical and focused on internal implementation in later parts of the chapter.
Objects
Almost all code in Python involves creating objects and performing actions on them. For example, you can create a string object and manipulate it like this:

>>> s = "Hello World"
>>> s.upper()
'HELLO WORLD'
>>> s.replace('Hello', 'Hello Cruel')
'Hello Cruel World'
>>> s.split()
['Hello', 'World']
>>>
or a list object:

>>> names = ['Paula', 'Thomas']
>>> names.append('Lewis')
>>> names
['Paula', 'Thomas', 'Lewis']
>>> names[1] = 'Tom'
>>>
A key feature of objects is that there is usually some state (the characters of a string, the elements of a list, and so on) along with methods that operate on that state. Methods are invoked on the object itself and appear as functions attached via the dot (.) operator. Objects always have an associated type. You can view it with type() like this:

>>> type(names)
<class 'list'>
>>>
An object is called an instance of its type. For example, names is an instance of list.
The class Statement
New objects are defined with the class statement. A class typically consists of a collection of functions that make up the methods. Here's an example:

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
    def __repr__(self):
        return f'Account({self.owner!r}, {self.balance!r})'
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.balance -= amount
    def inquiry(self):
        return self.balance
It is important to note that a class statement by itself does not create any instances of the class (for example, no accounts are created in the example above). Rather, a class merely holds the methods that will be available to instances created later. You might think of it as a blueprint. The functions defined inside a class are known as methods. An instance method is a function that operates on an instance of the class, which is passed as the first argument. By convention, this argument is called self. In the example above, deposit(), withdraw(), and inquiry() are examples of instance methods. The __init__() and __repr__() methods of the class are examples of so-called "special" or "magic" methods. These methods have special meaning to the interpreter runtime. The __init__() method is used to initialize state when a new instance is created. The __repr__() method returns a string representing an object. Defining this method is optional, but doing so simplifies debugging and makes it easier to view objects from the interactive prompt. A class definition may optionally include a documentation string and type hints. For example:

class Account:
    '''
    A simple bank account
    '''
    owner: str
    balance: float

    def __init__(self, owner:str, balance:float):
        self.owner = owner
        self.balance = balance
    def __repr__(self):
        return f'Account({self.owner!r}, {self.balance!r})'
    def deposit(self, amount:float):
        self.balance += amount
    def withdraw(self, amount:float):
        self.balance -= amount
    def inquiry(self) -> float:
        return self.balance
Type hints do not change any aspect of how a class works. That is, they do not introduce any extra checking or validation. They are pure metadata that may be useful to third-party tools or an IDE, or that may be used by certain advanced programming techniques. They are not used in most of the examples that follow.
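For example, the hints from the class above simply land in the class's __annotations__ dictionary, which any tool can inspect (a fact of the language, shown here as a small aside):

print(Account.__annotations__)
# -> {'owner': <class 'str'>, 'balance': <class 'float'>}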
Instances
Instances of a class are created by invoking a class object as a function. This creates a new instance that is then passed to the __init__() method. The arguments to __init__() consist of the newly created instance along with the arguments supplied when invoking the class object. For example:

# Create some accounts
a = Account('Guido', 1000.0)    # Calls Account.__init__(a, 'Guido', 1000.0)
b = Account('Eve', 10.0)        # Calls Account.__init__(b, 'Eve', 10.0)
Inside __init__(), attributes are saved on the instance by assigning to self. For example, self.owner = owner saves an attribute on the instance. Once the newly created instance has been returned, these attributes, as well as the methods of the class, are accessed using the dot (.) operator as follows:

a.deposit(100.0)    # Calls Account.deposit(a, 100.0)
b.withdraw(50.0)    # Calls Account.withdraw(b, 50.0)
owner = a.owner     # Get the account owner
It is important to emphasize that each instance has its own state. You can view instance variables using the vars() function. For example:

>>> a = Account('Guido', 1000.0)
>>> b = Account('Eve', 10.0)
>>> vars(a)
{'owner': 'Guido', 'balance': 1000.0}
>>> vars(b)
{'owner': 'Eve', 'balance': 10.0}
>>>
Notice that the methods are not shown here. The methods are found on the class instead. Every instance maintains a link to its class via its associated type. For example:

>>> type(a)
<class 'Account'>
>>> type(b)
<class 'Account'>
>>> type(a).deposit
<function Account.deposit at 0x...>
>>> type(a).inquiry
<function Account.inquiry at 0x...>
>>>
A later section discusses the implementation details of attribute binding and the relationship between instances and classes.
Attribute Access
There are only three basic operations that can be performed on an instance: getting, setting, and deleting an attribute. For example:

>>> a = Account('Guido', 1000.0)
>>> a.owner               # get
'Guido'
>>> a.balance = 750.0     # set
>>> del a.balance         # delete
>>> a.balance
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Account' object has no attribute 'balance'
>>>
Everything in Python is a dynamic process with very few restrictions. If you want to add a new attribute to an object after it has been created, you are free to do so. For example:

>>> a = Account('Guido', 1000.0)
>>> a.creation_date = '2019-02-14'
>>> a.nickname = 'Ex BDFL'
>>> a.creation_date
'2019-02-14'
>>>
Instead of using the dot (.) to perform these operations, they can also be carried out by passing an attribute name as a string to the getattr(), setattr(), and delattr() functions. The hasattr() function tests for the existence of an attribute. For example:

>>> a = Account('Guido', 1000.0)
>>> getattr(a, 'owner')
'Guido'
>>> setattr(a, 'balance', 750.0)
>>> getattr(a, 'withdraw')(100)    # Method call
>>> a
Account('Guido', 650.0)
>>> delattr(a, 'balance')
>>> hasattr(a, 'balance')
False
>>>

Syntax such as a.attr and getattr(a, 'attr') are interchangeable, so code like getattr(a, 'withdraw')(100) is the same as a.withdraw(100). It doesn't matter that withdraw() is a method. The getattr() function is notable for taking an optional default value. If you want to look up an attribute that may or may not exist, you can do this:

>>> a = Account('Guido', 1000.0)
>>> getattr(a, 'balance', 'unknown')
1000.0
>>> getattr(a, 'creation_date', 'unknown')
'unknown'
>>>
When you access a method as a plain attribute, you get an object known as a bound method. For example:

>>> a = Account('Guido', 1000.0)
>>> w = a.withdraw
>>> w
<bound method Account.withdraw of Account('Guido', 1000.0)>
>>> w(100)
>>> a
Account('Guido', 900.0)
>>>
A bound method is an object that carries both an instance (the self) and the function that implements the method. When you invoke a bound method by adding parentheses and arguments, the method executes, passing the attached instance as the first argument. For example, the call w(100) above turns into a call to Account.withdraw(a, 100).
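The two pieces of a bound method can be inspected directly through its standard __self__ and __func__ attributes (a small aside of my own, continuing the example above):

w = a.withdraw
w.__self__                    # -> the instance a
w.__func__                    # -> the function Account.withdraw
w.__func__(w.__self__, 100)   # Same effect as w(100)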
Scoping Rules
Although classes define an isolated namespace for their methods, that namespace does not serve as a scope for resolving names used inside methods. Therefore, when implementing a class, references to attributes and methods must be fully qualified. For example, in methods you always reference attributes of the instance through self. Thus, use self.balance, not balance. This also applies if you want to call one method from another. For example, suppose you want to implement withdraw() in terms of depositing a negative amount:

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
    def __repr__(self):
        return f'Account({self.owner!r}, {self.balance!r})'
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.deposit(-amount)     # Must use self.deposit()
    def inquiry(self):
        return self.balance
The lack of a class-level scope is one area where Python differs from C++ or Java. If you have used those languages, the self parameter in Python is the same as the so-called "this" pointer, except that in Python you always have to use it explicitly.
Operator Overloading and Protocols
Chapter 4 presented the Python data model, with particular attention to the so-called "special methods" that implement Python operators and protocols. For example, the len(obj) function invokes obj.__len__() and obj[n] invokes obj.__getitem__(n). When defining new classes, it is common to define some of these methods. The __repr__() method in the Account class was one such example; it improves printed output. You might define more of these methods if you're making something more complicated, such as a custom container. For example, suppose you wanted to make a portfolio of accounts:

class AccountPortfolio:
    def __init__(self):
        self.accounts = []
    def add_account(self, account):
        self.accounts.append(account)
    def total_funds(self):
        return sum(account.inquiry() for account in self.accounts)
    def __len__(self):
        return len(self.accounts)
    def __getitem__(self, index):
        return self.accounts[index]
    def __iter__(self):
        return iter(self.accounts)

# Example
port = AccountPortfolio()
port.add_account(Account('Guido', 1000.0))
port.add_account(Account('Eve', 50.0))

print(port.total_funds())    # -> 1050.0
len(port)                    # -> 2

# Print the accounts
for account in port:
    print(account)

# Access an individual account by index
port[1].inquiry()            # -> 50.0
Special methods such as __len__(), __getitem__(), and __iter__() make an AccountPortfolio work with various Python operators, such as indexing and iteration, as shown. You will sometimes hear the word "Pythonic" used to describe code (as in, "this code is Pythonic"). The term is informal, but it usually refers to whether an object plays nicely with the rest of the Python environment. That means supporting, where sensible, core Python features such as iteration, indexing, and other operations. You almost always do this by having your class implement the predefined special methods, as described in Chapter 4.
Inheritance
Inheritance is a mechanism for creating a new class that specializes or modifies the behavior of an existing class. The original class is called the base class, superclass, or parent. The new class is called the derived class, child class, subclass, or subtype. When a class is created via inheritance, it inherits the attributes defined by its base classes. However, a derived class may redefine any of those attributes and add new attributes of its own. Inheritance is specified with a comma-separated list of base-class names in the class statement. If no base class is given, a class implicitly inherits from object. object is a class that is the root of all Python objects and which provides the default implementation of some common methods such as __str__() and __repr__(). One use of inheritance is to add new methods to an existing class. For example, suppose you wanted to add a panic() method to Account that withdraws all funds. Here's how to do it:

class MyAccount(Account):
    def panic(self):
        self.withdraw(self.balance)

# Example
a = MyAccount('Guido', 1000.0)
a.withdraw(23.0)    # a.balance = 977.0
a.panic()           # a.balance = 0
Inheritance can also be used to redefine existing methods. For example, here is a specialized version of Account that redefines the inquiry() method to randomly overstate the balance, in the hope that someone not paying close attention will overdraw their account and incur a hefty penalty when making a payment on their mortgage:

import random

class EvilAccount(Account):
    def inquiry(self):
        if random.randint(0, 4) == 1:
            return self.balance * 1.10
        else:
            return self.balance

a = EvilAccount('Guido', 1000.0)
a.deposit(10.0)            # Calls Account.deposit(a, 10.0)
available = a.inquiry()    # Calls EvilAccount.inquiry(a)
In this example, instances of EvilAccount are identical to instances of Account except for the redefined inquiry() method. Occasionally, a derived class reimplements a method but also wants to call the original implementation. A method can explicitly call the original method using super(), as shown here:

class EvilAccount(Account):
    def inquiry(self):
        if random.randint(0, 4) == 1:
            return 1.10 * super().inquiry()
        else:
            return super().inquiry()
In this example, super() gives you access to the previously defined method. The super().inquiry() call uses the original definition of inquiry() that was in effect before EvilAccount redefined it. It's less common, but inheritance can also be used to add additional attributes to instances. Suppose you wanted to turn the 1.10 factor from the previous example into an instance-level attribute that could be adjusted. You could do it like this:

class EvilAccount(Account):
    def __init__(self, owner, balance, factor):
        super().__init__(owner, balance)
        self.factor = factor
    def inquiry(self):
        if random.randint(0, 4) == 1:
            return self.factor * super().inquiry()
        else:
            return super().inquiry()
A tricky problem with adding attributes is dealing with the existing __init__() method. In this example, we define a new version of __init__() that includes our additional instance variable factor. However, when __init__() is redefined, it is the responsibility of the child to initialize its parent using super().__init__(), as shown. If you forget to do this, you will end up with a half-initialized object and everything will break.
Since initialization of the parent requires additional arguments, those arguments must still be passed along to the child's __init__() method. Inheritance can break code in subtle ways. Consider the __repr__() method of the Account class:

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
    def __repr__(self):
        return f'Account({self.owner!r}, {self.balance!r})'
The purpose of this method is to aid debugging by producing nice output. However, the method is hardcoded to use the Account name. Once you start using inheritance, you will find that the output is wrong and confusing:

>>> class EvilAccount(Account):
...     pass
...
>>> a = EvilAccount('Eve', 10.0)
>>> a
Account('Eve', 10.0)      # Notice the misleading output
>>> type(a)
<class 'EvilAccount'>
>>>
To fix this, you need to change the __repr__() method to use the correct type name. For example:

class Account:
    ...
    def __repr__(self):
        return f'{type(self).__name__}({self.owner!r}, {self.balance!r})'
It's a subtle change, but now you're seeing a more accurate result. Inheritance is not always used with all classes. However, if it's an expected use case for the class you're writing, you should pay attention to small details like this. As a general rule, you should avoid hardcoding class names.
Inheritance establishes a relationship in the type system where any subclass type-checks as its superclass. For example:

>>> a = EvilAccount('Eve', 10)
>>> type(a)
<class 'EvilAccount'>
>>> isinstance(a, Account)
True
>>>
This is the so-called "is a" relationship (as in, "EvilAccount is an Account"). The "is a" inheritance relationship is sometimes used to define taxonomies or ontologies of object types. For example:

class Food:
    pass

class Sandwich(Food):
    pass

class RoastBeef(Sandwich):
    pass

class GrilledCheese(Sandwich):
    pass

class Taco(Food):
    pass
In practice, arranging objects this way can be quite difficult and fraught with peril. Suppose you wanted to add a HotDog class to the above hierarchy. Where does it go? Given that a hot dog has a bun, you might be inclined to make it a subclass of Sandwich. However, based on the overall curved shape of the bun with a tasty filling inside, maybe a hot dog is really more like a Taco. You could decide to make it a subclass of both:

class HotDog(Sandwich, Taco):
    pass
At this point, everyone's head explodes and the office erupts into a heated argument. This might also be a good time to mention that Python supports multiple inheritance: to use it, list more than one class as a parent. The resulting child class inherits all of the combined features of its parents. Multiple inheritance is covered in more detail in a later section.
Avoiding Inheritance via Composition
One cautionary use of inheritance is so-called implementation inheritance. To illustrate, suppose you wanted to make a stack data structure. A "quick" way would be to inherit from a list and add a new method to it:

class Stack(list):
    def push(self, item):
        self.append(item)

# Example
s = Stack()
s.push(1)
s.push(2)
s.push(3)
s.pop()    # -> 3
s.pop()    # -> 2
Sure, this data structure works like a stack, but it also has every other feature of lists: inserting, sorting, slice reassignment, and so on. This is implementation inheritance: you used inheritance to reuse some code that you built something else on top of, but you also got a lot of features that aren't pertinent to the problem actually being solved. Users will likely find the object strange. Why does a stack have methods for sorting? Often it is better to rely on composition. Instead of building a stack by inheriting from a list, build a stack as an independent class that happens to contain a list. The fact that there's a list inside is an implementation detail. For example:
class Stack:
    def __init__(self):
        self._items = list()
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()
    def __len__(self):
        return len(self._items)

# Example use
s = Stack()
s.push(1)
s.push(2)
s.push(3)
s.pop()    # -> 3
s.pop()    # -> 2
This object works exactly the same as before, but it is focused solely on being a stack. There are no extraneous list methods or non-stack functionality. There is much more clarity of purpose. A slight extension of this implementation might accept the internal list class as an optional argument:

class Stack:
    def __init__(self, *, container=None):
        if container is None:
            container = list()
        self._items = container
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()
    def __len__(self):
        return len(self._items)
One benefit of this approach is that it promotes loose coupling of components. Suppose you wanted to make a stack that stores its elements in a typed array instead of a list. You could do it like this:

import array

s = Stack(container=array.array('i'))
s.push(42)
s.push(23)
s.push('a lot')    # TypeError
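Any container with compatible append() and pop() methods works here. For instance (an illustration of mine, not from the text), a collections.deque can be injected in the same way:

from collections import deque

s = Stack(container=deque())
s.push(1)
s.push(2)
s.pop()    # -> 2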
This is also an example of what is known as "dependency injection". Instead of hardwiring Stack to depend on list, you make it depend on whatever container a user decides to pass in, provided it implements the required interface. Making the internal list a hidden implementation detail relates to a broader issue of data abstraction. Perhaps you later decide that you don't even want to use a list. The above design makes that easy to change. For example, if you changed the implementation to use linked tuples as follows, users of the Stack wouldn't even notice:

class Stack:
    def __init__(self):
        self._items = None
        self._size = 0
    def push(self, item):
        self._items = (item, self._items)
        self._size += 1
    def pop(self):
        (item, self._items) = self._items
        self._size -= 1
        return item
    def __len__(self):
        return self._size
When deciding whether or not to use inheritance, you should step back and ask whether the object you're building is a specialized version of the parent class, or whether you're merely using it as a component in building something else. If it's the latter, don't use inheritance.
Avoiding Inheritance via Functions
Sometimes you find yourself writing classes with just a single method that needs to be customized. For example, perhaps you wrote a data parsing class like this:

class DataParser:
    def parse(self, lines):
        records = []
        for line in lines:
            row = line.split(',')
            record = self.make_record(row)
            records.append(record)
        return records
    def make_record(self, row):
        raise NotImplementedError()

class PortfolioDataParser(DataParser):
    def make_record(self, row):
        return {
            'name': row[0],
            'shares': int(row[1]),
            'price': float(row[2])
        }

parser = PortfolioDataParser()
data = parser.parse(open('portfolio.csv'))
There is a lot of plumbing here. If you're writing a lot of single-method classes, consider using functions instead. For example:

def parse_data(lines, make_record):
    records = []
    for line in lines:
        row = line.split(',')
        record = make_record(row)
        records.append(record)
    return records

def make_dict(row):
    return {
        'name': row[0],
        'shares': int(row[1]),
        'price': float(row[2])
    }

data = parse_data(open('portfolio.csv'), make_dict)
This code is much simpler and just as flexible. Plus, plain functions are easier to test. If you need to scale up to classes, you can always do that later. Premature abstraction is often not a good thing.
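To make the testability point concrete, here is a minimal sketch of a test (the test name and data values are hypothetical, my own illustration); no subclassing or fixture setup is needed, just a plain call:

def test_parse_data():
    lines = ['AA,100,32.2', 'IBM,50,91.1']
    records = parse_data(lines, make_dict)
    assert records[0] == {'name': 'AA', 'shares': 100, 'price': 32.2}
    assert records[1] == {'name': 'IBM', 'shares': 50, 'price': 91.1}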
Dynamic Binding and Duck Typing
Dynamic binding is the run-time mechanism that Python uses to find the attributes of objects, and it is what allows Python to work with instances without regard for their type. In Python, variable names do not have an associated type. Thus, the attribute binding process is independent of what kind of object obj is. If you perform a lookup such as obj.name, it will work on any object whatsoever that happens to have a name attribute. This behavior is sometimes referred to as duck typing, in reference to the adage "if it looks like a duck, quacks like a duck, and walks like a duck, then it's a duck." Python programmers often write programs that rely on this behavior. For example, if you want to make a customized version of an existing object, you can either inherit from it, or you can create a completely new object that looks and acts like it but is otherwise unrelated. This latter approach is often used to maintain loose coupling of program components. For example, code may be written to work with any kind of object whatsoever, as long as it has a certain set of methods. One of the most common examples is the various "iterable" objects defined in the standard library. There are all sorts of objects that work with the for loop to produce values (lists, files, generators, strings, and so on). However, none of these inherit from any kind of special Iterable base class. They merely implement the methods required to perform iteration, and everything works.
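As a small sketch of this idea (my own example): the following class works with the for loop solely because it implements __iter__() and __next__(), without inheriting from anything special:

class Countdown:
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

for x in Countdown(3):
    print(x)    # -> 3, 2, 1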
The Danger of Inheriting from Built-in Types
Python allows inheritance from built-in types. However, doing so invites danger. As an example, suppose you decided to subclass dict in order to force all keys to be uppercase. To do that, you might redefine the __setitem__() method like this:

class udict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

In fact, it even seems to work at first:

>>> u = udict()
>>> u['name'] = 'Guido'
>>> u['number'] = 37
>>> u
{'NAME': 'Guido', 'NUMBER': 37}
>>>

Except that further use reveals that it only seems to work partially. In fact, it doesn't really seem to work at all:

>>> u = udict(name='Guido', number=37)
>>> u
{'name': 'Guido', 'number': 37}
>>> u.update(color='blue')
>>> u
{'name': 'Guido', 'number': 37, 'color': 'blue'}
>>>
The problem here is that Python's built-in types aren't implemented like normal Python classes: they're implemented in C. Within that C implementation, most methods operate entirely in the C world. For example, the dict.update() method directly manipulates the dictionary data without ever routing through the __setitem__() method redefined by the custom udict class above. The collections module has special classes UserDict, UserList, and UserString that can be used to make safe subclasses of the dict, list, and str types. For example, you'll find that this solution works much better:

from collections import UserDict

class udict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)
Here is an example of this new version in action:

>>> u = udict(name='Guido', num=37)
>>> u.update(color='Blue')
>>> u
{'NAME': 'Guido', 'NUM': 37, 'COLOR': 'Blue'}
>>> v = udict(u)
>>> v['title'] = 'BDFL'
>>> v
{'NAME': 'Guido', 'NUM': 37, 'COLOR': 'Blue', 'TITLE': 'BDFL'}
>>>
Most of the time, the need to subclass a built-in type can be avoided. For example, when building new containers, it is probably better to make a new class, as was done for the Stack class shown earlier. If you really do need to subclass a built-in, be prepared for a lot more work than you expected.
Class Variables and Methods
In a class definition, all functions are assumed to operate on an instance, which is always passed as the first parameter self. However, the class itself is also an object that can carry state and be manipulated. As an example, you could track how many instances have been created using a class variable num_accounts, as shown here:
class Account:
    num_accounts = 0

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
        Account.num_accounts += 1
    def __repr__(self):
        return f'{type(self).__name__}({self.owner!r}, {self.balance!r})'
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.deposit(-amount)     # Must use self.deposit()
    def inquiry(self):
        return self.balance
Class variables are defined outside of the normal __init__() method. To modify one, use the class, not self. For example:

>>> a = Account('Guido', 1000.0)
>>> b = Account('Eve', 10.0)
>>> Account.num_accounts
2
>>>
It's a bit unusual, but class variables can also be accessed via instances. For example:

>>> a.num_accounts
2
>>> c = Account('Ben', 50.0)
>>> Account.num_accounts
3
>>> a.num_accounts
3
>>>
This works because attribute lookup on an instance checks the associated class if there is no matching attribute on the instance itself. This is the same mechanism by which Python normally finds methods. It is also possible to define what is known as a "class method". A class method is a method applied to the class itself, not to instances. A common use of class methods is to define alternate instance constructors. For example, suppose there was a requirement to create Account instances from a legacy enterprise-grade input format:

data = '''
<account>
    <owner>Guido</owner>
    <amount>1000.0</amount>
</account>
'''
To do that, you can write a @classmethod like this:

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    @classmethod
    def from_xml(cls, data):
        from xml.etree.ElementTree import XML
        doc = XML(data)
        return cls(doc.findtext('owner'), float(doc.findtext('amount')))

# Example use
data = '''
<account>
    <owner>Guido</owner>
    <amount>1000.0</amount>
</account>
'''

a = Account.from_xml(data)
The first argument of a class method is always the class itself. By convention, this argument is usually named cls. In this example, cls is set to Account. If the purpose of a class method is to create a new instance, explicit steps must be taken to do so. In the final line of the example, the call cls(..., ...) is the same as calling Account(..., ...) with the two arguments. The fact that the class is passed as an argument solves an important problem related to inheritance. Suppose you define a subclass of Account and want to create an instance of that class. You'll find that it still works:

class EvilAccount(Account):
    pass

e = EvilAccount.from_xml(data)    # Creates an 'EvilAccount'
The reason this code works is that EvilAccount is now passed as cls. Thus, the final statement of the from_xml() class method now creates an EvilAccount instance. Class variables and class methods are sometimes used together to configure and control how instances operate. As another example, consider the following Date class:

import time

class Date:
    datefmt = '{year}-{month:02d}-{day:02d}'

    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    def __str__(self):
        return self.datefmt.format(year=self.year,
                                   month=self.month,
                                   day=self.day)

    @classmethod
    def from_timestamp(cls, ts):
        tm = time.localtime(ts)
        return cls(tm.tm_year, tm.tm_mon, tm.tm_mday)

    @classmethod
    def today(cls):
        return cls.from_timestamp(time.time())
This class features a class variable datefmt that adjusts the output of the __str__() method. This is something that can be customized via inheritance:

class MDYDate(Date):
    datefmt = '{month}/{day}/{year}'

class DMYDate(Date):
    datefmt = '{day}/{month}/{year}'

# Example
a = Date(1967, 4, 9)
print(a)    # 1967-04-09

b = MDYDate(1967, 4, 9)
print(b)    # 4/9/1967

c = DMYDate(1967, 4, 9)
print(c)    # 9/4/1967
Configuration via class variables and inheritance like this is a common tool for adjusting the behavior of instances. The use of class methods is critical to making it work, because they ensure that the proper kind of object gets created. For example:

a = MDYDate.today()
b = DMYDate.today()
print(a)    # 2/13/2019
print(b)    # 13/2/2019
Alternate construction is by far the most common use of class methods. A common naming convention for such methods is to include the word from_ as a prefix, as in from_timestamp(). You will see this naming convention used for class methods throughout the standard library and in third-party packages. For example, dictionaries have a class method for creating a pre-initialized dictionary from a set of keys:

>>> dict.fromkeys(['a', 'b', 'c'], 0)
{'a': 0, 'b': 0, 'c': 0}
>>>
One caution about class methods is that Python does not manage them in a separate namespace from the instance methods. As a result, they can still be invoked on an instance. For example:

d = Date(1967, 4, 9)
b = d.today()    # Calls Date.today(Date)

This is potentially quite confusing, because a call to d.today() really has nothing to do with the instance d. Nevertheless, you may see today() listed as a valid method on instances in your IDE and in documentation.
Static Methods
Sometimes a class is merely used as a namespace for functions declared as static methods using @staticmethod. Unlike a normal method or a class method, a static method does not take an extra self or cls argument. A static method is just an ordinary function that happens to be defined inside a class. For example:

class Ops:
    @staticmethod
    def add(x, y):
        return x + y

    @staticmethod
    def sub(x, y):
        return x - y

Normally, you don't create instances of such a class. Instead, you call the functions directly through the class:

a = Ops.add(2, 3)    # a = 5
b = Ops.sub(4, 5)    # b = -1
Sometimes other classes use a collection of static methods like this to implement "swappable" or "configurable" behavior, or as something that loosely mimics an importable module. Consider the use of inheritance in the earlier Account example:

import random

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
    def __repr__(self):
        return f'{type(self).__name__}({self.owner!r}, {self.balance!r})'
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.balance -= amount
    def inquiry(self):
        return self.balance

# A special "Evil" account
class EvilAccount(Account):
    def deposit(self, amount):
        self.balance += 0.95 * amount
    def inquiry(self):
        if random.randint(0, 4) == 1:
            return 1.10 * self.balance
        else:
            return self.balance
The use of inheritance here is a bit strange. It introduces two different kinds of objects (Account and EvilAccount). There is also no obvious way to change an existing Account instance into an EvilAccount, or vice versa, since that would involve changing the instance's type. Perhaps it is better for the evil to manifest itself as some kind of account policy instead. Here is an alternate formulation of Account that does that with static methods:

class StandardPolicy:
    @staticmethod
    def deposit(account, amount):
        account.balance += amount

    @staticmethod
    def withdraw(account, amount):
        account.balance -= amount

    @staticmethod
    def inquiry(account):
        return account.balance

class EvilPolicy(StandardPolicy):
    @staticmethod
    def deposit(account, amount):
        account.balance += 0.95 * amount

    @staticmethod
    def inquiry(account):
        if random.randint(0, 4) == 1:
            return 1.10 * account.balance
        else:
            return account.balance

class Account:
    def __init__(self, owner, balance, *, policy=StandardPolicy):
        self.owner = owner
        self.balance = balance
        self.policy = policy
    def __repr__(self):
        return f'Account({self.policy}, {self.owner!r}, {self.balance!r})'
    def deposit(self, amount):
        self.policy.deposit(self, amount)
    def withdraw(self, amount):
        self.policy.withdraw(self, amount)
    def inquiry(self):
        return self.policy.inquiry(self)
In this reformulation, only one kind of instance is created: Account. But it has a special policy attribute that provides the implementation of the various methods. If needed, the policy can be changed dynamically on an existing Account instance:

>>> a = Account('Guido', 1000.0)
>>> a.policy
<class 'StandardPolicy'>
>>> a.deposit(500)
>>> a.inquiry()
1500.0
>>> a.policy = EvilPolicy
>>> a.deposit(500)
>>> a.inquiry()    # Could randomly be 1.10x more
1975.0
>>>
One reason @staticmethod makes sense here is that there is no need to create instances of StandardPolicy or EvilPolicy. The main purpose of these classes is to organize a bundle of methods, not to store additional instance data related to accounts. Nevertheless, Python's loosely coupled nature would certainly make it possible to upgrade a policy to hold its own data, by changing the static methods into normal instance methods. For example:

class EvilPolicy(StandardPolicy):
    def __init__(self, deposit_factor, inquiry_factor):
        self.deposit_factor = deposit_factor
        self.inquiry_factor = inquiry_factor

    def deposit(self, account, amount):
        account.balance += self.deposit_factor * amount

    def inquiry(self, account):
        if random.randint(0, 4) == 1:
            return self.inquiry_factor * account.balance
        else:
            return account.balance

# Example use
a = Account('Guido', 1000.0, policy=EvilPolicy(0.95, 1.10))
This approach of delegating methods to supporting classes is a common implementation strategy for state machines and similar objects. Each operational state can be encapsulated in its own class of (often static) methods. A mutable instance variable, such as the policy attribute in this example, can then be used to hold implementation-specific details related to the current operational state.
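For instance, here is a minimal sketch (the class and method names are hypothetical, my own illustration, not from the text) of that state-machine style, where each state is a bundle of static methods and the current state is swapped by reassigning a single attribute:

class ClosedState:
    @staticmethod
    def read(conn):
        raise RuntimeError('Connection not open')
    @staticmethod
    def open(conn):
        conn.state = OpenState       # Transition to the open state

class OpenState:
    @staticmethod
    def read(conn):
        return 'data'
    @staticmethod
    def open(conn):
        raise RuntimeError('Already open')

class Connection:
    def __init__(self):
        self.state = ClosedState     # Current state (a class object)
    def read(self):
        return self.state.read(self)
    def open(self):
        self.state.open(self)

# Example use
c = Connection()
c.open()
c.read()    # -> 'data'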
A Word About Design Patterns
When writing object-oriented programs, programmers sometimes become fixated on implementing named design patterns, such as the "strategy pattern", the "flyweight pattern", the "singleton pattern", and so forth. Many of these come from the famous Design Patterns book by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. If you are familiar with such patterns, the general design principles used in other languages can certainly be applied to Python. However, many of these documented patterns are aimed at working around particular issues that arise from the strict static type systems of C++ or Java. The dynamic nature of Python renders many of these patterns obsolete, simply unnecessary, or overkill. That said, there are some overarching principles to writing good software: for example, striving to write code that is debuggable, testable, and extensible. Basic strategies such as writing classes with useful __repr__() methods, favoring composition over inheritance, and allowing dependency injection can go a long way toward these goals. Python programmers also like to work with code that is deemed "Pythonic". Usually that means having objects obey various built-in protocols, such as iteration, containers, and context management. For example, instead of trying to implement some exotic data traversal pattern from a Java programming book, a Python programmer would probably implement it with a generator function feeding a for loop. Or they might replace an entire pattern with a few lines of dictionary lookups.
Data Encapsulation and Private Attributes
In Python, all attributes and methods of a class are "public" — that is, accessible without restriction. This is often undesirable in object-oriented applications, where internal implementation details ought to be hidden or encapsulated. To address this problem, Python relies on naming conventions as a means of signaling intended usage. One such convention is that names starting with a single underscore (_) denote internal implementation. For example, here is a version of the Account class where the balance has been turned into a "private" attribute:

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self._balance = balance
    def __repr__(self):
        return f'Account({self.owner!r}, {self._balance!r})'
    def deposit(self, amount):
        self._balance += amount
    def withdraw(self, amount):
        self._balance -= amount
    def inquiry(self):
        return self._balance
In this code, the _balance attribute is meant to be an internal detail. Nothing prevents a user from accessing it directly, but the leading underscore is a strong hint that a user should look for a more public interface — for instance, the Account.inquiry() method. A gray area is whether internal attributes are fair game for use in a subclass. For example, is the earlier inheritance example allowed to directly access the _balance attribute of its parent?

class EvilAccount(Account):
    def inquiry(self):
        if random.randint(0, 4) == 1:
            return 1.10 * self._balance
        else:
            return self._balance
As a general rule, this is considered acceptable in Python. IDEs and other tools are likely to expose such attributes. If you're coming from C++, Java, or another similar object-oriented language, think of _balance as similar to a "protected" attribute. If you want an even more private attribute, prefix the name with two underscores (__). All names such as __name are automatically mangled into a new name of the form _classname__name. This ensures that private names used in a superclass won't be overwritten by an identical name in a subclass. Here's an example that demonstrates this behavior:

class A:
    def __init__(self):
        self.__x = 3        # Mangled to self._A__x

    def __spam(self):       # Mangled to _A__spam()
        print('A.__spam', self.__x)

    def bar(self):
        self.__spam()       # Only calls A.__spam()

class B(A):
    def __init__(self):
        A.__init__(self)
        self.__x = 37       # Mangled to self._B__x

    def __spam(self):       # Mangled to _B__spam()
        print('B.__spam', self.__x)

    def grok(self):
        self.__spam()       # Calls B.__spam()
In this example, there are two different assignments to an __x attribute. In addition, it might appear that class B is trying to override the __spam() method via inheritance. It isn't: name mangling causes a unique name to be used for each definition. Try this example:

>>> b = B()
>>> b.bar()
A.__spam 3
>>> b.grok()
B.__spam 37
>>>
You can see the mangled names more directly by looking at the underlying instance variables:

>>> vars(b)
{'_A__x': 3, '_B__x': 37}
>>> b._A__spam()
A.__spam 3
>>> b._B__spam()
B.__spam 37
>>>
Although this scheme provides the illusion of data hiding, there is no strict mechanism that actually prevents access to the "private" attributes of a class. In particular, if the name of the class and its corresponding private attribute are known, they can still be accessed using the mangled name. If such access to private attributes is still a concern, a more rigorous code review process may be in order.
At first glance, name mangling might look like an extra processing step. However, the mangling process actually only occurs once, when a class is defined. It does not occur during execution of the methods, nor does it add extra overhead to program execution. Be aware that name mangling does not occur in functions such as getattr(), hasattr(), setattr(), or delattr() where the attribute name is given as a string. For these functions, you need to explicitly use the mangled name, such as '_classname__name', to access the attribute. In practice, it's probably best not to overthink the privacy of names. Single-underscore names are widely used; double-underscore names, less so. Although you can take further steps to try to truly hide attributes, the extra effort and added complexity are hardly worth the benefit. Perhaps the most useful thing to remember is that a leading underscore on a name almost certainly means an internal detail that is best left alone.
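For example, continuing the A and B classes from above (a small illustration of my own):

vars(b)                         # -> {'_A__x': 3, '_B__x': 37}
getattr(b, '_A__x')             # -> 3 (mangled name spelled out explicitly)
getattr(b, '__x', 'missing')    # -> 'missing' (no mangling is applied)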
Type Hints
Attributes of user-defined classes carry no constraints on their type or value. In fact, you can set an attribute to anything that you want. For example:

>>> a = Account('Guido', 1000.0)
>>> a.owner
'Guido'
>>> a.owner = 37
>>> a.owner
37
>>> b = Account('Eve', 'a lot')
>>> b.deposit('more')
>>> b.inquiry()
'a lot more'
>>>
If this is a practical concern, there are a few possible solutions. One is simple: don't do that! Another is to rely on external tools such as linters and type checkers. For this purpose, classes allow optional type hints to be specified for selected attributes. For example:

class Account:
    owner: str        # Type hint
    _balance: float   # Type hint

    def __init__(self, owner, balance):
        self.owner = owner
        self._balance = balance
    ...
The inclusion of type hints changes nothing about how a class actually works at runtime. That is, no extra validation is carried out, and nothing stops a user from setting bad values in your code. However, hints might give users more useful information in their editor, thereby catching careless usage errors before they happen. In practice, precisely specifying the types can be difficult. For example, does the Account class allow someone to use an int instead of a float? Or what about a Decimal? You'll find that both work, even if the hint suggests otherwise:

from decimal import Decimal

a = Account('Guido', Decimal('1000.0'))
a.withdraw(Decimal('50.0'))
print(a.inquiry())    # -> 950.0
Knowing how to properly specify types in such situations is beyond the scope of this book. When in doubt, it's probably best not to guess unless you are actively using tools that check your code.
Properties
As noted in the preceding section, Python places no runtime restrictions on attribute values or types. However, such enforcement is possible if you put an attribute under the management of a so-called property. A property is a special kind of attribute that intercepts attribute access and handles it via user-defined methods. These methods are free to manage the attribute however they see fit. Here is an example:

import string

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self._balance = balance

    @property
    def owner(self):
        return self._owner

    @owner.setter
    def owner(self, value):
        if not isinstance(value, str):
            raise TypeError('Expected str')
        if not all(c in string.ascii_uppercase for c in value):
            raise ValueError('Must be uppercase ASCII')
        if len(value) > 10:
            raise ValueError('Must be 10 characters or less')
        self._owner = value
Here, the owner attribute is constrained to a rather strict 10-character uppercase ASCII string. Here's how it works when you try to use the class:

>>> a = Account('GUIDO', 1000.0)
>>> a.owner = 'EVA'
>>> a.owner = 42
Traceback (most recent call last):
...
TypeError: Expected str
>>> a.owner = 'Carol'
Traceback (most recent call last):
...
ValueError: Must be uppercase ASCII
>>> a.owner = 'RENÉE'
Traceback (most recent call last):
...
ValueError: Must be uppercase ASCII
>>> a.owner = 'RAMAKRISHNAN'
Traceback (most recent call last):
...
ValueError: Must be 10 characters or less
>>>
The @property decorator is used to establish an attribute as a property. In this example, it is applied to the owner attribute. This decorator is always applied first, to a method that gets the value of the attribute. In this case, that method returns the actual value stored in the private attribute _owner. The @owner.setter decorator that follows is used to optionally implement a method for setting the value of the attribute. That method performs various type and value checks before storing the value in the private attribute _owner. A key feature of properties is that the associated name (owner in this example) becomes "magical". That is, any use of that attribute automatically routes through the getter/setter methods you implemented. You don't need to change any pre-existing code for this to work. For example, no changes are needed to the Account.__init__() method. This might seem strange, because __init__() performs the assignment self.owner = owner instead of using the private attribute self._owner. This is by design: the whole point of the owner property was to validate attribute values. You will definitely want that to happen when instances are created. You'll find that it works exactly as intended:

>>> a = Account('Guido', 1000.0)
Traceback (most recent call last):
  File "account.py", line 5, in __init__
    self.owner = owner
  File "account.py", line 15, in owner
    raise ValueError('Must be uppercase ASCII')
ValueError: Must be uppercase ASCII
>>>
Because every access to a property attribute automatically invokes a method, the actual value needs to be stored under a different name. This is why _owner is used inside the getter and setter methods. You can't use owner as the storage location because doing so would cause infinite recursion.
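For the curious, here is a sketch (a deliberately broken class of my own) showing exactly the mistake being warned about:

class Broken:
    @property
    def owner(self):
        return self.owner       # Looks up the property again -> recursion
    @owner.setter
    def owner(self, value):
        self.owner = value      # Invokes this setter again -> recursion

b = Broken()
# b.owner = 'GUIDO'    # -> RecursionError: maximum recursion depth exceeded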
Generally, properties allow any given attribute name to be intercepted. You can implement methods for getting, setting, or deleting the attribute value. For example:

class SomeClass:
    @property
    def attr(self):
        print('Getting')

    @attr.setter
    def attr(self, value):
        print('Setting', value)

    @attr.deleter
    def attr(self):
        print('Deleting')

# Example
s = SomeClass()
s.attr            # Getting
s.attr = 13       # Setting 13
del s.attr        # Deleting
It is not necessary to implement all parts of a property. In fact, it is common to use properties to implement read-only computed data attributes. For example:

class Box(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        return self.width * self.height

    @property
    def perimeter(self):
        return 2*self.width + 2*self.height

# Example use
b = Box(4, 5)
print(b.area)         # -> 20
print(b.perimeter)    # -> 18
b.area = 5            # Error: can't set attribute
One thing to consider when defining a class is the idea of making the programming interface as uniform as possible. Without properties, some values would be accessed as simple attributes such as b.width or b.height, whereas other values would be accessed as methods such as b.area() and b.perimeter(). Keeping track of when to add the extra () creates unnecessary confusion. A property can help eliminate this. Python programmers often don't realize that methods themselves are implicitly handled as a kind of property. Consider this class:

class SomeClass:
    def yow(self):
        print('Yow!')
When a user creates an instance such as s = SomeClass() and then accesses s.yow, the original function object yow is not returned. Instead, you get a bound method like this:

>>> s = SomeClass()
>>> s.yow
<bound method SomeClass.yow of <__main__.SomeClass object at 0x...>>
>>>
How did this happen? It turns out that functions act like properties when they are placed in a class. Specifically, functions magically intercept attribute access and create the bound method behind the scenes. When you define static and class methods using @staticmethod and @classmethod, you are actually altering this process. @staticmethod returns the method function back "as is", without any special wrapping or processing. More about this process is covered later in the section on attribute access and descriptors.
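A quick sketch makes the difference visible (my own example): with @staticmethod, attribute access returns the plain function rather than a bound method:

class S:
    @staticmethod
    def f():
        print('plain function')

s = S()
print(s.f)    # -> <function S.f at 0x...>, not a bound method
s.f()         # -> plain function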
Types, Interfaces, and Abstract Base Classes
When you create an instance of a class, the type of that instance is the class itself. To test for membership in a class, use the built-in function isinstance(obj, cls). This function returns True if an object, obj, belongs to the class cls or any class derived from cls. Here's an example:

class A:
    pass

class B(A):
    pass

class C:
    pass

a = A()    # Instance of 'A'
b = B()    # Instance of 'B'
c = C()    # Instance of 'C'

type(a)             # Returns the class object A
isinstance(a, A)    # Returns True
isinstance(b, A)    # Returns True, B derives from A
isinstance(b, C)    # Returns False, B doesn't derive from C

Similarly, the built-in function issubclass(B, A) returns True if the class B is a subclass of the class A. Here's an example:

issubclass(B, A)    # Returns True
issubclass(C, A)    # Returns False
A common use of class type relationships is in the specification of programming interfaces. For example, a top-level base class might be implemented to specify the requirements of a programming interface. That base class might then be used for type hinting or defensive type enforcement via isinstance():

class Stream:
    def receive(self):
        raise NotImplementedError()
    def send(self, msg):
        raise NotImplementedError()
    def close(self):
        raise NotImplementedError()

# Example
def send_request(stream, request):
    if not isinstance(stream, Stream):
        raise TypeError('Expected a Stream')
    stream.send(request)
    return stream.receive()
The expectation with such code is not that Stream be used directly. Instead, different classes would inherit from Stream and implement the required functionality. A user would instantiate one of those classes instead. For example:

class SocketStream(Stream):
    def receive(self):
        ...
    def send(self, msg):
        ...
    def close(self):
        ...

class PipeStream(Stream):
    def receive(self):
        ...
    def send(self, msg):
        ...
    def close(self):
        ...

# Example
s = SocketStream()
send_request(s, request)
One might debate the runtime enforcement of the interface in send_request(). Should a type hint be used instead?

# Specify the interface as a type hint
def send_request(stream:Stream, request):
    stream.send(request)
    return stream.receive()
Since type hints aren't enforced, the decision of how to validate an argument against an interface really comes down to when you want it done: at runtime, as a code-review step, or not at all. This use of interface classes is more common in the organization of large frameworks and applications. However, one problem with this approach is making sure that subclasses actually implement the required interface. For example, if a subclass fails to implement one of the required methods, or has a simple typo, the effect might go unnoticed at first, since the code could still run in a normal manner. Later, however, the program will crash if the unimplemented method is ever invoked. Naturally, this only happens in production at 3:30 a.m. To prevent this problem, it is common to define interfaces as an "abstract base class" using the abc module. This module defines a base class (ABC) and a decorator (@abstractmethod) that are used together to describe an interface. Here is an example:

from abc import ABC, abstractmethod

class Stream(ABC):
    @abstractmethod
    def receive(self):
        pass

    @abstractmethod
    def send(self, msg):
        pass

    @abstractmethod
    def close(self):
        pass
An abstract class is not meant to be instantiated directly. In fact, if you try to create a Stream, you'll get an error:

>>> s = Stream()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Stream with abstract methods close, receive, send
>>>
The error message tells you exactly which methods a Stream needs to implement. This serves as a guide for writing subclasses. Suppose you write a subclass but make a mistake:

class SocketStream(Stream):
    def read(self):      # Misnamed; should be receive()
        ...
    def send(self, msg):
        ...
    def close(self):
        ...
An abstract base class will catch the error upon instantiation. This is useful because it detects errors early:

>>> s = SocketStream()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class SocketStream with abstract method receive
>>>
Although you can't instantiate an abstract class, you can define methods and properties in it for use in subclasses. Moreover, an abstract method in the base class can still be called from a subclass. For example, calling super().receive() from a subclass is allowed.
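For example, a subclass might layer extra work on top of the base's (empty) implementation. Here is a minimal sketch, assuming the abstract Stream above (the LoggingStream name is hypothetical):

class LoggingStream(Stream):
    def receive(self):
        print('receiving')
        return super().receive()   # Calls the abstract body (a no-op here)
    def send(self, msg):
        print('sending', msg)
    def close(self):
        print('closing')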
Multiple Inheritance, Interfaces, and Mixins
Python supports multiple inheritance. If a child class lists more than one parent, the child inherits all of the features of the parents. For example:

class Duck:
    def walk(self):
        print('Waddle')

class Trombonist:
    def noise(self):
        print('Blat!')

class DuckBonist(Duck, Trombonist):
    pass

d = DuckBonist()
d.walk()     # -> Waddle
d.noise()    # -> Blat!
Conceptually, it's a nice idea, but then practical realities start to set in. For example, what happens if Duck and Trombonist each define an __init__() method? Or if they both define a noise() method? Suddenly, you start to realize that multiple inheritance is fraught with peril. To better understand how multiple inheritance is actually used in practice, it's best to step back and view it as a highly specialized tool for organization and code reuse, rather than a general-purpose programming technique. Specifically, it is not standard practice to take a collection of arbitrary, unrelated classes and combine them with multiple inheritance to create strange mutant duck musicians. Don't ever do that. A more common use of multiple inheritance is in organizing type and interface relationships. For example, the previous section introduced the concept of an abstract base class. The purpose of an abstract base is to specify a programming interface. You might have various abstract classes like this:

from abc import ABC, abstractmethod

class Stream(ABC):
    @abstractmethod
    def receive(self):
        pass

    @abstractmethod
    def send(self, msg):
        pass

    @abstractmethod
    def close(self):
        pass

class Iterable(ABC):
    @abstractmethod
    def __iter__(self):
        pass
Given these classes, multiple inheritance could be used to specify which interfaces a subclass implements:

class MessageStream(Stream, Iterable):
    def receive(self):
        ...
    def send(self, msg):
        ...
    def close(self):
        ...
    def __iter__(self):
        ...
Again, this use of multiple inheritance isn't about implementation, it's about type relationships. For example, none of the inherited methods in this example do anything. There is no code reuse. First and foremost, the inheritance relationship allows you to do type checks like this:
m = MessageStream()
isinstance(m, Stream)      # -> True
isinstance(m, Iterable)    # -> True
The other use of multiple inheritance is in the definition of so-called "mixin" classes. A mixin class is a class that modifies or extends the functionality of other classes. Consider the following class definitions:

class Duck:
    def noise(self):
        return 'Quack'
    def waddle(self):
        return 'Waddle'

class Trombonist:
    def noise(self):
        return 'Blat!'
    def march(self):
        return 'Clomp'

class Cyclist:
    def noise(self):
        return 'On your left!'
    def pedal(self):
        return 'Pedaling'

These classes are completely unrelated to each other. There is no inheritance relationship, and they implement different methods. However, there is a shared commonality: they each define a noise() method. Using that as a guide, you could define the following "modifier" classes:

class LoudMixin:
    def noise(self):
        return super().noise().upper()

class AnnoyingMixin:
    def noise(self):
        return 3 * super().noise()
At first glance, these classes look broken. There is just one isolated method, and it uses super() to delegate to a nonexistent parent class. The classes don't even work in isolation:

>>> a = AnnoyingMixin()
>>> a.noise()
Traceback (most recent call last):
...
AttributeError: 'super' object has no attribute 'noise'
>>>
These are mixin classes. The only way they work is in combination with other classes that implement the missing functionality. For example:

class LoudDuck(LoudMixin, Duck):
    pass

class AnnoyingTrombonist(AnnoyingMixin, Trombonist):
    pass

class AnnoyingLoudCyclist(AnnoyingMixin, LoudMixin, Cyclist):
    pass

d = LoudDuck()
d.noise()    # -> 'QUACK'

t = AnnoyingTrombonist()
t.noise()    # -> 'Blat!Blat!Blat!'

c = AnnoyingLoudCyclist()
c.noise()    # -> 'ON YOUR LEFT!ON YOUR LEFT!ON YOUR LEFT!'
Since mixin classes are defined the same as regular classes, it's a good idea to include the word "mixin" as part of the class name. This naming convention provides greater clarity of purpose.
To fully understand mixins, you need to know a bit more about how inheritance and the super() function work. First, whenever you use inheritance, Python builds a linear chain of classes known as the "Method Resolution Order", or MRO for short. It is available as the __mro__ attribute on a class. Here are some examples for single inheritance:

class Base:
    pass

class A(Base):
    pass

class B(A):
    pass

Base.__mro__    # -> (<class 'Base'>, <class 'object'>)
A.__mro__       # -> (<class 'A'>, <class 'Base'>, <class 'object'>)
B.__mro__       # -> (<class 'B'>, <class 'A'>, <class 'Base'>,
                #     <class 'object'>)
The MRO specifies the search order for attribute lookup. Specifically, whenever you look up an attribute on an instance or a class, each class on the MRO is checked in the order listed. The search stops at the first match. The object class is included in the MRO because all classes inherit from object, regardless of whether or not it is listed as a parent.

To support multiple inheritance, Python implements what is known as "cooperative multiple inheritance." With cooperative inheritance, all classes are placed on the MRO list according to two main ordering rules. The first rule states that a child class must always be checked before any of its parents. The second rule states that if a class has multiple parents, those parents must be checked in the same order as they appear in the child's list of bases. For the most part, these rules produce an MRO that "makes sense." However, the exact algorithm that orders the classes is quite complex and is not based on any "simple" approach such as depth-first or breadth-first search. Instead, the order is determined according to the C3 linearization algorithm described in the paper "A Monotonic Superclass Linearization for Dylan" (K. Barrett, et al., presented at OOPSLA'96). A subtle aspect of this algorithm is that Python rejects certain class hierarchies with a TypeError. Here is an example:

class X: pass
class Y(X): pass
class Z(X, Y): pass    # TypeError.
                       # Cannot create a consistent MRO
In this case, the method resolution algorithm rejects class Z because it can't determine a meaningful ordering of the base classes. For example, class X appears before class Y in the inheritance list, so it must be checked first. However, class Y inherits from X, so checking X first violates the rule that children get checked first. In practice, these problems should rarely arise, and when they do, it usually indicates a more serious design problem. As an example of a real-world MRO, here is the MRO of the AnnoyingLoudCyclist class shown earlier:

class AnnoyingLoudCyclist(AnnoyingMixin, LoudMixin, Cyclist):
    pass

AnnoyingLoudCyclist.__mro__
# (<class 'AnnoyingLoudCyclist'>, <class 'AnnoyingMixin'>,
#  <class 'LoudMixin'>, <class 'Cyclist'>, <class 'object'>)
In this MRO, you can see how both ordering rules are satisfied. Specifically, every subclass appears before its parents, and the object class appears last because it is the parent of all other classes. Multiple parents are listed in the same order in which they appear in the code.

The behavior of the super() function is tied to the underlying MRO. Specifically, its role is to delegate attributes to the next class on the MRO, relative to the class in which super() is used. For example, when the AnnoyingMixin class uses super(), it looks at the MRO of the instance to find its own position, and from there it delegates the attribute lookup to the next class. In this example, using super().noise() in the AnnoyingMixin class invokes LoudMixin.noise(), because LoudMixin is the next class listed on the MRO of AnnoyingLoudCyclist. The super().noise() operation in the LoudMixin class then delegates to the Cyclist class. On any use of super(), the choice of the next class varies according to the type of the instance. For example, if you made an instance of AnnoyingTrombonist, super().noise() would invoke Trombonist.noise() instead.

Designing for cooperative multiple inheritance and mixins is challenging; here are some design guidelines. First, child classes are always checked before any base classes on the MRO. It is therefore common for mixins to share a common parent, and for that parent to provide an empty implementation of the methods. If multiple mixin classes are used together, they will line up one after the other, and the common parent appears last, where it can provide a default implementation or error checking. For example:

class NoiseMixin:
    def noise(self):
        raise NotImplementedError('noise() not implemented')

class LoudMixin(NoiseMixin):
    def noise(self):
        return super().noise().upper()

class AnnoyingMixin(NoiseMixin):
    def noise(self):
        return 3 * super().noise()
The second guideline is that all implementations of a mixin method should have an identical function signature. One problem with mixins is that they are optional and often mixed together in an unpredictable order. For this to work, you need a guarantee that operations involving super() will succeed regardless of which class comes next. To do this, all methods in the calling chain need to have a compatible calling signature. Finally, you need to make sure you use super() everywhere. Sometimes you'll encounter a class that calls its parent directly:

class Base:
    def yow(self):
        print('Base.yow')
class A(Base):
    def yow(self):
        print('A.yow')
        Base.yow(self)    # Direct call to parent
class B(Base):
    def yow(self):
        print('B.yow')
        super().yow()

class C(A, B):
    pass

c = C()
c.yow()
# Outputs:
# A.yow
# Base.yow
Classes such as this cannot safely be used with multiple inheritance. The direct call breaks the orderly chain of method calls and creates confusion. For example, in the example above, no output from B.yow() ever appears, even though it's part of the inheritance hierarchy. If you're doing anything at all with multiple inheritance, you should use super() instead of making direct calls to methods in superclasses.
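For comparison, here is a sketch of the same hierarchy rewritten to use super() throughout. With fully cooperative calls, every class on the MRO gets a chance to run:

class Base:
    def yow(self):
        print('Base.yow')

class A(Base):
    def yow(self):
        print('A.yow')
        super().yow()      # Cooperative call to the next class on the MRO

class B(Base):
    def yow(self):
        print('B.yow')
        super().yow()      # Cooperative call to the next class on the MRO

class C(A, B):
    pass

c = C()
c.yow()
# Outputs:
# A.yow
# B.yow
# Base.yow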
Type-Based Dispatch

Sometimes you need to write code that dispatches based on a specific type. For example:

if isinstance(obj, Duck):
    handle_duck(obj)
elif isinstance(obj, Trombonist):
    handle_trombonist(obj)
elif isinstance(obj, Cyclist):
    handle_cyclist(obj)
else:
    raise RuntimeError('Unknown object')
Writing a large if-elif-else block as shown is clumsy and brittle. A common solution is to dispatch through a dictionary instead:

handlers = {
    Duck: handle_duck,
    Trombonist: handle_trombonist,
    Cyclist: handle_cyclist
}

# Dispatch
def dispatch(obj):
    func = handlers.get(type(obj))
    if func:
        return func(obj)
    else:
        raise RuntimeError(f'No handler for {obj}')
This solution assumes an exact type match. If inheritance is also to be supported in such dispatch, you need to walk the MRO:

def dispatch(obj):
    for ty in type(obj).__mro__:
        func = handlers.get(ty)
        if func:
            return func(obj)
    raise RuntimeError(f'No handler for {obj}')
Sometimes dispatch is implemented through a class-based interface using getattr(), like this:

class Dispatcher:
    def handle(self, obj):
        for ty in type(obj).__mro__:
            meth = getattr(self, f'handle_{ty.__name__}', None)
            if meth:
                return meth(obj)
        raise RuntimeError(f'No handler for {obj}')
    def handle_Duck(self, obj):
        ...

    def handle_Trombonist(self, obj):
        ...

    def handle_Cyclist(self, obj):
        ...

# Example
dispatcher = Dispatcher()
dispatcher.handle(Duck())       # -> handle_Duck()
dispatcher.handle(Cyclist())    # -> handle_Cyclist()
This last example of using getattr() to dispatch onto the methods of a class is a fairly common programming pattern.
Class Decorators

Sometimes you want to perform extra processing after a class has been defined, such as adding the class to a registry or generating additional support code. One approach to such problems is to use a class decorator. A class decorator is a function that takes a class as input and returns a class as output. For example, here is how you might maintain a registry:

_registry = { }

def register_decoder(cls):
    for mt in cls.mimetypes:
        _registry[mt] = cls
    return cls

# Factory function that uses the registry
def create_decoder(mimetype):
    return _registry[mimetype]()
In this example, the register_decoder() function looks inside a class for a mimetypes attribute. If found, it's used to add the class to a dictionary that maps MIME types to class objects. To use the function, you apply it as a decorator right before the class definition. For example:

@register_decoder
class TextDecoder:
    mimetypes = [ 'text/plain' ]
    def decode(self, data):
        ...

@register_decoder
class HTMLDecoder:
    mimetypes = [ 'text/html' ]
    def decode(self, data):
        ...

@register_decoder
class ImageDecoder:
    mimetypes = [ 'image/png', 'image/jpg', 'image/gif' ]
    def decode(self, data):
        ...

# Usage example
decoder = create_decoder('image/jpg')
Class decorators are free to modify the contents of the class they're given. For example, they might even rewrite existing methods. This is a common alternative to mixin classes and multiple inheritance. For example, consider the following decorators:

def loud(cls):
    orig_noise = cls.noise
    def noise(self):
        return orig_noise(self).upper()
    cls.noise = noise
    return cls

def annoying(cls):
    orig_noise = cls.noise
    def noise(self):
        return 3 * orig_noise(self)
    cls.noise = noise
    return cls

@annoying
@loud
class Cyclist(object):
    def noise(self):
        return 'On your left!'
    def pedal(self):
        return 'Pedaling'
This example produces the same result as the mixin example in the previous section, but with no multiple inheritance and no use of super(). Within each decorator, the lookup of cls.noise performs the same action as super(). However, since that lookup only happens once when the decorator is applied (at definition time), the resulting calls to noise() will run slightly faster. Class decorators can also be used to create entirely new code. For example, a common task when writing classes is to write a useful __repr__() method for improved debugging:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f'{type(self).__name__}({self.x!r}, {self.y!r})'
Writing such methods is often tedious. Perhaps a class decorator could create the method for you:

import inspect

def with_repr(cls):
    args = list(inspect.signature(cls).parameters)
    argvals = ', '.join('{self.%s!r}' % arg for arg in args)
    code = 'def __repr__(self):\n'
    code += f'    return f"{cls.__name__}({argvals})"\n'
    locs = { }
    exec(code, locs)
    cls.__repr__ = locs['__repr__']
    return cls

# Example
@with_repr
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
In this example, a __repr__() method is generated from the calling signature of the __init__() method. The method is created as a text string and passed to exec() to create a function. That function is then attached to the class. Similar code-generation techniques are used in parts of the standard library. For example, a convenient way to define data structures is to use a dataclass, as shown here:

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
A dataclass automatically creates methods such as __init__() and __repr__() from class type hints. The methods are created using exec(), similar to the previous example. Here's how the resulting Point class works:

>>> p = Point(2, 3)
>>> p
Point(x=2, y=3)
>>>
A downside of this approach, however, is poor startup performance. Dynamically creating code with exec() bypasses the compilation optimizations that Python normally applies to modules, so defining a large number of classes in this way may significantly slow down the import of your code.

The examples shown in this section illustrate common uses of class decorators (i.e., registration, code rewriting, code generation, validation, and so on). One issue with class decorators is that they must be explicitly applied to each class that uses them, which is not always desired. The next section describes a feature that allows implicit manipulation of classes.
Supervised Inheritance

As you saw in the previous section, sometimes you want to define a class and perform additional actions. A class decorator is one mechanism for doing this. However, a parent class can also perform extra actions on behalf of its subclasses. This is accomplished by implementing an __init_subclass__(cls) class method. For example:

class Base:
    @classmethod
    def __init_subclass__(cls):
        print('Initializing', cls)

# Example (you should see the 'Initializing' message for each class)
class A(Base):
    pass

class B(A):
    pass
If an __init_subclass__() method is present, it is triggered automatically upon the definition of any child class. This happens even if the child is buried deeply in an inheritance hierarchy. Many of the tasks commonly performed with class decorators can be done with __init_subclass__() instead. For example, class registration:

class DecoderBase:
    _registry = { }

    @classmethod
    def __init_subclass__(cls):
        for mt in cls.mimetypes:
            DecoderBase._registry[mt] = cls

# Factory function that uses the registry
def create_decoder(mimetype):
    return DecoderBase._registry[mimetype]()

class TextDecoder(DecoderBase):
    mimetypes = [ 'text/plain' ]
    def decode(self, data):
        ...

class HTMLDecoder(DecoderBase):
    mimetypes = [ 'text/html' ]
    def decode(self, data):
        ...

class ImageDecoder(DecoderBase):
    mimetypes = [ 'image/png', 'image/jpg', 'image/gif' ]
    def decode(self, data):
        ...

# Usage example
decoder = create_decoder('image/jpg')
Here's an example of a class that automatically creates __repr__() methods from the calling signature of the class's __init__() method:

import inspect

class Base:
    @classmethod
    def __init_subclass__(cls):
        # Create a __repr__ method
        args = list(inspect.signature(cls).parameters)
        argvals = ', '.join('{self.%s!r}' % arg for arg in args)
        code = 'def __repr__(self):\n'
        code += f'    return f"{cls.__name__}({argvals})"\n'
        locs = { }
        exec(code, locs)
        cls.__repr__ = locs['__repr__']

class Point(Base):
    def __init__(self, x, y):
        self.x = x
        self.y = y
If multiple inheritance is used, you should use super() to make sure all classes that implement __init_subclass__() get called. For example:

class A:
    @classmethod
    def __init_subclass__(cls):
        print('A.init_subclass')
        super().__init_subclass__()

class B:
    @classmethod
    def __init_subclass__(cls):
        print('B.init_subclass')
        super().__init_subclass__()

# You should see output from both classes here
class C(A, B):
    pass
Supervising inheritance with __init_subclass__() is one of Python's most powerful customization features. Much of its power comes from its implicit nature. A top-level base class can use it to supervise an entire hierarchy of child classes behind the scenes. Such supervision can register classes, rewrite methods, perform validation, and more.
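To illustrate the validation use case, here is a small sketch (the Handler class names are hypothetical) in which a base class rejects any subclass that forgets to define a required attribute:

class Handler:
    @classmethod
    def __init_subclass__(cls):
        super().__init_subclass__()
        if not hasattr(cls, 'mimetypes'):
            raise TypeError(f'{cls.__name__} must define mimetypes')

class TextHandler(Handler):
    mimetypes = [ 'text/plain' ]    # Ok

class BrokenHandler(Handler):       # Raises TypeError at definition time
    pass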
Object Lifecycle and Memory Management

When a class is defined, the resulting class is a factory for creating new instances. For example:

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

# Create a few Account instances
a = Account('Guido', 1000.0)
b = Account('Eve', 25.0)
The creation of an instance is a two-step process involving the special method __new__(), which creates a new instance, and __init__(), which initializes it. For example, the operation a = Account('Guido', 1000.0) performs these steps:

a = Account.__new__(Account, 'Guido', 1000.0)
if isinstance(a, Account):
    Account.__init__(a, 'Guido', 1000.0)
__new__() normally receives the same arguments as __init__(), except for the first argument, which receives the class instead of an instance. However, the default implementation of __new__() usually just ignores them. You will sometimes see __new__() called with just a single argument. For example, this code also works:

a = Account.__new__(Account)
Account.__init__(a, 'Guido', 1000.0)
Direct use of the __new__() method is uncommon, but it is sometimes used to create instances while bypassing the __init__() method. One such use is in class methods. For example:

import time

class Date:
    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    @classmethod
    def today(cls):
        t = time.localtime()
        self = cls.__new__(cls)    # Create instance
        self.year = t.tm_year
        self.month = t.tm_mon
        self.day = t.tm_mday
        return self
Modules that perform object serialization, such as pickle, also use __new__() to recreate instances when objects are deserialized. This happens without ever invoking __init__(). Sometimes a class defines __new__() because it wants to alter some aspect of instance creation. Typical applications include instance caching, singletons, and immutability. Suppose you want the Date class to perform date interning, that is, caching and reusing Date instances that have the same year, month, and day. Here is one way to implement it:

class Date:
    _cache = { }

    @staticmethod
    def __new__(cls, year, month, day):
        self = Date._cache.get((year, month, day))
        if not self:
            self = super().__new__(cls)
            self.year = year
            self.month = month
            self.day = day
            Date._cache[year, month, day] = self
        return self

    def __init__(self, year, month, day):
        pass

# Example
d = Date(2012, 12, 21)
e = Date(2012, 12, 21)
assert d is e      # Same object
In this example, the class keeps an internal dictionary of previously created Date instances. When creating a new Date, the cache is consulted first. If a match is found, that instance is returned. Otherwise, a new instance is created and initialized.

A subtle facet of this solution is the empty __init__() method. Even though instances are cached, every call to Date() still invokes __init__(). To avoid duplicated effort, the method simply does nothing; instance initialization actually takes place in __new__() when an instance is first created. There are ways to avoid the extra call to __init__(), but they require sneaky tricks. One way to avoid it is to have __new__() return an instance of an entirely different type, for example one belonging to a different class. The other solution is to use a metaclass, described later.

Once created, instances are managed by reference counting. If the reference count reaches zero, the instance is immediately destroyed. When the instance is about to be destroyed, the interpreter first looks for a __del__() method associated with the object and calls it. For example:

class Account(object):
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
    def __del__(self):
        print('Deleting Account')

>>> a = Account('Guido', 1000.0)
>>> del a
Deleting Account
>>>
Occasionally, a program will use the del statement to delete a reference to an object, as shown. If this causes the object's reference count to reach zero, the __del__() method is called. However, in general, the del statement doesn't directly call __del__(), since there may be other references to the object elsewhere. There are many other ways an object might be deleted, for example by reassigning a variable name or by a variable going out of scope in a function:

>>> a = Account('Guido', 1000.0)
>>> a = 42
Deleting Account
>>> def func():
...     a = Account('Guido', 1000.0)
...
>>> func()
Deleting Account
>>>
In practice, it's rarely necessary for a class to define a __del__() method. The only exception is when the destruction of an object requires an extra cleanup action, such as closing a file, shutting down a network connection, or releasing other system resources. Even in these cases, it's dangerous to rely on __del__() for a clean shutdown, because there's no guarantee that this method will be called when you expect. For clean shutdown of resources, you should give the object an explicit close() method. You should also make your class support the context manager protocol so it can be used with the with statement. Here is an example that covers all the cases:

class SomeClass:
    def __init__(self):
        self.resource = open_resource()
    def __del__(self):
        self.close()
    def close(self):
        self.resource.close()
    def __enter__(self):
        return self
    def __exit__(self, ty, val, tb):
        self.close()

# Closed via __del__()
s = SomeClass()
del s

# Explicit close
s = SomeClass()
s.close()

# Closed at the end of a context block
with SomeClass() as s:
    ...
Again, it should be emphasized that writing a __del__() method is almost never necessary for most classes. Python already has garbage collection, and there is simply no need for it unless some extra action needs to take place upon object destruction. Even then, you still might not need __del__(), as the object might already be programmed to clean itself up properly even if you do nothing. As if there weren't already enough dangers with reference counting and object destruction, there are certain kinds of programming patterns, especially those involving parent-child relationships, graphs, or caching, where objects can create a so-called "reference cycle." Here is an example:

class SomeClass:
    def __del__(self):
        print('Deleting')

parent = SomeClass()
child = SomeClass()

# Create a child-parent reference cycle
parent.child = child
child.parent = parent

# Attempt deletion (no output from __del__ appears)
del parent
del child
In this example, the variable names are deleted, but you never see the __del__() method execute. Each of the two objects holds an internal reference to the other, so there is no way for the reference count to ever drop to 0. To handle this scenario, a special cycle-detecting garbage collector runs every so often. Eventually the objects will be reclaimed, but it's hard to predict when that might happen. If you want to force garbage collection, you can call gc.collect(). The gc module has a variety of other functions related to the cyclic garbage collector and memory monitoring.

Because of the unpredictable timing of garbage collection, the __del__() method has a few restrictions. First, any exception that propagates out of __del__() is printed to sys.stderr but otherwise ignored. Second, the __del__() method should avoid operations such as acquiring locks or other resources. Doing so could result in a deadlock when __del__() is unexpectedly fired in the middle of executing an unrelated function deep within the seventh inner circle of signal handling and threads. If you must define __del__(), keep it simple.
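If you need cleanup tied to object destruction but want to sidestep the pitfalls of __del__(), the standard library's weakref.finalize() is sometimes a safer alternative. Here is a minimal sketch that registers print() as the cleanup action:

import weakref

class SomeClass:
    def __init__(self):
        # Register a callback that runs when the instance is
        # garbage-collected or, at the latest, at interpreter exit
        self._finalizer = weakref.finalize(self, print, 'Cleaning up')

s = SomeClass()
del s            # -> Cleaning up

Unlike __del__(), a finalizer is guaranteed to be called at most once, and by default it still runs at interpreter shutdown if the object is alive at that point.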
Weak References

Sometimes objects are kept alive when you would rather see them die. In an earlier example, a Date class was shown with internal instance caching. One problem with that implementation is that there is no way for an instance to ever be removed from the cache, so the cache will grow steadily over time. One way to fix this problem is to create a weak reference using the weakref module. A weak reference is a way of creating a reference to an object without increasing its reference count. To work with a weak reference, you add extra code to check whether the referenced object still exists. Here's an example of how you create a weak reference:

>>> a = Account('Guido', 1000.0)
>>> import weakref
>>> a_ref = weakref.ref(a)
>>> a_ref
<weakref at 0x...; to 'Account' at 0x...>
>>>
Unlike a normal reference, a weak reference allows the original object to die. For example:

>>> del a
>>> a_ref
<weakref at 0x...; dead>
>>>
A weak reference contains an optional reference to an object. To get the actual object, you need to call the weak reference as a function with no arguments. This returns either the referenced object or None. For example:

account = a_ref()
if account is not None:
    account.withdraw(10)

# Alternative
if account := a_ref():
    account.withdraw(10)
Weak references are commonly used in conjunction with caching and other advanced memory management. Here is a modified version of the Date class that automatically removes objects from the cache once no more references exist:

import weakref

class Date:
    _cache = { }

    @staticmethod
    def __new__(cls, year, month, day):
        selfref = Date._cache.get((year, month, day))
        if not selfref:
            self = super().__new__(cls)
            self.year = year
            self.month = month
            self.day = day
            Date._cache[year, month, day] = weakref.ref(self)
        else:
            self = selfref()
        return self

    def __init__(self, year, month, day):
        pass

    def __del__(self):
        del Date._cache[self.year, self.month, self.day]
This might take some study, but here's an interactive session showing how it works. Notice how an entry is removed from the cache once no more references to it exist:

>>> Date._cache
{}
>>> a = Date(2012, 12, 21)
>>> Date._cache
{(2012, 12, 21): <weakref at 0x...; to 'Date' at 0x...>}
>>> b = Date(2012, 12, 21)
>>> a is b
True
>>> del a
>>> Date._cache
{(2012, 12, 21): <weakref at 0x...; to 'Date' at 0x...>}
>>> del b
>>> Date._cache
{}
>>>
As noted earlier, a class's __del__() method is invoked only when an object's reference count reaches zero. In this example, the first del a statement decrements the reference count. However, since there is still another reference to the same object, the object remains in Date._cache. When the second object is deleted, __del__() is invoked and the cache entry is removed.

Support for weak references requires instances to have a writable __weakref__ attribute. Instances of user-defined classes normally have such an attribute by default. However, built-in types and certain kinds of special data structures (named tuples, classes with slots, and so on) do not. If you want to create weak references to these types, you can do it by defining variants with an added __weakref__ attribute:
class wdict(dict):
    __slots__ = ('__weakref__',)

w = wdict()
w_ref = weakref.ref(w)    # Now works
The use of slots here avoids unnecessary memory overhead; slots are discussed shortly.
Internal Object Representation and Attribute Binding

The state associated with an instance is stored in a dictionary that's accessible as the instance's __dict__ attribute. This dictionary contains the data that's unique to each instance. Here's an example:

>>> a = Account('Guido', 1100.0)
>>> a.__dict__
{'owner': 'Guido', 'balance': 1100.0}
New attributes can be added to an instance at any time, like this:

a.number = 123456               # Add attribute 'number' to a.__dict__
a.__dict__['number'] = 654321   # Change an attribute directly in __dict__
Modifications to an instance are always reflected in the local __dict__ attribute unless the attribute is being managed by a property. Likewise, if you make modifications to __dict__ directly, those modifications are reflected in the attributes. Instances are linked back to their class by a special attribute __class__. The class itself is also just a thin layer over a dictionary, which can be found in its own __dict__ attribute. The class dictionary is where you find the methods. For example:

>>> a.__class__
<class 'Account'>
>>> Account.__dict__.keys()
dict_keys(['__module__', '__init__', '__repr__', 'deposit', 'withdraw',
'query', '__dict__', '__weakref__', '__doc__'])
>>> Account.__dict__['withdraw']
<function Account.withdraw at 0x...>
>>>
Classes are linked to their base classes by a special attribute __bases__, which is a tuple of the bases. The __bases__ attribute is only informational. The actual runtime implementation of inheritance uses the __mro__ attribute, which is a tuple of all parent classes listed in search order. This underlying structure is the basis for all operations that get, set, and delete the attributes of instances.

Whenever an attribute is set using obj.name = value, the special method obj.__setattr__('name', value) is invoked. If an attribute is deleted using del obj.name, the special method obj.__delattr__('name') is invoked. The default behavior of these methods is to modify or remove values from the local __dict__ of obj, unless the requested attribute corresponds to a property or descriptor. In that case, the set and delete operations are carried out by the setter and deleter functions associated with the property.

For attribute lookup such as obj.name, the special method obj.__getattribute__('name') is invoked. This method carries out the search process for finding the attribute, which normally includes checking the properties, looking in the local __dict__, checking the class dictionary, and searching the MRO. If this search fails, a final attempt to find the attribute is made by invoking the class's obj.__getattr__('name') method (if defined). If that fails too, an AttributeError exception is raised.

User-defined classes can implement their own versions of the attribute access functions if desired. For example, here's a class that restricts the attribute names that can be set:

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def __setattr__(self, name, value):
        if name not in {'owner', 'balance'}:
            raise AttributeError(f'No attribute {name}')
        super().__setattr__(name, value)

# Example
a = Account('Guido', 1000.0)
a.balance = 940.25    # Ok
a.amount = 540.2      # AttributeError. No attribute amount
A class that reimplements these methods should rely on the default implementation provided by super() to carry out the actual work of manipulating an attribute. This is because the default implementation takes care of the more advanced features of classes such as descriptors and properties. If you don't use super(), you have to take care of those details yourself.
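For example, here is a sketch of a hypothetical class that logs every attribute read by redefining __getattribute__() and delegating the real work to super():

class LoggedAttributes:
    def __init__(self, owner):
        self.owner = owner

    def __getattribute__(self, name):
        print('Getting:', name)
        # super() handles descriptors, properties, __dict__, and the MRO
        return super().__getattribute__(name)

a = LoggedAttributes('Guido')
a.owner      # Prints "Getting: owner" and returns 'Guido'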
Proxies, Wrappers, and Delegation

Sometimes classes implement a wrapper layer around another object to create a kind of proxy object. A proxy is an object that exposes the same interface as another object but, for whatever reason, is not related to the original object via inheritance. This is a bit different from composition, where an entirely new object is built from other objects, but with its own unique set of methods and attributes.

There are many real-world scenarios where this might arise. For example, in distributed computing, the actual implementation of an object might live on a remote server in the cloud. Clients that interact with that server might use a proxy that looks like the object on the server but, behind the scenes, delegates all of its method calls to network messages.

A common implementation technique for proxies involves the __getattr__() method. Here's a simple example:

class A:
    def spam(self):
        print('A.spam')
    def grok(self):
        print('A.grok')
    def yow(self):
        print('A.yow')

class LoggedA:
    def __init__(self):
        self._a = A()
    def __getattr__(self, name):
        print("Accessing", name)
        # Delegate to the internal A instance
        return getattr(self._a, name)

# Example use
a = LoggedA()
a.spam()    # prints "Accessing spam" and "A.spam"
a.yow()     # prints "Accessing yow" and "A.yow"
Delegation is sometimes used as an alternative to inheritance. Here is an example:

class A:
    def spam(self):
        print('A.spam')
    def grok(self):
        print('A.grok')
    def yow(self):
        print('A.yow')

class B:
    def __init__(self):
        self._a = A()
    def grok(self):
        print('B.grok')
    def __getattr__(self, name):
        return getattr(self._a, name)

# Example use
b = B()
b.spam()    # -> A.spam
b.grok()    # -> B.grok  (redefined method)
b.yow()     # -> A.yow
In this example, it looks as if class B inherits from class A and redefines a single method. That's the observed behavior, but inheritance is not being used. Instead, B holds an internal reference to an A. Certain methods of A can be redefined. However, all of the other methods are delegated via the __getattr__() method. The technique of forwarding attribute lookup via __getattr__() is a common one. Be aware, however, that it does not apply to operations mapped to special methods. For example, consider this class:

class ListLike:
    def __init__(self):
        self._items = list()
    def __getattr__(self, name):
        return getattr(self._items, name)

# Example
a = ListLike()
a.append(1)       # Works
a.insert(0, 2)    # Works
a.sort()          # Works
len(a)            # Fails. No __len__() method
a[0]              # Fails. No __getitem__() method
Here, the class successfully forwards all of the standard list methods (list.sort(), list.append(), and so on) to an internal list. However, none of the standard Python operators work. To make those work, you would have to implement the required special methods explicitly. For example:
class ListLike:
    def __init__(self):
        self._items = list()
    def __getattr__(self, name):
        return getattr(self._items, name)
    def __len__(self):
        return len(self._items)
    def __getitem__(self, index):
        return self._items[index]
    def __setitem__(self, index, value):
        self._items[index] = value
Reducing Memory Usage with __slots__

As noted earlier, instances store their data in a dictionary. If you're creating a large number of instances, this can add significant memory overhead. If you know that the attribute names are fixed, you can specify the names in a special class variable called __slots__. Here's an example:

class Account(object):
    __slots__ = ('owner', 'balance')
    ...

__slots__ is a definition hint that allows Python to make performance optimizations for both memory use and execution speed. Instances of a class with __slots__ no longer use a dictionary for storing instance data. Instead, a much more compact array-based data structure is used. In programs that create a large number of objects, using __slots__ can result in a substantial reduction in memory use and a modest improvement in execution time. The only entries in __slots__ are the instance data attributes. You do not list methods, properties, class variables, or any other class-level attributes. These are essentially the same names that would normally appear as dictionary keys in the instance __dict__.
Be aware that __slots__ has a tricky interaction with inheritance. If a class inherits from a base class that uses __slots__, it also needs to define __slots__ for storing its own attributes (even if it doesn't add any) to take advantage of the benefits __slots__ provides. If you forget this, the derived class will run slower and use even more memory than if __slots__ had not been used on either class. __slots__ is also incompatible with multiple inheritance: if multiple base classes are specified, each with nonempty slots, you get a TypeError.
Use of __slots__ can also break code that expects instances to have an underlying __dict__ attribute. Although this often doesn't apply to user code, utility libraries and other tools for supporting objects might be programmed to look at __dict__ for debugging, object serialization, and other operations. The presence of __slots__ has no effect on the invocation of methods such as __getattribute__(), __getattr__(), and __setattr__() should they be redefined in a class. However, if you implement such methods, be aware that there is no longer an instance __dict__ attribute. Your implementation will need to take that into account.
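For example, here is a sketch of the earlier attribute-restricting Account class rewritten to use __slots__. Since there is no instance __dict__, the implementation must store values through super() rather than by touching a dictionary:

class Account:
    __slots__ = ('owner', 'balance')

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def __setattr__(self, name, value):
        if name not in Account.__slots__:
            raise AttributeError(f'No attribute {name}')
        # No instance __dict__ exists; super() stores into the slot
        super().__setattr__(name, value)

a = Account('Guido', 1000.0)
a.balance = 940.25    # Ok
a.amount = 540.2      # AttributeError. No attribute amount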
Descriptors

Normally, attribute access corresponds to dictionary operations. If more control is needed, attribute access can be routed through user-defined get, set, and delete functions. The use of properties has already been described. However, a property is actually implemented using a lower-level construct known as a descriptor. A descriptor is a class-level object that manages access to an attribute. By implementing one or more of the special methods __get__(), __set__(), and __delete__(), you can hook directly into the attribute access mechanism and customize those operations. Here's an example:

class Typed:
    expected_type = object

    def __set_name__(self, cls, name):
        self.key = name

    def __get__(self, instance, cls):
        if instance:
            return instance.__dict__[self.key]
        else:
            return self

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(f'Expected {self.expected_type}')
        instance.__dict__[self.key] = value

    def __delete__(self, instance):
        raise AttributeError("Can't delete attribute")

class Integer(Typed):
    expected_type = int

class Float(Typed):
    expected_type = float

class String(Typed):
    expected_type = str

# Usage example:
class Account:
    owner = String()
    balance = Float()

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
In this example, the Typed class defines a descriptor that performs type checking when an attribute is assigned and raises an error if an attempt is made to delete the attribute. The Integer, Float, and String subclasses specialize Typed to match specific types. The use of these classes in another class (such as Account) makes those attributes automatically invoke the appropriate __get__(), __set__(), or __delete__() methods on access. For example:

a = Account('Guido', 1000.0)
b = a.owner        # Calls Account.owner.__get__(a, Account)
a.owner = 'Eve'    # Calls Account.owner.__set__(a, 'Eve')
del a.owner        # Calls Account.owner.__delete__(a)
Descriptors can only be instantiated at the class level. It is not legal to create descriptors on a per-instance basis by creating descriptor objects inside __init__() and other methods. The __set_name__() method of a descriptor is invoked after a class has been defined, but before any instances have been created, to inform the descriptor of the name it's been given within the class. For example, the definition balance = Float() calls Float.__set_name__(Account, 'balance') to tell the descriptor the class and name being used. Descriptors with a __set__() method always take precedence over items in the instance dictionary. For example, if a descriptor has the same name as a key in the instance dictionary, the descriptor wins. In the Account example above, you can see the descriptor applying type checking even though the instance dictionary contains a matching entry:

>>> a = Account('Guido', 1000.0)
>>> a.__dict__
{'owner': 'Guido', 'balance': 1000.0}
>>> a.balance = 'a lot'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "descrip.py", line 63, in __set__
    raise TypeError(f'Expected {self.expected_type}')
TypeError: Expected <class 'float'>
>>>
The __get__(instance, cls) method of a descriptor takes arguments for both the instance and the class. It's possible for __get__() to be invoked at the class level, in which case the instance argument is None. In most cases, __get__() returns the descriptor itself if no instance is supplied. For example:

>>> Account.balance
<__main__.Float object at 0x...>
>>>
A descriptor that only implements __get__() is sometimes known as a method descriptor, and it has weaker binding than a descriptor with get/set capabilities. Specifically, the __get__() method of a method descriptor is invoked only if there is no matching entry in the instance dictionary. The reason it's called a "method descriptor" is that descriptors of this kind are mainly used to implement Python's various kinds of methods, including instance methods, class methods, and static methods. For example, here is a skeleton implementation showing how @classmethod and @staticmethod could be implemented from scratch (hint: the actual implementations are more efficient):

import types

class classmethod:
    def __init__(self, func):
        self.__func__ = func

    # Return a bound method with cls as the first argument
    def __get__(self, instance, cls):
        return types.MethodType(self.__func__, cls)

class staticmethod:
    def __init__(self, func):
        self.__func__ = func

    # Return the bare function
    def __get__(self, instance, cls):
        return self.__func__
Because method descriptors act only when there's no matching entry in the instance dictionary, they can also be used to implement various forms of lazy attribute evaluation. For example:

class Lazy:
    def __init__(self, func):
        self.func = func

    def __set_name__(self, cls, name):
        self.key = name

    def __get__(self, instance, cls):
        if instance:
            value = self.func(instance)
            instance.__dict__[self.key] = value
            return value
        else:
            return self

class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    area = Lazy(lambda self: self.width * self.height)
    perimeter = Lazy(lambda self: 2*self.width + 2*self.height)
In this example, area and perimeter are attributes that are computed on demand and stored in the instance dictionary. Once computed, the values are returned directly from the instance dictionary:

>>> r = Rectangle(3, 4)
>>> r.__dict__
{'width': 3, 'height': 4}
>>> r.area
12
>>> r.perimeter
14
>>> r.__dict__
{'width': 3, 'height': 4, 'area': 12, 'perimeter': 14}
>>>
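As an aside, the standard library offers a similar mechanism: functools.cached_property implements essentially this kind of lazily computed, instance-cached attribute. For example:

from functools import cached_property

class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @cached_property
    def area(self):
        # Computed once on first access, then stored in __dict__
        return self.width * self.height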
Class Definition Process

The definition of a class is a dynamic process. When you define a class using the class statement, a new dictionary is created that serves as the local class namespace. The body of the class then executes as a script within this namespace. Finally, the namespace becomes the __dict__ attribute of the resulting class object. Any legal Python statement is allowed in the body of a class. Normally, you just define functions and variables, but control flow, imports, nested classes, and everything else is allowed. For example, here is a class that conditionally defines methods:

debug = True

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    if debug:
        import logging
        log = logging.getLogger(f'{__module__}.{__qualname__}')

        def deposit(self, amount):
            Account.log.debug('Depositing %f', amount)
            self.balance += amount

        def withdraw(self, amount):
            Account.log.debug('Withdrawing %f', amount)
            self.balance -= amount
    else:
        def deposit(self, amount):
            self.balance += amount

        def withdraw(self, amount):
            self.balance -= amount
In this example, a global variable debug is being used to conditionally define methods. The __qualname__ and __module__ variables are predefined strings that hold information about the name of the class and the enclosing module. These can be used by statements in the class body. In this example, they're being used to configure the logging system. There are probably cleaner ways to organize the above code, but the key point is that you can put anything you want in a class.
One critical point about class definition is that the namespace holding the contents of the class body is not a scope of variables. Any name that gets used within a method (such as Account.log in the preceding example) needs to be fully qualified. If a function such as locals() is used in a class body (but not inside a method), it returns the dictionary being used for the class namespace.
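For example, this sketch prints the in-progress class namespace while the body executes:

class Example:
    x = 1
    y = 2
    print(locals())   # Shows '__module__', '__qualname__', 'x', and 'y'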
Dynamic Class Creation

Normally, classes are created using the class statement, but this is not a requirement. As noted in the previous section, classes are defined by executing a class body to populate a namespace. If you can populate a dictionary with your own definitions, you can make a class without ever using the class statement. To do that, use types.new_class(). Here is an example:

import types

# Some methods (not in a class)
def __init__(self, owner, balance):
    self.owner = owner
    self.balance = balance

def deposit(self, amount):
    self.balance += amount

def withdraw(self, amount):
    self.balance -= amount

methods = {
    '__init__': __init__,
    'deposit': deposit,
    'withdraw': withdraw,
}

Account = types.new_class('Account', (),
                          exec_body=lambda ns: ns.update(methods))

# You now have a class
a = Account('Guido', 1000.0)
a.deposit(50)
a.withdraw(25)
The new_class() function requires a class name, a tuple of base classes, and a callback function responsible for populating the class namespace. This callback receives the class namespace dictionary as an argument and should update the dictionary in place. The return value of the callback is ignored.

Creating classes dynamically may be useful if you want to build classes from data structures. For example, in the section on descriptors, the following classes were defined:

class Integer(Typed):
    expected_type = int

class Float(Typed):
    expected_type = float

class String(Typed):
    expected_type = str

This code is highly repetitive. Perhaps a data-driven approach would be better:

typed_classes = [
    ('Integer', int),
    ('Float', float),
    ('String', str),
    ('Bool', bool),
    ('Tuple', tuple),
]

globals().update(
    (name, types.new_class(name, (Typed,),
                           exec_body=lambda ns: ns.update(expected_type=ty)))
    for name, ty in typed_classes)
This example updates the global module namespace with dynamically created classes made with types.new_class(). If you want more classes, add an appropriate entry to the typed_classes list. Sometimes you'll see type() being used instead to dynamically create a class. For example:

Account = type('Account', (), methods)

This works, but it doesn't take into account some of the more advanced class machinery, such as metaclasses (discussed shortly). In modern code, you should aim to use types.new_class() instead.
Metaclasses

When you define a class in Python, the class definition itself becomes an object. Here's an example:

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.balance -= amount

isinstance(Account, object)    # -> True
If you think about this long enough, you'll realize that if Account is an object, then something had to create it. The creation of the class object is controlled by a special kind of class called a metaclass. Simply put, a metaclass is a class that creates instances of classes. In the preceding example, the metaclass that created Account is a built-in class called type. If you check the type of Account, you will see that it is an instance of type:

>>> Account.__class__
<class 'type'>
>>>
It's a bit brain-bending, but it's similar to integers. For example, if you write x = 42 and then look at x.__class__, you get int, the class that creates integers. Likewise, type makes instances of types or classes. When a new class is defined with the class statement, a number of things happen. First, a new namespace for the class is created. Next, the body of the class is executed in this namespace. Finally, the class name, base classes, and freshly populated namespace are used to create the class instance. The following code illustrates the low-level steps that take place:

# Step 1: Create the class namespace
namespace = type.__prepare__('Account', ())

# Step 2: Execute the class body
exec('''
def __init__(self, owner, balance):
    self.owner = owner
    self.balance = balance
def deposit(self, amount):
    self.balance += amount
def withdraw(self, amount):
    self.balance -= amount
''', globals(), namespace)

# Step 3: Create the final class object
Account = type('Account', (), namespace)
The definition process interacts with the type class to create the class namespace and the final class object. The class used for this can be customized. In fact, a class can choose to be processed by a different type of class by specifying a different metaclass. This is done using the metaclass keyword argument in inheritance:

class Account(metaclass=type):
    ...
If no metaclass is given, the class statement examines the type of the first entry in the tuple of base classes (if any) and uses that as the metaclass. Therefore, if you write class Account(object), the resulting Account class will have the same type as object (which is type). Note that classes that don't specify any parent at all always inherit from object, so this still applies.

To create a new metaclass, define a class that inherits from type. Within this class, you redefine one or more methods that are used during the class creation process. Typically, this includes the __prepare__() method used to create the class namespace, the __new__() method used to create the class instance, the __init__() method called after a class has already been created, and the __call__() method used to create new instances. The following example implements a metaclass that merely prints the input arguments to each method so you can experiment:

class mytype(type):
    # Create the class namespace
    @classmethod
    def __prepare__(meta, clsname, bases):
        print("Preparing:", clsname, bases)
        return super().__prepare__(clsname, bases)

    # Create the class instance after the body has executed
    @staticmethod
    def __new__(meta, clsname, bases, namespace):
        print("Creating:", clsname, bases, namespace)
        return super().__new__(meta, clsname, bases, namespace)

    # Initialize the class instance
    def __init__(cls, clsname, bases, namespace):
        print("Initializing:", clsname, bases, namespace)
        super().__init__(clsname, bases, namespace)

    # Create new instances of the class
    def __call__(cls, *args, **kwargs):
        print("Creating instance:", args, kwargs)
        return super().__call__(*args, **kwargs)

# Example
class Base(metaclass=mytype):
    pass
# Defining Base produces the following output:
# Preparing: Base ()
# Creating: Base () {'__module__': '__main__', '__qualname__': 'Base'}
# Initializing: Base () {'__module__': '__main__', '__qualname__': 'Base'}

b = Base()
# Creating instance: () {}
One tricky facet of working with metaclasses is the naming of variables and keeping track of the different entities involved. In the above code, the name meta refers to the metaclass itself. The name cls refers to a class instance created by the metaclass. Although not used here, the name self refers to a normal instance created by a class.

Metaclasses propagate through inheritance. So, if you've defined a base class to use a different metaclass, all child classes will also use that metaclass. Try this example to see your custom metaclass at work:

class Account(Base):
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.balance -= amount

print(type(Account))    # -> <class 'mytype'>
Metaclasses are primarily used in situations where you want to exercise extreme low-level control over the class definition environment and creation process. Before proceeding, however, remember that Python already provides a lot of functionality for monitoring and altering class definitions (such as the __init_subclass__() method, class decorators, descriptors, mixins, and so on). Most of the time, you probably don't need a metaclass. That said, the examples that follow show situations where a metaclass is the only sensible solution.

One use of a metaclass is to rewrite the contents of the class namespace prior to the creation of the class object. Certain features of classes are established at definition time and can't be modified later. One such feature is __slots__. As noted earlier, __slots__ is a performance optimization related to the memory layout of instances. Here's a metaclass that automatically sets the __slots__ attribute based on the calling signature of the __init__() method:

import inspect

class SlotMeta(type):
    @staticmethod
    def __new__(meta, clsname, bases, methods):
        if '__init__' in methods:
            sig = inspect.signature(methods['__init__'])
            __slots__ = tuple(sig.parameters)[1:]
        else:
            __slots__ = ()
        methods['__slots__'] = __slots__
        return super().__new__(meta, clsname, bases, methods)

class Base(metaclass=SlotMeta):
    pass

# Example
class Point(Base):
    def __init__(self, x, y):
        self.x = x
        self.y = y
In this example, the Point class that's created automatically gets __slots__ of ('x', 'y'). The resulting instances of Point now get the memory savings without knowing that slots are being used; __slots__ doesn't have to be specified directly. This kind of trick is not possible with class decorators or with __init_subclass__(), because those features operate on a class only after it has been created. By then, it's too late to apply the __slots__ optimization.

Another use of metaclasses is for altering the class definition environment. For example, duplicate definitions of a name during class definition normally result in a silent error: the second definition overwrites the first. Suppose you wanted to catch that. Here's a metaclass that does so by defining a different kind of dictionary for the class namespace:

class NoDupeDict(dict):
    def __setitem__(self, key, value):
        if key in self:
            raise AttributeError(f'{key} already defined')
        super().__setitem__(key, value)

class NoDupeMeta(type):
    @classmethod
    def __prepare__(meta, clsname, bases):
        return NoDupeDict()

class Base(metaclass=NoDupeMeta):
    pass

# Example
class SomeClass(Base):
    def yow(self):
        print('Yow!')

    def yow(self, x):    # Fails. AttributeError: yow already defined
        print('Different Yow!')
This is only a small sample of what's possible. For framework builders, metaclasses offer an opportunity to strictly control what happens during class definition, allowing classes to serve almost as a kind of domain-specific language. Historically, metaclasses were used to accomplish a wide variety of tasks that are now possible through other means. The __init_subclass__() method, in particular, can address the vast majority of use cases where metaclasses were once applied. This includes registration of classes in central registries, automatic decoration of methods, and code generation.
Built-in Objects for Classes and Instances

This section gives some details about the low-level objects used to represent types and instances. This information can be useful in low-level metaprogramming and in code that needs to manipulate types directly. Table 1 shows commonly used attributes of a type object cls:

Table 1: Attributes of Types
The cls.__name__ attribute contains a short class name. The cls.__qualname__ attribute contains a fully qualified name with additional information about the surrounding context (this can be useful if a class is defined inside a function or if you create nested class definitions). The cls.__annotations__ dictionary holds class-level type hints (if any). Table 2 shows special attributes of an instance i:
Table 2: Instance Attributes
The __dict__ attribute normally holds all of the data associated with an instance. However, if a user-defined class uses __slots__, a more efficient internal representation is used and instances do not have a __dict__ attribute.
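For example, here is a small sketch (with hypothetical class names) showing the difference:

class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PlainPoint(2, 3)
p.__dict__      # -> {'x': 2, 'y': 3}

q = SlottedPoint(2, 3)
q.__dict__      # AttributeError. No __dict__ attribute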
Final Words: Keep It Simple

This chapter has covered a lot of information about classes along with the many ways in which they can be customized and controlled. However, when writing classes, keeping things simple is often a good strategy. Yes, you could use abstract base classes, metaclasses, descriptors, class decorators, properties, multiple inheritance, mixins, patterns, and type hints. You could also just write a plain, simple class. Chances are that the simple class is good enough, and everyone will understand what you're doing.

In the big picture, it helps to step back and consider some generally desirable qualities of code. First and foremost, readability counts, and it is often eroded by piling on too many layers of abstraction. Second, you should aim for code that is easy to observe and debug, and don't forget about using the REPL. Finally, making code testable is often a good driver of good design. If your code can't be tested, or testing is too awkward, there might be a better way to organize your solution.
Modules and Packages

Python programs are organized into modules and packages that are loaded with the import statement. This chapter describes the module and package system in more detail. The focus is on programming with modules and packages, not on the process of bundling code for deployment to others. For the latter, consult the latest documentation at https://packaging.python.org/tutorials/packaging-projects/.
Modules and the import Statement

Any Python source file can be imported as a module. For example, consider the following code:

# module.py

a = 37

def func():
    print(f'func says a is {a}')

class SomeClass:
    def method(self):
        print('method says hello')

print('loaded module')
This file contains common programming elements, including a global variable, a function, a class definition, and a free-standing statement. This example is used to illustrate some important (and sometimes subtle) features of module loading. To load a module, use the import statement:

>>> import module
loaded module
>>>

To access the contents of the module, use the module name. For example:
>>> module.a
37
>>> module.func()
func says a is 37
>>> s = module.SomeClass()
>>> s.method()
method says hello
>>>
When you execute an import, several things happen:

1. The source code for the module is located. If it can't be found, an ImportError exception is raised.
2. A new module object is created. This object serves as a container for all of the global definitions contained within the module. It is sometimes referred to as a "namespace."
3. The module source code is executed within the newly created module namespace.
4. If no errors occur, a name is created within the caller that refers to the new module object. This name matches the name of the module, but without any kind of file suffix. For example, if the code is in a file module.py, the name of the module is module.

Of these steps, the first one (locating modules) is the most complicated. A common failure mode for newcomers is using a bad filename or placing code in an unknown location. A module filename needs to use the same rules as variable names (letters, digits, and underscore) and have a .py suffix, for example module.py. When importing, you specify the name without the suffix: it's import module, not import module.py (the latter produces a rather confusing error message). The file also needs to be placed in one of the directories found in sys.path.

The remaining steps all concern the module defining an isolated environment for its code. All of the definitions that appear in a module remain isolated to that module. Thus, there is no risk of the names used for variables, functions, and classes clashing with identical names in other modules.
When accessing definitions in a module, use a fully qualified name such as module.func(). The import statement executes all of the statements in the loaded source file. If a module performs a computation or produces output in addition to defining objects, you will see the result, such as the "loaded module" message printed in the example above. A common confusion with modules concerns access to classes. A module always defines a namespace, so if a file module.py defines a class SomeClass, use the name module.SomeClass to refer to the class.
To import multiple modules with a single import, use a comma-separated list of names like this:

import socket, os, re
Sometimes the local name used to refer to a module is changed using the as qualifier of import. For example:

import module as mo
mo.func()
This latter style of import is standard practice in the world of data analysis. For example, you often see:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
...
When a module is renamed, the new name only applies to the context where the import statement appeared. Other unrelated program modules can still load the module using its original name. Renaming the imported module can be a useful tool for managing different implementations of common functionality or for writing extensible programs. For example, suppose you have two modules, unixmodule.py and winmodule.py, that both define a function func() but involve platform-dependent implementation details. You could write code that selectively imports the module, like this:

if platform == 'unix':
    import unixmodule as module
elif platform == 'windows':
    import winmodule as module
...
r = module.func()
Modules are first-class objects in Python. This means they can be assigned to variables, placed in data structures, and passed around in a program as data. For instance, the module name in the preceding example is a variable that refers to the corresponding module object.
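For example, here is a sketch that stores modules in a dictionary and picks one at runtime (the square_root() function and the 'real'/'complex' keys are illustrative):

import math
import cmath

# Modules placed in a data structure like any other object
backends = { 'real': math, 'complex': cmath }

def square_root(x, kind='real'):
    return backends[kind].sqrt(x)

square_root(2)                # -> 1.4142135623730951
square_root(-1, 'complex')    # -> 1j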
Module Caching

A module's source code is loaded and executed only once, regardless of how often you use the import statement. Subsequent import statements bind the module name to the module object already created by the previous import. A common confusion for newcomers arises when a module is imported into an interactive session, its source code is then modified (for example, to fix a bug), but a repeated import fails to load the modified code. The module cache is to blame. Python never reloads a previously imported module, even if the underlying source code has been updated. You can find the cache of all currently loaded modules in sys.modules, which is a dictionary that maps module names to module objects. The contents of this dictionary are used to determine whether or not import loads a fresh copy of a module. Deleting a module from the cache will force it to be loaded again on the next import statement. However, this is rarely safe, for reasons explained in the section on module reloading beginning on p. X.
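You can observe the cache directly. Here is a sketch of a session using the module.py file from earlier:

>>> import module          # First import. Source executes
loaded module
>>> import sys
>>> 'module' in sys.modules
True
>>> import module          # Cached. Source is not executed again
>>> sys.modules['module'] is module
True
>>>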
Sometimes import is used inside a function like this:

def f(x):
    import math
    return math.sin(x) + math.cos(x)

At first glance, such an implementation looks painfully slow, as if a module were loaded on every call. In fact, the cost of the import is minimal: it amounts to a single dictionary lookup, since Python immediately finds the module in the cache. The main objection to having an import inside a function is stylistic: it is more common to list all module imports at the top of a file where they are easy to see. On the other hand, if you have a specialized function that is rarely called, placing that function's import dependencies inside the function body can speed up program loading. In that case, the required modules are only loaded when they are actually needed.
Import Selected Module Names
The from module import name statement is used to load specific definitions from a module into the current namespace. It is identical to import except that, instead of placing a name that refers to the newly created module namespace, it places references to one or more of the objects defined in the module into the current namespace:

from module import func    # Imports module and puts func in the current namespace
func()                     # Calls module.func()
module.func()              # Fails. NameError: module
The from statement accepts a comma-separated list of names if you want multiple definitions. For example:

from module import func, SomeClass
Semantically, from module import name copies a name from the module cache to the local namespace. That is, Python first executes import module behind the scenes. Afterwards, a name is copied from the cache to a local name, as in name = sys.modules['module'].name. A common misconception is that the from module import name form is more efficient, possibly only loading part of a module. That's not the case. Either way, the entire module is loaded and stored in the cache.
Importing functions with the from form does not change their scoping rules. When functions look up variables, they only look in the file where the function was defined, not in the namespace into which the function is imported and called. For example:

>>> from module import func
>>> a = 42
>>> func()
func says that a is 37
>>> func.__module__
'module'
>>> func.__globals__['a']
37
>>>
A related confusion concerns the behavior of global variables. For example, consider this code that imports both func and a global variable a that it uses:

from module import a, func

a = 42          # Modify the variable
func()          # Prints "func says that a is 37"
print(a)        # Prints "42"
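If the goal is to change a global parameter that func() actually sees, qualify the name through the module instead, as explained next. A minimal sketch, reusing the hypothetical module.py from earlier:

import module

module.a = 42     # Rebinds the variable inside module's own namespace
module.func()     # Prints "func says that a is 42"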
Variable assignment in Python is not a storage operation. That is, the name a in this example is not a kind of box in which a value gets stored. The initial import binds the local name a to the original object module.a. However, the later reassignment a = 42 rebinds the local name a to a completely different object. At that point, a is no longer bound to the value in the imported module. Because of this behavior, the from statement cannot be used in a way that makes variables behave like global variables in a language such as C. If you want mutable global program parameters, put them in a module and use the module name explicitly with the import statement (that is, use module.a as shown above).

The asterisk (*) wildcard is sometimes used to load all of the definitions in a module, except those that start with an underscore. Here is an example:

# Load all definitions into the current namespace
from module import *
The from module import * statement can only be used at the top level of a module. In particular, it is illegal to use this form of import inside function bodies. Modules can precisely control the set of names imported by import * by defining the list __all__. Here is an example:
# module: module.py

__all__ = [ 'func', 'SomeClass' ]

a = 37                  # Not exported

def func():             # Exported
    ...

class SomeClass:        # Exported
    ...
In practice, using from module import * is frowned upon. Overusing it pollutes the local namespace and causes a lot of confusion. For example:

from math import *
from random import *
from statistics import *

a = gauss(1.0, 0.25)        # From which module???

It is usually much better to be explicit about names:

from math import sin, cos, sqrt
from random import gauss
from statistics import mean

a = gauss(1.0, 0.25)
Circular Imports
A peculiar problem arises if two modules import each other. For example, suppose you have two files like this:

# ---------------------------
# moda.py

import modb

def func_a():
    modb.func_b()

class Base:
    pass

# ---------------------------
# modb.py

import moda

def func_b():
    print('B')

class Child(moda.Base):
    pass
There is a strange import-order dependency in this code. Using import moda first works fine, but if you ever use import modb first, it blows up with an error about moda.Base being undefined. To understand what happens, you have to follow the flow of control. import moda starts executing the file moda.py. The first statement it encounters is import modb, so control switches to modb.py. The first statement in that file is import moda. Instead of entering a recursive loop, the module cache makes that import return immediately, and control continues with the next statement in modb.py. This is good: circular imports don't cause Python to deadlock or enter a new spacetime dimension. However, at this point in the execution, the moda module has only been partially evaluated. When control reaches the class Child(moda.Base) statement, it blows up: the required Base class hasn't been defined yet.
One way to work around this problem is to move the import modb statement someplace else. For example, you could move the import into func_a(), where the definition is actually needed:

# moda.py

def func_a():
    import modb
    modb.func_b()

class Base:
    pass
You could also move the import later in the file:

# moda.py

def func_a():
    modb.func_b()

class Base:
    pass

import modb         # Must be placed after Base is defined
Both of these solutions are likely to raise eyebrows in a code review. Most of the time, module imports don't appear at the end of a file. The presence of circular imports almost always indicates a problem with the organization of the code. A better way to handle this might be to move the definition of Base to a separate file base.py and to rewrite modb.py like this:

# modb.py

import base

def func_b():
    print('B')

class Child(base.Base):
    pass
Module Reloading and Unloading
There is no reliable support for reloading or unloading previously imported modules. Although you can remove a module from sys.modules, doing so does not unload the module from memory. That's because references to the cached module object still exist in other modules that imported it. Moreover, if there are instances of classes defined in the module, those instances contain references to their class object, which in turn holds a reference to the module in which it was defined. The fact that module references exist in many places makes it generally impractical to reload a module after making changes to its implementation. For example, removing a module from sys.modules and re-importing it does not retroactively change all of the previous references to the module used in a program. Instead, you'll have references to the new module created by the most recent import statement and a set of references to the old module created by imports elsewhere in the code. This is rarely what you want, and it's never safe to do in sane production code unless you can carefully control the entire execution environment.

There is a reload() function for reloading a module that can be found in the importlib library. As an argument, you pass it the already loaded module. For example:

>>> import module
>>> import importlib
>>> importlib.reload(module)
Module loaded
>>>
reload() works by loading a new version of the module's source code and executing it on top of the existing module namespace. This happens without clearing the old namespace. It is literally the same as typing the new source code on top of the old code without restarting the interpreter.
If other modules had previously imported the reloaded module using a standard import statement such as import module, reloading makes them magically see the updated code. However, plenty of dangers still lurk. First, reloading does not reload any of the modules that might be imported by the reloaded file. It is not recursive; it applies only to the single module passed to reload(). Second, if any modules have used the from module import name form of import, those imports won't see the effect of the reload. Third, if instances of classes have been created, reloading does not update their underlying class definition. In fact, you now have two different definitions of the same class in the same program: the old one, used by all instances that existed at the time of the reload, and the new one, used for new instances. This is almost always confusing. Finally, it should be noted that C/C++ extensions to Python can not be safely unloaded or reloaded at all. No support for this is provided, and it may be prohibited by the underlying operating system anyway. Your best bet for that scenario is to restart the Python interpreter process.
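A small interactive sketch of the split-class problem, reusing the hypothetical module.py from earlier:

>>> import module
Module loaded
>>> import importlib
>>> obj = module.SomeClass()
>>> importlib.reload(module)
Module loaded
<module 'module' from 'module.py'>
>>> isinstance(obj, module.SomeClass)   # obj is an instance of the old class
False
>>>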
Compiling Modules
When modules are first imported, they are compiled into interpreter bytecode. This bytecode is written to a .pyc file within a special __pycache__ directory, usually located in the same directory as the original .py file. When the same import occurs again in a different run of the program, the compiled bytecode is loaded instead. This significantly speeds up the import process. The caching of bytecode is an automatic process that you almost never need to worry about. Files are automatically regenerated if the original source code changes. It just works.

That said, there are still a couple of reasons to know about this caching and compilation process. First, sometimes Python files get installed (often accidentally) into an environment where users don't have operating-system permission to create the required __pycache__ directory. Python will still work, but every import now loads the original source code and compiles it to bytecode. Program loading will be much slower than it needs to be. Likewise, when deploying or packaging a Python application, it may be advantageous to make sure the compiled bytecode is included, as this can significantly speed up program startup.
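If you want to generate the bytecode cache ahead of time, for example as part of a deployment step, the standard compileall module can do it; a minimal sketch:

bash $ python3 -m compileall myapp/

This walks the given directory tree and populates the __pycache__ directories in advance.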
The other good reason to know about caching and compilation is that certain programming techniques defeat it. Advanced metaprogramming techniques that rely on dynamic code generation and the exec() function forgo the benefits of bytecode caching. A notable example is the use of data classes:

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
Data classes work by generating their method functions as text snippets and executing them with exec(). None of this generated code is cached by the import system. For a single class definition, you won't notice. However, if you have a module consisting of 100 data classes, you might find that it imports nearly 20 times more slowly than a comparable module where you simply wrote the classes out in the normal, if less compact, way.
The Module Search Path
When importing modules, the interpreter searches the list of directories in sys.path. The first entry in sys.path is often an empty string '', which refers to the current working directory. However, if you run a script, the first entry in sys.path is the directory in which the script resides. The other entries in sys.path usually consist of a mix of directory names and .zip archive files. The order in which entries are listed in sys.path determines the search order used when importing modules. To add new entries to the search path, add them to this list. This can be done directly or by setting the PYTHONPATH environment variable. For example, on Unix:

bash $ env PYTHONPATH=/some/path python3 script.py
ZIP archives provide a convenient way to bundle a collection of modules into a single file. Suppose you created two modules, foo.py and bar.py, and put them in a mymodules.zip file. The file could be added to the Python search path as follows:
import sys
sys.path.append('mymodules.zip')
import foo, bar
Specific locations within the directory structure of a .zip file can also be used for the path, and .zip files can be mixed with regular pathname components. Here is an example:

sys.path.append('/tmp/modules.zip/lib/python')
A ZIP file doesn't need a .zip suffix to be usable. Historically, it was also common to encounter so-called .egg files on the path. .egg files come from an early Python package-management tool called setuptools. However, an .egg file is little more than an ordinary .zip file or directory with some extra metadata added to it (version number, dependencies, and so on).
Running as the Main Program
Although this section is about the import statement, Python files are often run as a main script. For example:

bash $ python3 module.py
Each module defines a variable, __name__, that holds the module name. Code can examine this variable to determine the module in which it is executing. The top-level module of the interpreter is named __main__. Programs specified on the command line or entered interactively run inside the __main__ module. Sometimes a program may alter its behavior depending on whether it has been imported as a module or is running in __main__. For example, a module may include optional testing code that runs when the module is used as the main program, but does not run when the module is merely imported by another module:

# Check if running as a program
if __name__ == '__main__':
    # Yes, running as the main script
    ...
else:
    # No, must have been imported as a module
    ...
Source files intended to be used as libraries can use this technique to include optional testing or example code. When developing a module, you can put debugging code for testing the features of your library inside an if statement as shown and run Python on your module as the main program. That code won't run for users who import your library.

Once you've made a directory of Python code, you can run the directory as long as it contains a special __main__.py file. For example, if you make a directory like this:

myapp/
    foo.py
    bar.py
    __main__.py
you can run Python on it by typing python3 myapp. Execution starts in the __main__.py file. This also works if you turn the myapp directory into a ZIP archive. Typing python3 myapp.zip looks for a top-level __main__.py file and runs it if found.
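For building such an archive, the standard zipapp module can help; a minimal sketch:

bash $ python3 -m zipapp myapp
bash $ python3 myapp.pyz

By default this produces myapp.pyz, a self-contained archive that runs the package's __main__.py.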
Packages
In anything but the simplest of programs, Python code is organized into packages. A package is a collection of modules grouped under a common top-level name. This grouping helps resolve conflicts between the module names used in different applications and keeps your code separate from everyone else's. A package is defined by creating a directory with a distinctive name and placing an initially empty __init__.py file in that directory. You then place additional Python files and subpackages in the directory as needed. For example, a package might be organized like this:

graphics/
    __init__.py
    primitive/
        __init__.py
        lines.py
        fill.py
        text.py
        ...
    graph2d/
        __init__.py
        plot2d.py
        ...
    graph3d/
        __init__.py
        plot3d.py
        ...
    formats/
        __init__.py
        gif.py
        png.py
        tiff.py
        jpeg.py
The import statement is used to load modules from a package in the same way as for simple modules, except that the names are now longer. For example:

# Full path import
import graphics.primitive.fill
...
graphics.primitive.fill.floodfill(img, x, y, color)

# Load a specific submodule
from graphics.primitive import fill
...
fill.floodfill(img, x, y, color)

# Load a specific function from a submodule
from graphics.primitive.fill import floodfill
...
floodfill(img, x, y, color)
Whenever any part of a package is imported for the first time, the code in the __init__.py file executes first. As noted, this file may be empty, but it can also contain code to perform package-specific initializations. If a deeply nested submodule is imported, the __init__.py files of all directories encountered while traversing the directory structure execute. Thus, the statement import graphics.primitive.fill would first execute the __init__.py file in the graphics/ directory, followed by the __init__.py file in the primitive/ directory.

A critical feature of the import statement is that all module imports require an absolute or fully qualified package path. This includes import statements used within a package itself. For example, suppose the graphics.primitive.fill module wants to import the graphics.primitive.lines module. A simple statement such as import lines won't work; you'll get an ImportError exception. Instead, you need to fully qualify the import like this:

# graphics/primitive/fill.py

# Fully qualified submodule import
from graphics.primitive import lines
Unfortunately, writing out a full package name like that is both annoying and fragile. For example, it sometimes makes sense to rename a package (perhaps so you can use different versions). If the package name is hardwired, you can't do that. A better option is to use a package-relative import like this:

# graphics/primitive/fill.py

# Package-relative import
from . import lines
Here, the . used in the statement from . import lines refers to the same directory as the importing module. Thus, this statement looks for a module lines in the same directory as the file fill.py. Relative imports can also specify submodules contained in different directories of the same package. For example, if the module graphics.graph2d.plot2d wanted to import graphics.primitive.lines, it would use a statement like this:
# graphics/graph2d/plot2d.py
from ..primitive import lines
Here, the .. moves up one directory level and primitive drops down into a different subpackage directory. Relative imports can only be specified using the from module import symbol form of the import statement. Thus, statements such as import ..primitive.lines or import .lines are syntax errors. Also, the symbol has to be a simple identifier, so a statement such as from .. import primitive.lines is also illegal. Finally, relative imports can only be used from within a package; it is illegal to use a relative import to refer to modules that are simply located in a different directory on the filesystem.
Executing a Package Submodule as a Script
Code organized into a package has a different runtime environment than a simple script. There is an enclosing package name, there are submodules, and there is the use of relative imports (which only work within a package). One feature that no longer works is the ability to run Python directly on a package source file. For example, suppose you're working on the file graphics/graph2d/plot2d.py and add some testing code at the bottom:

# graphics/graph2d/plot2d.py

from ..primitive import lines, text

class Plot2D:
    ...

if __name__ == '__main__':
    print('Testing Plot2D')
    p = Plot2D()
    ...
If you try to run it directly, you'll get a crash complaining about the relative import statements:
bash $ python3 graphics/graph2d/plot2d.py
Traceback (most recent call last):
  File "graphics/graph2d/plot2d.py", line 1, in <module>
    from ..primitive import lines, text
ValueError: attempted relative import beyond top-level package
bash $
You can't move into the package directory and run it there either:

bash $ cd graphics/graph2d/
bash $ python3 plot2d.py
Traceback (most recent call last):
  File "plot2d.py", line 1, in <module>
    from ..primitive import lines, text
ValueError: attempted relative import beyond top-level package
bash $
To run a submodule as a main script, you need to use the -m option to the interpreter. For example:

bash $ python3 -m graphics.graph2d.plot2d
Testing Plot2D
bash $
-m specifies a module or package as the main program. Python runs the module with the proper environment in place to make sure imports work. Many of Python's built-in packages have "secret" features that can be used via -m. One of the most famous is using python3 -m http.server to run a web server from the current directory. You can provide similar functionality with your own packages. If the name supplied to python -m name matches a package directory, Python checks whether a __main__.py exists in that directory and runs it as the script.
Package Namespace Control
The primary purpose of a package is to serve as a top-level container for code. Sometimes users will import the top-level name and nothing else. For example:

import graphics
This import does not specify any particular submodule, nor does it make any other part of the package accessible. For example, you'll find that code like this fails:

import graphics
graphics.primitive.fill.floodfill(img, x, y, color)     # Fails!
When only a top-level package import is given, the only file that gets imported is the associated __init__.py file; in this example, that's the file graphics/__init__.py. The primary purpose of an __init__.py file is to build and/or manage the contents of the top-level package namespace. Often, this involves importing selected functions, classes, and other objects from the lower-level submodules. For example, suppose the graphics package in this example consists of hundreds of low-level functions but most of those details are encapsulated in a handful of high-level classes. The __init__.py file might choose to expose just those classes:

# graphics/__init__.py

from .graph2d.plot2d import Plot2D
from .graph3d.plot3d import Plot3D
With this __init__.py file, the names Plot2D and Plot3D appear at the top level of the package. A user could then use those names as if graphics were a simple module:

from graphics import Plot2D

plt = Plot2D(100, 100)
plt.clear()
...
This is often much more convenient for the user because they don't have to know how you've actually organized your code. In a sense, you're putting a layer of abstraction on top of your code structure. Many of the so-called "modules" in the Python standard library are constructed in this manner. For example, the popular collections module is actually a package. The collections/__init__.py file consolidates definitions from a few different places and presents them to the user as a single consolidated namespace.
Package Export Control
One organizational concern is the low-level interaction between an __init__.py file and the submodules. For example, the individual submodules of a package know best which symbols should be exported to the top level, yet the actual work of exporting those symbols happens in __init__.py. This makes it confusing to read a package's code and fully understand its overall organization. To manage this better, package submodules often declare an explicit list of exports by defining an __all__ variable. This is a list of names that should be pushed up one level in the package namespace. For example:

# graphics/graph2d/plot2d.py

__all__ = ['Plot2D']

class Plot2D:
    ...
The associated __init__.py file then imports its submodules using a * import like this:

# graphics/graph2d/__init__.py

# Load only the names explicitly listed in the __all__ variables
from .plot2d import *

# Propagate __all__ up to the next level (if desired)
__all__ = plot2d.__all__
This lifting process continues all the way up to the top-level package __init__.py. For example:

# graphics/__init__.py

from .graph2d import *
from .graph3d import *

# Consolidated exports
__all__ = [
    *graph2d.__all__,
    *graph3d.__all__
]
The gist is that every component of a package explicitly states its exports via the __all__ variable, and the __init__.py files then propagate those exports upwards. In practice it can get complicated, but this approach avoids the problem of hardwiring specific export names into the __init__.py files. Instead, if a submodule wants to export something, its name gets listed in just one place: the __all__ variable. Then, as if by magic, it propagates up to its proper place in the package namespace. It's worth noting that although using * imports in user code is frowned upon, it is widespread practice in package __init__.py files. The reason it works in packages is that it's usually much more controlled and contained, being driven by the contents of the __all__ variables rather than a careless "let's just import everything" attitude.
Package Data
Sometimes a package includes data files that need to be loaded (as opposed to source code). Within a package, the __file__ variable gives you location information about a specific source file. However, packages are complicated: they might be bundled inside ZIP archives or loaded from unusual environments. The __file__ variable itself might be unreliable or even undefined. Therefore, loading a data file is often not a simple matter of passing a filename to the open() built-in function and reading some data. To read package data, use pkgutil.get_data(package, resource). For example, suppose your package looked like this:

mycode/
    resources/
        data.json
    __init__.py
    spam.py
    yow.py
To load the data.json file from the spam.py file, you would do the following:
# mycode/spam.py

import pkgutil
import json

def func():
    rawdata = pkgutil.get_data(__package__, 'resources/data.json')
    textdata = rawdata.decode('utf-8')
    data = json.loads(textdata)
    print(data)
The get_data() function attempts to find the specified resource and returns its contents as a raw byte string. Any further decoding (such as bytes to text) and interpretation is up to you. In the example, the data is decoded and parsed from JSON into a Python dictionary. A package is not a great place to store giant data files. Reserve package resources for configuration data and other bits and pieces needed to make your package work.
Module Objects
Modules are first-class objects. Table 1 lists the attributes that are commonly found on modules.

Table 1: Module Attributes

Attribute        Description
__name__         Full module name
__doc__          Documentation string
__dict__         Module namespace dictionary
__file__         Filename where the module was defined
__package__      Name of the enclosing package (if any)
__path__         List of subdirectories to search for package submodules
__annotations__  Module-level type hints
The __dict__ attribute is a dictionary that represents the module namespace. Everything defined in the module gets placed here. The __name__ attribute is commonly used when writing scripts. A check such as if __name__ == '__main__' is often performed to see whether a file is running as the main program. The __package__ attribute contains the name of the enclosing package, if any. If set, the __path__ attribute is a list of directories that get searched to find the submodules of a package. Normally, it contains a single entry with the directory in which a package is located. Sometimes large frameworks manipulate __path__ to incorporate additional directories in support of plugins and other advanced features. Not all attributes are available on all modules. For example, built-in modules might not have a __file__ attribute set. Likewise, the package-related attributes aren't set for top-level modules that aren't contained in a package. The __doc__ attribute is the module documentation string, if any: a string that appears as the first statement in a file. The __annotations__ attribute is a dictionary of module-level type hints. These look something like this:

# mymodule.py

'''
The doc string
'''

# Type hints (placed in __annotations__)
x: int
y: float
...
As with other kinds of type hints, module-level hints don't change any aspect of Python's behavior, nor do they actually define the variables. They are pure metadata that other tools can look at if they choose.
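For instance, a tool could inspect the hints like this (a small sketch, assuming the mymodule.py shown above):

>>> import mymodule
>>> mymodule.__annotations__
{'x': <class 'int'>, 'y': <class 'float'>}
>>>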
Accessing Module Attributes
Because modules are ordinary objects, their attributes can also be accessed using the standard getattr(), setattr(), and hasattr() built-in functions. This is useful for looking up module contents by a name computed at runtime.
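A small sketch of this kind of dynamic access using the standard math module:

import math

# getattr() is equivalent to dotted attribute access
assert getattr(math, 'pi') == math.pi

# hasattr() can probe for an optional feature by name
if hasattr(math, 'isqrt'):
    print(math.isqrt(17))       # Prints 4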
Distributing Python Packages
The final frontier of modules and packages is the problem of giving your code to others. This is a large topic that has been the focus of ongoing, active development over many years. Rather than document a process that is likely to be out of date by the time you read this, direct your attention to the documentation at https://packaging.python.org/tutorials/packaging-projects/.

For the purposes of day-to-day development, the most important thing is to keep your code isolated as a self-contained project. All of your code should live in a proper package. Try to give your package a unique name so that it doesn't conflict with other possible dependencies. Consult the Python package index at https://pypi.org for help in picking a name. In structuring your code, also try to keep things simple. As you've seen, a lot of very clever things can be done with the module and package system. There is a time and place for that, but it should not be your starting point.

With absolute simplicity in mind, the most minimalistic way to distribute pure Python code is to use the setuptools module. Suppose you have written some code and it lives in a project that looks like this:

spam-project/
    README.txt
    Documentation.txt
    spam/               # A package of code
        __init__.py
        foo.py
        bar.py
    runspam.py          # A script to run as: python runspam.py
To make a distribution, create a file setup.py in the topmost directory (spam-project/ in this example). In this file, put the following code:

# setup.py
from setuptools import setup

setup(name="spam",
      version="0.0",
      packages=['spam'],
      scripts=['runspam.py'],
)
In the setup() call, packages is a list of all of the package directories and scripts is a list of script files. Any of these arguments may be omitted if your software doesn't have the corresponding component (for example, if there are no scripts). name is the name of your package, and version is the version number as a string. The setup() call supports a variety of other parameters that supply various metadata about your package; see the setuptools documentation for the full list. Creating a setup.py file is enough to make a source distribution of your software. Type the following shell command to create one:

bash $ python setup.py sdist
...
bash $
This creates an archive file such as spam-0.0.tar.gz or spam-0.0.zip in the dist/ directory. This is the file you would give to others to install your software. To install it, a user can use a command such as pip. For example:

bash $ python3 -m pip install spam-0.0.tar.gz
This installs the software into the local Python distribution and makes it available for general use. The code is normally installed into a directory called site-packages in the Python library. To find the exact location of this directory, inspect the value of sys.path. Scripts are normally installed into the same directory as the Python interpreter itself on UNIX-based systems, or into a "Scripts" directory on Windows (C:\Python38\Scripts in a typical installation).
If the first line of a script starts with #! and contains the text "python", the installer rewrites the line to point to the local installation of Python. Thus, if your scripts have been hardcoded to a specific Python location, such as /usr/local/bin/python, they should still work when installed on other systems where Python resides somewhere else.

It must be emphasized that the use of setuptools described here is absolutely minimal. Larger projects may involve C/C++ extensions, complicated package structures, examples, and more. Covering all of the tools and possible ways to deploy such code is beyond the scope of this book. Consult various resources on https://python.org and https://pypi.org for the most up-to-date advice.
The Final Word: Start with a Package
When you first start a new program, it's easy to begin with a single, simple Python file. For example, you might write a script called program.py and start with that. Although this works fine for throwaway programs and short tasks, your "script" may eventually start growing features. Eventually, you might want to split it into multiple files. That's the point where problems often arise. In light of this, it makes sense to get in the habit of starting all programs as a package from the outset. For example, instead of making a file called program.py, make a program package directory called program:

program/
    __init__.py
    __main__.py
Put your starting code in __main__.py and run your program with a command such as python -m program. As you need more code, add new files to your package and use package-relative imports. An advantage of using a package is that all of your code remains isolated. You can name the files whatever you want and not worry about collisions with other packages, standard-library modules, or code written by your coworkers. Although setting up a package requires slightly more work at the start, it will likely save you a lot of headaches later.
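A minimal sketch of such a starting skeleton (the core module and its run() function are hypothetical names used for illustration):

# program/core.py  (hypothetical)

def run():
    print('program is running')

# program/__main__.py

from . import core          # Package-relative import

if __name__ == '__main__':
    core.run()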
Input and Output
Input and output (I/O) is a part of every program. This chapter describes the essentials of Python I/O, including data encoding, command-line options, environment variables, file I/O, and data serialization. Particular attention is given to programming techniques and abstractions that encourage proper I/O handling. The end of the chapter gives an overview of common standard-library modules related to I/O.
Data Representation
The main problem with I/O is the outside world. To communicate with it, data must be properly represented so that it can be manipulated. At the lowest level, Python works with two fundamental data types: bytes, representing raw uninterpreted data of any kind, and text, representing Unicode characters.

Two built-in types, bytes and bytearray, are used to represent bytes. bytes is an immutable string of integer byte values. bytearray is a mutable byte array that behaves like a combination of a byte string and a list. Its mutability makes it suitable for building up collections of bytes incrementally, as you often do when assembling data from fragments. The following example illustrates a few features of bytes and bytearray:

# Specify a bytes literal (note the b'' prefix)
a = b'hello'

# Specify bytes from a list of integers
b = bytes([0x68, 0x65, 0x6c, 0x6c, 0x6f])

# Create and populate a bytearray from parts
c = bytearray()
c.extend(b'world')      # c = bytearray(b'world')
c.append(0x21)          # c = bytearray(b'world!')
# Access individual byte values
print(a[0])     # Prints 104

for x in b:     # Prints 104 101 108 108 111
    print(x)
Accessing individual elements of bytes and bytearray objects produces integer byte values, not single-character byte strings. This differs from text strings and is a common usage error. Text is represented by the str data type and stored as an array of Unicode code points. For example:

d = 'hello'     # Text (Unicode)
len(d)          # -> 5
print(d[0])     # Prints 'h'
Python maintains a strict separation between bytes and text. There is never automatic conversion between the two types, comparisons between the two types evaluate as False, and any operation that mixes bytes and text produces an error. For example:

a = b'hello'    # bytes
b = 'hello'     # text
c = 'world'     # text

print(a == b)   # -> False
d = a + c       # TypeError: can't concatenate str to bytes
e = b + c       # -> 'helloworld' (both are strings)
When performing I/O, make sure you're working with the right kind of data representation. When manipulating text, use text strings. When manipulating binary data, use bytes.
Text Encoding and Decoding
If you work with text, all data read from input must be decoded and all data written to output must be encoded. For explicit conversion between text and bytes, there are the encode(encoding [, errors]) and decode(encoding [, errors]) methods on text and bytes objects, respectively. For example:

a = 'hello'             # Text
b = a.encode('utf-8')   # Encode into bytes

c = b'world'            # Bytes
d = c.decode('utf-8')   # Decode into text
Both encode() and decode() require the name of an encoding such as 'utf-8' or 'latin-1'. The encodings in Table 1 are common.

Table 1: Common Encodings

Encoding     Description
'ascii'      Character values in the range [0x00, 0x7f]
'latin1'     Character values in the range [0x00, 0xff]; also known as 'iso-8859-1'
'utf-8'      Variable-length encoding that can represent all Unicode characters
In addition, the encoding methods accept an optional errors argument that specifies the behavior in the presence of encoding errors. It is one of the values in Table 2.

Table 2: Error Handling Options

Value                 Description
'strict'              Raises an exception on encoding and decoding errors (the default)
'ignore'              Ignores invalid characters
'replace'             Replaces invalid characters with a replacement character (U+FFFD in text, b'?' in bytes)
'backslashreplace'    Replaces invalid characters with a Python character escape sequence
'xmlcharrefreplace'   Replaces invalid characters with an XML character reference
'surrogateescape'     Replaces invalid bytes with surrogate code points on decoding and restores them on encoding
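For example, here is the effect of a few of these policies when encoding a string containing a non-ASCII character:

>>> s = 'Jalapeño'
>>> s.encode('ascii', errors='replace')
b'Jalape?o'
>>> s.encode('ascii', errors='ignore')
b'Jalapeo'
>>> s.encode('ascii', errors='xmlcharrefreplace')
b'Jalape&#241;o'
>>> s.encode('ascii', errors='backslashreplace')
b'Jalape\\xf1o'
>>>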
The backslashreplace and xmlcharrefreplace error policies render non-printable characters in forms that allow them to be viewed as plain ASCII text or as XML character references. This can be useful for debugging. The surrogateescape error policy allows degenerate byte data (that is, data that doesn't follow the expected encoding rules) to survive an intact round-trip decode/encode cycle regardless of the text encoding being used. Specifically, s.decode(enc, 'surrogateescape').encode(enc, 'surrogateescape') == s. This round-trip preservation of data is useful for certain kinds of system interfaces where a text encoding is expected but can't be fully guaranteed due to issues outside of Python's control. Instead of destroying data through a bad encoding, Python embeds it "as is" using surrogate encoding. Here's an example of this behavior with an improperly encoded UTF-8 string:

>>> a = b'Spicy Jalape\xf1o'        # Invalid UTF-8
>>> a.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 12: invalid continuation byte
>>> a.decode('utf-8', 'surrogateescape')
'Spicy Jalape\udcf1o'
>>> # Encode the resulting string back into bytes
>>> _.encode('utf-8', 'surrogateescape')
b'Spicy Jalape\xf1o'
>>>
Text and Byte Formatting
A common problem when working with text and byte strings is string conversion and formatting; for example, converting a floating-point number to a string with a given width and precision. To format a single value, use the format() function:

x = 123.456
format(x, '0.2f')       # '123.46'
format(x, '10.4f')      # '  123.4560'
format(x, '<10.2f')     # '123.46    '
An interesting side effect of this approach is that it "offloads" the actual I/O operations needed to obtain the input data. Specifically, the implementation of line_receiver() contains no I/O operations at all. This means it can be used in a variety of different contexts. For example, with sockets:

r = line_receiver()

data = None
while True:
    while not (line := r.send(data)):
        data = sock.recv(8192)
    # Process the line
    ...
or with files:

r = line_receiver()

data = None
while True:
    while not (line := r.send(data)):
        data = file.read(10000)
    # Process the line
    ...
or even in asynchronous code:

async def reader(ch):
    r = line_receiver()

    data = None
    while True:
        while not (line := r.send(data)):
            data = await ch.receive(8192)
        # Process the line
        ...
Object Serialization
Sometimes it's necessary to serialize the representation of an object so it can be transmitted over the network, saved to a file, or stored in a database. One way to do this is to convert data into a standard encoding such as JSON or XML. There is also a common Python-specific data serialization format called pickle. The pickle module serializes an object into a stream of bytes that can be used to reconstruct the object at a later point in time. The interface to pickle is simple, consisting of a dump() and a load() operation. For example, the following code writes an object to a file:

import pickle

obj = SomeObject()
with open(filename, 'wb') as file:
    pickle.dump(obj, file)      # Save object to file
To restore the object, use this code:

with open(filename, 'rb') as file:
    obj = pickle.load(file)     # Restore the object
The data format used by pickle has its own framing of records. Thus, a sequence of objects can be saved by performing a series of dump() operations one after the other. To restore these objects, simply use a matching series of load() operations. For network programming, it is common to use pickle to create byte-encoded messages. For that, use dumps() and loads() instead. Rather than reading or writing data to a file, these functions work with byte strings:

obj = SomeObject()

# Convert an object to bytes
data = pickle.dumps(obj)
...

# Convert bytes back into an object
obj = pickle.loads(data)
It is not normally necessary for user-defined objects to do anything extra to work with pickle. However, certain kinds of objects can't be pickled. These are usually objects that incorporate runtime state: open files, threads, closures, generators, and so on. To handle these tricky cases, a class can define the special methods __getstate__() and __setstate__(). The __getstate__() method, if defined, is called to create a value representing the state of an object. The value returned by __getstate__() is typically a string, tuple, list, or dictionary. The __setstate__() method receives this value during unpickling and should restore the state of an object from it. When encoding an object, pickle does not include the underlying source code itself. Instead, it encodes a name reference to the defining class. During unpickling, this name is used to perform a source-code lookup on the system. For unpickling to work, the recipient of a pickle must already have the proper source code installed. It's also important to stress that pickle is inherently insecure: unpickling untrusted data is a known vector for remote code execution. Thus, pickle should only be used if you can completely secure the runtime environment.
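To illustrate the __getstate__() and __setstate__() methods just described, here is a minimal sketch of a class that carries an open file, which can't be pickled directly:

import pickle

class Journal:
    def __init__(self, filename):
        self.filename = filename
        self.file = open(filename, 'a')     # Runtime state: not picklable

    def __getstate__(self):
        # Capture only the picklable part of the state
        return {'filename': self.filename}

    def __setstate__(self, state):
        # Recreate the runtime state during unpickling
        self.filename = state['filename']
        self.file = open(self.filename, 'a')

j = Journal('log.txt')
data = pickle.dumps(j)          # Works because of __getstate__()
j2 = pickle.loads(data)         # __setstate__() reopens the file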
Blocking Operations and Concurrency
A fundamental aspect of I/O is the concept of blocking. By its very nature, I/O is connected to the real world, and it often involves waiting for input or for devices to be ready. For example, code that reads data on the network might perform a receive operation on a socket like this:

data = sock.recv(8192)
When this statement executes, it might return immediately if data is available. If not, however, it stops and waits for data to arrive. That's blocking. While the program is blocked, nothing else happens. If you're just writing a data analysis script or a simple program, blocking isn't something you need to worry about. However, if you want your program to do something else while an operation is blocked, you need to take a different approach. This is the fundamental problem of concurrency: having a program work on more than one thing at a time. One common example is having a program read on two or more different network sockets at the same time:

def reader1(sock):
    while (data := sock.recv(8192)):
        print('reader1 got:', data)

def reader2(sock):
    while (data := sock.recv(8192)):
        print('reader2 got:', data)

# Problem: how to make reader1() and reader2()
# run at the same time?
The rest of this section describes a few different approaches to solving this problem. However, it is not meant to be a full tutorial on concurrency; for that, you'll need to consult other resources.
Non-blocking I/O
One approach to avoiding blocking is to use so-called non-blocking I/O. This is a special mode that has to be enabled. For example, on a socket:

sock.setblocking(False)
Once enabled, an exception is raised whenever an operation would have blocked. For example:

try:
    data = sock.recv(8192)
except BlockingIOError as e:
    # No data available
    ...
In response to a BlockingIOError, the program could elect to work on something else. It could retry the I/O operation later to see if any data has arrived. For example, here's how you might read on two sockets at once:

def reader1(sock):
    try:
        data = sock.recv(8192)
        print('reader1 got:', data)
    except BlockingIOError:
        pass

def reader2(sock):
    try:
        data = sock.recv(8192)
        print('reader2 got:', data)
    except BlockingIOError:
        pass

def run(sock1, sock2):
    sock1.setblocking(False)
    sock2.setblocking(False)
    while True:
        reader1(sock1)
        reader2(sock2)
In practice, relying only on non-blocking I/O like this is clumsy and inefficient. For example, the core of this program is the run() function at the end. It runs in an inefficient busy loop as it constantly tries to read on the sockets. That works, but it is not a good design.
I/O Polling
Instead of relying upon exceptions and spinning, it is possible to poll the I/O channels to see if data is available. The select or selectors modules can be used for this purpose. For example, here's a slightly modified version of the run() function:

from selectors import DefaultSelector, EVENT_READ, EVENT_WRITE

def run(sock1, sock2):
    selector = DefaultSelector()
    selector.register(sock1, EVENT_READ, data=reader1)
    selector.register(sock2, EVENT_READ, data=reader2)
    # Wait for something to happen
    while True:
        for key, evt in selector.select():
            func = key.data
            func(key.fileobj)
In this code, the loop dispatches the reader1() and reader2() functions as callbacks whenever I/O is detected on the corresponding socket. The selector.select() operation itself blocks, waiting for I/O to happen. Thus, unlike the previous example, it won't make the CPU spin madly. This approach to I/O is the foundation of many so-called "asynchronous" frameworks such as asyncio, although you usually don't see the inner workings of the underlying event loop.
Threads
In the last two examples, concurrency required the use of a special run() function to drive the calculation. As an alternative, you can use thread programming and the threading module. Think of a thread as an independent task that runs inside your program. Here's an example of code that reads data on two sockets at once:

import threading

def reader1(sock):
    while (data := sock.recv(8192)):
        print('reader1 got:', data)

def reader2(sock):
    while (data := sock.recv(8192)):
        print('reader2 got:', data)

t1 = threading.Thread(target=reader1, args=[sock1])
t2 = threading.Thread(target=reader2, args=[sock2])
t1.start()
t2.start()

# Wait for the threads to finish
t1.join()
t2.join()
In this program, the reader1() and reader2() functions run concurrently. This is managed by the host operating system, so you don't need to know much about how it works. If a blocking operation occurs in one thread, it has no effect on the other thread. The whole subject of thread programming is beyond the scope of this book. However, a few additional examples are given in the section on the threading module later in this chapter.
Concurrent Execution with asyncio
The asyncio module provides an alternative concurrency implementation to threads. Internally, it's based on an event loop that uses I/O polling. However, the high-level programming model looks very similar to threads through the use of special async functions. Here's an example:

import asyncio

async def reader1(sock):
    loop = asyncio.get_event_loop()
    while (data := await loop.sock_recv(sock, 8192)):
        print('reader1 got:', data)

async def reader2(sock):
    loop = asyncio.get_event_loop()
    while (data := await loop.sock_recv(sock, 8192)):
        print('reader2 got:', data)

async def main(sock1, sock2):
    loop = asyncio.get_event_loop()
    t1 = loop.create_task(reader1(sock1))
    t2 = loop.create_task(reader2(sock2))

    # Wait for the tasks to finish
    await t1
    await t2

...
# Run it
asyncio.run(main(sock1, sock2))
Full details of using asyncio would require its own dedicated book. What you should know is that many libraries and frameworks advertise support for asynchronous operation. Usually, that means concurrent execution is supported through asyncio or a similar module. Much of the code is likely to involve async functions and related features.
Standard Library Modules
A large number of standard library modules are used for various I/O-related tasks. This section provides a brief overview of commonly used modules, along with a few examples where applicable. Full reference material can be found online or in an IDE and is not repeated here. The main purpose of this section is to point you in the right direction by giving you the names of the modules you should use, together with a few examples of the most common programming tasks involving each module. Many of the examples are shown as interactive Python sessions; these are experiments you are encouraged to try yourself.
asyncio Module
The asyncio module provides support for concurrent I/O using I/O polling and an underlying event loop. Its primary use is in code involving networks and distributed systems. Here is an example of a TCP echo server using low-level sockets:

import asyncio
from socket import *

async def echo_server(loop, address):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(5)
    sock.setblocking(False)
    print('Server listening at', address)
    with sock:
        while True:
            client, addr = await loop.sock_accept(sock)
            print('Connection from', addr)
            loop.create_task(echo_client(loop, client))

async def echo_client(loop, client):
    with client:
        while True:
            data = await loop.sock_recv(client, 10000)
            if not data:
                break
            await loop.sock_sendall(client, b'Got:' + data)
    print('Connection closed')

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.create_task(echo_server(loop, ('', 25000)))
    loop.run_forever()
To test this code, use a program such as nc or telnet to connect to port 25000 on your machine. The code should echo back the text that you type. If you connect more than once using multiple terminal windows, you'll find that the code can handle all of the connections concurrently. Most applications using asyncio will probably operate at a higher level than sockets. However, in such applications you will still have to make use of special async functions and interact with the underlying event loop in some manner.
binascii Module
The binascii module has functions for converting binary data into various text-based representations such as hexadecimal and base64. For example:

>>> binascii.b2a_hex(b'hello')
b'68656c6c6f'
>>> binascii.a2b_hex(_)
b'hello'
>>> binascii.b2a_base64(b'hello')
b'aGVsbG8=\n'
>>> binascii.a2b_base64(_)
b'hello'
>>>
Similar functionality is found in the base64 module, as well as in the hex() and fromhex() methods of bytes. For example:

>>> a = b'hello'
>>> a.hex()
'68656c6c6f'
>>> bytes.fromhex(_)
b'hello'
>>> import base64
>>> base64.b64encode(a)
b'aGVsbG8='
>>>
cgi Module
Say you just want to put a basic form on your website. Perhaps it's a sign-up form for your weekly Cats and Categories newsletter. Sure, you could install the latest web framework and spend all your time fiddling with it. Or, you could just write a basic old-school CGI script. The cgi module is for doing just that.
Suppose you have a form snippet like the following on a webpage (the markup here is a representative sketch; the field names name and email match the script that follows):

<p>
To register, please provide a contact name and email address.
</p>
<form action="cgi-bin/register.py" method="POST">
    <p>Name: <input name="name" type="text"></p>
    <p>Email: <input name="email" type="text"></p>
    <input type="submit" name="submit-button" value="Register">
</form>
Here's a CGI script that receives the form data on the other end:

#!/usr/bin/env python
import cgi

try:
    form = cgi.FieldStorage()
    name = form.getvalue('name')
    email = form.getvalue('email')

    # Validate the responses and do whatever
    ...

    # Produce an HTML result (or redirect)
    print("Status: 302 Moved\r")
    print("Location: https://www.mywebsite.com/thanks.html\r")
    print("\r")
except Exception as e:
    print("Status: 501 Error\r")
    print("Content-type: text/plain\r")
    print("\r")
    print("Some kind of error occurred.\r")
Will writing a CGI script like this get you a job at an internet startup? Probably not. Will it solve your actual problem? Probably.
configparser Module
So-called .ini files are a common format for encoding program configuration information in a human-readable form. Here's an example:

# config.ini
; A comment
[section1]
name1 = value1
name2 = value2

[section2]
; Alternative syntax
name1: value1
name2: value2
The configparser module is used to read .ini files and extract values. Here's a basic example:

import configparser

# Create a config parser and read a config file
cfg = configparser.ConfigParser()
cfg.read('config.ini')

# Extract values
a = cfg.get('section1', 'name1')
b = cfg.get('section2', 'name2')
...
A wide range of advanced functionality is also available, including string interpolation features, the ability to merge multiple .ini files, provide default values, and more. Consult the official documentation for more examples.
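As a taste of one such feature, basic string interpolation lets one setting refer to another; a small sketch:

# config.ini
[paths]
home = /usr/local
bin = %(home)s/bin        ; Expands to /usr/local/bin when read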
csv Module
The csv module is used to read and write files of comma-separated values (CSV), such as those commonly produced by programs like Microsoft Excel or exported from a database. To use it, open a file and then wrap an extra layer of CSV encoding/decoding around it. For example:

import csv

# Read a CSV file into a list of tuples
def read_csv_data(filename):
    with open(filename) as file:
        rows = csv.reader(file)
        # First row is often a header. This reads it
        headers = next(rows)
        # Now read the rest of the data
        for row in rows:
            # Do something with row
            ...

# Write Python data to a CSV file
def write_csv_data(filename, headers, rows):
    with open(filename, 'w') as file:
        out = csv.writer(file)
        out.writerow(headers)
        out.writerows(rows)
A commonly used convenience is to use the DictReader() class instead. It interprets the first line of a CSV file as headers and returns each row as a dictionary instead of a tuple:

import csv

def find_nearby(filename):
    with open(filename) as file:
        rows = csv.DictReader(file)
        for row in rows:
            lat = float(row['latitude'])
            lon = float(row['longitude'])
            if near_enough(lat, lon):
                print(row)
The csv module doesn't do much with CSV data other than reading and writing it. The main benefit is that the module knows how to properly encode and decode the data, including handling a lot of edge cases involving quoting, special characters, and other details. It's the kind of module you might use to write simple scripts that clean up data or prepare it for use by other programs. If you want to perform data analysis tasks with CSV data, consider a third-party package such as the popular pandas library.
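For example, reading a CSV file into a pandas DataFrame is a one-liner (assuming pandas is installed, e.g. via pip install pandas):

import pandas as pd

df = pd.read_csv('data.csv')    # Returns a DataFrame with typed columns
print(df.describe())            # Summary statistics for numeric columns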
errno Module
Whenever a system-level error occurs, Python reports it with an exception that's a subclass of OSError. Some of the most common kinds of system errors are represented by separate subclasses of OSError, such as PermissionError or FileNotFoundError. However, there are hundreds of other errors that could occur in practice. For these, any OSError exception carries a numeric errno attribute that can be inspected. The errno module provides symbolic constants corresponding to these error codes. They are often used when writing specialized exception handlers. For example, here's an exception handler that checks for a device running out of space:

import errno

def write_data(file, data):
    try:
        file.write(data)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            print("Out of disk space!")
        else:
            raise       # Some other error. Propagate
fcntl Module
The fcntl module is used to perform low-level I/O control operations on Unix using the fcntl() and ioctl() system calls. This is also the module to use if you want to perform any kind of file locking, a problem that sometimes arises with concurrency and distributed systems. Here's an example of opening a file together with mutual-exclusion locking across all processes, using fcntl.flock():

import fcntl

with open("somefile", "r") as file:
    try:
        fcntl.flock(file.fileno(), fcntl.LOCK_EX)
        # Use the file
        ...
    finally:
        fcntl.flock(file.fileno(), fcntl.LOCK_UN)
hashlib Module
The hashlib module provides functions for computing cryptographic hash values such as MD5 and SHA-1. The following example illustrates how to use the module:

>>> import hashlib
>>> h = hashlib.new('sha256')
>>> h.update(b'Hello')      # Feed data
>>> h.update(b'World')
>>> h.digest()
b'\xa5\x91\xa6\xd4\x0b\xf4 @J\x01\x173\xcf\xb7\xb1\x90\xd6,e\xbf\x0b\xcd\xa3+W\xb2w\xd9\xad\x9f\x14n'
>>> h.hexdigest()
'a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e'
>>> h.digest_size
32
>>>
http Package
The http package contains a large amount of code related to the low-level implementation of the HTTP internet protocol. It can be used to implement both servers and clients. However, most of this package is considered legacy and too low-level for day-to-day work. Serious programmers working with HTTP are more likely to use third-party libraries such as requests, httpx, Django, or Flask. Nevertheless, a useful easter egg of the http package is the ability to have Python run a standalone web server. Go to a directory with a collection of files and type this:

bash $ python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
Python will now serve the files to your browser if you point it at the right port. You wouldn't use this to run a real website, but it can be useful for testing and debugging programs related to the web. For example, the author has used it to locally test programs involving a mix of HTML, JavaScript, and WebAssembly.
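The server also accepts an optional port argument if the default of 8000 is already taken:

bash $ python3 -m http.server 9000
Serving HTTP on 0.0.0.0 port 9000 (http://0.0.0.0:9000/) ...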
io Module
The io module mainly contains the definitions of the classes used to implement the file objects returned by the open() function. It's not so common to access those classes directly. However, the module also contains a pair of classes that are useful for "faking" a file in the form of strings and bytes. This can be useful for testing and other applications where you need to provide a "file" but obtained the data in some other way. The StringIO() class provides a file-like interface on top of strings. For example, here's how you can write output to a string:

# Function that expects a file
def greeting(file):
    file.write('Hello\n')
    file.write('World\n')

# Call the function with a real file
with open('out.txt', 'w') as file:
    greeting(file)

# Call the function with a "fake" file
import io
file = io.StringIO()
greeting(file)

# Get the resulting output
output = file.getvalue()
Similarly, you can make a StringIO object and use it for reading:

file = io.StringIO('hello\nworld\n')
while (line := file.readline()):
    print(line, end='')
The BytesIO() class serves a similar purpose but is used for emulating binary I/O with bytes.
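A corresponding sketch with bytes:

import io

file = io.BytesIO(b'hello\nworld\n')
data = file.read()      # b'hello\nworld\n'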
json Module
The json module can be used to encode and decode data in the JSON format, commonly used in the APIs of microservices and web applications. There are two basic functions for converting data, dumps() and loads(). dumps() takes a Python dictionary and encodes it as a JSON Unicode string:

>>> import json
>>> data = { 'name': 'Mary A. Python', 'email': '[email protected]' }
>>> s = json.dumps(data)
>>> s
'{"name": "Mary A. Python", "email": "[email protected]"}'
>>>
The loads() function goes in the other direction:

>>> d = json.loads(s)
>>> d == data
True
>>>
Both the dumps() and loads() functions have a large number of options for controlling aspects of the conversion as well as for interfacing with instances of Python classes. That's beyond the scope of this section, but a wealth of information is available in the official documentation.
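As a small taste, here is a sketch of one commonly used option, indent, which produces pretty-printed output (continuing the example above):

>>> print(json.dumps(data, indent=2))
{
  "name": "Mary A. Python",
  "email": "mary@example.com"
}
>>>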
logging module
The logging module is the de facto standard module used for reporting program diagnostics and for print-style debugging. It can be used to route output to a log file and provides a large number of configuration options. A common practice is to write code that creates a Logger instance and issues messages on it like this:

import logging
log = logging.getLogger(__name__)

# Function that uses logging
def func(args):
    log.debug("A debugging message")
    log.info("An informational message")
    log.warning("A warning message")
    log.error("An error message")
    log.critical("A critical message")

# Logging configuration (occurs once at program startup)
if __name__ == '__main__':
    logging.basicConfig(
        level=logging.WARNING,
        filename="output.log"
    )
There are five built-in levels of logging, listed in order of increasing severity. When configuring the logging system, you specify a level that acts as a filter: only messages at that level or of greater severity are reported. Logging provides a large number of configuration options, mostly related to the backend handling of log messages. Normally you don't need to know about those when writing application code: you use debug(), info(), warning(), and similar methods on a given Logger instance. Any special configuration takes place in a special location (e.g., a main() function or main code block) during program startup.
os module
The os module provides a portable interface to common operating system functions, typically associated with the process environment, files, directories, permissions, and so on. The programming interface closely follows C programming and standards such as POSIX.

Practically speaking, most of this module is probably too low-level to be used directly in a typical application. However, if you're ever faced with the problem of executing some obscure low-level system operation (e.g., "How do I open a tty?"), there's a good chance you'll find the functionality here.
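As a small, illustrative sketch of the flavor of the module (the filename here is hypothetical):

import os

# Process environment
user = os.environ.get('USER')

# Current working directory and a directory listing
cwd = os.getcwd()
names = os.listdir(cwd)

# Low-level I/O on raw file descriptors
fd = os.open('somefile.txt', os.O_RDONLY)
data = os.read(fd, 100)
os.close(fd)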
os.path module
The os.path module is a legacy module for manipulating pathnames and performing common operations on the file system. Its functionality has largely been superseded by the newer pathlib module, but because it's still so widely used, you'll continue to see it in a lot of code.

A fundamental problem solved by this module is the portable handling of path separators on Unix (forward slash, /) and Windows (backslash, \). Functions such as os.path.join() and os.path.split() are often used to pull apart and reassemble file names:

>>> filename = '/Users/beazley/Desktop/old/data.csv'
>>> os.path.split(filename)
('/Users/beazley/Desktop/old', 'data.csv')
>>> os.path.join('/Users/beazley/Desktop', 'out.txt')
'/Users/beazley/Desktop/out.txt'
>>>
Here is a code sample that uses these functions:

import os.path

def clean_line(line):
    # Clean up a line (whatever that involves)
    return line.strip().upper() + '\n'

def clean_data(filename):
    dirname, basename = os.path.split(filename)
    newname = os.path.join(dirname, basename + '.clean')
    with open(newname, 'w') as out_f:
        with open(filename, 'r') as in_f:
            for line in in_f:
                out_f.write(clean_line(line))
The os.path module also has a number of functions, such as isfile(), isdir(), and getsize(), for performing tests on the file system and getting file metadata. For example, this function returns the total size in bytes of a simple file or of all files in a directory:

import os.path

def compute_usage(filename):
    if os.path.isfile(filename):
        return os.path.getsize(filename)
    elif os.path.isdir(filename):
        return sum(compute_usage(os.path.join(filename, name))
                   for name in os.listdir(filename))
    else:
        raise RuntimeError('Unsupported file kind')
pathlib module
The pathlib module is the modern way to manipulate pathnames in a portable, high-level manner. It combines a wide range of file-oriented functionality in one place and does so using an object-oriented interface. The central object is the Path class. For example:

from pathlib import Path
filename = Path('/Users/beazley/old/data.csv')
Once you have an instance filename of Path, you can perform various operations on it to manipulate the file name. For example:

>>> filename.name
'data.csv'
>>> filename.parent
Path('/Users/beazley/old')
>>> filename.parent / 'newfile.csv'
Path('/Users/beazley/old/newfile.csv')
>>> filename.parts
('/', 'Users', 'beazley', 'old', 'data.csv')
>>> filename.with_suffix('.csv.clean')
Path('/Users/beazley/old/data.csv.clean')
>>>
Instances of Path also have capabilities for getting file metadata, obtaining directory listings, and more. Here is a reimplementation of the compute_usage() function from the previous section:
import pathlib

def compute_usage(filename):
    pathname = pathlib.Path(filename)
    if pathname.is_file():
        return pathname.stat().st_size
    elif pathname.is_dir():
        return sum(path.stat().st_size
                   for path in pathname.rglob('*')
                   if path.is_file())
    else:
        raise RuntimeError('Unsupported file kind')
re module
The re module is used to perform text matching, searching, and replacement operations using regular expressions. Here's a simple example:

>>> text = 'Today is 3/27/2018. Tomorrow is 3/28/2018.'
>>> # Find all occurrences of a date
>>> import re
>>> re.findall(r'\d+/\d+/\d+', text)
['3/27/2018', '3/28/2018']
>>> # Replace all occurrences of a date with replacement text
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2018-3-27. Tomorrow is 2018-3-28.'
>>>
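If the same pattern is going to be applied many times, it's common to precompile it first; here is a brief sketch:

import re

date_pat = re.compile(r'(\d+)/(\d+)/(\d+)')
m = date_pat.match('3/27/2018')
if m:
    month, day, year = m.groups()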
shutil module
The shutil module is used to carry out some common tasks that you might otherwise perform in a shell. These include copying and removing files, working with archives, and so on. For example, to copy a file:

import shutil
shutil.copy(srcfile, dstfile)
The shutil module is often used as a safer and more portable alternative to executing shell commands directly with the os.system() function.
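A sketch of a few of the other common operations (the directory names here are placeholders):

import shutil

# Copy an entire directory tree
shutil.copytree('srcdir', 'destdir')

# Create a zip archive of a directory
shutil.make_archive('backup', 'zip', 'srcdir')

# Remove a directory tree and everything in it
shutil.rmtree('destdir')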
select module
The select module is used for simple polling of multiple I/O streams. That is, it can be used to watch a collection of file descriptors for incoming data or for the capacity to send outgoing data. The following example shows typical usage:

import select

# Collections of objects representing file descriptors --
# integers or objects with a fileno() method.
want_to_read = [ ... ]
want_to_write = [ ... ]
check_exceptions = [ ... ]

# Timeout (or None)
timeout = None

# Poll for I/O
can_read, can_write, have_exceptions = \
    select.select(want_to_read, want_to_write, check_exceptions, timeout)

# Perform I/O operations
for file in can_read:
    do_read(file)
for file in can_write:
    do_write(file)

# Handle exceptional conditions
for file in have_exceptions:
    handle_exception(file)
In this code, three lists of file descriptors are constructed. These lists correspond to reading, writing, and exceptional conditions. They are passed to select() along with an optional timeout. select() returns three subsets of the passed arguments. These subsets represent the files on which the requested operation can be performed. For example, a file returned in can_read has incoming data pending.

The select() function is a standard low-level system call commonly used to watch for system events and to implement asynchronous I/O frameworks, such as the built-in asyncio module. In addition to select(), the select module also provides poll(), epoll(), kqueue(), and similar variant functions that offer equivalent functionality. The availability of these functions varies by operating system. The selectors module provides a higher-level interface to select that may be useful in certain contexts. An example was given earlier (p. x).
smtplib module
The smtplib module implements the client side of SMTP, commonly used to send email messages. A common use of the module is in a script that does exactly that: sends someone an email. Here's an example (the addresses are placeholders; the originals were lost to extraction):

import smtplib

fromaddr = "someone@example.com"
toaddrs = ["recipient@example.com"]
amount = 123.45
msg = f"""From: {fromaddr}\r
\r
Pay {amount} bitcoin or else. We're watching.\r
"""

server = smtplib.SMTP('localhost')
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
There are additional features for handling passwords, authentication, and other matters. However, if you're running a script on a machine and that machine is already configured to support email, the above example will usually do the job.
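If the server requires an encrypted connection and a login, a hedged sketch might look like the following (the host name, port, and credentials are placeholders):

import smtplib

server = smtplib.SMTP('smtp.example.com', 587)
server.starttls()                     # Upgrade the connection to TLS
server.login('username', 'password')  # Placeholder credentials
server.sendmail(fromaddr, toaddrs, msg)
server.quit()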
socket module
The socket module provides low-level access to network programming functions. The interface is modeled after the standard BSD socket interface commonly associated with system programming in C. The following example shows how to make an outgoing connection and retrieve a response:

from socket import socket, AF_INET, SOCK_STREAM

sock = socket(AF_INET, SOCK_STREAM)
sock.connect(('python.org', 80))
sock.send(b'GET /index.html HTTP/1.0\r\n\r\n')
parts = []
while True:
    part = sock.recv(10000)
    if not part:
        break
    parts.append(part)
response = b''.join(parts)
print(response)
The following example shows a simple echo server that accepts client connections and echoes back any data it receives. To test this server, run it and then connect to it using a command such as telnet localhost 25000 or nc localhost 25000 in a separate terminal session.

from socket import socket, AF_INET, SOCK_STREAM

def echo_server(address):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.bind(address)
    sock.listen(1)
    while True:
        client, addr = sock.accept()
        echo_handler(client, addr)

def echo_handler(client, addr):
    print('Connection from:', addr)
    with client:
        while True:
            data = client.recv(10000)
            if not data:
                break
            client.sendall(data)
    print('Connection closed')

if __name__ == '__main__':
    echo_server(('', 25000))
For UDP servers, there is no connection process. However, a server must still bind the socket to a known address. Here's a typical example of what a UDP server and client look like:
# udp.py
from socket import socket, AF_INET, SOCK_DGRAM

def run_server(address):
    sock = socket(AF_INET, SOCK_DGRAM)    # 1. Create a UDP socket
    sock.bind(address)                    # 2. Bind to an address/port
    while True:
        msg, addr = sock.recvfrom(2000)   # 3. Get a message
        # ... do something ...
        response = b'world'
        sock.sendto(response, addr)       # 4. Send a reply

def run_client(address):
    sock = socket(AF_INET, SOCK_DGRAM)    # 1. Create a UDP socket
    sock.sendto(b'hello', address)        # 2. Send a message
    response, addr = sock.recvfrom(2000)  # 3. Get a reply
    print("Received:", response)
    sock.close()

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 4:
        raise SystemExit('Usage: udp.py [-client|-server] hostname port')
    address = (sys.argv[2], int(sys.argv[3]))
    if sys.argv[1] == '-server':
        run_server(address)
    elif sys.argv[1] == '-client':
        run_client(address)
struct module
The struct module is used to convert data between Python and binary data structures, represented as Python byte strings. Such data structures are often used when interacting with functions written in C, binary file formats, network protocols, or binary communication over serial ports.

As an example, suppose you need to construct a binary message whose format is described by the following C data structure:

# Message format: All values are big-endian
struct Message {
    unsigned short msgid;     // 16-bit unsigned integer
    unsigned int sequence;    // 32-bit sequence number
    float x;                  // 32-bit float
    float y;                  // 32-bit float
};
Here's how to do that using the struct module:

>>> import struct
>>> data = struct.pack('>HIff', 123, 456, 1.23, 4.56)
>>> data
b'\x00{\x00\x00\x01\xc8?\x9dp\xa4@\x91\xeb\x85'
>>>
To decode binary data, use struct.unpack:

>>> struct.unpack('>HIff', data)
(123, 456, 1.2300000190734863, 4.559999942779541)
>>>
The slight discrepancies in the floating-point values are due to the loss of accuracy that occurs when they are converted to 32-bit values; Python represents floating-point values as 64-bit double-precision numbers.
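One related function worth knowing about is struct.calcsize(), which reports the size in bytes that a given format occupies:

>>> struct.calcsize('>HIff')
14
>>>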
subprocess module
The subprocess module is used to execute a separate program as a subprocess, with control over the execution environment including I/O handling, termination, and so forth. There are two common uses of the module. If you want to run a separate program and collect all of its output at once, use check_output(). For example:
import subprocess

# Run the 'netstat -a' command and collect its output
try:
    out = subprocess.check_output(['netstat', '-a'])
except subprocess.CalledProcessError as e:
    print("It failed:", e)
The data returned by check_output() is presented as bytes. If you want it converted to text, make sure you apply a proper decoding:

text = out.decode('utf-8')
It is also possible to set up a pipe and to interact with a subprocess in a more detailed manner. To do that, use the Popen class like this:

import subprocess

p = subprocess.Popen(['wc'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)

# Send data to the subprocess
p.stdin.write(b'hello world\nthis is a test\n')
p.stdin.close()

# Read data back
out = p.stdout.read()
print(out)
An instance p of Popen has attributes stdin and stdout that can be used to communicate with the subprocess.
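As a hedged aside, the higher-level subprocess.run() function covers many common cases in a single call; for example:

import subprocess

# Run a command, capture its output, and raise CalledProcessError on failure
result = subprocess.run(['netstat', '-a'], capture_output=True, check=True)
out = result.stdout.decode('utf-8')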
tempfile module
The tempfile module supports the creation of temporary files and directories. Here's an example of creating a temporary file:

import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b'Hello World')
    f.seek(0)
    data = f.read()
    print('Got:', data)
By default, a temporary file is open in binary mode and allows both reading and writing. The with statement is also commonly used to define a scope in which the file will be used. The file is deleted at the end of the with block. If you'd like to create a temporary directory, use this:

with tempfile.TemporaryDirectory() as dirname:
    # Use the directory dirname
    ...
As with a file, the directory and all of its contents are removed at the end of the with block.
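If you need a temporary file that has a visible name in the file system (for example, to pass to another program), tempfile.NamedTemporaryFile() can be used; a small sketch:

import tempfile

with tempfile.NamedTemporaryFile() as f:
    print('Filename is:', f.name)   # Pass f.name to code expecting a path
    f.write(b'Hello')
    f.flush()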
textwrap module
The textwrap module can be used to format text to fit a specific terminal width. Perhaps it's a bit special-purpose, but it can sometimes be useful for cleaning up text for output when making reports. There are two functions of interest. wrap() takes text and wraps it to fit a specified column width. The function returns a list of strings. For example:
import textwrap

text = """look into my eyes
look into my eyes
the eyes the eyes the eyes
don't look around the eyes
don't look around the eyes
look into my eyes you're under
"""

wrapped = textwrap.wrap(text, width=40)
print('\n'.join(wrapped))

# Produces:
# look into my eyes look into my eyes the
# eyes the eyes the eyes don't look around
# the eyes don't look around the eyes look
# into my eyes you're under
The indent() function can be used to indent a block of text. For example:

print(textwrap.indent(text, '    '))

# Produces:
#     look into my eyes
#     look into my eyes
#     the eyes the eyes the eyes
#     don't look around the eyes
#     don't look around the eyes
#     look into my eyes you're under
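The module also provides textwrap.shorten(), which collapses whitespace and truncates text to fit a width, adding a placeholder; a quick sketch:

>>> import textwrap
>>> textwrap.shorten('look into my eyes you are under', width=25)
'look into my eyes [...]'
>>>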
threading module
The threading module is used to execute code concurrently. A common use of this arises with I/O handling in network programs. Thread programming is a large topic, but the following examples illustrate solutions to common problems.

Here's an example of launching a thread and waiting for it:

import threading
import time

def countdown(n):
    while n > 0:
        print('T-minus', n)
        n -= 1
        time.sleep(1)

t = threading.Thread(target=countdown, args=[10])
t.start()
t.join()    # Wait for the thread to finish
If you're never going to wait for the thread to finish, make it daemonic by supplying an extra daemon flag, like this:

t = threading.Thread(target=countdown, args=[10], daemon=True)
If you want to make a thread terminate, you must do so explicitly with a flag or a dedicated variable for that purpose. The thread has to be programmed to check for it:

import threading
import time

must_stop = False

def countdown(n):
    while n > 0 and not must_stop:
        print('T-minus', n)
        n -= 1
        time.sleep(1)
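A hedged sketch of how such a flag is typically used from the main thread:

t = threading.Thread(target=countdown, args=[10])
t.start()
# ... sometime later, request a stop and wait for the thread to exit
must_stop = True
t.join()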
If threads are going to mutate shared data, protect it with a Lock:

import threading

class Counter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        with self.lock:
            self.value += 1

    def decrement(self):
        with self.lock:
            self.value -= 1
If one thread has to wait for another thread to do something, use an Event:

import threading
import time

def step1(evt):
    print('Step 1')
    time.sleep(5)
    evt.set()

def step2(evt):
    evt.wait()
    print('Step 2')

evt = threading.Event()
threading.Thread(target=step1, args=[evt]).start()
threading.Thread(target=step2, args=[evt]).start()
If threads are going to communicate, use a Queue:

import threading
import queue
import time

def producer(q):
    for i in range(10):
        print('Producing:', i)
        q.put(i)
    print('Done')
    q.put(None)

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print('Consuming:', item)
    print('Goodbye')

q = queue.Queue()
threading.Thread(target=producer, args=[q]).start()
threading.Thread(target=consumer, args=[q]).start()
time module
The time module is used to access system time-related functions. The following selected functions are the most useful:

sleep(seconds)
Makes Python sleep for a given number of seconds, given as a floating-point number.

time()
Returns the current system time in UTC as a floating-point number. This is the number of seconds since the epoch (usually January 1, 1970, for Unix systems). Use localtime() to convert it into a data structure suitable for extracting useful information.

localtime([secs])
Returns a struct_time object representing the local time on the system, or the time represented by the floating-point value secs passed as an argument. The resulting structure has attributes tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_wday, tm_yday, and tm_isdst.

gmtime([secs])
The same as localtime(), except that the resulting structure represents the time in UTC (or Greenwich Mean Time).

ctime([secs])
Converts a time expressed in seconds to a text string suitable for printing. Useful for debugging and logging.

asctime(tm)
Converts a time structure as returned by localtime() to a text string suitable for printing.

The datetime module is more commonly used for representing dates and times for the purpose of performing date-related computations and handling time zones.
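A small sketch showing a few of these functions together (the printed date is illustrative):

import time

now = time.time()          # Seconds since the epoch, as a float
tm = time.localtime(now)   # Convert to a struct_time
print(tm.tm_year, tm.tm_mon, tm.tm_mday)
print(time.ctime(now))     # Printable form, e.g. 'Tue Mar 27 13:18:00 2018'
time.sleep(0.5)            # Pause for half a second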
urllib package
The urllib package is used to make client-side HTTP requests. Perhaps the most useful function is urllib.request.urlopen(), which can be used to fetch simple web pages. For example:

>>> from urllib.request import urlopen
>>> u = urlopen('http://www.python.org')
>>> data = u.read()
>>>
If you want to encode form parameters, use urllib.parse.urlencode() as shown here (the email address is again a placeholder):

from urllib.parse import urlencode
from urllib.request import urlopen

form = {
    'name': 'Mary A. Python',
    'email': 'mary@example.com'
}
data = urlencode(form)
u = urlopen('http://httpbin.org/post', data.encode('utf-8'))
response = u.read()
The urlopen() function works fine for basic web pages and APIs involving HTTP or HTTPS. However, it becomes quite awkward to use if access also involves cookies, advanced authentication schemes, and other layers. Frankly, most Python programmers would use a third-party library such as requests or httpx to handle these situations. You should too.

The urllib.parse subpackage has additional functions for manipulating URLs themselves. For example, the urlparse() function can be used to pull apart a URL:

>>> from urllib.parse import urlparse
>>> url = 'http://httpbin.org/get?name=Dave&n=42'
>>> urlparse(url)
ParseResult(scheme='http', netloc='httpbin.org', path='/get', params='', query='name=Dave&n=42', fragment='')
>>>
unicodedata module
The unicodedata module is used for more advanced operations on Unicode text strings. There are often multiple representations of the same Unicode text. For example, the character U+00F1 (ñ) might be fully composed as the single character U+00F1 or decomposed into the multicharacter sequence U+006E U+0303 (n, combining ~). This can cause strange problems in programs that expect text strings that visually render the same to actually have the same representation. Consider the following example involving dictionary keys:

>>> d = {}
>>> d['Jalape\xf1o'] = 'hot'
>>> d['Jalapen\u0303o'] = 'mild'
>>> d
{'Jalapeño': 'hot', 'Jalapeño': 'mild'}
>>>
At first glance, this looks like it should be impossible: how can a dictionary have two identical, yet separate keys? The answer lies in the fact that the keys consist of different sequences of Unicode characters.

If the consistent handling of identically rendered Unicode strings matters, the strings should be normalized. The unicodedata.normalize() function can be used to ensure a consistent character representation. For example, unicodedata.normalize('NFC', s) makes sure that all characters in s are fully composed and not represented as a sequence of combining characters. unicodedata.normalize('NFD', s) makes sure that all characters in s are fully decomposed.
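A brief sketch of this in practice:

>>> import unicodedata
>>> s1 = 'Jalape\xf1o'
>>> s2 = 'Jalapen\u0303o'
>>> s1 == s2
False
>>> unicodedata.normalize('NFC', s1) == unicodedata.normalize('NFC', s2)
True
>>>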
The unicodedata module also has functions for testing character properties such as capitalization, numbers, and whitespace. General character properties can be obtained with the unicodedata.category(c) function. For example, unicodedata.category('A') returns 'Lu', signifying that the character is an uppercase letter. More information about these values can be found in the official Unicode character database at https://www.unicode.org/ucd/.
xml package
The xml package is a large collection of modules for processing XML data in various ways. However, if your primary goal is to read an XML document and extract information from it, the easiest way is to use the xml.etree subpackage. Suppose you had an XML document in a file recipe.xml, like this:
<?xml version="1.0" encoding="utf-8"?>
<recipe>
  <title>Guacamole</title>
  <description>A quick and easy guacamole</description>
  <ingredients>
    <item num="4">Large avocados, chopped</item>
    <item num="1" units="small">Tomato, chopped</item>
    <item num="1/2" units="C">White onion, chopped</item>
    <item num="2" units="tbl">Fresh squeezed lemon juice</item>
  </ingredients>
  <directions>
    Combine all ingredients and hand whisk to desired consistency.
    Serve with an ice-cold beer and enjoy.
  </directions>
</recipe>
To extract specific elements from it, do this:

from xml.etree.ElementTree import ElementTree

doc = ElementTree(file="recipe.xml")
title = doc.find('title')
print(title.text)

# Alternative (just get the element text)
print(doc.findtext('description'))

# Iterate over multiple elements
for item in doc.findall('ingredients/item'):
    num = item.get('num')
    units = item.get('units', '')
    text = item.text.strip()
    print(f'{num} {units} {text}')
Final Words
I/O is a fundamental part of writing any useful program. Given Python's popularity, it can work with virtually any data format, encoding, or document structure in use. Even if the standard library doesn't support it, you can almost certainly find a third-party module that solves the problem for you.

In the big picture, it may be more useful to think about the edges of your application. At the outer boundary between your program and the rest of the world, issues related to data encoding often arise. This is especially true for textual data and Unicode. Much of the complexity of Python's I/O handling is aimed at this exact problem (i.e., support for different encodings, error-handling policies, and so on). It's also critical to note that textual data and binary data are strictly separated into different objects. Knowing which one you're working with helps you better understand the bigger picture.

A secondary consideration with I/O is the overall evaluation model. Python code is currently separated into two worlds: normal synchronous code and asynchronous code, often associated with the asyncio module (characterized by the use of async functions and the async/await syntax). Asynchronous code, however often promoted, almost always requires the use of dedicated libraries able to operate in that environment. This, in turn, usually forces you to write your application code in the "async" style as well. Honestly, you should probably avoid asynchronous coding unless you absolutely know that you need it, and if you're not really sure, then you almost certainly don't. Most of the Python-speaking universe codes in a normal synchronous style that is far easier to reason about, debug, and test. You should choose that.
Built-in Functions and Standard Library
This chapter serves as a compact reference to Python's built-in functions. These functions are always available without any import statement. The chapter concludes with a brief description of some useful standard library modules.
abs(x)
Returns the absolute value of x.

all(s)
Returns True if all of the values in iterable s evaluate as True. Returns True if s is empty.

any(s)
Returns True if any of the values in iterable s evaluate as True. Returns False if s is empty.

ascii(x)
Creates a printable representation of the object x, just like repr(), but only uses ASCII characters in the result. Non-ASCII characters are turned into appropriate escape sequences. This can be used to view Unicode strings in a terminal or shell that doesn't support Unicode.

bin(x)
Returns a string containing the binary representation of the integer x.

bool([x])
A type representing the Boolean values True and False. If used to convert x, it returns True if x evaluates as true using the usual truth-testing semantics (i.e., a nonzero number, a non-empty list, and so on). Otherwise, False is returned. False is also the default value returned if bool() is called without any arguments. The bool class inherits from int, so the Boolean values True and False can be used as integers with the values 1 and 0 in mathematical calculations.

breakpoint()
Sets a manual debugger breakpoint. When reached, control transfers to pdb, the Python debugger.

bytearray([x])
A type representing a mutable array of bytes. When creating an instance, x may be an iterable sequence of integers in the range 0 to 255, an 8-bit string or bytes literal, or an integer that specifies the size of the byte array (in which case every entry will be initialized to 0).

bytearray(s, encoding)
An alternate calling convention for creating a bytearray instance from the characters in a string s, where encoding specifies the character encoding to use in the conversion.

bytes([x])
A type representing an immutable array of bytes.

bytes(s, encoding)
An alternate calling convention for creating bytes from a string s, where encoding specifies the character encoding to use in the conversion. Table 1 shows the operations supported by both bytes and bytearrays.

Table 1: Operations on bytes and byte arrays
Byte arrays additionally support the methods in Table 2.

Table 2: Additional operations on byte arrays

callable(obj)
Returns True if obj is callable as a function.

chr(x)
Converts the integer x, representing a Unicode code point, into a single-character string.

classmethod(func)
This decorator creates a class method for the function func. It is typically only used inside class definitions, where it is implicitly invoked by @classmethod. Unlike a normal method, a class method receives the class as the first argument, not an instance.

compile(string, filename, kind)
Compiles string into a code object for use with exec() or eval(). string is a string containing valid Python code. If this code spans multiple lines, the lines must be terminated by a single newline ('\n') and not platform-specific variants (e.g., '\r\n' on Windows). filename is a string containing the name of the file in which the string was defined (if any). kind is 'exec' for a sequence of statements, 'eval' for a single expression, or 'single' for a single executable statement. The resulting code object can be passed directly to exec() or eval() in place of a string.
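A minimal sketch of compiling and evaluating an expression:

>>> code = compile('3 * x + 4', '<input>', 'eval')
>>> x = 10
>>> eval(code)
34
>>>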
complex([real [, imag]])
A type representing a complex number with real and imaginary components, real and imag, which can be supplied as any numeric type. If imag is omitted, the imaginary component is set to zero. If real is passed as a string, the string is parsed and converted to a complex number. In this case, imag should be omitted. If real is any other kind of object, the value of real.__complex__() is returned. If no arguments are given, 0j is returned. Table 3 shows the methods and attributes of complex.

Table 3: Attributes of complex

delattr(object, attr)
Deletes an attribute of an object. attr is a string. Same as del object.attr.

dict([m])
A type representing a dictionary. If no argument is given, an empty dictionary is returned. If m is a mapping object (such as another dictionary), a new dictionary having the same keys and same values as m is returned. For example, if m is a dictionary, dict(m) makes a shallow copy of it. If m is not a mapping, it must support iteration in which a sequence of (key, value) pairs is produced. These pairs are used to populate the dictionary. dict() can also be called with keyword arguments. For example, dict(foo=3, bar=7) creates the dictionary { 'foo': 3, 'bar': 7 }. Table 4 shows the operations supported by dictionaries.

Table 4: Operations on dictionaries

dir([object])
Returns a sorted list of attribute names. If object is a module, it contains the list of symbols defined in that module. If object is a type or class object, it returns a list of attribute names. The names are typically obtained from the object's __dict__ attribute if defined, but other sources may be used. If no argument is given, the names in the current local symbol table are returned. It should be noted that this function is primarily used for informational purposes (e.g., used interactively at the command line). It should not be used for formal program analysis because the information obtained may be incomplete. Also, user-defined classes can define a special method __dir__() that alters the result of this function.

divmod(a, b)
Returns the quotient and remainder of long division as a tuple. For integers, the value (a // b, a % b) is returned. For floats, (math.floor(a / b), a % b) is returned. This function may not be called with complex numbers.

enumerate(iter, start=0)
Given an iterable object iter, returns a new iterator (of type enumerate) that produces tuples containing a count and the value produced by iter. For example, if iter produces a, b, c, then enumerate(iter) produces (0, a), (1, b), (2, c). The optional start changes the initial value of the count.
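For instance, a common use is numbering lines as they are read (the file name here is hypothetical):

with open('somefile.txt') as file:
    for lineno, line in enumerate(file, start=1):
        print(lineno, line, end='')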
eval(expr [, globals [, locals]])
Evaluates an expression. expr is a string or a code object created by compile(). globals and locals are mapping objects that define the global and local namespaces, respectively, for the operation. If omitted, the expression is evaluated using the values of globals() and locals() as executed in the caller's environment. It is most common for globals and locals to be specified as dictionaries, but advanced applications can supply custom mapping objects.
exec(code [, globals [, locals]])
Executes Python statements. code is a string, bytes, or a code object created by compile(). globals and locals define the global and local namespaces, respectively, for the operation. If omitted, the code is executed using the values of globals() and locals() as executed in the caller's environment.

filter(function, iterable)
Constructs an iterator that returns the items in iterable for which function(item) evaluates as True.

float([x])
A type representing a floating-point number. If x is a number, it is converted to a float. If x is a string, it is parsed into a float. For all other objects, x.__float__() is invoked. If no argument is supplied, 0.0 is returned. Table 5 shows the methods and attributes of floats.

Table 5: Methods and attributes of floats

format(value [, format_spec])
Converts value to a formatted string according to the format specification string in format_spec. This operation invokes value.__format__(), which is free to interpret the format specification as it sees fit. For simple types of data, the format specifier typically includes an alignment character of '<', '>', or '^'; a number (which indicates the field width); and a character code of 'd', 'f', or 's' for integer, floating-point, or string values, respectively. For example, a format specification of 'd' formats an integer, a specification of '8d' right-aligns an integer in an 8-character field, and '<8d' left-aligns an integer in an 8-character field.