Advanced Log Filtering
Querying the logs
Understanding field references
The log tables are queried with subset of BigQuery SQL syntax. They all have three columns: event_message
, timestamp
, and metadata
.
column | description |
---|---|
timestamp | time event was recorded |
event_message | the log's message |
metadata | information about the event |
The metadata
column is an array of JSON objects that stores important details about each recorded event. For example, in the postgres table, the metadata.parsed.error_severity
field indicates the error level of an event. To work with its values, you need to unnest
them using a cross join
.
This approach is commonly used with JSON and array columns, so it might look a bit unfamiliar if you're not used to working with these data types.
_10select_10 event_message,_10 parsed.error_severity,_10 parsed.user_name_10from_10 postgres_logs_10 -- extract first layer_10 cross join unnest(postgres_logs.metadata) as metadata_10 -- extract second layer_10 cross join unnest(metadata.parsed) as parsed;
Expanding results
Logs returned by queries may be difficult to read in table format. A row can be double-clicked to expand the results into more readable JSON:
Filtering with regular expressions
The Logs use BigQuery Style regular expressions with the regexp_contains function. In its most basic form, it will check if a string is present in a specified column.
_10select_10 cast(timestamp as datetime) as timestamp,_10 event_message,_10 metadata_10from postgres_logs_10where regexp_contains(event_message, 'is present');
There are multiple operators that you should consider using:
Find messages that start with a phrase
^
only looks for values at the start of a string
_10-- find only messages that start with connection_10regexp_contains(event_message, '^connection')
Find messages that end with a phrase:
$
only looks for values at the end of the string
_10-- find only messages that ends with port=12345_10regexp_contains(event_message, '$port=12345')
Ignore case sensitivity:
(?i)
ignores capitalization for all proceeding characters
_10-- find all event_messages with the word "connection"_10regexp_contains(event_message, '(?i)COnnecTion')
Wildcards:
.
can represent any string of characters
_10-- find event_messages like "hello<anything>world"_10regexp_contains(event_message, 'hello.world')
Alphanumeric ranges:
[1-9a-zA-Z]
finds any strings with only numbers and letters
_10-- find event_messages that contain a number between 1 and 5 (inclusive)_10regexp_contains(event_message, '[1-5]')
Repeated values:
x*
zero or more x
x+
one or more x
x?
zero or one x
x{4,}
four or more x
x{3}
exactly 3 x
_10-- find event_messages that contains any sequence of 3 digits_10regexp_contains(event_message, '[0-9]{3}')
Escaping reserved characters:
\.
interpreted as period .
instead of as a wildcard
_10-- escapes ._10regexp_contains(event_message, 'hello world\.')
or
statements:
x|y
any string with x
or y
present
_10-- find event_messages that have the word 'started' followed by either the word "host" or "authenticated"_10regexp_contains(event_message, 'started host|authenticated')
and/or/not statements in SQL:
and
, or
, and not
are all native terms in SQL and can be used in conjunction with regular expressions to filter results
_10select_10 cast(timestamp as datetime) as timestamp,_10 event_message,_10 metadata_10from postgres_logs_10where_10 (regexp_contains(event_message, 'connection') and regexp_contains(event_message, 'host'))_10 or not regexp_contains(event_message, 'received');
Filtering and unnesting example
Filter for Postgres
_12select_12 cast(postgres_logs.timestamp as datetime) as timestamp,_12 parsed.error_severity,_12 parsed.user_name,_12 event_message_12from_12 postgres_logs_12 cross join unnest(metadata) as metadata_12 cross join unnest(metadata.parsed) as parsed_12where regexp_contains(parsed.error_severity, 'ERROR|FATAL|PANIC')_12order by timestamp desc_12limit 100;
Limitations
Log tables cannot be joined together
Each product table operates independently without the ability to join with other log tables. This may change in the future.
The with
keyword and subqueries are not supported
The parser does not yet support with
and subquery statements.
The ilike
and similar to
keywords are not supported
Although like
and other comparison operators can be used, ilike
and similar to
are incompatible with BigQuery's variant of SQL. regexp_contains
can be used as an alternative.
The wildcard operator *
to select columns is not supported
The log parser is not able to parse the *
operator for column selection. Instead, you can access all fields from the metadata
column:
_10select_10 cast(postgres_logs.timestamp as datetime) as timestamp,_10 event_message,_10 metadata_10from_10 <log_table_name>_10order by timestamp desc_10limit 100;