Job Queues¶
Warning
This feature is in early development stage and not ready for production.
“Job Queue” is a convenient tool to run queries in background. You can keep working on your interactive session while running one or more long-running queries asynchronously. You will get notified by HipChat or Slack as soon as your queries have completed (Notifications).
Submitting Queries¶
import pandas_td as td
# Create a queue with name (used for notifications)
q1 = td.create_queue(name='q1')
# Create a query engine as usual
engine = td.create_engine('hive:sample_datasets')
# Run query in the queue
t1 = q1.submit_query('SELECT ... FROM www_access', engine)
# Query result can be retrieved later as DataFrame
df = t1.result()
Magic Functions¶
In [1]: %%td_hive sample_datasets -a q1
...: select count(1) cnt from www_access
...:
Queued as q1[0]
In [2]: q1[0].result()
Out[2]:
cnt
0 5000
A convenient way to retrieve the result is to use -o
along with -a
:
In [3]: %%td_hive sample_datasets -a q1 -o df1
...: select count(1) cnt from www_access
...:
Queued as q1[1]
In [4]: df1 # the value will be stored when your job has finished
Out[4]:
cnt
0 5000